
Caching directories between runs of a task #230

@vito

Description

Common use case: fetching dependencies, syncing BOSH blobs, etc.

Caching the directories that these fetch into and update would dramatically speed up many builds. Our own ATC JS build spends 99% of its time just downloading npm packages.

Proposal:

Add a cache field to task configs, like so:

---
platform: linux

inputs:
- name: my-release

cache:
- path: my-release/.blobs

run:
  path: my-release/ci/scripts/create-release

Then, given I have a pipeline like so:

jobs:
- name: make-release
  plan:
  - get: my-release
  - task: create-release
    file: my-release/ci/create-release.yml

This would cache the directory my-release/.blobs between runs of that specific task in its job's build plan. So, the cache lookup key would be something like team-id+pipeline-id+job-name+task-name.
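A minimal sketch of that lookup-key scheme (the function name and the `+` separator are illustrative, taken from the wording above, not from any actual Concourse implementation):

```python
# Illustrative only: derive a cache lookup key scoped to the team, pipeline,
# job, and task, as described in the proposal. Not real Concourse code.

def cache_key(team_id, pipeline_id, job_name, task_name):
    # Join the scoping fields in the order given in the proposal:
    # team-id+pipeline-id+job-name+task-name.
    return "+".join([str(team_id), str(pipeline_id), job_name, task_name])

# Using the pipeline above: the create-release task in the make-release job.
key = cache_key(1, 7, "make-release", "create-release")
# → "1+7+make-release+create-release"
```

Because the key includes the job and task names, renaming either would effectively start a fresh cache.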

Notes:

  • This should also guarantee that two concurrent builds of the same job do not pollute each other's caches. There should be some sort of copy-on-write semantics, such that each build gets its own copy of the cache (initially empty), and at the end all other caches are marked "stale" and are set to expire.
  • This assumes tools will tolerate the cache directory being present but empty on the initial (cold-cache) run. I think this should be fine; without this guarantee it would be very annoying to orchestrate.
  • The caching is for purely ephemeral data, so it doesn't sacrifice Concourse's "not being a source of truth" principle.
  • Has the same cache-warming semantics as get steps, i.e. it may take a while for the cache to warm across the workers; it does not influence worker placement.
  • Has no effect on one-off builds, as there is not enough information to scope/correlate the caches (compared to a job build).
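The copy-on-write lifecycle in the first note can be modeled as follows (a toy in-memory model, purely to illustrate the proposed semantics; the class and method names are made up, not from Concourse):

```python
# Toy model of the proposed copy-on-write cache semantics: each build checks
# out an independent copy of the latest cache, and a successful build
# promotes its copy, marking all older generations stale (to expire later).
import copy

class TaskCache:
    def __init__(self):
        self.generations = []  # each entry: {"data": dict, "stale": bool}

    def checkout(self):
        # A new build copies the latest non-stale generation, or starts with
        # an empty directory on the cold-cache run.
        live = [g for g in self.generations if not g["stale"]]
        return copy.deepcopy(live[-1]["data"]) if live else {}

    def promote(self, data):
        # Mark every previous generation stale; append the new one.
        for g in self.generations:
            g["stale"] = True
        self.generations.append({"data": data, "stale": False})

cache = TaskCache()

# Build 1 starts empty (cold cache), fetches a blob, and promotes.
c1 = cache.checkout()           # {} on the cold run
c1["blob-a"] = "v1"
cache.promote(c1)

# Two concurrent builds each get an independent copy of the warm cache,
# so neither pollutes the other's working copy.
c2, c3 = cache.checkout(), cache.checkout()
c2["blob-b"] = "v2"
cache.promote(c2)               # build 3's copy c3 is unaffected
```

After the second promotion, new builds see both blobs, while the first generation is stale and due to expire.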
