Closed
Description
Common use case: fetching dependencies, syncing BOSH blobs, etc.
Caching the directories that these fetch into and update would dramatically speed up a bunch of builds. Our own ATC JS build spends 99% of its time just downloading npm packages.
Proposal:
Add a cache field to task configs, like so:
```yaml
---
platform: linux

inputs:
- name: my-release

cache:
- path: my-release/.blobs

run:
  path: my-release/ci/scripts/create-release
```

Then, given I have a pipeline like so:
```yaml
jobs:
- name: make-release
  plan:
  - get: my-release
  - task: create-release
    file: my-release/ci/create-release.yml
```

This would cache the directory `my-release/.blobs` between runs of that specific task in its job's build plan. So, the cache lookup key would be something like team-id + pipeline-id + job-name + task-name.
Notes:
- This should also guarantee that two concurrent builds of the same job do not pollute each other's caches. There should be some sort of copy-on-write semantics, such that each build gets its own copy of the cache (initially empty on the very first run), and at the end all other caches are marked "stale" and set to expire.
- Assumes tools can tolerate the directory being present but empty on the first cached run. I think this should be fine. Without this it would be very annoying to orchestrate.
- The caching is for purely ephemeral data, so it doesn't sacrifice Concourse's "not being a source of truth" principle.
- Has the same cache warming semantics as `get`s, i.e. it may take a while for the cache to warm across the workers; it does not influence worker placement.
- Has no effect on one-off builds, as there is not enough information to scope/correlate the caches (compared to a job build).
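The copy-on-write behavior from the notes could be sketched as below. This is a minimal model of the described semantics, assuming a generation counter and a stale set; the type and method names are hypothetical and nothing here reflects Concourse's real volume machinery:

```go
package main

import (
	"fmt"
	"sync"
)

// cacheStore models the proposed copy-on-write semantics: each build
// acquires a private generation of the cache (conceptually a clone of
// the latest one), and a finished build promotes its generation,
// marking the previously-latest one stale so it can expire.
type cacheStore struct {
	mu     sync.Mutex
	latest int          // generation served to newly started builds (0 = none yet)
	next   int          // counter for handing out generations
	stale  map[int]bool // generations queued for expiry
}

func newCacheStore() *cacheStore {
	return &cacheStore{stale: map[int]bool{}}
}

// acquire gives a build its own cache generation; concurrent builds of
// the same job therefore never write into the same cache.
func (s *cacheStore) acquire() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.next++
	return s.next // conceptually: a copy-on-write clone of s.latest
}

// promote installs a build's generation as the new latest cache and
// marks the one it displaces as stale.
func (s *cacheStore) promote(gen int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.latest != 0 && s.latest != gen {
		s.stale[s.latest] = true
	}
	s.latest = gen
}

func main() {
	s := newCacheStore()
	a, b := s.acquire(), s.acquire() // two concurrent builds of the same job
	s.promote(a)
	s.promote(b)
	fmt.Println(s.latest, s.stale[a]) // → 2 true: the later build wins; the other copy is stale
}
```

The key property is that `acquire` never hands two builds the same generation, so the "no pollution between concurrent builds" guarantee falls out of the structure rather than requiring locking around the cache contents themselves.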