
Proposal: resources can declare a time-to-live if a certain version becomes "stale" but re-running in/get would refresh the cache #622

Closed
cjcjameson opened this issue Sep 1, 2016 · 4 comments

Comments

@cjcjameson
Contributor

We suggest a new field in the resource declaration, cache_refresh_interval:

- name: release
  type: s3
  cache_refresh_interval: <time>
  source:
    bucket: releases
    private: true
    regexp: directory_on_s3/release-(.*).tgz
    access_key_id: ACCESS-KEY
    secret_access_key: SECRET

The current behavior of baggageclaim is to keep the latest version of a resource on a worker indefinitely. This field would give baggageclaim a general, fall-back cache-invalidation window.

The situation that prompted this:
The S3 resource has a private: true configuration. This means that when in runs, it generates a signed URL. The URL's signature is only valid for a certain window (currently 86400 seconds / 1 day in s3-resource; AWS caps the duration at 1 week).

Say there's a daily build, but the build's unit tests have been failing for a few days, so it's been a while since we last published to S3. Then someone triggers a downstream build, which fetches the latest S3 resource version, not to use the file itself, but to use the signed URL. If the downstream job's tasks land on the worker that originally ran in on the first pass of the pipeline for this version of the resource, the signed URL will have expired. On the other hand, if the task runs on a worker whose baggageclaim has not yet run in for this version, it will generate a new signed URL at that point, with a fresh 24 hours on the clock.

We have considered other options that are S3-specific -- should we think harder there?

  • Just use a longer TTL on S3 signed URLs: if the artifact is a week old, maybe no one will want to use it anyway.
  • Reconfigure how the S3 resource provides the signed URL: make it part of the put action, to ensure it runs every time.
  • Don't use the private: true feature; calculate a signed S3 URL ourselves in the body of our task.
  • Give our task retry logic for when the URL has expired; punt to manual calculation.
  • Somehow contort the versioning semantics of S3 resources to include the signature / authentication (!!NO DEAR GOD!!).
  • Write a separate resource that fetches signed S3 URLs, and that's all it does. It would add weight to pipelines, but it's strange that the S3 resource's in serves up some content (a special URL) that isn't covered by the versioning semantics.
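For reference, the "calculate a signed S3 URL ourselves" option above could be sketched as a task config roughly like this (the image, bucket/key, and PRESIGNED_TTL wiring are illustrative assumptions, not from the original discussion; aws s3 presign is a real AWS CLI command):

```yaml
# Hypothetical task sketch: generate our own signed URL in the task body
# instead of relying on the one produced by the S3 resource's in.
platform: linux

image_resource:
  type: docker-image
  source: {repository: amazon/aws-cli}

params:
  # How long the signature should stay valid, in seconds (AWS caps this at 1 week).
  PRESIGNED_TTL: 86400

run:
  path: sh
  args:
    - -ec
    - |
      # Bucket and key are placeholders matching the example resource above.
      url="$(aws s3 presign "s3://releases/directory_on_s3/release-1.2.3.tgz" \
        --expires-in "$PRESIGNED_TTL")"
      echo "fresh signed URL: $url"
```

This sidesteps the cache entirely, at the cost of handing AWS credentials to the task.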

Other options for our particular case:

  • Provide additional alerting in our task's STDOUT to help people on the occasions when this comes up.
  • Make sure that our iterations get faster so that this doesn't come up!

Alternative implementation suggestion: add a property to baggageclaim itself, which provides a global cache-invalidation time-to-live. Pretty drastic, but maybe it's reasonable?

Downside of this feature: abuse by people who are scared of stale caches when, in fact, their resource's versioning semantics are strong and complete enough that this wouldn't be necessary.

Open question: how does this apply to other resource types? Would it be generally applicable?

Sometimes resources are perfectly repeatable and strongly versioned. For everything else, there's occasional cache invalidation. 😃

@vito
Member

vito commented Sep 3, 2016

Hmm. Having to set cache expiration on the resource feels like a leaky abstraction. Ideally there should be no configuration required for caching to work.

get calls are meant to be deterministic. Creating a signed URL involves a side-effect and is nondeterministic. So it shouldn't be done with get. I also don't generally like the idea of resources controlling their own caching. Resources should all fit the same mold, for more reasons than just caching.

How about something like:

- get: my-bucket-thing
- put: my-bucket-thing
  params: {sign: my-bucket-thing, expires_in: 24h}

The put would yield a version like {path: foo, expires_at: <timestamp>} (or version_id if using versioned_file). The get resulting from the put would generate the signed URL.
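Spelled out a bit more (any field names beyond those described above are hypothetical), a build plan using this might look like:

```yaml
plan:
- get: my-bucket-thing
- put: my-bucket-thing
  params: {sign: my-bucket-thing, expires_in: 24h}
  # The put would yield a version along the lines of:
  #   {path: directory_on_s3/release-1.2.3.tgz, expires_at: <timestamp>}
- task: use-signed-url
  # The implicit get after the put would write the freshly signed URL
  # into the my-bucket-thing input for downstream tasks to read.
  file: ci/use-signed-url.yml
```

Because the signing happens in put, every execution produces a URL with a full expiry window, regardless of which worker's cache is used.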

@concourse-bot
Collaborator

Hi there!

We use Pivotal Tracker to provide visibility into what our team is working on. A story for this issue has been automatically created.

The current status is as follows:

  • #129750293 Proposal: resources can declare a time-to-live if a certain version becomes "stale" but re-running in/get would refresh the cache

This comment, as well as the labels on the issue, will be automatically updated as the status in Tracker changes.

@cjcjameson
Contributor Author

Yep, that makes sense to an 80th-percentile Concourse user and above, and is probably the best possible use of the current abstractions. But I've had a lot of difficulty explaining to newcomers why "get-put-task" is a thing with the Pool resource, so I'm hesitant about how this affects the UX.

No action is also OK; the most useful thing for us to work on might just be parameterizing the timeout value on the S3 resource, so that you can configure how long the signed URL remains valid.
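That parameterization might look something like this in the resource declaration (the url_expiration name is a hypothetical placeholder, not an actual s3-resource parameter):

```yaml
- name: release
  type: s3
  source:
    bucket: releases
    private: true
    # Hypothetical knob: how long signed URLs generated by in stay valid,
    # in seconds. AWS caps signed-URL lifetimes at 1 week (604800s).
    url_expiration: 604800
    regexp: directory_on_s3/release-(.*).tgz
    access_key_id: ACCESS-KEY
    secret_access_key: SECRET
```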

@vito
Member

vito commented Oct 8, 2016

Closing this in favor of concourse/s3-resource#53

@vito vito closed this as completed Oct 8, 2016
vito added a commit that referenced this issue Mar 28, 2018