Equivalent resources defined across pipelines and teams should only correspond to a single version history #2386
Comments
vito added the epic/spaces and enhancement labels on Jul 16, 2018
vito added this to Icebox in Core via automation on Jul 16, 2018
vito moved this from Icebox to Backlog in Core on Jul 16, 2018
I really like this. Reducing the redundant check containers for a single resource would make me feel like a better netizen.
Is the idea here that even though new versions would show up, the check_interval would be respected for triggering new builds? I can see someone using different check_intervals across pipelines to create varying pipeline behaviours, and that would be silently lost here.

One thing I was thinking of recently, and which I discussed with @xoebus in the past, is separating resource config from pipeline config. If multiple pipelines share the same resource config, why do I have to copy the configuration? This can become a chore when managing the same resource across many pipelines. It would be great if I could just configure resources kind of like a cloud_config and then refer to them in jobs in pipelines by name. Then a check_every would only be specified once for a particular configuration, and there'd be no surprise about which is the canonical resource definition.
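A rough sketch of what that could look like (the shared_resources section and the way jobs reference it are invented for illustration, not existing Concourse syntax):

```yaml
# hypothetical: resources configured once, outside any one pipeline,
# much like a BOSH cloud-config
shared_resources:
- name: concourse-repo
  type: git
  check_every: 2m          # specified exactly once for this configuration
  source:
    uri: https://github.com/concourse/concourse
    branch: master

# a pipeline would then just refer to the shared definition by name
jobs:
- name: unit
  plan:
  - get: concourse-repo    # no per-pipeline copy of the source config
    trigger: true
```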
I really hope people aren't relying on that (using different check_intervals across pipelines to produce different pipeline behaviour).
Interesting but sounds like a separate issue. I don't think we're ready to tackle that as part of this - it kind of goes against the current approach, which is to have pipelines be entirely self-contained. (It's a trade-off.)
Hmm, one scary thing about this approach is that check containers would now be shared across pipelines and teams, so hijacking one means poking at a container other teams depend on. Similarly, who should own the check container? Here's one approach: we could make the containers global, and then take the container out of commission once it's hijacked. This would still allow anyone to debug a failing check.
Oof. This is actually really difficult with credential rotation (especially with automated rotation via credential managers). If someone rotates their password to Docker Hub or their private key for their Git repo, that shouldn't reset the entire history of the resource. But with versions tied directly to a resource config, it'll be a different resource config any time the credentials change.

At the same time, we need the resource config to be fully resolved, because there should be some sort of authentication check before we share all the versions and especially the cached bits (which are also identified in part by resource config).

Should we separate credentials from config? What would that look like? Would it require a change to the resource interface? I'm thinking a simple 'check access' call - maybe this could also be for validation prior to use? Back to the drawing board...
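To make the 'separate credentials from config' idea concrete, one possible shape (the credentials key and the var name below are made up; today everything lives under source):

```yaml
resources:
- name: concourse-repo
  type: git
  source:                      # identity: the shared version history would key off this
    uri: git@github.com:concourse/concourse.git
    branch: master
  credentials:                 # hypothetical: auth-only material that can be rotated
    private_key: ((repo-private-key))   # without resetting the version history
```

The hypothetical 'check access' call would then only need to validate the credentials portion before a team is allowed to see the shared versions and caches.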
Yet another concern: maybe sharing the version history globally just isn't worth it. :/ The initial concern was regarding database storage. I still think there are good things that can come out of this, but perhaps they can be prioritized differently.
What about having a single "real" check container that does the checking on the minimum configured interval? Then there are "fake" check containers that have their actual check_interval configured, and which are the containers you can hijack. They ask the real check container (which itself speaks "check") for new versions. Think of the "real" one as being Concourse's internal bookkeeping.
@krishicks Yeah, that'd help make hijacking safer. (Though I'm not sure how the 'fake' container would delegate to the 'real' one.) We could also do something like provision a fresh 'check' container when they try to hijack it. We could even run a one-off check in it.
To balance out some of the pessimism in my recent comments, there's one big benefit to consolidating version history: this could dramatically reduce the load on external services caused by constant checking.
One way to mitigate these concerns: looking at the list of supported resource types, some would clearly benefit from shared history, while others wouldn't really benefit much from it at all, for different reasons. So, maybe it could be an opt-in behavior. All other behavior would be the same.
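For illustration, the opt-in could hang off the resource definition itself (the shared_history field below is an invented name, not something Concourse supports):

```yaml
resources:
- name: golang-image
  type: docker-image
  shared_history: true   # hypothetical opt-in: share version history (and checking)
  source:                # with any other pipeline/team using this exact config
    repository: golang
    tag: "1.11"
```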
Having checking performed globally across teams would likely make it easier to fix cases like #2010 - right now we have to make the RCCS, which is global, and then the WRCCS, which is per-team, but they're in a different transaction, so there's a chance the RCCS goes away while we're trying to create the WRCCS. (The fairly long grace period before the RCCS actually goes away is supposed to make this unlikely, but there must be something else going on.)
This was referenced on Jul 30, 2018
vito added the epic/resources-v2 label on Jul 31, 2018
vito added the efficiency label on Aug 1, 2018
snegivulcan commented on Aug 1, 2018
We are also very much interested in effectively reducing the number of times a resource check (docker_image in our case) is performed. In our case we are running out of GCR (Google Container Registry) version-check API quota, which is 3 requests per second per IP. We have a Concourse cluster with 3 worker nodes and a total of 24 pipelines. Each pipeline polls for 6 docker_image resources on average, and we have about two dozen docker_image resources in total.
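For a rough sense of scale, assuming the default one-minute check interval and all checks originating from one IP: 24 pipelines × ~6 docker_image resources each is roughly 144 checks per minute, or about 2.4 GCR requests per second, right up against the 3/sec quota. Checking each of the ~24 unique resource configs once per minute instead would be about 0.4 requests per second, roughly a 6× reduction.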
mhuangpivotal moved this from Backlog to In Flight in Core on Aug 2, 2018
mhuangpivotal added the bump/major label on Aug 2, 2018
Brief notes on some initial changes we need to make during the spike.

Migrations

(later)

Radar

Note: two resources that reference the same resource_config might have different check_every intervals - we need to either pick the shorter interval or the longer one.
Side refactor: during the spike, we realized that the "UsedResourceConfig" object needs to have a connection and a lockFactory on it for us to do locking around the resource config. This made us revisit the whole structure around creating resource configs and the purpose behind these used resource configs, and we have decided to do a refactor to make cleaner abstractions between the old "ResourceConfigs" and "UsedResourceConfigs".

"ResourceConfigs" objects are used to construct a resource config in the database, which means the object isn't really a resource config itself but rather the fields we need in order to construct one. This will be renamed to "ResourceConfigDescriptor".

"UsedResourceConfigs" represent a resource config that is actually in use in the database, but the "used" naming only really made sense when we still had the resource config uses table (which has since been removed). This will be renamed to "ResourceConfig".
A commit in concourse/atc referenced this issue on Aug 14, 2018
A commit referenced this issue on Aug 14, 2018
@marco-m Hmm, I don't think this will actually help that case at all. Credential resolution is done prior to even determining whether to run the check.
marco-m commented on Oct 4, 2018
@vito so I was too optimistic! ;-) Time to go re-understand all this.
vito commented on Jul 16, 2018 (edited)
What challenge are you facing?
Today, pipeline resource versions are collected into a versioned_resources table, which predates the "life" epic (#629). It points to resource_id, making it per-pipeline-resource. This means that multiple pipelines with the same resource configs will be redundantly collecting the same version/metadata information.

A Modest Proposal

To be honest, this isn't a huge deal right now, aside from wasted database space and redundant checking. However, if we make the relationship between a pipeline's resources and the abstract version history a bit tighter, there are actually a few benefits, chief among them less checking required across pipelines for equivalent resources.

Implementation Notes
Enabling/Disabling versions

Enabling/disabling versions should remain scoped to pipeline resources, obviously. This can be done via a join table (pipeline_resource_config_versions or some such).

Distinct check intervals
Now that we only check once per resource config, there's a little gotcha: different pipelines can have varying check_every settings.

Here's one idea: record last_checked on the resource config, and have each pipeline's radar component just check whether the time since last_checked is >= its interval. So, we'll check at the fastest defined frequency. Pipelines with longer check_every settings will have versions show up more quickly than expected, but that really shouldn't matter.
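Concretely, with two pipelines pointing at the same resource config (the source values here are just placeholders):

```yaml
# pipeline-a.yml
resources:
- name: concourse-repo
  type: git
  check_every: 1m        # fastest interval: the shared config gets checked every minute
  source:
    uri: https://github.com/concourse/concourse
    branch: master

# pipeline-b.yml - same source, so same resource config and version history
resources:
- name: concourse-repo
  type: git
  check_every: 10m       # radar sees last_checked is recent enough and skips its own
  source:                # check, but versions found via pipeline-a appear here right away
    uri: https://github.com/concourse/concourse
    branch: master
```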
Pausing pipeline resources

Currently, users pause pipeline resources with the intended effect that no new versions are collected and used for later builds. This is really awkward when other pipelines result in checking the config anyway.
We could still support today's behavior by "faking it" and having pausing a resource really just 'pin' it to whatever the version was at the time. But actually, that sounds a lot like #1288. Maybe we should just implement that instead, and remove the resource pausing functionality?
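If we went the #1288 route, pausing a resource could effectively become pinning it, something like the following (speculative shape; the version field and ref value are just examples):

```yaml
resources:
- name: concourse-repo
  type: git
  source:
    uri: https://github.com/concourse/concourse
    branch: master
  version:           # hypothetical pin: jobs keep consuming this exact version
    ref: 8b3c41a     # while the shared history continues collecting new ones
```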