
Never re-use check containers #3079

Open
vito opened this issue Jan 17, 2019 · 11 comments
Labels
domain/core Includes scope pertaining to core Concourse functionality (i.e. pipelines, workflows). domain/runtime Includes scope pertaining to how Concourse actually "runs" things (i.e. containers, scheduling). epic
Comments

@vito
Member

vito commented Jan 17, 2019

What challenge are you facing?

Check containers are re-used for up to an hour (by default) so that resources can re-use cached state that they fetched (i.e. so a fresh git clone doesn't have to run all the time).

We limit their re-use to an hour so that they get rebalanced across workers periodically. However, that means every hour forces a fresh git clone, which makes people want to extend the lifetime even further (#2988).

This also means that with a bunch of resources there will always be a bunch of containers across all the workers, leading to desperate measures like #3054 and, in general, a ton of questions, because people are almost always surprised to see so many check containers.

We're also super careful to lock and make sure only one ATC is using a container at a time.

And we're careful about how these containers are re-used (e.g. across teams).

What would make this better?

Let's not do that!

Can we instead create a one-time-use container every time we need to run a check, and use a volume to explicitly track the cached state between runs?

This volume could work like task caches - there would be one persisted on each worker that the check runs on. A copy of it would be mounted to the container, and that copy would become the new cache for subsequent runs.
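
To make that concrete, here's a minimal Go-flavoured sketch of what a one-shot check against a worker-local cache volume could look like. All of the interfaces and names below are hypothetical illustrations, not existing Concourse APIs:

```go
package check

// Hypothetical sketch only; none of these interfaces are existing Concourse APIs.

// Volume is a worker-local cache volume, analogous to a task cache.
type Volume interface {
	CreateCOWCopy() (Volume, error)  // copy-on-write child of this volume
	PromoteToCache(key string) error // make this copy the new cache for key
}

// Container is a throwaway container created just for this one check.
type Container interface {
	RunCheck(source map[string]interface{}, from map[string]string) ([]map[string]string, error)
	Destroy() error
}

// Worker can create one-off containers and find cache volumes on itself.
type Worker interface {
	FindOrCreateCheckCache(key string) (Volume, error)
	CreateOneOffContainer(rootfs string, cacheMount Volume) (Container, error)
}

// RunOneOffCheck creates a fresh container for every check, mounts a
// copy-on-write copy of the worker-local cache volume into it, and then
// promotes that copy to be the cache used by the next check on this worker.
func RunOneOffCheck(w Worker, cacheKey, rootfs string, source map[string]interface{}, from map[string]string) ([]map[string]string, error) {
	cache, err := w.FindOrCreateCheckCache(cacheKey)
	if err != nil {
		return nil, err
	}

	cow, err := cache.CreateCOWCopy()
	if err != nil {
		return nil, err
	}

	container, err := w.CreateOneOffContainer(rootfs, cow)
	if err != nil {
		return nil, err
	}
	defer container.Destroy() // happy path; GC has to cover the unhappy path

	versions, err := container.RunCheck(source, from)
	if err != nil {
		return nil, err
	}

	if err := cow.PromoteToCache(cacheKey); err != nil {
		return nil, err
	}

	return versions, nil
}
```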

Open questions:

  • How quickly can containers be created/deleted? There should be some way to make this fast, as the Diego and Garden teams benchmarked the hell out of it back in the day. But we should check and make sure, because this will result in a lot more creations and deletions.
  • How do we identify the GC/ownership of these containers now that they're one-off? We don't want them deleted while they're being used, but we do want them GC'd if there's some way to tell they're no longer in use.
  • How do we handle the lifecycle of the cache volumes?

Are you interested in implementing this yourself?

Yeah, actually, it sounds fun.

@vito vito added this to Icebox in Runtime via automation Jan 17, 2019
@jwntrs
Contributor

jwntrs commented Jan 17, 2019

I really like this idea. However, I think this might be a good opportunity to rethink the way we model volume uses, since this change will likely involve yet another foreign key on the volumes table.

At the moment, a volume (sometimes) belongs to a team, and defines its use through a foreign key reference (i.e. resource cache, task cache, etc.). I see volumes as a lower-level concept that shouldn't contain all this information. I think we should invert this relationship, and have the higher-level concepts track both the team (where applicable) and volume foreign keys. This would make adding new volume uses much lighter-weight, and let us query for the higher-level objects and join on the lower-level objects.
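
To make the inversion concrete, here's a rough Go-style sketch of the two shapes; the field names are illustrative only, not the actual db package types:

```go
package db

// Illustrative only; not the real Concourse db types.

// Today (roughly): the volume row knows what it's used for, so every new
// kind of use means yet another nullable foreign key column here.
type Volume struct {
	ID              int
	TeamID          *int // sometimes set
	ResourceCacheID *int
	TaskCacheID     *int
	// ...a check cache would add yet another column
}

// Proposed: volumes stay a dumb, low-level concept...
type LeanVolume struct {
	ID       int
	WorkerID int
}

// ...and the higher-level concepts own the team/volume references instead,
// so new uses only touch their own tables.
type TaskCache struct {
	ID       int
	TeamID   int
	VolumeID int
}

type CheckCache struct {
	ID       int
	VolumeID int
}
```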

@vito thoughts?

@vito
Member Author

vito commented Jan 18, 2019

@pivotal-jwinters I wonder if the idea I threw around in #3025 (comment) meshes well with that.

At least for now I would rather do this within the bounds of the current architecture, just because there seems to be a clear enough path. But it helps to discuss this stuff early on for potential post-5.x/6.x architecture changes!

@topherbullock
Member

Spiking on some wild wild thoughts around this:

  • Adding a field to the containers table to denote containers as "ignored" by the Container Collector, so that we can defer their lifecycle to the runtime ( via setting Garden container GraceTime )
  • Could we have a resource checking 'agent' container per worker and create "peas" ( container processes supplying their own rootfs (but sharing namespaces/cgroups with a sandbox container) ) for each check?
    • Does using 'peas' in Garden warm up any part of the creation of a container?
    • Are there other gains we could realize by sharing the container namespaces but running separate processes for checks??
    • Do peas support different bind-mounts?

@vito
Member Author

vito commented Jan 22, 2019

Adding this to the Core project as well, just because a lot of the concerns/gotchas regarding global resources would be greatly mitigated if we were to do this. Implementation-wise, there are probably more runtimey concerns though (especially regarding the cache volume lifecycle and container scheduling).

Maybe it can be divided into separate steps that can be done by each team?

Step 1 would be to eliminate check sessions entirely and just create a new check container every time. This would obviously be a huge regression for big git repos and we shouldn't ship with only this much implemented, but maybe we can start to see how this simplifies things. We would still need to figure out a new lifecycle for these containers but it might be pretty simple.

Step 2 would be to measure container creation time to see how this change may impact worker load, and then see what we can do to mitigate any additional slowness. This testing should be done with cheap resources like time so that we're not measuring the known regression that Step 1 introduces.

Step 3 would then be to figure out the volume caching lifecycle, and how to schedule containers to best leverage it. I think this probably takes more thinking.

Step 1 could probably be easily done by Core, and steps 2 and 3 are probably better left to Runtime.

@topherbullock wdyt?

@topherbullock
Member

topherbullock commented Jan 22, 2019 via email

@vito
Member Author

vito commented Feb 7, 2019

Did a bit more thinking on this with @clarafu and @pivotal-jwinters - we like the idea of creating a one-off container and not bothering with a complicated lifecycle, and allowing Garden's GraceTime to clean it up if we fail to .Destroy it in the happy path.
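
For reference, roughly what that could look like against the Garden client API; the address, handle, and grace period below are made up for illustration:

```go
package main

import (
	"log"
	"time"

	"code.cloudfoundry.org/garden"
	"code.cloudfoundry.org/garden/client"
	"code.cloudfoundry.org/garden/client/connection"
)

func main() {
	// Example address only; Concourse workers normally expose Garden on 7777.
	gdn := client.New(connection.New("tcp", "127.0.0.1:7777"))

	// Create the one-off check container with a grace time, so that Garden
	// itself reaps it if we never get around to calling Destroy.
	container, err := gdn.Create(garden.ContainerSpec{
		Handle:    "one-off-check", // hypothetical handle
		GraceTime: 5 * time.Minute, // arbitrary value for illustration
	})
	if err != nil {
		log.Fatal(err)
	}

	// ... run the check process inside the container ...

	// Happy path: destroy it ourselves; otherwise GraceTime kicks in.
	if err := gdn.Destroy(container.Handle()); err != nil {
		log.Printf("destroy failed (Garden will GC it after the grace time): %v", err)
	}
}
```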

One gotcha, however, is that the container also has an associated volume for its copy-on-write rootfs. BaggageClaim supports TTLs, but it won't keep the volume alive for you, as it has no idea that it's "in use". So we tried thinking of ways to avoid having to define the lifecycle for this volume, since it'd be just about as complicated as it is for containers.

One idea we had was to write a BaggageClaim image plugin for Guardian. Image plugins are executables that are configured in Guardian like so:

https://github.com/cloudfoundry/guardian/blob/6571dc2d8af81f200eef97bf7ea107437dbb9576/guardiancmd/command.go#L228-L236

We could ship Concourse with a BaggageClaim image plugin and auto-configure it when starting gdn via the binary. This way we can rely on Garden to destroy the image when the container goes away. I think this would actually be pretty easy, especially now that we have more control over how Guardian is configured and can auto-configure those flags. We could use this for all containers, and have one less thing to manage with GC.
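
As a sketch of that auto-configuration, the worker could pass the plugin to gdn via the image plugin flags from the linked command.go; the plugin path and extra arg below are hypothetical:

```go
package main

import (
	"log"
	"os"
	"os/exec"
)

func main() {
	// Hypothetical path to a BaggageClaim-backed image plugin shipped with Concourse.
	plugin := "/usr/local/concourse/bin/baggageclaim-image-plugin"

	// Guardian exposes image plugins as flags, so the worker can wire them
	// in when it spawns gdn.
	cmd := exec.Command("gdn", "server",
		"--image-plugin", plugin,
		"--image-plugin-extra-arg", "--volumes-dir=/worker-state/volumes", // illustrative extra arg
		"--privileged-image-plugin", plugin,
	)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}
```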

For phase 1 (Core spike), I think we can just leave the volumes around forever for now, or give them some arbitrary TTL.

@clarafu
Contributor

clarafu commented Feb 21, 2019

For the core spike, TODO list:

  • Write CreateCheckContainer, which will create a one-off container without saving it to the db and configure Garden's GraceTime (rough interface sketch after this list)
    • Pick worker with CheckContainerPlacementStrategy
    • Create the container's image through GetImage
    • Use image to create the container
    • Attach volumes to the container
  • Write Destroy check container
  • Add Create/Destroy check container and remove RCCS in Radar
    • Remove resource_config_check_session_id in containers table
    • Remove resource_config_check_sessions table
    • Remove RCCS code in ContainerOwner
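
A rough interface-level sketch of the CreateCheckContainer item above (hypothetical, not existing Concourse code):

```go
package worker

import (
	"context"
	"time"
)

// Hypothetical sketch of the TODO above, not existing Concourse code.

// ImageSpec describes the rootfs for the check container (resolved via GetImage).
type ImageSpec struct {
	ResourceType string
	Privileged   bool
}

// CheckContainer is a one-off Garden container that is never saved to the db.
type CheckContainer interface {
	Run(ctx context.Context, source map[string]interface{}, from map[string]string) ([]map[string]string, error)
	Destroy() error
}

// Pool picks a worker via the check container placement strategy and creates
// a throwaway container with Garden's GraceTime configured, so Garden can
// reap it even if Destroy is never called.
type Pool interface {
	CreateCheckContainer(
		ctx context.Context,
		image ImageSpec,
		graceTime time.Duration,
	) (CheckContainer, error)
}
```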

@clarafu
Contributor

clarafu commented Feb 27, 2019

The current approach to this issue:
Step 1. Runtime: write a new ephemeral check container creation and destroy method.

Step 2. Core: remove resource config check sessions and all of their associated records within the database and code base.

Step 3. Runtime: measure container creation time with the new changes of ephemeral check containers to see how this change may impact worker load in a real deployment, and then see what we can do to mitigate any additional slowness. This testing should be done with cheap resources like time so that we're not measuring the known regression that Step 1 introduces.

Step 4. Runtime: figure out the volume caching lifecycle, and how to schedule containers to best leverage it. This probably takes more thinking.

@vito
Member Author

vito commented Mar 4, 2019

A goal of this is to remove the lock around checking, though currently this also locks around saving the versions. It looks like that code is idempotent, but we should probably verify it by hand to make sure no funny business happens, since it's never been exercised otherwise.

@cirocosta
Member

Maybe related:

I was looking at the "locks held" metric when a particular pipeline (strabo) kicked off, and it's quite interesting what that looks like:

[screenshot, 2019-03-03: the "locks held" metric while the strabo pipeline ran]

@ddadlani
Contributor

Work on this has resumed in #3424

@ddadlani ddadlani removed this from Backlog in Runtime Jul 30, 2019
@vito vito added this to To do in Epic #3079 via automation Feb 18, 2020
@vito vito moved this from To do to End goals in Epic #3079 Feb 18, 2020
@vito vito added domain/core Includes scope pertaining to core Concourse functionality (i.e. pipelines, workflows). domain/runtime Includes scope pertaining to how Concourse actually "runs" things (i.e. containers, scheduling). epic and removed efficiency enhancement labels Feb 18, 2020
@vito vito moved this from End goals to Current iteration in Epic #3079 Feb 21, 2020
@vito vito added this to To do in Roadmap via automation May 8, 2020
@vito vito moved this from To do to Icebox in Roadmap May 8, 2020
@vito vito removed this from Current iteration in Epic #3079 May 8, 2020