More explicit worker, container, volume lifecycles #629

Closed
vito opened this Issue Sep 2, 2016 · 26 comments

vito (Member) commented Sep 2, 2016

Goals

  • A worker should be able to remain registered even in a hostile network. Today's heartbeating approach is ambiguous: a worker can be gone because the network blipped, or gone because it's never coming back. Workers should only go away by being explicitly unregistered.
  • A worker should remain registered while it's being updated in-place. Today, if a worker is gone for long enough it becomes unregistered, and in many cases a missing worker leads to failed builds.
  • A worker that is physically going away should enter a "draining" state, where we wait for all in-flight work to complete and stop scheduling new work on it.
  • Reduce network and database thrashing caused by heartbeating. This isn't going to scale very well as the number of containers and volumes increases.
  • Explicitly .Destroy volumes and containers so that we can see any errors that would otherwise result in leaked containers and volumes. Today a worker can silently fail to reap containers or volumes, leaking resources. Calling .Destroy increases visibility.
  • Define an explicit lifecycle for containers and volumes such that we can write a safe garbage collector.

Proposal

Workers would have the following states:

  • running - the "normal" state
  • landing - stop scheduling new work, wait for existing workloads to finish, then unregister
  • stalled - stop scheduling new work; the worker is temporarily unavailable but may come back with the same state as before

The following transitions would occur:

  • any -> running: a worker has joined the cluster
  • running -> landing: a worker is going to safely leave the cluster
  • landing -> (gone): all work scheduled on a worker has finished, and the worker has been removed automatically
  • running -> stalled: a worker is going to be updated in-place, or has failed to heartbeat, so expect connectivity issues
  • stalled -> running: a worker has come back and is in normal operations
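
To make the lifecycle concrete, here's a minimal sketch of these transitions in Go. The state names match the list above; the State type and Transition helper are purely illustrative, not Concourse's actual code.

```go
package worker

import "fmt"

// State is the lifecycle state of a worker, per the proposal above.
type State string

const (
	StateRunning State = "running" // the "normal" state
	StateLanding State = "landing" // draining: no new work, wait for existing work
	StateStalled State = "stalled" // temporarily unavailable; may come back as-is
)

// validTransitions encodes the allowed state changes described above.
// A transition to "running" is allowed from any state (a worker joining
// or returning), so it is handled separately in Transition.
var validTransitions = map[State][]State{
	StateRunning: {StateLanding, StateStalled},
	StateStalled: {StateRunning},
	StateLanding: {}, // landing ends with the worker being removed, not a new state
}

// Transition returns the new state, or an error if the change is not allowed.
func Transition(from, to State) (State, error) {
	if to == StateRunning {
		return to, nil // any -> running: a worker has (re)joined the cluster
	}
	for _, allowed := range validTransitions[from] {
		if allowed == to {
			return to, nil
		}
	}
	return from, fmt.Errorf("invalid worker transition: %s -> %s", from, to)
}
```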

The next question is how to remove heartbeating. Heartbeating basically gives us garbage collection "for free": as long as nothing's using a container or volume, it goes away of its own accord. So to replace that we'll need explicit reaping.

Containers and volumes would be correlated to wherever they're being used. For example, containers launched by a build would be related via a join table. This way we can know that if the build is no longer running, its containers are no longer needed. Containers used for checking resources would be kept so long as they haven't reached their "best if used by" date.
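
As a rough illustration of that correlation, a garbage collector could find build containers to reap with a single query over the join table. The table and column names below (build_containers, builds.status, and the status values) are hypothetical, a sketch of the idea rather than Concourse's real schema.

```go
package gc

import "database/sql"

// FindOrphanedBuildContainers returns handles of containers whose build is no
// longer running, using a hypothetical build_containers join table. These are
// candidates for an explicit .Destroy call followed by removal from the database.
func FindOrphanedBuildContainers(db *sql.DB) ([]string, error) {
	rows, err := db.Query(`
		SELECT c.handle
		FROM containers c
		JOIN build_containers bc ON bc.container_id = c.id
		JOIN builds b ON b.id = bc.build_id
		WHERE b.status NOT IN ('pending', 'started')
	`)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var handles []string
	for rows.Next() {
		var handle string
		if err := rows.Scan(&handle); err != nil {
			return nil, err
		}
		handles = append(handles, handle)
	}
	return handles, rows.Err()
}
```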

The other issue is that we don't want to destroy a container that's being hijacked. Currently a hijack session heartbeats the container just like anything else, so it sticks around as long as the user's using it. Without heartbeating, we'll have to mark the container as "hijacked", and for "hijacked" containers set a TTL rather than calling .Destroy. This way Garden's 'grace period' will take effect, and the countdown will begin once the user has left the container.
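
A sketch of that reaping decision, assuming a hypothetical Destroyer interface standing in for the real Garden client; the point is only that hijacked containers get a grace period instead of an immediate .Destroy.

```go
package gc

import "time"

// Destroyer abstracts the calls the collector needs; the real implementation
// would be backed by the worker's Garden API. This interface is hypothetical.
type Destroyer interface {
	// Destroy removes the container immediately.
	Destroy(handle string) error
	// SetGracePeriod arranges for the container to expire once it has been
	// idle (i.e. no one attached) for the given duration.
	SetGracePeriod(handle string, grace time.Duration) error
}

// Reap destroys a container that is no longer needed. Hijacked containers are
// given a grace period instead, so the countdown only starts once the user
// has left the container.
func Reap(d Destroyer, handle string, hijacked bool) error {
	if hijacked {
		return d.SetGracePeriod(handle, 5*time.Minute)
	}
	return d.Destroy(handle)
}
```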

vito added the bug label Sep 2, 2016

concourse-bot commented Sep 2, 2016

Hi there!

We use Pivotal Tracker to provide visibility into what our team is working on. A story for this issue has been automatically created.

The current status is as follows:

  • #129726011 Volumes that belonged to a container that no longer exists should be destroyed on the worker and reaped from the database
  • #129725995 Containers that belong to a build that is no longer running should be destroyed from the worker and reaped from the database...unless the build they belong to both 1. failed and 2. is the latest build of the job. (Today's behavior of the containerkeepaliver.)
  • #129726203 When a worker fails to heartbeat, it should enter stalled state, rather than be removed, with its connection info cleared out
  • #129726593 A BOSH-deployed worker that is going away should enter retiring state, and only leave once all work is completed
  • #131589631 fly workers should show the state of each worker
  • #129729921 Workers can be explicitly removed via the API
  • #129726475 A worker being BOSH updated in-place should drain its builds, but retain its volumes and containers when it returns
  • #129726699 A binary-deployed worker that is being updated in-place should enter the landing state, and only leave once all work is completed
  • #136711963 A binary-deployed worker that is going away should enter the retiring state, and only leave once all work is completed
  • #129726933 Cache volumes that are of older versions of a resource that are no longer in use should be destroyed from the worker and reaped from the database
  • #129727245 Containers that were used for resource checking should be destroyed on the worker and reaped from the database upon reaching their 'best if used by' date
  • #129726063 Containers that belonged to a worker that no longer exists should be reaped from the database
  • #129726083 Volumes that belonged to a worker that no longer exists should be reaped from the database
  • #129726125 A hijacked container should not be destroyed while the user is in it
  • #129726155 A hijacked container whose actual container on the worker is gone should be reaped from the database
  • #129725921 Containers should have indefinite TTLs
  • #135570549 Volumes should have indefinite TTLs
  • #129725981 Remove TTL/validity columns from fly volumes and fly containers
  • #140165187 reopened: More explicit worker, container, volume lifecycles

This comment, as well as the labels on the issue, will be automatically updated as the status in Tracker changes.

vito (Member) commented Sep 2, 2016

Another problem this would solve is that volumes and containers in the database cannot be safely removed when their worker is gone, because we don't know if the worker will be coming back. This removes that ambiguity, and would allow us to garbage-collect any containers and volumes on record that refer to a worker that does not exist.

concourse-bot added scheduled and removed unscheduled labels Sep 2, 2016

vito (Member) commented Sep 2, 2016

Oh geez this will make debug logs so much less chatty too.

jchesterpivotal (Contributor) commented Oct 5, 2016

I'm a big fan of state machines for comprehensibility and debuggability.

One thing I notice is that the hardest case is around intercepted or intercept-ready containers (i.e. failed builds).

I don't know if there's any thinking about relocating containers, but that'd make this easier. You'd be in a position to drain to existing workers.

Though, now that I write this, I smell a juicy cascading failure condition for a group of workers operating close to maximum load.

vito (Member) commented Oct 5, 2016

In principle relocating should work, given that:

  1. all gets are repeatable
  2. all puts are idempotent
  3. all tasks have no side-effects

1 and 2 are true, and 3 we try to convince people of. But yeah, the cascading failure mode is a bit worrying. I suppose there could be a limit.

We're already planning on implementing draining via the landing state, but it means we'll have to wait for existing builds to finish, rather than evacuating them. Evacuating could be an interesting alternative mode later on.

jchesterpivotal (Contributor) commented Oct 5, 2016

Well, relying on folks to produce idempotent tasks is a bit optimistic.

A while back I made one of my hand-waving suggestions that we look at CRIU checkpoints as a way to relocate containers. I think that'd be a Garden responsibility, though.

vito (Member) commented Oct 5, 2016

@jchesterpivotal Yep - which is why we're not starting with that. But it would be interesting to have an evacuatable: true or something on tasks. Or flip the default 'cause we're opinionated like that.

vito (Member) commented Oct 6, 2016

Here's the plan for landing, explained in terms of the binaries, though this applies to BOSH as well:

concourse land-worker <worker-name> -> runs an SSH command against the TSA, which transitions the worker to landing via the ATC.

The TSA then polls for the worker to be gone while it's in the landing state, and then exits any forward-worker or register-worker processes with exit status 0. As a result, concourse worker exits all on its own.
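
A rough sketch of what that TSA-side wait could look like; the WorkerStateFunc lookup and the polling interval are assumptions for illustration, not the actual TSA implementation.

```go
package tsa

import (
	"context"
	"time"
)

// WorkerStateFunc reports whatever the TSA can learn about a worker from the
// ATC. It is a stand-in for the real ATC API call.
type WorkerStateFunc func(ctx context.Context, name string) (state string, found bool, err error)

// WaitForLanding polls until the landing worker has been removed by the ATC,
// at which point the TSA can stop forwarding/registering and let
// `concourse worker` exit with status 0 on its own.
func WaitForLanding(ctx context.Context, lookup WorkerStateFunc, name string) error {
	ticker := time.NewTicker(10 * time.Second)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
			_, found, err := lookup(ctx, name)
			if err != nil {
				return err
			}
			if !found {
				// The worker finished landing and was removed; we're done.
				return nil
			}
		}
	}
}
```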

vito (Member) commented Oct 11, 2016

As an addendum to this, we've elected to keep the worker heartbeating, and upon the TTL elapsing, it will enter the stalled state automatically. We've also expanded the meaning of stalled to mean "stop scheduling work on the worker", rather than just a feedback indicator. This way if a worker hard-crashes it won't remain stuck in the pool forever causing problems.

Later we'll add a fly command to reap these workers, possibly in bulk.
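
A sketch of what that heartbeat-expiry sweep could look like, issued from Go; the workers table columns used here (state, expires_at, addr, baggageclaim_url) are assumptions for illustration, not Concourse's actual schema.

```go
package gc

import "database/sql"

// StallExpiredWorkers marks workers whose heartbeat TTL has elapsed as
// stalled instead of deleting them, clearing their connection info so
// nothing is scheduled on them until they re-register. Column names are
// illustrative only.
func StallExpiredWorkers(db *sql.DB) (int64, error) {
	result, err := db.Exec(`
		UPDATE workers
		SET state = 'stalled', addr = NULL, baggageclaim_url = NULL
		WHERE expires_at IS NOT NULL
		  AND expires_at < NOW()
		  AND state = 'running'
	`)
	if err != nil {
		return 0, err
	}
	return result.RowsAffected()
}
```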

cjcjameson (Contributor) commented Nov 5, 2016

In the meantime, while this is worked on, we're likely to prepare a PR describing caveats and more explanation of container lifecycles. Would you like that to go into http://concourse.ci/architecture.html#section_architecture-worker? I assume it should probably be tracked as a separate GitHub issue?

Since we'll just be writing up our best understanding, we will welcome lots of feedback. But to start... to what extent should documentation of container lifecycles live in Garden docs vs. Concourse docs? Considerations:

  • Would what we write apply to other Garden use cases, like Diego?
  • How much verbosity do you want in your docs, vs. as links for further reading?
  • How much can we assume a BOSH-deployed Concourse? Will more documentation be unacceptable if it doesn't speak to the binary distribution?

vito (Member) commented Nov 8, 2016

@MaikuMori No worries.

The stalled state is pretty much the same thing you suggested, only without the second TTL that causes it to go away for good (that's the part of today's behavior we're fixing - time isn't enough in most of today's use cases). It provides visibility into flaky network situations and/or workers being updated in-place.

Let's defer the discussion around ephemeral workers to #755 which is the actual feature that we'd have to implement. This issue is about fixing today's design to at least be good at one thing (stable worker lifecycle) rather than being bad at everything.

iSynaptic commented Nov 8, 2016

Has anyone evaluated Docker's SwarmKit/InfraKit or HashiCorp's Serf library to see how they might fit into this enhancement?

jchesterpivotal (Contributor) commented Nov 8, 2016

Is SwarmKit compatible with Garden?

InfraKit seems to be a cut-down BOSH, so I can see folks making it an installation path.

Serf is a library for syncing data -- based on the experience of Diego, distributed data stores promise more than they deliver. Plain old databases work fine for any sensible scale -- the Diego team was past 100k containers under management on MySQL and PostgreSQL last I looked.

freelock (Contributor) commented Nov 8, 2016

Coming here from #707... A question about what happens if a heartbeat is missed and a worker goes into the stalled state: we have 2 workers, an untagged one that runs the check jobs, and a tagged worker on a different LAN. If there's an intermittent interruption to a heartbeat, I take it the proposal is to mark the worker "stalled".

If this happens, will the next jobs fail with "no available workers"? Or can it wait a few minutes for the worker to respond before failing? We don't really want to have to run a bunch of redundant workers; we're a small shop...

vito (Member) commented Nov 8, 2016

@iSynaptic @jchesterpivotal Serf is more about communicating membership and not a lot more. You may be thinking about Consul. That being said, I'm not interested in tying Concourse to either out of the box; we should just provide the APIs necessary for you to use whatever cluster membership tool you want.

@freelock Haven't worked that out yet, but that sounds like something we could do. Now that we know the worker isn't meant to be gone, we can be a bit more forgiving.
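
Being "a bit more forgiving" could mean, for example, that worker selection retries for a while when the only matching workers are stalled, instead of failing immediately with "no available workers". The retry loop below is purely a sketch of that idea, with hypothetical types; it is not how Concourse actually schedules.

```go
package scheduler

import (
	"context"
	"errors"
	"time"
)

var ErrNoWorkers = errors.New("no workers available")

// Worker is a minimal view of a registered worker for this sketch.
type Worker struct {
	Name  string
	State string // "running", "landing", or "stalled"
}

// PickWorker retries for up to the given patience when the only candidates are
// stalled, on the theory that a stalled worker is expected to come back.
func PickWorker(ctx context.Context, list func() ([]Worker, error), patience time.Duration) (Worker, error) {
	deadline := time.Now().Add(patience)
	for {
		workers, err := list()
		if err != nil {
			return Worker{}, err
		}

		anyStalled := false
		for _, w := range workers {
			switch w.State {
			case "running":
				return w, nil // schedule on the first running worker
			case "stalled":
				anyStalled = true
			}
		}

		if !anyStalled || time.Now().After(deadline) {
			return Worker{}, ErrNoWorkers
		}

		select {
		case <-ctx.Done():
			return Worker{}, ctx.Err()
		case <-time.After(5 * time.Second):
			// A stalled worker may recover; wait and check again.
		}
	}
}
```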

@cjcjameson I'm not sure what you mean by container lifecycles; maybe just open an issue with questions you either have for us or have the answer to yourself and we'll see where it can fit in to the docs? I would prefer verbiage that's not distribution-specific. This may just mean defining what exactly "restarting a worker" vs "recreating a worker" means so people can map it to however their Concourse is deployed. Also I prefer delegating to other docs as much as possible to lower the maintenance of our own (out-of-date docs can be worse than no docs at all).

jchesterpivotal (Contributor) commented Nov 9, 2016

You may be thinking about Consul.

I believe I might be :)

jchesterpivotal (Contributor) commented Nov 23, 2016

For pun consistency, running should be in_flight.

Alternatively stalled should be something like halted and landing should be something like halting.

petemounce commented Feb 16, 2017

For pun consistency, running should be in_flight.

aloft?

concourse-bot added unscheduled and removed scheduled labels Feb 17, 2017

concourse-bot commented Feb 17, 2017

Hello again!

All stories related to this issue have been accepted, so I'm going to automatically close this issue.

At the time of writing, the following stories have been accepted:

  • #129726011 Volumes that belonged to a container that no longer exists should be destroyed on the worker and reaped from the database
  • #129725995 Containers that belong to a build that is no longer running should be destroyed from the worker and reaped from the database...unless the build they belong to both 1. failed and 2. is the latest build of the job. (Today's behavior of the containerkeepaliver.)
  • #129726203 When a worker fails to heartbeat, it should enter stalled state, rather than be removed, with its connection info cleared out
  • #129726593 A BOSH-deployed worker that is going away should enter retiring state, and only leave once all work is completed
  • #131589631 fly workers should show the state of each worker
  • #129729921 Workers can be explicitly removed via the API
  • #129726475 A worker being BOSH updated in-place should drain its builds, but retain its volumes and containers when it returns
  • #129726699 A binary-deployed worker that is being updated in-place should enter the landing state, and only leave once all work is completed
  • #136711963 A binary-deployed worker that is going away should enter the retiring state, and only leave once all work is completed
  • #129726933 Cache volumes that are of older versions of a resource that are no longer in use should be destroyed from the worker and reaped from the database
  • #129727245 Containers that were used for resource checking should be destroyed on the worker and reaped from the database upon reaching their 'best if used by' date
  • #129726063 Containers that belonged to a worker that no longer exists should be reaped from the database
  • #129726083 Volumes that belonged to a worker that no longer exists should be reaped from the database
  • #129726125 A hijacked container should not be destroyed while the user is in it
  • #129726155 A hijacked container whose actual container on the worker is gone should be reaped from the database
  • #129725921 Containers should have indefinite TTLs
  • #135570549 Volumes should have indefinite TTLs
  • #129725981 Remove TTL/validity columns from fly volumes and fly containers
  • #140165187 reopened: More explicit worker, container, volume lifecycles

If you feel there is still more to be done, or if you have any questions, leave a comment and we'll reopen if necessary!

andreasf commented May 5, 2017

Hi,

it seems like none of the stories covered issue concourse/fly#98. It would be great to have a non-interactive way to hijack a failed build step. Currently, fly will ask whether to hijack the check or the task container. Should I open a new issue?

vito added this to the v2.8.0 milestone May 8, 2017

dcarley added a commit to alphagov/paas-cf that referenced this issue May 12, 2017

Pin all governmentpaas containers to a version
So that we can control how containers are updated between environments and
prevent forwards/backwards incompatibilities with code in the pipeline.

This will also prevent Concourse from re-downloading all of the containers
each time we merge a change to alphagov/paas-docker-cloudfoundry-tools
because Docker Hub rebuilds everything.

Using the functionality added in:

- alphagov/paas-docker-cloudfoundry-tools#93
- alphagov/paas-semver-resource#5
- alphagov/paas-s3-resource#5

The `run-bosh-cli` task uses a different format because the "bug" of needing
to manually choose the `check` or `task` container hasn't been fixed yet.
There's more information in:

- https://webcache.googleusercontent.com/search?q=cache:9l5PuEzf5e4J:https://github.com/concourse/fly/issues/98+&cd=4&hl=en&ct=clnk&gl=uk
- concourse/concourse#629 (comment)