Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decouple container and volume creation in FindOrCreateContainer #3052

Closed
ddadlani opened this issue Jan 15, 2019 · 7 comments · Fixed by #3546
Closed

Decouple container and volume creation in FindOrCreateContainer #3052

ddadlani opened this issue Jan 15, 2019 · 7 comments · Fixed by #3546
Assignees
Labels
Projects

Comments

@ddadlani
Copy link
Contributor

@ddadlani ddadlani commented Jan 15, 2019

Related to #2926.

After #3043, ContainerProvider.FindOrCreateContainer is easier to read and part of its responsibilities have been moved to Worker.FindOrCreateContainer

However, there are still some redundancies:

  1. containerProvider.createGardenContainer still creates its own gardenWorker
  2. containerProvider and gardenWorker are both passed a volumeClient, db.Worker, gardenClient and imageFactory, despite the fact that the ContainerProvider's two exposed methods are only called from the gardenWorker.
  3. If, in the future, we want to create the volumes at an earlier stage in the process, we need to move volume creation up one level since currently it is deeply intertwined with container creation.

This issue is meant to capture the work required to untangle the functionality of gardenWorker and containerProvider and maybe reduce the responsibilities of containerProvider as well.

@ddadlani ddadlani added this to Icebox in Runtime via automation Jan 15, 2019
@ddadlani ddadlani moved this from Icebox to Backlog in Runtime Jan 15, 2019
@ddadlani ddadlani moved this from Backlog to In Flight in Runtime Jan 15, 2019
@ddadlani ddadlani added the refactor label Jan 15, 2019
@ddadlani

This comment has been minimized.

Copy link
Contributor Author

@ddadlani ddadlani commented Jan 21, 2019

As part of the detangling, we moved all functionality of the containerProvider into the gardenWorker, however, this doesn't make it easier to remove the cyclic dependency. A lot of the logic requiring the gardenWorker in containerProvider.createGardenContainer is because of the creation or inflation of volumes, which requires a gardenWorker. This strengthens the argument for decoupling the volume lifecycle from the container lifecycle.

Currently, to create a db volume, you need a container ID, which means that the creation of a volume must come after the creation of a creatingContainer. Creating a volume without a container ID will cause the volume to be GCed because it is considered orphaned.

Our next steps will be to investigate using the lifecycle states and adding timestamps to the database volumes table. This may either be a separate initializing state, or using the existing lifecycle states (creating, created, destroying) depending on what works best. Volumes in the initializing or creating state would not be GCed. This would eliminate the need for a containerID in the volume.

This raises the following questions:

  1. How will volumes and containers be associated, if not by a container ID? Can we perhaps use the build ID to associate both? What would be the added cost of running db queries for containers and volumes based on a common build ID?
  2. What happens to volumes that remain orphaned because their associated container errored during creation? Should there be some cleanup logic based on the build ID if the build failed? Should we add a state_changed_at timestamp field and GC volumes whose state hasn't changed for x amount of time?
@ddadlani ddadlani changed the title Reduce cyclical references and complexity of FindOrCreateContainer Decouple container and volume creation in FindOrCreateContainer Jan 23, 2019
@ddadlani

This comment has been minimized.

Copy link
Contributor Author

@ddadlani ddadlani commented Jan 23, 2019

Some other work that would be needed to decouple the volume and container lifecycles:

  • Database migrations: the volume lifecycle states are currently represented as an enum volume_state in the database. This enum would need to be replaced in order to use the attached volume state.
  • A bunch of the volume_repository methods (which run database queries) would need to change to reflect this new lifecycle
  • GC needs to be changed to account for the new state
  • Volume creation can then be separated out from container creation in worker.gardenWorker or containerProvider, as the case may be.
  • An upgrade/downgrade/topgun test is required to ensure the up and down migrations work correctly
@ddadlani

This comment has been minimized.

Copy link
Contributor Author

@ddadlani ddadlani commented Jan 24, 2019

Thoughts for continuing work later:

  • Next step is to modify the methods in volume_repository.go to return attached vs created volumes where it makes sense
  • For methods called through the API, we may need to also make changes to fly to show created vs attached volumes. One such method to consider is GetTeamVolumes.
  • We need to add build_id and timestamp columns to the volumes table and modify GC to consider those.

This will allow us to create volumes before attaching them to the container to simplify container creation and task execution logic. The build_id is stored in the containerMetadata which is passed to the worker.GardenWorker in FindOrCreateContainer, so we can split that up to create volumes first and use the build_id to link volumes to containers before attaching them.

One possible problem may arise when trying to match a volume to a step, since each step runs in its own container. Investigation is pending.

@ddadlani

This comment has been minimized.

Copy link
Contributor Author

@ddadlani ddadlani commented Jan 29, 2019

Our previous strategy was to create a different attached state for volumes but that had a variety of repercussions:

  • Created and attached states were very similar and the interface methods would need to be repeated anywhere created volumes were used.
  • Adding a new state would drastically change volume GC and did not account for container creation failures occurring after the volume was created.
  • Altering the volume state enum type would need to be done in a non-transaction DB migration, creating the possibility that databases might be left dirty if the migration failed.

To mitigate the GC issue, we considered adding a timestamp to volumes which would indicate an expiry time, i.e. if at expiry time, the volume did not have a container attached, the volume could be GCed. However, this forces the existence of a minimum time before volumes can get GCed and doesn't account for all cases i.e. ATC being down for a while, etc.

We are now investigating using ContainerOwners as owners for both volumes and containers. Here is how GC works in different cases:

  • If the volume has been created but the container doesn't exist yet, it shouldn't be GCed.
  • If the volume is created and is attached to a container but the container is in use, it shouldn't be GCed.
  • If the build passed or failed and the container has been GCed, the volume should also be GCed.

This means that if the ContainerOwner and container are both removed, the volume is safe to delete.

@ddadlani

This comment has been minimized.

Copy link
Contributor Author

@ddadlani ddadlani commented Feb 7, 2019

For now we are going with separating out container and volume creation from after the creatingContainer stage because there may be other changes in the future which direct any future decoupling work.

We are removing a loop in FindOrCreateContainer

which appears to be unused, from looking at the logs. There are also locks around scheduling and running builds which should enforce that only one ATC creates a specific container.

However, we should ensure that we have an integration/top-level test that checks for race conditions in the case that multiple ATCs are trying to create the same container.

@ddadlani

This comment has been minimized.

Copy link
Contributor Author

@ddadlani ddadlani commented Feb 21, 2019

waiting on 5.0 release to merge

ddadlani added a commit that referenced this issue Mar 19, 2019
combine containerProvider and worker
add workerHelper which stores helper functions

Signed-off-by: Divya Dadlani <ddadlani@pivotal.io>
Co-authored-by: Krishna Mannem <kmannem@pivotal.io>

#3052
kcmannem pushed a commit that referenced this issue Mar 25, 2019
combine containerProvider and worker
add workerHelper which stores helper functions

Signed-off-by: Divya Dadlani <ddadlani@pivotal.io>
Co-authored-by: Krishna Mannem <kmannem@pivotal.io>

#3052
@ddadlani ddadlani moved this from In Flight to Backlog in Runtime Mar 27, 2019
@ddadlani ddadlani moved this from Backlog to In Flight in Runtime Mar 27, 2019
ddadlani added a commit that referenced this issue Apr 3, 2019
combine containerProvider and worker
add workerHelper which stores helper functions

Signed-off-by: Divya Dadlani <ddadlani@pivotal.io>
Co-authored-by: Krishna Mannem <kmannem@pivotal.io>

#3052
ddadlani added a commit that referenced this issue Apr 3, 2019
combine containerProvider and worker
add workerHelper which stores helper functions

Signed-off-by: Divya Dadlani <ddadlani@pivotal.io>
Co-authored-by: Krishna Mannem <kmannem@pivotal.io>

#3052
ddadlani added a commit that referenced this issue Apr 8, 2019
combine containerProvider and worker
add workerHelper which stores helper functions

Signed-off-by: Divya Dadlani <ddadlani@pivotal.io>
Co-authored-by: Krishna Mannem <kmannem@pivotal.io>

#3052
ddadlani added a commit that referenced this issue Apr 10, 2019
combine containerProvider and worker
add workerHelper which stores helper functions

Signed-off-by: Divya Dadlani <ddadlani@pivotal.io>
Co-authored-by: Krishna Mannem <kmannem@pivotal.io>

#3052
ddadlani added a commit that referenced this issue Apr 10, 2019
combine containerProvider and worker
add workerHelper which stores helper functions

Signed-off-by: Divya Dadlani <ddadlani@pivotal.io>
Co-authored-by: Krishna Mannem <kmannem@pivotal.io>

#3052
ddadlani added a commit that referenced this issue Apr 11, 2019
combine containerProvider and worker
add workerHelper which stores helper functions

Signed-off-by: Divya Dadlani <ddadlani@pivotal.io>
Co-authored-by: Krishna Mannem <kmannem@pivotal.io>

#3052
ddadlani added a commit that referenced this issue Apr 16, 2019
combine containerProvider and worker
add workerHelper which stores helper functions

Signed-off-by: Divya Dadlani <ddadlani@pivotal.io>
Co-authored-by: Krishna Mannem <kmannem@pivotal.io>

#3052
kcmannem pushed a commit that referenced this issue Apr 18, 2019
combine containerProvider and worker
add workerHelper which stores helper functions

Signed-off-by: Divya Dadlani <ddadlani@pivotal.io>
Co-authored-by: Krishna Mannem <kmannem@pivotal.io>

#3052
kcmannem pushed a commit that referenced this issue Apr 23, 2019
combine containerProvider and worker
add workerHelper which stores helper functions

Signed-off-by: Divya Dadlani <ddadlani@pivotal.io>
Co-authored-by: Krishna Mannem <kmannem@pivotal.io>

#3052
@kcmannem kcmannem moved this from In Flight to Backlog in Runtime May 1, 2019
kcmannem pushed a commit that referenced this issue May 1, 2019
combine containerProvider and worker
add workerHelper which stores helper functions

Signed-off-by: Divya Dadlani <ddadlani@pivotal.io>
Co-authored-by: Krishna Mannem <kmannem@pivotal.io>

#3052
ddadlani added a commit that referenced this issue May 7, 2019
combine containerProvider and worker
add workerHelper which stores helper functions

Signed-off-by: Divya Dadlani <ddadlani@pivotal.io>
Co-authored-by: Krishna Mannem <kmannem@pivotal.io>

#3052
ddadlani added a commit that referenced this issue May 9, 2019
combine containerProvider and worker
add workerHelper which stores helper functions

Signed-off-by: Divya Dadlani <ddadlani@pivotal.io>
Co-authored-by: Krishna Mannem <kmannem@pivotal.io>

#3052
kcmannem pushed a commit that referenced this issue May 10, 2019
combine containerProvider and worker
add workerHelper which stores helper functions

Signed-off-by: Divya Dadlani <ddadlani@pivotal.io>
Co-authored-by: Krishna Mannem <kmannem@pivotal.io>

#3052
kcmannem pushed a commit that referenced this issue May 16, 2019
combine containerProvider and worker
add workerHelper which stores helper functions

Signed-off-by: Divya Dadlani <ddadlani@pivotal.io>
Co-authored-by: Krishna Mannem <kmannem@pivotal.io>

#3052
@marco-m

This comment has been minimized.

Copy link
Contributor

@marco-m marco-m commented May 21, 2019

@topherbullock @ddadlani is the blocked label still valid ? Asking since I see progress on branch refactor/3052. Does it refer to branch issue/3052 which had last commit in 2019-02 ? Merci :-)

@kcmannem kcmannem removed the blocked label May 21, 2019
@ddadlani ddadlani moved this from Backlog to In Flight in Runtime May 21, 2019
@vito vito closed this in #3546 May 21, 2019
Runtime automation moved this from In Flight to Done May 21, 2019
@ddadlani ddadlani moved this from Done to Accepted in Runtime May 29, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Runtime
Accepted
5 participants
You can’t perform that action at this time.