This repository has been archived by the owner on Feb 2, 2022. It is now read-only.

remount volumes on restart #30

Merged (2 commits) on Sep 4, 2019


@kcmannem (Member) commented Sep 3, 2019

Fixes concourse/concourse#4264

Signed-off-by: Krishna Mannem <kmannem@pivotal.io>

Signed-off-by: Krishna Mannem <kmannem@pivotal.io>
(three review threads on volume/driver/overlay_linux.go, resolved; the code they refer to is now outdated)
Signed-off-by: Krishna Mannem <kmannem@pivotal.io>
(two review threads on volume/driver/overlay_linux.go, resolved)
@ddadlani (Contributor) commented Sep 4, 2019

@xtreme-sameer-vohra and I tried out this PR and it seems to work. This was our process:

Acceptance

  1. Run Concourse with one worker using docker-compose
  2. Set a pipeline and watch it run successfully
  3. Run docker restart concourse_worker_1 and re-run the pipeline

Without this PR, booklit/unit failed with "task config not found", implying that the booklit volume was empty. We also double-checked on the worker that the volume corresponding to booklit was empty, even though the overlay directory still had all the bits in it.

With the PR, the task ran successfully, the volume corresponding to booklit was correctly populated, and cat /proc/mounts showed the corresponding overlay mount on the worker.

Asides

  1. In both the passing and failing cases, we saw check containers fail with "unknown handle" because the ATC was not yet aware that the containers were no longer present on the worker. This resolves itself once the missing_since column in the containers table causes those containers to be GCed, but the error message does not make the cause clear.
  2. To test this case, we needed to disable worker retiring. When running docker restart, the worker transitions to retiring and then back to running. The retiring transition caused the volumes to be GCed from the DB, but the worker came back too quickly for the actual volumes to be cleaned up on the worker. As a result, the existing volumes on the worker were not reused, and every restart caused volumes to pile up. However, this is specific to docker restart, since workers do not otherwise transition from retiring back to running.

Linked issue that this pull request may close: empty volumes when using overlay on Prod
3 participants