Skip to content
This repository has been archived by the owner on Oct 13, 2023. It is now read-only.

[18.09 backport] Windows: DetachVhd attempt in cleanup #113

Merged
merged 1 commit into from
Nov 27, 2018

Conversation

thaJeztah
Copy link
Member

backport of moby#37712 for 18.09

fixes moby#36218 for 18.09

first commit was skipped, as the go-winio version was already bumped to v0.4.11 in #75

git checkout -b 18.09_backport_detach ce-engine/18.09

# skipped, already bumped in https://github.com/docker/engine/pull/75
# git cherry-pick -s -S -x 66966941f9c22dc5ab36414ef00a5aaf50e39dd4
git cherry-pick -s -S -x efdad5374465a2a889d7572834a2dcca147af4fb
git push -u origin

This is a fix for a few related scenarios where it's impossible to remove layers or containers
until the host is rebooted. Generally (or at least easiest to repro) through a forced daemon kill
while a container is running.

Possibly slightly worse than that, as following a host reboot, the scratch layer would possibly be leaked and
left on disk under the dataroot\windowsfilter directory after the container is removed.

One such example of a failure:

  1. run a long running container with the --rm flag
    docker run --rm -d --name test microsoft/windowsservercore powershell sleep 30
  2. Force kill the daemon not allowing it to cleanup. Simulates a crash or a host power-cycle.
  3. (re-)Start daemon
  4. docker ps -a
    PS C:\control> docker ps -a
    CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
    7aff773d782b malloc "powershell start-sl…" 11 seconds ago Removal In Progress malloc
  5. Try to remove
    PS C:\control> docker rm 7aff
    Error response from daemon: container 7aff773d782bbf35d95095369ffcb170b7b8f0e6f8f65d5aff42abf61234855d: driver "windowsfilter" failed to remove root filesystem: rename C:\control\windowsfilter\7aff773d782bbf35d95095369ffcb170b7b8f0e6f8f65d5aff42abf61234855d C:\control\windowsfilter\7aff773d782bbf35d95095369ffcb170b7b8f0e6f8f65d5aff42abf61234855d-removing: Access is denied.
    PS C:\control>

Step 5 fails.

(cherry picked from commit efdad53)
Signed-off-by: Sebastiaan van Stijn github@gone.nl

- What I did

- How I did it

- How to verify it

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

Signed-off-by: John Howard <jhoward@microsoft.com>

This is a fix for a few related scenarios where it's impossible to remove layers or containers
until the host is rebooted. Generally (or at least easiest to repro) through a forced daemon kill
while a container is running.

Possibly slightly worse than that, as following a host reboot, the scratch layer would possibly be leaked and
left on disk under the dataroot\windowsfilter directory after the container is removed.

One such example of a failure:

1. run a long running container with the --rm flag
docker run --rm -d --name test microsoft/windowsservercore powershell sleep 30
2. Force kill the daemon not allowing it to cleanup. Simulates a crash or a host power-cycle.
3. (re-)Start daemon
4. docker ps -a
PS C:\control> docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                PORTS               NAMES
7aff773d782b        malloc              "powershell start-sl…"   11 seconds ago      Removal In Progress                       malloc
5. Try to remove
PS C:\control> docker rm 7aff
Error response from daemon: container 7aff773d782bbf35d95095369ffcb170b7b8f0e6f8f65d5aff42abf61234855d: driver "windowsfilter" failed to remove root filesystem: rename C:\control\windowsfilter\7aff773d782bbf35d95095369ffcb170b7b8f0e6f8f65d5aff42abf61234855d C:\control\windowsfilter\7aff773d782bbf35d95095369ffcb170b7b8f0e6f8f65d5aff42abf61234855d-removing: Access is denied.
PS C:\control>

Step 5 fails.

(cherry picked from commit efdad53)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
@thaJeztah thaJeztah added this to the 18.09.1 milestone Nov 9, 2018
@thaJeztah
Copy link
Member Author

ping @jhowardmsft

@lowenna
Copy link

lowenna commented Nov 9, 2018

@thaJeztah LGTM (not a maintainer), assuming this repo already has the required go-winio revendor with the new function call. Will this also hit EE?

EDIT: NVM - just saw your comment about not needing the go-winio fix

@thaJeztah
Copy link
Member Author

Will this also hit EE?

Yes; during the time that CE is supported, the EE repository is kept in sync with the CE repository.

After 6 months (when the next CE version is released, and 18.09 CE reaches EOL), backported to 18.09 EE will be done in the private EE repository (EE has a longer support - 24 months)

1 similar comment
@thaJeztah
Copy link
Member Author

Will this also hit EE?

Yes; during the time that CE is supported, the EE repository is kept in sync with the CE repository.

After 6 months (when the next CE version is released, and 18.09 CE reaches EOL), backported to 18.09 EE will be done in the private EE repository (EE has a longer support - 24 months)

@thaJeztah thaJeztah closed this Nov 9, 2018
@thaJeztah thaJeztah reopened this Nov 9, 2018
@thaJeztah
Copy link
Member Author

Flaky test; https://jenkins.dockerproject.org/job/Docker-PRs/51701/console

This test once was flaky on PowerPC (moby#23626), but now seeing it flaky again on other platforms

01:00:23 FAIL: docker_api_swarm_service_test.go:96: DockerSwarmSuite.TestAPISwarmServicesMultipleAgents
01:00:23 
01:00:23 [dec8237be0549] waiting for daemon to start
01:00:23 [dec8237be0549] daemon started
01:00:23 
01:00:23 [d083a94b09054] waiting for daemon to start
01:00:23 [d083a94b09054] daemon started
01:00:23 
01:00:23 [d84c1ebbb4023] waiting for daemon to start
01:00:23 [d84c1ebbb4023] daemon started
01:00:23 
01:00:23 [d083a94b09054] exiting daemon
01:00:23 docker_api_swarm_service_test.go:120:
01:00:23     waitAndAssert(c, defaultReconciliationTimeout, reducedCheck(sumAsIntegers, d1.CheckActiveContainerCount, d3.CheckActiveContainerCount), checker.Equals, instances)
01:00:23 docker_utils_test.go:435:
01:00:23     c.Assert(v, checker, args...)
01:00:23 ... obtained int = 6
01:00:23 ... expected int = 5
01:00:23 ... output: "4616bee2750e\n79b336198aff\n90030a2e1c75\n", output: "36ca890a0447\n4847f43c120b\n6b0070467adb\n"
01:00:23 
01:00:23 [dec8237be0549] exiting daemon
01:00:23 [d84c1ebbb4023] exiting daemon

Copy link

@andrewhsu andrewhsu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@andrewhsu andrewhsu merged commit 4fd103a into docker-archive:18.09 Nov 27, 2018
@thaJeztah thaJeztah deleted the 18.09_backport_detach branch November 27, 2018 18:09
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants