Skip to content
This repository has been archived by the owner on Oct 13, 2023. It is now read-only.

[18.09 backport] Windows: DetachVhd attempt in cleanup #113

Merged
merged 1 commit into from
Nov 27, 2018

Conversation

thaJeztah
Copy link
Member

backport of moby#37712 for 18.09

fixes moby#36218 for 18.09

first commit was skipped, as the go-winio version was already bumped to v0.4.11 in #75

git checkout -b 18.09_backport_detach ce-engine/18.09

# skipped, already bumped in https://github.com/docker/engine/pull/75
# git cherry-pick -s -S -x 66966941f9c22dc5ab36414ef00a5aaf50e39dd4
git cherry-pick -s -S -x efdad5374465a2a889d7572834a2dcca147af4fb
git push -u origin

This is a fix for a few related scenarios where it's impossible to remove layers or containers
until the host is rebooted. Generally (or at least easiest to repro) through a forced daemon kill
while a container is running.

Possibly slightly worse than that, as following a host reboot, the scratch layer would possibly be leaked and
left on disk under the dataroot\windowsfilter directory after the container is removed.

One such example of a failure:

  1. run a long running container with the --rm flag
    docker run --rm -d --name test microsoft/windowsservercore powershell sleep 30
  2. Force kill the daemon not allowing it to cleanup. Simulates a crash or a host power-cycle.
  3. (re-)Start daemon
  4. docker ps -a
    PS C:\control> docker ps -a
    CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
    7aff773d782b malloc "powershell start-sl…" 11 seconds ago Removal In Progress malloc
  5. Try to remove
    PS C:\control> docker rm 7aff
    Error response from daemon: container 7aff773d782bbf35d95095369ffcb170b7b8f0e6f8f65d5aff42abf61234855d: driver "windowsfilter" failed to remove root filesystem: rename C:\control\windowsfilter\7aff773d782bbf35d95095369ffcb170b7b8f0e6f8f65d5aff42abf61234855d C:\control\windowsfilter\7aff773d782bbf35d95095369ffcb170b7b8f0e6f8f65d5aff42abf61234855d-removing: Access is denied.
    PS C:\control>

Step 5 fails.

(cherry picked from commit efdad53)
Signed-off-by: Sebastiaan van Stijn github@gone.nl

- What I did

- How I did it

- How to verify it

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

Signed-off-by: John Howard <jhoward@microsoft.com>

This is a fix for a few related scenarios where it's impossible to remove layers or containers
until the host is rebooted. Generally (or at least easiest to repro) through a forced daemon kill
while a container is running.

Possibly slightly worse than that, as following a host reboot, the scratch layer would possibly be leaked and
left on disk under the dataroot\windowsfilter directory after the container is removed.

One such example of a failure:

1. run a long running container with the --rm flag
docker run --rm -d --name test microsoft/windowsservercore powershell sleep 30
2. Force kill the daemon not allowing it to cleanup. Simulates a crash or a host power-cycle.
3. (re-)Start daemon
4. docker ps -a
PS C:\control> docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                PORTS               NAMES
7aff773d782b        malloc              "powershell start-sl…"   11 seconds ago      Removal In Progress                       malloc
5. Try to remove
PS C:\control> docker rm 7aff
Error response from daemon: container 7aff773d782bbf35d95095369ffcb170b7b8f0e6f8f65d5aff42abf61234855d: driver "windowsfilter" failed to remove root filesystem: rename C:\control\windowsfilter\7aff773d782bbf35d95095369ffcb170b7b8f0e6f8f65d5aff42abf61234855d C:\control\windowsfilter\7aff773d782bbf35d95095369ffcb170b7b8f0e6f8f65d5aff42abf61234855d-removing: Access is denied.
PS C:\control>

Step 5 fails.

(cherry picked from commit efdad53)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
@thaJeztah thaJeztah added this to the 18.09.1 milestone Nov 9, 2018
@thaJeztah
Copy link
Member Author

ping @jhowardmsft

@lowenna
Copy link

lowenna commented Nov 9, 2018

@thaJeztah LGTM (not a maintainer), assuming this repo already has the required go-winio revendor with the new function call. Will this also hit EE?

EDIT: NVM - just saw your comment about not needing the go-winio fix

@thaJeztah
Copy link
Member Author

Will this also hit EE?

Yes; during the time that CE is supported, the EE repository is kept in sync with the CE repository.

After 6 months (when the next CE version is released, and 18.09 CE reaches EOL), backported to 18.09 EE will be done in the private EE repository (EE has a longer support - 24 months)

1 similar comment
@thaJeztah
Copy link
Member Author

Will this also hit EE?

Yes; during the time that CE is supported, the EE repository is kept in sync with the CE repository.

After 6 months (when the next CE version is released, and 18.09 CE reaches EOL), backported to 18.09 EE will be done in the private EE repository (EE has a longer support - 24 months)

@thaJeztah thaJeztah closed this Nov 9, 2018
@thaJeztah thaJeztah reopened this Nov 9, 2018
@thaJeztah
Copy link
Member Author

Flaky test; https://jenkins.dockerproject.org/job/Docker-PRs/51701/console

This test once was flaky on PowerPC (moby#23626), but now seeing it flaky again on other platforms

01:00:23 FAIL: docker_api_swarm_service_test.go:96: DockerSwarmSuite.TestAPISwarmServicesMultipleAgents
01:00:23 
01:00:23 [dec8237be0549] waiting for daemon to start
01:00:23 [dec8237be0549] daemon started
01:00:23 
01:00:23 [d083a94b09054] waiting for daemon to start
01:00:23 [d083a94b09054] daemon started
01:00:23 
01:00:23 [d84c1ebbb4023] waiting for daemon to start
01:00:23 [d84c1ebbb4023] daemon started
01:00:23 
01:00:23 [d083a94b09054] exiting daemon
01:00:23 docker_api_swarm_service_test.go:120:
01:00:23     waitAndAssert(c, defaultReconciliationTimeout, reducedCheck(sumAsIntegers, d1.CheckActiveContainerCount, d3.CheckActiveContainerCount), checker.Equals, instances)
01:00:23 docker_utils_test.go:435:
01:00:23     c.Assert(v, checker, args...)
01:00:23 ... obtained int = 6
01:00:23 ... expected int = 5
01:00:23 ... output: "4616bee2750e\n79b336198aff\n90030a2e1c75\n", output: "36ca890a0447\n4847f43c120b\n6b0070467adb\n"
01:00:23 
01:00:23 [dec8237be0549] exiting daemon
01:00:23 [d84c1ebbb4023] exiting daemon

Copy link

@andrewhsu andrewhsu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@andrewhsu andrewhsu merged commit 4fd103a into docker-archive:18.09 Nov 27, 2018
@thaJeztah thaJeztah deleted the 18.09_backport_detach branch November 27, 2018 18:09
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
3 participants