
devicemapper waitClose timeout too fast when system is under load, need configurable timeout #4389

Closed
jwforres opened this issue Feb 28, 2014 · 3 comments

Comments

@jwforres

Multiple processes are starting containers, and those containers run processes that exit when they finish. Under heavy system load, removal of these containers sometimes fails. With the daemon's debug logging enabled I can see that in these cases there is a timeout in waitClose, so the removeDevice call that follows it fails.

(10 of these) [debug] deviceset.go:754 Waiting for unmount of {hash} opencount=1
[error] driver.go:121 Warning: error unmounting device {hash}: Timeout while waiting for device {hash} to close

But a few seconds later, if I check the open count on the container's device with dmsetup info, it is back to 0 and I am able to remove that container with docker rm.

Right now this timeout is hardcoded to 1 second. Can we get a daemon configuration option to extend that timeout?

Log details from one of these container's lifecycle: http://ur1.ca/gq0m4

System details below:
OS: RHEL 6.5

uname -a
Linux ip-10-69-146-54 2.6.32-431.5.1.el6oso.bz844450.x86_64 #1 SMP Tue Feb 18 14:29:16 EST 2014 x86_64 x86_64 x86_64 GNU/Linux

docker version
Client version: 0.8.1
Go version (client): go1.2
Git commit (client): a1598d1/0.8.1
Server version: 0.8.1
Git commit (server): a1598d1/0.8.1
Go version (server): go1.2
Last stable version: 0.8.1

docker info
Containers: 9
Images: 15
Driver: devicemapper
Pool Name: docker-202:66-299-pool
Data file: /var/lib/docker/devicemapper/devicemapper/data
Metadata file: /var/lib/docker/devicemapper/devicemapper/metadata
Data Space Used: 1815.7 Mb
Data Space Total: 102400.0 Mb
Metadata Space Used: 2.5 Mb
Metadata Space Total: 2048.0 Mb
Debug mode (server): true
Debug mode (client): false
Fds: 23
Goroutines: 154
Execution Driver: lxc-0.9.0
EventsListeners: 277
Kernel Version: 2.6.32-431.5.1.el6oso.bz844450.x86_64
Init SHA1: 1af2a5d353d6a0d4bfebafc9360e2fb90f49610d
Init Path: /usr/libexec/docker/dockerinit

alexlarsson added a commit to alexlarsson/docker that referenced this issue Mar 6, 2014
We've seen some cases in the wild where waiting for unmount/deactivate
of devmapper devices takes a long time (several seconds). So, we increase
the sleeps to 10 seconds before we time out. For instance:

moby#4389

But, in order to not keep other processes blocked we unlock the global
dm lock while waiting to allow other devices to continue working.

Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
@alexlarsson
Contributor

#4504 is closed, which I believe will fix this.

@jwforres
Author

The first timeout was happening in waitClose; later it was also timing out when removing the device. The second timeout should be resolved by #4504.

alexlarsson added a commit to alexlarsson/docker that referenced this issue Mar 18, 2014
As reported in moby#4389 we're
currently seeing timeouts in waitClose on some systems. We already
bumped the timeout in waitRemove() in
moby#4504.

Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
unclejack pushed a commit to unclejack/moby that referenced this issue Mar 18, 2014
unclejack pushed a commit to unclejack/moby that referenced this issue Mar 18, 2014
@unclejack
Contributor

Alex Larsson's pull requests should have fixed this problem.

@jwforres If you're still running into this problem, please let us know. I'll close the issue now.

shykes pushed a commit to shykes/docker-dev that referenced this issue Oct 2, 2014
shykes pushed a commit to shykes/docker-dev that referenced this issue Oct 2, 2014