
devicemapper waitClose timeout too fast when system is under load, need configurable timeout #4389

Closed
jwforres opened this issue Feb 28, 2014 · 3 comments

Comments

@jwforres

Multiple processes are starting containers, and those containers run processes that exit when they finish. Under heavy system load, removal of these containers sometimes fails. With the daemon's debug logging enabled I can see that in these cases there is a timeout in waitClose, so the removeDevice call that follows it fails.

(10 of these) [debug] deviceset.go:754 Waiting for unmount of {hash} opencount=1
[error] driver.go:121 Warning: error unmounting device {hash}: Timeout while waiting for device {hash} to close

But a few seconds later, if I check the open count on the container's device with dmsetup info, it is back to 0 and I am able to remove that container with docker rm.

Right now this timeout is hardcoded to 1 second. Can we get a daemon configuration option to extend that timeout?

Log details from one of these container's lifecycle: http://ur1.ca/gq0m4

System details below:
OS: RHEL 6.5

uname -a
Linux ip-10-69-146-54 2.6.32-431.5.1.el6oso.bz844450.x86_64 #1 SMP Tue Feb 18 14:29:16 EST 2014 x86_64 x86_64 x86_64 GNU/Linux

docker version
Client version: 0.8.1
Go version (client): go1.2
Git commit (client): a1598d1/0.8.1
Server version: 0.8.1
Git commit (server): a1598d1/0.8.1
Go version (server): go1.2
Last stable version: 0.8.1

docker info
Containers: 9
Images: 15
Driver: devicemapper
Pool Name: docker-202:66-299-pool
Data file: /var/lib/docker/devicemapper/devicemapper/data
Metadata file: /var/lib/docker/devicemapper/devicemapper/metadata
Data Space Used: 1815.7 Mb
Data Space Total: 102400.0 Mb
Metadata Space Used: 2.5 Mb
Metadata Space Total: 2048.0 Mb
Debug mode (server): true
Debug mode (client): false
Fds: 23
Goroutines: 154
Execution Driver: lxc-0.9.0
EventsListeners: 277
Kernel Version: 2.6.32-431.5.1.el6oso.bz844450.x86_64
Init SHA1: 1af2a5d353d6a0d4bfebafc9360e2fb90f49610d
Init Path: /usr/libexec/docker/dockerinit

alexlarsson added a commit to alexlarsson/docker that referenced this issue Mar 6, 2014
We've seen some cases in the wild where waiting for unmount/deactivate
of devmapper devices takes a long time (several seconds). So, we increase
the sleeps to 10 seconds before we time out. For instance:

moby#4389

But, in order to not keep other processes blocked we unlock the global
dm lock while waiting to allow other devices to continue working.

Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
@alexlarsson
Contributor

#4504 is closed, which I believe will fix this.

@jwforres
Author

The first timeout was happening in waitClose; later it was also timing out when removing the device. The second timeout should be resolved by #4504.

alexlarsson added a commit to alexlarsson/docker that referenced this issue Mar 18, 2014
As reported in moby#4389 we're
currently seeing timeouts in waitClose on some systems. We already
bumped the timeout in waitRemove() in
moby#4504.

Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
unclejack pushed a commit to unclejack/moby that referenced this issue Mar 18, 2014
unclejack pushed a commit to unclejack/moby that referenced this issue Mar 18, 2014
@unclejack
Contributor

Alex Larsson's pull requests should have fixed this problem.

@jwforres If you're still running into this problem, please let us know. I'll close the issue now.

shykes pushed a commit to shykes/docker-dev that referenced this issue Oct 2, 2014
shykes pushed a commit to shykes/docker-dev that referenced this issue Oct 2, 2014