Try to avoid issues when the Docker daemon restarts or stops on RHEL/CentOS 6 #8094

smerrill · 2014-09-17T20:45:36Z

This change makes the init script wait up to 5 minutes for the Docker
daemon to stop before forcibly terminating it. Many non-trivial
containers take longer than the default 3 seconds to stop, so the
daemon can be killed mid-shutdown. That can leave containers whose
rootfs is still mounted and which will not restart when the daemon
starts up again, or worse, orphan processes that are still running.
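
To make the behavior concrete, here is a minimal sketch of "wait for a grace period, then force-kill", written as a plain shell function. It is not the patch itself: `stop_with_grace` is a hypothetical helper name, and the real init script relies on `killproc` from `/etc/init.d/functions` (which accepts a `-d delay` argument) rather than a hand-rolled loop.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only (not the code in this PR).
# Usage: stop_with_grace PID GRACE_SECONDS
# Sends SIGTERM, polls once per second, and sends SIGKILL only after the
# grace period has elapsed.
# Returns 0 if the process exited by itself, 1 if it had to be force-killed.
stop_with_grace() {
    pid=$1
    grace=$2

    # If the signal cannot be delivered (typically because the process is
    # already gone), there is nothing to wait for.
    kill -TERM "$pid" 2>/dev/null || return 0

    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$grace" ]; then
            kill -KILL "$pid" 2>/dev/null
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example (assumes the daemon's PID is stored in /var/run/docker.pid):
#   stop_with_grace "$(cat /var/run/docker.pid)" 300
```

With a 3-second grace period the loop reaches the SIGKILL branch while containers are still stopping; with 300 seconds the daemon has a realistic chance to unmount each container's rootfs first.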

smerrill · 2014-09-17T20:50:38Z

Component

docker-io-1.1.2-1.el6.x86_64

Hardware Platform

x86_64

Platform

CentOS 6.5

Summary

On restart/stop, docker can lose track of currently mounted dm devices on long-running container shutdowns.

Details

The docker daemon can lose track of mounted device mapper thin volumes if the daemon is killed. The RHEL 6 initscript uses killproc -p to stop the Docker daemon (see https://github.com/docker/docker/blob/v1.1.2/contrib/init/sysvinit-redhat/docker#L71). This can occur if the daemon takes longer than the killproc timeout to exit (for example, because a container is slow to stop), or if the docker daemon receives a SIGKILL. When the daemon is restarted, docker will be unable to start any containers that were still active when the daemon died.

Reproducibility

Every time

Steps to Reproduce

Given a CentOS 6.5 instance with docker-io-1.1.2 installed:

service docker start
docker run -d --name=dockerbug centos:centos6 /bin/sleep 1000
service docker restart
docker start dockerbug

Actual Results

What happened when you reached the bug?

[root@host ~]# docker ps | grep dockerbug
# (no response, this container is not running)
[root@host ~]# docker start dockerbug
Error response from daemon: Cannot start container dockerbug: Error getting container b4fb0bd2b919b4136ee15cf95bfe5e72d5e33c78bfa178bc21404f182c9fc64d from driver devicemapper: Error mounting '/dev/mapper/docker-8:3-12846136-b4fb0bd2b919b4136ee15cf95bfe5e72d5e33c78bfa178bc21404f182c9fc64d' on '/var/lib/docker/devicemapper/mnt/b4fb0bd2b919b4136ee15cf95bfe5e72d5e33c78bfa178bc21404f182c9fc64d': device or resource busy
2014/09/17 16:02:55 Error: failed to start one or more containers

Expected Results

What do you think was supposed to happen?

[root@host ~]# docker ps | grep dockerbug
# (no response, this container is not running)
[root@host ~]# docker start dockerbug
dockerbug

Workaround

After verifying that the contained processes have exited, find the full container ID with docker inspect <containername> and unmount the container's device by hand. By default the mount point is /var/lib/docker/devicemapper/mnt/$CONTAINER_ID. You should then be able to start the container again.

[root@host ~]# umount /var/lib/docker/devicemapper/mnt/b4fb0bd2b919b4136ee15cf95bfe5e72d5e33c78bfa178bc21404f182c9fc64d
[root@host ~]# docker ps | grep dockerbug
# (no response, this container is not running)
[root@host ~]# docker start dockerbug
dockerbug
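
The check behind this workaround can be sketched as a small shell function that reports whether a container's devicemapper mount point is still listed in the mount table. `stale_mount` is a hypothetical helper name, and the default path assumes the stock /var/lib/docker root shown above.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# Usage: stale_mount CONTAINER_ID [MOUNTS_FILE] [DOCKER_ROOT]
# Prints the container's devicemapper mount point and returns 0 if it is
# still listed in the mounts file; prints nothing and returns 1 otherwise.
stale_mount() {
    id=$1
    mounts=${2:-/proc/mounts}
    root=${3:-/var/lib/docker}
    target="$root/devicemapper/mnt/$id"

    # The second field of each /proc/mounts line is the mount point.
    awk -v t="$target" '$2 == t { found = 1 } END { exit !found }' "$mounts" \
        && printf '%s\n' "$target"
}

# Example (CONTAINER_ID is the full ID reported by docker inspect):
#   mnt=$(stale_mount "$CONTAINER_ID") && umount "$mnt"
```

Only unmount after confirming that the container's processes have exited, as described above.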

jstaph · 2014-09-17T21:00:51Z

Possibly related issues:
when docker daemon stop illegality, docker start container_id may failed #8065
docker fails to mount the block device for the container on devicemapper #4036

This change will allow the Docker daemon's init script to wait up to 5 minutes before being forcibly terminated by the initscript. Many non-trivial containers will take more than the default 3 seconds to stop, which can result in containers whose rootfs is still mounted and will not restart when the daemon starts up again, or worse, orphan processes that are still running.

Signed-off-by: Steven Merrill <steven.merrill@gmail.com>

lsm5 · 2014-09-20T21:13:29Z

@maxamillion @jperrin ... could you check if this is good for merge here, and also for the el6 rpm?

jperrin · 2014-09-21T03:00:32Z

Yeah, I'm okay with this for now. It's a valid point, although I think there might be a better way to solve this long term. This is a good short term fix.

lsm5 · 2014-09-21T03:03:59Z

@jperrin cool thanks, I'll update the rpm with this.

@tianon ping

smerrill · 2014-09-21T13:36:46Z

It is an interesting problem, because the docker daemon will obviously also try to shut down all running containers when it gets the signal to shut down, and it will probably always know better than a shell script how to do that properly, especially since parts of the daemon can register additional tasks with eng.onShutdown().

I had originally thought of putting something in the init script that would loop through $(docker ps -q) and try to stop all containers individually, but I decided not to since that would be duplicating functionality that already exists in the daemon itself.
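
For reference, the rejected alternative would look roughly like the sketch below. It was not added to the init script, for the reason given above. `stop_all_containers` is a hypothetical function name, and `DOCKER_BIN` is an assumed override used only so the loop can be exercised without a running daemon.

```shell
#!/bin/sh
# Hypothetical sketch of the approach that was considered and NOT adopted.
# Usage: stop_all_containers [TIMEOUT_SECONDS]
# Stops every running container one at a time.
# Returns 1 if any "docker stop" call failed, 0 otherwise.
stop_all_containers() {
    timeout=${1:-10}
    docker_bin=${DOCKER_BIN:-docker}   # assumed override, for testing only
    failed=0

    # "docker ps -q" prints one running container ID per line.
    for id in $("$docker_bin" ps -q); do
        "$docker_bin" stop -t "$timeout" "$id" >/dev/null || failed=1
    done
    return "$failed"
}
```

The loop only repeats what the daemon already does on shutdown, and it cannot run any extra tasks registered through eng.onShutdown(), which is why giving the daemon more time was preferred.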

tianon · 2014-09-22T19:22:29Z

Seems hacky, but LGTM. You're good for a merge on this then, @lsm5?

lsm5 · 2014-09-23T00:01:27Z

@tianon yup

smerrill force-pushed the feature/avoid-docker-start-woes branch from c5e7430 to 36dbc4b Compare September 17, 2014 21:00

smerrill changed the title ~~Try to avoid issues when the Docker daemon restarts.~~ Try to avoid issues when the Docker daemon restarts on RHEL/CentOS 6 Sep 17, 2014

smerrill force-pushed the feature/avoid-docker-start-woes branch from 36dbc4b to ce07932 Compare September 18, 2014 03:17

smerrill force-pushed the feature/avoid-docker-start-woes branch from ce07932 to 640d2ef Compare September 18, 2014 12:21

smerrill mentioned this pull request Sep 18, 2014

docker fails to mount the block device for the container on devicemapper #4036

Closed

smerrill changed the title ~~Try to avoid issues when the Docker daemon restarts on RHEL/CentOS 6~~ Try to avoid issues when the Docker daemon restarts or stops on RHEL/CentOS 6 Sep 18, 2014

tianon added a commit that referenced this pull request Sep 23, 2014

Merge pull request #8094 from smerrill/feature/avoid-docker-start-woes

3ea5a20

Try to avoid issues when the Docker daemon restarts or stops on RHEL/CentOS 6

tianon merged commit 3ea5a20 into moby:master Sep 23, 2014
