Try to avoid issues when the Docker daemon restarts or stops on RHEL/CentOS 6 #8094

Merged
merged 1 commit into from Sep 23, 2014

Conversation

smerrill
Contributor

This change makes the Docker daemon's init script wait up to 5
minutes for the daemon to stop before forcibly terminating it. Many
non-trivial containers take more than the default 3 seconds to
stop, which can leave containers whose rootfs is still mounted and
which will not restart when the daemon starts up again, or worse, orphan
processes that are still running.

@smerrill
Contributor Author

Component

docker-io-1.1.2-1.el6.x86_64

Hardware Platform

x86_64

Platform

CentOS 6.5

Summary

When the daemon is stopped or restarted, Docker can lose track of currently mounted device mapper (dm) devices if containers take a long time to shut down.

Details

The Docker daemon can lose track of mounted device mapper thin volumes if the daemon is killed. The RHEL 6 initscript uses killproc -p to stop the Docker daemon (see https://github.com/docker/docker/blob/v1.1.2/contrib/init/sysvinit-redhat/docker#L71). This can happen when a process takes longer than the killproc timeout to exit, or when the Docker daemon receives a SIGKILL. After the daemon is restarted, Docker is unable to start any containers that were still active when the daemon died.
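
The timing issue described here is a wait-then-kill pattern: send SIGTERM, wait a bounded time, then fall back to SIGKILL. A minimal sketch of that pattern in plain shell is below. `stop_with_timeout` is a hypothetical helper written for illustration; it is not the code in the init script and only approximates what killproc does with its delay.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Docker init script).
# Sends SIGTERM to PID $1, waits up to $2 seconds for it to exit,
# and only then falls back to SIGKILL.
# Returns 0 if the process was already gone or exited on its own,
# 1 if it had to be killed.
# Assumes the target is not an unreaped child of the calling shell,
# which is the case for a daemon stopped from an init script.
stop_with_timeout() {
    pid=$1
    timeout=$2
    # If the signal cannot be delivered, treat the process as gone.
    kill -TERM "$pid" 2>/dev/null || return 0
    waited=0
    # kill -0 only checks whether the PID can still be signalled.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            kill -KILL "$pid" 2>/dev/null
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

With a 3 second timeout, a daemon that is still stopping its containers would hit the SIGKILL branch; with a timeout of 300 it gets the 5 minutes proposed in this pull request.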

Reproducibility

Every time

Steps to Reproduce

Given a CentOS 6.5 instance with docker-io-1.1.2 installed:

service docker start
docker run -d --name=dockerbug centos:centos6 /bin/sleep 1000
service docker restart
docker start dockerbug

Actual Results

What happened when you reached the bug?

[root@host ~]# docker ps | grep dockerbug
# (no response, this container is not running)
[root@host ~]# docker start dockerbug
Error response from daemon: Cannot start container dockerbug: Error getting container b4fb0bd2b919b4136ee15cf95bfe5e72d5e33c78bfa178bc21404f182c9fc64d from driver devicemapper: Error mounting '/dev/mapper/docker-8:3-12846136-b4fb0bd2b919b4136ee15cf95bfe5e72d5e33c78bfa178bc21404f182c9fc64d' on '/var/lib/docker/devicemapper/mnt/b4fb0bd2b919b4136ee15cf95bfe5e72d5e33c78bfa178bc21404f182c9fc64d': device or resource busy
2014/09/17 16:02:55 Error: failed to start one or more containers

Expected Results

What do you think was supposed to happen?

[root@host ~]# docker ps | grep dockerbug
# (no response, this container is not running)
[root@host ~]# docker start dockerbug
dockerbug

Workaround

After verifying that the contained processes have exited, identify the affected device mapper ID with docker inspect <containername>, and unmount the device by hand. By default the mount point is /var/lib/docker/devicemapper/mnt/$CONTAINER_ID. You should then be able to start the container again.

[root@host ~]# umount /var/lib/docker/devicemapper/mnt/b4fb0bd2b919b4136ee15cf95bfe5e72d5e33c78bfa178bc21404f182c9fc64d
[root@host ~]# docker ps | grep dockerbug
# (no response, this container is not running)
[root@host ~]# docker start dockerbug
dockerbug
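
The manual steps above could be wrapped in two small helpers. This is only a sketch: `dm_mount_point` and `is_mounted` are hypothetical names, and the path assumes the default devicemapper layout quoted in the workaround.

```shell
#!/bin/sh
# Hypothetical helpers for the manual workaround; they only build and
# check paths and never unmount anything themselves.

# Print the default devicemapper mount point for a full container ID.
dm_mount_point() {
    printf '%s\n' "/var/lib/docker/devicemapper/mnt/$1"
}

# Succeed if path $1 is listed as a mount point in a mounts table.
# The table defaults to /proc/mounts; the optional second argument
# exists mainly so the check can be tried against a sample file.
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}
```

With the full container ID in `$id`, the unmount step becomes `is_mounted "$(dm_mount_point "$id")" && umount "$(dm_mount_point "$id")"`, so umount only runs when the stale mount is actually present.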

@smerrill smerrill force-pushed the feature/avoid-docker-start-woes branch from c5e7430 to 36dbc4b Compare September 17, 2014 21:00
@jstaph

jstaph commented Sep 17, 2014

Possibly related issues:
when docker daemon stop illegality, docker start container_id may failed #8065
docker fails to mount the block device for the container on devicemapper #4036

@smerrill smerrill changed the title Try to avoid issues when the Docker daemon restarts. Try to avoid issues when the Docker daemon restarts on RHEL/CentOS 6 Sep 17, 2014
@smerrill smerrill force-pushed the feature/avoid-docker-start-woes branch from 36dbc4b to ce07932 Compare September 18, 2014 03:17

Signed-off-by: Steven Merrill <steven.merrill@gmail.com>
@smerrill smerrill force-pushed the feature/avoid-docker-start-woes branch from ce07932 to 640d2ef Compare September 18, 2014 12:21
@smerrill smerrill changed the title Try to avoid issues when the Docker daemon restarts on RHEL/CentOS 6 Try to avoid issues when the Docker daemon restarts or stops on RHEL/CentOS 6 Sep 18, 2014
@lsm5
Contributor

lsm5 commented Sep 20, 2014

@maxamillion @jperrin ... could you check if this is good for merge here, and also for the el6 rpm?

@jperrin
Contributor

jperrin commented Sep 21, 2014

Yeah, I'm okay with this for now. It's a valid point, although I think there might be a better way to solve this long term. This is a good short term fix.

@lsm5
Contributor

lsm5 commented Sep 21, 2014

@jperrin cool thanks, I'll update the rpm with this.

@tianon ping

@smerrill
Contributor Author

It is an interesting problem, because the docker daemon will obviously also try to shut down all running containers when it gets the signal to shut down, and it will probably always know better than a shell script how to do that properly, especially since parts of the daemon can register additional tasks with eng.onShutdown().

I had originally thought of putting something in the init script that would loop through $(docker ps -q) and try to stop all containers individually, but I decided not to since that would be duplicating functionality that already exists in the daemon itself.
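
For reference, the per-container loop described above might look roughly like the sketch below. It is not part of this change; `stop_all_containers` is a hypothetical name, and the grace period is simply passed through to `docker stop -t`.

```shell
#!/bin/sh
# Sketch of the alternative that was considered and not used:
# stop every running container individually before stopping the daemon.
# $1 is the per-container grace period in seconds (defaults to 10).
# Returns 0 if every stop succeeded, 1 if any of them failed.
stop_all_containers() {
    grace=${1:-10}
    failed=0
    for id in $(docker ps -q); do
        docker stop -t "$grace" "$id" >/dev/null || failed=1
    done
    return "$failed"
}
```

As noted, this duplicates shutdown logic the daemon already has, which is why the init script only lengthens the wait instead.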

@tianon
Member

tianon commented Sep 22, 2014

Seems hacky, but LGTM. You're good for a merge on this then, @lsm5?

@lsm5
Contributor

lsm5 commented Sep 23, 2014

@tianon yup

tianon added a commit that referenced this pull request Sep 23, 2014
Try to avoid issues when the Docker daemon restarts or stops on RHEL/CentOS 6
@tianon tianon merged commit 3ea5a20 into moby:master Sep 23, 2014