Non-fatal signals break restart policies #11065

tonistiigi · 2015-02-27T19:13:13Z

Sending a signal with docker kill that is handled by the application will turn off restart policies for that container.

> docker run -d --restart=always --name s0 busybox top
> kill -SIGTERM $(docker inspect -f "{{.State.Pid}}" s0)
> sleep 1
> docker ps | grep s0 | wc -l
1
> docker kill -s CONT s0
> docker ps | grep s0 | wc -l
1
> kill -SIGTERM $(docker inspect -f "{{.State.Pid}}" s0)
> sleep 1
> docker ps | grep s0 | wc -l
0

Not sure how to fix this without changing the behavior for docker kill as there is no way to predict if a signal will cause the container to exit or not.

To me its kind of weird that docker kill -s TERM foo and kill -SIGTERM $(docker inspect -f "{{.State.Pid}}" foo) have different meaning for the restart logic, but I can see it being useful for some situations.

Tested with v1.5.0 and master

The text was updated successfully, but these errors were encountered:

tonistiigi · 2015-03-05T20:33:15Z

/cc @crosbymichael

crosbymichael · 2015-03-26T19:34:15Z

Yep, this looks like a bug, thanks for the ping.

tonistiigi · 2015-03-26T19:37:43Z

@crosbymichael Any ideas how to fix it? Seems more like a logic issue than a code bug.

crosbymichael · 2015-03-26T19:39:42Z

logic but it's kinda hard to identify what is a "non-fatal" signal compared to something else. Like what happens if a non-fatal signal actually crashes the app?

tonistiigi · 2015-03-26T19:48:29Z

Yes, for anything other than STOP and KILL (afaik) its for the application to decide. There are no rules for the others. Some, like HUP or CONT are quite obvious of course but I don't think mind reading is a way to go. I'm more thinking about something like ~~docker kill -s TERM --no-restart/--stop~~ edit: docker kill --stop-timeout=10

crosbymichael · 2015-03-27T17:20:23Z

What about things like SIGINT? I usually consider it a terminating signal in most apps

tonistiigi · 2015-03-27T20:19:07Z

@crosbymichael Yes, thats the default. But application can choose to ignore it or do something else.

ble · 2015-04-24T00:19:31Z

I think this is where it happens:
Kill command calls into container.KillSig if there is a signal and it is not SIGKILL: https://github.com/docker/docker/blob/267bd51adbf9e344a5d18bb0c7df4e09bcb60668/daemon/kill.go#L25

container.KillSig calls monitor.ExitOnNext()
https://github.com/docker/docker/blob/267bd51adbf9e344a5d18bb0c7df4e09bcb60668/daemon/container.go#L714

ble · 2015-04-24T00:21:15Z

But it is more a "decide what the behavior should be" kinda bug than a "definitely wrong" bug.

tonistiigi · 2015-04-25T09:27:36Z

@ble @crosbymichael

In response to #12727, I'm proposing the following syntax:

docker kill -s SIGNAL --stop-timeout=N

If the signal is one of SIGKILL, SIGTERM, SIGINT then the --stop-timeout will default to 10 seconds. For any other signal it will default to 0.

If stop-timeout is 0, the command will be equal to sending kill -s to the containers PID1. So no extra logic for handling the restarts. If you send docker kill -s KILL --stop-timeout=0 the container will be killed and restart monitor will take over and restart the container.

If stop-timeout is greater than 0 then after sending the signal, command will wait for the container to stop(docker kill command will not return instantly as it does now). If the container stops before the timeout is reached the restart monitor will not take over and docker kill command returns cleanly. Should the container not stop within the timeout, docker kill will return with an error code and message. If container stops after that, the restart monitor will take over again.

Problem with the current solution is that although it provides defaults that work most of the time it leaves edge-cases that are impossible to handle correctly. #12727 provides better defaults but still leaves cases that can't be handled. Not only the problem I brought up in the comments there - also for example when my container uses a custom signal to do a clean shutdown, I wouldn't be able to shut down a container with any signal except the ones that are whitelisted.

This solution still maintains good defaults and adds a way to handle all the edge-cases(I hope). Of course, as a counter-argument, the logic is quite complicated to grasp.

ble · 2015-04-26T00:09:41Z

@tonistiigi +1

I'm not super knowledgeable about the project, but it certainly seems like one might want to be able to determine whether or not a container had stopped on the basis of docker kill exit code.

visualphoenix · 2015-07-02T21:32:18Z

Can we get this moved back to 1.7.x?

cpuguy83 · 2015-09-15T15:18:15Z

We now have stopsignal support, we can probably look at this to determine if the restart policy should be disabled.

markgoddard · 2019-10-24T13:59:05Z

I am still seeing this issue in Docker CE 19.0.3:

docker run --stop-signal SIGTERM --restart=unless-stopped -d --name sleepy centos:7 bash -c "trap \"\" SIGHUP; sleep infinity"
docker ps -a | grep sleepy
docker kill --signal HUP sleepy
71f79e8b7c5d        centos:7                                           "bash -c 'trap \"\" SI…"   5 seconds ago       Up 4 seconds                            sleepy
systemctl restart docker
docker ps -a | grep sleepy
4e14d7f92eff        centos:7                                           "bash -c 'trap \"\" SI…"   51 seconds ago      Exited (137) 31 seconds ago                       sleepy

I would expect the container to be running after restarting docker.

Due to a Docker bug [1] we cannot use Docker to send SIGHUP to the container because it will mark it as stopped. This patch sends the signal directly to the process, bypassing Docker. 'changed_when: false' is also removed from the relevant task as it definitely changes the state. In the future we could do the refresh only if there really is a need for another one. [1] moby/moby#11065 Change-Id: Ief73bbd24568d6941384ea3330ab45f11aa42d37 Co-authored-by: Radosław Piliszek <radoslaw.piliszek@gmail.com> Closes-Bug: #1845244

* Update kolla-ansible from branch 'master' - Merge "Fix nova scheduler down after first docker restart" - Fix nova scheduler down after first docker restart Due to a Docker bug [1] we cannot use Docker to send SIGHUP to the container because it will mark it as stopped. This patch sends the signal directly to the process, bypassing Docker. 'changed_when: false' is also removed from the relevant task as it definitely changes the state. In the future we could do the refresh only if there really is a need for another one. [1] moby/moby#11065 Change-Id: Ief73bbd24568d6941384ea3330ab45f11aa42d37 Co-authored-by: Radosław Piliszek <radoslaw.piliszek@gmail.com> Closes-Bug: #1845244

Due to a Docker bug [1] we cannot use Docker to send SIGHUP to the container because it will mark it as stopped. This patch sends the signal directly to the process, bypassing Docker. 'changed_when: false' is also removed from the relevant task as it definitely changes the state. In the future we could do the refresh only if there really is a need for another one. [1] moby/moby#11065 Change-Id: Ief73bbd24568d6941384ea3330ab45f11aa42d37 Co-authored-by: Radosław Piliszek <radoslaw.piliszek@gmail.com> Closes-Bug: #1845244 (cherry picked from commit 6bdf202) (cherry picked from commit 2242fce)

Due to a Docker bug [1] we cannot use Docker to send SIGHUP to the container because it will mark it as stopped. This patch sends the signal directly to the process, bypassing Docker. 'changed_when: false' is also removed from the relevant task as it definitely changes the state. In the future we could do the refresh only if there really is a need for another one. [1] moby/moby#11065 Change-Id: Ief73bbd24568d6941384ea3330ab45f11aa42d37 Co-authored-by: Radosław Piliszek <radoslaw.piliszek@gmail.com> Closes-Bug: #1845244 (cherry picked from commit 6bdf202)

crosbymichael added the kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. label Mar 26, 2015

crosbymichael added this to the 1.7.0 milestone Mar 26, 2015

This was referenced Apr 24, 2015

When sending a signal to container, only set container to exit on next if the signal is KILL or INT #12722

Closed

Don't turn off restart when sending some signals #12727

Closed

icecrime modified the milestones: 1.7.0, next May 28, 2015

thaJeztah modified the milestones: 1.8.0, next Jun 5, 2015

visualphoenix mentioned this issue Jul 2, 2015

Drop support for RHEL6/CentOS6 #14365

Closed

3 tasks

calavera removed this from the 1.8.0 milestone Jul 20, 2015

bboreham mentioned this issue Jul 22, 2015

weaveproxy stub breaks Docker restart policies weaveworks/weave#1209

Closed

tonistiigi mentioned this issue Aug 18, 2015

[Carry #11827] Add a new event for process exiting #15669

Closed

icecrime added the area/runtime label Sep 10, 2016

vdemeester mentioned this issue Sep 10, 2016

Taking stop-signal into account when docker kill #26464

Merged

2 tasks

thaJeztah mentioned this issue Sep 28, 2016

unless-stopped restart policy doesn't work well with docker kill --signal #26087

Closed

crosbymichael closed this as completed in #26464 Oct 28, 2016

thaJeztah added this to the 1.13.0 milestone Oct 28, 2016

markgoddard mentioned this issue Oct 24, 2019

--restart=unless-stopped does not restart containers after daemon restart docker/for-linux#652

Open

3 tasks

leonshaw mentioned this issue Jul 31, 2020

Signal breaks unless-stopped restart policy #41302

Open

chmyaf mentioned this issue Nov 30, 2021

LATERA-779: Update hook command hydra-billing/certbot-service#1

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Non-fatal signals break restart policies #11065

Non-fatal signals break restart policies #11065

tonistiigi commented Feb 27, 2015

tonistiigi commented Mar 5, 2015

crosbymichael commented Mar 26, 2015

tonistiigi commented Mar 26, 2015

crosbymichael commented Mar 26, 2015

tonistiigi commented Mar 26, 2015

crosbymichael commented Mar 27, 2015

tonistiigi commented Mar 27, 2015

ble commented Apr 24, 2015

ble commented Apr 24, 2015

tonistiigi commented Apr 25, 2015

ble commented Apr 26, 2015

visualphoenix commented Jul 2, 2015

cpuguy83 commented Sep 15, 2015

markgoddard commented Oct 24, 2019

Non-fatal signals break restart policies #11065

Non-fatal signals break restart policies #11065

Comments

tonistiigi commented Feb 27, 2015

tonistiigi commented Mar 5, 2015

crosbymichael commented Mar 26, 2015

tonistiigi commented Mar 26, 2015

crosbymichael commented Mar 26, 2015

tonistiigi commented Mar 26, 2015

crosbymichael commented Mar 27, 2015

tonistiigi commented Mar 27, 2015

ble commented Apr 24, 2015

ble commented Apr 24, 2015

tonistiigi commented Apr 25, 2015

ble commented Apr 26, 2015

visualphoenix commented Jul 2, 2015

cpuguy83 commented Sep 15, 2015

markgoddard commented Oct 24, 2019