Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-fatal signals break restart policies #11065

Closed
tonistiigi opened this issue Feb 27, 2015 · 14 comments · Fixed by #26464
Closed

Non-fatal signals break restart policies #11065

tonistiigi opened this issue Feb 27, 2015 · 14 comments · Fixed by #26464
Labels
area/runtime kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed.
Milestone

Comments

@tonistiigi
Copy link
Member

Sending a signal with docker kill that is handled by the application will turn off restart policies for that container.

> docker run -d --restart=always --name s0 busybox top
> kill -SIGTERM $(docker inspect -f "{{.State.Pid}}" s0)
> sleep 1
> docker ps | grep s0 | wc -l
1
> docker kill -s CONT s0
> docker ps | grep s0 | wc -l
1
> kill -SIGTERM $(docker inspect -f "{{.State.Pid}}" s0)
> sleep 1
> docker ps | grep s0 | wc -l
0

Not sure how to fix this without changing the behavior for docker kill as there is no way to predict if a signal will cause the container to exit or not.

To me its kind of weird that docker kill -s TERM foo and kill -SIGTERM $(docker inspect -f "{{.State.Pid}}" foo) have different meaning for the restart logic, but I can see it being useful for some situations.

Tested with v1.5.0 and master

@tonistiigi
Copy link
Member Author

/cc @crosbymichael

@crosbymichael crosbymichael added the kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. label Mar 26, 2015
@crosbymichael
Copy link
Contributor

Yep, this looks like a bug, thanks for the ping.

@crosbymichael crosbymichael added this to the 1.7.0 milestone Mar 26, 2015
@tonistiigi
Copy link
Member Author

@crosbymichael Any ideas how to fix it? Seems more like a logic issue than a code bug.

@crosbymichael
Copy link
Contributor

logic but it's kinda hard to identify what is a "non-fatal" signal compared to something else. Like what happens if a non-fatal signal actually crashes the app?

@tonistiigi
Copy link
Member Author

Yes, for anything other than STOP and KILL (afaik) its for the application to decide. There are no rules for the others. Some, like HUP or CONT are quite obvious of course but I don't think mind reading is a way to go. I'm more thinking about something like docker kill -s TERM --no-restart/--stop edit: docker kill --stop-timeout=10

@crosbymichael
Copy link
Contributor

What about things like SIGINT? I usually consider it a terminating signal in most apps

@tonistiigi
Copy link
Member Author

@crosbymichael Yes, thats the default. But application can choose to ignore it or do something else.

@ble
Copy link

ble commented Apr 24, 2015

I think this is where it happens:
Kill command calls into container.KillSig if there is a signal and it is not SIGKILL: https://github.com/docker/docker/blob/267bd51adbf9e344a5d18bb0c7df4e09bcb60668/daemon/kill.go#L25

container.KillSig calls monitor.ExitOnNext()
https://github.com/docker/docker/blob/267bd51adbf9e344a5d18bb0c7df4e09bcb60668/daemon/container.go#L714

@ble
Copy link

ble commented Apr 24, 2015

But it is more a "decide what the behavior should be" kinda bug than a "definitely wrong" bug.

@tonistiigi
Copy link
Member Author

@ble @crosbymichael

In response to #12727, I'm proposing the following syntax:

docker kill -s SIGNAL --stop-timeout=N

If the signal is one of SIGKILL, SIGTERM, SIGINT then the --stop-timeout will default to 10 seconds. For any other signal it will default to 0.

If stop-timeout is 0, the command will be equal to sending kill -s to the containers PID1. So no extra logic for handling the restarts. If you send docker kill -s KILL --stop-timeout=0 the container will be killed and restart monitor will take over and restart the container.

If stop-timeout is greater than 0 then after sending the signal, command will wait for the container to stop(docker kill command will not return instantly as it does now). If the container stops before the timeout is reached the restart monitor will not take over and docker kill command returns cleanly. Should the container not stop within the timeout, docker kill will return with an error code and message. If container stops after that, the restart monitor will take over again.

Problem with the current solution is that although it provides defaults that work most of the time it leaves edge-cases that are impossible to handle correctly. #12727 provides better defaults but still leaves cases that can't be handled. Not only the problem I brought up in the comments there - also for example when my container uses a custom signal to do a clean shutdown, I wouldn't be able to shut down a container with any signal except the ones that are whitelisted.

This solution still maintains good defaults and adds a way to handle all the edge-cases(I hope). Of course, as a counter-argument, the logic is quite complicated to grasp.

@ble
Copy link

ble commented Apr 26, 2015

@tonistiigi +1

I'm not super knowledgeable about the project, but it certainly seems like one might want to be able to determine whether or not a container had stopped on the basis of docker kill exit code.

@icecrime icecrime modified the milestones: 1.7.0, next May 28, 2015
@thaJeztah thaJeztah modified the milestones: 1.8.0, next Jun 5, 2015
@visualphoenix
Copy link

Can we get this moved back to 1.7.x?

@cpuguy83
Copy link
Member

We now have stopsignal support, we can probably look at this to determine if the restart policy should be disabled.

@markgoddard
Copy link

I am still seeing this issue in Docker CE 19.0.3:

docker run --stop-signal SIGTERM --restart=unless-stopped -d --name sleepy centos:7 bash -c "trap \"\" SIGHUP; sleep infinity"
docker ps -a | grep sleepy
docker kill --signal HUP sleepy
71f79e8b7c5d        centos:7                                           "bash -c 'trap \"\" SI…"   5 seconds ago       Up 4 seconds                            sleepy
systemctl restart docker
docker ps -a | grep sleepy
4e14d7f92eff        centos:7                                           "bash -c 'trap \"\" SI…"   51 seconds ago      Exited (137) 31 seconds ago                       sleepy

I would expect the container to be running after restarting docker.

openstack-gerrit pushed a commit to openstack/kolla-ansible that referenced this issue Oct 30, 2019
Due to a Docker bug [1] we cannot use Docker to send
SIGHUP to the container because it will mark it as
stopped.
This patch sends the signal directly to the process,
bypassing Docker.

'changed_when: false' is also removed from the
relevant task as it definitely changes the state.
In the future we could do the refresh only if
there really is a need for another one.

[1] moby/moby#11065

Change-Id: Ief73bbd24568d6941384ea3330ab45f11aa42d37
Co-authored-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
Closes-Bug: #1845244
openstack-gerrit pushed a commit to openstack/openstack that referenced this issue Oct 30, 2019
* Update kolla-ansible from branch 'master'
  - Merge "Fix nova scheduler down after first docker restart"
  - Fix nova scheduler down after first docker restart
    
    Due to a Docker bug [1] we cannot use Docker to send
    SIGHUP to the container because it will mark it as
    stopped.
    This patch sends the signal directly to the process,
    bypassing Docker.
    
    'changed_when: false' is also removed from the
    relevant task as it definitely changes the state.
    In the future we could do the refresh only if
    there really is a need for another one.
    
    [1] moby/moby#11065
    
    Change-Id: Ief73bbd24568d6941384ea3330ab45f11aa42d37
    Co-authored-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
    Closes-Bug: #1845244
openstack-gerrit pushed a commit to openstack/kolla-ansible that referenced this issue Nov 5, 2019
Due to a Docker bug [1] we cannot use Docker to send
SIGHUP to the container because it will mark it as
stopped.
This patch sends the signal directly to the process,
bypassing Docker.

'changed_when: false' is also removed from the
relevant task as it definitely changes the state.
In the future we could do the refresh only if
there really is a need for another one.

[1] moby/moby#11065

Change-Id: Ief73bbd24568d6941384ea3330ab45f11aa42d37
Co-authored-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
Closes-Bug: #1845244
(cherry picked from commit 6bdf202)
(cherry picked from commit 2242fce)
openstack-gerrit pushed a commit to openstack/kolla-ansible that referenced this issue Nov 5, 2019
Due to a Docker bug [1] we cannot use Docker to send
SIGHUP to the container because it will mark it as
stopped.
This patch sends the signal directly to the process,
bypassing Docker.

'changed_when: false' is also removed from the
relevant task as it definitely changes the state.
In the future we could do the refresh only if
there really is a need for another one.

[1] moby/moby#11065

Change-Id: Ief73bbd24568d6941384ea3330ab45f11aa42d37
Co-authored-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
Closes-Bug: #1845244
(cherry picked from commit 6bdf202)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/runtime kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants