Race condition between SIGTERM and SIGHUP

Tested on Debian testing, with supervisor 3.3.1. If supervisord is in the process of shutting down after receiving a SIGTERM, and it receives a SIGHUP while it is waiting for some process to die, the shutdown process is interrupted and supervisord restarts instead.
Steps to reproduce:
1) start supervisord with the attached configuration file [(test-supervisor.zip)](https://github.com/Supervisor/supervisor/files/2612380/test-supervisor.zip), which runs the test.sh script (set the path to test.sh in the configuration file). test.sh sleeps for a few seconds on SIGTERM, giving enough time for the race condition to happen.
2) `$ killall -SIGTERM supervisord && sleep 0.5 && killall -SIGHUP supervisord`
3) supervisord restarts instead of terminating. The program's output reads (emphasis mine):
...
2018-11-24 15:57:04,987 INFO spawned: 'test' with pid 9226
2018-11-24 15:57:05,988 INFO success: test entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-11-24 15:58:06,750 **WARN received SIGTERM indicating exit request**
2018-11-24 15:58:06,750 INFO waiting for test to die
2018-11-24 15:58:07,252 **WARN received SIGHUP indicating restart request**
2018-11-24 15:58:10,255 WARN killing 'test' (9226) with SIGKILL
2018-11-24 15:58:10,255 INFO waiting for test to die
2018-11-24 15:58:10,256 INFO stopped: test (terminated by SIGKILL)
2018-11-24 15:58:10,264 **INFO RPC interface 'supervisor' initialized**
2018-11-24 15:58:10,264 CRIT Server 'unix_http_server' running without any HTTP authentication checking
...

Why it matters: when a systemd user session is closed, all running programs receive both SIGTERM and SIGHUP, on the basis that at least one of them should instruct the program to shut down. In the case of supervisord, this often triggers the race condition, and the program restarts rather than stops. systemd then waits 90 more seconds before killing it with SIGKILL. When this happens during system shutdown or restart, the computer hangs for 90 seconds before completing the operation, a delay that is entirely avoidable.

Possible fixes:
1) ignore or queue incoming signals while a signal is already being handled (from reading older issues, it seems that this is already done for supervisorctl commands, but clearly not for signals).
2) introduce some kind of priority between signals, e.g. SIGTERM can interrupt a SIGHUP but not the other way around.
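
To make option 2 more concrete, here is a minimal sketch of what a priority rule could look like. It is not taken from supervisord's code: the `Mood` enum, the `REQUESTED_MOOD` table and the `next_mood` function are hypothetical names made up for illustration, and the only assumption is that the daemon keeps a single "what am I doing" state that signals can change. SIGHUP and SIGQUIT only exist on Unix-like systems.

```python
import signal
from enum import IntEnum


class Mood(IntEnum):
    # Hypothetical daemon states; a higher value means a higher priority.
    RUNNING = 0
    RESTARTING = 1
    SHUTDOWN = 2


# Hypothetical mapping from a received signal to the state it requests.
REQUESTED_MOOD = {
    signal.SIGHUP: Mood.RESTARTING,
    signal.SIGTERM: Mood.SHUTDOWN,
    signal.SIGINT: Mood.SHUTDOWN,
    signal.SIGQUIT: Mood.SHUTDOWN,
}


def next_mood(current, signum):
    """Return the state to move to after receiving signum.

    A request only takes effect if its priority is higher than the
    current state, so a SIGHUP cannot undo a shutdown in progress,
    while a SIGTERM can still override a pending restart.
    """
    requested = REQUESTED_MOOD.get(signum)
    if requested is None:
        # Signals that are not in the table leave the state unchanged.
        return current
    return max(current, requested)


# Replay the sequence from the log above: SIGTERM, then SIGHUP.
mood = Mood.RUNNING
for signum in (signal.SIGTERM, signal.SIGHUP):
    mood = next_mood(mood, signum)
print(mood.name)  # SHUTDOWN
```

With a rule like this, the SIGHUP arriving during "waiting for test to die" would be ignored (ideally with a log line saying so), and the reverse order, SIGHUP followed by SIGTERM, would still end in a shutdown.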

I could try to make a PR once a clear strategy is chosen.




Race condition between SIGTERM and SIGHUP #1175
