Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-add and improve the warning for workers terminated due to a signal #2908

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

TomiBelan
Copy link
Contributor

  1. Re-add the warning for workers terminated due to a signal

    The warnings are printed later in the main loop, not directly in the signal handler. This should fix the RuntimeError from RuntimeError due to a warning being logged on SIGHUP #2564. See https://stackoverflow.com/q/45680378.

    The order of events in the log might be a bit different from before.

  2. Fix false positive "terminated due to signal 15"

    During a reload (i.e. SIGHUP sent to master), gunicorn used to send multiple SIGTERMs to each worker. Every time the master got a SIGCHLD from one worker exiting, it sent a SIGTERM to all old workers who are still alive.

    If you send a worker another SIGTERM when it's just about to shut down and the Python interpreter is already deinitialized, the signal won't be caught. The process exit status will be WIFSIGNALED SIGTERM.

    Explanation:

    This PR limits it to sending max one SIGTERM per worker in manage_workers(). I believe this is safe because handle_exit() in workers/base.py just sets self.alive = False anyway. So sending multiple SIGTERMs is never helpful to get a worker "unstuck". If a worker doesn't react to the first SIGTERM because it's in some kind of infinite loop, murder_workers() will notice it eventually, and send it a SIGABRT and a SIGKILL.

    The SIGTERMs sent by handle_winch() and stop() are not changed in this PR.

I get the impression Gunicorn maintenance is mostly dead, so I don't have high expectations of getting any reply, but I'll hold out hope that somebody will review this PR one day. :)

Testing

Compare what happens on these three commits. Use any WSGI app.

$ git checkout 20.1.0   # you will see warnings about signal 15 and probably some log errors
$ git checkout 01cf8421511dd16405a09564931e4508168a8909  # you will see warnings about signal 15
$ git checkout 3fe47e6a3f1d3a2b4011020b284970efeffc1312  # you will see only INFO

$ python3 -m gunicorn -w 10 testapp

$ kill -HUP $THE_MAIN_PID

I got the warnings and errors on the first try, but if you can't reproduce the problem, increasing the number of workers might help.

Original: b695b49 (benoitc#2475)
Reverted: 76f8da2

The warnings are printed later in the main loop, not directly in the signal
handler. This should fix the RuntimeError from benoitc#2564.
During a reload (i.e. SIGHUP sent to master), gunicorn used to send multiple
SIGTERMs to each worker. Every time the master got a SIGCHLD from one worker
exiting, it sent a SIGTERM to all old workers who are still alive.

If you send a worker another SIGTERM when it's just about to shut down and
the Python interpreter is already deinitialized, the signal won't be caught.
The process exit status will be WIFSIGNALED SIGTERM.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants