-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reap workers in the main loop #2314
base: master
Are you sure you want to change the base?
Conversation
0a2653c
to
0b5d41b
Compare
This is not safe as is due to this code that might cause Gunicorn to miss signals: def signal(self, sig, frame):
if len(self.SIG_QUEUE) < 5:
self.SIG_QUEUE.append(sig)
self.wakeup() |
Ahh, I think I may have identified why we had SIGCHLD separately handled? Is it because some systems exhibit a behavior where not calling |
Python actually checks for the situation I linked to, taking care not to re-install the signal handler for I've modified the first commit to address that and commented the code to indicate this case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
made some comment, but i'm still unsure to understand the need of this patch. Can you clarify what is it trying to solve? Do we have a way to reproduce the issue?
Just code cleanup. It avoids the need to call |
f59ea64
to
c5ca40e
Compare
I reworked this with a separate indicator flag that the arbiter uses to mark that it needs to reap workers in the next main loop iteration. This change ensures that the arbiter will always reap workers even if the signal queue is full. I also added better explanation for why it is good to re-install the handler. Please take another look. |
c5ca40e
to
8ef2e40
Compare
The use of use of Anyway the patch looks fine. I need to test it on BSDs systems, in particularly OpenBSD where I remember the special handling of the CHLD signal was added for. Can it wait next release (let's make it next month?) ? |
Not any more because that's what this patch changes!
I don't think any BSD has this problem. From what I read it's mos likely some versions of Solaris.1 In any case, setting the handler when it is already set is not a problem, so I'm not worried about regression here. The only possible change here would to fix behavior on a system where Gunicorn was previously broken.
Absolutely. |
8ef2e40
to
9347afe
Compare
4387671
to
7e279d2
Compare
@benoitc would you take another look here, please? I think it's a good idea to reap on the main thread. I think this approach is easier to understand, safer, reduces duplicate code, and eliminates concurrent modification of the workers list. |
7e279d2
to
de33726
Compare
Handle SIGCHLD like every other signal, waking up the arbiter and handling the signal in the main loop rather than in the signal handler. Take special care to reinstall the signal handler since Python may not. Clean up workers and call the worker_exit hook in only one place. When killing a worker, do not clean it up. The arbiter will now clean up the worker and invoke the hook when it reaps the worker. Ensure that all workers have their temporary watchdog files closed and that the arbiter does not exit or log about other child processes dying. With reaping handled in the main loop and kill_worker delegating responsibility for cleanup to the reaping loop, iterate over the workers dictionary everywhere else without concern for concurrent modification.
de33726
to
a526c00
Compare
if not worker: | ||
continue | ||
|
||
worker.tmp.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it was best to ensure that we actually only do anything below here if this is actually our worker. But I can make this a separate PR if that's desired.
SIG_NAMES = dict( | ||
(getattr(signal, name), name[3:].lower()) for name in dir(signal) | ||
if name[:3] == "SIG" and name[3] != "_" | ||
if name[:3] == "SIG" and name[3] != "_" and name[3:] != "CLD" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This condition gave me a pause, not obvious.
if name[:3] == "SIG" and name[3] != "_" and name[3:] != "CLD" | |
if name[:3] == "SIG" and name[3] != "_" and name[3:] != "CLD" # SIGCLD is an obsolete name for SIGCHLD |
At least according to https://man7.org/linux/man-pages/man7/signal.7.html
What was the motivation for singling it out here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought I encountered an error without this (on Linux) but I'll double check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tilgovi I will check this patch. But one of the reason we handled CHLD differently is that montoring is separate from handling signals targetting the process itself. Having all in in the same signal loop means that this signal wil be in the same queue. |
I wasn't aware of this pr until now, but FYI, I've posted #3148 which also aims to handle SIGCHLD on the main thread, although in a slightly different way. |
We could put the SIGCHLD signal on the front of the queue, if what you want is just to handle it with higher priority, but I think it's a good idea to make the handler itself short and defer reaping to the main thread. I also had a version of this with just a boolean flag for whether to reap on wakeup, if for some reason you really don't want SIGCHLD in the queue at all. But I don't think I understand in what way this signal is for "monitoring" or different from "signals targeting the process itself." Can you explain anymore what concerns you? |
I see on the other PR that it seems like one hesitation you have is that you don't want to wait for children when exiting Gunicorn, but that already happens. While we reap children in the signal context, handling other signals is blocked. |
Handle SIGCHLD like every other signal, waking up the arbiter and handling the signal in the main loop rather than in the signal handler.
Take special care to reinstall the signal handler in case Python avoided doing so to prevent infinite recursion.
Clean up workers and call the worker_exit hook in only one place. When killing a worker, do not clean it up. The arbiter will now clean up the worker and invoke the hook when it reaps the worker.
With reaping handled in the main loop and kill_worker delegating responsibility for cleanup to the reaping loop, iterate over the workers dictionary everywhere else without concern for concurrent modification.