arbiter: Handle SIGCHLD like all other signals + misc signal handling improvements #3148
base: master
Conversation
Force-pushed from 97c695b to fb6e5d2
Thanks for the patch, I appreciate it. But I'm not sure I agree with this change or #2314 (cc @tilgovi). It removes the asynchronous way to monitor the workers by mixing it with arbiter signaling. Afaik we don't want to wait for a worker signal when we just want to quit Gunicorn or reload it; these messages should be taken in priority over the rest. Also, we likely want to process all signals related to the workers and not skip one, as is done when the signal queue is used. I see the queue is removed in this patch, but by removing it you miss the fact that it is used as a buffer that protects us from receiving too many messages to handle. Sometimes this happens due to a badly configured alert handler, or just a supervisor like systemd going crazy. We need to keep it around; it was added on purpose. To fix the original issue (not having the ability to print without crashing), I think we should handle logging in a separate thread, outside the signal handler. We can redirect all messages to a pipe or unix socket that will be consumed by this thread to print or actually write the logs. A patch in this direction would not break the logic exposed above. This thread could also be used to handle stats. Would you make this patch?
@benoitc Thanks for having a look! It's great to know some of the history behind the code as well -- git blame didn't help me much there unfortunately. I'll answer your findings in the same order as you wrote them:

Maybe I've misunderstood then, because I believed that the code actually did wait for workers when one quits/reloads Gunicorn? I am basing this on `Arbiter.stop()`:

```python
def stop(self, graceful=True):
    ...  # lots of lines omitted
    self.kill_workers(sig)
    # wait until the graceful timeout
    while self.WORKERS and time.time() < limit:
        time.sleep(0.1)
```

This function won't exit until either the timeout has been reached or all workers have been removed. So, while the code above is (most likely) executing the `time.sleep(0.1)`, SIGCHLD keeps coming in, removing the workers. Reading #2314, I believe this is the same analysis which @tilgovi has done as well. Just so that I don't misunderstand, was this what you were referring to?
Yes, I suspected that the '5' max limit was due to something like that, but if there is a supervisor process going bananas, spamming signals at a process, I doubt such a limit would help in a precarious situation like that, since the queue would keep filling up as new signals arrive. The reason the queue was removed has to do with signal safety: I think the signal handler should be kept as simple as possible, with no memory allocation, print-outs, or locking of any kind. Since I believe list manipulation in Python involves some kind of malloc(), I don't think it's strictly signal safe (although in practice, it might be). And since I saw that SIG_QUEUE has been around since the very first revision of arbiter.py (in 2009), I thought that maybe things have changed since back then. Also, since there's no logging when the buffer is full, I didn't know if we actually had any data supporting its existence?
But how would we implement such a proposal -- sending formatted logs from a signal context to a pipe -- without doing any dynamic memory allocation in the signal context? In C I would have understood: use a static buffer on the stack and snprintf(). In Python, however, we don't have that kind of control over memory. Also, we'd then have another thread to manage, with all the problems and considerations that come with it (lifetime, locking). So before introducing another thread, I think we need to carefully consider whether there is no other way to solve the problem. Although I am of course very biased in all of this, I think the solution offered in this PR solves the logging problem and also manages to remove some complexity rather than adding more on top. If you still want to go in this direction, however, perhaps creating a separate issue would be a good idea? But please consider these thoughts above!
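For concreteness, a minimal standalone sketch of the pipe-plus-logging-thread idea under discussion could look roughly like this (every name here is illustrative, not an existing gunicorn API):

```python
import logging
import os
import signal
import threading

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("arbiter-sketch")

# The signal handler only writes a short, preformatted message to a pipe;
# a dedicated thread consumes the pipe and does the actual logging.
log_rd, log_wr = os.pipe()

def log_consumer():
    with os.fdopen(log_rd, "rb") as pipe:
        for line in pipe:
            log.info(line.decode(errors="replace").rstrip())

def handle_chld(signum, frame):
    # Keep the handler minimal: hand the message off and return.
    os.write(log_wr, b"SIGCHLD received, workers will be reaped\n")

threading.Thread(target=log_consumer, daemon=True).start()
signal.signal(signal.SIGCHLD, handle_chld)
```

Whether writing preformatted bytes from a Python-level handler actually sidesteps the memory-allocation concern raised above is, of course, part of the open question.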
I saw @tilgovi also singles out `CLD` in his PR. Whichever PR gets continued / merged, it may be worthwhile documenting this special treatment / aliasing of `CLD` in the code, to ease future maintenance.
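For reference, on platforms that define it, SIGCLD is simply a legacy alias for SIGCHLD, which a quick standalone check shows (the attribute is absent on e.g. macOS, hence the guard):

```python
import signal

# SIGCLD is a legacy System V name for SIGCHLD; where it exists, the two
# refer to the same signal number.
if hasattr(signal, "SIGCLD"):
    print(signal.SIGCLD == signal.SIGCHLD)  # True
```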
@sylt Python signal handler code does not get executed in the C signal handler context. I believe the interpreter just executes the signal handler at certain times before returning to the main execution of the script. I think it's perfectly safe to allocate memory and so forth. So while we don't have to worry about C-level concerns, we do still have to reason about the Python interpreter switching execution between the signal handler code and the main loop. This is why my PR really only suggests that the benefit is a lack of risk of concurrent modification of the workers list and some code simplification in the form of reaping only happening in one place. But I think our analysis is otherwise correct.
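A small standalone script (not gunicorn code) illustrating the behavior described here: the Python-level handler runs later, on the interpreter's main thread, rather than inside the C signal handler:

```python
import os
import signal
import threading

def handler(signum, frame):
    # The C-level handler only sets a flag; CPython invokes this Python
    # function afterwards, between bytecode instructions, on the main thread.
    print("handled", signal.Signals(signum).name,
          "in thread:", threading.current_thread().name)

signal.signal(signal.SIGUSR1, handler)
os.kill(os.getpid(), signal.SIGUSR1)
print("back in the normal flow of the script")
```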
@tilgovi Alright, thanks for the correction, it's a relief to hear that it works that way! :) Well, then I guess commit 2 in this PR shouldn't be needed from a memory-safety point of view, so perhaps I should throw it out.
I found this PR after observing the following in the logs of a gunicorn docker container (it occurs when OOMKilled gets triggered by memory pressure). I actually came here to make a PR to fix it so that both CHLD and CLD get handled the same, but it looks like you beat me to it by a few days!
I misunderstood what was going on earlier -- it's looking for the wrong alias, I think. Just in case this one takes a while to get merged, I made this quick fix as an option: #3151
@benoitc I have put together an alternative approach here with the following highlights:
Would such an approach seem more reasonable?
Force-pushed from 11c6970 to 5848bc4
@sylt I will revisit this PR for 23.1 this month and come back to you. One thing I would keep is a separate sleep function. I understand you time out on the queue, but I think it's good to keep a way to wake up the master separately. One of the reasons for it is to later allow a way to optimize the scheduling and extend its usage. I would like to stay closer to the system than the Python stdlib there.
... as opposed to in signal context. This is beneficial, since it means that we can, in a signal-safe way, print messages about why e.g. a worker stopped its execution. And since handle_sigchld() logs what it does anyway, don't bother printing out that we're handling SIGCHLD. If workers are killed at a rapid pace, we won't get as many SIGCHLD as there are workers killed anyway.
Since we can use something from queue.*, we can make it blocking as well, removing the need for two different data structures.
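A hedged sketch of the pattern these two commit messages describe (illustrative names, not the exact code from the PR): the handler only enqueues the signal number, and the arbiter's main loop blocks on the queue with a timeout and does the dispatching, so any logging happens outside of signal context:

```python
import queue
import signal

sig_queue = queue.SimpleQueue()  # illustrative stand-in for the arbiter's queue

def signal_handler(signum, frame):
    # queue.SimpleQueue.put() is documented as reentrant, so it is safe to
    # call from a signal handler; nothing else happens here.
    sig_queue.put(signum)

for sig in (signal.SIGCHLD, signal.SIGHUP, signal.SIGTERM):
    signal.signal(sig, signal_handler)

def run_once(handlers):
    # One main-loop iteration: block for up to a second waiting for a signal,
    # then handle it here, where logging is unproblematic.
    try:
        signum = sig_queue.get(timeout=1.0)
    except queue.Empty:
        return  # periodic housekeeping would go here
    handlers[signum]()
```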
This change is meant to handle the return value of waitpid() in a way that is more in line with the man page of said syscall. The changes can be summarized as follows:
* Use os.WIFEXITED and os.WIFSIGNALED to determine what caused waitpid() to return, and exactly how a worker may have exited.
* In case of normal termination, use os.WEXITSTATUS() to read the exit status (instead of using a hand-rolled bit shift). A redundant log was removed in this code path.
* In case of termination by a signal, use os.WTERMSIG() to determine the signal which caused the worker to terminate. This was buggy before, since the WCOREFLAG (0x80) could cause e.g. a SIGSEGV (code 11) to be reported as "code 139", meaning "code (0x80 | 11)".
* Since waitpid() isn't called with WSTOPPED nor WCONTINUED, there's no need to have any os.WIFSTOPPED or os.WIFCONTINUED handling.
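A minimal sketch of the waitpid() handling described above (function and variable names are illustrative, not the actual gunicorn implementation):

```python
import os

def reap_children(log):
    # Reap any exited children without blocking.
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children left
        if pid == 0:
            break  # children exist, but none have exited yet
        if os.WIFEXITED(status):
            # Normal termination: read the exit status directly, with no
            # hand-rolled bit shifting.
            log.info("worker %s exited with code %s", pid, os.WEXITSTATUS(status))
        elif os.WIFSIGNALED(status):
            # Killed by a signal: WTERMSIG() excludes the core-dump flag,
            # so a SIGSEGV is reported as 11 rather than 139.
            log.warning("worker %s was killed by signal %s", pid, os.WTERMSIG(status))
```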
According to the python signal documentation[1], SIGCHLD is handled differently from other signals. Specifically, if the underlying implementation resets the SIGCHLD signal handler, then python won't reinstall it (as it does for other signals). This behavior doesn't seem to exist on either Linux or Mac, but perhaps one could argue that it's good practice anyway. [1] https://docs.python.org/3/library/signal.html
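A short sketch of the defensive re-installation this commit message describes (handle_chld is an illustrative name):

```python
import signal

def handle_chld(signum, frame):
    # ... reap workers here ...
    # Re-install ourselves in case the platform reset the SIGCHLD disposition
    # after delivery; per the commit message above, Python does not do this
    # automatically for SIGCHLD. On Linux and macOS it is effectively a no-op.
    signal.signal(signal.SIGCHLD, handle_chld)

signal.signal(signal.SIGCHLD, handle_chld)
```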
* Look up handlers in __init__() to induce a run-time error early on if something is wrong.
* Since we now know that all handlers exist, we can simplify the main loop in the arbiter, in such a way that we don't need to call wakeup(). So after this commit, the pipe in the arbiter is only used to deliver which signal was sent.
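A sketch of the early handler lookup described in the first bullet above (the class shape and attribute names are illustrative, not necessarily the PR's exact code):

```python
import signal

class Arbiter:
    SIGNALS = [signal.SIGHUP, signal.SIGQUIT, signal.SIGTERM, signal.SIGCHLD]

    def __init__(self):
        # Resolve every handler up front: a missing handle_<name>() method
        # raises AttributeError here, at start-up, instead of the first time
        # that signal is actually delivered.
        self.sig_handlers = {
            sig: getattr(self, "handle_" + sig.name[3:].lower())
            for sig in self.SIGNALS
        }

    def handle_hup(self): ...
    def handle_quit(self): ...
    def handle_term(self): ...
    def handle_chld(self): ...
```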
It accepts an optional "due_to_signal" argument which can be used to tell if the wakeup was made because a signal handler needs to be executed or not.
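A rough sketch of what such a wakeup could look like with the usual self-pipe pattern (the byte values and the PIPE attribute are assumptions for illustration, not the PR's exact code):

```python
import errno
import os

def wakeup(self, due_to_signal=False):
    # Nudge the arbiter out of its sleep/select on the self-pipe. The payload
    # lets the main loop tell "a signal handler needs to run" apart from an
    # ordinary wakeup request.
    try:
        os.write(self.PIPE[1], b"s" if due_to_signal else b"w")
    except OSError as e:
        if e.errno not in (errno.EAGAIN, errno.EINTR):
            raise
```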
@benoitc OK, thanks for giving it a look! Hmm, instead of adding an Arbiter.sleep() method, I added an Arbiter.wakeup() method, so that "fast" wakeups of the arbiter can be requested (instead of potentially having to wait 1 second before it runs again due to the timeout). Would that solve the problem, do you think?
Yes, this can work. I will review the PR tomorrow, as I plan to make a new release during the week.
I see some regression potential in the two most recent patches, unless further cleaned up for edge cases & readability - see review comments. That does, however, not stand in the way of merging everything up to 7ecea2d right now; the last two could still be added later separately.
@pajod I've reverted the latest two patches! I'm a little bit short on time ATM, but I'll try to answer your comments soon. But order is at least restored :)
As of writing, there are three commits in this series in the following order: