arbiter: don't log if handling SIGCHLD #3064
base: master
Conversation
Logging when handling a signal is a bad practice.
See https://docs.python.org/3/library/logging.html#thread-safety
Fixes benoitc#2816
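To illustrate what the diff removes, here is a minimal sketch of the unsafe pattern (illustrative names, not the actual gunicorn source):

```python
# Logging from inside a signal handler. In CPython the handler runs in
# the main thread between bytecodes; if it interrupts a write that is
# already inside the (non-reentrant) io.BufferedWriter backing the log
# stream, the nested write raises RuntimeError.
import logging
import signal

log = logging.getLogger("arbiter")

def handle_chld(signum, frame):
    # BAD: may re-enter the logging handler's stream mid-write.
    log.warning("worker exited")

signal.signal(signal.SIGCHLD, handle_chld)
```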
Thanks, but this is done on purpose, as the comment says. It's not common. I prefer to keep it like this. This place is thread-safe as well.
This code should be thread-safe. I suspect the error is happening earlier. We first need to understand why this code is re-entrant here. Let's keep it open since it is actually solving the cause. But I would insist that we need to understand why this comes up now.
Linking to #3050. I think we should revisit that change, which introduced more logging. @hydrargyrum if you happen to be around, I'm on IRC.
Wanted to share a little bit of info I found, since I have encountered this bug as well. The issue isn't with thread-safety but rather with re-entrancy within the logging module and the underlying buffered stream. The issue occurs (usually on a slow disk) when the SIGCHLD handler interrupts the main process while it was in the middle of writing its logging buffer. The handler (which runs in the same thread) will try to write to that buffer but will be unable to, because io.BufferedWriter doesn't allow reentrant calls; it will then raise an exception. If the logger is set to write to such a stream, the exception escapes the handler and takes down the arbiter. It is a hard bug to reproduce, because the main thread has to interrupt itself exactly when it is writing the buffer. Ref:
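To make the failure mode concrete, here is a small reproduction sketch (my own, not from the thread; POSIX and CPython assumed, and since the race is timing-dependent it may take many runs to trigger):

```python
# The parent forks children that exit immediately, so SIGCHLD keeps
# arriving while the parent is busy writing to stderr. If the handler's
# write lands inside an in-progress write, CPython raises:
#   RuntimeError: reentrant call inside <_io.BufferedWriter name='<stderr>'>
import os
import signal
import sys

def handler(signum, frame):
    sys.stderr.write("SIGCHLD received\n")  # unsafe: may re-enter the buffer

signal.signal(signal.SIGCHLD, handler)

for i in range(10000):
    if os.fork() == 0:
        os._exit(0)  # child exits immediately; parent receives SIGCHLD
    # Long writes widen the race window. (Children are left unreaped;
    # this is only a demo.)
    sys.stderr.write("main loop write %d %s\n" % (i, "x" * 8192))
```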
@ja0x is correct. Python programs MUST NOT print or log anything inside a signal handler. https://stackoverflow.com/q/45680378 is another link that explains the root cause of this problem. This gunicorn bug has a long and sad history. Dec 2017: issue #1663 was reported: in some cases (for example OOM), workers die without any error logged, or even crashloop. If I may be frank, I feel very disappointed and frustrated by the gunicorn project. It is a cruel irony that my PR #2908, which attempts to handle this correctly, was ignored, while PRs #2315 and #2419 were merged, causing a regression. But I'll try to swallow my bitter feelings and offer constructive advice. My advice to @benoitc is to do this:
Doing that fixes #2564 properly (at least I think so), fixes #3050, doesn't regress #2215, and removes the need for PR #3064 (this PR). Hope that helps.
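For readers landing here, one widely used safe pattern (a sketch of my own, not necessarily the advice elided above and not gunicorn's exact implementation) is to keep the handler trivial and defer reaping and logging to the main loop via a self-pipe:

```python
# Self-pipe pattern: the handler only records the signal number and
# pokes a pipe; waitpid() and logging happen in the main loop, where
# logging is safe. Names are illustrative.
import logging
import os
import select
import signal

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("arbiter")

r, w = os.pipe()
os.set_blocking(w, False)
pending = []

def handle_signal(signum, frame):
    pending.append(signum)    # no I/O, no locks: safe inside a handler
    try:
        os.write(w, b"\x00")  # wake the main loop
    except OSError:
        pass                  # pipe full: the loop is already awake

signal.signal(signal.SIGCHLD, handle_signal)

while True:
    rlist, _, _ = select.select([r], [], [], 1.0)
    if rlist:
        os.read(r, 4096)      # drain wakeup bytes
    while pending:
        if pending.pop(0) == signal.SIGCHLD:
            while True:       # reap every child that has exited
                try:
                    pid, status = os.waitpid(-1, os.WNOHANG)
                except ChildProcessError:
                    break
                if pid == 0:
                    break
                log.info("worker %s exited with status %s", pid, status)
```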
This is all extremely helpful and politely offered advice. I'll try to find time to push this along. I appreciate all your effort here, and the exceptional patience with helping to sort this out. Most of the maintainers are volunteers and we don't always find the sustained time to reconstruct all of this context, so gathering it in one place is immensely helpful.
@TomiBelan Thank you for the write-up. Since my failed approach three years ago, I have been silently hoping to get the issue fixed for good :)
I've rebased #2314, which moves reaping to the main thread. That should move the logging out of the signal handler. Any reviews or testing of that would be appreciated. I've assigned myself the related issues to follow up on any other changes afterwards.
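Relatedly, the standard library's signal.set_wakeup_fd can tell the main loop which signals arrived without running any Python code at delivery time; the C-level handler writes each signal number as one byte to the fd. A sketch (my illustration, not the code in #2314):

```python
# signal.set_wakeup_fd writes one byte (the signal number) per delivered
# signal. A Python-level handler must still be installed so the default
# disposition does not swallow SIGCHLD. Illustrative, not #2314's code.
import os
import select
import signal

r, w = os.pipe()
os.set_blocking(w, False)         # the wakeup fd must be non-blocking
signal.set_wakeup_fd(w)
signal.signal(signal.SIGCHLD, lambda signum, frame: None)  # no-op handler

while True:
    rlist, _, _ = select.select([r], [], [])
    data = os.read(r, 4096)       # one byte per delivered signal
    if signal.SIGCHLD in data:    # int-in-bytes checks byte values
        while True:
            try:
                pid, status = os.waitpid(-1, os.WNOHANG)
            except ChildProcessError:
                break
            if pid == 0:
                break
            print("reaped worker", pid, "with status", status)
```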
Also bitten by this. I debugged and googled my way to this PR. I have nothing to add to @ja0x's and @TomiBelan's excellent analysis, so just a +1. I experience the same issue with reentrant logging inside the SIGCHLD handler. To make matters worse, in my case gunicorn is auto-launched and monitored from supervisord, running a Flask app. After the unhandled exception due to the reentrant call, the master is left in a state that supervisord doesn't detect or recover from. I.e. the server is bricked after the exception. On a positive note, I've been using gunicorn for years and this is the first major issue. Kudos and huge thanks to the maintainers!
Without really checking what PRs were already open, I posted a PR yesterday (#3148) with an attempt at solving various signal-handling issues in gunicorn that I had the misfortune to encounter. I couldn't see that any of the open proposals had used the same approach to solve the problem, so I thought people could have a look before I eventually close it.
@TomiBelan, thanks for the excellent summary. I had just started digging myself when I discovered it. In my case it's gunicorn (sometimes) dying on SIGCHLD. To the maintainers: please note the seriousness of this bug, even if it's intermittent. And the fact that it was re-introduced in the same version in which it was fixed (21.0.0) has kept it open for 3+ years. 😞