Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not send redundant SIGQUITs #2073

Merged
merged 1 commit into from Jun 9, 2023
Merged

Conversation

brsakai-csco
Copy link
Contributor

If the worker reports itself as shutting down, do not second another SIGQUIT to the worker. It already knows it is shutting down, and this can cause a race with signal handlers that triggers coredumps.

Summary

Full writeup in #2046
The short version:

  • When a worker receives a SIGQUIT, it begins to gracefully shut down
  • During graceful shutdown, the worker notifies the manager thread of its shutdown
  • If the manager sent the initial SIGQUIT, then it already knows the worker is shutting down and takes no action
  • If the manager did not send the initial SIGQUIT, then it marks the worker for graceful shutdown, causing it to send a second SIGQUIT
  • This second SIGQUIT can arrive once Perl has already de-registered the QUIT signal handler, which causes it to perform the default QUIT action (coredump)

The patch here increments the worker's quit state in the managing thread so that it does not send a second SIGQUIT to the worker if the manager finds out about the graceful shutdown from the worker

Motivation

This behavior was causing coredumps in our application, since we were using SIGQUIT to gracefully end workers.
I understand from #1883/#1449 that there are also other ways for the worker to initiate graceful shutdown, which can trigger the same race condition

References

Tested via the test app uploaded to #2046
Attempts to fix the same issue seen in #1883

lib/Mojo/Server/Prefork.pm Outdated Show resolved Hide resolved
@kraih kraih requested review from a team, kraih, jhthorsen and christopherraa June 9, 2023 13:22
@marcusramberg
Copy link
Member

I like this fix better, except for the if duplication/perltidy.

@kraih
Copy link
Member

kraih commented Jun 9, 2023

Please squash your commits.

@brsakai-csco
Copy link
Contributor Author

@kraih Done, I think. Let me know if that doesn't look how you expect

@kraih
Copy link
Member

kraih commented Jun 9, 2023

Is do not second another SIGQUIT to the worker (from the commit message) correct english?

If the worker reports itself as shutting down, do not send a SIGQUIT to the worker. It already knows it is shutting down, and this can cause a race with signal handlers that can trigger core dumps.
@brsakai-csco
Copy link
Contributor Author

I didn't even see that. Must've switched the words in my head without realizing. Something like do not send a second > do not send another > do not second another.

Updated the commit message 👍

@jixam
Copy link
Contributor

jixam commented Jun 9, 2023

I like this 👍

For completeness, just a note that this does not remove the race condition. It is still possible to hit it by sending two SIGQUIT signals close to each other or by sending a single signal when a worker is already shutting down.

To fully fix the race, $SIG{QUIT} = 'IGNORE'; is needed before exit 0; in the worker. However, this is not trivial to do because the handler is localized.

@mergify mergify bot merged commit 14d2875 into mojolicious:main Jun 9, 2023
10 checks passed
@brsakai-csco brsakai-csco deleted the patch-1 branch June 13, 2023 11:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants