
Fix potential core dump during worker process shutdown #1883

Closed
jixam wants to merge 1 commit

Conversation

jixam
Contributor

@jixam jixam commented Nov 17, 2021

Summary

Remove redundant stop_gracefully after finish.

Motivation

When reaching max_accepts, IOLoop will

  1. run a finish callback on the Prefork worker which will
  2. send a finished=1 heartbeat on a socket which will
  3. cause the Prefork manager to SIGQUIT the worker which will
  4. have the worker call stop_gracefully on the IOLoop

This seems redundant (and slightly circular), as the IOLoop is already in the process of stopping. It is also a small bug: the worker process may have uninstalled its local $SIG{QUIT} handler between steps 2 and 3, in which case the signal causes a core dump (#1449).
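
To make the race concrete, here is a minimal standalone sketch (plain Perl, not the actual Prefork code) of why a SIGQUIT that arrives after the local handler has gone out of scope dumps core:

    use Mojo::Base -strict;

    {
      # The handler is installed with "local", so it only exists while this
      # scope (standing in for the running event loop) is active.
      local $SIG{QUIT} = sub { warn "stopping gracefully\n" };
      # ... imagine the loop running and then finishing here ...
    }

    # After the scope exits, SIGQUIT reverts to its default disposition,
    # which is "terminate and dump core", so a SIGQUIT from the manager
    # that arrives this late kills the worker the hard way.
    kill 'QUIT', $$;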

I am a bit wary of suggesting this, but I could not find anything else happening due to that finish callback, so I propose breaking the cycle at the first step.

(Most of the PR is cleaning up the heartbeat protocol now that finished is no longer used.)

References

This fixes #1449, which is a rare but real problem.

@kraih
Member

kraih commented Nov 17, 2021

I'm a big fan of simplifying code, but I'm also worried about losing important functionality. This PR needs to be double-checked.

@kraih
Member

kraih commented Jun 7, 2023

This PR needs to be rebased.

@jixam
Contributor Author

jixam commented Jun 7, 2023

Thanks, I will just close it as nobody seems interested in reviewing.

@jixam jixam closed this Jun 7, 2023
@kraih
Member

kraih commented Jun 7, 2023

That's the reason I was asking for it to be rebased: if tests don't work, nobody will review. 🙄

@jixam jixam reopened this Jun 8, 2023
@jixam
Contributor Author

jixam commented Jun 8, 2023

All tests pass and there are no conflicts, so I just reopened 🤷

@kraih kraih requested review from a team, marcusramberg, jberger and Grinnz June 8, 2023 09:18
@kraih
Member

kraih commented Jun 8, 2023

At the very least, the PR would leave the documentation incorrect, since the worker no longer goes into graceful shutdown mode.

@kraih
Member

kraih commented Jun 8, 2023

So, since the question has come up on IRC: the upside of this change is that it eliminates a possible signal handler race condition during worker shutdown that kills the worker in an unfortunate way. The downside is that, if the app developer makes a small mistake that keeps the event loop running, a worker can get stuck forever without handling any new requests and without the manager process ever knowing.

Actually not an easy decision to make.
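
As a contrived illustration of the kind of mistake meant here (an assumed example, not taken from the PR): a handler that never answers and never times out keeps a connection open, so a graceful stop that waits for all connections to close can never complete, and without the finished heartbeat the manager would never find out:

    use Mojolicious::Lite -signatures;

    # Contrived mistake: the request below is never rendered and never times
    # out, so its connection stays open and the event loop keeps running.
    get '/forever' => sub ($c) {
      $c->inactivity_timeout(0);   # disable the inactivity timeout
      $c->render_later;            # promise a response that never comes
    };

    app->start;

(The skeleton app further down in this thread is the one-liner form of the same scenario.)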

@brsakai-csco
Contributor

I'm new to this discussion, but what's the rationale for breaking the cycle at (1) instead of at (3)? Breaking it at (3) is the approach we ended up with in #2046. It seems useful to still notify the manager process about the attempted graceful shutdown, so it can time out and SIGKILL us if we get stuck.
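
For illustration only (assumed names throughout, not the actual #2046 diff), breaking the cycle at (3) could look roughly like this on the manager side: keep the finished heartbeat, skip the now-redundant SIGQUIT when a worker has already reported that it is finishing, but still arm a deadline so a stuck worker is eventually force-killed:

    use Mojo::Base -strict, -signatures;

    # Hypothetical manager-side helper, called periodically for a worker that
    # should shut down; $worker->{graceful} is assumed to mirror the finished
    # heartbeat and $timeout stands in for graceful_timeout.
    sub reap_gracefully ($pid, $worker, $timeout) {

      # Skip the redundant SIGQUIT if the worker already announced its own
      # graceful stop, avoiding the signal race described above.
      kill 'QUIT', $pid unless $worker->{graceful};

      # Either way, start the clock so a stuck worker still gets a SIGKILL.
      $worker->{graceful_started} //= time;
      kill 'KILL', $pid if time - $worker->{graceful_started} > $timeout;
    }

The point is only that the timeout path survives without the extra signal round trip.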

@rabbiveesh
Contributor

It sounds to me like we wouldn't want this change if it can end up with impossible-to-debug zombie processes.
As an app developer, I want to be able to assume that when I shut down my app, it shuts down in all instances.

@jixam
Contributor Author

jixam commented Jun 9, 2023

I didn't consider that a manual SIGQUIT is on the same code path, so I agree that this PR is a no-go as is. However, I am unsure how to change it.

Please consider this skeleton app:

perl -Mojo -E 'get "/" => sub ($c) { $c->inactivity_timeout(0); $c->render_later; }; app->start' prefork -a 100

It seems wrong to close all requests here just because max_accepts is reached. I think that is a current bug?

But for SIGQUIT, yes – the timeout should obviously apply.

@jixam
Contributor Author

jixam commented Jun 9, 2023

I am not going to update this PR as I think the fix by @brsakai-csco is better and my concern above is a separate issue.

@jixam
Contributor Author

jixam commented Jun 9, 2023

Closing in favor of #2073.

@jixam jixam closed this Jun 9, 2023
Development

Successfully merging this pull request may close these issues.

Hypnotoad / Prefork dump cores on small values of accepts and disabled keep-alive