Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recover from broken process pool #42

Merged
merged 4 commits into from
Apr 30, 2022
Merged

Recover from broken process pool #42

merged 4 commits into from
Apr 30, 2022

Conversation

tswayne
Copy link
Contributor

@tswayne tswayne commented Apr 15, 2022

We hit a pretty unfortunate edge case in this library where one broken process will corrupt the entire pool. When that happens, the entire process pool (self._executor) is broken and the worker does not recover, but it also does not terminate. The result is that it continues to pick up new jobs and send them to the process pool to execute and they immediately throw a BrokenExecutor exception. For us, this resulted in the corrupt worker essentially draining our queue and funneling them directly to the dead queue.

It does look like there is an older attempt to handle this here, however it is unreachable because this line catches all exceptions first.

The fix in this pr, which we've been using for a while now, recovers from this state by throwing the broken process pool away and letting the next tick re-create a fresh one. This has been working really well, since broken processes are very rare (for us), and all the initial jobs impacted by the broken process are retried.

This PR also includes a small useful change to enable the proto client to accept and pass all the kwargs that the connection accepts along. We use that client directly in a few places and need to forward some options to the connection.

@tswayne
Copy link
Contributor Author

tswayne commented Apr 25, 2022

Hey @cdrx - just want to bump this to make sure it's on your radar.

@cdrx cdrx merged commit d83bb6a into cdrx:master Apr 30, 2022
@cdrx
Copy link
Owner

cdrx commented Apr 30, 2022

Good catch, thank you for opening this PR.

@cdrx
Copy link
Owner

cdrx commented Apr 30, 2022

This has been published to PyPi, in version 1.0.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants