Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Close inlined savers on exceptions in multiprocessing #390

Merged

Conversation

JelleAalbers
Copy link
Member

What is the problem / what does the code in this PR do
This is a fix for exception handling during multiprocessing. Previously, if strax crashed during multiprocessing, _temp directories could remain on disk, and exceptions were not always recorded in metadata. This should fix that. It's a low-priority PR: this won't prevent any crashes, just handle them slightly nicer. In other words, a perfect hobby project for a 'retired' colleague ;-).

The problem in more detail:

  • If an exception occurs, exceptions propagate through strax's mailbox system. Usually, all savers subscribe to some mailbox, so they will receive the exception, write it in their metadata, and close gracefully. (https://github.com/AxFoundation/strax/blob/master/strax/storage/common.py#L577)
  • During multiprocessing, ParallelSourcePlugin "inlines" some savers, so the save operation can happen in the same process that processes the data. Inlined savers are outside the mailbox system, and do not get informed of exceptions the usual way.

The PR looks bigger than it is; it changes many lines just to indent them. iter won't win any beauty contests with its high indentation level; if you see a good way to refactor it, go for it!

Can you briefly describe how it works?
Two changes were needed to fix this:

  • Change Plugin.iter so the cleanup code is always called: basically, we wrap everything in a big try-except loop. This also serves another purpose: replacing duplicated 'cleanup + return' calls with raising a custom exception.
  • Change Mailbox._send_from so that when sending to a mailbox fails -- most importantly because the mailbox is killed -- we throw the exception back to the sending iterable, i.e. the plugin or loader producing the input. Usually nothing will happen with this, and the exception is just reraised. However, it gives ParallelSourcePlugin a chance to call cleanup. This will wait for any outstanding save operations and close its savers (i.e. consolidate metadata, record the exception, and remove _temp). Finally, the exception will still be reraised, and strax continues to die as per usual.

Can you give a minimal working example (or illustrate with a figure)?

Try adding rechunk_on_save = False to Records in testutils.py; this will cause master to fail test_exception in test_core.

Copy link
Member

@JoranAngevaare JoranAngevaare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Jelle! Many thanks, this will be useful especially for the event builders-processing!

We maybe should indeed at some point split the iter(), the try except would have not changed the git-history so much if we could simplify this block a bit.

@JoranAngevaare JoranAngevaare merged commit bf40a21 into AxFoundation:master Feb 2, 2021
@JelleAalbers JelleAalbers deleted the close_inlined_savers branch February 2, 2021 16:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants