Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

_result_handler dies on raised exceptions [multiprocessing] #83371

Open
sindrig mannequin opened this issue Jan 2, 2020 · 6 comments
Open

_result_handler dies on raised exceptions [multiprocessing] #83371

sindrig mannequin opened this issue Jan 2, 2020 · 6 comments
Labels
stdlib Python modules in the Lib dir topic-multiprocessing type-bug An unexpected behavior, bug, or error

Comments

@sindrig
Copy link
Mannequin

sindrig mannequin commented Jan 2, 2020

BPO 39190
Nosy @sindrig
PRs
  • bpo-39190: Fix deadlock when callback raises #17795
  • Files
  • test_pool_error.py: Steps to reproduce
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2020-01-02.12:26:53.919>
    labels = ['3.10', 'library', '3.9', 'type-crash', '3.11']
    title = '_result_handler dies on raised exceptions [multiprocessing]'
    updated_at = <Date 2021-09-19.22:01:28.549>
    user = 'https://github.com/sindrig'

    bugs.python.org fields:

    activity = <Date 2021-09-19.22:01:28.549>
    actor = 'iritkatriel'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2020-01-02.12:26:53.919>
    creator = 'sindrig'
    dependencies = []
    files = ['48818']
    hgrepos = []
    issue_num = 39190
    keywords = ['patch']
    message_count = 1.0
    messages = ['359194']
    nosy_count = 1.0
    nosy_names = ['sindrig']
    pr_nums = ['17795']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue39190'
    versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

    Linked PRs

    @sindrig
    Copy link
    Mannequin Author

    sindrig mannequin commented Jan 2, 2020

    Raising an Exception in a callback handler of apply_async and/or map_async will kill the _result_handler thread. This causes unexpected behavior as all subsequent callbacks won't be called and worse, pool.join will deadlock.

    The documentation states that callbacks should return immediately but it does not warn the user against raising an exception.

    Attached are steps to reproduce.

    @sindrig sindrig mannequin added type-bug An unexpected behavior, bug, or error 3.7 (EOL) end of life 3.8 (EOL) end of life 3.9 only security fixes stdlib Python modules in the Lib dir type-crash A hard crash of the interpreter, possibly with a core dump and removed type-bug An unexpected behavior, bug, or error labels Jan 2, 2020
    @iritkatriel iritkatriel added 3.10 only security fixes 3.11 only security fixes and removed 3.7 (EOL) end of life 3.8 (EOL) end of life labels Sep 19, 2021
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    duaneg added a commit to duaneg/cpython that referenced this issue Mar 27, 2025
    …cess pools
    
    User-supplied callbacks are called from an internal pool management thread. At
    present any exceptions they raise are not caught and so propagate out and kill
    the thread. This then causes problems for subsequent pool operations, including
    joining the pool hanging.
    
    As a QoL improvement, catch and handle any such exceptions using the system
    exception hook. Thus by default details of the exception will be printed to
    stderr, but the pool's integrity will remain intact.
    @picnixz picnixz removed 3.11 only security fixes 3.10 only security fixes 3.9 only security fixes labels Mar 28, 2025
    @graingert
    Copy link
    Contributor

    could we instead raise the exception out of AsyncResult.get ?

    @duaneg
    Copy link

    duaneg commented Mar 30, 2025

    could we instead raise the exception out of AsyncResult.get ?

    I don't think that would be appropriate: a result has been successfully computed with no exception, IMO it should be returned from AysncResult.get even if a callback fails. If callbacks were designed to be able to modify the computed result it might be different.

    Note that there could be multiple callbacks failing, which means multiple exceptions. They can also be raised by error callbacks, in which case we would have the exception that occurred during the call, and the additional one(s) from the callback(s). Using an exception group would be problematic in that existing code handling an individual exception from get would break.

    Whoops, I was totally mistaken about there being multiple callbacks, sorry! However I still don't think a callback failing should change the computed result.

    @graingert
    Copy link
    Contributor

    graingert commented Mar 31, 2025

    it doesn't change the computed result, only the exception.

    because this bug totally hoses the multiprocessing Pool were sort of free to change the behaviour however we want. It doesn't matter if code could break it's already broken! So we could raise an ExceptionGroup out of AsyncResult.get()

    or perhaps we could mark the Pool as broken so all methods and AsyncResult.get() raise BrokenPoolException(pool_id), then in the __exit__ of the Pool we can detect the BrokenPoolException and raise the exception from the callback.

    @duaneg
    Copy link

    duaneg commented Apr 1, 2025

    it doesn't change the computed result, only the exception.

    Sure, but from the point of view of the calling code it is the same thing. They call get and get an exception instead of the correct value: a different result. However, I do take your point: the calling code should know something is broken, so we should raise. Calling the default hook will probably mean the error is logged at best, which is too easy to overlook.

    because this bug totally hoses the multiprocessing Pool were sort of free to change the behaviour however we want. It doesn't matter if code could break it's already broken! So we could raise an ExceptionGroup out of AsyncResult.get()

    Right. I've never used exception groups but AIUI starting to throw them from a method that previously didn't is a breaking API change. However, it doesn't really matter in this case, since if it happens the program would have been badly broken previously anyway. Is that correct?

    So, the semantics of get would be:

    • No change if no callback raises (i.e. it will return the call's value or raise a naked exception as appropriate)
    • If callback raises, get will raise an exception group containing the callback exception
    • If error_callback raises, get will raise an exception group containing the callback exception and the original exception

    Existing exception handling code will not catch these new exception groups, which is good: we wouldn't want them to be treated as "normal" failures.

    or perhaps we could mark the Pool as broken so all methods and AsyncResult.get() raise BrokenPoolException(pool_id), then in the __exit__ of the Pool we can detect the BrokenPoolException and raise the exception from the callback.

    I like the sound of this, but it does get a bit complicated.

    If we are __exit__ing because an exception was raised and a callback has failed, we presumably want to include the exception that was raised in the exception group, otherwise it would be lost. However, it could be the exception group itself being raised from get, in which case we probably don't want it in the group twice. Right?

    E.g. consider exiting with a "normal" error (note if we didn't do the get the context would probably exit before the task can run, meaning the naked ValueError would propagate out):

    with Pool() as pool:
        res = pool.async_apply(foo, callback=raising)
    
        # Ensure the task has run and set the error, but don't let it propagate out
        try:
            res.get()
        except CallbackError:
            pass
    
        raise ValueError("error")

    ...and exiting via callback error:

    with Pool() as pool:
        pool.apply_async(foo, callback=raising).get()

    We can't avoid the duplication by checking if it is already in the list of sub-exceptions: that fails for managers which use proxies and unpickles the same exception group into two separate instances. Instead I've skipped adding the exception if it is a CallbackError. In theory a different one could have been raised and hence will be lost, e.g. if the code does a get on an async result from a different pool, but that seems pathological enough that we can probably ignore it.

    I have thought I had a working implementation of this, with unit tests for all the variations of the new behaviour, but it is significantly more complex than the original fix. I'll push it out so you can have a look and see what you think. If we go with this approach I'll also need to update the API documentation and the NEWS entry as well, but I'll wait until I hear your thoughts before proceeding with that.

    Edited to add: looks like it is not working properly after all, I will continue to debug tomorrow, but let me know if you have any comments on the approach in general in the meantime.

    @picnixz picnixz added type-bug An unexpected behavior, bug, or error and removed type-crash A hard crash of the interpreter, possibly with a core dump labels Apr 1, 2025
    @picnixz
    Copy link
    Member

    picnixz commented Apr 1, 2025

    (the interpreter doesn't crash per se, just that the interpreter hangs; but no segmentation fault occurs, which is why I changed the labels)

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir topic-multiprocessing type-bug An unexpected behavior, bug, or error
    Projects
    Status: No status
    Development

    No branches or pull requests

    5 participants