[SPARK-32738][CORE][3.0] Should reduce the number of active threads if fatal error happens in Inbox.process #29763

Closed
wants to merge 1 commit into from

Commits on Sep 15, 2020

  1. [SPARK-32738][CORE] Should reduce the number of active threads if fatal error happens in `Inbox.process`
    
    ### What changes were proposed in this pull request?
    
    Processing for `ThreadSafeRpcEndpoint` is controlled by `numActiveThreads` in `Inbox`. Currently, if a fatal error happens during `Inbox.process`, `numActiveThreads` is not reduced. Other threads then cannot process messages in that inbox, which causes the endpoint to "hang". For other types of endpoints, we should also keep `numActiveThreads` correct.
    
    This problem is more serious in previous Spark 2.x versions since the driver, executor and block manager endpoints are all thread safe endpoints.
    
    To fix this, we should reduce the number of active threads if a fatal error happens in `Inbox.process`.
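    
    The idea can be illustrated with a minimal sketch. This is not Spark's actual `Inbox`: `SimpleInbox`, its `handler` parameter, and the `activeThreads` accessor are hypothetical names made up for illustration, and the model allows only one active thread, as for a thread-safe endpoint. The point it shows is that the counter is decremented before a fatal error is rethrown, so a later call to `process` can still drain the queue.
    
    ```scala
    import scala.collection.mutable
    import scala.util.control.NonFatal
    
    // Hypothetical, simplified stand-in for an inbox; NOT Spark's implementation.
    class SimpleInbox(handler: String => Unit) {
      private val messages = mutable.Queue[String]()
      private var numActiveThreads = 0
    
      def post(msg: String): Unit = synchronized { messages.enqueue(msg) }
    
      // Exposed only so the sketch can be inspected.
      def activeThreads: Int = synchronized { numActiveThreads }
    
      /** Drains the queue; at most one thread may be active at a time. */
      def process(): Unit = {
        var next: Option[String] = synchronized {
          if (numActiveThreads != 0 || messages.isEmpty) {
            None
          } else {
            numActiveThreads += 1
            Some(messages.dequeue())
          }
        }
        while (next.isDefined) {
          try {
            handler(next.get)
          } catch {
            case NonFatal(_) =>
              // Non-fatal errors are ignored in this sketch; the loop continues.
            case fatal: Throwable =>
              // Release the slot before rethrowing, otherwise no other
              // thread could ever process this inbox again.
              synchronized { numActiveThreads -= 1 }
              throw fatal
          }
          next = synchronized {
            if (messages.isEmpty) {
              numActiveThreads -= 1
              None
            } else {
              Some(messages.dequeue())
            }
          }
        }
      }
    }
    ```
    
    Without the decrement in the `fatal` branch, `numActiveThreads` would stay at 1 after the error, and every subsequent `process()` call in this sketch would return immediately without handling any message.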
    
    ### Why are the changes needed?
    
    `numActiveThreads` is not correct when a fatal error happens, which causes the problem described above.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    Add a new test.
    
    Closes apache#29580 from wzhfy/deal_with_fatal_error.
    
    Authored-by: Zhenhua Wang <wzh_zju@163.com>
    Signed-off-by: Wenchen Fan <wenchen@databricks.com>
    wzhfy committed Sep 15, 2020
    589061c