[SPARK-15606][core] Use non-blocking removeExecutor call to avoid deadlocks #13355
Conversation
Although this patch resolves this particular issue, I would echo the comment in #11728 by @zsxwing.
Jenkins add to whitelist
Jenkins test this please
Test build #59494 has finished for PR 13355 at commit
This configuration is for people setting a custom thread number for their special environments. I don't want to keep increasing this number when someone complains the default thread number is too small. Instead of this fix, I prefer to fix the real issue you mentioned in JIRA. You can just make
Agreed. I'll take a look.
@zsxwing Do you mean change BlockManagerMaster.removeExecutor to send the message using send (fire and forget) rather than askWithRetry?
Yes. You probably need to add a new
OK, that's what I tried, but it threw up some errors in some other tests, which I'm investigating.
Ah, it should be
Reverted the original fix and replaced it with a non-blocking call in BlockManagerMaster.removeExecutor. Also added a new test suite that runs DistributedSuite forcing the number of dispatcher threads to 2. This suite will fail without the fix.
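The deadlock that the new test suite reproduces can be modeled without Spark at all. The sketch below is a toy stand-in (none of these names are Spark's): a 2-thread "dispatcher" whose handlers block on work that only the same pool can run. While both threads are blocked, the queued work is starved, which is exactly why a blocking removeExecutor inside an RPC handler can wedge a 2-core machine.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}

// Toy model of the deadlock under discussion (assumed names, not Spark code).
object DispatcherDeadlockDemo {
  // Returns true if the queued "reply" task managed to run, false if the
  // blocking handlers starved it for the whole observation window.
  def replyRan(): Boolean = {
    val dispatcher = Executors.newFixedThreadPool(2)
    val latch = new CountDownLatch(1)
    (1 to 2).foreach { _ =>
      dispatcher.submit(new Runnable {
        def run(): Unit = {
          // Queue the "reply", then block waiting for it on the same pool.
          dispatcher.submit(new Runnable { def run(): Unit = latch.countDown() })
          latch.await(500, TimeUnit.MILLISECONDS)
          ()
        }
      })
    }
    Thread.sleep(300) // both threads are now blocked; the replies sit queued
    val ran = latch.getCount == 0
    dispatcher.shutdownNow()
    ran
  }

  def main(args: Array[String]): Unit =
    println(s"reply ran while handlers blocked: ${replyRan()}") // expected: false
}
```

With only two pool threads, both handlers occupy the pool while waiting, so the countDown tasks never get a thread; a fire-and-forget send avoids holding the thread in the first place.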
Test build #59686 has finished for PR 13355 at commit
@@ -38,7 +38,8 @@ class BlockManagerMaster(

  /** Remove a dead executor from the driver endpoint. This is only called on the driver side. */
  def removeExecutor(execId: String) {
    tell(RemoveExecutor(execId))
This method is used by other places. It's better to add a new method instead.
OK, so I've added a new removeExecutorAsync method to minimise side effects.
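The shape of that change can be sketched as follows. This is a simplified stand-in, not the exact Spark source: StubDriverEndpoint and the counter are invented here for illustration; the point is that the existing removeExecutor blocks until the driver acknowledges, while the new removeExecutorAsync fires the message and returns at once.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

case class RemoveExecutor(execId: String)

// Hypothetical stand-in for the driver endpoint reference.
class StubDriverEndpoint {
  val asked = new AtomicInteger(0) // counts messages, for illustration only
  def ask(msg: Any): Future[Boolean] = {
    asked.incrementAndGet()
    Future { true } // pretend the driver always acknowledges
  }
}

class BlockManagerMasterSketch(driverEndpoint: StubDriverEndpoint) {
  // Blocking variant (the askWithRetry style): a caller holding a dispatcher
  // thread here is what made the 2-thread deadlock possible.
  def removeExecutor(execId: String): Unit =
    Await.result(driverEndpoint.ask(RemoveExecutor(execId)), 10.seconds)

  // Non-blocking variant in the spirit of this PR: send and move on.
  def removeExecutorAsync(execId: String): Unit = {
    driverEndpoint.ask(RemoveExecutor(execId)) // result intentionally ignored
    ()
  }
}
```

Keeping both methods, as the review suggested, means existing callers that rely on the blocking behaviour are untouched.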
Test build #59726 has finished for PR 13355 at commit
Test build #59728 has finished for PR 13355 at commit
@@ -0,0 +1,42 @@
/*
Could you remove this file? This is one of the slowest tests; it's not worth running it for this issue.
@robbinspg LGTM after you remove the test suite.
Test suite removed
Test build #59762 has finished for PR 13355 at commit
@zsxwing OK to merge now?
LGTM. Thanks, merging to master and 2.0 |
[SPARK-15606][core] Use non-blocking removeExecutor call to avoid deadlocks

## What changes were proposed in this pull request?
Set minimum number of dispatcher threads to 3 to avoid deadlocks on machines with only 2 cores

## How was this patch tested?
Spark test builds

Author: Pete Robbins <robbinspg@gmail.com>

Closes #13355 from robbinspg/SPARK-13906.

(cherry picked from commit 7c07d17)
Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
What changes were proposed in this pull request?
Set minimum number of dispatcher threads to 3 to avoid deadlocks on machines with only 2 cores
How was this patch tested?
Spark test builds
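The minimum-thread rule described above can be sketched as a small helper. This is assumed logic mirroring the description, not the exact Spark code (the real default lives in the RPC dispatcher configuration): honor an explicitly configured thread count, otherwise default to the machine's core count with a floor of 3, so two blocked handlers can never exhaust the pool on a 2-core machine.

```scala
// Hypothetical helper mirroring the PR description; names are invented here.
object DispatcherThreads {
  def numThreads(configured: Option[Int], availableCores: Int): Int =
    configured.getOrElse(math.max(3, availableCores)) // floor of 3 by default
}
```

An explicit setting wins even if it is small (that is the "special environments" escape hatch discussed in the review); only the default is floored.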