[SPARK-11212][Core][Streaming]Make preferred locations support ExecutorCacheTaskLocation and update… #9181
… ReceiverTracker and ReceiverSchedulingPolicy to use it
Use TaskLocation in the return type because the locations could be either a host from Receiver.preferredLocation or an ExecutorCacheTaskLocation.
I tested this patch using 5 workers, 24 executors, and 24 receivers, and there were no receiver restarting logs in the test.
Test build #43982 has finished for PR 9181 at commit
@andrewor14 @kayousterhout
Can you take a look at this change?
Is this necessary because previously, we didn't allow the user to pass in ExecutorCacheTaskLocations and we just tried to figure them out automatically? (and why doesn't that work here?)
The goal of this PR is to enable the streaming scheduler to place receivers (which run as tasks) on specific executors. Basically, I want more control over the placement of the receivers so that they are evenly distributed among the executors. We tried to do this without changing the core scheduling logic, but it does not allow specifying a particular executor as the preferred location, only a host. So if there are two executors on the same host and I want two receivers to run on them (one on each executor), I cannot specify that. The current code only specifies the host as the preference, which may end up launching both receivers on the same executor. We try to work around this by restarting a receiver when it does not launch on the desired executor, hoping that next time it will start on the right one. But that causes lots of restarts and delays in correctly launching the receivers.
So this change would allow the streaming scheduler to specify the exact executor as the preferred location. Also, this is not exposed to the user; only the streaming scheduler uses it.
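To make the host-versus-executor distinction concrete, here is a minimal sketch in Scala. It is not the code in this PR: the names Executor and scheduleEvenly are hypothetical, and plain round-robin is only assumed as one simple way to get an even spread.

```scala
object ReceiverPlacementSketch {
  // Hypothetical, simplified stand-in for an executor; not a Spark class.
  final case class Executor(host: String, executorId: String)

  // Assign receivers to executors round-robin, so executor loads differ by at most one.
  def scheduleEvenly(receiverIds: Seq[Int], executors: Seq[Executor]): Map[Int, Executor] = {
    require(executors.nonEmpty, "need at least one executor")
    receiverIds.zipWithIndex.map { case (receiverId, i) =>
      receiverId -> executors(i % executors.size)
    }.toMap
  }
}
```

With two executors on the same host and two receivers, each receiver gets its own executor. A host-only preference would be the same string for both receivers, so it cannot express that split.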
Ok sounds good -- can you add this description to the JIRA and to the pull request description (so that it will be in the commit message)? Scheduler changes LGTM.
@zsxwing Please do so. :)
retest this please
Test build #44022 has finished for PR 9181 at commit
retest this please
Test build #44028 has finished for PR 9181 at commit
retest this please
Test build #44035 has finished for PR 9181 at commit
retest this please
Test build #44048 has finished for PR 9181 at commit
Test build #44148 has finished for PR 9181 at commit
nit: grammar. It will try to schedule receivers such that they are evenly distributed
Overall, LGTM, except for a few minor refactorings.
@tdas addressed your comments.
For this one, I prefer to add the executor info in a separate PR and also add it to the UI.
Created https://issues.apache.org/jira/browse/SPARK-11333 to track it.
Test build #44390 has finished for PR 9181 at commit
nit: shouldn't this map be on the line above? I think .map { loc => would fit.
Test build #44420 has finished for PR 9181 at commit
Merging this to master. Thanks @zsxwing
This PR includes the following changes:
1. Add a new preferred location format, executor_<host>_<executorID> (e.g., "executor_localhost_2"), to support specifying the executor locations for RDD.
2. Update ReceiverTracker to use it, to optimize the starting time of Receivers when there are multiple executors in a host.

The goal of this PR is to enable the streaming scheduler to place receivers (which run as tasks) on specific executors. Basically, I want more control over the placement of the receivers so that they are evenly distributed among the executors. We tried to do this without changing the core scheduling logic, but it does not allow specifying a particular executor as the preferred location, only a host. So if there are two executors on the same host and I want two receivers to run on them (one on each executor), I cannot specify that. The current code only specifies the host as the preference, which may end up launching both receivers on the same executor. We try to work around this by restarting a receiver when it does not launch on the desired executor, hoping that next time it will start on the right one. But that causes lots of restarts and delays in correctly launching the receivers.
So this change would allow the streaming scheduler to specify the exact executor as the preferred location. Also, this is not exposed to the user; only the streaming scheduler uses it.
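As an illustration of the executor_<host>_<executorID> string format described above, the following sketch parses a preferred-location string into either a host-level or an executor-level location. It is a simplified model, not Spark's implementation: Location, HostLocation, ExecutorLocation, and parseLocation are hypothetical names, and it assumes host names contain no underscore.

```scala
object LocationFormatSketch {
  // Hypothetical, simplified location model; Spark's own TaskLocation classes may differ.
  sealed trait Location { def host: String }
  final case class HostLocation(host: String) extends Location
  final case class ExecutorLocation(host: String, executorId: String) extends Location

  private val ExecutorTag = "executor_"

  // Strings starting with "executor_" are parsed as executor-level locations;
  // any other string is treated as a plain host name.
  def parseLocation(str: String): Location =
    if (str.startsWith(ExecutorTag)) {
      // Split at the first underscore only (assumes the host has no underscore).
      val parts = str.stripPrefix(ExecutorTag).split("_", 2)
      require(parts.length == 2 && parts.forall(_.nonEmpty),
        s"Illegal executor location format: $str")
      ExecutorLocation(parts(0), parts(1))
    } else {
      HostLocation(str)
    }
}
```

For example, parseLocation("executor_localhost_2") yields ExecutorLocation("localhost", "2"), while parseLocation("localhost") yields HostLocation("localhost"). Both cases share the Location type, which mirrors the reason given earlier in the conversation for using a common return type.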