Skip to content

Conversation

@zsxwing
Copy link
Member

@zsxwing zsxwing commented Oct 20, 2015

… ReceiverTracker and ReceiverSchedulingPolicy to use it

This PR includes the following changes:

  1. Add a new preferred location format, executor_<host>_<executorID> (e.g., "executor_localhost_2"), to support specifying the executor locations for RDD.
  2. Use the new preferred location format in ReceiverTracker to optimize the starting time of Receivers when there are multiple executors in a host.

The goal of this PR is to enable the streaming scheduler to place receivers (which run as tasks) in specific executors. Basically, I want to have more control on the placement of the receivers such that they are evenly distributed among the executors. We tried to do this without changing the core scheduling logic. But it does not allow specifying particular executor as preferred location, only at the host level. So if there are two executors in the same host, and I want two receivers to run on them (one on each executor), I cannot specify that. Current code only specifies the host as preference, which may end up launching both receivers on the same executor. We try to work around it but restarting a receiver when it does not launch in the desired executor and hope that next time it will be started in the right one. But that cause lots of restarts, and delays in correctly launching the receiver.

So this change, would allow the streaming scheduler to specify the exact executor as the preferred location. Also this is not exposed to the user, only the streaming scheduler uses this.

… ReceiverTracker and ReceiverSchedulingPolicy to use it
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use TaskLocation in the return type because the locations could be host from Receiver.preferredLocation, or ExecutorCacheTaskLocation.

@zsxwing
Copy link
Member Author

zsxwing commented Oct 20, 2015

I tested this patch using 5 workers, 24 executors, 24 receivers and there were no receiver restarting logs in the test.

@SparkQA
Copy link

SparkQA commented Oct 20, 2015

Test build #43982 has finished for PR 9181 at commit f5e7c4f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrewor14 @kayousterhout
Can you take a look at this change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary because previously, we didn't allow the user to pass in ExecutorCacheTaskLocations and we just tried to figure them out automatically? (and why doesn't that work here?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal of this PR is to enable the streaming scheduler to place receivers (which run as tasks) in specific executors. Basically, I want to have more control on the placement of the receivers such that they are evenly distributed among the executors. We tried to do this without changing the core scheduling logic. But it does not allow specifying particular executor as preferred location, only at the host level. So if there are two executors in the same host, and I want two receivers to run on them (one on each executor), I cannot specify that. Current code only specifies the host as preference, which may end up launching both receivers on the same executor. We try to work around it but restarting a receiver when it does not launch in the desired executor and hope that next time it will be started in the right one. But that cause lots of restarts, and delays in correctly launching the receiver.

So this change, would allow the streaming scheduler to specify the exact executor as the preferred location. Also this is not exposed to the user, only the streaming scheduler uses this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok sounds good -- can you add this description to the JIRA and to the pull request description (so that it will be in the commit message)? Scheduler changes LGTM.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zsxwing Please do so. :)

@zsxwing
Copy link
Member Author

zsxwing commented Oct 21, 2015

retest this please

@SparkQA
Copy link

SparkQA commented Oct 21, 2015

Test build #44022 has finished for PR 9181 at commit f5e7c4f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Copy link
Member Author

zsxwing commented Oct 21, 2015

retest this please

@SparkQA
Copy link

SparkQA commented Oct 21, 2015

Test build #44028 has finished for PR 9181 at commit f5e7c4f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Copy link
Member Author

zsxwing commented Oct 21, 2015

retest this please

@SparkQA
Copy link

SparkQA commented Oct 21, 2015

Test build #44035 has finished for PR 9181 at commit f5e7c4f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Oct 21, 2015

@zsxwing I think this duplicated the existing issue / PR #9096 . Does this cover the same logic, and, can you include all of the changes here?

@zsxwing
Copy link
Member Author

zsxwing commented Oct 21, 2015

retest this please

@zsxwing
Copy link
Member Author

zsxwing commented Oct 21, 2015

@zsxwing I think this duplicated the existing issue / PR #9096 . Does this cover the same logic, and, can you include all of the changes here?

This is a different issue that only about ExecutorCacheTaskLocation. #9096 is a minor patch. I can update this one after #9096 gets merged.

@SparkQA
Copy link

SparkQA commented Oct 21, 2015

Test build #44048 has finished for PR 9181 at commit f5e7c4f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Oct 22, 2015

@zsxwing I merged #9096

@SparkQA
Copy link

SparkQA commented Oct 22, 2015

Test build #44148 has finished for PR 9181 at commit 35f7936.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: grammar. It will try to scheduler receiver such that they are evenly distributed

@tdas
Copy link
Contributor

tdas commented Oct 26, 2015

Overall, LGTM, except a few minor refactorings.

@zsxwing
Copy link
Member Author

zsxwing commented Oct 27, 2015

@tdas addressed your comments.

Its probably a good idea to also add executor information, not just the host.

For this one, I prefer to add the executor info in a separate PR and also add it to UI.

@zsxwing
Copy link
Member Author

zsxwing commented Oct 27, 2015

For this one, I prefer to add the executor info in a separate PR and also add it to UI.

Created https://issues.apache.org/jira/browse/SPARK-11333 to track it

@SparkQA
Copy link

SparkQA commented Oct 27, 2015

Test build #44390 has finished for PR 9181 at commit 63c66eb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:shouldnt this map be in the line above? I think .map { loc => would fit.

@SparkQA
Copy link

SparkQA commented Oct 27, 2015

Test build #44420 has finished for PR 9181 at commit 46c5197.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tdas
Copy link
Contributor

tdas commented Oct 27, 2015

Merging this to master. Thanks @zsxwing

@asfgit asfgit closed this in 9fbd75a Oct 27, 2015
@zsxwing zsxwing deleted the executor-location branch October 28, 2015 00:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants