[SPARK-2774] - Set preferred locations for reduce tasks #1697

Closed
shivaram wants to merge 6 commits

Conversation

shivaram (Contributor)

The motivation for the change is in the JIRA. There are a couple of things I would like feedback on:

  1. Should we sort the map outputs by size for every task? This could be expensive if we have a large number of map outputs.
  2. The number of preferred locations to use. Technically we could set this to a larger number, but I am not sure how it would affect locality / delay scheduling in the TaskSetManager.

cc @rxin

@shivaram shivaram changed the title [SPARK-2774 - Set preferred locations for reduce tasks SPARK-2774 - Set preferred locations for reduce tasks Jul 31, 2014
@shivaram shivaram changed the title SPARK-2774 - Set preferred locations for reduce tasks [SPARK-2774] - Set preferred locations for reduce tasks Jul 31, 2014
@SparkQA commented Jul 31, 2014

QA tests have started for PR 1697. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17592/consoleFull

@@ -284,6 +290,24 @@ private[spark] class MapOutputTrackerMaster(conf: SparkConf)
cachedSerializedStatuses.contains(shuffleId) || mapStatuses.contains(shuffleId)
}

// Return the list of locations and block sizes for each reducer.
// The returned map is keyed by reducer id; each array holds one
// (location, size) pair per map output block for that reducer.
// Note: not thread-safe (TimestampedHashMap is not thread-safe), but this is
// only called from the single-threaded DAGScheduler.
def getStatusByReducer(shuffleId: Int): Option[Map[Int, Array[(BlockManagerId, Long)]]] = {
Contributor

comment on the thread safety

Contributor

also comment on the semantics of the return value (what does the Int mean - what does the index in the array mean, etc)

Contributor Author

Added comments. This method is not thread-safe, since TimestampedHashMap is not thread-safe. However, we only call this from the DAGScheduler, which is single-threaded AFAIK.

@rxin (Contributor) commented Jul 31, 2014

I have some concern (maybe unfounded) about runtime. If we have 50k map tasks and 10k reduce tasks, this would require doing 10k sorts, each on 50k items, right?

@shivaram (Contributor, Author)

Thanks for taking a look. One thing I realized is that we only need the top 5, so we don't need to sort the data. I'll try the Guava Ordering class and run some benchmarks.
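
For illustration, a minimal sketch of that idea (the helper name and plumbing are hypothetical, not code from this patch): Guava's Ordering.greatestOf picks the k largest elements in roughly O(n + k log k) without a full sort.

import scala.collection.JavaConverters._
import com.google.common.collect.{Ordering => GuavaOrdering}
import org.apache.spark.storage.BlockManagerId

// Hypothetical helper: pick the k largest (location, size) pairs by size
// without sorting the whole list.
def topLocationsBySize(
    statuses: Seq[(BlockManagerId, Long)],
    k: Int = 5): Seq[(BlockManagerId, Long)] = {
  val bySize = new GuavaOrdering[(BlockManagerId, Long)] {
    override def compare(a: (BlockManagerId, Long), b: (BlockManagerId, Long)): Int =
      java.lang.Long.compare(a._2, b._2)
  }
  bySize.greatestOf(statuses.asJava, k).asScala.toSeq
}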

Also add a unit test for ordering and address comments
@shivaram (Contributor, Author) commented Aug 1, 2014

I switched to using Guava's ordering function now and added another unit test for it. I plan to run a microbenchmark to see how long it takes to get the top 5 from a list of Longs.

@kayousterhout -- Is there a way to benchmark the scheduler to see if a change introduces any performance regressions?
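
For reference, a self-contained sketch of that kind of microbenchmark (not the actual gist code); the input size and seed are arbitrary:

import scala.collection.JavaConverters._
import com.google.common.collect.{Ordering => GuavaOrdering}

// Time top-5 selection over 50k random Longs with Guava's greatestOf.
val rnd = new scala.util.Random(42)
val sizes: Seq[java.lang.Long] =
  Seq.fill(50000)(java.lang.Long.valueOf(rnd.nextInt(1 << 20).toLong))
val naturalOrder = new GuavaOrdering[java.lang.Long] {
  override def compare(a: java.lang.Long, b: java.lang.Long): Int = a.compareTo(b)
}
val start = System.nanoTime()
val top5 = naturalOrder.greatestOf(sizes.asJava, 5)
println(s"top-5 of ${sizes.length} items took ${(System.nanoTime() - start) / 1e6} ms")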

@SparkQA commented Aug 1, 2014

QA tests have started for PR 1697. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17631/consoleFull

@SparkQA commented Aug 1, 2014

QA results for PR 1697:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17631/consoleFull

@shivaram (Contributor, Author) commented Aug 1, 2014

I ran some microbenchmarks as outlined at https://gist.github.com/shivaram/63620c47f0ad50106e0a
The comments below the gist have some numbers that I got on my laptop.

Overall I think we should just use an upper bound on the number of map tasks and not return any preferred locations if we have more than, say, 1000 map tasks. There are more optimizations we could do, such as filtering out zero-sized blocks, but a simple heuristic is a good and safe start for now.

@rxin Thoughts?
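
A minimal sketch of that guard, with illustrative names (the merged patch later used hard-coded thresholds in DAGScheduler):

// Skip preferred-location computation entirely for large stages, since the
// selection work grows with the number of map outputs.
val MAP_TASK_THRESHOLD = 1000  // illustrative cutoff

def reduceTaskPrefLocs(numMapTasks: Int)(compute: => Seq[String]): Seq[String] =
  if (numMapTasks > MAP_TASK_THRESHOLD) Seq.empty else compute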

@shivaram (Contributor, Author) commented Aug 1, 2014

One more thing we can do is coalesce the sizes from all tasks on a machine and only do node-level locality. Since map outputs are on disk, there shouldn't be any difference between node-level and process-level locality, right?
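
A sketch of that coalescing step, assuming BlockManagerId.host identifies the machine (the helper name is illustrative):

import org.apache.spark.storage.BlockManagerId

// Collapse per-executor output sizes into per-host totals, so the scheduler
// only needs node-level preferences for reduce tasks.
def sizesByHost(statuses: Seq[(BlockManagerId, Long)]): Map[String, Long] =
  statuses.groupBy(_._1.host).mapValues(_.map(_._2).sum).toMap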

…-locality

Conflicts:
	core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
@SparkQA commented Aug 1, 2014

QA tests have started for PR 1697. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17715/consoleFull

@SparkQA commented Aug 2, 2014

QA results for PR 1697:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17715/consoleFull

@shivaram (Contributor, Author)

Ping @rxin -- any thoughts on this? I can merge upstream into this branch, and it would be great to have this in 1.2.

@rxin (Contributor) commented Nov 2, 2014

Can we bring this up to date, and:

  1. Add a switch to turn it on / off
  2. Add a config option to disable this automatically when the number of reduce / map tasks is greater than a certain threshold?
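
For illustration, what those two knobs could look like; the boolean flag name matches the one that eventually landed in Spark (spark.shuffle.reduceLocality.enabled), but the threshold key is hypothetical:

import org.apache.spark.SparkConf

val conf = new SparkConf()

// 1. A switch to turn the feature on or off.
val reduceLocalityEnabled =
  conf.getBoolean("spark.shuffle.reduceLocality.enabled", true)

// 2. Disable automatically above a task-count threshold (hypothetical key).
val maxTasks = conf.getInt("spark.shuffle.reduceLocality.maxTasks", 1000)

def useReduceLocality(numMaps: Int, numReduces: Int): Boolean =
  reduceLocalityEnabled && numMaps < maxTasks && numReduces < maxTasks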

@JoshRosen (Contributor)

I agree with @rxin; I'd be totally fine with including this as an experimental feature, perhaps opt-in while we test it (like we did with sort-based shuffle).

@shivaram (Contributor, Author)

Sure. I'll bring this up to date, put it behind a config flag this week, and ping the PR.

@pwendell (Contributor)

Let's close this issue pending an update from @shivaram (just doing some JIRA clean-up).

@asfgit asfgit closed this in 1ac1c1d Jan 19, 2015
shivaram added a commit to shivaram/spark-1 that referenced this pull request Feb 12, 2015
This is another attempt at apache#1697 addressing some of the earlier concerns.
It adds a couple of thresholds based on the number of map and reduce tasks
beyond which we don't use preferred locations for reduce tasks.

This patch also fixes some bugs in DAGSchedulerSuite where the MapStatus
objects created didn't have the right number of reducers set.
asfgit pushed a commit that referenced this pull request Jun 10, 2015
Set preferred locations for reduce tasks.
The basic design is that we maintain a map from reducerId to a list of (sizes, locations) for each
shuffle. We then set the preferred locations to be any machines that have 20% or more of the output
that needs to be read by the reduce task. This will result in at most 5 preferred locations for
each reduce task.

Selecting the preferred locations involves O(# map tasks * # reduce tasks) computation, so we
restrict this feature to cases where we have fewer than 1000 map tasks and 1000 reduce tasks.
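
A compact sketch of that selection rule (illustrative, not the exact merged code). Any host holding at least the given fraction of a reducer's total input qualifies; with a 0.2 threshold at most five hosts can pass, which is where the at-most-5 bound comes from.

import org.apache.spark.storage.BlockManagerId

// For one reducer, return hosts that hold >= `fraction` of its total input.
// `blocks` has one (location, size) pair per map output block for this reducer.
def preferredHosts(
    blocks: Seq[(BlockManagerId, Long)],
    fraction: Double = 0.2): Seq[String] = {
  val total = blocks.map(_._2).sum.toDouble
  if (total <= 0) {
    Seq.empty
  } else {
    blocks.groupBy(_._1.host)
      .mapValues(_.map(_._2).sum)
      .collect { case (host, bytes) if bytes >= fraction * total => host }
      .toSeq
  }
}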

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #6652 from shivaram/reduce-locations and squashes the following commits:

492e25e [Shivaram Venkataraman] Remove unused import
2ef2d39 [Shivaram Venkataraman] Address code review comments
897a914 [Shivaram Venkataraman] Remove unused hash map
f5be578 [Shivaram Venkataraman] Use fraction of map outputs to determine locations Also removes caching of preferred locations to make the API cleaner
68bc29e [Shivaram Venkataraman] Fix line length
1090b58 [Shivaram Venkataraman] Change flag name
77ce7d8 [Shivaram Venkataraman] Merge branch 'master' of https://github.com/apache/spark into reduce-locations
e5d56bd [Shivaram Venkataraman] Add flag to turn off locality for shuffle deps
6cfae98 [Shivaram Venkataraman] Filter out zero blocks, rename variables
9d5831a [Shivaram Venkataraman] Address some more comments
8e31266 [Shivaram Venkataraman] Fix style
0df3180 [Shivaram Venkataraman] Address code review comments
e7d5449 [Shivaram Venkataraman] Fix merge issues
ad7cb53 [Shivaram Venkataraman] Merge branch 'master' of https://github.com/apache/spark into reduce-locations
df14cee [Shivaram Venkataraman] Merge branch 'master' of https://github.com/apache/spark into reduce-locations
5093aea [Shivaram Venkataraman] Merge branch 'master' of https://github.com/apache/spark into reduce-locations
0171d3c [Shivaram Venkataraman] Merge branch 'master' of https://github.com/apache/spark into reduce-locations
bc4dfd6 [Shivaram Venkataraman] Merge branch 'master' of https://github.com/apache/spark into reduce-locations
774751b [Shivaram Venkataraman] Fix bug introduced by line length adjustment
34d0283 [Shivaram Venkataraman] Fix style issues
3b464b7 [Shivaram Venkataraman] Set preferred locations for reduce tasks This is another attempt at #1697 addressing some of the earlier concerns. This adds a couple of thresholds based on the number of map and reduce tasks beyond which we don't use preferred locations for reduce tasks.
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
Set preferred locations for reduce tasks.
shivaram added a commit to shivaram/spark-1 that referenced this pull request Nov 24, 2015
Set preferred locations for reduce tasks.

Conflicts:
	core/src/main/scala/org/apache/spark/MapOutputTracker.scala
	core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala