[SPARK-18967][SCHEDULER] compute locality levels even if delay = 0 #16376
Conversation
Before this change, with delay scheduling off, Spark would effectively ignore locality preferences for bulk scheduling. This change ensures that locality preferences are used when multiple offers are made simultaneously, and adds a test case for it.
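For readers skimming the thread, here is a minimal, self-contained sketch of the idea behind the fix. The object, enum, and parameter names below are illustrative stand-ins rather than the real Spark identifiers, and the real computeValidLocalityLevels also checks which executors and hosts are alive; the point is only that a locality level counts as valid whenever there are pending tasks at that level, regardless of whether its configured wait is 0.

```scala
import scala.collection.mutable.ArrayBuffer

object LocalityLevelsSketch {
  // Hypothetical stand-in for Spark's TaskLocality enumeration.
  object Locality extends Enumeration {
    val PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY = Value
  }
  import Locality._

  // A level is valid whenever there are pending tasks at that level; a
  // configured wait of 0 no longer disqualifies it (that was the old behavior).
  def computeValidLocalityLevels(
      hasExecutorLocalTasks: Boolean,
      hasHostLocalTasks: Boolean,
      hasNoPrefTasks: Boolean,
      hasRackLocalTasks: Boolean): Seq[Locality.Value] = {
    val levels = new ArrayBuffer[Locality.Value]
    if (hasExecutorLocalTasks) levels += PROCESS_LOCAL
    if (hasHostLocalTasks) levels += NODE_LOCAL
    if (hasNoPrefTasks) levels += NO_PREF
    if (hasRackLocalTasks) levels += RACK_LOCAL
    levels += ANY // ANY is always a valid level
    levels.toSeq
  }
}
```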
At first glance, this looks like the right change - but I might be missing something.

Test build #70494 has finished for PR 16376 at commit
It'd be nice to check whether the test case attached to SPARK-1937 still applies. Reading the description of that bug, it seems related not only to this but to other things you've been playing with recently.

Test build #70495 has finished for PR 16376 at commit
The change looks good to me, although I still want to make sure I understand it correctly. Before the change, a locality level was considered invalid if it had delay = 0. The patch makes such locality levels valid too, so during scheduling it gives us a chance to try the more local levels first, e.g. PROCESS_LOCAL and NODE_LOCAL; otherwise we'd just schedule the task on any available executor. Meanwhile, since the wait time for those more local levels is 0, we quickly fall back to ANY if a satisfying executor is not available, so delay = 0 is still enforced. Is this correct?
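To make that mechanism concrete, here is a simplified, self-contained sketch of the fallback behavior. It is a paraphrase, not Spark's actual getAllowedLocalityLevel (which also tracks a per-level last-launch time): the maximum allowed level advances past any level whose wait has expired, and a wait of 0 has always expired.

```scala
object AllowedLevelSketch {
  // Walk levels from most local to least local; stop at a level that still has
  // pending tasks and whose wait has not yet expired, otherwise keep advancing.
  def allowedLevel(
      levels: IndexedSeq[String],           // best-to-worst, ending in "ANY"
      waitsMs: IndexedSeq[Long],            // per-level delay-scheduling waits
      hasPendingTasksAt: String => Boolean, // are tasks still pending at this level?
      elapsedSinceLastLaunchMs: Long): String = {
    var i = 0
    while (i < levels.length - 1) {
      if (!hasPendingTasksAt(levels(i))) i += 1               // nothing left here
      else if (elapsedSinceLastLaunchMs >= waitsMs(i)) i += 1 // wait expired (always true if 0)
      else return levels(i)                                   // still within the wait: cap here
    }
    levels(i)
  }

  def main(args: Array[String]): Unit = {
    val levels = IndexedSeq("PROCESS_LOCAL", "NODE_LOCAL", "ANY")
    // With every wait at 0, the cap reaches ANY immediately, even with local tasks
    // pending, so a non-local offer can be used right away -- while more local levels
    // are still tried first when matching offers exist (see the scheduler-side toy below).
    println(allowedLevel(levels, IndexedSeq(0L, 0L, 0L), _ => true, elapsedSinceLastLaunchMs = 0L))
  }
}
```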
This looks good. Can you also update the comment on myLocalityLevels in TaskSetManager.scala to clarify what it's used for? It currently mentions delay scheduling, which is no longer true after this PR.
Similar to @vanzin, I was a little nervous about this change, so I tracked down the origin of it and why it was added as part of #892 (for SPARK-1937). It looks like the bug you've fixed was added towards the end of the review process for #892: it was added in the 2nd-to-last commit in that PR, apparently as a result of the June 10th comment beginning with "I've updated...". That comment implies this bug was introduced as a last-minute opportunistic performance improvement (and not as part of fixing the bug that PR was addressing). Specifically, that comment/commit changed TaskSchedulerImpl to iterate only through TSM.myLocalityLevels, rather than through all of the locality levels. So you could also fix this issue by undoing that change, but I think the approach you took here is more intuitive (and results in slightly better performance, since it does avoid iterating over levels for which nothing will happen, which is exactly what the original buggy change was intended to do).
```diff
@@ -338,7 +338,7 @@ private[spark] class TaskSchedulerImpl private[scheduler](
     }.getOrElse(offers)

     // Randomly shuffle offers to avoid always placing tasks on the same set of workers.
```
you can remove this comment now that it's duplicated in the shuffleOffers method description
@lirui-intel yes, that's consistent with my understanding. The TaskSetManager still checks that it's not going beyond the currently-allowed locality level here, so non-zero locality waits will still be enforced, and the TaskSchedulerImpl still checks all of the locality levels in myLocalityLevels each time it has an offer, so it will still schedule non-local tasks when the wait is 0 and no local tasks can be scheduled. Also, my comment above may be helpful for understanding why this bug was originally introduced.
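A self-contained toy of the scheduler-side half of that explanation (names like Offer, Task, and schedule are made up for illustration and do not match TaskSchedulerImpl's real API): the outer loop goes over locality levels from best to worst for the whole batch of offers, which is why local matches win even when every wait is 0.

```scala
object BulkOfferSketch {
  case class Offer(execId: String, host: String)
  case class Task(id: Int, preferredExec: Option[String])

  // Locality of a (task, offer) pair in this toy: only PROCESS_LOCAL or ANY.
  def levelOf(task: Task, offer: Offer): String =
    if (task.preferredExec.contains(offer.execId)) "PROCESS_LOCAL" else "ANY"

  def schedule(offers: Seq[Offer], tasks: Seq[Task]): Map[Int, String] = {
    val assigned = scala.collection.mutable.Map[Int, String]()
    val remaining = scala.collection.mutable.Queue(tasks: _*)
    val usedOffers = scala.collection.mutable.Set[String]()
    // Try the whole batch of offers at PROCESS_LOCAL before falling back to ANY.
    for (maxLocality <- Seq("PROCESS_LOCAL", "ANY"); offer <- offers if !usedOffers(offer.execId)) {
      remaining.dequeueFirst(t => maxLocality == "ANY" || levelOf(t, offer) == maxLocality)
        .foreach { t => assigned(t.id) = offer.execId; usedOffers += offer.execId }
    }
    assigned.toMap
  }

  def main(args: Array[String]): Unit = {
    val offers = Seq(Offer("exec1", "hostA"), Offer("exec2", "hostB"))
    val tasks = Seq(Task(0, preferredExec = Some("exec2")), Task(1, preferredExec = None))
    // Task 0 lands on its preferred exec2 even though exec1 is offered first;
    // task 1 (no preference) takes exec1. Nothing waits, but locality is respected.
    println(schedule(offers, tasks))
  }
}
```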
@kayousterhout I see. Thanks for the explanations :)
Thanks for the feedback. I've updated the comment on myLocalityLevels. I also updated the tests slightly: to ensure that we're really testing no delay, the tests now use a manual clock which never advances. And to address @lirui-intel's concern, I also added a test that we still schedule at non-preferred locations immediately with delay scheduling off. (Somewhat obvious in the current implementation, since we always add ANY to the valid locality levels.)
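As a tiny illustration of why the frozen clock matters (reusing the AllowedLevelSketch toy from above rather than Spark's real test helpers, and with a hand-rolled clock stand-in instead of Spark's ManualClock): with the elapsed time pinned at 0, any fallback to ANY can only be explained by the zero waits, not by time quietly passing during the test.

```scala
object FrozenClockCheck {
  // Minimal stand-in for a manual clock; unlike a system clock it only moves
  // when the test calls advance(), which this check deliberately never does.
  class ManualClockSketch(private var now: Long = 0L) {
    def getTimeMillis: Long = now
    def advance(ms: Long): Unit = now += ms
  }

  def main(args: Array[String]): Unit = {
    val clock = new ManualClockSketch()
    val levels = IndexedSeq("PROCESS_LOCAL", "NODE_LOCAL", "ANY")
    val zeroWaits = IndexedSeq(0L, 0L, 0L)
    val cap = AllowedLevelSketch.allowedLevel(levels, zeroWaits, _ => true, clock.getTimeMillis)
    // The clock never advanced, yet the cap is already ANY -- purely because the waits are 0.
    assert(cap == "ANY")
    println(s"cap with a frozen clock and zero waits: $cap")
  }
}
```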
```scala
 * This allows a performance optimization, of skipping levels that aren't relevant (eg., skip
 * PROCESS_LOCAL if no tasks could be run PROCESS_LOCAL for the current set of executors).
 */
var myLocalityLevels = computeValidLocalityLevels()
```
As I was figuring out the purpose of this for what to put in the comment, I made a couple of observations:

1. For each executor we add or remove, it's an O(numExecutors) operation to update the locality levels, so overall it's O(numExecutors^2) to add a bunch. Minor on small clusters, but I wonder if this is an issue when you're using dynamic allocation and going up and down to 1000s of executors. It all happens while holding a lock on the TaskSchedulerImpl, too. (See the toy sketch after this list for the quadratic pattern.)

2. Though we recompute valid locality levels as executors come and go, we do not as tasks complete. That's not a problem -- as offers come in, we still go through the right task lists. But it does make me wonder whether this business of updating the locality levels for the current set of executors is useful, and whether we should instead just always use all levels.
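A toy illustration of the quadratic pattern in observation (1); nothing here is Spark code, the executor count is arbitrary, and the assumption is simply that each executorAdded event triggers a recomputation that scans every executor known so far.

```scala
object RecomputeCostSketch {
  def main(args: Array[String]): Unit = {
    var executors = Vector.empty[String]
    var totalScans = 0L
    for (i <- 1 to 1000) {
      executors = executors :+ s"exec-$i"
      totalScans += executors.size // each "recompute" scans every known executor
    }
    // 1 + 2 + ... + 1000 = 500500 scans for 1000 executors added one at a time.
    println(s"adding 1000 executors one by one cost $totalScans executor scans")
  }
}
```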
(1) does seem like an issue. I also mostly agree on (2), since the logic of avoiding unnecessarily waiting for delay timeouts is already handled (separately from the myLocalityLevels calculation) here. My only hesitation is that myLocalityLevels does allow avoiding the delay timeout in cases where tasks have constraints to run on executors that haven't been granted to the application, so that use case seems like it might merit keeping the code (also, if you agree, can you update the myLocalityLevels comment?). In any case I'd do this in a separate PR.
kayousterhout left a comment
LGTM. New test looks great!
Test build #70937 has finished for PR 16376 at commit

Test build #70936 has finished for PR 16376 at commit

Jenkins, retest this please

Test build #71084 has finished for PR 16376 at commit
```diff
     val sc: SparkContext,
     val maxTaskFailures: Int,
-    blacklistTrackerOpt: Option[BlacklistTracker],
+    val blacklistTrackerOpt: Option[BlacklistTracker],
```
Just noticed that this doesn't seem to be used publicly anywhere -- does it need to be a val? Should it be private val?
oh good point, this change is not necessary (must have been part of another change which I later reverted, sorry)
Doh, actually this is used in tests: I override createTaskSetManager to use my manual clock. Another alternative is to allow the clock to be passed in to the TaskSchedulerImpl constructor, but that ends up being a bigger code change.
I'll make it a private[scheduler] val, slightly tighter than the implicit private[spark] it would be otherwise.
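For anyone following along, a hedged sketch of that injection pattern: subclass the scheduler in the test and override its TaskSetManager factory so every manager gets the test-controlled clock. The constructor calls below are simplified and may not match the exact TaskSetManager/TaskSchedulerImpl signatures in the version you're reading; treat them as an assumption about the shape of the code, not a drop-in snippet.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.scheduler.{TaskSchedulerImpl, TaskSet, TaskSetManager}
import org.apache.spark.util.ManualClock

// Test-only subclass: every TaskSetManager it creates uses the manual clock,
// and the now-visible blacklistTrackerOpt is forwarded unchanged.
// (Constructor arguments are assumed/simplified for illustration.)
class ClockInjectingTaskScheduler(sc: SparkContext, clock: ManualClock)
  extends TaskSchedulerImpl(sc) {

  override def createTaskSetManager(taskSet: TaskSet, maxTaskFailures: Int): TaskSetManager = {
    new TaskSetManager(this, taskSet, maxTaskFailures, blacklistTrackerOpt, clock)
  }
}
```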
Test build #71137 has finished for PR 16376 at commit
```scala
    }
  }

  test("With delay scheduling off, tasks can be run at any locality level immediately") {
```
Sorry, last thing: I just realized -- does this test need to first submit a local resource offer? That makes sure the local executor is considered alive; otherwise, PROCESS_LOCAL won't be in the set of allowed locality levels because of the code here: https://github.com/apache/spark/pull/16376/files#diff-bad3987c83bd22d46416d3dd9d208e76R966, which makes this test somewhat less effective, if I understand correctly.
Yes, you are absolutely right. I've updated the test and also added a check to make sure the tsm includes the lower locality levels.
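A hedged sketch of what that adjustment might look like. Here setupScheduler and taskSetWithPreference are hypothetical stand-ins for the suite's real helpers, and the exact method signatures are assumptions rather than quotes from the patch: the first offer marks the preferred executor as alive, the assertion on myLocalityLevels confirms the lower levels are actually in play, and the final offer shows a non-preferred executor is still used immediately.

```scala
import org.apache.spark.scheduler.{TaskLocality, WorkerOffer}

// All locality waits set to 0, clock never advanced (see earlier discussion).
val scheduler = setupScheduler("spark.locality.wait" -> "0") // hypothetical helper

// Offer the preferred executor once so the scheduler records it as alive;
// without this, PROCESS_LOCAL would never be a valid level for the task set.
scheduler.resourceOffers(IndexedSeq(WorkerOffer("exec1", "host1", 1)))

val taskSet = taskSetWithPreference(executor = "exec1", host = "host1") // hypothetical helper
scheduler.submitTasks(taskSet)
val tsm = scheduler.taskSetManagerForAttempt(taskSet.stageId, taskSet.stageAttemptId).get

// The lower (more local) levels should now be considered valid...
assert(tsm.myLocalityLevels.contains(TaskLocality.PROCESS_LOCAL))

// ...and an offer on a completely different executor is still used right away,
// because every locality wait is 0.
val launched = scheduler.resourceOffers(IndexedSeq(WorkerOffer("other", "host2", 1))).flatten
assert(launched.nonEmpty)
```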
Test build #71141 has finished for PR 16376 at commit

LGTM

LGTM, thanks @squito !

Test build #71156 has finished for PR 16376 at commit

Jenkins, retest this please

Test build #71223 has finished for PR 16376 at commit
@squito I just noticed this hasn't been merged. Is this good to go pending tests passing again?

Jenkins, retest this please

Test build #72477 has finished for PR 16376 at commit

Yes, I think this is ready (I just noticed a couple of minor nits on a fresh read, but no real changes).

Test build #72484 has finished for PR 16376 at commit
Awesome, always enthusiastic about fixing minor nits!! I merged this into master. I didn't merge it into 2.1, but I don't feel strongly about it.
What changes were proposed in this pull request?

Before this change, with delay scheduling off, Spark would effectively ignore locality preferences for bulk scheduling. With this change, locality preferences are used when multiple offers are made simultaneously.

How was this patch tested?

Test case added which fails without this change. All unit tests run via jenkins.

Author: Imran Rashid <irashid@cloudera.com>

Closes apache#16376 from squito/locality_without_delay.