[SPARK-18967][SCHEDULER] compute locality levels even if delay = 0 #16376
@@ -29,7 +29,7 @@ import org.scalatest.mock.MockitoSugar
import org.apache.spark._
import org.apache.spark.internal.config
import org.apache.spark.internal.Logging
import org.apache.spark.storage.BlockManagerId
import org.apache.spark.util.ManualClock

class FakeSchedulerBackend extends SchedulerBackend {
  def start() {}
@@ -819,4 +819,89 @@ class TaskSchedulerImplSuite extends SparkFunSuite with LocalSparkContext with B
    assert(!taskScheduler.hasExecutorsAliveOnHost("host0"))
    assert(taskScheduler.getExecutorsAliveOnHost("host0").isEmpty)
  }

  test("Locality should be used for bulk offers even with delay scheduling off") {
    val conf = new SparkConf()
      .set("spark.locality.wait", "0")
    sc = new SparkContext("local", "TaskSchedulerImplSuite", conf)
    // we create a manual clock just so we can be sure the clock doesn't advance at all in this test
    val clock = new ManualClock()

    // We customize the task scheduler just to let us control the way offers are shuffled, so we
    // can be sure we try both permutations, and to control the clock on the tasksetmanager.
    val taskScheduler = new TaskSchedulerImpl(sc) {
      override def shuffleOffers(offers: IndexedSeq[WorkerOffer]): IndexedSeq[WorkerOffer] = {
        // Don't shuffle the offers around for this test.  Instead, we'll just pass in all
        // the permutations we care about directly.
        offers
      }
      override def createTaskSetManager(taskSet: TaskSet, maxTaskFailures: Int): TaskSetManager = {
        new TaskSetManager(this, taskSet, maxTaskFailures, blacklistTrackerOpt, clock)
      }
    }
    // Need to initialize a DAGScheduler for the taskScheduler to use for callbacks.
    new DAGScheduler(sc, taskScheduler) {
      override def taskStarted(task: Task[_], taskInfo: TaskInfo) {}
      override def executorAdded(execId: String, host: String) {}
    }
    taskScheduler.initialize(new FakeSchedulerBackend)

    // Make two different offers -- one in the preferred location, one that is not.
    val offers = IndexedSeq(
      WorkerOffer("exec1", "host1", 1),
      WorkerOffer("exec2", "host2", 1)
    )
    Seq(false, true).foreach { swapOrder =>
      // Submit a taskset with locality preferences.
      val taskSet = FakeTask.createTaskSet(
        1, stageId = 1, stageAttemptId = 0, Seq(TaskLocation("host1", "exec1")))
      taskScheduler.submitTasks(taskSet)
      val shuffledOffers = if (swapOrder) offers.reverse else offers
      // Regardless of the order of the offers (after the task scheduler shuffles them), we should
      // always take advantage of the local offer.
      val taskDescs = taskScheduler.resourceOffers(shuffledOffers).flatten
      withClue(s"swapOrder = $swapOrder") {
        assert(taskDescs.size === 1)
        assert(taskDescs.head.executorId === "exec1")
      }
    }
  }

  test("With delay scheduling off, tasks can be run at any locality level immediately") {
Contributor:
Sorry, one last thing I just realized: does this test need to first submit a local resource offer? That makes sure the local executor is considered alive. Otherwise, PROCESS_LOCAL won't be in the set of allowed locality levels because of the code here: https://github.com/apache/spark/pull/16376/files#diff-bad3987c83bd22d46416d3dd9d208e76R966, which would make this test somewhat less effective, if I understand correctly.

Contributor (Author):
Yes, you are absolutely right. I've updated the test and also added a check to make sure the TaskSetManager includes the lower locality levels.
    val conf = new SparkConf()
      .set("spark.locality.wait", "0")
    sc = new SparkContext("local", "TaskSchedulerImplSuite", conf)

    // we create a manual clock just so we can be sure the clock doesn't advance at all in this test
    val clock = new ManualClock()
    val taskScheduler = new TaskSchedulerImpl(sc) {
      override def createTaskSetManager(taskSet: TaskSet, maxTaskFailures: Int): TaskSetManager = {
        new TaskSetManager(this, taskSet, maxTaskFailures, blacklistTrackerOpt, clock)
      }
    }
    // Need to initialize a DAGScheduler for the taskScheduler to use for callbacks.
    new DAGScheduler(sc, taskScheduler) {
      override def taskStarted(task: Task[_], taskInfo: TaskInfo) {}
      override def executorAdded(execId: String, host: String) {}
    }
    taskScheduler.initialize(new FakeSchedulerBackend)
    // Make an offer on the preferred host so the scheduler knows it's alive.  This is necessary
    // so that the taskset knows that it *could* take advantage of locality.
    taskScheduler.resourceOffers(IndexedSeq(WorkerOffer("exec1", "host1", 1)))

    // Submit a taskset with locality preferences.
    val taskSet = FakeTask.createTaskSet(
      1, stageId = 1, stageAttemptId = 0, Seq(TaskLocation("host1", "exec1")))
    taskScheduler.submitTasks(taskSet)
    val tsm = taskScheduler.taskSetManagerForAttempt(1, 0).get
    // Make sure we've set up our test correctly, so that the taskset knows it *could* use local
    // offers.
    assert(tsm.myLocalityLevels.contains(TaskLocality.NODE_LOCAL))
    // Make an offer on a non-preferred location.  Since the delay is 0, we should still schedule
    // immediately.
    val taskDescs =
      taskScheduler.resourceOffers(IndexedSeq(WorkerOffer("exec2", "host2", 1))).flatten
    assert(taskDescs.size === 1)
    assert(taskDescs.head.executorId === "exec2")
  }
}
As I was figuring out the purpose of this code while deciding what to put in the comment, I made a couple of observations:

1. For each executor we add or remove, it's an O(numExecutors) operation to update the locality levels, so adding a batch of executors is O(numExecutors^2) overall. That's minor on small clusters, but I wonder whether it becomes an issue with dynamic allocation scaling up and down to thousands of executors. It all happens while holding a lock on the TaskSchedulerImpl, too.
2. Though we recompute the valid locality levels as executors come and go, we do not as tasks complete. That's not a problem -- as offers come in, we still go through the right task lists. But it does make me wonder whether this business of updating the locality levels for the current set of executors is useful at all, and whether we should instead just always use all levels.
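To make the scaling concern in (1) concrete, here is a minimal, self-contained sketch of the pattern being described. The names (`recomputeLocality`, `executorsByHost`) are stand-ins, not Spark's actual code: each executor registration triggers a recompute that scans every known executor, so registering N executors does O(N^2) work in total.

```scala
import scala.collection.mutable

// Illustrative only -- a stand-in for the pattern described above, not Spark's real classes.
object LocalityRecomputeSketch {
  def main(args: Array[String]): Unit = {
    // Executor-id sets keyed by host, standing in for the scheduler's executor bookkeeping.
    val executorsByHost = mutable.HashMap[String, Set[String]]()

    // Stand-in for the per-TaskSetManager locality recompute: it scans every known
    // executor, so each call is O(numExecutors).
    def recomputeLocality(): Int = executorsByHost.values.map(_.size).sum

    // Registering n executors triggers n recomputes, i.e. O(n^2) scanned entries in
    // total -- and in the real scheduler this happens under the TaskSchedulerImpl lock.
    val n = 1000
    var scannedEntries = 0
    (1 to n).foreach { i =>
      val host = s"host${i % 100}"
      executorsByHost(host) = executorsByHost.getOrElse(host, Set.empty[String]) + s"exec$i"
      scannedEntries += recomputeLocality()
    }
    println(s"registered $n executors; recomputes scanned $scannedEntries entries in total")
  }
}
```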
(1) does seem like an issue. I also mostly agree with (2), since the logic for avoiding unnecessary delay-timeout waits is already handled here, separately from the myLocalityLevels calculation. My only hesitation is that myLocalityLevels does let us avoid the delay timeout when tasks are constrained to run on executors that haven't been granted to the application, so that use case might merit keeping the code (also, if you agree, can you update the myLocalityLevels comment?). In any case, I'd do this in a separate PR.
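For readers following the discussion, here is a rough, self-contained sketch of the delay-scheduling decision being debated. It is a hand-rolled illustration, not TaskSetManager's actual getAllowedLocalityLevel logic: with spark.locality.wait set to 0 every level's wait expires immediately, so even the first offer can be scheduled at any locality level, which is the behavior the new tests assert.

```scala
// Simplified illustration of delay scheduling with a zero wait; not Spark's TaskSetManager.
object DelaySchedulingSketch {
  // Locality levels from most local to least local, mirroring TaskLocality's ordering.
  val levels = Seq("PROCESS_LOCAL", "NODE_LOCAL", "RACK_LOCAL", "ANY")

  // Return the most local level we are still willing to wait at: the first level whose
  // configured wait has not yet elapsed.  Once every wait has elapsed, we allow ANY.
  def allowedLevel(levelWaitsMs: Map[String, Long], msSinceLastLaunch: Long): String = {
    levels.find(level => msSinceLastLaunch < levelWaitsMs.getOrElse(level, 0L))
      .getOrElse("ANY")
  }

  def main(args: Array[String]): Unit = {
    val zeroWait = levels.map(_ -> 0L).toMap           // like spark.locality.wait = 0
    val threeSecondWait = levels.map(_ -> 3000L).toMap // like the default 3s wait

    // With a zero wait, even the very first offer is allowed at ANY locality.
    println(allowedLevel(zeroWait, msSinceLastLaunch = 0L))            // ANY
    // With a 3s wait, early offers are restricted to the most local level...
    println(allowedLevel(threeSecondWait, msSinceLastLaunch = 0L))     // PROCESS_LOCAL
    // ...and only after the wait elapses do we fall back to ANY.
    println(allowedLevel(threeSecondWait, msSinceLastLaunch = 4000L))  // ANY
  }
}
```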