
[SPARK-21225][CORE] Considering CPUS_PER_TASK when allocating task slots for each WorkerOffer #18435

Closed
wants to merge 1 commit into apache:master from JackYangzg:motifyTaskCoreDisp

Conversation

@JackYangzg commented Jun 27, 2017

JIRA Issue: https://issues.apache.org/jira/browse/SPARK-21225
In the function "resourceOffers", a variable "tasks" is declared to store the tasks that have been allocated an executor. It is declared like this:
val tasks = shuffledOffers.map(o => new ArrayBuffer[TaskDescription](o.cores))
But this code only considers the situation of one task per core. If the user sets "spark.task.cpus" to 2 or 3, it does not need that much memory. It can be modified as follows:
val tasks = shuffledOffers.map(o => new ArrayBuffer[TaskDescription](o.cores / CPUS_PER_TASK))
Another benefit of this change is that it makes it easier to understand how tasks are allocated to offers.

@@ -345,7 +345,7 @@ private[spark] class TaskSchedulerImpl(

     val shuffledOffers = shuffleOffers(filteredOffers)
     // Build a list of tasks to assign to each worker.
-    val tasks = shuffledOffers.map(o => new ArrayBuffer[TaskDescription](o.cores))
+    val tasks = shuffledOffers.map(o => new ArrayBuffer[TaskDescription](Math.ceil(o.cores * 1.0 / CPUS_PER_TASK).toInt))

Member

Please use math.ceil(o.cores.toDouble / CPUS_PER_TASK).toInt
How much difference can this make? You're saving at most about 8 bytes × the number of cores per offer.

Author


I have modified it to use math.ceil(o.cores.toDouble / CPUS_PER_TASK).toInt,
and each offer can save o.cores * (1 - 1/CPUS_PER_TASK) * length(TaskDescription) bytes.

@JackYangzg force-pushed the motifyTaskCoreDisp branch 2 times, most recently from be633c6 to 0074c3b on June 27, 2017 12:39
@JackYangzg
Author

I have modified it to use math.ceil(o.cores.toDouble / CPUS_PER_TASK).toInt,
and each offer can save o.cores * (1 - 1/CPUS_PER_TASK) * length(TaskDescription) bytes.

@srowen
Member

srowen commented Jun 27, 2017

It doesn't save a TaskDescription's size; it saves the size of a reference only. That's why I'm not sure this is worthwhile, but at the same time, it doesn't really hurt.
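
To illustrate the point, a minimal sketch (TaskDesc is a hypothetical stand-in for Spark's TaskDescription): ArrayBuffer's initialSize argument preallocates a backing array of object references, roughly 8 bytes each on a typical 64-bit JVM, not the objects themselves.

```scala
import scala.collection.mutable.ArrayBuffer

object InitialSizeSketch {
  // Hypothetical stand-in for Spark's TaskDescription.
  final case class TaskDesc(id: Long)

  def main(args: Array[String]): Unit = {
    val cores = 8
    val cpusPerTask = 4

    // Old sizing: preallocates `cores` reference slots up front.
    val oversized = new ArrayBuffer[TaskDesc](cores)
    // New sizing: at most cores / cpusPerTask tasks can be launched per offer.
    val rightSized = new ArrayBuffer[TaskDesc](cores / cpusPerTask)

    // Both buffers start empty; only the preallocated capacity differs,
    // i.e. (cores - cores / cpusPerTask) = 6 unused reference slots, ~8 bytes each.
    println(s"oversized: ${oversized.length}, rightSized: ${rightSized.length}")
  }
}
```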

@JackYangzg
Author

Yes, it only saves a reference.
I have read the related code, and it only affects this code:

    if (availableCpus(i) >= CPUS_PER_TASK) {
      try {
        for (task <- taskSet.resourceOffer(execId, host, maxLocality)) {
          tasks(i) += task  // the only place tasks(i) is appended to
          val tid = task.taskId
          taskIdToTaskSetManager(tid) = taskSet
          taskIdToExecutorId(tid) = execId
          executorIdToRunningTaskIds(execId).add(tid)
          availableCpus(i) -= CPUS_PER_TASK
          assert(availableCpus(i) >= 0)
          launchedTask = true
        }
      } // catch block elided in this excerpt
    }

Obviously, it's safe.

@jerryshao
Contributor

jerryshao commented Jun 28, 2017

From my understanding, this looks like a bug: we didn't consider the CPUS_PER_TASK configuration. Rather than saving memory, I think this PR is more about fixing a bug. As for saving memory, yes it does, but I don't think performance will be affected much.

@jerryshao
Contributor

Besides, is it enough to use o.cores / CPUS_PER_TASK? I'm not sure why we need to use ceil. For example, if we have 10 cores in a worker offer and CPUS_PER_TASK is 3, then 3 slots should be enough. Please correct me if I'm wrong.
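
A minimal sketch of the arithmetic in this example (the cores and CPUS_PER_TASK values are taken from the comment above):

```scala
object SlotSizing {
  def main(args: Array[String]): Unit = {
    val cores = 10
    val cpusPerTask = 3

    // Integer division floors: 10 / 3 = 3 slots.
    val floorSlots = cores / cpusPerTask
    // Ceiling rounds up: ceil(10 / 3.0) = 4 slots.
    val ceilSlots = math.ceil(cores.toDouble / cpusPerTask).toInt

    // The scheduler only launches a task while availableCpus(i) >= CPUS_PER_TASK,
    // so the leftover core (10 - 3 * 3 = 1) can never host a task; the fourth
    // slot from ceil would always stay empty.
    println(s"floor = $floorSlots, ceil = $ceilSlots")
  }
}
```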

@JackYangzg
Author

@jerryshao I agree with your point; it is enough to use o.cores / CPUS_PER_TASK instead. As in the example you list, the one leftover core is not enough to allocate to a task.
In fact, writing it as
val tasks = shuffledOffers.map(o => new ArrayBuffer[TaskDescription](o.cores / CPUS_PER_TASK))
makes it easy to understand how offers are allocated to tasks.
@srowen

@jerryshao
Contributor

Can you please change your PR title and description to reflect the real issue here?

@JackYangzg changed the title [SPARK-21225][CORE] decrease the Mem using for variable 'tasks' in fu… [SPARK-21225][CORE] make it easy understand for offering resources for tasks and saving Mem Jun 28, 2017
@JackYangzg
Author

@jerryshao Thank you, I have changed it.

@SparkQA

SparkQA commented Jun 28, 2017

Test build #3816 has finished for PR 18435 at commit b745dab.

  • This patch fails due to an unknown error code, -10.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JackYangzg
Author

JackYangzg commented Jun 29, 2017

From the Jenkins output, the exception is caused by the loss of the file "target/unit-tests.log". I think it was not caused by this PR, and all the tests look like they passed. Can you restart the test for me? Thank you @srowen

@jerryshao
Contributor

@jiangxb1987 @cloud-fan Can you please review this JIRA? The changes should be safe from my understanding.

Also, can we change the title to: Considering CPUS_PER_TASK when allocating task slots for each WorkerOffer. I think that reflects the issue here more clearly.

@JackYangzg changed the title [SPARK-21225][CORE] make it easy understand for offering resources for tasks and saving Mem [SPARK-21225][CORE] Considering CPUS_PER_TASK when allocating task slots for each WorkerOffer Jun 29, 2017
@JackYangzg
Author

@jerryshao Ok

@srowen
Member

srowen commented Jun 29, 2017

This is only changing the default size of a collection, so I think it's safe to merge.
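
A minimal sketch of why only the default size is at stake: the constructor argument is just a capacity hint, and appending past it simply grows the backing array, so even a too-small hint cannot affect correctness.

```scala
import scala.collection.mutable.ArrayBuffer

object CapacityHintSketch {
  def main(args: Array[String]): Unit = {
    // Deliberately small capacity hint.
    val buf = new ArrayBuffer[Int](2)

    // Appending past the hint triggers a resize, not an error.
    (1 to 10).foreach(buf += _)

    assert(buf.length == 10) // correctness is unaffected by the hint
    println(buf.mkString(", "))
  }
}
```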

@SparkQA

SparkQA commented Jun 29, 2017

Test build #3819 has finished for PR 18435 at commit b745dab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

LGTM, merging to master!

@asfgit closed this in 29bd251 Jun 29, 2017
robert3005 pushed a commit to palantir/spark that referenced this pull request Jun 29, 2017
[SPARK-21225][CORE] Considering CPUS_PER_TASK when allocating task slots for each WorkerOffer


Author: 杨治国10192065 <yang.zhiguo@zte.com.cn>

Closes apache#18435 from JackYangzg/motifyTaskCoreDisp.
@JackYangzg deleted the motifyTaskCoreDisp branch June 30, 2017 03:11