[SPARK-21225][CORE] Considering CPUS_PER_TASK when allocating task slots for each WorkerOffer #18435
Conversation
@@ -345,7 +345,7 @@ private[spark] class TaskSchedulerImpl(
     val shuffledOffers = shuffleOffers(filteredOffers)
     // Build a list of tasks to assign to each worker.
-    val tasks = shuffledOffers.map(o => new ArrayBuffer[TaskDescription](o.cores))
+    val tasks = shuffledOffers.map(o => new ArrayBuffer[TaskDescription](Math.ceil(o.cores*1.0/CPUS_PER_TASK).toInt))
Please use math.ceil(o.cores.toDouble / CPUS_PER_TASK).toInt
How much difference can this make? You're saving at most about 8 bytes × the number of cores per offer.
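As a side note, a quick standalone check (not from the PR) that the suggested form computes the same value as the form in the PR; the request is purely stylistic:

```scala
object CeilStyleCheck {
  def main(args: Array[String]): Unit = {
    val CPUS_PER_TASK = 2 // illustrative value of spark.task.cpus
    for (cores <- 1 to 8) {
      val original  = Math.ceil(cores * 1.0 / CPUS_PER_TASK).toInt    // form in the PR
      val suggested = math.ceil(cores.toDouble / CPUS_PER_TASK).toInt // reviewer's form
      assert(original == suggested)
      println(s"cores=$cores -> slots=$suggested")
    }
  }
}
```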
I have modified it to use math.ceil(o.cores.toDouble / CPUS_PER_TASK).toInt, and each offer can save o.cores * (1 - 1/CPUS_PER_TASK) * length(TaskDescription) bytes.
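For concreteness, a rough back-of-the-envelope sketch of the numbers under discussion, assuming (as the earlier comment does) about 8 bytes per reference on a 64-bit JVM; the figures are illustrative, not measured:

```scala
object SavingEstimate {
  def main(args: Array[String]): Unit = {
    val refBytes      = 8  // assumed reference size; 4 with compressed oops
    val CPUS_PER_TASK = 2  // illustrative value of spark.task.cpus
    val cores         = 16 // hypothetical WorkerOffer
    val before = cores                                           // old initial capacity
    val after  = math.ceil(cores.toDouble / CPUS_PER_TASK).toInt // new initial capacity
    println(s"capacity $before -> $after, ~${(before - after) * refBytes} bytes saved per offer")
  }
}
```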
(force-pushed from be633c6 to 0074c3b)
It doesn't save a TaskDescription's size; it saves the size of a reference only. That's why I'm not sure this is worthwhile, but at the same time, it doesn't really hurt.
Yes, it only saves a reference.
From my understanding, this looks like a bug here: we didn't consider the CPUS_PER_TASK configuration. Rather than saving memory, I think this PR is more about fixing a bug. As for saving memory, yes it does save some, but I don't think performance will be affected much.
Besides, is it enough to use o.cores / CPUS_PER_TASK instead? If there is one core left over, it is not enough to run another task, so rounding up is unnecessary.
@jerryshao I agree with your point; it is enough to use o.cores / CPUS_PER_TASK instead. As in the example you list, the one leftover core is not enough to allocate a task.
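A minimal sketch of that argument, using illustrative numbers (5 cores, CPUS_PER_TASK = 2) that are assumptions, not values from the thread:

```scala
object FloorVsCeil {
  def main(args: Array[String]): Unit = {
    val CPUS_PER_TASK = 2 // illustrative value of spark.task.cpus
    val cores         = 5 // hypothetical offer with one leftover core
    val floorSlots = cores / CPUS_PER_TASK                           // 2
    val ceilSlots  = math.ceil(cores.toDouble / CPUS_PER_TASK).toInt // 3
    // Only two tasks can actually be scheduled: a third would need a
    // sixth core, so the extra slot from ceil could never be filled.
    println(s"floor=$floorSlots, ceil=$ceilSlots, schedulable tasks=$floorSlots")
  }
}
```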
(force-pushed from 0074c3b to b745dab)
Can you please change your PR title and description to reflect the real issue here?
@jerryshao Thank you, I have changed it.
Test build #3816 has finished for PR 18435 at commit
From the Jenkins output, the exception is caused by the missing file "target/unit-tests.log". I think it is not caused by this PR, and all the tests look like they passed. Can you restart the test for me? Thank you @srowen
@jiangxb1987 @cloud-fan Can you please review this JIRA? The changes should be safe from my understanding. Also, can we change the title to "Considering CPUS_PER_TASK when allocating task slots for each WorkerOffer"? I think that reflects the issue here more clearly.
@jerryshao Ok
This is only changing the default size of a collection, so I think it's safe to merge.
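To see why it is safe, a standalone sketch (plain Scala, not Spark code) showing that ArrayBuffer's constructor argument is only an initial capacity hint, so an undersized buffer still grows as needed:

```scala
import scala.collection.mutable.ArrayBuffer

object CapacityHint {
  def main(args: Array[String]): Unit = {
    val buf = new ArrayBuffer[Int](2) // starts empty with room for 2 elements
    (1 to 5).foreach(buf += _)        // grows past the hint without any error
    println(buf)                      // ArrayBuffer(1, 2, 3, 4, 5)
  }
}
```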
Test build #3819 has finished for PR 18435 at commit
LGTM, merging to master!
[SPARK-21225][CORE] Considering CPUS_PER_TASK when allocating task slots for each WorkerOffer

JIRA Issue: https://issues.apache.org/jira/browse/SPARK-21225

In the function resourceOffers, a variable tasks is declared to store the tasks that have been allocated an executor. It is declared like this:

val tasks = shuffledOffers.map(o => new ArrayBuffer[TaskDescription](o.cores))

This code only considers the situation of one task per core. If the user sets spark.task.cpus to 2 or 3, the buffers really don't need that much memory, so the line can be changed to:

val tasks = shuffledOffers.map(o => new ArrayBuffer[TaskDescription](o.cores / CPUS_PER_TASK))

Another benefit of this change is that it makes it easier to understand how tasks are allocated to offers.

Author: 杨治国10192065 <yang.zhiguo@zte.com.cn>

Closes apache#18435 from JackYangzg/motifyTaskCoreDisp.
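For reference, a self-contained sketch of the merged behavior; WorkerOffer and TaskDescription are simplified stand-ins here, not the real Spark classes:

```scala
import scala.collection.mutable.ArrayBuffer

case class WorkerOffer(executorId: String, host: String, cores: Int)
case class TaskDescription(taskId: Long) // stand-in for Spark's TaskDescription

object ResourceOffersSketch {
  val CPUS_PER_TASK = 2 // illustrative value of spark.task.cpus

  def main(args: Array[String]): Unit = {
    val shuffledOffers = Seq(WorkerOffer("1", "hostA", 5), WorkerOffer("2", "hostB", 8))
    // The merged line: one initial buffer slot per schedulable task, not per core.
    val tasks = shuffledOffers.map(o => new ArrayBuffer[TaskDescription](o.cores / CPUS_PER_TASK))
    shuffledOffers.zip(tasks).foreach { case (o, buf) =>
      println(s"offer with ${o.cores} cores -> room for ${o.cores / CPUS_PER_TASK} tasks (currently ${buf.size})")
    }
  }
}
```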