[SPARK-33446][CORE] Add config spark.executor.coresOverhead #30370
Conversation
Can one of the admins verify this patch?
I am not sure I understand what the use case here is. To answer the JIRA example: if the underlying resource allocation is in blocks of 1 core / 3 GB, then requesting a 6 GB container will result in wasting 1 core. Do we have cases where only cores are considered and not memory? (The reverse used to be the case in YARN about a decade back, where allocation was based only on memory, with cores inferred.) To give a slightly different example: if memory allocation is in multiples of 1 GB, asking for a 1.5 GB executor would give you a 2 GB container, with Xmx set to 1.5 GB, 'ignoring' the additional 0.5 GB [1]. [1] Purposefully ignoring memory overhead for simplicity of explanation.
Your understanding is correct. For your example, if we request 1 core + 1 coreOverhead and 6 GB, then this executor can only run 1 task. The coreOverhead won't be considered for task scheduling, so we can still dedicate the full 6 GB to 1 task and its partition.
Given this, why do we want to add core overhead? What is the use case for it?
We want 2 cores and 6 GB, but with only 1 task allocated. Then we can guarantee that the 1 task gets 6 GB, plus extra CPU time.
I am not sure I follow.
In other words, the Spark application requests what it needs, and the cluster provides the next higher multiple that satisfies the requirements. If this means extra memory or additional cores, Spark won't use them. I am trying to understand what the actual use case is, and why the existing configs won't help. Thanks!
I want an executor with 2 cores and 6 GB, but with only 1 core used for task allocation, which means at most 1 task can run on this executor. With the existing configs, there would be at most 2 tasks. I want to give each task 6 GB of memory, but also use the extra CPU time for GC or other things.
GC is handled by the VM; we don't need an additional core for it. Having said that, if the requirement is strictly what you mentioned (allocate 2 cores and 6 GB, run only 1 task, and leave 1 core around for "other things", which I assume is some background processing? Not sure what is happening), you can use spark.task.cpus=2.
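The workaround above follows from Spark's scheduling arithmetic: an executor offers floor(spark.executor.cores / spark.task.cpus) concurrent task slots. A minimal sketch of that calculation (plain Python for illustration, not Spark code):

```python
def task_slots(executor_cores: int, task_cpus: int) -> int:
    """Concurrent tasks one executor can run:
    floor(spark.executor.cores / spark.task.cpus)."""
    return executor_cores // task_cpus

# With the default spark.task.cpus=1, a 2-core executor runs 2 tasks,
# so each task effectively shares the executor's 6 GB heap.
print(task_slots(2, 1))  # 2 concurrent tasks

# Setting spark.task.cpus=2 leaves exactly 1 slot, so the single task
# gets the whole 6 GB while the executor still holds 2 cores.
print(task_slots(2, 2))  # 1 concurrent task
```

This is why spark.task.cpus=2 reproduces the effect the proposed coresOverhead config was after: the second core is requested from the cluster but never yields a second task slot.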
I'll try to use spark.task.cpus |
What changes were proposed in this pull request?
Add config spark.executor.coresOverhead
Why are the changes needed?
This can handle a mismatch in the memory-per-core ratio between executors and the underlying physical machines or VMs.
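As a hypothetical illustration of the ratio mismatch (the 1 core / 3 GB block size here is an assumption borrowed from the discussion above, not a property of any particular cluster manager): when a cluster allocates resources in fixed blocks, a request whose memory-per-core ratio differs from the block's ratio strands one of the two resources.

```python
import math

# Hypothetical cluster allocation block: 1 core and 3 GB per block.
BLOCK_CORES, BLOCK_MEM_GB = 1, 3

def blocks_needed(cores: int, mem_gb: int) -> int:
    """Blocks required to satisfy both the core and the memory request."""
    return max(math.ceil(cores / BLOCK_CORES),
               math.ceil(mem_gb / BLOCK_MEM_GB))

# Requesting 1 core + 6 GB needs 2 blocks, which carry 2 cores:
# 1 core is allocated but unused by task scheduling.
n = blocks_needed(cores=1, mem_gb=6)
print(n * BLOCK_CORES)  # 2 cores allocated for a 1-core request
```

The proposed config would let the application label that stranded core as overhead instead of exposing it as an extra task slot.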
Does this PR introduce any user-facing change?
Yes.
How was this patch tested?
Covered in UT.