[SPARK-7699][Core] Lazy start the scheduler for dynamic allocation #6430
Conversation
Test build #33571 timed out for PR 6430 at commit …

Jenkins, retest this please.

If my understanding is correct, this affects the add behavior, making it so that … Can we change the logic so that we don't reduce …

Hey @sryza, so you mean we only consider the …

Exactly.

OK, let me change the logic to handle this.

Test build #33633 has finished for PR 6430 at commit …

Test build #33650 has finished for PR 6430 at commit …

Test build #33652 has finished for PR 6430 at commit …

These `compareAndSet` calls can just be `set`, I think. In fact, a `@volatile` boolean is OK too. I don't know which is simpler.

+1 to using a `@volatile` boolean.
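
For context, a minimal sketch of the two alternatives being discussed, assuming the flag in question is the allocation manager's initialization marker (illustrative only; note that Scala's @volatile annotation is only legal on a var, not a val):

import java.util.concurrent.atomic.AtomicBoolean

object FlagAlternatives {
  // Alternative 1: AtomicBoolean. compareAndSet is only needed if the caller
  // must know it was the one to flip the flag; a plain set() is enough when
  // the transition just has to happen.
  private val initializingAtomic = new AtomicBoolean(true)
  def finishInitAtomic(): Unit = initializingAtomic.set(false)

  // Alternative 2: a volatile var, which guarantees cross-thread visibility
  // without locking and reads more simply.
  @volatile private var initializing: Boolean = true
  def finishInit(): Unit = initializing = false
}
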
This looks like a pretty reasonable change: attempt to defer evaluating a change in executors until something happens. I suppose this still means that something like …

Can we rename this to `inInitialExecutorsPeriod` and add a comment explaining the criteria for being in this period?

I would just call this `@volatile private var initializing`.

Then could you add a comment to explain what this means, maybe something like:
// Whether we are still waiting for the initial set of executors to be allocated.
// While this is true, we will not cancel outstanding executor requests. This is
// set to false when:
// (1) the initial executors have all registered,
// (2) a stage is submitted, or
// (3) an executor idle timeout has elapsed.
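
As a concrete sketch, that suggested comment could sit directly on the flag declaration (names follow the review discussion above, not necessarily the merged code):

// Whether we are still waiting for the initial set of executors to be
// allocated. While this is true, we will not cancel outstanding executor
// requests. This is set to false when:
//   (1) the initial executors have all registered,
//   (2) a stage is submitted, or
//   (3) an executor idle timeout has elapsed.
@volatile private var initializing: Boolean = true
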
Hi @andrewor14, do we need condition (1), "the initial executors have all registered", to change `initializing` to false?
Previously in this patch we set `initializing` to false when:
- a stage is submitted, or
- an executor times out due to being idle (default is 60 secs).
Now we add another condition, so we may hit this scenario: all the initial executors have registered, but no stage has been submitted and no executor has reached the idle timeout. The executors then have a chance to ramp down in this window, and I think that will introduce additional requests when a stage is submitted later. Is this the intended behavior?

Hi all, I looked at the code and I think it works. However, I think there is a slightly simpler alternative. … Then the rest of the code in …

style:
private def createSparkContext(
minExecutors: Int = 1,
maxExecutors: Int = 5,
initialExecutors: Int = 1): SparkContext = {
...
}
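
For illustration, a hedged sketch of how such a test helper might be implemented inside the suite; the body below is an assumption (only the signature is quoted from the review), though the spark.dynamicAllocation.* keys are standard configuration:

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical helper following the suggested signature: builds a local
// SparkContext with dynamic allocation bounds set through SparkConf.
private def createSparkContext(
    minExecutors: Int = 1,
    maxExecutors: Int = 5,
    initialExecutors: Int = 1): SparkContext = {
  val conf = new SparkConf()
    .setMaster("local")
    .setAppName("ExecutorAllocationManagerSuite")
    .set("spark.dynamicAllocation.enabled", "true")
    .set("spark.dynamicAllocation.minExecutors", minExecutors.toString)
    .set("spark.dynamicAllocation.maxExecutors", maxExecutors.toString)
    .set("spark.dynamicAllocation.initialExecutors", initialExecutors.toString)
    // Testing flag so the manager can run without an external shuffle service.
    .set("spark.dynamicAllocation.testing", "true")
  new SparkContext(conf)
}
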
@andrewor14 it's the other way around. We're OK with going above `initialExecutors` during the initialization period, but we don't want to go below it.

Hm, that seems to be so. I'm just wondering if there's a better way to simplify the cases so we don't have nested … This is fairly easy to reason about. For the new initialization logic, it would be good to add it as a new separate case instead of nesting it in an existing case, something like: … Then in …

Another related problem, with or without my suggested change: we actually call …

Actually, does it make sense to do this inside the `maxNeeded < numExecutorsTarget` case? That condition implies we are targeting more executors than are needed, and so should cancel outstanding requests. This is not really true during initialization.
I think a better indication that your application has initialized is whether the number of executors is >= your initial executors. I discussed this in more detail on the main thread.
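
A sketch of that check, assuming `executorIds` tracks registered executors and `initialNumExecutors` comes from spark.dynamicAllocation.initialExecutors (both names are illustrative):

// Consider the application initialized once at least the initially
// requested number of executors has registered, regardless of whether
// any stage has been submitted yet.
if (initializing && executorIds.size >= initialNumExecutors) {
  initializing = false
}

Note that later in the thread this registration condition is dropped, since stage submission and the idle timeout turn out to be sufficient triggers.
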
To avoid the problem @andrewor14 brought up, can we avoid calling …

Thanks a lot @andrewor14 and @sryza for your comments. Sorry for the late response; I will try to understand your comments and change the code accordingly :).

@sryza and @andrewor14, do we also need to avoid requesting executors here and in `addExecutors` when `oldNumExecutorsTarget == numExecutorsTarget` and we are not in the initializing state?

this was fixed in 51898b5
Hi all, I just updated the code according to your comments; please help to review, thanks a lot. There are two open questions I didn't address yet: …
Please help to address these concerns. Thanks a lot, and I appreciate your time.

This is a little verbose, so change it to `logDebug`.

OK, but this conflicts with a patch I just merged.

Test build #33878 has finished for PR 6430 at commit …

Test build #33881 has finished for PR 6430 at commit …

This is pretty long-winded. I think we can just do
if (initializing) {
} else if (maxNeeded < numExecutorsTarget) {
} ...
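
Expanded into a hedged sketch, the restructured method might look like the following; the method and field names mirror the discussion, but the branch bodies are schematic rather than the merged code:

private def updateAndSyncNumExecutorsTarget(now: Long): Int = synchronized {
  val maxNeeded = maxNumExecutorsNeeded

  if (initializing) {
    // Keep the target untouched while initializing, so the initial
    // executors are not ramped down before the first job arrives.
    0
  } else if (maxNeeded < numExecutorsTarget) {
    // We are targeting more executors than needed: lower the target and
    // sync with the cluster manager to cancel outstanding requests.
    val oldNumExecutorsTarget = numExecutorsTarget
    numExecutorsTarget = math.max(maxNeeded, minNumExecutors)
    client.requestTotalExecutors(numExecutorsTarget)
    numExecutorsTarget - oldNumExecutorsTarget
  } else if (addTime != NOT_SET && now >= addTime) {
    // The scheduler has been backlogged long enough: ramp up.
    addExecutors(maxNeeded)
  } else {
    0
  }
}
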
@jerryshao Thanks for the update. The latest changes look fairly reasonable.
(1) I think you're correct. We don't actually need to wait for the initial set of executors to register before setting …
(2) I was thinking we don't actually need to sync with the cluster manager to get our executors while we're initializing. We don't do that currently, and I don't see why we need to add that. This …

Hi @andrewor14, I just updated the code according to your comments; please help to review, thanks a lot.

Test build #34151 has finished for PR 6430 at commit …

LGTM. @jerryshao, have you had a chance to test this on a real cluster?

I just tried it in my local pseudo-cluster for functionality; I haven't tried it on a real cluster for performance. I will test it.

@sryza I'm planning to pull this in. Any other comments?

This looks right to me. Merging.

This patch proposes to lazily start the scheduler for dynamic allocation, to avoid quickly ramping down the number of executors when load is low.
This implementation will:
1. Immediately start the scheduler if `numExecutorsTarget` is 0; this is the expected behavior.
2. If `numExecutorsTarget` is not zero, delay starting the scheduler until that number is satisfied. If load is low, the initially started executors will last for at least 60 seconds, so the user has a window to submit a job without the executors being ramped down.
3. If `numExecutorsTarget` is still not satisfied at the timeout, resources are insufficient; the scheduler starts at the timeout rather than waiting indefinitely.

Please help to review, thanks a lot.

Author: jerryshao <saisai.shao@intel.com>

Closes apache#6430 from jerryshao/SPARK-7699 and squashes the following commits:

02cac8e [jerryshao] Address the comments
7242450 [jerryshao] Remove the useless import
ecc0b00 [jerryshao] Address the comments
6f75f00 [jerryshao] Style changes
8b8decc [jerryshao] change the test name
fb822ca [jerryshao] Change the solution according to comments
1cc74e5 [jerryshao] Lazy start the scheduler for dynamic allocation
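
For reference, one way a user could exercise this behavior; the configuration keys are standard dynamic-allocation settings, and the values are an assumed example rather than part of the patch:

import org.apache.spark.{SparkConf, SparkContext}

object DynamicAllocationExample {
  def main(args: Array[String]): Unit = {
    // Request 4 executors up front. With this patch, they are not ramped
    // down during the initial window merely because no job has arrived yet;
    // the normal 60s idle timeout still applies afterwards.
    val conf = new SparkConf()
      .setAppName("dynamic-allocation-example")
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true")
      .set("spark.dynamicAllocation.initialExecutors", "4")
      .set("spark.dynamicAllocation.minExecutors", "1")
      .set("spark.dynamicAllocation.maxExecutors", "10")
      .set("spark.dynamicAllocation.executorIdleTimeout", "60s")
    val sc = new SparkContext(conf)
    // ... submit jobs as usual ...
    sc.stop()
  }
}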