
[SPARK-1946] Submit tasks after (configured ratio) executors have been registered #900

Closed
wants to merge 13 commits

Conversation

li-zhihui
Contributor

Because submitting tasks and registering executors are asynchronous, in most situations the tasks of early stages run without preferred locality.

A simple workaround is to sleep for a few seconds in the application so that executors have enough time to register.
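For illustration, a minimal sketch of that workaround (the app name, input path, and 5-second figure are arbitrary examples, not from the PR):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("sleep-workaround"))
// Crude workaround: give executors a few seconds to register before the first
// stage is submitted, so its tasks can be scheduled with preferred locality.
Thread.sleep(5000)
val lineCount = sc.textFile("hdfs:///tmp/input").count()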

This PR adds two configuration properties so that the TaskScheduler submits tasks only after a minimum fraction of executors have registered.

Submit tasks only after (registered executors / total executors) reaches this ratio; the default value is 0:

spark.scheduler.minRegisteredExecutorsRatio = 0.8

Regardless of whether minRegisteredExecutorsRatio has been reached, submit tasks once maxRegisteredWaitingTime (in milliseconds) has elapsed; the default value is 30000:

spark.scheduler.maxRegisteredExecutorsWaitingTime = 5000
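As a sketch of how an application would opt in (property names as proposed in this PR; the app name is just an example):

import org.apache.spark.{SparkConf, SparkContext}

// Wait until 80% of the expected executors have registered, but never longer
// than 5 seconds, before the TaskScheduler begins submitting tasks.
val conf = new SparkConf()
  .setAppName("locality-aware-job")
  .set("spark.scheduler.minRegisteredExecutorsRatio", "0.8")
  .set("spark.scheduler.maxRegisteredExecutorsWaitingTime", "5000")
val sc = new SparkContext(conf)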

@AmplabJenkins

Can one of the admins verify this patch?

@pwendell
Contributor

Jenkins, test this please. @li-zhihui could you create a JIRA for this and add it to the title of the PR (e.g. SPARK-XXX)?

/cc @mridul @tgravescs @kayousterhout

@sryza
Contributor

sryza commented May 28, 2014

This should probably be unified with #634

@li-zhihui li-zhihui changed the title submit stage after (configured ratio of) executors have been registered [SPARK-1946]Submit stage after (configured ratio of) executors have been registered May 28, 2014
@li-zhihui
Contributor Author

I created a JIRA and attached a file describing it.

@mridulm

@li-zhihui li-zhihui changed the title [SPARK-1946]Submit stage after (configured ratio of) executors have been registered [SPARK-1946] Submit stage after (configured ratio of) executors have been registered May 28, 2014
@tgravescs
Contributor

This is very similar to SPARK-1453, and this may solve it generically instead of just for YARN.

@kayousterhout
Contributor

This is also related to SPARK-1937 (#892) -- is fixing that sufficient or is it necessary to have this sleep as well? It seems like this sleep is only necessary if the sleep time is less than the expected increase in task completion time as a result of running non-locally.

@tgravescs
Contributor

@kayousterhout @mridulm Looking briefly at PR #892, it seems that it handles locality when executors are added later, and I assume some of the locality wait configs come into effect here, but it doesn't just wait for a certain number of executors (or a period of time) before starting, does it? Are the changes good enough that waiting for a good percentage of the executors no longer matters?

It seems like, depending on your resource manager and how busy a cluster you run on, it could take a while (minutes) to get a large number of executors. I think this should just be a configuration, and user code should not have to "sleep" or work around this.

@mridulm
Contributor

mridulm commented Jun 11, 2014

This one slipped off my radar, my apologies.
@tgravescs In #892, if there is even a single executor that is process-local with any partition, then we start waiting for all levels based on the configured timeouts.
Here we are trying to ensure there are sufficient executors available before we start accepting jobs. The intent is slightly different.

@mridulm
Contributor

mridulm commented Jun 11, 2014

Hit submit by mistake, to continue ...
The side effects of not having sufficient executors are different from #892. For example:
a) the default parallelism in YARN is based on the number of executors (see the sketch below),
b) the number of intermediate files per node for shuffle (this can bring the node down, btw),
c) the amount of memory consumed on a node for RDD MEMORY-persisted data (making the job fail if disk is not specified, like some of the MLlib algos?),
and so on ...
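To illustrate point (a), a simplified sketch of the coarse-grained backend's fallback logic (not the exact source): when spark.default.parallelism is unset, parallelism tracks the cores registered so far, so stages submitted before most executors register get far fewer partitions than intended.

// Simplified sketch of CoarseGrainedSchedulerBackend's default parallelism:
// falls back to the number of cores registered so far when the user has not
// set spark.default.parallelism explicitly.
override def defaultParallelism(): Int = {
  conf.getInt("spark.default.parallelism", math.max(totalCoreCount.get(), 2))
}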

@tgravescs
Contributor

Thanks @mridulm for the clarification. So it seems like this change would be useful and is similar to what we discussed in SPARK-1453. Are there any general concerns over adding configs to wait?

I'm seeing more people running into this and would like to get something implemented so each user doesn't have to put a sleep in their code.

@kayousterhout
Contributor

@li-zhihui it looks like the JIRA you created (https://issues.apache.org/jira/browse/SPARK-1946) describes the issue fixed by #892 and described by a redundant issue (https://issues.apache.org/jira/browse/SPARK-1937). As @mridulm explained (thanks!!), the primary set of issues addressed by this pull request center around the fact that Spark-on-YARN has various performance problems when not enough executors have registered yet. Could you update SPARK-1946 accordingly?

@tgravescs I'm a little nervous about adding more scheduler config options, because I think the average user would have a very difficult time figuring out that their performance problems could be fixed by tuning this particular set of options. The scheduler already has quite a few config options and I think we should be very cautious in adding more (cc @pwendell). On the other hand, as you pointed out, it seems like a user typically wants to wait for some number of executors to become available, and those semantics aren't available to the application -- so we're stuck with adding something to the scheduler code. Is it possible to do this only for the YARN scheduler / do you think it's necessary in standalone too? Doing it only for YARN (and naming the config variable accordingly) could help signal to a naive user when tuning this might help. From @mridulm's description, it sounds like many of the issues here are yarn-specific.

@kayousterhout
Contributor

Also, @li-zhihui, can you add a unit test for this?

Second, it looks like this only works in standalone mode and not for YARN (since, as I understand the YARN code, YARN uses YarnClientSchedulerBackend and not SparkDeploySchedulerBackend)? Was that the intention?

}
while (!backend.isReady) {
  synchronized {
    this.wait(100)
Contributor

Is there a reason for not using wait/notify here?

Contributor Author

Just to keep the code simple. :)
If someone implements more backend implementations or changes backend.isReady somewhere, they won't need to call notify().

But waiting 100 milliseconds may be too long; is 10 milliseconds OK?
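For reference, a hypothetical sketch of the wait/notify alternative being discussed, where the backend signals readiness instead of the scheduler polling isReady every 100 ms (ReadySignal, readyLock, and markReady are illustrative names, not from the PR):

class ReadySignal {
  private val readyLock = new Object
  private var backendReady = false

  // Scheduler side: block until the backend reports readiness.
  def waitBackendReady(): Unit = readyLock.synchronized {
    while (!backendReady) {
      readyLock.wait()
    }
  }

  // Backend side: call once enough executors have registered or the maximum
  // waiting time has elapsed.
  def markReady(): Unit = readyLock.synchronized {
    backendReady = true
    readyLock.notifyAll()
  }
}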

@tgravescs
Contributor

YARN uses CoarseGrainedSchedulerBackend for standalone mode and YarnClientSchedulerBackend (which is based on CoarseGrainedSchedulerBackend) for client mode. I don't see this PR handling that, since it's not incrementing totalExecutors. I haven't looked at it in detail yet, but I would think you could do it in RegisterExecutor, and that would handle both, since SparkDeploySchedulerBackend is based on CoarseGrainedSchedulerBackend as well.
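A rough sketch of that suggestion, counting registrations where CoarseGrainedSchedulerBackend's RegisterExecutor handler could call it so that SparkDeploySchedulerBackend and the YARN backends inherit the behavior; class, field, and method names here are assumptions, not the PR's code:

import java.util.concurrent.atomic.AtomicInteger

class RegistrationTracker(totalExpectedExecutors: Int, minRegisteredRatio: Double) {
  private val registeredExecutors = new AtomicInteger(0)
  @volatile var ready = false

  // Call from the RegisterExecutor handler each time an executor registers.
  def executorRegistered(): Unit = {
    if (registeredExecutors.incrementAndGet() >= minRegisteredRatio * totalExpectedExecutors) {
      ready = true
    }
  }
}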

@tgravescs
Contributor

If people think it's not useful for other deployment modes, then I can look into making it YARN-specific. We already have a small sleep to help with this in YarnClusterScheduler.

@li-zhihui
Contributor Author

@tgravescs @kayousterhout The PR only works in standalone mode now, but it provides an abstract method isReady() in SchedulerBackend.scala for all backend implementations.

For YARN mode, it seems better to use the number of registered executors as the threshold.

@li-zhihui li-zhihui changed the title [SPARK-1946] Submit stage after (configured ratio of) executors have been registered [SPARK-1946] Submit stage after (configured number) executors have been registered Jun 12, 2014
@li-zhihui
Contributor Author

@tgravescs @mridulm @kayousterhout I added a commit that submits stages only after a configured number of executors have registered. I think it will work well in YARN mode.

Submit stages only after the number of successfully registered executors reaches this value; the default value is 0:

spark.executor.minRegisteredNum = 20

@sryza
Contributor

sryza commented Jun 12, 2014

Sorry to jump in late on this, but I think spark.executor.minRegisteredNum sounds like an executor property, when this is a property of the driver.

@li-zhihui
Contributor Author

Thanks @sryza How about spark.scheduler.minRegisteredExecutors?

ready = true
return true
}
return false
Contributor

No need to use "return" in the last statement, I think.

Contributor Author

Thanks @CrazyJvm. But if I remove "return", the method always returns true. I don't know why. I'm using Scala 2.10.3.

Contributor

override def isReady(): Boolean = {
  if ((System.currentTimeMillis() - createTime) >= maxRegisteredWaitingTime) {
    ready = true
  }
  ready
}

Contributor Author

Thanks @adrian-wang, but I think it's necessary to return true quickly, because ready is true most of the time.
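A sketch of the short-circuit variant being described here, checking the common case first (it assumes the same ready, createTime, and maxRegisteredWaitingTime fields as the snippet above):

// Fast path: once ready, stay ready without re-checking the clock.
override def isReady(): Boolean = {
  if (ready) {
    return true
  }
  if ((System.currentTimeMillis() - createTime) >= maxRegisteredWaitingTime) {
    ready = true
  }
  ready
}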

Contributor Author

Thanks @CrazyJvm, I made a mistake; the last "return" is not necessary.

@li-zhihui
Contributor Author

@tgravescs @sryza I added a commit supporting the feature (percentage style) for YARN mode; it was tested successfully on YARN 2.2.0.

Regarding the default value of minRegisteredRatio: YARN mode is 0.9, standalone mode is 0.

@li-zhihui li-zhihui changed the title [SPARK-1946] Submit stage after (configured number) executors have been registered [SPARK-1946] Submit stage after (configured ratio) executors have been registered Jun 16, 2014
@@ -77,6 +77,12 @@ private[spark] class YarnClientSchedulerBackend(

logDebug("ClientArguments called with: " + argsArrayBuf)
val args = new ClientArguments(argsArrayBuf.toArray, conf)
totalExecutors.set(args.numExecutors)
Contributor

Note that YARN has 2 modes - yarn-client and yarn-cluster. This changes it for yarn-client mode, but not yarn-cluster mode; yarn-cluster mode currently just uses CoarseGrainedSchedulerBackend directly.

Contributor Author

Thanks @tgravescs

@li-zhihui
Contributor Author

@tgravescs I added a commit supporting yarn-cluster.

One small issue: YarnClusterSchedulerBackend currently can't get --num-executors as totalExecutors (spark-default.xml is OK).
I will follow up on it.
(Fixed)
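A hypothetical sketch of how a yarn-cluster backend could derive the expected executor count when --num-executors is not propagated; the property name and default below are illustrative assumptions, not the PR's exact code:

// Fall back to a default when the executor count was not passed through to
// the cluster-mode backend (sketch only; names are illustrative).
val DEFAULT_NUMBER_EXECUTORS = 2
val expectedExecutors =
  sc.getConf.getInt("spark.executor.instances", DEFAULT_NUMBER_EXECUTORS)
totalExpectedExecutors.set(expectedExecutors)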

@li-zhihui
Contributor Author

@tgravescs @kayousterhout
I added a new commit according to the comments.

@li-zhihui
Contributor Author

@tgravescs @kayousterhout
I moved waitBackendReady back to the submitTasks method, because it (waitBackendReady in the start method) does not work in yarn-cluster mode (NullPointerException because SparkContext initialization times out); yarn-client is OK. (abandoned)

@li-zhihui
Contributor Author

@tgravescs @kayousterhout
If waitBackendReady is in TaskSchedulerImpl.start, it leads to a logical deadlock in yarn-cluster mode.

How about moving it (waitBackendReady) to postStartHook()?
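A minimal sketch of what moving the wait into postStartHook might look like inside TaskSchedulerImpl (method names follow the snippets discussed above, not necessarily the final code):

// Block in postStartHook rather than in start(), so the SparkContext can
// finish initializing first in yarn-cluster mode.
override def postStartHook() {
  waitBackendReady()
}

private def waitBackendReady(): Unit = {
  if (backend.isReady) {
    return
  }
  while (!backend.isReady) {
    synchronized {
      this.wait(100)
    }
  }
}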

val conf = scheduler.sc.conf
private val timeout = AkkaUtils.askTimeout(conf)
private val akkaFrameSize = AkkaUtils.maxFrameSizeBytes(conf)
// Submit tasks only after (registered executors / total expected executors)
// is equal to at least this value, that is double between 0 and 1.
var minRegisteredRatio = conf.getDouble("spark.scheduler.minRegisteredExecutorsRatio", 0)
Contributor

Please add the new configs to the user docs - see docs/configuration.md

Contributor Author

@tgravescs Done.

@tgravescs
Copy link
Contributor

postStartHook seems like a good place to put it. It will only be called once there, and as you said, yarn-cluster mode actually waits for the SparkContext to be initialized before allocating executors.

It looks like we aren't handling Mesos. We should at least file a JIRA for this.
@li-zhihui did you look at Mesos at all?

For the YARN side where you added the TODOs about the sleep: I think we can leave them here, as there is another JIRA to remove them.

@li-zhihui
Contributor Author

Thanks @tgravescs
I will file a new JIRA for handling Mesos and follow up on it after the PR is merged.

<td><code>spark.scheduler.maxRegisteredExecutorsWaitingTime</code></td>
<td>30000</td>
<td>
Whatever (registered executors / total expected executors) is reached
Contributor

I think we should clarify both of these a bit, because it's really that you start when either one is hit, so adding a reference to maxRegisteredExecutorsWaitingTime in the description of minRegisteredExecutorsRatio would be good.

How about something like below? Note I'm not a doc writer, so I'm fine with changing it.

For spark.scheduler.minRegisteredExecutorsRatio:
The minimum ratio of registered executors (registered executors / total expected executors) to wait for before scheduling begins. Specified as a double between 0 and 1. Regardless of whether the minimum ratio of executors has been reached, the maximum amount of time it will wait before scheduling begins is controlled by the config spark.scheduler.maxRegisteredExecutorsWaitingTime.

Then for spark.scheduler.maxRegisteredExecutorsWaitingTime:
Maximum amount of time to wait for executors to register before scheduling begins (in milliseconds).

@li-zhihui
Contributor Author

@tgravescs I added a commit according to the comments.

@tgravescs
Contributor

Looks good. Thanks @li-zhihui

@asfgit asfgit closed this in 3dd8af7 Jul 14, 2014
gzm55 pushed a commit to MediaV/spark that referenced this pull request Jul 18, 2014
…n registered

Because submitting tasks and registering executors are asynchronous, in most situation, early stages' tasks run without preferred locality.

A simple solution is sleeping few seconds in application, so that executors have enough time to register.

The PR add 2 configuration properties to make TaskScheduler submit tasks after a few of executors have been registered.

\# Submit tasks only after (registered executors / total executors) arrived the ratio, default value is 0
spark.scheduler.minRegisteredExecutorsRatio = 0.8

\# Whatever minRegisteredExecutorsRatio is arrived, submit tasks after the maxRegisteredWaitingTime(millisecond), default value is 30000
spark.scheduler.maxRegisteredExecutorsWaitingTime = 5000

Author: li-zhihui <zhihui.li@intel.com>

Closes apache#900 from li-zhihui/master and squashes the following commits:

b9f8326 [li-zhihui] Add logs & edit docs
1ac08b1 [li-zhihui] Add new configs to user docs
22ead12 [li-zhihui] Move waitBackendReady to postStartHook
c6f0522 [li-zhihui] Bug fix: numExecutors wasn't set & use constant DEFAULT_NUMBER_EXECUTORS
4d6d847 [li-zhihui] Move waitBackendReady to TaskSchedulerImpl.start & some code refactor
0ecee9a [li-zhihui] Move waitBackendReady from DAGScheduler.submitStage to TaskSchedulerImpl.submitTasks
4261454 [li-zhihui] Add docs for new configs & code style
ce0868a [li-zhihui] Code style, rename configuration property name of minRegisteredRatio & maxRegisteredWaitingTime
6cfb9ec [li-zhihui] Code style, revert default minRegisteredRatio of yarn to 0, driver get --num-executors in yarn/alpha
812c33c [li-zhihui] Fix driver lost --num-executors option in yarn-cluster mode
e7b6272 [li-zhihui] support yarn-cluster
37f7dc2 [li-zhihui] support yarn mode(percentage style)
3f8c941 [li-zhihui] submit stage after (configured ratio of) executors have been registered
asfgit pushed a commit that referenced this pull request Aug 9, 2014
…lone mode

In SPARK-1946(PR #900), configuration <code>spark.scheduler.minRegisteredExecutorsRatio</code> was introduced. However, in standalone mode, there is a race condition where isReady() can return true because totalExpectedExecutors has not been correctly set.

Because expected executors is uncertain in standalone mode, the PR try to use CPU cores(<code>--total-executor-cores</code>) as expected resources to judge whether SchedulerBackend is ready.

Author: li-zhihui <zhihui.li@intel.com>
Author: Li Zhihui <zhihui.li@intel.com>

Closes #1525 from li-zhihui/fixre4s and squashes the following commits:

e9a630b [Li Zhihui] Rename variable totalExecutors and clean codes
abf4860 [Li Zhihui] Push down variable totalExpectedResources to children classes
ca54bd9 [li-zhihui] Format log with String interpolation
88c7dc6 [li-zhihui] Few codes and docs refactor
41cf47e [li-zhihui] Fix race condition at SchedulerBackend.isReady in standalone mode
(cherry picked from commit 28dbae8)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>