
Conversation

@rdblue (Contributor) commented Mar 4, 2016

When both the number of executor instances and dynamic allocation are configured, dynamic allocation is ignored and the number of executors is set statically. This introduces a configuration property to change that behavior, so that cluster administrators can keep dynamic allocation as the default even when users set --num-executors (rather than spark.dynamicAllocation.minExecutors) and would otherwise unintentionally disable dynamic allocation for a job.

This includes unit tests for the new cases and updates to documentation.
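For context, a minimal spark-shell style sketch (not from the PR) of the configuration interaction being described; the overrideNumInstances property name comes from this PR's diff, and the comments describe the proposed behavior, not what Spark does today:

import org.apache.spark.SparkConf

// Cluster-wide defaults an administrator would normally put in spark-defaults.conf.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true") // external shuffle service, needed for dynamic allocation
  .set("spark.dynamicAllocation.overrideNumInstances", "true") // the property this PR proposes

// A user who submits with --num-executors 50 effectively adds:
conf.set("spark.executor.instances", "50")

// Without this patch, spark.executor.instances wins and dynamic allocation is disabled
// (aside from a log warning). With the proposed flag, dynamic allocation would remain
// enabled despite the explicit executor count.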

@AmplabJenkins

Can one of the admins verify this patch?

@andrewor14 (Contributor)

I am against this change. We're creating a flag to control the behavior of two other flags, and that level of complexity is difficult to maintain. We already established the precedence order in SPARK-9092, so we shouldn't try to change it. Besides, the simple workaround is just to not set one of the configs.

@rdblue (Contributor, Author) commented Mar 5, 2016

@andrewor14, thanks for taking a look at this so quickly.

From what we've seen, the workaround you suggest isn't viable in practice, because the two properties are set at different times and aren't obviously related. We default jobs to dynamic allocation, so that's an admin configuration most users never see (unless they want to opt in to static allocation). Users set up their applications, and it isn't clear to them at that time that requesting some number of executors results in static allocation. They also don't necessarily notice the warning in their job logs.

I think a reasonable interpretation is that --num-executors would control the minimum or the initial number of executors. It's fine with me that the default goes the other way, but we lose a lot of resources to accidental static allocation. I think this additional option is worth the trade-off, but I'm happy to discuss other approaches. The main problem with your suggestion is just that users aren't aware that the two are related.
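For illustration only, a sketch of that interpretation, separate from the actual diff hunk quoted below; resolveInitialExecutors is a hypothetical helper, not code from this PR:

import org.apache.spark.SparkConf

// Hypothetical reading of --num-executors under dynamic allocation: treat an explicit
// spark.executor.instances as the initial/minimum executor count instead of a switch
// back to static allocation.
def resolveInitialExecutors(conf: SparkConf): Int = {
  val numExecutors = conf.getInt("spark.executor.instances", 0)
  val configuredInitial = conf.getInt("spark.dynamicAllocation.initialExecutors",
    conf.getInt("spark.dynamicAllocation.minExecutors", 0))
  math.max(numExecutors, configuredInitial)
}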

def isDynamicAllocationEnabled(conf: SparkConf): Boolean = {
  val numExecutor = conf.getInt("spark.executor.instances", 0)
  val dynamicAllocationEnabled = conf.getBoolean("spark.dynamicAllocation.enabled", false)
  val overrideExecutors = conf.getBoolean("spark.dynamicAllocation.overrideNumInstances", false)
  // Return reconstructed from the PR description: dynamic allocation applies when it is
  // enabled and either no explicit executor count is set or the proposed override flag is on.
  dynamicAllocationEnabled && (numExecutor == 0 || overrideExecutors)
}
Inline review comment from a Member on the isDynamicAllocationEnabled hunk above:

I think this just piles onto the confusion though, with yet another flag to control flags. I think this is a documentation issue, and it already generates warnings when you set these things together. I personally favor leaving it as is.

@srowen (Member) commented Mar 5, 2016

Letting --num-executors control the initial number of executors is more reasonable to me. We unfortunately already have a different flag for that, which makes it kind of funny. It would also only control the initial number of executors; someone who thinks "this job needs more resource" really wants to raise the max, or to not enable dynamic allocation at all.

I agree it's an issue for something like CDH that sets dynamic allocation by default, which is a good default. I still think the best solution is docs.

@tgravescs (Contributor)

I agree with the above that we shouldn't add another config for this. I would rather see the default behavior changed in the 2.x line: if the dynamic allocation config is on, then num-executors sets both the maximum and the initial number of executors. I think this would let users easily cap their usage while still allowing idle executors to be freed. It would also let users doing ML start out with a number of executors, and if they are actually caching the data, those executors wouldn't be freed up. So you would get behavior very similar to having dynamic allocation off.

thoughts?

We have been running into this a lot recently on our clusters: we see a lot of wasted resources, and dynamic allocation would help, but users are so used to specifying --num-executors that they end up disabling it.
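A sketch of the precedence @tgravescs describes above; applyNumExecutorsAsBounds is hypothetical and not part of this PR or of Spark:

import org.apache.spark.SparkConf

// Hypothetical: with dynamic allocation enabled, let --num-executors act as both the
// initial and the maximum executor count, so the user-supplied number caps usage
// while idle executors can still be released.
def applyNumExecutorsAsBounds(conf: SparkConf): SparkConf = {
  val numExecutors = conf.getInt("spark.executor.instances", 0)
  if (conf.getBoolean("spark.dynamicAllocation.enabled", false) && numExecutors > 0) {
    conf.set("spark.dynamicAllocation.initialExecutors", numExecutors.toString)
    conf.set("spark.dynamicAllocation.maxExecutors", numExecutors.toString)
  }
  conf
}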

@tgravescs (Contributor)

Filed https://issues.apache.org/jira/browse/SPARK-13723 for this so we can move the discussion there.

@andrewor14 (Contributor)

OK, either way I think we've decided we're not going to proceed with the changes in this patch. @rdblue would you mind closing this PR? We'll continue the discussion on the JIRA.

@rdblue (Contributor, Author) commented Mar 30, 2016

Close

@rdblue closed this Mar 30, 2016