
[SPARK-13723] [YARN] Change behavior of --num-executors with dynamic allocation.#13338

Closed
rdblue wants to merge 3 commits into apache:master from rdblue:SPARK-13723-num-executors-with-dynamic-allocation

Conversation

@rdblue
Contributor

@rdblue rdblue commented May 26, 2016

What changes were proposed in this pull request?

This changes the behavior of --num-executors and spark.executor.instances when using dynamic allocation. Instead of turning dynamic allocation off, it uses the value as the initial number of executors.

This change was discussed on SPARK-13723. I highly recommend making this change while we can still change the behavior for 2.0.0. In practice, the 1.x behavior causes unexpected results for users (it is not clear that it disables dynamic allocation) and wastes cluster resources because users rarely notice the log message.
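For illustration only, a minimal sketch of the proposed semantics (the hardcoded keys and zero defaults here are assumptions for the sketch; the actual patch uses the config constants discussed below and lands in Utils.getDynamicAllocationInitialExecutors, quoted later in this thread):

import org.apache.spark.SparkConf

// Sketch: with dynamic allocation enabled, start from the largest of the configured
// minimum, the configured initial value, and --num-executors / spark.executor.instances.
def initialExecutors(conf: SparkConf): Int =
  Seq(
    conf.getInt("spark.dynamicAllocation.minExecutors", 0),
    conf.getInt("spark.dynamicAllocation.initialExecutors", 0),
    conf.getInt("spark.executor.instances", 0)).max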

How was this patch tested?

This patch updates tests and adds a test for Utils.getDynamicAllocationInitialExecutors.

@SparkQA

SparkQA commented May 27, 2016

Test build #59423 has finished for PR 13338 at commit 5f69549.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

Maybe we should instead define which one takes precedence when both are set; using max may not be a good choice.

Contributor

I'm not overly concerned with this as long as it's clear to users which way we chose. It feels like a user would reach for --num-executors more readily, or accidentally, than spark.dynamicAllocation.initialExecutors.

How about leaving it as max and just printing a warning when we find both set, telling the user which value we chose?

I'm also not sure there is a good way to do this right now with the ConfigEntry/Builder stuff, though. @vanzin

Contributor

I don't think the config API can handle both calls to math.max here... you could replace the inner one with a fallback config (initialExecutors falls back to the value of spark.executor.instances if not set), although even then the semantics are slightly different.

In any case, this might be more readable as:

Seq(
  conf.getInt("spark.dynamicAllocation.initialExecutors", 0),
  conf.getInt("spark.executor.instances", 0),
  getDynamicAllocationMinExecutors(conf)).max

Assuming I got the semantics right.
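For what it's worth, a rough sketch of the fallback semantics being described, written as plain conf lookups rather than the ConfigEntry mechanism (getDynamicAllocationMinExecutors is the helper from this PR; the "first one set wins" behavior is what makes the semantics slightly different from taking the max):

// Fallback semantics (not what the patch implements): initialExecutors wins if set,
// otherwise spark.executor.instances, otherwise the configured minimum.
conf.getOption("spark.dynamicAllocation.initialExecutors")
  .orElse(conf.getOption("spark.executor.instances"))
  .map(_.toInt)
  .getOrElse(getDynamicAllocationMinExecutors(conf))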

Contributor Author

I'm not sure what you mean by the config API can't handle both calls to max.

I'm happy to make the change for this to be more readable, but I think we should decide on semantics first. I think a reasonable way to handle this is to use the max of all three, as implemented. The min value should clearly be handled this way -- starting with an initial value below the minimum would just cause an immediate ramp up to the minimum -- so the question is how to handle both spark.executor.instances and spark.dynamicAllocation.initialExecutors.

I think it's reasonable to use either one in different situations, so I don't think it makes much sense to complain if they're both set. My job could have initialExecutors set in its config, but I can run it with --num-executors to bump up the value. Given that use case, I don't think it would be a good idea to have initialExecutors override spark.executor.instances.

That leaves whether spark.executor.instances should override initialExecutors or whether the initial number should be the max of the two. I don't have a strong opinion here, but I opted for the max of the two values.
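A concrete example of the max-of-the-two behavior (numbers made up for illustration): if a job's config sets spark.dynamicAllocation.initialExecutors=10 and it is submitted with --num-executors 50, the initial target becomes 50 (the max of the two, provided it is at least the configured minimum) and dynamic allocation stays enabled; submitting the same job without --num-executors starts it at 10.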

Contributor

> I'm not sure what you mean by the config API can't handle both calls to max.

I was replying to Tom's question, you can ignore that part.

Contributor

I'm fine with max, and as I mentioned, I agree that users might deliberately or accidentally use --num-executors to bump up the value. I'd rather not encourage that, though, because it intermixes the two mechanisms and --num-executors is ambiguous (its behavior depends on whether dynamic allocation is on or off), which is why I suggested the warning. The warning was also for the case where someone specifies --num-executors thinking they will get a static number of executors but actually has dynamic allocation on, so it only sets the initial number.

I would also be fine with just an info message here printing which allocation method we are using and the number chosen.

@rdblue rdblue changed the title SPARK-13723: Change behavior of --num-executors with dynamic allocation. [SPARK-13723] [YARN] Change behavior of --num-executors with dynamic allocation. Jun 2, 2016
Contributor

As stated below, we should use DYN_ALLOCATION_MIN_EXECUTORS, DYN_ALLOCATION_INITIAL_EXECUTORS, and DYN_ALLOCATION_MAX_EXECUTORS.

Contributor

Yep, don't add these methods. Instead:

import org.apache.spark.internal.config._

conf.get(DYN_ALLOCATION_MIN_EXECUTORS)

I'd also recommend going through your change and modifying all the places that hardcode the config keys to use the constants.

Contributor Author

What's the value of these constants over util methods? These constants live in the yarn module, but there are uses in the ExecutorAllocationManager in core. Rather than having the constants in two places, with possibly different defaults, I thought a util method was appropriate.

Contributor

> What's the value of these constants over util methods?

They encapsulate key, type and default value already, so you'd avoid duplicating those in your code.

> These constants live in the yarn module

No, they're in core/src/main/scala/org/apache/spark/internal/config/package.scala.

> Rather than having the constants in two places, with possibly different defaults, I thought a util method was appropriate.

That's exactly what those constants are.
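For reference, these entries are declared with the ConfigBuilder API; roughly like this (a sketch, not an exact copy of package.scala):

private[spark] val DYN_ALLOCATION_MIN_EXECUTORS =
  ConfigBuilder("spark.dynamicAllocation.minExecutors")
    .intConf
    .createWithDefault(0)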

Contributor Author

I see, so those aren't Strings. Thanks, I'll fix it.

Contributor Author

Fixed.

@tgravescs
Contributor

I think we should also update the spark-submit --help text for --num-executors.

@rdblue
Contributor Author

rdblue commented Jun 10, 2016

Thanks for reviewing, everyone! I've made some comments and will update once we have consensus on util methods and semantics.

@rdblue rdblue force-pushed the SPARK-13723-num-executors-with-dynamic-allocation branch from 5f69549 to b27a5b8 on June 13, 2016 20:11
@rdblue
Contributor Author

rdblue commented Jun 13, 2016

@tgravescs, where is the --help text for spark-submit? I'll update it.

@rdblue
Contributor Author

rdblue commented Jun 13, 2016

I just pushed a version that addresses the comments so far (other than the spark-submit help text). Thanks for looking at this, @vanzin, @jerryshao, and @tgravescs!

@SparkQA

SparkQA commented Jun 13, 2016

Test build #60423 has finished for PR 13338 at commit b27a5b8.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs
Contributor

It's in SparkSubmitArguments.scala, in the printUsageAndExit function.
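For reference, the entry is part of the usage string built in printUsageAndExit; the updated help text could read along these lines (suggested wording only, not the merged text):

--num-executors NUM         Number of executors to launch (Default: 2).
                            If dynamic allocation is enabled, the initial number of
                            executors will be at least NUM.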

Previously, using --num-executors or spark.executor.instances with
dynamic allocation enabled would disable dynamic allocation. This
updates the behavior so these settings change the initial number of
executors and do not disable dynamic allocation.
@rdblue rdblue force-pushed the SPARK-13723-num-executors-with-dynamic-allocation branch from b27a5b8 to 47cff98 on June 13, 2016 20:29
Seq(
  conf.get(DYN_ALLOCATION_MIN_EXECUTORS),
  conf.get(DYN_ALLOCATION_INITIAL_EXECUTORS),
  conf.get(EXECUTOR_INSTANCES).getOrElse(0)).max
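A minimal sketch of how this expression might be wrapped as the new helper (assuming it lives in Utils, per the tests mentioned in the description; simplified, not the exact merged code):

import org.apache.spark.SparkConf
import org.apache.spark.internal.config._

// Initial executor target under dynamic allocation: the largest of the configured
// minimum, the configured initial value, and spark.executor.instances (0 if unset).
def getDynamicAllocationInitialExecutors(conf: SparkConf): Int =
  Seq(
    conf.get(DYN_ALLOCATION_MIN_EXECUTORS),
    conf.get(DYN_ALLOCATION_INITIAL_EXECUTORS),
    conf.get(EXECUTOR_INSTANCES).getOrElse(0)).max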
Contributor

Do we need to support the environment variable SPARK_EXECUTOR_INSTANCES? It is not officially deprecated.

Contributor

EXECUTOR_INSTANCES isn't deprecated. This is a conf (spark.executor.instances); it's not documented, but I'm pretty sure I've seen people using it instead of --num-executors, and it is what SparkSubmit uses.

Contributor

Sorry, I was mistaken; it is documented on the YARN configuration page.

Contributor

What I mean is the env variable SPARK_EXECUTOR_INSTANCES; we should also consider it when computing the dynamic allocation initial executors (in case someone is using it).

Contributor

Oh, I see. We didn't support it in this routine before, and I would prefer not to add more support for env variables, since I would rather see them all removed. So I would say no, at least not under this JIRA. The only place I see it being used is YarnSparkHadoopUtil.getInitialTargetExecutorNumber, so I would say we file a separate JIRA to remove it and clean up the related comments elsewhere in the code.

@SparkQA

SparkQA commented Jun 13, 2016

Test build #60427 has finished for PR 13338 at commit bf22b5a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 13, 2016

Test build #60426 has finished for PR 13338 at commit 47cff98.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs
Contributor

We need to update the description of spark.executor.instances in running-on-yarn.md, since it says dynamic allocation will be turned off when it is specified.

Otherwise LGTM.

@tgravescs
Contributor

ping @rdblue

@tgravescs
Contributor

I really want to get this into the 2.0 release since it's a change in behavior. If you don't have time, I'll put up a PR with the minor changes.

@rdblue
Contributor Author

rdblue commented Jun 23, 2016

@tgravescs, I've updated it. Sorry about the delay; for some reason the notifications for this issue didn't make it to my inbox, so I wasn't seeing updates.

@SparkQA

SparkQA commented Jun 23, 2016

Test build #61118 has finished for PR 13338 at commit 0352883.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs
Contributor

+1, thanks @rdblue

asfgit pushed a commit that referenced this pull request Jun 23, 2016
…llocation.

## What changes were proposed in this pull request?

This changes the behavior of --num-executors and spark.executor.instances when using dynamic allocation. Instead of turning dynamic allocation off, it uses the value as the initial number of executors.

This change was discussed on [SPARK-13723](https://issues.apache.org/jira/browse/SPARK-13723). I highly recommend making this change while we can still change the behavior for 2.0.0. In practice, the 1.x behavior causes unexpected results for users (it is not clear that it disables dynamic allocation) and wastes cluster resources because users rarely notice the log message.

## How was this patch tested?

This patch updates tests and adds a test for Utils.getDynamicAllocationInitialExecutors.

Author: Ryan Blue <blue@apache.org>

Closes #13338 from rdblue/SPARK-13723-num-executors-with-dynamic-allocation.

(cherry picked from commit 738f134)
Signed-off-by: Tom Graves <tgraves@yahoo-inc.com>
@asfgit asfgit closed this in 738f134 Jun 23, 2016
@rdblue
Contributor Author

rdblue commented Jun 23, 2016

Thanks @tgravescs!
