[SPARK-13723] [YARN] Change behavior of --num-executors with dynamic allocation.#13338
rdblue wants to merge 3 commits into apache:master from rdblue/SPARK-13723-num-executors-with-dynamic-allocation
Conversation
Test build #59423 has finished for PR 13338 at commit
Maybe we should instead decide which setting takes precedence when both are set; using max may not be a good choice.
I'm not overly concerned with this as long as it's clear to users which way we chose. It feels like the user would use --num-executors more easily or accidentally than specifying spark.dynamicAllocation.initialExecutors.
How about leaving it as max and just printing a warning if we find both set, telling them what we chose.
I'm also not sure there is a good way to do this right now with the ConfigEntry/Builder stuff though. @vanzin
I don't think the config API can handle both calls to math.max here... you could replace the inner one with a fallback config (initialExecutors falls back to the value of spark.executor.instances if not set), although even then the semantics are slightly different.
In any case, this might be more readable as:
Seq(
  conf.getInt("spark.dynamicAllocation.initialExecutors", 0),
  conf.getInt("spark.executor.instances", 0),
  getDynamicAllocationMinExecutors(conf)).max
Assuming I got the semantics right.
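For reference, a rough sketch of what the fallback idea above would amount to when reading the value (illustrative only, not the PR's code; conf is a SparkConf and getDynamicAllocationMinExecutors is the existing helper):

// initialExecutors falls back to spark.executor.instances when it is not set, so only
// one explicit max against the minimum remains; as noted, the semantics differ slightly
// from taking the max of all three values.
val initial = conf.getOption("spark.dynamicAllocation.initialExecutors")
  .orElse(conf.getOption("spark.executor.instances"))
  .map(_.toInt)
  .getOrElse(0)
val target = math.max(initial, getDynamicAllocationMinExecutors(conf))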
I'm not sure what you mean by the config API can't handle both calls to max.
I'm happy to make the change for this to be more readable, but I think we should decide on semantics first. I think a reasonable way to handle this is to use the max of all 3, as implemented. The min value should clearly be handled this way -- starting at initial less than min causes an immediate change to the min -- so the question is how to handle both spark.executor.instances and spark.dynamicAllocation.initialExecutors.
I think it's reasonable to use either one in different situations so I don't think it makes much sense to complain if they're both set. My job could have initialExecutors set in its config, but I can run it with --num-executors to bump up the value. Given that use case, I don't think it would be a good idea to have initialExecutors override spark.executor.instances.
That leaves whether spark.executor.instances should override initialExecutors or whether the initial number should be the max of the two. I don't have a strong opinion here, but I opted for the max of the two values.
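As a concrete illustration of the max-of-the-three semantics (hypothetical numbers, not taken from the PR): suppose the job config sets spark.dynamicAllocation.minExecutors=2 and spark.dynamicAllocation.initialExecutors=3, and the user runs with --num-executors 10 (i.e. spark.executor.instances=10).

// The initial target is the max of the three values, so --num-executors bumps the
// start-up size to 10 while dynamic allocation stays enabled.
val initialTarget = Seq(2, 3, 10).max  // == 10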
> I'm not sure what you mean by the config API can't handle both calls to max.

I was replying to Tom's question, you can ignore that part.
I'm fine with max, and as I mentioned I agree that users might use, or accidentally use, --num-executors to bump up the value. Although I'd rather not encourage that behavior because it intermixes these things, and --num-executors is ambiguous about its behavior (depending on whether dynamic allocation is on or off), which is why I suggested the warning. The warning was also in case someone specified --num-executors and thought they were getting a static number of executors but instead had dynamic allocation on, so it's only the initial count.
I would also be fine with just an info message here printing what allocation method we are using and the number chosen.
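For illustration, the kind of message being suggested might look like this (hypothetical wording; initialExecutors stands for whatever value was chosen):

// Warn when both settings are present so users know which value won.
logWarning(s"Dynamic allocation is enabled. Using $initialExecutors initial executors, " +
  "the max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors, " +
  "and spark.executor.instances.")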
As stated below, we should use DYN_ALLOCATION_MIN_EXECUTORS, DYN_ALLOCATION_INITIAL_EXECUTORS, and DYN_ALLOCATION_MAX_EXECUTORS.
Yep, don't add these methods. Instead:
import org.apache.spark.internal.config._
conf.get(DYN_ALLOCATION_MIN_EXECUTORS)
Also recommend going through your change and modifying all the places that hardcode the config key to use the constants.
What's the value of these constants over util methods? These constants live in the yarn module, but there are uses in the ExecutorAllocationManager in core. Rather than having the constants in two places, with possibly different defaults, I thought a util method was appropriate.
> What's the value of these constants over util methods?

They encapsulate key, type and default value already, so you'd avoid duplicating those in your code.

> These constants live in the yarn module

No, they're in core/src/main/scala/org/apache/spark/internal/config/package.scala.

> Rather than having the constants in two places, with possibly different defaults, I thought a util method was appropriate.

That's exactly what those constants are.
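For context, this is roughly how those entries are declared in core/src/main/scala/org/apache/spark/internal/config/package.scala (paraphrased, not copied verbatim):

// Each ConfigEntry bundles the key, the value type, and the default in one place.
private[spark] val DYN_ALLOCATION_MIN_EXECUTORS =
  ConfigBuilder("spark.dynamicAllocation.minExecutors").intConf.createWithDefault(0)

// spark.executor.instances has no default, so callers read it as an Option[Int].
private[spark] val EXECUTOR_INSTANCES =
  ConfigBuilder("spark.executor.instances").intConf.createOptional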
I see, so those aren't Strings. Thanks, I'll fix it.
I think we should also update the spark-submit --help for --num-executors.
Thanks for reviewing, everyone! I've made some comments and will update once we have consensus on util methods and semantics.
Force-pushed from 5f69549 to b27a5b8
@tgravescs, where is the --help text for spark-submit? I'll update it.
I just pushed a version that addresses the comments so far (other than the spark-submit help text). Thanks for looking at this, @vanzin, @jerryshao, and @tgravescs!
Test build #60423 has finished for PR 13338 at commit
It's in SparkSubmitArguments.scala, in the printUsageAndExit function.
Previously, using --num-executors or spark.executor.instances with dynamic allocation enabled would disable dynamic allocation. This updates the behavior so these settings change the initial number of executors and do not disable dynamic allocation.
Force-pushed from b27a5b8 to 47cff98
Seq(
  conf.get(DYN_ALLOCATION_MIN_EXECUTORS),
  conf.get(DYN_ALLOCATION_INITIAL_EXECUTORS),
  conf.get(EXECUTOR_INSTANCES).getOrElse(0)).max
Do we need to support the environment variable SPARK_EXECUTOR_INSTANCES, since it is not officially deprecated?
EXECUTOR_INSTANCES isn't deprecated. This is a conf (spark.executor.instances); it's not documented, but I'm pretty sure I've seen people using it instead of --num-executors, and it is what SparkSubmit uses.
Sorry, I was mistaken, it is documented on the YARN configuration page.
I think what I mean is the env variable SPARK_EXECUTOR_INSTANCES; we should also consider this variable when computing the dynamic allocation initial executors (in case someone is using it).
Oh, I see. Well, we didn't support it before in this routine, and I would prefer not to add more support for env variables, as I would rather see them all removed. So I would say no, at least not under this JIRA. The only place I see it being used is in YarnSparkHadoopUtil.getInitialTargetExecutorNumber, so I would say we file a separate JIRA to remove it and clean up comments in code in other places.
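For reference, a rough sketch (not the exact Spark code) of the env-var handling in YarnSparkHadoopUtil.getInitialTargetExecutorNumber being referred to here:

// When dynamic allocation is off, SPARK_EXECUTOR_INSTANCES seeds the default number of
// executors (the fallback of 2 is assumed here for illustration), and
// spark.executor.instances can still override it.
val defaultNumExecutors = 2
val fromEnv = sys.env.get("SPARK_EXECUTOR_INSTANCES")
  .map(_.toInt)
  .getOrElse(defaultNumExecutors)
val targetNumExecutors = conf.getInt("spark.executor.instances", fromEnv)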
Test build #60427 has finished for PR 13338 at commit
Test build #60426 has finished for PR 13338 at commit
We need to update the description for spark.executor.instances in running-on-yarn.md, as it says dynamic allocation will be turned off when it's specified. Otherwise LGTM.
ping @rdblue
I really want to get this into the 2.0 release since it's a change in behavior. If you don't have time, I'll put up a PR with the minor changes.
@tgravescs, I've updated it. Sorry about the delay; for some reason the notifications for this issue didn't make it to my inbox, so I wasn't seeing updates.
Test build #61118 has finished for PR 13338 at commit
+1, thanks @rdblue
[SPARK-13723] [YARN] Change behavior of --num-executors with dynamic allocation.

## What changes were proposed in this pull request?

This changes the behavior of --num-executors and spark.executor.instances when using dynamic allocation. Instead of turning dynamic allocation off, it uses the value for the initial number of executors. This change was discussed on [SPARK-13723](https://issues.apache.org/jira/browse/SPARK-13723). I highly recommend using it while we can change the behavior for 2.0.0. In practice, the 1.x behavior causes unexpected behavior for users (it is not clear that it disables dynamic allocation) and wastes cluster resources because users rarely notice the log message.

## How was this patch tested?

This patch updates tests and adds a test for Utils.getDynamicAllocationInitialExecutors.

Author: Ryan Blue <blue@apache.org>

Closes #13338 from rdblue/SPARK-13723-num-executors-with-dynamic-allocation.

(cherry picked from commit 738f134)
Signed-off-by: Tom Graves <tgraves@yahoo-inc.com>
Thanks @tgravescs!
What changes were proposed in this pull request?
This changes the behavior of --num-executors and spark.executor.instances when using dynamic allocation. Instead of turning dynamic allocation off, it uses the value for the initial number of executors.
This change was discussed on SPARK-13723. I highly recommend using it while we can change the behavior for 2.0.0. In practice, the 1.x behavior causes unexpected behavior for users (it is not clear that it disables dynamic allocation) and wastes cluster resources because users rarely notice the log message.
How was this patch tested?
This patch updates tests and adds a test for Utils.getDynamicAllocationInitialExecutors.
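A minimal sketch of what such a test could look like (assuming the helper is exposed as Utils.getDynamicAllocationInitialExecutors(conf: SparkConf): Int; illustrative, not the PR's exact test):

import org.apache.spark.SparkConf
import org.apache.spark.util.Utils

// The initial target is the max of minExecutors, initialExecutors, and spark.executor.instances.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.minExecutors", "2")
  .set("spark.dynamicAllocation.initialExecutors", "3")
  .set("spark.executor.instances", "10")
assert(Utils.getDynamicAllocationInitialExecutors(conf) == 10)

// With only minExecutors set, the minimum becomes the initial target.
val confMinOnly = new SparkConf().set("spark.dynamicAllocation.minExecutors", "4")
assert(Utils.getDynamicAllocationInitialExecutors(confMinOnly) == 4)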