SPARK-13688: Add spark.dynamicAllocation.overrideNumInstances. #11528
When both the number of executor instances and dynamic allocation are configured, dynamic allocation is ignored and the number of executors is statically set. This introduces a configuration property to change this behavior so that cluster administrators can make dynamic allocation the default when users set --num-executors instead of spark.dynamicAllocation.minExecutors and unintentionally disable dynamic allocation for a job.
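The precedence described above can be sketched as follows. This is an illustrative stand-alone sketch, not Spark's actual code: `SimpleConf` is a hypothetical Map-backed stand-in for `SparkConf`, though the property names are the real Spark ones.

```scala
// Illustrative sketch of the existing precedence: a static executor
// count wins over dynamic allocation. SimpleConf is a hypothetical
// Map-backed stand-in for SparkConf, used to keep the example runnable.
case class SimpleConf(settings: Map[String, String]) {
  def getInt(key: String, default: Int): Int =
    settings.get(key).map(_.toInt).getOrElse(default)
  def getBoolean(key: String, default: Boolean): Boolean =
    settings.get(key).map(_.toBoolean).getOrElse(default)
}

def isDynamicAllocationEnabled(conf: SimpleConf): Boolean = {
  val numExecutors = conf.getInt("spark.executor.instances", 0)
  val dynamicEnabled = conf.getBoolean("spark.dynamicAllocation.enabled", false)
  // Any static executor count disables dynamic allocation entirely.
  numExecutors == 0 && dynamicEnabled
}

// Both configured: dynamic allocation is silently dropped.
val both = SimpleConf(Map(
  "spark.dynamicAllocation.enabled" -> "true",
  "spark.executor.instances" -> "10"))
println(isDynamicAllocationEnabled(both)) // false
```

This is the surprise the patch targets: the user who sets `--num-executors` gets static allocation even though the admin enabled dynamic allocation cluster-wide.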
Can one of the admins verify this patch?
I am against this change. We're creating a flag to control the behavior of two other flags. This level of complexity is difficult to maintain. We've already established the precedence order in SPARK-9092, so we shouldn't try to change that. Besides, the simple workaround is just to not set one of the configs.
@andrewor14, thanks for taking a look at this so quickly. From what we've seen, what you suggest isn't a viable work-around in practice because the two properties are set at different times and aren't obviously related. We default jobs to dynamic allocation, so that's an admin configuration most users don't see (unless they want to opt in to static allocation). Users set up their applications, and it isn't clear to them at that time that requesting some number of executors results in static allocation. They also don't necessarily see the notification in their job logs. I think a reasonable interpretation is that --num-executors would control the minimum or the initial number of executors. It's fine with me that the default goes the other way, but we lose a lot of resources to accidental static allocation. I think this additional option is worth the trade-off, but I'm happy to discuss other approaches. The main problem with your suggestion is just that users aren't aware that the two are related.
def isDynamicAllocationEnabled(conf: SparkConf): Boolean = {
  val numExecutor = conf.getInt("spark.executor.instances", 0)
  val dynamicAllocationEnabled = conf.getBoolean("spark.dynamicAllocation.enabled", false)
  val overrideExecutors = conf.getBoolean("spark.dynamicAllocation.overrideNumInstances", false)
  // Dynamic allocation applies when no static executor count is set, or
  // when the new override property lets it win over --num-executors.
  (numExecutor == 0 || overrideExecutors) && dynamicAllocationEnabled
}
I think this just piles onto the confusion though, with yet another flag to control flags. I think this is a documentation issue, and it already generates warnings when you set these things together. I personally favor leaving it as is.
I agree it's an issue for something like CDH that sets dynamic allocation by default, which is a good default. I still think the best solution is docs.
I agree with the above that we shouldn't add another config for this. I would rather see the default behavior changed in the 2.x line: if the dynamic allocation config is on, then --num-executors sets the max and initial # of executors. I think this would allow users to easily cap their usage while still allowing Spark to free up executors. It would also let users doing ML start out with a # of executors, and if they are actually caching the data, the executors wouldn't be freed up, so you would get very similar behavior to having dynamic allocation off. Thoughts? We have been running into this issue a lot recently on our clusters, where we see a lot of wasted resources; dynamic allocation would help, but users are so used to specifying --num-executors that they don't end up using it.
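The alternative proposed above could look roughly like this. `Conf` and `effectiveLimits` are hypothetical names used only to make the sketch self-contained; they are not Spark APIs, and the semantics are the commenter's proposal, not shipped behavior.

```scala
// Sketch of the proposed 2.x behavior: with dynamic allocation on, a
// user-supplied --num-executors becomes both the initial and maximum
// executor count instead of turning dynamic allocation off.
// Conf and effectiveLimits are hypothetical, not Spark's real API.
case class Conf(settings: Map[String, String]) {
  def getInt(key: String, default: Int): Int =
    settings.get(key).map(_.toInt).getOrElse(default)
}

// Returns (initialExecutors, maxExecutors) under the proposed semantics.
def effectiveLimits(conf: Conf): (Int, Int) = {
  val numExecutors = conf.getInt("spark.executor.instances", 0)
  if (numExecutors > 0) {
    // Cap usage at the requested count, but still release idle executors.
    (numExecutors, numExecutors)
  } else {
    (conf.getInt("spark.dynamicAllocation.initialExecutors", 0),
     conf.getInt("spark.dynamicAllocation.maxExecutors", Int.MaxValue))
  }
}
```

Under this scheme a job submitted with --num-executors 10 would start at 10 and never exceed 10, but idle executors could still be returned to the cluster.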
Filed https://issues.apache.org/jira/browse/SPARK-13723 for this so we can move discussion there.
OK, either way I think we've decided we're not going to proceed with the changes in this patch. @rdblue, would you mind closing this PR? We'll continue the discussion on the JIRA.
This includes unit tests for the new cases and updates to documentation.