[SPARK-9092] Fixed incompatibility when both num-executors and dynamic... #7657
Conversation
Jenkins, this is ok to test
I think this is better as `! ... _conf.contains(...)`? Although I don't think 0 executors is allowed, this logic would not work if it were.
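To make the contrast concrete, here is a minimal sketch of why a presence check behaves differently from a value check when the user explicitly passes 0. A plain `Map[String, String]` stands in for `SparkConf` here, purely an assumption for self-containment:

```scala
// Value check: treats an explicit "0" the same as "not set at all".
def numExecutorsUnset(conf: Map[String, String]): Boolean =
  conf.getOrElse("spark.executor.instances", "0").toInt == 0

// Presence check (the reviewer's suggested shape): an explicit "0" still counts as set.
def numExecutorsAbsent(conf: Map[String, String]): Boolean =
  !conf.contains("spark.executor.instances")

val explicitZero = Map("spark.executor.instances" -> "0")
```

With `explicitZero`, the value check returns true while the presence check returns false, which is exactly the corner case being flagged.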
The docs in running-on-yarn.md need to be updated to explain the new behavior a little.
yes, +1 to @srowen's comment. We also need to add a comment here to explain why it's not enabled if we set spark.executor.instances, something like:

```scala
// Dynamic allocation and explicitly setting the number of executors are inherently
// incompatible. In environments where dynamic allocation is turned on by default,
// the latter should override the former (SPARK-9092).
```
Since this is now duplicated in a few places, maybe it makes sense to make a `Utils.isDynamicAllocationEnabled(conf)` method and move the comment there instead.
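A sketch of what such a shared helper might look like. Only the name `Utils.isDynamicAllocationEnabled(conf)` comes from the discussion; the body is illustrative, and a `Map[String, String]` stands in for `SparkConf` so the snippet is self-contained:

```scala
object Utils {
  // Dynamic allocation and explicitly setting the number of executors are inherently
  // incompatible. In environments where dynamic allocation is turned on by default,
  // the latter should override the former (SPARK-9092).
  def isDynamicAllocationEnabled(conf: Map[String, String]): Boolean = {
    conf.getOrElse("spark.dynamicAllocation.enabled", "false").toBoolean &&
      !conf.contains("spark.executor.instances")
  }
}
```

Centralizing the check means the explanatory comment lives in exactly one place, and every call site gets the same override semantics.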
The thing I'm not 100% clear on is whether ... CC @sryza and/or @andrewor14 for thoughts.
It depends on whether it's client or cluster mode. In client mode, this is used:

So the command line arg is turned into a sys prop. In cluster mode, this is used:

So the handling is now in YARN's ...

@neurons I suggest getting rid of "--num-executors" in ...
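As a rough illustration of the client-mode path described above, here is a generic sketch of a CLI flag being turned into a conf entry. All names here are hypothetical; the real mapping lives in SparkSubmit and YARN's client-side argument handling, not in this function:

```scala
// Hypothetical sketch: a --num-executors CLI flag becomes the
// spark.executor.instances conf entry / system property in client mode.
def applyCliArgs(conf: Map[String, String], args: List[String]): Map[String, String] =
  args match {
    case "--num-executors" :: n :: rest =>
      applyCliArgs(conf + ("spark.executor.instances" -> n), rest)
    case _ :: rest => applyCliArgs(conf, rest)
    case Nil       => conf
  }
```

The point of the sketch: once the flag is folded into the conf, a single `spark.executor.instances` presence check covers both the conf-file and command-line cases.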
ok to test
Can this test assert something stronger, that we actually end up with 6 executors?
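A sketch of the stronger assertion being asked for, written against a stand-in resolver rather than a real SparkContext. The resolver below mirrors the override logic discussed in this thread; the function name, the illustrative default of 2, and the `Map`-for-`SparkConf` substitution are all assumptions:

```scala
// Illustrative resolver: with spark.executor.instances set explicitly,
// dynamic allocation is overridden and the explicit count wins.
def resolvedExecutorCount(conf: Map[String, String]): Int =
  if (conf.getOrElse("spark.dynamicAllocation.enabled", "false").toBoolean &&
      !conf.contains("spark.executor.instances")) {
    conf.getOrElse("spark.dynamicAllocation.initialExecutors", "0").toInt
  } else {
    conf.getOrElse("spark.executor.instances", "2").toInt // 2: illustrative default
  }

// The stronger test: both settings present, and we actually end up with 6 executors.
assert(resolvedExecutorCount(Map(
  "spark.dynamicAllocation.enabled" -> "true",
  "spark.executor.instances" -> "6")) == 6)
```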
+1 to @vanzin's suggestion. I believe this patch in its current state will not work in cluster mode.

Also, should we at least log a warning that dynamic allocation is overshadowed and not actually used? There is no other way for the user to find out; at best they could wait for N seconds to see if their executors actually got killed.
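A minimal sketch of such a warning. The message text and the helper's shape are assumptions; real Spark code would emit this through its Logging trait's logWarning rather than returning an `Option`:

```scala
// Detect the overshadowed configuration and surface a message for the user.
def overshadowWarning(conf: Map[String, String]): Option[String] =
  if (conf.getOrElse("spark.dynamicAllocation.enabled", "false").toBoolean &&
      conf.contains("spark.executor.instances")) {
    Some("Dynamic allocation is enabled but spark.executor.instances is set " +
         "explicitly; dynamic allocation will be disabled and the explicit " +
         "executor count used.")
  } else {
    None
  }
```

Returning an `Option` here is just to keep the sketch testable; the decision logic (both settings present implies warn) is the part that matters.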
Test build #38736 has finished for PR 7657 at commit
Test build #39119 has finished for PR 7657 at commit
Test build #39397 has finished for PR 7657 at commit
retest this please
Looking good. There's still one place in ...
Test build #192 has finished for PR 7657 at commit
Test build #39559 has finished for PR 7657 at commit
retest this please
Test build #1318 has finished for PR 7657 at commit
Last failure is legit.
BTW you should also remove the `numExecutors` field from this class; that should uncover a couple of other spots where you might also need to fix things.
The previous behavior seems to be to only consider SPARK_WORKER_INSTANCES if spark.executor.instances is not set.
…c allocation are set. Now, dynamic allocation is set to false when num-executors is explicitly specified as an argument. Consequently, executorAllocationManager is not initialized in the SparkContext.
… Modified dependencies of this change deeper down in the code base.
…present before logWarning
Seems like you missed my previous comment so I'll just paste it here, since github makes it hard to find otherwise:
Actually, this code is not right after your changes. The value of spark.executor.instances, previously, was either the value set by the user (if dynamic allocation is disabled), or the value of spark.dynamicAllocation.initialExecutors if dynamic allocation is enabled (see ClientArguments.loadEnvironmentArgs).
So, instead, it should check both cases. Something like:
```scala
@volatile private var targetNumExecutors =
  if (Utils.isDynamicAllocationEnabled(sparkConf)) {
    sparkConf.getInt("spark.dynamicAllocation.initialExecutors", 0)
  } else {
    sparkConf.getInt("spark.executor.instances", YarnSparkHadoopUtil.DEFAULT_NUMBER_EXECUTORS)
  }
```
The way these options are propagated is still a little confusing (and the lack of tests doesn't help), so I hope that's enough. There might be some other cleanup possible, but I'm not going to ask you to go there.
Test build #40312 has finished for PR 7657 at commit
Test build #40376 has finished for PR 7657 at commit
Alright, LGTM. Let's try jenkins again. retest this please

(I'll try to run some local tests later on too.)

Thank you @vanzin and @andrewor14 for your reviews.
Test build #40489 has finished for PR 7657 at commit
Tested this locally and it looks good. @andrewor14 did you have any other comments? Test failures are unrelated (pyspark?) so I'll just ignore them.

Alright, will merge later today.
no need to interpolate here
Looks fine, though I won't have a chance to test this out on a real cluster and I think we should before the release. Feel free to merge it.

I tested yesterday, client and cluster mode, conf and command line options. All seems fine.
Test build #40661 has finished for PR 7657 at commit
42nd time is the charm, I guess. Merging.
…c... … allocation are set. Now, dynamic allocation is set to false when num-executors is explicitly specified as an argument. Consequently, executorAllocationManager is not initialized in the SparkContext. Author: Niranjan Padmanabhan <niranjan.padmanabhan@cloudera.com> Closes #7657 from neurons/SPARK-9092. (cherry picked from commit 738f353) Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
Merged to master and 1.5. Thanks!