[SPARK-11555] spark on yarn spark-class --num-workers doesn't work #9523
Conversation
Note: this should go into branch-1.6 and branch-1.5 also.
Test build #45226 has finished for PR 9523 at commit.
This looks like the right fix (modulo the long line). Should the param be something like a "current num executors" or a "default"? Its default value is a default value after all. CC @jerryshao
I fixed the Scala style error. I'm fine with changing the name of the parameter, but I don't like "default" as it's not always the default; it could be what the user specified.
'default' with respect to the method... dunno, it seems strange that the purpose of the method is to compute a number of executors to use, but it takes this as a param. Really it's the value to fall back on unless otherwise specified. I don't feel strongly; the fix is right.
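For context, the shape of the method being debated is roughly the following. This is a minimal sketch, not the code from the patch; the names `getTargetExecutorNumber` and `fallback` are illustrative assumptions:

```scala
import org.apache.spark.SparkConf

// Sketch only: compute the executor count, treating the caller-supplied
// value as what to fall back on unless the configuration says otherwise.
def getTargetExecutorNumber(conf: SparkConf, fallback: Int): Int = {
  if (conf.getBoolean("spark.dynamicAllocation.enabled", false)) {
    // With dynamic allocation, the initial target comes from its own setting.
    conf.getInt("spark.dynamicAllocation.initialExecutors", 0)
  } else {
    // The fallback is not always "the default": it may be what the user
    // specified on the command line, which is the naming question above.
    conf.getInt("spark.executor.instances", fallback)
  }
}
```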
Test build #45247 has finished for PR 9523 at commit.

Test build #1997 has finished for PR 9523 at commit.

Test build #45254 has finished for PR 9523 at commit.
LGTM. One day I'd like to see this duplication of command-line arguments vs. SparkConf entries go away... Merging to the 3 branches.
I tested the various options for specifying the number of executors with both spark-submit and spark-class, in both client and cluster mode where it applied: --num-workers, --num-executors, spark.executor.instances, SPARK_EXECUTOR_INSTANCES, and the default with nothing supplied.

Author: Thomas Graves <tgraves@staydecay.corp.gq1.yahoo.com>

Closes #9523 from tgravescs/SPARK-11555.

(cherry picked from commit f6680cd)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
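The test matrix above exercises a precedence chain among the different ways to specify the executor count. A rough sketch of that layering, highest precedence first — the helper name and the hardcoded default of 2 are assumptions, not code from the patch:

```scala
import org.apache.spark.SparkConf

// Sketch: layer the tested sources of the executor count, in order.
def resolveNumExecutors(
    cliValue: Option[Int],  // --num-executors, or the legacy --num-workers
    conf: SparkConf): Int = {
  cliValue
    .orElse(conf.getOption("spark.executor.instances").map(_.toInt))
    .orElse(sys.env.get("SPARK_EXECUTOR_INSTANCES").map(_.toInt))
    .getOrElse(2)           // assumed default when nothing is supplied
}
```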
@tgravescs, sorry for missing this parameter. Maybe we should deprecate it: as @vanzin said, there are currently so many ways to set YARN-related configurations that it is hard to manage, easy to introduce bugs, and quite confusing.
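A deprecation path could be as simple as the following sketch; the check and message text are assumptions, and `println` stands in for Spark's internal logging to keep the example self-contained:

```scala
// Sketch only: warn when the legacy flag is seen during argument parsing.
def warnIfDeprecated(args: Array[String]): Unit = {
  if (args.contains("--num-workers")) {
    println("WARNING: --num-workers is deprecated; use --num-executors " +
      "or spark.executor.instances instead.")
  }
}
```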
Also, this command looks like legacy code. Would it be better to move forward? Staying compatible with old code really makes this part of the configuration messy and inconsistent with the arguments of SparkSubmit.
@jerryshao
Ahhh, very sorry, I misunderstood this PR. Yes, it was my bad to ignore this command-line argument. BTW @srowen, this way of invoking the YARN client already produces a warning.
Looking at the code again, it looks like there's already some code that takes care of this backward compatibility in the YARN client. The problem is that it doesn't take effect as-is, so we could simply fix this by moving this part of the code here:

```scala
// to maintain backwards-compatibility
if (!Utils.isDynamicAllocationEnabled(sparkConf)) {
  sparkConf.set("spark.executor.instances", args.numExecutors.toString)
}
```

That is more consistent with SparkSubmit, and command-line args can then override the configuration. What do you think?
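Presumably the point of moving the block is ordering: args.numExecutors has to be written into spark.executor.instances before anything downstream reads that entry, so an explicit command-line value naturally wins over one set only in the configuration, mirroring how SparkSubmit layers arguments over conf entries.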
Yeah, I meant these things all already generate a warning. I tend to agree with your additional change for this JIRA. Any other opinions?