-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-13885][YARN] Fix attempt id regression for Spark running on Yarn #11721
Conversation
…ation running in yarn cluster mode" This reverts commit 9e86e6e.
Test build #53183 has finished for PR 11721 at commit
|
@@ -374,12 +374,6 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli | |||
throw new SparkException("An application name must be set in your configuration") | |||
} | |||
|
|||
// System property spark.yarn.app.id must be set if user code ran by AM on a YARN cluster | |||
if (master == "yarn" && deployMode == "cluster" && !_conf.contains("spark.yarn.app.id")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this was introduced so that people don't try new SparkContext(new SparkConf().setMaster("yarn-cluster"))
and then complain when they get really odd error messages.
Is there another way to do this check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, removing this will still get other error messages if trying to run cluster mode with such way, but maybe a little odd as you mentioned.
LGTM aside from small comment. @steveloughran should take a look since IIRC he wanted the full attempt id for something. |
I think |
@@ -133,12 +133,9 @@ private[spark] class ApplicationMaster( | |||
System.setProperty("spark.master", "yarn") | |||
System.setProperty("spark.submit.deployMode", "cluster") | |||
|
|||
// Propagate the application ID so that YarnClusterSchedulerBackend can pick it up. | |||
// Set this internal configuration to true if it is running on cluster mode, this | |||
// configuration will be checked in SparkContext to avoid misuse of yarn cluster mode. | |||
System.setProperty("spark.yarn.app.id", appAttemptId.getApplicationId().toString()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using configuration is a simple and safe way to check the misuse of configuration, so here I revert back this configuration spark.yarn.app.id
, but only use to check whether app is running on yarn-cluster mode correctly.
Test build #53281 has finished for PR 11721 at commit
|
Test build #53282 has finished for PR 11721 at commit
|
Ping @vazin, what about the current fix, would you please review again, thanks a lot. |
That vazin guy must be wondering why he was pinged. :-) Sorry, I've been busy with other things, I'll try to take a look today. |
LGTM, merging to master. Thanks! |
## What changes were proposed in this pull request? This regression is introduced in apache#9182, previously attempt id is simply as counter "1" or "2". With the change of apache#9182, it is changed to full name as "appattemtp-xxx-00001", this will affect all the parts which uses this attempt id, like event log file name, history server app url link. So here change it back to the counter to keep consistent with previous code. Also revert back this patch apache#11518, this patch fix the url link of history log according to the new way of attempt id, since here we change back to the previous way, so this patch is not necessary, here to revert it. Also clean "spark.yarn.app.id" and "spark.yarn.app.attemptId", since it is useless now. ## How was this patch tested? Test it with unit test and manually test different scenario: 1. application running in yarn-client mode. 2. application running in yarn-cluster mode. 3. application running in yarn-cluster mode with multiple attempts. Checked both the event log file name and url link. CC vanzin tgravescs , please help to review, thanks a lot. Author: jerryshao <sshao@hortonworks.com> Closes apache#11721 from jerryshao/SPARK-13885.
What changes were proposed in this pull request?
This regression is introduced in #9182, previously attempt id is simply as counter "1" or "2". With the change of #9182, it is changed to full name as "appattemtp-xxx-00001", this will affect all the parts which uses this attempt id, like event log file name, history server app url link. So here change it back to the counter to keep consistent with previous code.
Also revert back this patch #11518, this patch fix the url link of history log according to the new way of attempt id, since here we change back to the previous way, so this patch is not necessary, here to revert it.
Also clean "spark.yarn.app.id" and "spark.yarn.app.attemptId", since it is useless now.
How was this patch tested?
Test it with unit test and manually test different scenario:
Checked both the event log file name and url link.
CC @vanzin @tgravescs , please help to review, thanks a lot.