Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-13885][YARN] Fix attempt id regression for Spark running on Yarn #11721

Closed
wants to merge 4 commits into from

Conversation

jerryshao
Copy link
Contributor

What changes were proposed in this pull request?

This regression is introduced in #9182, previously attempt id is simply as counter "1" or "2". With the change of #9182, it is changed to full name as "appattemtp-xxx-00001", this will affect all the parts which uses this attempt id, like event log file name, history server app url link. So here change it back to the counter to keep consistent with previous code.

Also revert back this patch #11518, this patch fix the url link of history log according to the new way of attempt id, since here we change back to the previous way, so this patch is not necessary, here to revert it.

Also clean "spark.yarn.app.id" and "spark.yarn.app.attemptId", since it is useless now.

How was this patch tested?

Test it with unit test and manually test different scenario:

  1. application running in yarn-client mode.
  2. application running in yarn-cluster mode.
  3. application running in yarn-cluster mode with multiple attempts.

Checked both the event log file name and url link.

CC @vanzin @tgravescs , please help to review, thanks a lot.

@SparkQA
Copy link

SparkQA commented Mar 15, 2016

Test build #53183 has finished for PR 11721 at commit 6339a01.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -374,12 +374,6 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
throw new SparkException("An application name must be set in your configuration")
}

// System property spark.yarn.app.id must be set if user code ran by AM on a YARN cluster
if (master == "yarn" && deployMode == "cluster" && !_conf.contains("spark.yarn.app.id")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this was introduced so that people don't try new SparkContext(new SparkConf().setMaster("yarn-cluster")) and then complain when they get really odd error messages.

Is there another way to do this check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, removing this will still get other error messages if trying to run cluster mode with such way, but maybe a little odd as you mentioned.

@vanzin
Copy link
Contributor

vanzin commented Mar 16, 2016

LGTM aside from small comment.

@steveloughran should take a look since IIRC he wanted the full attempt id for something.

@jerryshao
Copy link
Contributor Author

I think SchedulerExtensionService can still be worked, since I still maintain the full attempt id in YarnScheduler, only change to the simple counter for Spark scheduler.

@@ -133,12 +133,9 @@ private[spark] class ApplicationMaster(
System.setProperty("spark.master", "yarn")
System.setProperty("spark.submit.deployMode", "cluster")

// Propagate the application ID so that YarnClusterSchedulerBackend can pick it up.
// Set this internal configuration to true if it is running on cluster mode, this
// configuration will be checked in SparkContext to avoid misuse of yarn cluster mode.
System.setProperty("spark.yarn.app.id", appAttemptId.getApplicationId().toString())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using configuration is a simple and safe way to check the misuse of configuration, so here I revert back this configuration spark.yarn.app.id, but only use to check whether app is running on yarn-cluster mode correctly.

@SparkQA
Copy link

SparkQA commented Mar 16, 2016

Test build #53281 has finished for PR 11721 at commit 0c34e0e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 16, 2016

Test build #53282 has finished for PR 11721 at commit 188c83b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jerryshao
Copy link
Contributor Author

Ping @vazin, what about the current fix, would you please review again, thanks a lot.

@vanzin
Copy link
Contributor

vanzin commented Mar 18, 2016

That vazin guy must be wondering why he was pinged. :-) Sorry, I've been busy with other things, I'll try to take a look today.

@vanzin
Copy link
Contributor

vanzin commented Mar 18, 2016

LGTM, merging to master. Thanks!

@asfgit asfgit closed this in 3537782 Mar 18, 2016
roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
## What changes were proposed in this pull request?

This regression is introduced in apache#9182, previously attempt id is simply as counter "1" or "2". With the change of apache#9182, it is changed to full name as "appattemtp-xxx-00001", this will affect all the parts which uses this attempt id, like event log file name, history server app url link. So here change it back to the counter to keep consistent with previous code.

Also revert back this patch apache#11518, this patch fix the url link of history log according to the new way of attempt id, since here we change back to the previous way, so this patch is not necessary, here to revert it.

Also clean "spark.yarn.app.id" and "spark.yarn.app.attemptId", since it is useless now.

## How was this patch tested?

Test it with unit test and manually test different scenario:

1. application running in yarn-client mode.
2. application running in yarn-cluster mode.
3. application running in yarn-cluster mode with multiple attempts.

Checked both the event log file name and url link.

CC vanzin tgravescs , please help to review, thanks a lot.

Author: jerryshao <sshao@hortonworks.com>

Closes apache#11721 from jerryshao/SPARK-13885.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants