[SPARK-29880][CORE][YARN] Handle submit exception when submit to federation cluster #28688

Closed
wants to merge 3 commits into from

Conversation

@caneGuy (Contributor) commented Jun 1, 2020

When we submit an application to a YARN federation cluster, the submission exits with a failure because getYarnClusterMetrics is not implemented there. The failing call is in this code:

ResourceRequestHelper.validateResources(sparkConf)
var appId: ApplicationId = null
try {
  launcherBackend.connect()
  yarnClient.init(hadoopConf)
  yarnClient.start()

  // Throws on a federation cluster, where getYarnClusterMetrics is not implemented
  logInfo("Requesting a new application from cluster with %d NodeManagers"
    .format(yarnClient.getYarnClusterMetrics.getNumNodeManagers))
  // ...

Why are the changes needed?

Once a Hadoop YARN federation cluster is deployed, Spark applications fail to submit unless this exception is handled.
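The failure mode can be illustrated with a minimal sketch. It is not Spark's actual client code: `fetchNodeManagers` is a hypothetical stand-in for `yarnClient.getYarnClusterMetrics.getNumNodeManagers`, and we assume that on a federation cluster the call throws instead of returning a count. The unguarded version lets an exception from a purely informational call abort the whole submission; the guarded version downgrades it to a warning.

```scala
import scala.util.control.NonFatal

// Minimal sketch, not Spark's Client code. `fetchNodeManagers` is a hypothetical
// stand-in for yarnClient.getYarnClusterMetrics.getNumNodeManagers; on a federation
// cluster we assume it throws instead of returning a count.
object SubmitSketch {
  val appId = "application_0001" // hypothetical application id

  // Mirrors the original code: the metrics call only feeds a log line,
  // yet any exception it throws escapes and aborts the submission.
  def submitUnguarded(fetchNodeManagers: () => Int): String = {
    println("Requesting a new application from cluster with %d NodeManagers"
      .format(fetchNodeManagers()))
    appId
  }

  // Guarded variant: a non-fatal failure becomes a warning and the submission
  // continues. Fatal errors (e.g. VirtualMachineError) still propagate.
  def submitGuarded(fetchNodeManagers: () => Int): String = {
    try {
      println("Requesting a new application from cluster with %d NodeManagers"
        .format(fetchNodeManagers()))
    } catch {
      case NonFatal(e) => println(s"WARN Error requesting YARN cluster information: $e")
    }
    appId
  }
}
```

For example, `SubmitSketch.submitGuarded(() => throw new UnsupportedOperationException("not implemented"))` prints a warning and still returns the id, while `submitUnguarded` with the same argument throws.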

How was this patch tested?

Unit tests.

@caneGuy (Contributor, Author) commented Jun 1, 2020

@jiangxb1987 I've reopened this to continue #26503. Thanks!

@SparkQA commented Jun 1, 2020

Test build #123358 has finished for PR 28688 at commit bc1dd70.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

try {
  logInfo(s"Requesting a new application from cluster" +
    s" ${hadoopConf.get(YarnConfiguration.RM_ADDRESS, YarnConfiguration.DEFAULT_RM_ADDRESS)}" +
    s" with %d NodeManagers.".format(yarnClient.getYarnClusterMetrics.getNumNodeManagers))
Contributor

Is it still useful to print the number of NodeManagers?

Contributor Author

If yarnClient.getYarnClusterMetrics.getNumNodeManagers can be called without an exception, I'd prefer to keep the original log message.
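One way to keep the original message whenever the count is available is to turn the count into an `Option`. This is a sketch under stated assumptions, not what the PR implements: `fetchNodeManagers` is a hypothetical stand-in for `yarnClient.getYarnClusterMetrics.getNumNodeManagers`.

```scala
import scala.util.Try

// Sketch only. `fetchNodeManagers` is a hypothetical stand-in for
// yarnClient.getYarnClusterMetrics.getNumNodeManagers.
object ClusterMessage {
  def requestMessage(fetchNodeManagers: () => Int): String =
    // Try captures only non-fatal exceptions, like a NonFatal catch clause.
    Try(fetchNodeManagers()).toOption match {
      // Call succeeded: keep the original message with the NodeManager count.
      case Some(n) => "Requesting a new application from cluster with %d NodeManagers".format(n)
      // Call failed: fall back to a message without the count.
      case None    => "Requesting a new application from cluster"
    }
}
```

This variant discards the exception itself; the PR instead logs it as a warning, which keeps the cause visible when diagnosing a cluster that does not support the call.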

    s" with %d NodeManagers.".format(yarnClient.getYarnClusterMetrics.getNumNodeManagers))
} catch {
  case NonFatal(e) =>
    logWarning(s"Yarn client may not implement the given API $e")
Contributor

Maybe print a more general warning, such as "Error requesting YARN cluster information: $e".

@SparkQA commented Jun 1, 2020

Test build #123370 has finished for PR 28688 at commit 28a59a8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@caneGuy (Contributor, Author) commented Jun 2, 2020

Sorry to bother you, @jiangxb1987. Could you help review this? Thanks!

logInfo("Requesting a new application from cluster with %d NodeManagers"
  .format(yarnClient.getYarnClusterMetrics.getNumNodeManagers))
try {
  logInfo(s"Requesting a new application from cluster" +
Contributor

nit: Remove the "s" because we don't include any expression in this line.
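The nit follows from how Scala's `s` interpolator works: a literal with no `$` expressions is unchanged by `s`, and a literal without `s` is never interpolated. The second rule also applies to the warning in a later hunk, whose `" with excepation: $e"` part lacks the prefix and would print `$e` literally. A small sketch with a made-up exception:

```scala
// Sketch of the Scala interpolation rules behind the "remove the s" nit.
// The exception and messages are made up for illustration.
object InterpolationSketch {
  val e = new IllegalStateException("boom")

  // No `$` expressions: the `s` prefix has no effect, so it can be dropped.
  val withS = s"Requesting a new application from cluster"
  val plain = "Requesting a new application from cluster"

  // No `s` prefix: `$e` stays as the literal characters '$' and 'e'.
  val notInterpolated = " with exception: $e"
  // With `s`: `$e` is replaced by e.toString.
  val interpolated = s" with exception: $e"
}
```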

} catch {
  case NonFatal(e) =>
    logWarning(s"Failed to request YARN cluster information from cluster " +
      s"${hadoopConf.get(YarnConfiguration.RM_ADDRESS,
Contributor

I would extract the RM_ADDRESS value into a variable defined outside the try block.
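Combining the review suggestions (resolve the RM address once, outside the try block; prefix `s` only on literals that interpolate; log a general warning that includes the exception) might look like the sketch below. It is not the code the PR ended up with. It returns the message instead of logging it so the behavior is easy to check; the `Map` stands in for `hadoopConf`, and the two constants are assumed to mirror `YarnConfiguration.RM_ADDRESS` and `YarnConfiguration.DEFAULT_RM_ADDRESS`.

```scala
import scala.util.control.NonFatal

// Sketch of the reviewers' suggestions combined; not the merged Spark code.
// `conf` (a Map) stands in for hadoopConf; the constants are assumed to mirror
// YarnConfiguration.RM_ADDRESS and YarnConfiguration.DEFAULT_RM_ADDRESS.
object RequestLogging {
  val RmAddressKey = "yarn.resourcemanager.address"
  val DefaultRmAddress = "0.0.0.0:8032"

  def requestLog(conf: Map[String, String], fetchNodeManagers: () => Int): String = {
    // Resolved once, outside the try block, so both branches can use it.
    val rmAddress = conf.getOrElse(RmAddressKey, DefaultRmAddress)
    try {
      s"INFO Requesting a new application from cluster $rmAddress " +
        s"with ${fetchNodeManagers()} NodeManagers."
    } catch {
      // Every literal that contains `$...` carries the `s` prefix.
      case NonFatal(e) =>
        s"WARN Failed to request YARN cluster information from cluster $rmAddress " +
          s"with exception: $e"
    }
  }
}
```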

  case NonFatal(e) =>
    logWarning(s"Failed to request YARN cluster information from cluster " +
      s"${hadoopConf.get(YarnConfiguration.RM_ADDRESS,
        YarnConfiguration.DEFAULT_RM_ADDRESS)}" + " with excepation: $e")
Contributor

nit: excepation -> exception

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Sep 11, 2020
@github-actions github-actions bot closed this Sep 12, 2020
3 participants