New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-29880][CORE][YARN] Handle submit exception when submit to federation cluster #28688
Conversation
@jiangxb1987 i reopen this for #26503 thanks |
Test build #123358 has finished for PR 28688 at commit
|
try { | ||
logInfo(s"Requesting a new application from cluster" + | ||
s" ${hadoopConf.get(YarnConfiguration.RM_ADDRESS, YarnConfiguration.DEFAULT_RM_ADDRESS)}" + | ||
s" with %d NodeManagers.".format(yarnClient.getYarnClusterMetrics.getNumNodeManagers)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it still useful to print the number of node managers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it can call yarnClient.getYarnClusterMetrics.getNumNodeManagers
without exception, i tend to keep the original log info.
s" with %d NodeManagers.".format(yarnClient.getYarnClusterMetrics.getNumNodeManagers)) | ||
} catch { | ||
case NonFatal(e) => | ||
logWarning(s"Yarn client may not implement the given API $e") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe print a more general warning, like "Error requesting YARN cluster information: $e"
Test build #123370 has finished for PR 28688 at commit
|
sorry to bother you @jiangxb1987 could you help review this thanks |
logInfo("Requesting a new application from cluster with %d NodeManagers" | ||
.format(yarnClient.getYarnClusterMetrics.getNumNodeManagers)) | ||
try { | ||
logInfo(s"Requesting a new application from cluster" + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Remove the "s" because we don't include any expression in this line.
} catch { | ||
case NonFatal(e) => | ||
logWarning(s"Failed to request YARN cluster information from cluster " + | ||
s"${hadoopConf.get(YarnConfiguration.RM_ADDRESS, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make a variable outside to represent the RM_ADDRESS value.
case NonFatal(e) => | ||
logWarning(s"Failed to request YARN cluster information from cluster " + | ||
s"${hadoopConf.get(YarnConfiguration.RM_ADDRESS, | ||
YarnConfiguration.DEFAULT_RM_ADDRESS)}" + " with excepation: $e") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: excepation
-> exception
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
When we submit application to federation yarn cluster. Since getYarnClusterMetrics is not implemented. The submission will exit with failure.
Why are the changes needed?
Since hadoop federation cluster was deployed, spark application will submit with failure if we do not handle the exception.
How was this patch tested?
UT