[SPARK-46920][YARN] Improve executor exit error message on YARN #44951
pan3793 wants to merge 4 commits into apache:master
case OOM => "OutOfMemoryError"
case UNCAUGHT_EXCEPTION => "Uncaught exception."
case UNCAUGHT_EXCEPTION_TWICE => "Uncaught exception, and logging the exception failed."
case OOM => "OutOfMemoryError."
Total nit, but do we need these changes? They're not sentences.
Otherwise fine.
Thanks for the review, removed.
+CC @tgravescs
kindly ping @tgravescs and @yaooqinn, would you please take a look?
case UNCAUGHT_EXCEPTION => "Uncaught exception"
case UNCAUGHT_EXCEPTION_TWICE => "Uncaught exception, and logging the exception failed"
case UNCAUGHT_EXCEPTION => "Uncaught exception."
case UNCAUGHT_EXCEPTION_TWICE => "Uncaught exception, and logging the exception failed."
Still trivial but these are not sentences. Here and below there is no reason to end with a period.
Emm... So you mean we should remove all trailing dots in this function, and append them on the caller side if necessary?
I don't think they're needed at all. Separate stuff with spaces. But it doesn't matter
Okay, let me revert changes in this function, just keep it as-is.
kindly ping @tgravescs
It's a little unclear to me exactly what the user-facing changes are. The screenshots in the description don't match the code changes made. From what I can tell you are adding another field:
which is the ExecutorExitCode.explainExitCode(exitStatus). This seems fine to me.
@tgravescs Thanks for checking, I replaced the verification result image. The inconsistent diagnostics message is caused by
ok, changes seem fine to me
Merged into master for Spark 4.0. Thanks @pan3793 @srowen @tgravescs @mridulm |
What changes were proposed in this pull request?
Improve the executor exit error message on YARN by adding an explanation of the exit codes defined by Spark.
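To make the idea concrete, here is a minimal sketch of appending a Spark-side explanation to an exit message. It is not the code from this PR: the object `ExitCodeSketch`, the methods `explain` and `describe`, and the message layout are hypothetical. The numeric values 50 to 52 and the heartbeat wording are assumptions; only the value 56 for HEARTBEAT_FAILURE comes from this description.

```scala
// Illustrative sketch only; names and message layout are hypothetical,
// not Spark's actual implementation.
object ExitCodeSketch {
  // Assumed values for Spark-defined executor exit codes.
  // Only HEARTBEAT_FAILURE = 56 is stated in the PR description.
  val UNCAUGHT_EXCEPTION = 50
  val UNCAUGHT_EXCEPTION_TWICE = 51
  val OOM = 52
  val HEARTBEAT_FAILURE = 56

  // Return a Spark-side explanation when the code is one Spark defines.
  def explain(exitCode: Int): Option[String] = exitCode match {
    case UNCAUGHT_EXCEPTION => Some("Uncaught exception")
    case UNCAUGHT_EXCEPTION_TWICE =>
      Some("Uncaught exception, and logging the exception failed")
    case OOM => Some("OutOfMemoryError")
    // Wording assumed for illustration.
    case HEARTBEAT_FAILURE => Some("Unable to send heartbeats to driver")
    case _ => None
  }

  // Combine the exit code, the optional Spark explanation, and whatever
  // diagnostics the cluster manager reported into one message.
  def describe(exitCode: Int, yarnDiagnostics: String): String = {
    val sparkHint = explain(exitCode)
      .map(reason => s" Possible Spark-defined cause: $reason.")
      .getOrElse("")
    s"Container exited with code $exitCode.$sparkHint YARN diagnostics: $yarnDiagnostics"
  }
}
```

With this shape, a reader who sees a YARN diagnostic for code 56 also sees the Spark-side meaning next to it, and codes Spark does not define pass through unchanged.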
Why are the changes needed?
Spark defines its own exit codes, which overlap with exit codes defined by YARN, so diagnostics reported by YARN may be misleading. For example, exit code 56 is defined as HEARTBEAT_FAILURE in Spark but INVALID_DOCKER_IMAGE_NAME in Hadoop, so the error message displayed in the UI is misleading.
Does this PR introduce any user-facing change?
Yes, the UI displays more information when an executor running on YARN exits with a non-zero code.
How was this patch tested?
Because HEARTBEAT_FAILURE depends on the network and the driver's load, to simplify the test I just use select java_method('java.lang.System', 'exit', 56) to simulate the above case.
Was this patch authored or co-authored using generative AI tooling?
No.