-
Notifications
You must be signed in to change notification settings - Fork 28k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MINOR][SHUFFLE] Include IOException in warning log of finalizeShuffleMerge #39654
Conversation
cc @mridulm |
.../network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for updating.
@@ -815,7 +815,7 @@ public MergeStatuses finalizeShuffleMerge(FinalizeShuffleMerge msg) { | |||
} catch (IOException ioe) { | |||
logger.warn("{} attempt {} shuffle {} shuffleMerge {}: exception while " + | |||
"finalizing shuffle partition {}", msg.appId, msg.appAttemptId, msg.shuffleId, | |||
msg.shuffleMergeId, partition.reduceId); | |||
msg.shuffleMergeId, partition.reduceId, ioe); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, could you add the new error message to the PR description, @tedyu ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you didn't hit this issue before, why do we need to pay attention on this code path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the added parameter is not referenced in the format string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mridulm
If my understanding is correct, the logger would show the last parameter if it is an exception.
See the following code from common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java
} catch (Throwable t) {
logger.warn("Error in responding RPC callback", t);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Including stack trace in this warn message, is not helpful.
It is not actionable by users
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm taking back my approval for now because it seems that this PR's new code path is not tested until now.
From https://github.com/tedyu/spark/actions/runs/3961295903/jobs/6786585143
Not sure of the reason for the cancellation - maybe timeout ? |
That test shouldn't be related. |
To clarify @tedyu, I am fine with including the exception message (ex.getMessage) in the warn message - not the entire stack trace - so there is value in the PR ! |
9e00ec9
to
fccaf01
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, will wait for CI to pass though.
Test failures were not related to the PR.
|
Yeah looks fine, just rerun tests |
@dongjoon-hyun |
I'll leave this to the other committers, @tedyu . |
Merged to master |
@dongjoon-hyun @srowen @mridulm |
What changes were proposed in this pull request?
This PR adds
ioe
to the warning log offinalizeShuffleMerge
.Why are the changes needed?
With
ioe
logged, user would have more clue as to the root cause.Does this PR introduce any user-facing change?
No
How was this patch tested?
Existing test suite.