
[MINOR][SHUFFLE] Include IOException in warning log of finalizeShuffleMerge #39654

Closed
wants to merge 1 commit into from


tedyu
Contributor

@tedyu tedyu commented Jan 19, 2023

What changes were proposed in this pull request?

This PR adds the caught IOException (ioe) to the warning log in finalizeShuffleMerge.

Why are the changes needed?

With the exception logged, users have more clues about the root cause.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing test suite.

@tedyu
Contributor Author

tedyu commented Jan 19, 2023

cc @mridulm

@github-actions github-actions bot added the CORE label Jan 19, 2023
@dongjoon-hyun dongjoon-hyun changed the title [SHUFFLE][MINOR] Include IOException in warning log of finalizeShuffleMerge [MINOR][SHUFFLE] Include IOException in warning log of finalizeShuffleMerge Jan 19, 2023
Member

@dongjoon-hyun dongjoon-hyun left a comment


Thank you for updating.

        @@ -815,7 +815,7 @@ public MergeStatuses finalizeShuffleMerge(FinalizeShuffleMerge msg) {
                 } catch (IOException ioe) {
                   logger.warn("{} attempt {} shuffle {} shuffleMerge {}: exception while " +
                     "finalizing shuffle partition {}", msg.appId, msg.appAttemptId, msg.shuffleId,
        -            msg.shuffleMergeId, partition.reduceId);
        +            msg.shuffleMergeId, partition.reduceId, ioe);
Member


BTW, could you add the new error message to the PR description, @tedyu?

Member


If you didn't hit this issue before, why do we need to pay attention to this code path?

Contributor

@mridulm mridulm Jan 19, 2023


Also, the added parameter is not referenced in the format string.

Contributor Author


@mridulm
If my understanding is correct, the logger logs the last argument as an exception if it is a Throwable, even though the format string has no placeholder for it.
See the following code from common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java:

        } catch (Throwable t) {
          logger.warn("Error in responding RPC callback", t);
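To make the convention concrete, here is a stdlib-only sketch of how SLF4J-style parameterized logging treats a trailing Throwable. The SLF4J FAQ documents that, since 1.6.0, when the last argument of a multi-argument call is an exception, it is logged as the exception (stack trace included) rather than substituted into a {} placeholder. LogCallSketch is a hypothetical class that mimics that rule for illustration only; it is not SLF4J's implementation, and real SLF4J versions may differ in edge cases.

```java
import java.util.Arrays;

// Hypothetical, stdlib-only sketch (not SLF4J code) of the convention discussed
// above: if the last argument of a parameterized log call is a Throwable, it is
// treated as the exception to log, with its stack trace, instead of being
// substituted into a "{}" placeholder.
class LogCallSketch {
    final String message;      // message text after "{}" substitution
    final Throwable throwable; // exception that would be logged, or null

    LogCallSketch(String message, Throwable throwable) {
        this.message = message;
        this.throwable = throwable;
    }

    static LogCallSketch of(String pattern, Object... args) {
        Throwable t = null;
        Object[] params = args;
        if (args.length > 0 && args[args.length - 1] instanceof Throwable) {
            // Trailing Throwable: remove it from the substitution arguments.
            t = (Throwable) args[args.length - 1];
            params = Arrays.copyOf(args, args.length - 1);
        }
        return new LogCallSketch(substitute(pattern, params), t);
    }

    // Replaces each "{}" with the next parameter, left to right; unused
    // placeholders are left as-is and surplus parameters are ignored.
    static String substitute(String pattern, Object[] params) {
        StringBuilder sb = new StringBuilder();
        int from = 0;
        for (Object p : params) {
            int idx = pattern.indexOf("{}", from);
            if (idx < 0) {
                break;
            }
            sb.append(pattern, from, idx).append(p);
            from = idx + 2;
        }
        sb.append(pattern.substring(from));
        return sb.toString();
    }
}
```

Applied to the pattern from the diff with made-up values, LogCallSketch.of(pattern, "app-1", 0, 3, 1, 7, ioe) yields the message "app-1 attempt 0 shuffle 3 shuffleMerge 1: exception while finalizing shuffle partition 7" with ioe as the throwable, which is why the five placeholders need no sixth {} for ioe.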

Contributor


Including the stack trace in this warn message is not helpful; it is not actionable by users.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment


I'm taking back my approval for now because it seems that this PR's new code path is not tested until now.

@tedyu
Contributor Author

tedyu commented Jan 20, 2023

From https://github.com/tedyu/spark/actions/runs/3961295903/jobs/6786585143

Finished test(pypy3): pyspark.sql.tests.test_utils (10s)
Starting test(python3.9): pyspark.mllib.tests.test_algorithms (temp output: /__w/spark/spark/python/target/a23c65c0-281b-4044-8aab-2a99865b220b/python3.9__pyspark.mllib.tests.test_algorithms__oumchym_.log)
Error: The operation was canceled.

Not sure of the reason for the cancellation; maybe a timeout?

@HyukjinKwon
Member

That test shouldn't be related.

@tedyu tedyu closed this Jan 21, 2023
@mridulm
Contributor

mridulm commented Jan 21, 2023

To clarify, @tedyu: I am fine with including the exception message (ex.getMessage()) in the warn message, but not the entire stack trace, so there is value in the PR!
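The alternative described here, logging only the exception message, can be sketched with plain String.format: the message goes into an explicit placeholder, so the warning stays on one line, whereas passing ioe itself as the trailing argument would add a full stack trace. FinalizeWarningSketch and its parameters are hypothetical, and the "Exception message:" wording is an assumption, not necessarily the text of the merged commit. With SLF4J, the equivalent would be a sixth {} placeholder with ioe.getMessage() passed in place of ioe.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical sketch of logging only the exception message. The class name,
// parameters, and the "Exception message:" wording are assumptions for
// illustration, not the merged Spark code. With SLF4J, the same idea is a sixth
// "{}" placeholder with ioe.getMessage() passed instead of ioe.
class FinalizeWarningSketch {

    // Builds a one-line warning that embeds only ioe.getMessage().
    static String warning(String appId, int appAttemptId, int shuffleId,
                          int shuffleMergeId, int reduceId, IOException ioe) {
        return String.format(
            "%s attempt %d shuffle %d shuffleMerge %d: exception while "
                + "finalizing shuffle partition %d. Exception message: %s",
            appId, appAttemptId, shuffleId, shuffleMergeId, reduceId,
            ioe.getMessage());
    }

    // For contrast: the number of lines a full stack trace would add to the log.
    static int stackTraceLineCount(Throwable t) {
        StringWriter sw = new StringWriter();
        PrintWriter pw = new PrintWriter(sw);
        t.printStackTrace(pw);
        pw.flush();
        return sw.toString().split("\\R").length;
    }
}
```

With made-up values ("app-1", 0, 3, 1, 7, new IOException("disk full")), warning(...) returns "app-1 attempt 0 shuffle 3 shuffleMerge 1: exception while finalizing shuffle partition 7. Exception message: disk full" on a single line, while a stack trace for the same exception spans at least two lines and grows with call depth.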

@tedyu tedyu reopened this Jan 21, 2023
@tedyu tedyu force-pushed the shuffle-ioe branch 2 times, most recently from 9e00ec9 to fccaf01 January 21, 2023 04:50
Contributor

@mridulm mridulm left a comment


Looks good to me, will wait for CI to pass though.

@tedyu
Contributor Author

tedyu commented Jan 21, 2023

Test failures were not related to the PR.
https://github.com/tedyu/spark/actions/runs/3973317986/jobs/6811901738#step:9:23488

Error: Exception in thread "streaming-job-executor-0" java.lang.Error: java.lang.InterruptedException
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1155)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.InterruptedException
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
        at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:242)

@srowen
Member

srowen commented Jan 21, 2023

Yeah, looks fine; just rerun the tests.

@tedyu
Contributor Author

tedyu commented Jan 21, 2023

@srowen @mridulm
Tests passed.

@tedyu
Contributor Author

tedyu commented Jan 21, 2023

@dongjoon-hyun
Do you think this PR is in a mergeable state?

@dongjoon-hyun
Member

I'll leave this to the other committers, @tedyu .

@srowen
Member

srowen commented Jan 21, 2023

Merged to master

@srowen srowen closed this in 074894c Jan 21, 2023
@tedyu
Contributor Author

tedyu commented Jan 21, 2023

@dongjoon-hyun @srowen @mridulm
Thanks for reviewing this PR.

5 participants