Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#1267][followup] improvement(client): The previous exception may be discarded when OOM occurs #1411

Merged
merged 2 commits into from
Jan 15, 2024

Conversation

rickyma
Copy link
Contributor

@rickyma rickyma commented Dec 30, 2023

What changes were proposed in this pull request?

Save the previous exception in advance to prevent it from being lost during the next retry.

Why are the changes needed?

This is the follow up pr of #1344

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing UTs.

@rickyma
Copy link
Contributor Author

rickyma commented Dec 30, 2023

PTAL @zuston

@codecov-commenter
Copy link

codecov-commenter commented Dec 30, 2023

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (995ed46) 53.24% compared to head (e6274c2) 55.15%.
Report is 17 commits behind head on master.

Files Patch % Lines
...ava/org/apache/uniffle/common/util/RetryUtils.java 75.00% 1 Missing and 1 partial ⚠️
...client/impl/grpc/ShuffleServerGrpcNettyClient.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #1411      +/-   ##
============================================
+ Coverage     53.24%   55.15%   +1.91%     
- Complexity     2719     2773      +54     
============================================
  Files           419      403      -16     
  Lines         23966    21823    -2143     
  Branches       2042     2058      +16     
============================================
- Hits          12760    12036     -724     
+ Misses        10418     9048    -1370     
+ Partials        788      739      -49     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jerqi jerqi changed the title [#1267][FOLLOWUP] fix(client): The previous exception may be discarded when OOM occurs [#1267][followup] improvement(client): The previous exception may be discarded when OOM occurs Jan 2, 2024
@rickyma
Copy link
Contributor Author

rickyma commented Jan 3, 2024

@zuston Can you review this?

@jerqi jerqi requested a review from zuston January 3, 2024 08:28
@rickyma
Copy link
Contributor Author

rickyma commented Jan 5, 2024

Some details have been improved, please review. @zuston @jerqi

  1. Fixed the bug where the exception's stack trace was swallowed when calling RetryUtils.retryWithCondition.
  2. When OOM occurs, it will print out the stack trace of the last non-OOM exception, to prevent the root cause from being lost due to the occurrence of OOM.

@jerqi
Copy link
Contributor

jerqi commented Jan 6, 2024

cc @zuston

@@ -154,7 +155,7 @@ public RssSendShuffleDataResponse sendShuffleData(RssSendShuffleDataRequest requ
maxRetryAttempts,
t -> !(t instanceof OutOfMemoryError));
} catch (Throwable throwable) {
LOG.warn(throwable.getMessage());
LOG.warn(ExceptionUtils.getStackTrace(throwable));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not using throwable directly ?

Copy link
Contributor Author

@rickyma rickyma Jan 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because LOG.warn does not accept a Throwable type parameter, it only accepts a string.
As I mentioned above, I use ExceptionUtils.getStackTrace(e) just for the sake of minimal code changes and simplicity.
If you prefer using throwable directly, I will change it like below:
LOG.warn("Failed to send shuffle data due to ", throwable);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOG.warn("Failed to send shuffle data due to ", throwable); +1

@rickyma rickyma requested a review from zuston January 6, 2024 17:49
@rickyma rickyma force-pushed the followup-issue-1267 branch 2 times, most recently from c1349ca to 6950f33 Compare January 8, 2024 11:14
@rickyma
Copy link
Contributor Author

rickyma commented Jan 12, 2024

Can you continue to review this PR? I think at least for my case, this PR is still very useful. @zuston

Copy link
Member

@zuston zuston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late reply, I missed this thread.

Left some comments. If you have any process, please ping me @rickyma

@@ -154,7 +155,7 @@ public RssSendShuffleDataResponse sendShuffleData(RssSendShuffleDataRequest requ
maxRetryAttempts,
t -> !(t instanceof OutOfMemoryError));
} catch (Throwable throwable) {
LOG.warn(throwable.getMessage());
LOG.warn(ExceptionUtils.getStackTrace(throwable));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOG.warn("Failed to send shuffle data due to ", throwable); +1

@rickyma rickyma force-pushed the followup-issue-1267 branch 3 times, most recently from 99c6c40 to 45c422d Compare January 13, 2024 09:47
@rickyma
Copy link
Contributor Author

rickyma commented Jan 13, 2024

Done. @zuston

Copy link
Member

@zuston zuston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for your effort @rickyma

@zuston zuston merged commit 3180501 into apache:master Jan 15, 2024
36 checks passed
smallzhongfeng pushed a commit that referenced this pull request Feb 6, 2024
…in RetryUtils (#1500)

### What changes were proposed in this pull request?

INFO log level should be used in RetryUtils. Because these log messages are not actual errors.

### Why are the changes needed?

It's a followup PR for [#1411](#1411).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing UTs.
@rickyma rickyma deleted the followup-issue-1267 branch May 5, 2024 08:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants