Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#1472][part-3] fix(client): Fix occasional IllegalReferenceCountException issues in extremely rare scenarios #1522

Merged
merged 1 commit into from
Feb 15, 2024

Conversation

rickyma
Copy link
Contributor

@rickyma rickyma commented Feb 12, 2024

What changes were proposed in this pull request?

Improve the robustness of methods ShuffleDataResult.release() and ShuffleIndexResult.release() to fix occasional IllegalReferenceCountException issues in extremely rare scenarios.

Why are the changes needed?

A sub PR for: #1519

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing UTs.

@codecov-commenter
Copy link

codecov-commenter commented Feb 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (576a925) 54.27% compared to head (5cfa1c8) 55.18%.
Report is 11 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #1522      +/-   ##
============================================
+ Coverage     54.27%   55.18%   +0.91%     
+ Complexity     2807     2806       -1     
============================================
  Files           427      410      -17     
  Lines         24349    22057    -2292     
  Branches       2077     2081       +4     
============================================
- Hits          13215    12172    -1043     
+ Misses        10305     9125    -1180     
+ Partials        829      760      -69     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@EnricoMi
Copy link
Contributor

EnricoMi commented Feb 12, 2024

What if we would make sure that this.buffer.release() is called at most once?

  public synchronized void release() {
    if (this.buffer != null) {
      this.buffer.release();
      this.buffer = null;
    }
  }

Then we wouldn't need to add logic that requires internal knowledge of this.buffer.release().

Copy link

github-actions bot commented Feb 12, 2024

Test Results

2 287 files  ±0  2 287 suites  ±0   4h 30m 0s ⏱️ +17s
  819 tests ±0    818 ✅ ±0   1 💤 ±0  0 ❌ ±0 
9 086 runs  ±0  9 073 ✅ ±0  13 💤 ±0  0 ❌ ±0 

Results for commit 5cfa1c8. ± Comparison against base commit 3954ce7.

♻️ This comment has been updated with latest results.

@rickyma
Copy link
Contributor Author

rickyma commented Feb 12, 2024

What if we would make sure that this.buffer.release() is called at most once?

  private boolean released = false;

  public synchronized void release() {
    if (! released && this.buffer != null) {
      this.buffer.release();
      released = true;
    }
  }

Then we wouldn't need to add logic that requires internal knowledge of this.buffer.release().

Your approach might not solve this issue. Because my previous code was written in a similar way to yours.
Currently, methods ShuffleDataResult.release() and ShuffleIndexResult.release() are called only once(at least, that's what the code suggests.), but IllegalReferenceCountException will still arise in extremely rare scenarios occasionally(not always). That means the refCnt of ByteBuf inside ShuffleDataResult and ShuffleIndexResult is already 0 when the first time methods ShuffleDataResult.release() and ShuffleIndexResult.release() are called.

The cause is unclear at the moment, we can investigate further in the future. But at least for now, the changes I've made will definitely not cause any problems and will solve this issue.

@rickyma
Copy link
Contributor Author

rickyma commented Feb 13, 2024

We need this fix currently. It will not cause any side effects. @jerqi

@EnricoMi
Copy link
Contributor

TBH, this looks more like a hack than a fix. Usually such issues won't be investigated further once a workaround is merged.

@rickyma
Copy link
Contributor Author

rickyma commented Feb 13, 2024

TBH, this looks more like a hack than a fix. Usually such issues won't be investigated further once a workaround is merged.

I can add TODO in comments for now. It is a tricky bug. We need to fix this immediately.

@@ -106,7 +106,7 @@ public boolean isEmpty() {
}

public void release() {
if (this.buffer != null) {
if (this.buffer != null && this.getDataBuf() != null && this.getDataBuf().refCnt() > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add some logs to help us to find the root cause of this issue?

…ntException issues in extremely rare scenarios
@rickyma
Copy link
Contributor Author

rickyma commented Feb 14, 2024

The bug is fixed.

It is more like a follow-up PR for #1455.

The previous approach could not distinguish between an EMPTY_BUFFER and a ByteBuf with a zero value of readableBytes.

@jerqi PTAL

@rickyma rickyma requested a review from jerqi February 14, 2024 08:48
Copy link
Contributor

@jerqi jerqi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@jerqi jerqi merged commit bc60c12 into apache:master Feb 15, 2024
38 checks passed
@jerqi
Copy link
Contributor

jerqi commented Feb 15, 2024

Merged to master. Thanks @rickyma for contribution! Thanks @EnricoMi for review!

zuston pushed a commit that referenced this pull request Feb 24, 2024
…ler.readShuffleData (#1536)

### What changes were proposed in this pull request?

Fix IllegalReferenceCountException issues when exceptions happened in clientReadHandler.readShuffleData().

### Why are the changes needed?

A follow-up PR for: #1522

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing UTs.
@rickyma rickyma deleted the issue-1472-part-3 branch May 5, 2024 08:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants