Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ISSUE-509] Fix Flaky Test: ShuffleBufferManagerTest#shuffleFlushThreshold #510

Merged
merged 11 commits into from
Jan 29, 2023

Conversation

xianjingfeng
Copy link
Member

@xianjingfeng xianjingfeng commented Jan 28, 2023

What changes were proposed in this pull request?

The reason is that the flush operation is asynchronous, the first block maybe been flushed when judging the memory size.
So the solution is to support active trigger flush.

Why are the changes needed?

Fix #509 Flaky Test: ShuffleBufferManagerTest#shuffleFlushThreshold.

Does this PR introduce any user-facing change?

No

How was this patch tested?

No need

Copy link
Contributor

@kaijchen kaijchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix @xianjingfeng, some nits here.

@codecov-commenter
Copy link

codecov-commenter commented Jan 28, 2023

Codecov Report

Merging #510 (3757b34) into master (0a6605b) will increase coverage by 0.04%.
The diff coverage is 92.85%.

@@             Coverage Diff              @@
##             master     #510      +/-   ##
============================================
+ Coverage     59.73%   59.78%   +0.04%     
- Complexity     1764     1770       +6     
============================================
  Files           205      205              
  Lines         11527    11531       +4     
  Branches       1033     1033              
============================================
+ Hits           6886     6894       +8     
+ Misses         4234     4232       -2     
+ Partials        407      405       -2     
Impacted Files Coverage Δ
...org/apache/uniffle/server/ShuffleFlushManager.java 84.37% <92.85%> (+0.33%) ⬆️
...he/uniffle/server/storage/LocalStorageManager.java 87.00% <0.00%> (+1.12%) ⬆️
...pache/hadoop/mapreduce/task/reduce/RssFetcher.java 92.30% <0.00%> (+1.53%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Member

@zuston zuston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you help describe more about this fix in description? From current code, this fix looks a little bit strange.

@xianjingfeng
Copy link
Member Author

Could you help describe more about this fix in description? From current code, this fix looks a little bit strange.

Because the flush operation is asynchronous, the first block maybe been flushed when judging the memory size.

kaijchen
kaijchen previously approved these changes Jan 29, 2023
Copy link
Contributor

@kaijchen kaijchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @xianjingfeng for the fix, and thanks @advancedxy for the review.

Copy link
Contributor

@kaijchen kaijchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating the PR @xianjingfeng. The motivation is good, but here is some concerns.

"serverId", mockShuffleServer, storageManager) {

@Override
protected void startEventProcesser() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should override startEventProcesser?

The processEvents could be override with your previous change to the ShuffleBufferManager.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this UT, if we start EventProcesser, events will be flushed automatically.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/apache/incubator-uniffle/pull/510/files/a803bf816ff79e16834ab6f42258373b71e71f5e
If you apply this code to your test class method, the event flush could be controlled externally?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/apache/incubator-uniffle/pull/510/files/a803bf816ff79e16834ab6f42258373b71e71f5e If you apply this code to your test class method, the event flush could be controlled externally?

Not exactly. At least we can control when to begin flush.

Copy link
Contributor

@kaijchen kaijchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is some thoughts to further reduce the exposed methods.

@kaijchen
Copy link
Contributor

Thanks @xianjingfeng for updating the PR, @advancedxy what do you think?

advancedxy
advancedxy previously approved these changes Jan 29, 2023
Copy link
Contributor

@advancedxy advancedxy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

processEventThread.setName("ProcessEventThread");
processEventThread.setDaemon(true);
processEventThread.start();
startEventProcessor();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: let's move L85 before this line since these two lines are related.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

zuston
zuston previously approved these changes Jan 29, 2023
@kaijchen
Copy link
Contributor

Maybe it's better to not expose flush() to production code: kaijchen@cb9a0be

@advancedxy
Copy link
Contributor

Thanks @xianjingfeng for updating the PR, @advancedxy what do you think?

Thanks @xianjingfeng for your help. This code generally looks good to me with one minor comment. And I believe after introduce the flush method, some other tests may also be refactored to use the flush method instead of waitForFlush. But that should be addressed in another issue/pr.

@xianjingfeng xianjingfeng dismissed stale reviews from zuston and advancedxy via 8493b66 January 29, 2023 09:50
@xianjingfeng
Copy link
Member Author

Maybe it's better to not expose flush() to production code: kaijchen@cb9a0be

Done

advancedxy
advancedxy previously approved these changes Jan 29, 2023
Copy link
Contributor

@advancedxy advancedxy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@kaijchen kaijchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a bug introduced in the last commit.

Copy link
Contributor

@kaijchen kaijchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks a lot @xianjingfeng.

@kaijchen kaijchen merged commit 6076226 into apache:master Jan 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky Test: ShuffleBufferManagerTest#shuffleFlushThreshold
5 participants