Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ISSUE-456] Avoid removeResources for multiple times #459

Merged
merged 2 commits into from
Jan 10, 2023

Conversation

xianjingfeng
Copy link
Member

@xianjingfeng xianjingfeng commented Jan 9, 2023

What changes were proposed in this pull request?

If Resource had been removed, avoid remove twice.

Why are the changes needed?

When some appIds' removeResource took too much time, the expiredAppCleanupExecutorService in ShuffleTaskManager
would check and detect the same appId is expired multiple times. Therefore the same appId might be added to expiredAppIdQueue multiple times. This PR fixes #456

Does this PR introduce any user-facing change?

No

How was this patch tested?

UT

@codecov-commenter
Copy link

codecov-commenter commented Jan 9, 2023

Codecov Report

Merging #459 (de115e3) into master (ebaff6a) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##             master     #459      +/-   ##
============================================
+ Coverage     58.74%   58.75%   +0.01%     
- Complexity     1664     1666       +2     
============================================
  Files           199      199              
  Lines         11236    11239       +3     
  Branches        999     1000       +1     
============================================
+ Hits           6601     6604       +3     
  Misses         4243     4243              
  Partials        392      392              
Impacted Files Coverage Δ
.../org/apache/uniffle/server/ShuffleTaskManager.java 76.56% <100.00%> (+0.22%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@advancedxy
Copy link
Contributor

The fix looks good to me.
But could you elaborate a bit more in the pr description's Why are the changes needed section.

If I understand the issue and this fix correctly, the reason why some appId is called multiple times is that:

When some appIds' removeResource took too much time, the `expiredAppCleanupExecutorService` in ShuffleTaskManager
would check and detect the same appId is expired multiple times. Therefore the same appId might be added to `expiredAppIdQueue` multiple time.

@jerqi jerqi linked an issue Jan 9, 2023 that may be closed by this pull request
3 tasks
Copy link
Member

@zuston zuston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, left minor comment

@@ -753,7 +807,8 @@ public void checkAndClearLeakShuffleDataTest(@TempDir File tempDir) throws Excep
assertTrue(appIdsOnDisk.contains(appId));

// make sure heartbeat timeout and resources are removed
Thread.sleep(5000);
Awaitility.await().timeout(10, TimeUnit.SECONDS).until(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this is involved in this PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is not related with this PR, could you help open another PR to fix this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In original logic, shuffleTaskInfos.remove(appId) was invoked after remove all resources, but it was invoked before remove all resources in this pr. It becomes a flaky test in this pr.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it

@xianjingfeng
Copy link
Member Author

The fix looks good to me. But could you elaborate a bit more in the pr description's Why are the changes needed section.

If I understand the issue and this fix correctly, the reason why some appId is called multiple times is that:

When some appIds' removeResource took too much time, the `expiredAppCleanupExecutorService` in ShuffleTaskManager
would check and detect the same appId is expired multiple times. Therefore the same appId might be added to `expiredAppIdQueue` multiple time.

Done. Thanks your describe.

Copy link
Contributor

@jerqi jerqi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zuston zuston merged commit 3f166f4 into apache:master Jan 10, 2023
@zuston
Copy link
Member

zuston commented Jan 10, 2023

Merged. Thanks @xianjingfeng @advancedxy @jerqi

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Improvement] Avoid removeResources for multiple times
5 participants