
[ISSUE-392] Fix the bug in the shuffle data cleanup checker that causes false reports of disk corruption #393

Merged: 2 commits merged into apache:master on Dec 8, 2022


zuston (Member) commented Dec 8, 2022

What changes were proposed in this pull request?

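Fix the false disk-corruption reports caused by the leak shuffle data checker: rename the directory used by the read/write check in LocalStorageChecker from check to the hidden .check so that the leak checker no longer deletes it, and hold the name in a static final constant that tests can reference.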

Why are the changes needed?

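The leak shuffle data checker deleted the directory that the storage health checker uses for its read/write check, treating it as the shuffle data of an application named check (see appId[check] in the log below). When the health checker then tried to read its probe file /data4/uniffle/data/check/test, the read failed with FileNotFoundException, and the shuffle server was marked unhealthy even though the disk itself was fine: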

[INFO] 2022-12-07 16:27:51,411 leakShuffleDataChecker ShuffleTaskManager checkLeakShuffleData - Start check leak shuffle data
[INFO] 2022-12-07 16:27:51,416 leakShuffleDataChecker LocalFileDeleteHandler delete - Delete shuffle data for appId[check] with /data1/uniffle/data/check cost 0 ms
[INFO] 2022-12-07 16:27:51,420 leakShuffleDataChecker LocalFileDeleteHandler delete - Delete shuffle data for appId[check] with /data2/uniffle/data/check cost 0 ms
[INFO] 2022-12-07 16:27:51,420 leakShuffleDataChecker LocalFileDeleteHandler delete - Delete shuffle data for appId[check] with /data3/uniffle/data/check cost 0 ms
[INFO] 2022-12-07 16:27:51,420 leakShuffleDataChecker LocalFileDeleteHandler delete - Delete shuffle data for appId[check] with /data4/uniffle/data/check cost 0 ms
[INFO] 2022-12-07 16:27:51,420 leakShuffleDataChecker ShuffleTaskManager checkLeakShuffleData - Finish check leak shuffle data
[ERROR] 2022-12-07 16:27:51,685 HealthCheckService LocalStorageChecker checkStorageReadAndWrite - Storage read and write error
java.io.FileNotFoundException: /data4/uniffle/data/check/test (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at org.apache.uniffle.server.LocalStorageChecker$StorageInfo.checkStorageReadAndWrite(LocalStorageChecker.java:180)
        at org.apache.uniffle.server.LocalStorageChecker.checkIsHealthy(LocalStorageChecker.java:73)
        at org.apache.uniffle.server.HealthCheck.check(HealthCheck.java:84)
        at org.apache.uniffle.server.HealthCheck.lambda$new$0(HealthCheck.java:70)
        at java.lang.Thread.run(Thread.java:745)
[INFO] 2022-12-07 16:27:51,685 HealthCheckService LocalStorageChecker checkIsHealthy - shuffle server become unhealthy
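The log suggests that the leak checker treats any subdirectory of a storage base path that is not a known application as leaked shuffle data, which is how the health checker's check directory was removed as appId[check]. Below is a minimal sketch of that classification, assuming (as the code comment in the patch implies) that hidden, dot-prefixed directories are skipped. LeakDirScanner and findLeakedDirs are hypothetical names for illustration, not Uniffle's API.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch (not Uniffle's actual code) of how a leak checker could
 * classify the directories under one storage base path.
 */
final class LeakDirScanner {

  private LeakDirScanner() {}

  /**
   * Returns every non-hidden subdirectory of basePath whose name is not a
   * registered appId. Assumption: a name starting with '.' counts as hidden,
   * which matches what File.isHidden() reports on Unix-like systems.
   */
  static List<File> findLeakedDirs(File basePath, Set<String> registeredAppIds) {
    List<File> leaked = new ArrayList<>();
    File[] children = basePath.listFiles(File::isDirectory);
    if (children == null) {
      return leaked; // base path missing or unreadable: nothing to report
    }
    for (File dir : children) {
      String name = dir.getName();
      if (name.startsWith(".")) {
        continue; // hidden, e.g. a ".check" probe directory, is never touched
      }
      if (!registeredAppIds.contains(name)) {
        leaked.add(dir); // a plain "check" directory lands here and gets deleted
      }
    }
    return leaked;
  }
}
```

With app_1 as the only registered appId, a base path containing app_1, check and .check yields only check, the directory removed in the log; naming the probe directory .check takes it out of the scan.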

Does this PR introduce any user-facing change?

No

How was this patch tested?

  1. Unit tests (UTs)

codecov-commenter commented Dec 8, 2022

Codecov Report

Merging #393 (9f3a8dd) into master (8847ece) will decrease coverage by 0.03%.
The diff coverage is 25.00%.

@@             Coverage Diff              @@
##             master     #393      +/-   ##
============================================
- Coverage     58.80%   58.77%   -0.04%     
  Complexity     1602     1602              
============================================
  Files           193      193              
  Lines         10939    10939              
  Branches        955      955              
============================================
- Hits           6433     6429       -4     
- Misses         4128     4132       +4     
  Partials        378      378              
| Impacted Files | Coverage Δ |
|---|---|
| ...rg/apache/uniffle/storage/common/LocalStorage.java | 43.22% <0.00%> (ø) |
| ...org/apache/uniffle/server/LocalStorageChecker.java | 70.19% <33.33%> (ø) |
| ...org/apache/uniffle/server/ShuffleFlushManager.java | 78.01% <0.00%> (-2.10%) ⬇️ |
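Codecov computes coverage as hits divided by all tracked lines (hits + misses + partials). Plugging in the figures from the coverage diff shows where the 0.03% in the summary and the -0.04% in the table come from; the exact rounding Codecov applies to each figure is an assumption here.

```math
\begin{aligned}
\text{master:}\quad & \frac{6433}{6433 + 4128 + 378} = \frac{6433}{10939} \approx 58.808\% \\
\text{PR 393:}\quad & \frac{6429}{6429 + 4132 + 378} = \frac{6429}{10939} \approx 58.771\% \\
\Delta:\quad & \frac{6429 - 6433}{10939} \approx -0.037\%
\end{aligned}
```

The headline values 58.80% and 58.77% appear to be truncated to two decimals, giving the 0.03% in the summary, while the table's -0.04% appears to round the exact change of about 0.037 percentage points. Both describe the same four lines moving from hit to miss.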


advancedxy (Contributor) left a comment

LGTM, except for one minor comment.

@@ -164,7 +164,8 @@ boolean checkStorageReadAndWrite() {
 if (storage.isCorrupted()) {
   return false;
 }
-File checkDir = new File(storageDir, "check");
+// Use the hidden file to avoid being cleanup
+File checkDir = new File(storageDir, ".check");
advancedxy (Contributor)

Nit: make ".check" a static final constant so that we can reference the name in the test.

zuston (Member Author)

Done
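Following the review nit, the probe directory name can live in a static final constant that both the checker and its tests reference. The sketch below is a hypothetical simplification of a storage read/write probe (StorageProbe, CHECK_DIR_NAME, PROBE_FILE_NAME and the payload are illustrative names, not Uniffle's actual code): it writes a small file under <storageDir>/.check, reads it back, compares the bytes, and removes both.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;

/** Hypothetical sketch of a storage read/write probe; all names are illustrative. */
final class StorageProbe {

  /** Dot-prefixed so that a leak checker which skips hidden directories leaves it alone. */
  static final String CHECK_DIR_NAME = ".check";
  static final String PROBE_FILE_NAME = "test";

  private StorageProbe() {}

  /** Writes a payload under storageDir/.check, reads it back, compares, then cleans up. */
  static boolean checkReadAndWrite(File storageDir) {
    File checkDir = new File(storageDir, CHECK_DIR_NAME);
    File probe = new File(checkDir, PROBE_FILE_NAME);
    byte[] payload = "uniffle-health-check".getBytes(StandardCharsets.UTF_8);
    try {
      Files.createDirectories(checkDir.toPath());
      Files.write(probe.toPath(), payload);
      byte[] readBack = Files.readAllBytes(probe.toPath());
      return Arrays.equals(payload, readBack);
    } catch (IOException e) {
      // Any I/O failure, including the probe file vanishing between write and read, fails the check.
      return false;
    } finally {
      probe.delete();    // remove the probe file first...
      checkDir.delete(); // ...so the now-empty directory can be removed too
    }
  }
}
```

A test can then assert against StorageProbe.CHECK_DIR_NAME instead of repeating the ".check" literal, so renaming the directory again touches only one line.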

zuston merged commit 3ec3f41 into apache:master on Dec 8, 2022
zuston (Member Author) commented Dec 8, 2022

Thanks @advancedxy
