[Improvement] Should match from pathToStorages when appId does not exist in appIdToStorages #168
12040ca to 19849bb
Codecov Report
@@ Coverage Diff @@
## master #168 +/- ##
============================================
+ Coverage 58.29% 58.31% +0.02%
- Complexity 1262 1266 +4
============================================
Files 158 158
Lines 8397 8409 +12
Branches 779 782 +3
============================================
+ Hits 4895 4904 +9
- Misses 3251 3253 +2
- Partials 251 252 +1
  }
} catch (Exception e) {
  LOG.error("Some error happened when fileSystem got the file status.");
  e.printStackTrace();
Why do we use printStackTrace? It prints the stack trace to the console (standard error) instead of the log. Could we write the stack trace to the log instead?
Done.
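A minimal sketch of the change requested above: pass the exception object to the logger so the stack trace is recorded with the message, instead of calling printStackTrace. It uses java.util.logging only to stay self-contained. The class name, the fileExists helper, and its failure condition are hypothetical stand-ins, not code from this PR.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class StackTraceLoggingSketch {
  private static final Logger LOG =
      Logger.getLogger(StackTraceLoggingSketch.class.getName());

  // Hypothetical stand-in for the file-status lookup shown in the diff.
  static boolean fileExists(String path) {
    try {
      if (path == null) {
        throw new IllegalArgumentException("path is null");
      }
      return !path.isEmpty();
    } catch (Exception e) {
      // Hand the exception to the logger: the stack trace is emitted through
      // the logger's handlers together with the message, not printed directly.
      LOG.log(Level.SEVERE, "Some error happened when fileSystem got the file status.", e);
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(fileExists(null));
  }
}
```

If LOG in the project is an SLF4J logger (an assumption), the equivalent call would be LOG.error("...", e).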
It seems that we only call the method
if (!appIdToStorages.containsKey(appId)) {
  String msg = "Can't find HDFS storage for appId[" + appId + "]";
  LOG.error(msg);
  // outside should deal with null situation
Why do we remove the comment?
Add back.
Because we found the log |
4c54fae to 478d875
Why will the method |
  String msg = "Can't find HDFS storage for appId[" + appId + "]";
  LOG.error(msg);
  // outside should deal with null situation
  // todo: it's better to have a fake storage for null situation
Why do we remove this todo?
Add back.
You can take a look at this thread |
I know |
Yes, normally, an app calls the logic of |
Why does the expiredAppIdQueue add the appId again?
This problem was fixed immediately after it appeared last week. Unfortunately, no logs were backed up during the redeployment. However, based on last week's results, we think that the appId was rewritten into the buffer of
I suspect that this is not the root cause. Maybe we can merge this PR first. Please resolve the comment https://github.com/apache/incubator-uniffle/pull/168/files#r951078297
At present, this is an occasional problem, and the impact is relatively controllable. We will observe it first and report any further progress back to the community. @jerqi
Let's merge this pr first. Thanks @smallzhongfeng
OK, thanks for your patient review. @jerqi
What changes were proposed in this pull request?
The HDFS audit log shows that the HDFS path of this app was last deleted at 18:00:55. After that, the shuffleServer log reported an error that a file could not be found, and the file continued to be written. We eventually found that when the cache entry for an appId has already been removed from appIdToStorages, a later call to removeResources in HdfsStorageManager leaves the storagePath undeleted.
Why are the changes needed?
With this change, the remote path can still be deleted normally after the appIdToStorages cache entry has been removed.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added ut.
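The fallback named in the PR title can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged in this PR: FakeStorage, deleteAppDir, and the choice to try every entry of pathToStorages are hypothetical, and the fake storage only records deletions instead of touching HDFS.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class StorageFallbackSketch {
  // Hypothetical stand-in for an HDFS storage; it records deletions only.
  static class FakeStorage {
    final String basePath;
    final List<String> deleted = new ArrayList<>();

    FakeStorage(String basePath) {
      this.basePath = basePath;
    }

    void deleteAppDir(String appId) {
      deleted.add(basePath + "/" + appId);
    }
  }

  final Map<String, FakeStorage> appIdToStorages = new HashMap<>();
  final Map<String, FakeStorage> pathToStorages = new HashMap<>();

  void removeResources(String appId) {
    FakeStorage cached = appIdToStorages.remove(appId);
    if (cached != null) {
      // Normal case: the appId is still cached, so delete from its storage.
      cached.deleteAppDir(appId);
      return;
    }
    // Fallback (assumed behavior): the cache entry is already gone, so ask
    // every known storage path to delete the app directory.
    for (FakeStorage storage : pathToStorages.values()) {
      storage.deleteAppDir(appId);
    }
  }
}
```

In this sketch, a second removeResources call for the same appId still reaches the storages through pathToStorages, which is the situation described in the PR description above.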