-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ISSUE-163][FEATURE] Write to hdfs when local disk can't be write #235
Conversation
Codecov Report
@@ Coverage Diff @@
## master #235 +/- ##
============================================
- Coverage 59.71% 58.99% -0.72%
+ Complexity 1377 1336 -41
============================================
Files 166 166
Lines 8918 8570 -348
Branches 853 840 -13
============================================
- Hits 5325 5056 -269
+ Misses 3318 3233 -85
- Partials 275 281 +6
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
server/src/main/java/org/apache/uniffle/server/storage/MultiStorageManager.java
Outdated
Show resolved
Hide resolved
There are some flaky ut
|
server/src/main/java/org/apache/uniffle/server/ShuffleDataFlushEvent.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/uniffle/server/storage/MultiStorageManager.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/uniffle/server/storage/MultiStorageManager.java
Outdated
Show resolved
Hide resolved
I think this PR is a good improvement! We also need this PR to avoid the problem of full local disk, although we dont hope to enable the big block directly written to HDFS. |
Do you have time to invest this PR, I hope this can be introduced in our company internal version, looking forward to be merged assp. @xianjingfeng |
# Conflicts: # server/src/main/java/org/apache/uniffle/server/storage/MultiStorageManager.java
server/src/main/java/org/apache/uniffle/server/storage/SingleStorageManager.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/uniffle/server/ShuffleDataFlushEvent.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/uniffle/server/storage/MultiStorageManager.java
Show resolved
Hide resolved
.../src/main/java/org/apache/uniffle/server/storage/AbstractStorageManagerFallbackStrategy.java
Outdated
Show resolved
Hide resolved
...r/src/main/java/org/apache/uniffle/server/storage/DefaultStorageManagerFallbackStrategy.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/uniffle/server/storage/MultiStorageManager.java
Show resolved
Hide resolved
server/src/main/java/org/apache/uniffle/server/storage/MultiStorageManager.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/uniffle/server/storage/MultiStorageManager.java
Outdated
Show resolved
Hide resolved
...r/src/main/java/org/apache/uniffle/server/storage/DefaultStorageManagerFallbackStrategy.java
Outdated
Show resolved
Hide resolved
int nextIdx = -1; | ||
for (int i = 0; i < candidates.length; i++) { | ||
if (current == candidates[i]) { | ||
nextIdx = (i + 1) % candidates.length; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we merge these two loops into one loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think two loops is better understood and not easy to make mistakes, and there is no difference in the performance between the two ways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's easy logic. One loop is enough. If you insist on it, I'm also ok for two loops.
server/src/main/java/org/apache/uniffle/server/storage/MultiStorageManager.java
Show resolved
Hide resolved
public static final ConfigOption<String> MULTISTORAGE_FALLBACK_STRATEGY_CLASS = ConfigOptions | ||
.key("rss.server.multistorage.fallback.strategy.class") | ||
.stringType() | ||
.noDefaultValue() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should choose origin behavior as default value.
Could we add some docs for this config option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, @zuston Do you have another suggestion?
Wait for CI |
@zuston Gently ping. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Merged. @xianjingfeng Thanks for your contribution |
CreateShuffleWriteHandlerRequest request = storage.getCreateWriterHandlerRequest( | ||
event.getAppId(), event.getShuffleId(), event.getStartPartition()); | ||
storage = storageManager.selectStorage(event); | ||
handler = storage.getOrCreateWriteHandler(request); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After reviewing this part, I think this is a bit strange that it changes the external objects reference of storage
and handler
. This will make the external invoker like ShuffleFlushManager
confused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After reviewing this part, I think this is a bit strange that change the external objects reference of
storage
andhandler
. This will make the external invoker likeShuffleFlushManager
confused.
We encapsulate the behavior. ShuffleFlushManger
don't need care.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have better solution, you can propose it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently I have no ideas. I just want to use this fallback strategy and review this part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently I have no ideas. I just want to use this fallback strategy and review this part.
Flush process worry me a lot. I'm not satisfied about the code quality. But I don't have idea how to improve them. Maybe @LuciferYang @advancedxy can help us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have two directions to improve this part of MultiStorageManager
- Remove the
storageManagerCache
, it looks unused and bring some problems. One possible problem I have seen is that, when one event enters into pending queue due to storage cannot write, it will not have a chance to get a new fallback strategy due to cache. - Avoid changing the external reference object in
write
method. If we want to fallback write to other storage. We could use the invoking sequence like. (selectStorage -> write) if failed and then (selectStorage -> write), instead of selectStorage -> write (failed) -> write(choose other storage in this method). That means we should change the fallback strategy invoked fromwrite
method toselectStorage
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently I have no ideas. I just want to use this fallback strategy and review this part.
Flush process worry me a lot. I'm not satisfied about the code quality. But I don't have idea how to improve them. Maybe @LuciferYang @advancedxy can help us.
@jerqi I'm just browsing the code. Let's discuss details later when I get more context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jerqi Busy at the end of the year. I'll think this on the weekend
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have two directions to improve this part of MultiStorageManager
- Remove the
storageManagerCache
, it looks unused and bring some problems. One possible problem I have seen is that, when one event enters into pending queue due to storage cannot write, it will not have a chance to get a new fallback strategy due to cache.- Avoid changing the external reference object in
write
method. If we want to fallback write to other storage. We could use the invoking sequence like. (selectStorage -> write) if failed and then (selectStorage -> write), instead of selectStorage -> write (failed) -> write(choose other storage in this method). That means we should change the fallback strategy invoked fromwrite
method toselectStorage
method.
Agree
What changes were proposed in this pull request?
Write to hdfs when local disk can't be write
Why are the changes needed?
There should be a fallback mechanism when disk can't be write. #163
Does this PR introduce any user-facing change?
No
How was this patch tested?
Already added