
[Improvement] Avoid selecting storage which has reached the high watermark #424

Merged: 20 commits into apache:master on Dec 21, 2022

Conversation

@zuston (Member) commented Dec 15, 2022

What changes were proposed in this pull request?

  1. Replace selecting a storage on every access with a selection cache, to avoid selection being non-idempotent in some cases
  2. Avoid selecting a storage which has reached the high watermark, building on the above optimization
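A minimal sketch of the cached selection from point 1 combined with the high-watermark filter from point 2. All names here (`StorageSelector`, `reachedHighWatermark`, the flat string key) are hypothetical illustrations, not Uniffle's actual LocalStorageManager API:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: cache the storage chosen for each partition so repeated selections
// are idempotent, and skip storages that have reached the high watermark.
class StorageSelector {
    static class Storage {
        final String basePath;
        final long capacity;
        volatile long used;
        Storage(String basePath, long capacity) { this.basePath = basePath; this.capacity = capacity; }
        boolean reachedHighWatermark(double ratio) { return used >= capacity * ratio; }
    }

    private final List<Storage> storages;
    private final double highWatermarkRatio;
    // Flat key "appId/shuffleId/partitionId" -> chosen storage; going through
    // the cache returns the same storage for the same partition every time.
    private final Map<String, Storage> selectionCache = new ConcurrentHashMap<>();

    StorageSelector(List<Storage> storages, double highWatermarkRatio) {
        this.storages = storages;
        this.highWatermarkRatio = highWatermarkRatio;
    }

    Storage select(String appId, int shuffleId, int partitionId) {
        String key = appId + "/" + shuffleId + "/" + partitionId;
        return selectionCache.computeIfAbsent(key, k ->
            storages.stream()
                .filter(s -> !s.reachedHighWatermark(highWatermarkRatio))
                // pick the least-loaded remaining candidate
                .min((a, b) -> Long.compare(a.used, b.used))
                .orElse(null));
    }
}
```

With this shape, a storage past the watermark is never handed out for new partitions, and an already-assigned partition keeps its storage instead of triggering a fallback to HDFS.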

Why are the changes needed?

In the current codebase, LocalStorageManager may select a local storage that has already reached the high watermark.

This strategy is unreasonable, and it makes many apps fall back to HDFS because they are assigned a storage at the high watermark.

Does this PR introduce any user-facing change?

No

How was this patch tested?

  1. UTs

@zuston zuston changed the title [Improvement] Avoid selecting storage when it reaches the high watermark [Improvement] Avoid selecting storage which has reached the high watermark Dec 15, 2022
try {
  LocalStorage storage = partitionsOfStorage.get(appId).get(shuffleId).get(partitionId);
  if (storage.isCorrupted()) {
    throw new RuntimeException("LocalStorage: " + storage.getBasePath() + " is corrupted.");
Member:
We removed this exception in #281; why put it back?

Member Author:
I don't see the point of removing this exception. Once a storage is corrupted, there is no need to move on to the next storage, because Uniffle doesn't support dynamic storage switching, especially for a corrupted storage.

Member:
In the current codebase, if one storage is corrupted, data will be written to another storage. In this case we lose some data of this replica, but the client can still read the rest. If the exception is thrown here, all data will be dropped. With multiple replicas, the old behavior is useful.

Member Author:
Makes sense for multiple replicas.

But for a single replica, it's not necessary. Moreover, with a single replica and the fallback strategy enabled, switching to another storage will cause data loss. cc @jerqi

I prefer fast fail.

Member Author:
Gentle ping @jerqi, WDYT? Besides, after this PR I plan to introduce dynamic switching of the local disk for a single partition, which will solve your multiple-replica problem.

@zuston (Member Author), Dec 16, 2022:
@advancedxy Due to this change, diskErrorTest fails, so I have removed it temporarily.

Contributor:
For this case, I'd actually prefer to keep the old logic in this PR. If we want to optimize with dynamic switching, we can introduce this change in that PR.

However, I think we should reconsider the corrupted case. Once one replica of a local storage is corrupted, the whole partition can no longer be trusted; ShuffleReadClient should switch to another replica as soon as possible. Maybe there's some logic I didn't follow?

Member Author:
Sorry, I missed this thread.

> Once one replica of a local storage is corrupted, the whole partition can no longer be trusted; ShuffleReadClient should switch to another replica as soon as possible. Maybe there's some logic I didn't follow?

Let me explain in more detail. First, to be clear, disk corruption affects the writing process in the current case; it will of course also affect later reads when there are multiple replicas.

So with a single replica and the MEMORY_LOCALFILE or LOCALFILE type, once a disk is corrupted the data is lost, and we should make the relevant job fail fast. In the current codebase this failure is ignored, and the job retries and then fails. Of course, this does not cause big problems.

But with MEMORY_LOCALFILE_HDFS and a single replica, the remaining events are flushed to the fallback storage, e.g. from local disk to HDFS. This is unnecessary, since partial data is already lost.

This behavior can be useful for multiple replicas, but I think we can support that mechanism after dynamic selection, which will be better.

> ShuffleReadClient should switch to another replica as soon as possible.

Yes, I think this is reasonable.

@codecov-commenter commented Dec 16, 2022

Codecov Report

Merging #424 (6673783) into master (877b4ed) will increase coverage by 0.74%.
The diff coverage is 62.36%.

@@             Coverage Diff              @@
##             master     #424      +/-   ##
============================================
+ Coverage     58.43%   59.18%   +0.74%     
+ Complexity     1613     1493     -120     
============================================
  Files           195      183      -12     
  Lines         11063     9805    -1258     
  Branches        976      853     -123     
============================================
- Hits           6465     5803     -662     
+ Misses         4220     3651     -569     
+ Partials        378      351      -27     
Impacted Files Coverage Δ
...pache/uniffle/server/ShuffleServerGrpcService.java 0.80% <0.00%> (-0.01%) ⬇️
...he/uniffle/server/buffer/ShuffleBufferManager.java 82.74% <ø> (ø)
...rg/apache/uniffle/storage/common/LocalStorage.java 45.89% <ø> (+2.66%) ⬆️
...pache/uniffle/storage/common/LocalStorageMeta.java 74.28% <0.00%> (-1.45%) ⬇️
.../org/apache/uniffle/server/ShuffleTaskManager.java 74.66% <31.57%> (-2.57%) ⬇️
...rg/apache/uniffle/server/ShuffleDataReadEvent.java 90.00% <75.00%> (-10.00%) ⬇️
...he/uniffle/server/storage/LocalStorageManager.java 88.35% <85.71%> (-2.89%) ⬇️
.../main/java/org/apache/uniffle/common/UnionKey.java 87.50% <87.50%> (ø)
...org/apache/uniffle/storage/common/HdfsStorage.java 0.00% <0.00%> (ø)
...pache/hadoop/mapreduce/task/reduce/RssShuffle.java
... and 19 more


private List<LocalStorage> unCorruptedStorages = Lists.newArrayList();
private final Set<String> corruptedStorages = Sets.newConcurrentHashSet();

private final Map<PartitionUnionKey, LocalStorage> partitionsOfStorage;
Contributor:
Maybe it will occupy too much memory.

Member Author:
From our dashboard, there are only a few thousand partitions running at the same time.

Member Author:
And if we use a nested concurrent hash map, I'm worried about thread safety.
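The thread-safety trade-off being discussed can be illustrated with a small sketch. With nested maps, every level needs its own atomic `computeIfAbsent`, while a single flat union key needs only one atomic operation per lookup; all names below are illustrative, not the actual Uniffle code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: flat union key vs nested concurrent maps for per-partition state.
// The nested form must create and populate inner maps in multiple steps;
// the flat form keyed by "appId/shuffleId/partitionId" is one atomic call.
class FlatVsNested {
    // Nested: appId -> shuffleId -> partitionId -> storage path
    static final Map<String, Map<Integer, Map<Integer, String>>> nested = new ConcurrentHashMap<>();
    // Flat: a single map with a union key
    static final Map<String, String> flat = new ConcurrentHashMap<>();

    static void putNested(String app, int shuffle, int partition, String path) {
        nested.computeIfAbsent(app, a -> new ConcurrentHashMap<>())
              .computeIfAbsent(shuffle, s -> new ConcurrentHashMap<>())
              .put(partition, path);
    }

    static void putFlat(String app, int shuffle, int partition, String path) {
        flat.putIfAbsent(app + "/" + shuffle + "/" + partition, path);
    }
}
```

The flat form also makes whole-partition removal and iteration simpler, at the cost of one map entry per partition, which is the memory concern raised above.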

Contributor:
> From our dashboard, there are only a few thousand partitions running at the same time

For large Spark apps, it's common to have ~10K shuffle partitions, and that's just one app.

However, maybe we haven't reached that kind of scale yet.

Member Author:
So far, yes.

@zuston zuston requested a review from jerqi December 16, 2022 06:53
@zuston (Member Author) commented Dec 16, 2022

Could you help review again? @advancedxy @jerqi

And I'm working on multiple-disk selection for one partition based on this. I think I can push this forward quickly.

@zuston (Member Author) commented Dec 18, 2022

Fixed a potential race condition. @advancedxy

(key, localStorage) -> {
  // If this is the first time to select storage or existing storage is corrupted,
  // we should refresh the cache.
  if (localStorage == null || localStorage.isCorrupted()) {
Contributor:
On second thought, in which case would localStorage be corrupted here? In L152-L155 we would already have thrown an exception.

@zuston (Member Author), Dec 21, 2022:
For an event, if the storage is selected but the event hasn't written any data to it yet (maybe the event is still in the pending queue), we can replace it with a new storage without causing data loss.

Contributor:
> For an event, if the storage is selected but the event hasn't written any data to it yet (maybe the event is still in the pending queue), we can replace it with a new storage without causing data loss.

This only makes sense when L154-L156 doesn't throw an exception, right?

@zuston (Member Author), Dec 21, 2022:
Emm... I have removed the throw-exception logic, so this part is now consistent with the original logic.

Contributor:
Yeah, I know. What bothered me is:

  • Previously, L154-L156 threw an exception, and I couldn't imagine a case where localStorage.isCorrupted() holds true
  • In the latest code, this makes sense

Did I miss something?

@zuston (Member Author), Dec 21, 2022:
> Previously, L154-L156 threw an exception, and I couldn't imagine a case where localStorage.isCorrupted() holds true

In the previous commit, when localStorage.isCorrupted() == true and storage.containsWriteHandler(appId, shuffleId, partitionId) == false, the code would enter the part you mentioned.

Did I catch your thought?

Contributor:
Ah, I see. Thanks for clarifying.
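The refresh-on-corruption pattern settled in this thread can be sketched with `Map.compute`: the selection cache atomically replaces an entry when it is missing or when the cached storage has become corrupted. This is a simplified illustration; `LocalStorage` here is a minimal stand-in interface, not Uniffle's actual class:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Sketch: atomically refresh a cached storage selection when the cached
// entry is absent or has become corrupted, as discussed above.
class SelectionCacheRefresh {
    interface LocalStorage { boolean isCorrupted(); }

    private final Map<String, LocalStorage> cache = new ConcurrentHashMap<>();

    LocalStorage get(String key, Supplier<LocalStorage> selectNewStorage) {
        return cache.compute(key, (k, cached) -> {
            // First selection, or the cached storage went bad: refresh.
            if (cached == null || cached.isCorrupted()) {
                return selectNewStorage.get();
            }
            return cached;
        });
    }
}
```

Because `compute` runs atomically per key on ConcurrentHashMap, two writers racing on the same partition cannot end up with different storages, which matches the race-condition fix mentioned earlier in the thread.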

@advancedxy (Contributor):
Left a minor comment; otherwise I'm OK with this PR.

@jerqi please take another look if you have time.

int[] range = ShuffleStorageUtils.getPartitionRange(partitionId, partitionNumPerRange, partitionNum);
Storage storage = storageManager.selectStorage(new ShuffleDataReadEvent(appId, shuffleId, partitionId, range[0]));
if (storage == null) {
  throw new FileNotFoundException("No such data in current storage manager.");
Contributor:
Why do we throw a FileNotFoundException?

Member Author:
FileNotFoundException can be handled by the getLocalShuffleIndex gRPC layer, because there may be no local storage when flushing directly to HDFS.

@zuston zuston requested a review from jerqi December 21, 2022 07:18
@zuston (Member Author) commented Dec 21, 2022

Updated @jerqi

@jerqi (Contributor) left a comment:
LGTM, thanks @zuston @advancedxy. Let @advancedxy merge this PR.

@advancedxy advancedxy merged commit 5321292 into apache:master Dec 21, 2022
@advancedxy (Contributor):
Merged, thanks @zuston

5 participants