[ISSUE-124] Add fallback mechanism for blocks read inconsistent #276
Conversation
Codecov Report
```diff
@@            Coverage Diff             @@
##             master     #276    +/-   ##
============================================
+ Coverage     58.45%   58.57%   +0.12%
  Complexity     1570     1570
============================================
  Files           193      192       -1
  Lines         10833    10803      -30
  Branches        951      942       -9
============================================
- Hits           6332     6328       -4
+ Misses         4127     4100      -27
- Partials        374      375       +1
```
What's the relation between this PR and #129?
Resolved review threads (outdated) on:
- client/src/main/java/org/apache/uniffle/client/impl/ShuffleReadClientImpl.java
- ...age/src/main/java/org/apache/uniffle/storage/handler/impl/MemoryQuorumClientReadHandler.java
- storage/src/main/java/org/apache/uniffle/storage/handler/impl/HdfsClientReadHandler.java (two threads)
One question: Should we use the concept of
So, what is your opinion?
How about renaming fallback() -> nextRound() and maxFallbackTimes -> maxRounds?
If you have three replicas, and every replica has memory, disk and HDFS storage, is maxRounds = 3 enough to read all the data?
No guarantee. For example, the blocks may be incomplete after the first round, and then we can't read from any shuffle server that stores the missing blocks.
So I feel that
I have another solution:
maxFailureTimes doesn't fit well with the replica logic; this is my biggest concern.
I don't understand.
For the replica logic: if we use 7 replicas, we only need to read 4 replicas successfully. But if maxFailure is 3, the application can still fail even though we have 4 correct replicas, because the app may happen to read the 3 corrupt replicas first.
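To make the replica arithmetic above concrete: with n replicas a majority read quorum is n/2 + 1, and a fixed failure budget can abort a read even when enough healthy replicas remain. A minimal sketch, with illustrative names only (not Uniffle's API):

```java
public class QuorumReadCheck {
    // Majority read quorum for n replicas.
    static int readQuorum(int replicas) {
        return replicas / 2 + 1;
    }

    // Worst case under a fixed failure budget: the client gives up after
    // maxFailure bad reads, even if untried healthy replicas could still
    // satisfy the quorum.
    static boolean failsUnderFixedBudget(int replicas, int healthy, int maxFailure) {
        int corrupt = replicas - healthy;
        // Quorum is reachable, yet an unlucky ordering exhausts the budget first.
        return corrupt >= maxFailure && healthy >= readQuorum(replicas);
    }

    public static void main(String[] args) {
        // 7 replicas, 4 healthy: quorum (4) is reachable, but 3 corrupt
        // reads exhaust a maxFailure of 3 first.
        System.out.println(failsUnderFixedBudget(7, 4, 3)); // prints "true"
    }
}
```

This is exactly the 7-replica example from the comment above: the read is logically satisfiable, but a failure counter that ignores replica placement can fail it anyway.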
I mean
It seems OK.
Resolved merge conflicts in:
- client-spark/spark2/src/main/java/org/apache/spark/shuffle/RssShuffleManager.java
- client-spark/spark2/src/main/java/org/apache/spark/shuffle/reader/RssShuffleReader.java
- client-spark/spark2/src/test/java/org/apache/spark/shuffle/reader/RssShuffleReaderTest.java
- client-spark/spark3/src/main/java/org/apache/spark/shuffle/RssShuffleManager.java
- client-spark/spark3/src/main/java/org/apache/spark/shuffle/reader/RssShuffleReader.java
- client-spark/spark3/src/test/java/org/apache/spark/shuffle/reader/RssShuffleReaderTest.java
```java
// Only for test
public ShuffleServerInfo(String host, int port) {
  this.id = host + "-" + port;
  this.host = host;
```
It is a little strange to add a constructor just for test, we can just use
`new ShuffleServerInfo(host + "_" + String.valueOf(port), host, port)`
For unification and convenience. If we don't do this, we need to modify many UTs.
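For reference, a minimal sketch of the class shape under discussion — a test-only convenience constructor delegating to the full one so the id format lives in one place. Fields and accessors are inferred from the diff above, not the actual Uniffle source:

```java
public class ShuffleServerInfo {
    private final String id;
    private final String host;
    private final int port;

    public ShuffleServerInfo(String id, String host, int port) {
        this.id = id;
        this.host = host;
        this.port = port;
    }

    // Test-only convenience: derives the id from host and port, so test
    // call sites don't repeat the id-formatting expression everywhere.
    public ShuffleServerInfo(String host, int port) {
        this(host + "-" + port, host, port);
    }

    public String getId() {
        return id;
    }
}
```

The trade-off debated above is exactly this: the delegating constructor saves edits in many unit tests, at the cost of a production class carrying a test-only entry point.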
```java
if (CollectionUtils.isEmpty(request.getShuffleServerInfoList())) {
  throw new RuntimeException("Shuffle servers should not be empty!");
}
if (request.getShuffleServerInfoList().size() > 1) {
```
I agree with @jerqi, the current logic is too complicated.
It is better to use a unified code path (by the way, one server is a special case of multiple servers).
I prefer to add a global data structure in the composed handler, maybe called "progress".
It stores the information of consumed replicas and servers.
We could add the fallback in the composed handler, and each layer of handler can restart from the last position by reading the progress.
I think @xianjingfeng's current implementation is OK. We need a replicaHandler concept as an upper layer above the composite handler.
> I prefer to add a global data structure in the composed handler, maybe called "progress". It stores the information of consumed replicas and servers. We could add the fallback in the composed handler, and each layer of handler can restart from the last position by reading the progress.

This logic has the same problem as the previous version of this PR: if we fail to read from the memory handler but read successfully from the localfile handler, and then the memory data is flushed to localfile before we read from memory again, some data may be lost.
PTAL @jerqi
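For illustration, the "progress" structure proposed above could be as simple as a record of which servers and blocks have already been consumed, queried by each handler layer before re-reading. All names here are hypothetical, not part of Uniffle:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical "progress" record for a composed read handler: tracks
// consumed servers and block ids so a layer can resume after a fallback
// instead of re-reading (and possibly double-counting) data.
public class ReadProgress {
    private final Set<String> consumedServers = new HashSet<>();
    private final Set<Long> consumedBlockIds = new HashSet<>();

    public void markServerConsumed(String serverId) {
        consumedServers.add(serverId);
    }

    public void markBlockConsumed(long blockId) {
        consumedBlockIds.add(blockId);
    }

    public boolean isServerConsumed(String serverId) {
        return consumedServers.contains(serverId);
    }

    public boolean isBlockConsumed(long blockId) {
        return consumedBlockIds.contains(blockId);
    }
}
```

Note this sketch does not by itself solve the flush race described in the reply above: a block flushed from memory to localfile between rounds still needs the storage layers to agree on block identity, which is why the discussion converged on handling replicas in an upper layer.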
Resolved review threads on:
- ...age/src/main/java/org/apache/uniffle/storage/handler/impl/MultiReplicaClientReadHandler.java
- storage/src/main/java/org/apache/uniffle/storage/handler/impl/HdfsClientReadHandler.java (outdated, two threads)
- ...age/src/main/java/org/apache/uniffle/storage/handler/impl/MultiReplicaClientReadHandler.java (outdated)
- storage/src/main/java/org/apache/uniffle/storage/handler/impl/HdfsShuffleReadHandler.java (outdated)
LGTM except for minor issues. cc @Gustfh, do you have another suggestion?
LGTM, let's wait for a moment. If Gus doesn't reply, I'll merge this PR next Tuesday.
Merged. Thanks all.
### What changes were proposed in this pull request?
Skip blocks that are not in the expected blockId range when reading from memory.

### Why are the changes needed?
1. If we use AQE, every task will read data from all partitions.
2. If the data of the first shuffle server is incomplete, we need to read from another server once #276 is merged.

Both of the above situations lead to reading redundant data from the shuffle server.

### Does this PR introduce _any_ user-facing change?
Set `rss.client.read.block.skip.strategy` to `BLOCKID_RANGE`.

### How was this patch tested?
Already added.
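The blockId-range skip described above can be outlined as a membership filter over the blocks returned from a memory read. This is a hypothetical helper for illustration only (the real `BLOCKID_RANGE` strategy lives in the Uniffle client, and these names are not its API):

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative only: drop memory-read blocks whose ids fall outside the
// set this reader expects, so AQE-style partial reads (and multi-server
// fallback reads) don't return redundant data to the task.
public class BlockIdRangeFilter {
    static List<Long> filterExpected(List<Long> readBlockIds, Set<Long> expectedBlockIds) {
        return readBlockIds.stream()
                .filter(expectedBlockIds::contains)
                .collect(Collectors.toList());
    }
}
```

The point of the optimization is that the filter runs before redundant block payloads are deserialized and handed to the task, not after.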
### What changes were proposed in this pull request?
Add a fallback mechanism for inconsistent block reads.

### Why are the changes needed?
When the data in the first server is damaged, the application will fail. #124 #129

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Already added.