Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ISSUE-124] Add fallback mechanism for blocks read inconsistent #276

Merged
merged 32 commits into from
Nov 28, 2022

Conversation

xianjingfeng
Copy link
Member

What changes were proposed in this pull request?

Add fallback mechanism for blocks read inconsistent

Why are the changes needed?

When the data in this first server is damaged, application will fail. #124 #129

Does this PR introduce any user-facing change?

No

How was this patch tested?

Already added

@codecov-commenter
Copy link

codecov-commenter commented Oct 24, 2022

Codecov Report

Merging #276 (b3dd7f3) into master (2f34733) will increase coverage by 0.12%.
The diff coverage is 4.49%.

@@             Coverage Diff              @@
##             master     #276      +/-   ##
============================================
+ Coverage     58.45%   58.57%   +0.12%     
  Complexity     1570     1570              
============================================
  Files           193      192       -1     
  Lines         10833    10803      -30     
  Branches        951      942       -9     
============================================
- Hits           6332     6328       -4     
+ Misses         4127     4100      -27     
- Partials        374      375       +1     
Impacted Files Coverage Δ
...a/org/apache/uniffle/common/ShuffleServerInfo.java 75.00% <0.00%> (-25.00%) ⬇️
.../java/org/apache/uniffle/common/util/RssUtils.java 60.58% <0.00%> (-2.78%) ⬇️
...uniffle/storage/factory/ShuffleHandlerFactory.java 0.00% <0.00%> (ø)
...torage/handler/impl/ComposedClientReadHandler.java 0.00% <0.00%> (ø)
...orage/handler/impl/LocalFileClientReadHandler.java 59.37% <0.00%> (-6.15%) ⬇️
...ge/handler/impl/MultiReplicaClientReadHandler.java 0.00% <0.00%> (ø)
...le/storage/handler/impl/HdfsClientReadHandler.java 75.00% <60.00%> (-1.93%) ⬇️
...che/uniffle/client/impl/ShuffleReadClientImpl.java 89.69% <100.00%> (-0.41%) ⬇️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@jerqi jerqi requested a review from frankliee October 24, 2022 11:07
@jerqi
Copy link
Contributor

jerqi commented Oct 24, 2022

What's the relation between this pr and #129 ?

@xianjingfeng
Copy link
Member Author

What's the relation between this pr and #129 ?

As what we discuss in #129, #129 is not the best and it will be split into multiple prs and this pr is one of them.

@jerqi jerqi requested a review from kaijchen October 25, 2022 02:36
@jerqi jerqi changed the title Add fallback mechanism for blocks read inconsistent [ISSUE-124] Add fallback mechanism for blocks read inconsistent Oct 25, 2022
@jerqi
Copy link
Contributor

jerqi commented Oct 26, 2022

One question: Should we use the concept of FALLBACK. In my opinion, we use fallback mechanism to process error or exception, but it's normal for us to read memory, localfile and hdfs at the same time.

@xianjingfeng
Copy link
Member Author

One question: Should we use the concept of FALLBACK. In my opinion, we use fallback mechanism to process error or exception, but it's normal for us to read memory, localfile and hdfs at the same time.

So, what is your opinion?

@jerqi
Copy link
Contributor

jerqi commented Oct 26, 2022

One question: Should we use the concept of FALLBACK. In my opinion, we use fallback mechanism to process error or exception, but it's normal for us to read memory, localfile and hdfs at the same time.

So, what is your opinion?

  1. Should we use method next() instead of fallback()?
  2. We shouldn't have the concept of maxFallbackTimes.

@xianjingfeng
Copy link
Member Author

  • Should we use method next() instead of fallback()?
  • We shouldn't have the concept of maxFallbackTimes.

How about fallback() -> nextRound() and maxFallbackTimes -> maxRounds?

@jerqi
Copy link
Contributor

jerqi commented Oct 26, 2022

  • Should we use method next() instead of fallback()?
  • We shouldn't have the concept of maxFallbackTimes.

How about fallback() -> nextRound() and maxFallbackTimes -> maxRounds?

If you have three replicas, every replica have memory, disk and hdfs. Whether maxRounds 3 is enough to read all the data?

@xianjingfeng
Copy link
Member Author

If you have three replicas, every replica have memory, disk and hdfs. Whether maxRounds 3 is enough to read all the data?

No guarantee. For example, the blocks is incomplete after first round, and than can't read from any shuffle server which store the missing blocks

@jerqi
Copy link
Contributor

jerqi commented Oct 26, 2022

If you have three replicas, every replica have memory, disk and hdfs. Whether maxRounds 3 is enough to read all the data?

No guarantee. For example, the blocks is incomplete after first round, and than can't read from any shuffle server which store the missing blocks

So I feel that maxRounds isn't reasonable for this situation.

@xianjingfeng
Copy link
Member Author

So I feel that maxRounds isn't reasonable for this situation.

I have another solution:

  1. Set maxFailTimes and failTime in every handler
  2. When readShuffleData fail, ++failTime
  3. When readShuffleData success, failTime = 0
  4. If failTime >= maxFailTimes , finished=true
    What do you think?

@jerqi
Copy link
Contributor

jerqi commented Oct 27, 2022

So I feel that maxRounds isn't reasonable for this situation.

I have another solution:

  1. Set maxFailTimes and failTime in every handler
  2. When readShuffleData fail, ++failTime
  3. When readShuffleData success, failTime = 0
  4. If failTime >= maxFailTimes , finished=true
    What do you think?

MaxFailureTime will tolerate the logic of replicas, this is my biggest concern.

@xianjingfeng
Copy link
Member Author

MaxFailureTime will tolerate the logic of replicas, this is my biggest concern.

I don't understand

@jerqi
Copy link
Contributor

jerqi commented Oct 27, 2022

MaxFailureTime will tolerate the logic of replicas, this is my biggest concern.

I don't understand

For replica logic, if we use 7 replicas, we should read 4 replica successfully, but if maxFailure is 3 . Although we have 4 correct replicas, the application will fail because the app may read 3 wrong replicas.

@xianjingfeng
Copy link
Member Author

For replica logic, if we use 7 replicas, we should read 4 replica successfully, but if maxFailure is 3 . Although we have 4 correct replicas, the application will fail because the app may read 3 wrong replicas.

I mean maxFailTimes is for each replica.

@jerqi
Copy link
Contributor

jerqi commented Oct 27, 2022

For replica logic, if we use 7 replicas, we should read 4 replica successfully, but if maxFailure is 3 . Although we have 4 correct replicas, the application will fail because the app may read 3 wrong replicas.

I mean maxFailTimes is for each replica.

It seems ok.

# Conflicts:
#	client-spark/spark2/src/main/java/org/apache/spark/shuffle/RssShuffleManager.java
#	client-spark/spark2/src/main/java/org/apache/spark/shuffle/reader/RssShuffleReader.java
#	client-spark/spark2/src/test/java/org/apache/spark/shuffle/reader/RssShuffleReaderTest.java
#	client-spark/spark3/src/main/java/org/apache/spark/shuffle/RssShuffleManager.java
#	client-spark/spark3/src/main/java/org/apache/spark/shuffle/reader/RssShuffleReader.java
#	client-spark/spark3/src/test/java/org/apache/spark/shuffle/reader/RssShuffleReaderTest.java
@xianjingfeng xianjingfeng changed the title [ISSUE-124] Add fallback mechanism for blocks read inconsistent [WIP][ISSUE-124] Add fallback mechanism for blocks read inconsistent Nov 24, 2022
// Only for test
public ShuffleServerInfo(String host, int port) {
this.id = host + "-" + port;
this.host = host;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a little strange to add a constructor just for test, we can just use
new ShuffleServerInfo(host + "_" + String.valueOf(port), host, port)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For unification and convenience. If we don't do this, we need modify many uts.

if (CollectionUtils.isEmpty(request.getShuffleServerInfoList())) {
throw new RuntimeException("Shuffle servers should not be empty!");
}
if (request.getShuffleServerInfoList().size() > 1) {
Copy link
Contributor

@frankliee frankliee Nov 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @jerqi, the current logic is too complicated.
It is better to use an unified code path (by the way, one server is a special case of multiple servers)

I prefer to add a global data structure in composed handler, may be called "progress".
It stores the information of consumed replicas and servers.
We could add the fallback in composed handler, and the each layer of handler can restart from the the last by reading the progress.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @xianjingfeng current implement is ok. We need a replicaHandler concept as a upper layer of composite handler.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to add a global data structure in composed handler, may be called "progress".
It stores the information of consumed replicas and servers.
We could add the fallback in composed handler, and the each layer of handler can restart from the the last by reading the progress.

This logic has the same problem as the previous version of this pr. If we read fail from the memory handler and read successful from the localfile handler. And then, the memory data flush to localfile and we read from memory again, some data maybe lost.

@xianjingfeng xianjingfeng changed the title [WIP][ISSUE-124] Add fallback mechanism for blocks read inconsistent [ISSUE-124] Add fallback mechanism for blocks read inconsistent Nov 25, 2022
@xianjingfeng
Copy link
Member Author

PTAL @jerqi

@jerqi
Copy link
Contributor

jerqi commented Nov 26, 2022

LGTM except for minor issues , cc @Gustfh Do you have another suggestion?

Copy link
Contributor

@jerqi jerqi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, let's wait for a moment. If Gus don't reply us, I'll merge this pr next Tuesday.

@jerqi
Copy link
Contributor

jerqi commented Nov 28, 2022

Merged. thanks all.

jerqi pushed a commit that referenced this pull request Dec 11, 2022
### What changes were proposed in this pull request?
Skip blocks which not in expected blockId range when read from memory.

### Why are the changes needed?
1.If we use AQE, every task will read data from all partitions.
2.If the data of the first shuffle server is incomplete, we need to read from another server if #276 is merged. 
Both of the above situations will lead to  read redundant data from shuffle server.
### Does this PR introduce _any_ user-facing change?
Set `rss.client.read.block.skip.strategy` to `BLOCKID_RANGE`.

### How was this patch tested?
Already added
@xianjingfeng xianjingfeng deleted the issue_124_1 branch March 1, 2023 13:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants