Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ISSUE-239][BUG] RssUtils#transIndexDataToSegments should consider the length of the data file #275

Merged
merged 4 commits into from
Oct 27, 2022

Conversation

leixm
Copy link
Contributor

@leixm leixm commented Oct 21, 2022

What changes were proposed in this pull request?

For issue#239, Fix inconsistent blocks when reading shuffle data.

Why are the changes needed?

This problem will cause reading shuffle data failed.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Already added UT

@leixm leixm changed the title [Problem] RssUtils#transIndexDataToSegments should consider the length of the data file [ISSUE-239][Problem] RssUtils#transIndexDataToSegments should consider the length of the data file Oct 21, 2022
@codecov-commenter
Copy link

codecov-commenter commented Oct 21, 2022

Codecov Report

Merging #275 (b007acb) into master (7a2f0ef) will increase coverage by 0.21%.
The diff coverage is 70.37%.

@@             Coverage Diff              @@
##             master     #275      +/-   ##
============================================
+ Coverage     59.71%   59.92%   +0.21%     
- Complexity     1377     1384       +7     
============================================
  Files           166      166              
  Lines          8918     8936      +18     
  Branches        853      854       +1     
============================================
+ Hits           5325     5355      +30     
+ Misses         3318     3302      -16     
- Partials        275      279       +4     
Impacted Files Coverage Δ
...pache/uniffle/server/ShuffleServerGrpcService.java 0.87% <0.00%> (-0.01%) ⬇️
...e/storage/handler/impl/HdfsShuffleReadHandler.java 55.73% <50.00%> (-1.41%) ⬇️
.../java/org/apache/uniffle/common/util/RssUtils.java 69.04% <66.66%> (-0.28%) ⬇️
...e/uniffle/storage/handler/impl/HdfsFileReader.java 81.81% <80.00%> (-0.95%) ⬇️
...orage/handler/impl/LocalFileServerReadHandler.java 78.33% <80.00%> (+0.36%) ⬆️
.../org/apache/uniffle/common/ShuffleIndexResult.java 100.00% <100.00%> (ø)
...orage/handler/impl/LocalFileClientReadHandler.java 65.51% <0.00%> (+65.51%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@leixm
Copy link
Contributor Author

leixm commented Oct 21, 2022

@zuston Can you help review this pr.

@@ -218,6 +220,13 @@ private static List<ShuffleDataSegment> transIndexDataToSegments(byte[] indexDat

bufferSegments.add(new BufferSegment(blockId, bufferOffset, length, uncompressLength, crc, taskAttemptId));
bufferOffset += length;
totalLength += length;

// If ShuffleServer is flushing the file at this time, the length in the index file record may be greater
Copy link
Member

@zuston zuston Oct 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this problem only occur that the map tasks are all finished and the data stored in memory is flushed to localfile/HDFS, right? And in this time, the spark client read the redundant index data. Right?

Analyzed from this perspective, if u drop these redundant data, does it will cause the data missing problem due to in hdfs client buffer insteading of memory/HDFS. I think it wont. The flushing data only will be flushed to HDFS and then removed from memory. And the method of dataWriter.close() in HdfsShuffleWriteHandler will ensure the data flushed to HDFS.

So this change is OK and wont cause data lost. But I have a question that why not calling the dataWriter.flush and indexWriter.flush when writing one block to solve this problem. Does this will make performance regession?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems unreasonable to do a flush every time a block is written, so that the buffer of the hdfs client will not work, and it will make performance regession.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense. This changelog looks OK for me

@jerqi jerqi changed the title [ISSUE-239][Problem] RssUtils#transIndexDataToSegments should consider the length of the data file [ISSUE-239][BUG] RssUtils#transIndexDataToSegments should consider the length of the data file Oct 21, 2022
@jerqi
Copy link
Contributor

jerqi commented Oct 21, 2022

Will it occur similar situation for local file storage type?

@leixm
Copy link
Contributor Author

leixm commented Oct 24, 2022

Our production environment does not use localfile, so I am not quite sure if the same problem would exist.

@leixm
Copy link
Contributor Author

leixm commented Oct 24, 2022

The key question is whether it is possible to have more blocks in the index file than in the data file, which seems possible based on the code analysis, but I suggest to refer to the actual production environment.

@zuston
Copy link
Member

zuston commented Oct 24, 2022

Will it occur similar situation for local file storage type?

I think it also will happen in localfile type.

@leixm
Copy link
Contributor Author

leixm commented Oct 24, 2022

Will it occur similar situation for local file storage type?

I think it also will happen in localfile type.

We can consider the length of the data file when ShuffleServer#getLocalShuffleIndex generates index information, what do you think?

@zuston
Copy link
Member

zuston commented Oct 24, 2022

We can consider the length of the data file when ShuffleServer#getLocalShuffleIndex generates index information, what do you think?

It's OK for me.

@jerqi
Copy link
Contributor

jerqi commented Oct 24, 2022

We can consider the length of the data file when ShuffleServer#getLocalShuffleIndex generates index information, what do you think?

It's OK for me.

+1

@jerqi
Copy link
Contributor

jerqi commented Oct 26, 2022

LGTM, wait for CI. Do you have another suggestion? @zuston

Copy link
Member

@zuston zuston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM +1

@jerqi jerqi merged commit 2429c67 into apache:master Oct 27, 2022
kaijchen pushed a commit that referenced this pull request Nov 12, 2022
…e length of the data file (#275)

### What changes were proposed in this pull request?
For issue#239, Fix inconsistent blocks when reading shuffle data.

### Why are the changes needed?
This problem will cause reading shuffle data failed.


### Does this PR introduce _any_ user-facing change?
No


### How was this patch tested?
Already added UT

Co-authored-by: leixianming <leixianming@didiglobal.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Problem] RssUtils#transIndexDataToSegments should consider the length of the data file
4 participants