[ISSUE-239][BUG] RssUtils#transIndexDataToSegments should consider the length of the data file #275

leixm · 2022-10-21T02:40:30Z

What changes were proposed in this pull request?

For issue #239: fix inconsistent blocks when reading shuffle data.

Why are the changes needed?

This problem causes reads of shuffle data to fail.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests have been added.

codecov-commenter · 2022-10-21T03:02:30Z

Codecov Report

Merging #275 (b007acb) into master (7a2f0ef) will increase coverage by 0.21%.
The diff coverage is 70.37%.

@@             Coverage Diff              @@
##             master     #275      +/-   ##
============================================
+ Coverage     59.71%   59.92%   +0.21%     
- Complexity     1377     1384       +7     
============================================
  Files           166      166              
  Lines          8918     8936      +18     
  Branches        853      854       +1     
============================================
+ Hits           5325     5355      +30     
+ Misses         3318     3302      -16     
- Partials        275      279       +4

Impacted Files	Coverage Δ
...pache/uniffle/server/ShuffleServerGrpcService.java	`0.87% <0.00%> (-0.01%)`	⬇️
...e/storage/handler/impl/HdfsShuffleReadHandler.java	`55.73% <50.00%> (-1.41%)`	⬇️
.../java/org/apache/uniffle/common/util/RssUtils.java	`69.04% <66.66%> (-0.28%)`	⬇️
...e/uniffle/storage/handler/impl/HdfsFileReader.java	`81.81% <80.00%> (-0.95%)`	⬇️
...orage/handler/impl/LocalFileServerReadHandler.java	`78.33% <80.00%> (+0.36%)`	⬆️
.../org/apache/uniffle/common/ShuffleIndexResult.java	`100.00% <100.00%> (ø)`
...orage/handler/impl/LocalFileClientReadHandler.java	`65.51% <0.00%> (+65.51%)`	⬆️

leixm · 2022-10-21T07:10:39Z

@zuston Could you help review this PR?

zuston · 2022-10-21T09:00:25Z

common/src/main/java/org/apache/uniffle/common/util/RssUtils.java

@@ -218,6 +220,13 @@ private static List<ShuffleDataSegment> transIndexDataToSegments(byte[] indexDat

        bufferSegments.add(new BufferSegment(blockId, bufferOffset, length, uncompressLength, crc, taskAttemptId));
        bufferOffset += length;
+        totalLength += length;
+
+        // If ShuffleServer is flushing the file at this time, the length in the index file record may be greater


I think this problem only occurs when all map tasks have finished and the data held in memory is being flushed to localfile/HDFS, right? At that moment, the Spark client reads redundant index data. Is that correct?

From this perspective, if you drop the redundant index data, could that cause data loss because the data is still sitting in the HDFS client buffer rather than in memory or on HDFS? I think it won't. Data being flushed is removed from memory only after it has been flushed to HDFS, and dataWriter.close() in HdfsShuffleWriteHandler ensures that the data reaches HDFS.

So this change is OK and won't cause data loss. But I have a question: why not call dataWriter.flush and indexWriter.flush after writing each block to solve this problem? Would that cause a performance regression?

It seems unreasonable to flush every time a block is written: the HDFS client buffer would no longer be effective, and it would cause a performance regression.

Makes sense. This change looks OK to me.
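
The guard discussed in this thread can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the class `IndexTruncationSketch`, the type `IndexRecord`, the helper `putRecord`, and the 40-byte record layout are all assumptions made for the example. The idea is to stop converting index records into segments as soon as a record points past the known length of the data file.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Sketch only: names and the record layout below are assumptions for illustration. */
final class IndexTruncationSketch {

  // Assumed layout: offset(8) + length(4) + uncompressLength(4) + crc(8) + blockId(8) + taskAttemptId(8).
  static final int RECORD_SIZE = 40;

  /** Hypothetical stand-in for one parsed index entry. */
  static final class IndexRecord {
    final long blockId;
    final long offset;
    final int length;

    IndexRecord(long blockId, long offset, int length) {
      this.blockId = blockId;
      this.offset = offset;
      this.length = length;
    }
  }

  /**
   * Parses index records and drops every record whose bytes are not fully
   * present in a data file of dataFileLen bytes. A negative dataFileLen
   * means "unknown" and disables the check.
   */
  static List<IndexRecord> readableRecords(byte[] indexData, long dataFileLen) {
    List<IndexRecord> records = new ArrayList<>();
    ByteBuffer buf = ByteBuffer.wrap(indexData);
    while (buf.remaining() >= RECORD_SIZE) {
      long offset = buf.getLong();
      int length = buf.getInt();
      buf.getInt();  // uncompressLength, not needed for the check
      buf.getLong(); // crc, not needed for the check
      long blockId = buf.getLong();
      buf.getLong(); // taskAttemptId, not needed for the check
      if (dataFileLen >= 0 && offset + length > dataFileLen) {
        break; // the index is ahead of the data file, so stop here
      }
      records.add(new IndexRecord(blockId, offset, length));
    }
    return records;
  }

  /** Encodes one record in the assumed layout (used to build sample input). */
  static void putRecord(ByteBuffer buf, long offset, int length, long blockId) {
    buf.putLong(offset).putInt(length).putInt(length).putLong(0L).putLong(blockId).putLong(0L);
  }
}
```

Dropping the trailing records is safe only under the condition raised above: the blocks they describe must still be retrievable later, once the writer has closed and the data has reached storage.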

jerqi · 2022-10-21T14:17:06Z

Will a similar situation occur for the local file storage type?

leixm · 2022-10-24T02:58:27Z

Our production environment does not use localfile, so I am not quite sure if the same problem would exist.

leixm · 2022-10-24T03:03:25Z

The key question is whether it is possible to have more blocks in the index file than in the data file. Based on the code analysis this seems possible, but I suggest verifying it against an actual production environment.
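
One way the index could get ahead of the data file is client-side buffering. The sketch below is a simplified model, not the server's write path: `IndexAheadOfDataSketch` and `writeOneBlock` are hypothetical names, a `ByteArrayOutputStream` stands in for the data file, and the index record is assumed to be visible as soon as the block is written.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Sketch only: a simplified model of a buffered data writer, not the real handler. */
final class IndexAheadOfDataSketch {

  /**
   * Writes one block through a buffered writer and reports
   * {length recorded in the index, bytes visible before close, bytes visible after close}.
   */
  static long[] writeOneBlock(int blockSize, int clientBufferSize) {
    ByteArrayOutputStream dataFile = new ByteArrayOutputStream(); // stands in for the data file
    try {
      OutputStream dataWriter = new BufferedOutputStream(dataFile, clientBufferSize);
      dataWriter.write(new byte[blockSize]); // may stay in the client buffer
      long indexedLength = blockSize;        // assumption: the index record is already visible
      long visibleBeforeClose = dataFile.size();
      dataWriter.close();                    // close() pushes the buffered bytes out
      long visibleAfterClose = dataFile.size();
      return new long[] {indexedLength, visibleBeforeClose, visibleAfterClose};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With a 100-byte block and a 4096-byte buffer, `writeOneBlock(100, 4096)` returns `{100, 0, 100}`: the index describes 100 bytes that a reader cannot see until the writer is closed.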

zuston · 2022-10-24T07:02:43Z

Will a similar situation occur for the local file storage type?

I think it will also happen with the localfile type.

leixm · 2022-10-24T07:28:28Z

Will a similar situation occur for the local file storage type?

I think it will also happen with the localfile type.

We can consider the length of the data file when ShuffleServer#getLocalShuffleIndex generates index information. What do you think?

zuston · 2022-10-24T07:38:53Z

We can consider the length of the data file when ShuffleServer#getLocalShuffleIndex generates index information. What do you think?

That's fine with me.

jerqi · 2022-10-24T07:43:08Z

We can consider the length of the data file when ShuffleServer#getLocalShuffleIndex generates index information. What do you think?

That's fine with me.

+1
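
For the local file case, the proposal above could look roughly like the sketch below. It is an illustration under assumptions, not the implementation in this PR: `LocalIndexWithLengthSketch`, `IndexWithDataLength`, and `readIndex` are hypothetical names. The server returns the index bytes together with the current length of the data file, so that the reader can ignore index records that point past that length.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sketch only: hypothetical types, not the server's actual index handling. */
final class LocalIndexWithLengthSketch {

  /** Hypothetical result: raw index bytes plus the data file length seen by the server. */
  static final class IndexWithDataLength {
    final byte[] indexData;
    final long dataFileLen;

    IndexWithDataLength(byte[] indexData, long dataFileLen) {
      this.indexData = indexData;
      this.dataFileLen = dataFileLen;
    }
  }

  /** Reads the whole index file and reports the current size of the data file. */
  static IndexWithDataLength readIndex(Path indexFile, Path dataFile) {
    try {
      byte[] indexData = Files.readAllBytes(indexFile);
      long dataFileLen = Files.size(dataFile);
      return new IndexWithDataLength(indexData, dataFileLen);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Carrying the length alongside the index keeps the truncation decision on the reading side, which lets the same check serve both HDFS and local files.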

storage/src/main/java/org/apache/uniffle/storage/handler/impl/HdfsFileReader.java

jerqi · 2022-10-26T11:44:49Z

LGTM, waiting for CI. Do you have any other suggestions, @zuston?

zuston

LGTM +1

…e length of the data file (#275)

### What changes were proposed in this pull request?

For issue#239, Fix inconsistent blocks when reading shuffle data.

### Why are the changes needed?

This problem will cause reading shuffle data failed.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Already added UT

Co-authored-by: leixianming <leixianming@didiglobal.com>

[Problem] RssUtils#transIndexDataToSegments should consider the lengt…

be1385b

…h of the data file

leixm changed the title ~~[Problem] RssUtils#transIndexDataToSegments should consider the length of the data file~~ [ISSUE-239][Problem] RssUtils#transIndexDataToSegments should consider the length of the data file Oct 21, 2022

jerqi requested a review from zuston October 21, 2022 03:04

jerqi linked an issue Oct 21, 2022 that may be closed by this pull request

[Problem] RssUtils#transIndexDataToSegments should consider the length of the data file #239

Closed

zuston reviewed Oct 21, 2022

jerqi changed the title ~~[ISSUE-239][Problem] RssUtils#transIndexDataToSegments should consider the length of the data file~~ [ISSUE-239][BUG] RssUtils#transIndexDataToSegments should consider the length of the data file Oct 21, 2022

leixianming added 2 commits October 25, 2022 11:10

Fix LOCALFILE

ecc4654

Add license

b007acb

jerqi reviewed Oct 26, 2022

storage/src/main/java/org/apache/uniffle/storage/handler/impl/HdfsFileReader.java (outdated)

Fix HdfsFileReader#getFileLen

8178c20

zuston approved these changes Oct 27, 2022

jerqi approved these changes Oct 27, 2022

jerqi merged commit 2429c67 into apache:master Oct 27, 2022
