[ISSUE-239][BUG] RssUtils#transIndexDataToSegments should consider the length of the data file #275
Conversation
Codecov Report
@@             Coverage Diff              @@
##             master     #275      +/-   ##
============================================
+ Coverage     59.71%   59.92%    +0.21%
- Complexity     1377     1384        +7
============================================
  Files           166      166
  Lines          8918     8936       +18
  Branches        853      854        +1
============================================
+ Hits           5325     5355       +30
+ Misses         3318     3302       -16
- Partials        275      279        +4
@zuston Can you help review this PR?
@@ -218,6 +220,13 @@ private static List<ShuffleDataSegment> transIndexDataToSegments(byte[] indexDat

bufferSegments.add(new BufferSegment(blockId, bufferOffset, length, uncompressLength, crc, taskAttemptId));
bufferOffset += length;
totalLength += length;

// If ShuffleServer is flushing the file at this time, the length in the index file record may be greater
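The idea under review, sketched as standalone Java (a hypothetical illustration, not the actual patch; `truncateToDataFile` and the `long[] {offset, length}` entry shape are assumptions): when converting index records to segments, stop at the first entry whose data would extend past the real data file length, since trailing index entries may describe blocks still being flushed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IndexTruncationSketch {
    // Hypothetical index entry: {offset, length} of one block in the data file.
    // Keep only the leading entries whose bytes are fully present in the data file.
    static List<long[]> truncateToDataFile(List<long[]> indexEntries, long dataFileLength) {
        List<long[]> valid = new ArrayList<>();
        long offset = 0;
        for (long[] entry : indexEntries) {
            long length = entry[1];
            // This block (or a later one) has not been fully flushed yet: drop it
            // and everything after it.
            if (offset + length > dataFileLength) {
                break;
            }
            valid.add(entry);
            offset += length;
        }
        return valid;
    }

    public static void main(String[] args) {
        List<long[]> index = Arrays.asList(
            new long[]{0, 100},
            new long[]{100, 100},
            new long[]{200, 100}  // recorded in the index, but still being flushed
        );
        // The data file only contains the first two blocks so far.
        System.out.println(truncateToDataFile(index, 200).size()); // prints 2
    }
}
```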
I think this problem only occurs when the map tasks are all finished and the data stored in memory is flushed to localfile/HDFS, right? And at that moment the Spark client reads the redundant index data. Right?
From this perspective, if you drop this redundant data, will it cause a data-missing problem because the data sits in the HDFS client buffer instead of memory/HDFS? I think it won't. Flushing data will only be flushed to HDFS and then removed from memory, and the dataWriter.close() method in HdfsShuffleWriteHandler will ensure the data is flushed to HDFS.
So this change is OK and won't cause data loss. But I have a question: why not call dataWriter.flush and indexWriter.flush when writing each block to solve this problem? Would that cause a performance regression?
It seems unreasonable to flush every time a block is written: the HDFS client's buffer would no longer do anything, and it would cause a performance regression.
Makes sense. This change looks OK to me.
Will a similar situation occur for the localfile storage type?
Our production environment does not use localfile, so I am not quite sure whether the same problem exists there.
The key question is whether the index file can contain more blocks than the data file. That seems possible based on the code analysis, but I suggest checking against the actual production environment.
I think it will also happen with the localfile type.
We could consider the length of the data file when ShuffleServer#getLocalShuffleIndex generates index information. What do you think?
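A hedged sketch of that server-side idea (the 40-byte fixed record size and the `usableIndexBytes` helper are assumptions for illustration only): when serving the index, cap it to whole records so a partially written trailing record is never returned.

```java
public class ServeIndexSketch {
    // Assumed fixed size of one serialized index record, in bytes.
    static final int RECORD_SIZE = 40;

    // Return how many bytes of the index file form complete records,
    // dropping any trailing partially written record.
    static int usableIndexBytes(int indexFileLength) {
        return (indexFileLength / RECORD_SIZE) * RECORD_SIZE;
    }

    public static void main(String[] args) {
        // 130 bytes hold 3 whole records (120 bytes) plus 10 partial bytes.
        System.out.println(usableIndexBytes(130)); // prints 120
    }
}
```

The same capping could additionally take the data file's length into account, as suggested above, so that index records pointing past the flushed data are dropped as well.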
It's OK for me.
+1
storage/src/main/java/org/apache/uniffle/storage/handler/impl/HdfsFileReader.java
LGTM, waiting for CI. Do you have any other suggestions? @zuston
LGTM +1
Merged as: [ISSUE-239][BUG] RssUtils#transIndexDataToSegments should consider the length of the data file (#275)

What changes were proposed in this pull request?
For issue #239, fix inconsistent blocks when reading shuffle data.

Why are the changes needed?
This problem causes reading shuffle data to fail.

Does this PR introduce any user-facing change?
No

How was this patch tested?
A UT was added.

Co-authored-by: leixianming <leixianming@didiglobal.com>