-
Notifications
You must be signed in to change notification settings - Fork 148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#1341] fix(mr): Fix MR Combiner ArrayIndexOutOfBoundsException Bug. #1666
Conversation
@sunshineJK Would you please review this PR? |
Test Results 2 384 files + 84 2 384 suites +84 4h 38m 28s ⏱️ + 37m 10s Results for commit 76e15b8. ± Comparison against base commit 45ad0b8. This pull request removes 1 and adds 3 tests. Note that renamed tests count towards both.
♻️ This comment has been updated with latest results. |
Could you a ut for this case? |
@qijiale76 hi, i made some tests and found out that this PR works fine with input datas-212.txt (212MB) or any file size <= 212MB, but failed with datas-213.txt (213MB ). org.apache.hadoop.mapred.MapTask$NewOutputCollector@66971f6b
org.apache.uniffle.common.exception.RssException: Can't support sequence [4097], the max value should be 4095
at org.apache.hadoop.mapreduce.RssMRUtils.getBlockId(RssMRUtils.java:237)
at org.apache.hadoop.mapred.SortWriteBufferManager.createShuffleBlock(SortWriteBufferManager.java:390) other info: hadoop client: hadoop-3.2.1
nodemanager: hadoop-3.2.1
rss: master branch (Apri 19, 2024) with or without PR 1666 |
@qijiale76 Can you add UT? |
client-mr/core/src/main/java/org/apache/hadoop/mapred/SortWriteBuffer.java
Outdated
Show resolved
Hide resolved
client-mr/core/src/main/java/org/apache/hadoop/mapred/SortWriteBuffer.java
Outdated
Show resolved
Hide resolved
LGTM+1 |
This pull request will copy data one more time. Will it only copy data when we don't use the combine? |
|
OK, I got it. |
Merged to master & branch 0.9 . Thanks @qijiale76 @Lobo2008 @zhengchenyu |
…1666) ### What changes were proposed in this pull request? The current implementation of the SortBufferIterator.getKey() and .getValue() methods in the SortWriteBuffer class assumes that keys and values are always stored within a single buffer. This assumption can lead to runtime exceptions, specifically ArrayIndexOutOfBoundsException, when keys or values span multiple WrappedBuffer instances. In the current implementation, due to the execution of the `compact()`, the data of the key will not span buffers. However, the data of the value may be located on different buffers from the key and may span multiple buffers. This PR update getKey() to use fetchDataFromBuffers() to retrieve the key data. And this PR update getValue() to first adjust the starting index and offset based on the length of the key and then use fetchDataFromBuffers() to retrieve the value data. Another solution to this bug is to modify `addRecord()` method by keeping key and value in the same WrappedBuffer. ### Why are the changes needed? Fix: #1341 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Tested manually. I'll submit a new integration test later.
… Bug. (apache#1666) ### What changes were proposed in this pull request? The current implementation of the SortBufferIterator.getKey() and .getValue() methods in the SortWriteBuffer class assumes that keys and values are always stored within a single buffer. This assumption can lead to runtime exceptions, specifically ArrayIndexOutOfBoundsException, when keys or values span multiple WrappedBuffer instances. In the current implementation, due to the execution of the `compact()`, the data of the key will not span buffers. However, the data of the value may be located on different buffers from the key and may span multiple buffers. This PR update getKey() to use fetchDataFromBuffers() to retrieve the key data. And this PR update getValue() to first adjust the starting index and offset based on the length of the key and then use fetchDataFromBuffers() to retrieve the value data. Another solution to this bug is to modify `addRecord()` method by keeping key and value in the same WrappedBuffer. ### Why are the changes needed? Fix: apache#1341 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Tested manually. I'll submit a new integration test later.
What changes were proposed in this pull request?
The current implementation of the SortBufferIterator.getKey() and .getValue() methods in the SortWriteBuffer class assumes that keys and values are always stored within a single buffer. This assumption can lead to runtime exceptions, specifically ArrayIndexOutOfBoundsException, when keys or values span multiple WrappedBuffer instances.
In the current implementation, due to the execution of the
compact()
, the data of the key will not span buffers. However, the data of the value may be located on different buffers from the key and may span multiple buffers.This PR update getKey() to use fetchDataFromBuffers() to retrieve the key data. And this PR update getValue() to first adjust the starting index and offset based on the length of the key and then use fetchDataFromBuffers() to retrieve the value data.
Another solution to this bug is to modify
addRecord()
method by keeping key and value in the same WrappedBuffer.Why are the changes needed?
Fix: #1341
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Tested manually. I'll submit a new integration test later.