New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-9554: [Java] FixedWidthInPlaceVectorSorter sometimes produces wrong result #7837
Conversation
i -> String.valueOf(vector.get(i))).collect(Collectors.toList())) + "]"; | ||
|
||
assertEquals( | ||
"[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a little strange to construct a String
for this assertion. I think that it would be better to collect the values into a int[]
array and use Assertions#assertArrayEquals
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revised the code accordingly. Thanks for the good suggestion.
What do we think about adding tests that use randomly generated data? |
Sounds reasonable. I have added test cases for random data. Please check. |
This pull request looks good to me. 👍 |
int strLength = random.nextInt(20) + 1; | ||
byte[] str = new byte[strLength]; | ||
random.nextBytes(str); | ||
return new String(str); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this can generate strings that are not valid UTF-8. VarCharVector
does not enforce valid UTF-*, but the Arrow specification does specify UTF-8, so it could be confusing for someone who reads this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds reasonable. I have revised the code so that random strings can only contain alphabetic characters.
/** | ||
* Utilities for sorting related utilities. | ||
*/ | ||
public class TestSortingUtil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily for this PR but factoring out random data generators not specific to sorting would be nice to have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(in the vector pacakage along with Vector Populator)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the good suggestion.
I have extracted the utility of generating random data to a separate class (RandomDataGenerator). Please check it.
@@ -44,6 +45,13 @@ public void sortOutOfPlace(V srcVector, V dstVector, VectorValueComparator<V> co | |||
ArrowBuf dstValidityBuffer = dstVector.getValidityBuffer(); | |||
ArrowBuf dstValueBuffer = dstVector.getDataBuffer(); | |||
|
|||
// check buffer size | |||
Preconditions.checkArgument(dstValidityBuffer.capacity() * 8 >= srcVector.getValueCount(), | |||
"No enough capacity for the validity buffer of the dst vector"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: here and other locations "No" -> "Not". It might also pay to include actual values using format strings in the error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Thank you.
I have revised the code accordingly.
@liyafan82 one small comment on a typo/better error message otherwise this looks good to me. |
dstOffsetBuffer.capacity() >= srcVector.getValueCount() * BaseVariableWidthVector.OFFSET_WIDTH, | ||
"Not enough capacity for the offset buffer of the dst vector. " + | ||
"Expected capacity %s, actual capacity %s", | ||
srcVector.getValueCount() * BaseVariableWidthVector.OFFSET_WIDTH, dstValueBuffer.capacity()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: dstValueBuffer
-> dstOffsetBuffer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch. Thank you.
@liyafan82 any reason why this can't be merged? |
@emkornfield I don't think so. |
Merging this. Thanks to all reviews for the good comments. |
…rong result See https://issues.apache.org/jira/browse/ARROW-9554 Closes apache#7837 from liyafan82/fly_0727_st Authored-by: liyafan82 <fan_li_ya@foxmail.com> Signed-off-by: liyafan82 <fan_li_ya@foxmail.com>
…rong result See https://issues.apache.org/jira/browse/ARROW-9554 Closes apache#7837 from liyafan82/fly_0727_st Authored-by: liyafan82 <fan_li_ya@foxmail.com> Signed-off-by: liyafan82 <fan_li_ya@foxmail.com>
See https://issues.apache.org/jira/browse/ARROW-9554