Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-9554: [Java] FixedWidthInPlaceVectorSorter sometimes produces wrong result #7837

Closed
wants to merge 4 commits into from

Conversation

liyafan82
Copy link
Contributor

@github-actions
Copy link

i -> String.valueOf(vector.get(i))).collect(Collectors.toList())) + "]";

assertEquals(
"[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, " +
Copy link

@chairmank chairmank Jul 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little strange to construct a String for this assertion. I think that it would be better to collect the values into a int[] array and use Assertions#assertArrayEquals.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revised the code accordingly. Thanks for the good suggestion.

@chairmank
Copy link

What do we think about adding tests that use randomly generated data?

@liyafan82
Copy link
Contributor Author

What do we think about adding tests that use randomly generated data?

Sounds reasonable. I have added test cases for random data. Please check.

@chairmank
Copy link

This pull request looks good to me. 👍

int strLength = random.nextInt(20) + 1;
byte[] str = new byte[strLength];
random.nextBytes(str);
return new String(str);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this can generate strings that are not valid UTF-8. VarCharVector does not enforce valid UTF-*, but the Arrow specification does specify UTF-8, so it could be confusing for someone who reads this code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable. I have revised the code so that random strings can only contain alphabetic characters.

/**
* Utilities for sorting related utilities.
*/
public class TestSortingUtil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily for this PR but factoring out random data generators not specific to sorting would be nice to have.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(in the vector pacakage along with Vector Populator)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the good suggestion.
I have extracted the utility of generating random data to a separate class (RandomDataGenerator). Please check it.

@@ -44,6 +45,13 @@ public void sortOutOfPlace(V srcVector, V dstVector, VectorValueComparator<V> co
ArrowBuf dstValidityBuffer = dstVector.getValidityBuffer();
ArrowBuf dstValueBuffer = dstVector.getDataBuffer();

// check buffer size
Preconditions.checkArgument(dstValidityBuffer.capacity() * 8 >= srcVector.getValueCount(),
"No enough capacity for the validity buffer of the dst vector");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: here and other locations "No" -> "Not". It might also pay to include actual values using format strings in the error message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Thank you.
I have revised the code accordingly.

@emkornfield
Copy link
Contributor

@liyafan82 one small comment on a typo/better error message otherwise this looks good to me.

dstOffsetBuffer.capacity() >= srcVector.getValueCount() * BaseVariableWidthVector.OFFSET_WIDTH,
"Not enough capacity for the offset buffer of the dst vector. " +
"Expected capacity %s, actual capacity %s",
srcVector.getValueCount() * BaseVariableWidthVector.OFFSET_WIDTH, dstValueBuffer.capacity());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: dstValueBuffer -> dstOffsetBuffer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. Thank you.

@emkornfield
Copy link
Contributor

@liyafan82 any reason why this can't be merged?

@liyafan82
Copy link
Contributor Author

@liyafan82 any reason why this can't be merged?

@emkornfield I don't think so.
I am going to merge this in a few days, if there is no more comment.
Thank all reviewers for the effort and good comments!

@liyafan82
Copy link
Contributor Author

Merging this. Thanks to all reviews for the good comments.

@liyafan82 liyafan82 closed this in 3fb1356 Aug 24, 2020
emkornfield pushed a commit to emkornfield/arrow that referenced this pull request Oct 16, 2020
…rong result

See https://issues.apache.org/jira/browse/ARROW-9554

Closes apache#7837 from liyafan82/fly_0727_st

Authored-by: liyafan82 <fan_li_ya@foxmail.com>
Signed-off-by: liyafan82 <fan_li_ya@foxmail.com>
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
…rong result

See https://issues.apache.org/jira/browse/ARROW-9554

Closes apache#7837 from liyafan82/fly_0727_st

Authored-by: liyafan82 <fan_li_ya@foxmail.com>
Signed-off-by: liyafan82 <fan_li_ya@foxmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants