New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-9377: [Java] Support unsigned dictionary indices #7817
Conversation
try (VarCharVector dictionaryVector = new VarCharVector("dict vector", allocator)) { | ||
setVector(dictionaryVector, "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"); | ||
Dictionary dictionary1 = new Dictionary(dictionaryVector, | ||
new DictionaryEncoding(/*id=*/10L, /*ordered=*/false, /*indexType=*/ new ArrowType.Int(8, false))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please provide a parameter comment for bit_width and "false". here and below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parameter comments added. Thanks for your kind reminder.
try (UInt2Vector vector = new UInt2Vector("vector", allocator)) { | ||
vector.allocateNew(2); | ||
|
||
vector.setWithPossibleTruncate(0, 0xffffL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a constant someplace (or can this be moved to a constant on Uint8 vector) to better indicating you are using MAX_INT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea!
I've extracted constants for maximum unsigned integers, and placed them in corresponding UIntXvector classes.
@Test | ||
public void testUInt1RoundTrip() throws IOException { | ||
try (UInt1Vector encodedVector1 = (UInt1Vector) createEncodedVector(8)) { | ||
setVector(encodedVector1, (byte) 1, (byte) 3, (byte) 5, (byte) 7, (byte) 9); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should max value round trips be be added here and below as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds reasonable. I have revised the test case to include max for UInt1 and UInt2.
public void init() { | ||
allocator = new RootAllocator(Long.MAX_VALUE); | ||
dictionaryVector = new VarCharVector("dict vector", allocator); | ||
setVector(dictionaryVector, "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might be nice to verify dictionary for at least uint1 work for vectors of length > 127
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion. I have reivsed the test case to to make vector length > 127. Pls check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments on improving test coverage otherwise I think this seems reasonable.
Thanks @liyafan82 |
See https://issues.apache.org/jira/browse/ARROW-9377 Closes apache#7817 from liyafan82/fly_0721_uns Authored-by: liyafan82 <fan_li_ya@foxmail.com> Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
See https://issues.apache.org/jira/browse/ARROW-9377 Closes apache#7817 from liyafan82/fly_0721_uns Authored-by: liyafan82 <fan_li_ya@foxmail.com> Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
See https://issues.apache.org/jira/browse/ARROW-9377