ARROW-4532: [Java] fix bug causing very large varchar value buffers #3613
Maybe edit the commit message slightly to describe the problem?
Yes. It helps with the changelog when other developers can see when a bug they ran into was fixed.
These are style comments that might help somebody new to the code base (i.e. me) ramp up quicker, so take them with a grain of salt.
for (int i = 0; i < numValues; i++) {
  int start = fromVector.getstartOffset(i);
  int end = fromVector.getstartOffset(i + 1);
  toVector.setSafe(i, 1, start, end, fromDataBuffer);
making this a named constant or putting in comments what the value represents might make the test easier to read.
done
}
fromVector.setValueCount(numValues);
ArrowBuf fromDataBuffer = fromVector.getDataBuffer();
assertEquals(BaseAllocator.nextPowerOfTwo(numValues * 11), fromDataBuffer.capacity());
This test seems potentially brittle if the allocator expansion algorithm changes. Is this an intrinsic property of this method? Maybe make this an inequality?
fixed.
fromVector.setInitialCapacity(numValues);
fromVector.allocateNew();
for (int i = 0; i < numValues; ++i) {
  fromVector.setSafe(i, "hello world".getBytes(), 0, 11);
It might make the test easier to read if 0 and 11 were constants or had comments next to them.
fixed.
  int end = fromVector.getstartOffset(i + 1);
  toVector.setSafe(i, 1, start, end, fromDataBuffer);
}
assertEquals(fromDataBuffer.capacity(), toVector.getDataBuffer().capacity());
It might pay to add a comment explaining why the capacities are expected to be equal.
done.
added.
    MinorType.VARCHAR, allocator);
final VarCharVector toVector = newVector(VarCharVector.class, EMPTY_SCHEMA_PATH,
    MinorType.VARCHAR, allocator)) {
  fromVector.setInitialCapacity(numValues);
It might make the unit test easier to read if you separated each section with a comment, e.g.
// Setup
// Execute
// Verify
I've added comments.
@@ -1394,6 +1394,35 @@ public void testFillEmptiesNotOverfill() {
    }
  }

  @Test
  public void testSetSafeWithArrowBuf() {
Maybe add doesntAllocateExcessiveMemory to the test name.
done
@pravindra thanks!
Codecov Report
@@ Coverage Diff @@
## master #3613 +/- ##
==========================================
- Coverage 87.78% 87.77% -0.01%
==========================================
Files 673 673
Lines 82776 82839 +63
Branches 1069 1069
==========================================
+ Hits 72661 72713 +52
- Misses 10004 10011 +7
- Partials 111 115 +4
+1, LGTM
Thanks @pravindra for the comments, they are really good for understanding what is going on!
The varchar/varbinary vectors have a setSafe variant that takes an ArrowBuf and start/end offsets. The method needs to expand the buffer to make space for 'end - start' bytes. However, due to a bug, it was expanding the buffer to accommodate 'end' bytes, thereby making the value buffer much larger than required.

Author: Pindikura Ravindra <ravindra@dremio.com>

Closes apache#3613 from pravindra/varchar and squashes the following commits:

1b4f224 <Pindikura Ravindra> ARROW-4532: fix review comments
8f88e3a <Pindikura Ravindra> ARROW-4532: fix a size compute bug
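To make the size computation described above concrete, here is a minimal, self-contained sketch. It is not the Arrow implementation: the class `SetSafeCapacitySketch`, the method `simulateCopy`, and the simplified power-of-two growth rule are all hypothetical, chosen only to show how reserving 'end' bytes instead of 'end - start' bytes inflates the destination buffer.

```java
// Illustrative sketch only; none of these names come from the Arrow code base.
final class SetSafeCapacitySketch {

    /** Smallest power of two >= value (for value >= 1), standing in for a power-of-two allocator. */
    static long nextPowerOfTwo(long value) {
        long highest = Long.highestOneBit(value);
        return highest == value ? value : highest << 1;
    }

    /**
     * Simulates copying numValues values of valueLength bytes each from a source
     * buffer into an initially tiny destination buffer, and returns the final
     * destination capacity.
     *
     * @param buggy if true, reserve 'end' bytes per value (the behaviour described
     *              as the bug); if false, reserve 'end - start' bytes.
     */
    static long simulateCopy(int numValues, int valueLength, boolean buggy) {
        long capacity = 1;     // current destination capacity in bytes
        long writeOffset = 0;  // bytes already written to the destination
        for (int i = 0; i < numValues; i++) {
            long start = (long) i * valueLength;  // value position in the source buffer
            long end = start + valueLength;
            long reserved = buggy ? end : end - start;
            long required = writeOffset + reserved;
            if (required > capacity) {
                capacity = nextPowerOfTwo(required);
            }
            writeOffset += end - start;  // only 'end - start' bytes are actually copied
        }
        return capacity;
    }

    public static void main(String[] args) {
        int numValues = 1024;
        int valueLength = 11;  // length of "hello world" in bytes
        System.out.println("fixed: " + simulateCopy(numValues, valueLength, false));
        System.out.println("buggy: " + simulateCopy(numValues, valueLength, true));
    }
}
```

In this simplified model the buggy variant finishes with a destination buffer twice the size of the fixed one for the same input, because each write reserves room for the source offset as well as the value itself.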