ARROW-1807: [Java] consolidate bufs to reduce heap #3121
Conversation
- For fixed-len vectors, alloc a combined arrow buf for value and validity.
- Remove the read-write locks in AllocationMgr; they contribute about 150 bytes to the heap, and aren't very useful since there isn't much contention.
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #3121      +/-   ##
==========================================
- Coverage   87.09%   87.07%    -0.03%
==========================================
  Files         492      492
  Lines       69160    69160
==========================================
- Hits        60233    60219       -14
- Misses       8830     8840       +10
- Partials       97      101        +4
```
Continue to review full report at Codecov.
```java
int validityBufferSlice = (int) validityBufferSize;

/* allocate combined buffer */
ArrowBuf buffer = allocator.buffer(valueBufferSlice + validityBufferSlice);
```
Can you explain a bit why does this reduce heap usage? We still end up with two ArrowBuf objects, but the fact that they are slices of a single ArrowBuf allows them to share some heap data structure?
Yes.
The current heap cost (shallow sizes) for the buffers in an IntVector is:
```
io.netty.buffer.ArrowBuf                           -> 2 * 109 = 218
io.netty.buffer.PooledUnsafeDirectBuf              -> 2 * 116 = 232
io.netty.buffer.UnsafeDirectLittleEndian           -> 2 *  48 =  96
io.netty.util.Recycler$DefaultHandle               -> 2 *  41 =  82
arrow.memory.AllocationManager                     -> 2 * 100 = 200
arrow.memory.AllocationManager$BufferLedger        -> 2 *  80 = 160
java.util.concurrent.locks.ReentrantReadWriteLock* -> 2 * 180 = 360
arrow.memory.AutoCloseableLock                     -> 4 *  24 =  96
arrow.memory.LowCostIdentityHashMap                -> 2 *  32 =  64

Before total = 1508 bytes
```
My change removes the locks and shares all of the objects above except the ArrowBufs themselves:
```
io.netty.buffer.ArrowBuf                    -> 2 * 109 = 218
io.netty.buffer.PooledUnsafeDirectBuf       -> 1 * 116 = 116
io.netty.buffer.UnsafeDirectLittleEndian    -> 1 *  48 =  48
io.netty.util.Recycler$DefaultHandle        -> 1 *  41 =  41
arrow.memory.AllocationManager              -> 1 *  76 =  76
arrow.memory.AllocationManager$BufferLedger -> 1 *  80 =  80
arrow.memory.LowCostIdentityHashMap         -> 1 *  32 =  32

After total = 611 bytes
```
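The per-vector saving implied by these two tallies can be checked with a small standalone sketch (class and method names here are hypothetical, just summing the shallow sizes quoted above):

```java
// Hypothetical tally of the per-object shallow heap sizes listed above.
public class HeapTally {
    // Sum count[i] * shallowSize[i] across all object types.
    static int total(int[] counts, int[] shallowSizes) {
        int sum = 0;
        for (int i = 0; i < counts.length; i++) {
            sum += counts[i] * shallowSizes[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Before: two independent buffers, each with its own bookkeeping chain.
        int before = total(new int[]{2, 2, 2, 2, 2, 2, 2, 4, 2},
                           new int[]{109, 116, 48, 41, 100, 80, 180, 24, 32});
        // After: the two ArrowBuf slices share everything below them, and the
        // ReentrantReadWriteLock / AutoCloseableLock objects are gone entirely.
        int after = total(new int[]{2, 1, 1, 1, 1, 1, 1},
                          new int[]{109, 116, 48, 41, 76, 80, 32});
        System.out.println(before + " -> " + after
                + " (saves " + (before - after) + " bytes per vector)");
        // → 1508 -> 611 (saves 897 bytes per vector)
    }
}
```

That is roughly a 60% reduction in per-vector bookkeeping heap, multiplied across every fixed-width vector in a batch.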
+1
We had recently done the exact same optimization inside Dremio code by slicing a single large ArrowBuf and handing the resulting buffers over to vectors. This reduced the heap overhead (from the large volume of ArrowBufs) from 1GB to 250MB for very memory-intensive aggregation queries.
The downside, however, was that OOMs became more frequent, since we were now asking for one very large buffer (due to combining both into one, plus the allocator's power-of-2 semantics). So we implemented two additional variants of the algorithm to optimize usage of both heap and direct memory.
I don't think those concerns are applicable here in the context of a single vector.
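The power-of-2 concern above comes from allocators rounding each request up to the next power of two, so a single combined request can round up much further than two separate ones would. A minimal sketch (the class name and sizes are illustrative, not Dremio's or Arrow's actual code):

```java
// Illustration of why power-of-2 rounding penalizes combined allocations.
public class Pow2 {
    // Smallest power of two >= n (for n >= 1).
    static long nextPowerOfTwo(long n) {
        long p = 1;
        while (p < n) {
            p <<= 1;
        }
        return p;
    }

    public static void main(String[] args) {
        // Example: 1024-element int vector.
        long valueBytes = 1024 * 4;  // 4096, already a power of two
        long validityBytes = 1024 / 8; // 128, already a power of two

        // Two separate requests round up independently: 4096 + 128.
        long separate = nextPowerOfTwo(valueBytes) + nextPowerOfTwo(validityBytes);
        // One combined request rounds 4224 all the way up to 8192.
        long combined = nextPowerOfTwo(valueBytes + validityBytes);

        System.out.println(separate + " vs " + combined);
        // → 4224 vs 8192
    }
}
```

In the worst case the combined request nearly doubles the direct memory consumed, which is consistent with the OOMs described above.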
However, I am just trying to recall: when we created this JIRA with the goal of reducing heap usage in vectors, I think the proposal was to have just a single buffer, as opposed to having two buffers sliced from a single buffer and then getting rid of the latter.
For example, in the case of fixed-width vectors, we can pack validity and data into a single buffer. For variable-width vectors, we can pack offsets and validity into a single buffer. Similarly, for lists, we can combine offsets and validity into one buffer.
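The layout arithmetic behind the combined buffer for the fixed-width case can be sketched with plain `java.nio.ByteBuffer` in place of ArrowBuf (this is an illustration of the idea, not Arrow's actual API; `validityBytes` is a hypothetical helper):

```java
import java.nio.ByteBuffer;

// Sketch: one backing allocation holding validity bits followed by values,
// with the two logical buffers exposed as views (slices) into it.
public class CombinedLayout {
    // One validity bit per value, rounded up to whole bytes.
    static int validityBytes(int valueCount) {
        return (valueCount + 7) / 8;
    }

    public static void main(String[] args) {
        int valueCount = 1000;
        int typeWidth = 4; // e.g. a 4-byte int vector

        int validity = validityBytes(valueCount); // 125 bytes
        int values = valueCount * typeWidth;      // 4000 bytes

        // Single backing allocation for both regions.
        ByteBuffer combined = ByteBuffer.allocateDirect(validity + values);

        // Validity view: bytes [0, validity).
        ByteBuffer validitySlice = combined.duplicate();
        validitySlice.position(0).limit(validity);

        // Value view: bytes [validity, validity + values).
        ByteBuffer valueSlice = combined.duplicate();
        valueSlice.position(validity).limit(validity + values);

        System.out.println(validitySlice.remaining() + " + " + valueSlice.remaining());
        // → 125 + 4000
    }
}
```

The same arithmetic applies to the offsets + validity packing for variable-width and list vectors, with the offset region sized at `(valueCount + 1) * 4` bytes.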
I am wondering if that is even needed now, since the heap reduction from the sliced-buffer technique is significant.
> I am wondering if that is even needed now since the heap reduction due to sliced buffer technique is significant
Yes, @siddharthteotia. We could save another 80-odd bytes on the heap by using a single buffer, but it adds complexity to the code for the following reasons:
- There are places in the code where the caller assumes a FixedWidthVector contains two Arrow buffers (one for validity and one for values).
- When we do a splitAndTransfer of an existing FixedWidthVector, the validity and data portions will not be contiguous: if the split happens at an unaligned boundary, we allocate a new validity buffer but retain the value buffer. So there would be too many cases to deal with (both contiguous, both in the same buffer but not contiguous, or in different buffers).
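The "unaligned boundary" condition above comes from validity being bit-packed, 8 values per byte. A hedged sketch of the check (hypothetical class and method, not Arrow's splitAndTransfer implementation):

```java
// Sketch of the alignment concern in splitAndTransfer: the source validity
// bytes can only be reused directly when the first transferred bit starts
// a fresh byte.
public class SplitCheck {
    // True when startIndex falls on a byte boundary of the bit-packed
    // validity buffer, so the transferred range can slice the source bytes.
    static boolean canSliceValidity(int startIndex) {
        return startIndex % 8 == 0;
    }

    public static void main(String[] args) {
        System.out.println(canSliceValidity(16)); // byte-aligned: reuse validity bytes
        System.out.println(canSliceValidity(10)); // mid-byte: copy bits into a new buffer
        // → true
        // → false
    }
}
```

With a single combined buffer, the new validity buffer allocated in the unaligned case would no longer sit next to the value region, producing the extra layout cases the comment describes.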
Yes, combining the two buffers into one would require a lot of changes -- both in Arrow and in downstream consumers like Dremio, where we assume the number of buffers.
I am merging this.
Thanks @siddharthteotia