ARROW-1807: [Java] consolidate bufs to reduce heap #3121
Conversation
- For fixed-len vectors, alloc a combined arrow buf for value and validity.
- Remove the read-write locks in AllocationMgr; they contribute about 150 bytes to the heap, and aren't very useful since there isn't much contention.
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #3121      +/-   ##
==========================================
- Coverage   87.09%   87.07%    -0.03%
==========================================
  Files         492      492
  Lines       69160    69160
==========================================
- Hits        60233    60219       -14
- Misses       8830     8840       +10
- Partials       97      101        +4
```
Continue to review full report at Codecov.
```java
int validityBufferSlice = (int) validityBufferSize;

/* allocate combined buffer */
ArrowBuf buffer = allocator.buffer(valueBufferSlice + validityBufferSlice);
```
Can you explain a bit why does this reduce heap usage? We still end up with two ArrowBuf objects, but the fact that they are slices of a single ArrowBuf allows them to share some heap data structure?
Yes.
The current heap cost (shallow sizes) for the buffers in an IntVector is:
```
io.netty.buffer.ArrowBuf                           -> 2 * 109 = 218
io.netty.buffer.PooledUnsafeDirectBuf              -> 2 * 116 = 232
io.netty.buffer.UnsafeDirectLittleEndian           -> 2 *  48 =  96
io.netty.util.Recycler$DefaultHandle               -> 2 *  41 =  82
arrow.memory.AllocationManager                     -> 2 * 100 = 200
arrow.memory.AllocationManager$BufferLedger        -> 2 *  80 = 160
java.util.concurrent.locks.ReentrantReadWriteLock* -> 2 * 180 = 360
arrow.memory.AutoCloseableLock                     -> 4 *  24 =  96
arrow.memory.LowCostIdentityHashMap                -> 2 *  32 =  64

Before total = 1508 bytes
```
My change removes the locks and shares all of the objects above except the ArrowBufs themselves:
```
io.netty.buffer.ArrowBuf                    -> 2 * 109 = 218
io.netty.buffer.PooledUnsafeDirectBuf       -> 1 * 116 = 116
io.netty.buffer.UnsafeDirectLittleEndian    -> 1 *  48 =  48
io.netty.util.Recycler$DefaultHandle        -> 1 *  41 =  41
arrow.memory.AllocationManager              -> 1 *  76 =  76
arrow.memory.AllocationManager$BufferLedger -> 1 *  80 =  80
arrow.memory.LowCostIdentityHashMap         -> 1 *  32 =  32

After total = 611 bytes
```
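The per-vector saving implied by these two tallies can be checked with a small standalone sketch (class and method names here are hypothetical, just summing the shallow sizes quoted above):

```java
// Hypothetical tally of the per-object shallow heap sizes listed above.
public class HeapTally {
    // Sum count[i] * shallowSize[i] across all object types.
    static int total(int[] counts, int[] shallowSizes) {
        int sum = 0;
        for (int i = 0; i < counts.length; i++) {
            sum += counts[i] * shallowSizes[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Before: two independent buffers, each with its own bookkeeping chain.
        int before = total(new int[]{2, 2, 2, 2, 2, 2, 2, 4, 2},
                           new int[]{109, 116, 48, 41, 100, 80, 180, 24, 32});
        // After: the two ArrowBuf slices share everything below them, and the
        // ReentrantReadWriteLock / AutoCloseableLock objects are gone entirely.
        int after = total(new int[]{2, 1, 1, 1, 1, 1, 1},
                          new int[]{109, 116, 48, 41, 76, 80, 32});
        System.out.println(before + " -> " + after
                + " (saves " + (before - after) + " bytes per vector)");
        // → 1508 -> 611 (saves 897 bytes per vector)
    }
}
```

That is roughly a 60% reduction in per-vector bookkeeping heap, multiplied across every fixed-width vector in a batch.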
+1
We had recently done the exact same optimization inside Dremio code by slicing a single large ArrowBuf and handing the resulting buffers over to vectors. This reduced the heap overhead (from the large volume of ArrowBufs) from 1GB to 250MB for very memory-intensive aggregation queries.
The downside, however, was that OOMs became more frequent, since we were now asking for one very large buffer (due to combining both into one, plus the allocator's power-of-2 semantics). So we implemented two additional variants of the algorithm to optimize usage of both heap and direct memory.
I don't think those concerns are applicable here in the context of a single vector.
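The power-of-2 concern above comes from allocators rounding each request up to the next power of two, so a single combined request can round up much further than two separate ones would. A minimal sketch (the class name and sizes are illustrative, not Dremio's or Arrow's actual code):

```java
// Illustration of why power-of-2 rounding penalizes combined allocations.
public class Pow2 {
    // Smallest power of two >= n (for n >= 1).
    static long nextPowerOfTwo(long n) {
        long p = 1;
        while (p < n) {
            p <<= 1;
        }
        return p;
    }

    public static void main(String[] args) {
        // Example: 1024-element int vector.
        long valueBytes = 1024 * 4;  // 4096, already a power of two
        long validityBytes = 1024 / 8; // 128, already a power of two

        // Two separate requests round up independently: 4096 + 128.
        long separate = nextPowerOfTwo(valueBytes) + nextPowerOfTwo(validityBytes);
        // One combined request rounds 4224 all the way up to 8192.
        long combined = nextPowerOfTwo(valueBytes + validityBytes);

        System.out.println(separate + " vs " + combined);
        // → 4224 vs 8192
    }
}
```

In the worst case the combined request nearly doubles the direct memory consumed, which is consistent with the OOMs described above.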
However, I am just trying to recall: when we created this JIRA with the goal of reducing heap usage in vectors, I think the proposal was to have just a single buffer, as opposed to having two buffers sliced from a single buffer and then getting rid of the latter.
For example, in the case of fixed-width vectors, we can pack validity and data into a single buffer. For variable-width vectors, we can pack offsets and validity into a single buffer. Similarly, for lists, we can combine offsets and validity into one buffer.
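The layout arithmetic behind the combined buffer for the fixed-width case can be sketched with plain `java.nio.ByteBuffer` in place of ArrowBuf (this is an illustration of the idea, not Arrow's actual API; `validityBytes` is a hypothetical helper):

```java
import java.nio.ByteBuffer;

// Sketch: one backing allocation holding validity bits followed by values,
// with the two logical buffers exposed as views (slices) into it.
public class CombinedLayout {
    // One validity bit per value, rounded up to whole bytes.
    static int validityBytes(int valueCount) {
        return (valueCount + 7) / 8;
    }

    public static void main(String[] args) {
        int valueCount = 1000;
        int typeWidth = 4; // e.g. a 4-byte int vector

        int validity = validityBytes(valueCount); // 125 bytes
        int values = valueCount * typeWidth;      // 4000 bytes

        // Single backing allocation for both regions.
        ByteBuffer combined = ByteBuffer.allocateDirect(validity + values);

        // Validity view: bytes [0, validity).
        ByteBuffer validitySlice = combined.duplicate();
        validitySlice.position(0).limit(validity);

        // Value view: bytes [validity, validity + values).
        ByteBuffer valueSlice = combined.duplicate();
        valueSlice.position(validity).limit(validity + values);

        System.out.println(validitySlice.remaining() + " + " + valueSlice.remaining());
        // → 125 + 4000
    }
}
```

The same arithmetic applies to the offsets + validity packing for variable-width and list vectors, with the offset region sized at `(valueCount + 1) * 4` bytes.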
I am wondering if that is even needed now, since the heap reduction from the sliced-buffer technique is significant.
> I am wondering if that is even needed now since the heap reduction due to sliced buffer technique is significant
Yes, @siddharthteotia. We could save another 80-odd bytes on the heap by using a single buffer, but it adds complexity to the code for the following reasons:
- There are places in the code where the caller assumes a FixedWidthVector contains two Arrow buffers (one for validity and one for values).
- When we do a splitAndTransfer of an existing FixedWidthVector, the validity and data portions will not be contiguous: if the split happens at an unaligned boundary, we allocate a new validity buffer but retain the value buffer. So there would be too many cases to deal with (both contiguous, both in the same buffer but not contiguous, or in different buffers).
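The "unaligned boundary" condition above comes from validity being bit-packed, 8 values per byte. A hedged sketch of the check (hypothetical class and method, not Arrow's splitAndTransfer implementation):

```java
// Sketch of the alignment concern in splitAndTransfer: the source validity
// bytes can only be reused directly when the first transferred bit starts
// a fresh byte.
public class SplitCheck {
    // True when startIndex falls on a byte boundary of the bit-packed
    // validity buffer, so the transferred range can slice the source bytes.
    static boolean canSliceValidity(int startIndex) {
        return startIndex % 8 == 0;
    }

    public static void main(String[] args) {
        System.out.println(canSliceValidity(16)); // byte-aligned: reuse validity bytes
        System.out.println(canSliceValidity(10)); // mid-byte: copy bits into a new buffer
        // → true
        // → false
    }
}
```

With a single combined buffer, the new validity buffer allocated in the unaligned case would no longer sit next to the value region, producing the extra layout cases the comment describes.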
Yes, combining the two buffers into one would require a lot of changes -- both in Arrow and in downstream consumers like Dremio, where we assume the number of buffers.
I am merging this.
Thanks @siddharthteotia