
ARROW-1807: [Java] consolidate bufs to reduce heap #3121

Merged 1 commit into apache:master on Dec 15, 2018

Conversation

pravindra
Contributor

  • for fixed-len vectors, alloc a combined arrow buf for
    value and validity.
  • Remove the read-write locks in AllocationMgr, they
    contribute about 150 bytes to the heap, and aren't very useful
    since there isn't much contention.

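The size arithmetic behind the first bullet can be sketched with plain Java (hypothetical helper names, not the actual patch): a fixed-width vector needs one validity bit per value plus `typeWidth` bytes per value, and the change allocates both regions as one buffer.

```java
public class CombinedBufSizes {
    // Size in bytes of the validity bitmap: one bit per value,
    // rounded up to a whole byte.
    static long validityBufferSize(int valueCount) {
        return (valueCount + 7) / 8;
    }

    // Size in bytes of the fixed-width value buffer.
    static long valueBufferSize(int valueCount, int typeWidth) {
        return (long) valueCount * typeWidth;
    }

    public static void main(String[] args) {
        // e.g. an IntVector (4-byte values) holding 1024 values:
        long validity = validityBufferSize(1024);  // 128 bytes
        long values = valueBufferSize(1024, 4);    // 4096 bytes
        // One combined allocation instead of two separate ones.
        System.out.println(validity + values);     // 4224
    }
}
```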
@codecov-io

Codecov Report

Merging #3121 into master will decrease coverage by 0.02%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master    #3121      +/-   ##
==========================================
- Coverage   87.09%   87.07%   -0.03%     
==========================================
  Files         492      492              
  Lines       69160    69160              
==========================================
- Hits        60233    60219      -14     
- Misses       8830     8840      +10     
- Partials       97      101       +4
| Impacted Files | Coverage Δ |
| --- | --- |
| go/arrow/math/int64_avx2_amd64.go | 0% <0%> (-100%) ⬇️ |
| go/arrow/memory/memory_avx2_amd64.go | 0% <0%> (-100%) ⬇️ |
| go/arrow/math/float64_avx2_amd64.go | 0% <0%> (-100%) ⬇️ |
| go/arrow/math/uint64_avx2_amd64.go | 0% <0%> (-100%) ⬇️ |
| go/arrow/memory/memory_amd64.go | 28.57% <0%> (-14.29%) ⬇️ |
| go/arrow/math/math_amd64.go | 31.57% <0%> (-5.27%) ⬇️ |
| cpp/src/arrow/csv/column-builder.cc | 95.45% <0%> (-1.95%) ⬇️ |
| cpp/src/plasma/thirdparty/ae/ae.c | 72.03% <0%> (-0.95%) ⬇️ |
| go/arrow/math/float64_amd64.go | 33.33% <0%> (ø) ⬆️ |
| go/arrow/math/int64_amd64.go | 33.33% <0%> (ø) ⬆️ |
| ... and 6 more | |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 072df89...40ead47.

int validityBufferSlice = (int) validityBufferSize;

/* allocate one combined buffer; the validity and value slices will share it */
/* (valueBufferSlice is assumed to be computed the same way from valueBufferSize in the surrounding code) */
ArrowBuf buffer = allocator.buffer(valueBufferSlice + validityBufferSlice);
Contributor

Can you explain a bit why this reduces heap usage? We still end up with two ArrowBuf objects, but does the fact that they are slices of a single ArrowBuf allow them to share some heap data structures?

Contributor Author

@pravindra pravindra Dec 8, 2018

Yes.

The current heap cost (shallow counts) for the buffers in an IntVector is:

io.netty.buffer.ArrowBuf -> 2 * 109 = 218
io.netty.buffer.PooledUnsafeDirectBuf -> 2 * 116 = 232
io.netty.buffer.UnsafeDirectLittleEndian -> 2 * 48 = 96
io.netty.util.Recycler$DefaultHandle -> 2 * 41 = 82
arrow.memory.AllocationManager -> 2 * 100 = 200
arrow.memory.AllocationManager$BufferLedger -> 2 * 80 = 160
java.util.concurrent.locks.ReentrantReadWriteLock* -> 2 * 180 = 360
arrow.memory.AutoCloseableLock -> 4 * 24 = 96
arrow.memory.LowCostIdentityHashMap -> 2 * 32 = 64

Before Total = 1508 bytes

My change removes the locks and shares all of the objects above except ArrowBuf:

io.netty.buffer.ArrowBuf -> 2 * 109 = 218
io.netty.buffer.PooledUnsafeDirectBuf -> 1 * 116 = 116
io.netty.buffer.UnsafeDirectLittleEndian -> 1 * 48 = 48
io.netty.util.Recycler$DefaultHandle -> 1 * 41 = 41
arrow.memory.AllocationManager -> 1 * 76 = 76
arrow.memory.AllocationManager$BufferLedger -> 1 * 80 = 80
arrow.memory.LowCostIdentityHashMap -> 1 * 32 = 32

After total = 611 bytes
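The sharing described above can be illustrated with plain `java.nio` buffers (a sketch of the idea, not Arrow's API): two slices carved out of one direct allocation are cheap view objects over a single backing chunk, so only the per-view wrappers are duplicated.

```java
import java.nio.ByteBuffer;

public class SliceSketch {
    public static void main(String[] args) {
        int validitySize = 128, valueSize = 4096;

        // One underlying allocation for both regions.
        ByteBuffer combined = ByteBuffer.allocateDirect(validitySize + valueSize);

        // Validity slice: bytes [0, validitySize).
        combined.position(0).limit(validitySize);
        ByteBuffer validity = combined.slice();

        // Value slice: bytes [validitySize, validitySize + valueSize).
        combined.clear();
        combined.position(validitySize);
        ByteBuffer values = combined.slice();

        // Two lightweight views, one backing allocation.
        System.out.println(validity.capacity()); // 128
        System.out.println(values.capacity());   // 4096
    }
}
```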

Contributor

+1

We recently did the exact same optimization in the Dremio code by slicing a single large ArrowBuf and handing the resulting buffers over to vectors. This reduced the heap overhead (due to the large volume of ArrowBufs) from 1GB to 250MB for very memory-intensive aggregation queries.

https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/sabot/op/aggregate/vectorized/AccumulatorSet.java#L93

This reduced heap overhead drastically, but the downside was that OOMs became more frequent, since we were asking for a very large buffer (due to combining both into one, plus the allocator's power-of-2 semantics). So we implemented two additional variants of the algorithm to optimize usage of both heap and direct memory.

I don't think those concerns are applicable here in the context of a single vector.

However, I recall that when we created this JIRA with the goal of reducing heap usage in vectors, the proposal was to have just a single buffer, as opposed to having two buffers sliced from a single buffer and then getting rid of the latter.

For example, in the case of fixed-width vectors, we can pack validity and data into a single buffer. For variable-width vectors, we can pack offsets and validity into a single buffer. Similarly for lists, we can combine offsets and validity into one buffer.

I am wondering if that is even needed now, since the heap reduction from the sliced-buffer technique is significant.
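The allocator's power-of-2 semantics mentioned above can be sketched in plain Java (not the allocator's actual code): combining two regions before rounding can request noticeably more memory than rounding each region separately, which is the OOM concern.

```java
public class Pow2Rounding {
    // Round a size (>= 1) up to the next power of two,
    // mimicking a power-of-2 buddy allocator's request rounding.
    static long roundPow2(long size) {
        long highest = Long.highestOneBit(size);
        return (highest == size) ? size : highest << 1;
    }

    public static void main(String[] args) {
        long validity = 128, value = 4096;

        // Allocated separately, both sizes are already powers of two:
        // 128 + 4096 = 4224 bytes total.
        long separate = roundPow2(validity) + roundPow2(value);

        // Allocated as one combined buffer, 4224 rounds up to 8192 bytes.
        long combined = roundPow2(validity + value);

        System.out.println(separate); // 4224
        System.out.println(combined); // 8192
    }
}
```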

Contributor Author

> I am wondering if that is even needed now since the heap reduction due to sliced buffer technique is significant

Yes, @siddharthteotia. We could save another 80-odd bytes on the heap by using a single buffer, but it adds complexity to the code for the following reasons:

  • There are places in the code where the caller assumes a FixedWidthVector contains two Arrow buffers (one for validity and one for values).
  • When we do a splitAndTransfer of an existing FixedWidthVector, the validity and data portions will not be contiguous: if the split happens at an unaligned boundary, we allocate a new validity buffer but retain the value buffer. So there would be too many cases to deal with (both contiguous, both in the same buffer but not contiguous, or in different buffers).

Contributor

Yes, combining two buffers into one would require a lot of changes, both in Arrow itself and in downstream consumers that assume the number of buffers.

I am merging this.

@pravindra pravindra changed the title ARROW-1807: [WIP] [Java] consolidate bufs to reduce heap ARROW-1807: [Java] consolidate bufs to reduce heap Dec 14, 2018
@siddharthteotia siddharthteotia merged commit ce12fb5 into apache:master Dec 15, 2018
@pravindra
Contributor Author

Thanks @siddharthteotia

cav71 pushed a commit to cav71/arrow that referenced this pull request Dec 22, 2018