PARQUET-852: Slowly ramp up sizes of byte[] in ByteBasedBitPackingEncoder#401
JohnPJenkins wants to merge 1 commit into apache:master
    totalFullSlabSize += slabSize;
    if (slabSize < bitWidth * MAX_SLAB_SIZE_MULT) {
      slabSize *= 2;
    }
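For readers skimming the thread, the ramp-up in the hunk above can be sketched in isolation. This is only an illustrative sketch, not the code under review: the class name `SlabRampUp`, the helper `slabSizes`, the constant `INITIAL_SLAB_SIZE_MULT`, and both constant values are assumptions (only the name `MAX_SLAB_SIZE_MULT` appears in the hunk).

```java
import java.util.ArrayList;
import java.util.List;

class SlabRampUp {
  // Assumed values for illustration; only the name MAX_SLAB_SIZE_MULT is in the hunk.
  static final int INITIAL_SLAB_SIZE_MULT = 1024;
  static final int MAX_SLAB_SIZE_MULT = 64 * 1024;

  /** Returns the sizes, in bytes, of the first `count` slabs for a given bit width. */
  static List<Integer> slabSizes(int bitWidth, int count) {
    List<Integer> sizes = new ArrayList<>();
    int slabSize = bitWidth * INITIAL_SLAB_SIZE_MULT;
    for (int i = 0; i < count; i++) {
      sizes.add(slabSize);
      // Same rule as the hunk: double the next slab until the cap is reached.
      if (slabSize < bitWidth * MAX_SLAB_SIZE_MULT) {
        slabSize *= 2;
      }
    }
    return sizes;
  }

  public static void main(String[] args) {
    // Slab sizes grow geometrically and then stay at the cap.
    System.out.println(slabSizes(1, 9));
  }
}
```

Under these assumed constants, a small column only ever pays for the small initial slabs, while a large column reaches the maximum slab size after a handful of allocations.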
This looks fine
@rdblue @isnotinvain @piyushnarang does it look good to you?
@JohnPJenkins did you check the performance impact of this?
I did not, only the memory impact. I'll give parquet-benchmarks a try.
piyushnarang left a comment
Changes look good to me. If it's possible to add a unit test or two to verify the behavior, that would be great.
Like @julienledem mentioned, it would be great if you could verify that this doesn't adversely impact performance, using parquet-benchmarks or real jobs.
@piyushnarang There appears to be such a test already, at https://github.com/apache/parquet-mr/blob/master/parquet-encoding/src/test/java/org/apache/parquet/column/values/bitpacking/TestByteBasedBitPackingEncoder.java . It looks like I can simply increase the upper limit of writes to encompass the full progression of buffer sizes. I'll push an update shortly.
0c2efe5 to f192ab0
Performance of the old and new code is equivalent, modulo noise, at least as reported by parquet-benchmarks.
@JohnPJenkins One thing you could add in the test is verifying that the slab count in the end is a reasonable number. @piyushnarang @isnotinvain @rdblue If that looks good to you, we should go ahead and merge.
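As a rough illustration of what a "reasonable" slab count could mean, the sketch below computes how many slabs a doubling scheme would allocate for a given number of output bytes. It is a sketch under stated assumptions, not the test in this PR: `SlabCountSketch`, `expectedSlabCount`, `INITIAL_SLAB_SIZE_MULT`, and the constant values are hypothetical, and it assumes a new slab is allocated only when the previous ones are full.

```java
class SlabCountSketch {
  // Assumed values for illustration; only the name MAX_SLAB_SIZE_MULT is in the hunk.
  static final int INITIAL_SLAB_SIZE_MULT = 1024;
  static final int MAX_SLAB_SIZE_MULT = 64 * 1024;

  /** Number of slabs needed to hold totalBytes when slab sizes double up to a cap. */
  static int expectedSlabCount(int bitWidth, long totalBytes) {
    // A bit width of 0 produces no bytes, so no slab is needed in this model.
    if (bitWidth == 0 || totalBytes <= 0) {
      return 0;
    }
    int slabSize = bitWidth * INITIAL_SLAB_SIZE_MULT;
    int slabs = 0;
    long capacity = 0;
    while (capacity < totalBytes) {
      capacity += slabSize;
      slabs++;
      // Same doubling rule as the hunk under review.
      if (slabSize < bitWidth * MAX_SLAB_SIZE_MULT) {
        slabSize *= 2;
      }
    }
    return slabs;
  }
}
```

A test could compare a count like this against whatever the encoder actually reports, so that a regression in the ramp-up shows up as an unexpected number of slabs.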
f192ab0 to 73ec00a
73ec00a to 334acec
I updated the tests to check both the expected slab count (by exposing a package-private accessor) and the expected buffer size. This caught a latent bug in getBufferSize: not-yet-packed values weren't being included in the size. Additionally, I tweaked the code to support encoders with a bit width of 0 for completeness. That case was tested in the previous code, and the generated bit packer code supports it (with a no-op).
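The getBufferSize issue described above can be illustrated with a small model. This is a hedged sketch, not the fix in this PR: the class `BufferSizeSketch`, its fields, and both size methods are hypothetical, and it assumes values are packed in groups of 8 and that the size is the bit count rounded up to whole bytes.

```java
class BufferSizeSketch {
  private final int bitWidth;
  private long packedValues;  // values already packed into slabs
  private int pendingValues;  // values buffered until a full group of 8 is available

  BufferSizeSketch(int bitWidth) {
    this.bitWidth = bitWidth;
  }

  /** Records one written value; packs a group once 8 values are pending. */
  void write() {
    pendingValues++;
    if (pendingValues == 8) {
      packedValues += 8;
      pendingValues = 0;
    }
  }

  /** Size that ignores pending values, which under-reports between groups. */
  long sizeIgnoringPending() {
    return (packedValues * bitWidth + 7) / 8;
  }

  /** Size that also counts values not yet packed. */
  long sizeIncludingPending() {
    return ((packedValues + pendingValues) * bitWidth + 7) / 8;
  }
}
```

In this model, writing 10 values at a bit width of 3 gives 3 bytes when pending values are ignored and 4 bytes when they are counted. With a bit width of 0, both methods return 0.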
LGTM.
👍 Thanks for adding the updated test :-)
…oder
https://issues.apache.org/jira/browse/PARQUET-852
Author: John Jenkins <jjenkins@kcg.com>
Closes #401 from JohnPJenkins/PARQUET-852 and squashes the following commits:
334acec [John Jenkins] PARQUET-852: Slowly ramp up sizes of byte[] in ByteBasedBitPackingEncoder