Skip to content

PARQUET-852: Slowly ramp up sizes of byte[] in ByteBasedBitPackingEncoder#401

Closed
JohnPJenkins wants to merge 1 commit intoapache:masterfrom
JohnPJenkins:PARQUET-852
Closed

PARQUET-852: Slowly ramp up sizes of byte[] in ByteBasedBitPackingEncoder#401
JohnPJenkins wants to merge 1 commit intoapache:masterfrom
JohnPJenkins:PARQUET-852

Conversation

@JohnPJenkins
Copy link
Copy Markdown
Contributor

totalFullSlabSize += slabSize;
if (slabSize < bitWidth * MAX_SLAB_SIZE_MULT) {
slabSize *= 2;
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine
@rdblue @isnotinvain @piyushnarang does it look good to you?
@JohnPJenkins did you check the performance impact of this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not, only the memory impact. I'll give parquet-benchmarks a try.

Copy link
Copy Markdown

@piyushnarang piyushnarang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me. If possible to add a unit test or two to verify behavior, that would be great.
Like @julienledem mentioned, would be great if you could verify that this doesn't adversely impact performance - parquet benchmarks / real jobs.

@JohnPJenkins
Copy link
Copy Markdown
Contributor Author

@piyushnarang There appears to be such a test already, at https://github.com/apache/parquet-mr/blob/master/parquet-encoding/src/test/java/org/apache/parquet/column/values/bitpacking/TestByteBasedBitPackingEncoder.java . Looks like I can simply increase the upper limit of writes to encompass the full progression of buffer sizes. I'll push an update shortly.

@JohnPJenkins
Copy link
Copy Markdown
Contributor Author

Performance between the old and new code is equivalent modulo noise, at least as reported by parquet-benchmarks.

@julienledem
Copy link
Copy Markdown
Member

@JohnPJenkins One thing you could add in the test is verifying that the slab count in the end is a reasonable number.

@piyushnarang @isnotinvain @rdblue If that looks good to you, we should go ahead and merge

@JohnPJenkins
Copy link
Copy Markdown
Contributor Author

I updated the tests to check both expected slab count (by exposing a package-private accessor) and the expected buffer size. It caught a latent bug in getBufferSize - not-yet-packed values weren't being included in the size.

Additionally, I tweaked the code to support encoders with a bit width of 0 for completeness - it was tested in the previous code and the generated bit packer code supports it (with a no-op).

@julienledem
Copy link
Copy Markdown
Member

LGTM.
@piyushnarang @rdblue ?

@piyushnarang
Copy link
Copy Markdown

👍 thanks for adding the updated test :-)

@asfgit asfgit closed this in 1de41ef May 12, 2017
asfgit pushed a commit that referenced this pull request May 12, 2017
…oder

https://issues.apache.org/jira/browse/PARQUET-852

Author: John Jenkins <jjenkins@kcg.com>

Closes #401 from JohnPJenkins/PARQUET-852 and squashes the following commits:

334acec [John Jenkins] PARQUET-852: Slowly ramp up sizes of byte[] in ByteBasedBitPackingEncoder
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants