Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Box RleDecoder index buffer (#1061) #1062

Merged
merged 2 commits into from
Dec 20, 2021

Conversation

tustvold
Copy link
Contributor

@tustvold tustvold commented Dec 19, 2021

Which issue does this PR close?

Closes #1061.

Rationale for this change

See ticket

What changes are included in this PR?

Changes the RleDecoder to box its index buffer, this reduces surprises from a 4Kb RleDecoder struct, and reduces heap allocations if RleDecoder is itself heap allocated and RleDecoder::get_batch_with_dict is never called (e.g. for decoding levels).

Running the parquet benchmarks showed no discernible performance impact of this change, as expected

Are there any user-facing changes?

No

@github-actions github-actions bot added the parquet Changes to the parquet crate label Dec 19, 2021
@codecov-commenter
Copy link

Codecov Report

Merging #1062 (6b73a7d) into master (99b7d01) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1062      +/-   ##
==========================================
- Coverage   82.31%   82.31%   -0.01%     
==========================================
  Files         168      168              
  Lines       49056    49056              
==========================================
- Hits        40379    40378       -1     
- Misses       8677     8678       +1     
Impacted Files Coverage Δ
parquet/src/encodings/rle.rs 92.71% <100.00%> (ø)
arrow/src/datatypes/datatype.rs 65.95% <0.00%> (-0.43%) ⬇️
parquet_derive/src/parquet_field.rs 65.98% <0.00%> (-0.23%) ⬇️
arrow/src/datatypes/field.rs 53.68% <0.00%> (+0.30%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 99b7d01...6b73a7d. Read the comment docs.

@@ -319,7 +319,7 @@ pub struct RleDecoder {
bit_reader: Option<BitReader>,

// Buffer used when `bit_reader` is not `None`, for batch reading.
index_buf: [i32; 1024],
index_buf: Option<Box<[i32; 1024]>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we are at it, could we extract the 1024 to a const?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #1070

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tustvold

@@ -440,6 +440,8 @@ impl RleDecoder {

let mut values_read = 0;
while values_read < max_values {
let index_buf = self.index_buf.get_or_insert_with(|| Box::new([0; 1024]));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL get_or_insert_with 👍

@alamb
Copy link
Contributor

alamb commented Dec 20, 2021

Merging this in to try and clear the PR backlog -- I implemented @Dandandan 's suggestion in #1070

@alamb alamb merged commit 127fb29 into apache:master Dec 20, 2021
alamb pushed a commit that referenced this pull request Dec 21, 2021
* Box RleDecoder index buffer (#1061)

* Format
alamb added a commit that referenced this pull request Dec 22, 2021
* Box RleDecoder index buffer (#1061)

* Format

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RleDecoder Inline Buffer
4 participants