Truncate bitmask on BooleanBufferBuilder::resize: #1183

Merged 3 commits on Jan 17, 2022

tustvold (Contributor) commented Jan 15, 2022

Which issue does this PR close?

Relates to #1037.

Rationale for this change

When splitting a null bitmask off, the code added in #1054 would return the entire bitmask that was read. This isn't a problem, as ArrayData doesn't care about trailing data in buffers (in fact, tolerating it is critical for packed bitmasks and array data slices to work), but it can be a tad surprising.
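
To illustrate why trailing data is harmless, here is a minimal sketch (not code from this PR) of a reader that only inspects the first `len` bits of an LSB-first packed bitmask. The helper names `get_bit` and `count_nulls` are hypothetical and exist only for this example.

```rust
/// Returns the bit at index `i` of an LSB-first packed bitmask.
fn get_bit(bytes: &[u8], i: usize) -> bool {
    bytes[i / 8] & (1u8 << (i % 8)) != 0
}

/// Counts the nulls (unset bits) among the first `len` bits only.
/// Bits at index `len` or beyond are never inspected, so trailing
/// data in the buffer cannot change the result.
fn count_nulls(bytes: &[u8], len: usize) -> usize {
    (0..len).filter(|&i| !get_bit(bytes, i)).count()
}
```

With a logical length of 4, a one-byte mask and a longer buffer carrying the same low four bits plus arbitrary trailing bits give the same null count.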

What changes are included in this PR?

Modifies DefinitionLevelBuffer::split_bitmask to truncate the returned Bitmap, to avoid confusion. Also uses a more efficient append_packed_range to construct the remainder buffer.
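
The idea of the split can be sketched roughly as follows. This is a simplified stand-in operating on plain `Vec<u8>`, not the code in this PR: the free functions below are hypothetical, and the copy is done bit by bit for clarity, whereas a real packed-range append would presumably copy larger chunks at a time.

```rust
use std::ops::Range;

/// Copies bits `range` of the LSB-first packed slice `src` onto the end of
/// `dst`, which currently holds `dst_len` bits. Returns the new bit length.
/// Assumes any bits of `dst` at or beyond `dst_len` are zero.
fn append_packed_range(
    dst: &mut Vec<u8>,
    dst_len: usize,
    src: &[u8],
    range: Range<usize>,
) -> usize {
    let new_len = dst_len + range.len();
    // Grow to the number of bytes needed, zero-filling the new bytes.
    dst.resize((new_len + 7) / 8, 0);
    for (offset, i) in range.enumerate() {
        if src[i / 8] & (1u8 << (i % 8)) != 0 {
            let j = dst_len + offset;
            dst[j / 8] |= 1u8 << (j % 8);
        }
    }
    new_len
}

/// Splits a packed mask holding `total` bits at bit index `at`.
/// The head contains only the first `at` bits (no trailing data);
/// the tail is rebuilt from the remaining bits, shifted to start at bit 0.
fn split_bitmask(mask: &[u8], total: usize, at: usize) -> (Vec<u8>, Vec<u8>) {
    let mut head = Vec::new();
    append_packed_range(&mut head, 0, mask, 0..at);
    let mut tail = Vec::new();
    append_packed_range(&mut tail, 0, mask, at..total);
    (head, tail)
}
```

Splitting a 10-bit mask at bit 3 yields a one-byte head whose unused high bits are zero, and a one-byte tail holding the remaining 7 bits.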

Are there any user-facing changes?

Adds a BooleanBufferBuilder::resize method

github-actions bot added the arrow (Changes to the arrow crate) and parquet (Changes to the parquet crate) labels Jan 15, 2022

#[test]
fn test_split_off() {
let t = Type::primitive_type_builder("col", PhysicalType::INT32)

tustvold (Contributor Author):

I could not find a more ergonomic way to construct this type

codecov-commenter commented Jan 15, 2022

Codecov Report

Merging #1183 (abdc565) into master (66b84f3) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1183      +/-   ##
==========================================
+ Coverage   82.66%   82.67%   +0.01%     
==========================================
  Files         173      173              
  Lines       50902    50929      +27     
==========================================
+ Hits        42077    42105      +28     
+ Misses       8825     8824       -1     
Impacted Files                                         Coverage Δ
arrow/src/array/builder.rs                             86.76% <100.00%> (+0.14%) ⬆️
...rquet/src/arrow/record_reader/definition_levels.rs  88.55% <100.00%> (+0.99%) ⬆️
arrow/src/datatypes/datatype.rs                        66.38% <0.00%> (-0.43%) ⬇️
arrow/src/buffer/immutable.rs                          98.92% <0.00%> (ø)
arrow/src/datatypes/native.rs                          66.66% <0.00%> (ø)
arrow/src/array/transform/mod.rs                       85.69% <0.00%> (+0.13%) ⬆️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 66b84f3...abdc565.

#[inline]
pub fn resize(&mut self, len: usize) {
let len_bytes = bit_util::ceil(len, 8);
self.buffer.resize(len_bytes, 0)

Contributor:

Should this also set self.len?

tustvold (Contributor Author) commented Jan 16, 2022

Oh yes 🤦

Will fix and add a test
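
For readers following along, here is a minimal sketch of what a `resize` that also updates the logical length might look like. `ToyBooleanBuilder` is a hypothetical stand-in written for this example, not the arrow type, and clearing the stale bits in the last byte is an assumption of this sketch rather than something stated in the discussion above.

```rust
/// Hypothetical toy builder for bit-packed booleans (LSB-first).
/// Invariant: `buffer.len() == ceil(len / 8)` and bits past `len` are zero.
struct ToyBooleanBuilder {
    buffer: Vec<u8>,
    len: usize, // logical length in bits
}

impl ToyBooleanBuilder {
    fn new() -> Self {
        Self { buffer: Vec::new(), len: 0 }
    }

    fn len(&self) -> usize {
        self.len
    }

    fn get(&self, i: usize) -> bool {
        self.buffer[i / 8] & (1u8 << (i % 8)) != 0
    }

    fn append(&mut self, v: bool) {
        // Start a new byte whenever the current one is full.
        if self.len % 8 == 0 {
            self.buffer.push(0);
        }
        if v {
            self.buffer[self.len / 8] |= 1u8 << (self.len % 8);
        }
        self.len += 1;
    }

    /// Grows (zero-filled) or truncates the builder to `len` bits.
    fn resize(&mut self, len: usize) {
        let len_bytes = (len + 7) / 8;
        self.buffer.resize(len_bytes, 0);
        // Assumption of this sketch: clear bits past `len` in the last byte
        // so stale values cannot reappear if the builder grows again.
        if len % 8 != 0 {
            self.buffer[len_bytes - 1] &= (1u8 << (len % 8)) - 1;
        }
        // The step the review comment asks about: keep `len` in sync.
        self.len = len;
    }
}
```

Without the final assignment, the byte buffer would shrink while `len` still reported the old bit count, so later appends and reads would index past the truncated data.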

alamb (Contributor) left a comment

Thanks @tustvold and @jhorstmann

alamb merged commit 713bd6d into apache:master Jan 17, 2022
alamb changed the title from "Truncate bitmask on split" to "Truncate bitmask on BooleanBufferBuilder::resize:" Jan 17, 2022