Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BooleanBufferBuilder::append_packed (#1038) #1039

Merged
merged 7 commits into from
Jan 11, 2022

Conversation

tustvold
Copy link
Contributor

Which issue does this PR close?

Closes #1038.

Rationale for this change

See ticket

What changes are included in this PR?

This copies the code from IOx into arrow proper

Are there any user-facing changes?

BooleanBufferBuilder::append_packed is public

@github-actions github-actions bot added the arrow Changes to the arrow crate label Dec 13, 2021
@codecov-commenter
Copy link

Codecov Report

Merging #1039 (bf04099) into master (e0abda2) will increase coverage by 0.02%.
The diff coverage is 98.41%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1039      +/-   ##
==========================================
+ Coverage   82.31%   82.33%   +0.02%     
==========================================
  Files         168      168              
  Lines       49031    49094      +63     
==========================================
+ Hits        40359    40421      +62     
- Misses       8672     8673       +1     
Impacted Files Coverage Δ
arrow/src/array/builder.rs 87.06% <98.41%> (+0.42%) ⬆️
arrow/src/array/transform/mod.rs 85.10% <0.00%> (-0.14%) ⬇️
arrow/src/datatypes/field.rs 53.68% <0.00%> (+0.30%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e0abda2...bf04099. Read the comment docs.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arrow/src/array/builder.rs Outdated Show resolved Hide resolved
@jhorstmann
Copy link
Contributor

Logic looks correct to me. There is code with a similar goal already in transform/utils.rs which is used when concatenating boolean arrays and null bitmaps. That code uses chunked iteration, so in theory could append 64 bits at once, but it's also a bit more complex and indirect. Could be interesting to benchmark both approaches.

@tustvold
Copy link
Contributor Author

tustvold commented Dec 16, 2021

Added packed_append_range, also from IOx, as I needed it for #1037

@jhorstmann I will look into benchmarking the approaches and report back. I'll mark this as a draft until I find time for this

@tustvold tustvold marked this pull request as draft December 16, 2021 15:23
@tustvold
Copy link
Contributor Author

Turns out this currently runs into #1051 which is fixed by #1052

@alamb
Copy link
Contributor

alamb commented Dec 20, 2021

#1052 is merged now

@@ -18,6 +18,7 @@
#[cfg(feature = "test_utils")]
pub mod bench_util;
pub mod bit_chunk_iterator;
pub(crate) mod bit_mask;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be made pub mod and then we could use it in IOx but wasn't sure about this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's leave it as pub(crate) until the next round of parquet finagling is complete and then make a separate decision (PR) about if we should make it public or not

@@ -33,49 +26,6 @@ pub(super) fn resize_for_bits(buffer: &mut MutableBuffer, len: usize) {
}
}

/// sets all bits on `write_data` on the range `[offset_write..offset_write+len]` to be equal to the
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is moved to arrow::util::bit_mask so that it can be used within arrow::array::builder

@@ -0,0 +1,193 @@
// Licensed to the Apache Software Foundation (ASF) under one
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split out from arrow::array::transform::util

@tustvold
Copy link
Contributor Author

As per @jhorstmann 's great suggestion, this instead now lifts the already extant code up from arrow::array::transform::util into arrow::util::bit_mask. Aside from being less code, it also appears to be faster 🎉

@tustvold tustvold marked this pull request as ready for review January 11, 2022 13:45
@@ -2713,7 +2734,8 @@ mod tests {
let buffer = b.finish();
assert_eq!(1, buffer.len());

let mut b = BooleanBufferBuilder::new(4);
// Overallocate capacity
let mut b = BooleanBufferBuilder::new(8);
Copy link
Contributor Author

@tustvold tustvold Jan 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional test for #1051

Edit: I think this is actually just the diff being unhelpful - this code exists on master...

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the changes and they look good to me. Thanks for the reviews @jhorstmann and @Dandandan

Will merge this when CI is happy

Extend, _MutableArrayData,
utils::{resize_for_bits, set_bits},
};
use crate::util::bit_mask::set_bits;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to be explicit the bit_mask crate is not exposed publically

@@ -18,6 +18,7 @@
#[cfg(feature = "test_utils")]
pub mod bench_util;
pub mod bit_chunk_iterator;
pub(crate) mod bit_mask;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's leave it as pub(crate) until the next round of parquet finagling is complete and then make a separate decision (PR) about if we should make it public or not

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BooleanBufferBuilder Append Packed
5 participants