Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change ArrayDataBuilder::null_bit_buffer to accept Option<Buffer> rather than Buffer #1739

Merged
merged 6 commits into from
May 27, 2022

Conversation

HaoYang670
Copy link
Contributor

@HaoYang670 HaoYang670 commented May 24, 2022

Signed-off-by: remzi 13716567376yh@gmail.com

Which issue does this PR close?

Closes #1737.

Rationale for this change

  1. More user-friendly: expose the type of null_bit_buffer, so that users know null_bit_buffer is an optional field.
  2. Avoid writing code like this:
let mut builder = ArrayDataBuilder::new(...).add_buffer(...);
if let Some(null_bit_buffer) = null_bit_buffer {
   builder.null_bit_buffer(null_bit_buffer.clone())
}
...

Instead, we can directly write like this now:

let builder = ArrayDataBuilder::new(...).add_buffer(...).null_bit_buffer(null_bit_buffer.cloned());
  1. Maybe faster: avoid some useless pattern matching.
  2. Fewer muts are safer. Less code is better.

What changes are included in this PR?

Change the type of buf from Buffer to Option<Buffer>

Are there any user-facing changes?

Yes!

Signed-off-by: remzi <13716567376yh@gmail.com>
@HaoYang670 HaoYang670 marked this pull request as draft May 24, 2022 14:34
@github-actions github-actions bot added arrow Changes to the arrow crate parquet Changes to the parquet crate labels May 24, 2022
@codecov-commenter
Copy link

codecov-commenter commented May 24, 2022

Codecov Report

Merging #1739 (1020fce) into master (b265780) will decrease coverage by 0.00%.
The diff coverage is 96.04%.

@@            Coverage Diff             @@
##           master    #1739      +/-   ##
==========================================
- Coverage   83.36%   83.35%   -0.01%     
==========================================
  Files         196      196              
  Lines       56147    56122      -25     
==========================================
- Hits        46805    46782      -23     
+ Misses       9342     9340       -2     
Impacted Files Coverage Δ
integration-testing/src/lib.rs 0.00% <0.00%> (ø)
arrow/src/compute/kernels/filter.rs 88.33% <83.33%> (ø)
arrow/src/array/array.rs 89.67% <100.00%> (ø)
arrow/src/array/array_binary.rs 93.27% <100.00%> (-0.03%) ⬇️
arrow/src/array/array_dictionary.rs 91.91% <100.00%> (ø)
arrow/src/array/array_list.rs 96.16% <100.00%> (ø)
arrow/src/array/array_map.rs 84.81% <100.00%> (ø)
arrow/src/array/array_primitive.rs 95.06% <100.00%> (ø)
arrow/src/array/array_string.rs 97.72% <100.00%> (-0.01%) ⬇️
arrow/src/array/array_struct.rs 88.40% <100.00%> (-0.05%) ⬇️
... and 24 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b265780...1020fce. Read the comment docs.

Signed-off-by: remzi <13716567376yh@gmail.com>
@HaoYang670 HaoYang670 marked this pull request as ready for review May 25, 2022 01:35
@HaoYang670
Copy link
Contributor Author

cc @alamb @tustvold @viirya
Please help to review. Thank you!

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with this, but as this is a breaking change should probably get feedback from others.

Left a suggestion on how to make the null_count > 0 construction that appears in various places less verbose

builder = builder.null_bit_buffer(null_bit_buffer);
}
.add_buffer(self.values_builder.finish())
.null_bit_buffer(if null_count > 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be written as (null_count > 0).then(|| null_bit_buffer)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated!

Signed-off-by: remzi <13716567376yh@gmail.com>
Signed-off-by: remzi <13716567376yh@gmail.com>
@alamb alamb added the api-change Changes to the arrow API label May 26, 2022
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks like a very nice change to me. While it will result in some code churn I think the API is low enough level it should not be super widely used and the churn will result in cleaner code downstream 👍

Thank you @HaoYang670

cc @jhorstmann (whose project I think uses these lower level APIs)

@@ -74,15 +74,11 @@ pub fn string_concat<Offset: OffsetSizeTrait>(
output_offsets.append(Offset::from_usize(output_values.len()).unwrap());
}

let mut builder =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this code is moved, this PR now has a conflict sadly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated!

@@ -287,24 +287,22 @@ fn create_primitive_array(
let array_data = match data_type {
Utf8 | Binary | LargeBinary | LargeUtf8 => {
// read 3 buffers
let mut builder = ArrayData::builder(data_type.clone())
ArrayData::builder(data_type.clone())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 the new pattern certainly look nicer in my opinion

@jhorstmann
Copy link
Contributor

Fully agree, this looks much more fluent than before.

@HaoYang670 HaoYang670 requested review from tustvold and alamb May 27, 2022 01:36
builder = builder.null_bit_buffer(null_bit_buffer.unwrap());
}
.add_buffer(self.values_builder.finish())
.null_bit_buffer(if null_count > 0 {
Copy link
Contributor

@tustvold tustvold May 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed a .then opportunity here FWIW. Same for line below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.then is not suitable here because the type of null_bit_buffer is Option<Buffer>. Using .then will get a result of Option<Option<Buffer>>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously this called .unwrap() which is probably safe given the null count, although I can't remember what NullArrays do. Either way you could then just call .flatten(). Not a big deal 😁

@tustvold tustvold merged commit 9722f06 into apache:master May 27, 2022
@HaoYang670 HaoYang670 deleted the refactor_null_bit_buffer branch May 27, 2022 10:16
@alamb alamb changed the title Rewrite ArrayDataBuilder::null_bit_buffer Change ArrayDataBuilder::null_bit_buffer to accept Option<Buffer> rather than Buffer May 27, 2022
@alamb alamb removed the parquet Changes to the parquet crate label May 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Changes to the arrow API arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ArrayDataBuilder::null_bit_buffer should accept Option<Buffer> as input type
5 participants