feat: add has_non_empty_nulls helper function in OffsetBuffer #9711
Conversation
Would appreciate a review
return last_offset != initial_offset;
}

let mut valid_slices_iter = null_buffer.valid_slices();
this code would be a lot simpler if there were a NullBuffer::invalid_slices 🤔
Yes, but I feel the code is pretty straightforward, and it also has comments explaining the logic.
/// # Panics
///
/// Panics if the length of the `null_buffer` does not equal `self.len() - 1`.
pub fn is_there_null_pointing_to_non_empty_value(
I think this function name is excessively long and doesn't fit with the rest of this codebase. Can you please rename it to something shorter and more concise? Here are some ideas:
- has_null_data
- has_non_empty_nulls
- nulls_have_values
- has_populated_nulls
Renamed to has_non_empty_nulls.
// ---------------------------------------------------------------

#[test]
fn null_pointing_none_null_buffer() {
Do we need all these tests? It seems like a single test for each representative case would be more than adequate and would trim this PR down substantially.
…://github.com/rluvaton/arrow-rs into add-is-there-null-pointing-to-non-empty-value
Title changed from "is_there_null_pointing_to_non_empty_value helper function in OffsetBuffer" to "has_non_empty_nulls helper function in OffsetBuffer"
Which issue does this PR close?
N/A
Rationale for this change
In variable-length array types (e.g., StringArray, ListArray), null entries may have non-empty offset ranges, meaning the underlying data buffer contains data behind nulls. This matters when working on the underlying values of variable-length data, for example when unwrapping (flattening) a list array: the child values are exposed, including those behind null entries. If null entries point to non-empty ranges, the unwrapped values will contain data that may not be meaningful to operate on and could cause errors (e.g., division by zero in the child values).
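To make the problem concrete, here is a minimal sketch that uses plain slices rather than the arrow-rs types. The function name `values_behind_nulls` and the `&[bool]` validity representation are assumptions made for illustration only; they are not part of this PR or of the arrow-rs API.

```rust
/// Hypothetical helper (not an arrow-rs API): collects the child values that
/// sit behind null entries of a list-like layout.
///
/// `offsets` has one more element than `validity`; entry `i` covers
/// `values[offsets[i]..offsets[i + 1]]`, and `validity[i] == false` marks a null.
fn values_behind_nulls(offsets: &[i32], validity: &[bool], values: &[i64]) -> Vec<i64> {
    assert_eq!(
        validity.len() + 1,
        offsets.len(),
        "expected exactly one validity flag per offset range"
    );

    let mut hidden = Vec::new();
    for (i, is_valid) in validity.iter().enumerate() {
        if !is_valid {
            // A null entry may still span a non-empty range of child values.
            let start = offsets[i] as usize;
            let end = offsets[i + 1] as usize;
            hidden.extend_from_slice(&values[start..end]);
        }
    }
    hidden
}
```

With offsets `[0, 2, 4, 5]`, validity `[true, false, true]`, and values `[10, 20, 0, 0, 30]`, the null entry at index 1 still covers two child values (both zero). Flattening exposes them, so an operation such as dividing by the child values would hit them.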
Usages when this will be helpful:
What changes are included in this PR?
Add OffsetBuffer::is_there_null_pointing_to_non_empty_value method that checks if any null positions correspond to non-empty offset ranges.
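The check itself can be sketched over plain slices. This is a simplified illustration, not the PR's implementation: the free function below and its `&[bool]` validity argument are assumptions, whereas the real method lives on OffsetBuffer and takes a null buffer. It mirrors the documented panic when the number of validity flags does not equal the number of offsets minus one.

```rust
/// Simplified stand-in (an assumption, not the arrow-rs method): returns true
/// if any null entry covers a non-empty offset range.
///
/// # Panics
///
/// Panics if `validity.len()` does not equal `offsets.len() - 1`.
fn has_non_empty_nulls(offsets: &[i32], validity: &[bool]) -> bool {
    assert_eq!(
        validity.len() + 1,
        offsets.len(),
        "expected exactly one validity flag per offset range"
    );

    // Each window is one (start, end) pair; pair it with its validity flag.
    offsets
        .windows(2)
        .zip(validity)
        .any(|(range, &is_valid)| !is_valid && range[1] != range[0])
}
```

For example, offsets `[0, 2, 4, 5]` with validity `[true, false, true]` contain a non-empty null, while offsets `[0, 2, 2, 3]` with the same validity do not. A linear scan like this is the simplest form; the snippet quoted in the review iterates valid slices instead, which can skip over long runs of valid entries.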
Are these changes tested?
Yes
Are there any user-facing changes?
Yes, a new public method OffsetBuffer::is_there_null_pointing_to_non_empty_value is added.
Related to: