Skip to content

feat: add has_non_empty_nulls helper function in OffsetBuffer#9711

Merged
alamb merged 5 commits intoapache:mainfrom
rluvaton:add-is-there-null-pointing-to-non-empty-value
Apr 27, 2026
Merged

feat: add has_non_empty_nulls helper function in OffsetBuffer#9711
alamb merged 5 commits intoapache:mainfrom
rluvaton:add-is-there-null-pointing-to-non-empty-value

Conversation

@rluvaton
Copy link
Copy Markdown
Member

Which issue does this PR close?

N/A

Rationale for this change

In variable-length array types (e.g., StringArray, ListArray), null entries may have non-empty offset ranges, meaning the underlying data buffer contains data behind nulls. This matters when wanting to work on the underlying values of variable length data for example when unwrapping (flattening) a list array, as the child values are exposed, including those behind null entries. If null entries point to non-empty ranges, the unwrapped values will contain data that may not be
meaningful to operate on and could cause errors (e.g., division by zero in the child values).

Usages when this will be helpful:

  • flattening list array
  • casting lists/map - we don't wanna cast values that are not used so this is a check if there is one
  • explode on list - we don't want the null values behind it so this give us a check if it exists (will have another pr to cleanup empty values)
  • gc on lists/map/strings to remove unneeded data

What changes are included in this PR?

Add OffsetBuffer::is_there_null_pointing_to_non_empty_value method that checks if any null positions correspond to non-empty offset
ranges

Are these changes tested?

Yes

Are there any user-facing changes?

Yes, a new public method OffsetBuffer::is_there_null_pointing_to_non_empty_value is added.


Related to:

@github-actions github-actions Bot added the arrow Changes to the arrow crate label Apr 14, 2026
@rluvaton
Copy link
Copy Markdown
Member Author

Would appreciate a review

Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rluvaton -- i reviewed the code and it looks good to me overall

I think we should reconsider the super long name and the seemingly over-large number of tests, but otherwise 👍

return last_offset != initial_offset;
}

let mut valid_slices_iter = null_buffer.valid_slices();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code would be a lot simpler if there were a NullBuffer::invalid_slices 🤔

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes but I feel the code is pretty straight forward and this have comments as well that explaining the logic

Comment thread arrow-buffer/src/buffer/offset.rs Outdated
/// # Panics
///
/// Panics if the length of the `null_buffer` does not equal `self.len() - 1`.
pub fn is_there_null_pointing_to_non_empty_value(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this function name is a little excessively long and doesn't fit with the rest of this codebase. Can you please rename it to something shorter and more concise. Here are some ideas:

  1. has_null_data
  2. has_non_empty_nulls
  3. nulls_have_values
  4. has_populated_nulls

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed to has_non_empty_nulls

Comment thread arrow-buffer/src/buffer/offset.rs Outdated
// ---------------------------------------------------------------

#[test]
fn null_pointing_none_null_buffer() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need all these tests? It seems like a single test for each representative case would be more than adequate and trim this PR down substantially

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed some

@rluvaton rluvaton changed the title feat: add is_there_null_pointing_to_non_empty_value helper function in OffsetBuffer feat: add has_non_empty_nulls helper function in OffsetBuffer Apr 27, 2026
Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rluvaton -- looks good o me

@alamb alamb merged commit a3dbc15 into apache:main Apr 27, 2026
27 checks passed
@rluvaton rluvaton deleted the add-is-there-null-pointing-to-non-empty-value branch April 27, 2026 20:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants