fix: bitmap_count should report nullability correctly #19195
Merged
Which issue does this PR close?
Closes #19146
Part of #19144 (EPIC: fix nullability report for spark expression)
Rationale for this change
The bitmap_count UDF was using the default return_type implementation, which does not preserve nullability information. This causes:
The bitmap_count function counts set bits in a binary input and returns an Int64. The operation itself doesn't introduce nullability: if the input is non-nullable, the output will always be non-nullable. Therefore, the output nullability should match the input.
What changes are included in this PR?
- return_field_from_args: Creates a field with Int64 type and the same nullability as the input field
- return_type: Now returns an error directing users to use return_field_from_args instead (following DataFusion best practices)
- FieldRef import to support returning field metadata
Are these changes tested?
Yes, this PR includes a new test, test_bitmap_count_nullability, that verifies:
Test results:
Additionally, all existing bitmap_count tests continue to pass, ensuring backward compatibility.
Are there any user-facing changes?
Yes - Schema metadata improvement:
Users will now see correct nullability information in the schema:
Before (Bug):
After (Fixed):
This is a bug fix that corrects schema metadata only - it does not change the actual computation or introduce any breaking changes to the API.
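To illustrate why the computation cannot introduce nulls by itself, here is a minimal sketch of the per-row behavior, written against the standard library only. The function name bitmap_count_row and the use of Option to model NULL are assumptions made for illustration; this is not the code in bitmap_count.rs.

```rust
/// Hypothetical model of the per-row computation: count the set bits in a
/// binary value. `None` stands in for a NULL input row.
fn bitmap_count_row(input: Option<&[u8]>) -> Option<i64> {
    // A NULL input stays NULL; any non-NULL input always yields a count.
    input.map(|bytes| {
        bytes
            .iter()
            .map(|b| i64::from(b.count_ones()))
            .sum::<i64>()
    })
}
```

For example, bitmap_count_row(Some(&[0b0000_0111, 0xFF][..])) evaluates to Some(11), while bitmap_count_row(None) stays None. A non-null input never produces a null output, which is why the output field can safely inherit the input's nullability.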
Impact:
Code Changes Summary
Modified File: datafusion/spark/src/function/bitmap/bitmap_count.rs
1. Added Import
2. Updated return_type Method
3. Added return_field_from_args Implementation
4. Added Test
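The idea behind changes 2 and 3 can be sketched without the DataFusion crates. The DataType and Field types below are simplified, hypothetical stand-ins for the Arrow types, the two function names are invented for this example, and the error strings are illustrative rather than the messages used in the PR.

```rust
/// Simplified stand-in for Arrow's DataType (hypothetical, illustration only).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Binary,
    Int64,
}

/// Simplified stand-in for Arrow's Field (hypothetical, illustration only).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

/// Models return_field_from_args: the output is always Int64 and inherits
/// the nullability of the single input field.
fn bitmap_count_return_field(arg_fields: &[Field]) -> Result<Field, String> {
    let input = arg_fields
        .first()
        .ok_or_else(|| "bitmap_count expects one argument".to_string())?;
    Ok(Field {
        name: "bitmap_count".to_string(),
        data_type: DataType::Int64,
        nullable: input.nullable,
    })
}

/// Models the updated return_type: it no longer answers on its own and
/// points callers at the field-based method instead.
fn bitmap_count_return_type() -> Result<DataType, String> {
    Err("return_type should not be called; use return_field_from_args".to_string())
}
```

Under these assumptions, a check in the spirit of test_bitmap_count_nullability passes one nullable and one non-nullable Binary field to bitmap_count_return_field and asserts that the returned field is Int64 with the same nullable flag as the input.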
Verification Steps
Run the new test:
cargo test -p datafusion-spark test_bitmap_count_nullability --lib
Run all bitmap_count tests:
cargo test -p datafusion-spark bitmap_count --lib
Run clippy checks:
All checks pass successfully!
Related Issues
- bitmap_count need to have custom nullability #19146
- shuffle should report nullability correctly #19145 (shuffle function nullability)