fix: bit_count function to report nullability correctly #19197
- Replace return_type with return_field_from_args to preserve input nullability
- Add test to verify nullability is correctly reported
- Addresses issue apache#19147
Pull request overview
This PR fixes the bit_count function to correctly preserve nullability information from input to output. Previously, the function used the default return_type implementation which always marked outputs as nullable, resulting in incorrect schema metadata. The fix implements return_field_from_args to return Int32 with the same nullability as the input field.
Key changes:
- Implemented return_field_from_args method to preserve input nullability
- Updated return_type to return an error directing to use return_field_from_args
- Added comprehensive test coverage for nullability behavior
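The nullability rule summarized above can be sketched without any DataFusion dependency. This is a minimal model, not the code from this PR: SketchField, SketchType, and bit_count_return_field are hypothetical names standing in for an Arrow-style field and for the logic that return_field_from_args is described as implementing.

```rust
/// Hypothetical stand-in for the data types involved (not Arrow's DataType).
#[derive(Debug, Clone, Copy, PartialEq)]
enum SketchType {
    Int32,
    Int64,
}

/// Hypothetical stand-in for an Arrow-style field: name, type, nullability flag.
#[derive(Debug, Clone, PartialEq)]
struct SketchField {
    name: String,
    data_type: SketchType,
    nullable: bool,
}

/// Models the rule described in the PR: the output type is always Int32,
/// and the output nullability is copied from the single input field.
fn bit_count_return_field(input: &SketchField) -> SketchField {
    SketchField {
        name: "bit_count".to_string(),
        data_type: SketchType::Int32,
        nullable: input.nullable,
    }
}
```

Under this model, a non-nullable Int64 input yields a non-nullable Int32 output, and a nullable input yields a nullable output; the function never sets the flag on its own.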
Hi @rluvaton! I'm seeing a CI formatting error in const_evaluator.rs on my PR branch. However, this file is not part of my changes; I only modified bit_count.rs. Question: should I fix the formatting issue in const_evaluator.rs as part of this PR, even though it's unrelated to my changes? I want to make sure I follow the correct contribution workflow. Thanks!
The formatting is already fixed.
rluvaton left a comment
I verified that Spark nullability depends on the child:
Which issue does this PR close?
Closes #19147
Part of #19144 (EPIC: fix nullability report for spark expression)
Rationale for this change
The bit_count UDF was using the default return_type implementation, which does not preserve nullability information. This causes:
The bit_count function counts the number of set bits (ones) in the binary representation of a number and returns an Int32. The operation itself doesn't introduce nullability - if the input is non-nullable, the output will always be non-nullable. Therefore, the output nullability should match the input.
What changes are included in this PR?
- return_field_from_args: Creates a field with Int32 type and the same nullability as the input field
- return_type: Now returns an error directing users to use return_field_from_args instead (following DataFusion best practices)
- FieldRef and internal_err imports to support the new implementation
Are these changes tested?
Yes, this PR includes a new test, test_bit_count_nullability, that verifies:
Test results:
Additionally, all existing bit_count tests continue to pass, ensuring backward compatibility.
Are there any user-facing changes?
Yes - Schema metadata improvement:
Users will now see correct nullability information in the schema:
Before (Bug):
After (Fixed):
This is a bug fix that corrects schema metadata only - it does not change the actual computation or introduce any breaking changes to the API.
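To illustrate why the unchanged computation cannot introduce nulls by itself, here is a minimal value-level sketch. It is an assumption-laden model rather than the PR's implementation: bit_count_value is a hypothetical name, Option<i64> stands in for a nullable 64-bit input, and None models SQL NULL. The only library call is Rust's standard i64::count_ones.

```rust
/// Hypothetical value-level model of bit_count.
/// A present input always produces a present Int32-sized count;
/// only a missing input (None) produces a missing output.
fn bit_count_value(value: Option<i64>) -> Option<i32> {
    // count_ones returns u32; the count is at most 64, so the cast is lossless.
    value.map(|v| v.count_ones() as i32)
}
```

For example, bit_count_value(Some(7)) returns Some(3), since 7 is 0b111, while bit_count_value(None) returns None. A null output can only come from a null input, which is why the schema can safely copy the input's nullability flag.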
Impact:
Code Changes Summary
Modified File: datafusion/spark/src/function/bitwise/bit_count.rs
1. Added Imports
2. Updated return_type Method
3. Added return_field_from_args Implementation
4. Added Test
Verification Steps
Run the new test:
cargo test -p datafusion-spark test_bit_count_nullability --lib
Run all bit_count tests:
cargo test -p datafusion-spark bit_count --lib
Run clippy checks:
All checks pass successfully!
Related Issues
- bit_count need to have custom nullability #19147
- shuffle should report nullability correctly #19145 (shuffle function nullability)
- bitmap_count need to have custom nullability #19146 (bitmap_count function nullability)