-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Fix shuffle function to report nullability correctly #19184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix shuffle function to report nullability correctly #19184
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes the shuffle function to correctly preserve the nullability of its input array, addressing issue #19145. Previously, the function always reported outputs as nullable regardless of input nullability, which could lead to incorrect schema inference and missed optimization opportunities.
Key Changes:
- Replaced
return_typeimplementation withreturn_field_from_argsto preserve input field metadata including nullability - Added comprehensive unit tests to verify nullability preservation for both nullable and non-nullable inputs
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
rluvaton
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
5cd23bc to
c3262d2
Compare
c3262d2 to
629e995
Compare
- Replace return_type with return_field_from_args to preserve input nullability - Add test to verify nullability is correctly reported - Addresses issue apache#19145
629e995 to
853ff42
Compare
rluvaton
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
hope to see you again soon
fix: shuffle should report nullability correctly
shuffleshould report nullability correctly #19145Which issue does this PR close?
Closes #19145
Rationale for this change
The
shuffleUDF was using the defaultis_nullableimplementation which always returnstrue, regardless of the input array's nullability. This causes:The shuffle function simply reorders elements within an array without changing the array's structure or nullability, so the output should have the same nullability as the input.
What changes are included in this PR?
return_field_from_args: Returns the input field directly, preserving both data type and nullabilityreturn_type: Now returns an error directing users to usereturn_field_from_argsinstead (following DataFusion best practices)Are these changes tested?
Yes, this PR includes a new test
test_shuffle_nullabilitythat verifies:Test results: