-
Notifications
You must be signed in to change notification settings - Fork 1.8k
UDF zero params #5378 #5380
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UDF zero params #5378 #5380
Conversation
|
|
||
| // udfs with zero params expect null array as input | ||
| if args.is_empty() { | ||
| physical_args.push(Arc::new(Literal::new(ScalarValue::Null))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though a convenient implementation, I'm wary that generating a fake argument by which to transfer row count will cause problems later.
I'd prefer to see a solution where the row count is passed out-of-band, for example in a change to https://github.com/apache/arrow-datafusion/blob/cef119da9ee8672b1b1e50ac01387dcb1640d96e/datafusion/expr/src/function.rs#L39 that would add an extra argument (i.e. len: usize) for this purpose.
If that were present, we could populate it up in the call stack where we know the RecordBatch size, probably here: https://github.com/apache/arrow-datafusion/blob/cef119da9ee8672b1b1e50ac01387dcb1640d96e/datafusion/physical-expr/src/scalar_function.rs#L147
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Thanks for the suggestions!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it was not the best solution. The reason why I thought to do it this way is because the existing docs state:
...with the exception of zero param function, where a singular element vec
will be passed. In that case the single element is a null array to indicate
the batch's row count (so that the generative zero-argument function can know
the result array size).
Is it safe to assume I should update this part of the docs according to my implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sorry I missed that. I didn't realize this was part of the existing design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries I should've put that in the PR itself. Just edited
|
Benchmark runs are scheduled for baseline = 32a238c and contender = 47bdda6. 47bdda6 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #5378.
Rationale for this change
UDFs with zero arguments were not working before.
What changes are included in this PR?
Change UDF planning to receive null as input if it takes no args
Add test for zero param UDF to sql integration test suite
Are these changes tested?
Yes.
Are there any user-facing changes?
Fixing user-facing bug.