Contributor

@jaylmiller jaylmiller commented Feb 23, 2023

Which issue does this PR close?

Closes #5378.

Rationale for this change

Scalar UDFs with zero arguments did not work: they did not receive the null array input that the documentation describes.

What changes are included in this PR?

  • Change UDF planning so that a UDF taking no arguments receives a null array as its input

    • This follows the existing docs, which specify:
    • ...with the exception of zero param function, where a singular element vec
      will be passed. In that case the single element is a null array to indicate
      the batch's row count (so that the generative zero-argument function can know
      the result array size).

  • Add a test for a zero-parameter UDF to the SQL integration test suite
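The convention described above can be modeled in a few lines. This is a minimal, std-only sketch under stated assumptions: `Array`, `PhysicalArg`, `plan_udf_args`, `evaluate`, and `pi_udf` are hypothetical stand-ins, not DataFusion's types or functions. They only mimic the two steps: planning pushes a single NULL literal when the call has no arguments, and evaluation turns that literal into a null array whose length is the batch's row count, which a generative UDF then uses to size its result.

```rust
// Hypothetical, simplified model of the zero-argument UDF convention.
// None of these types are DataFusion's real API; they only mimic its shape.

/// Stand-in for an Arrow array, reduced to what this example needs.
#[derive(Debug, Clone, PartialEq)]
enum Array {
    /// An all-null array; its only meaningful property is its length.
    Null(usize),
    Float64(Vec<f64>),
}

impl Array {
    fn len(&self) -> usize {
        match self {
            Array::Null(n) => *n,
            Array::Float64(v) => v.len(),
        }
    }
}

/// Stand-in for a planned physical argument expression.
#[derive(Debug, Clone, PartialEq)]
enum PhysicalArg {
    /// A literal NULL, analogous in spirit to `Literal::new(ScalarValue::Null)`.
    NullLiteral,
    /// An already-materialized column, for calls that do have arguments.
    Column(Array),
}

/// Planning step: a call with no arguments gets one NULL literal argument.
fn plan_udf_args(args: Vec<PhysicalArg>) -> Vec<PhysicalArg> {
    if args.is_empty() {
        vec![PhysicalArg::NullLiteral]
    } else {
        args
    }
}

/// Evaluation step: the NULL literal expands to a null array with one entry per row.
fn evaluate(arg: &PhysicalArg, num_rows: usize) -> Array {
    match arg {
        PhysicalArg::NullLiteral => Array::Null(num_rows),
        PhysicalArg::Column(a) => a.clone(),
    }
}

/// A generative zero-parameter UDF: it reads the row count from the placeholder.
fn pi_udf(args: &[Array]) -> Result<Array, String> {
    match args {
        [placeholder] => Ok(Array::Float64(vec![std::f64::consts::PI; placeholder.len()])),
        _ => Err(format!("expected 1 placeholder argument, got {}", args.len())),
    }
}

fn main() {
    let num_rows = 4; // row count of the current batch
    let planned = plan_udf_args(vec![]);
    let inputs: Vec<Array> = planned.iter().map(|a| evaluate(a, num_rows)).collect();
    let result = pi_udf(&inputs).expect("udf failed");
    assert_eq!(result.len(), num_rows);
    println!("{:?}", result);
}
```

Keeping the placeholder as an ordinary argument means UDF signatures do not change; the cost is that the row count rides on a fake argument, which is the concern raised in review.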

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes: this fixes a user-facing bug.

github-actions bot added the labels core (Core DataFusion crate) and physical-expr (Changes to the physical-expr crates) on Feb 23, 2023
jaylmiller marked this pull request as ready for review on February 23, 2023 22:47

// UDFs with zero params expect a null array as input
if args.is_empty() {
    physical_args.push(Arc::new(Literal::new(ScalarValue::Null)));
}
Contributor

Though convenient, I'm wary that generating a fake argument just to carry the row count will cause problems later.

I'd prefer to see a solution where the row count is passed out-of-band, for example in a change to https://github.com/apache/arrow-datafusion/blob/cef119da9ee8672b1b1e50ac01387dcb1640d96e/datafusion/expr/src/function.rs#L39 that would add an extra argument (i.e. len: usize) for this purpose.

If that were present, we could populate it up in the call stack where we know the RecordBatch size, probably here: https://github.com/apache/arrow-datafusion/blob/cef119da9ee8672b1b1e50ac01387dcb1640d96e/datafusion/physical-expr/src/scalar_function.rs#L147
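The reviewer's alternative, passing the row count out of band, might look roughly like this. It is a hypothetical, std-only sketch: `Array`, `ScalarFnWithLen`, and `invoke` are illustrative names, and the extra `len: usize` parameter is the proposal from the comment above, not an existing DataFusion signature. The type alias only loosely echoes the shape of the function type linked above.

```rust
use std::sync::Arc;

// Hypothetical sketch of the out-of-band alternative: the batch row count is
// an explicit `len` parameter instead of a placeholder NULL argument.

/// Stand-in for an Arrow array, reduced to what this example needs.
#[derive(Debug, Clone, PartialEq)]
enum Array {
    Float64(Vec<f64>),
}

impl Array {
    fn len(&self) -> usize {
        match self {
            Array::Float64(v) => v.len(),
        }
    }
}

/// Function type with an extra, out-of-band `len: usize` argument.
type ScalarFnWithLen = Arc<dyn Fn(&[Array], usize) -> Result<Array, String> + Send + Sync>;

/// Caller side: the executor knows the batch size and passes it directly.
fn invoke(f: &ScalarFnWithLen, args: &[Array], num_rows: usize) -> Result<Array, String> {
    f(args, num_rows)
}

fn main() {
    // A zero-argument UDF no longer needs a placeholder: `args` is simply empty.
    let ones: ScalarFnWithLen = Arc::new(|args: &[Array], len: usize| -> Result<Array, String> {
        if !args.is_empty() {
            return Err("expected no arguments".to_string());
        }
        Ok(Array::Float64(vec![1.0; len]))
    });
    let out = invoke(&ones, &[], 5).unwrap();
    assert_eq!(out.len(), 5);
    println!("{:?}", out);
}
```

With the length passed explicitly, a zero-argument UDF sees an empty argument slice, so the planner does not have to invent an argument; the trade-off is that every UDF signature gains a parameter.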

Contributor Author

Sounds good. Thanks for the suggestions!

Contributor Author

I agree it was not the best solution. I did it this way because the existing docs state:

...with the exception of zero param function, where a singular element vec
will be passed. In that case the single element is a null array to indicate
the batch's row count (so that the generative zero-argument function can know
the result array size).

Is it safe to assume I should update this part of the docs to match my new implementation?

Contributor

Oh, sorry I missed that. I didn't realize this was part of the existing design.

Contributor Author

No worries, I should've put that in the PR description itself. Just edited it.

avantgardnerio merged commit 47bdda6 into apache:main Feb 24, 2023

ursabot commented Feb 24, 2023

Benchmark runs are scheduled for baseline = 32a238c and contender = 47bdda6. 47bdda6 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Development

Successfully merging this pull request may close these issues.

UDF with zero params broken (doesn't receive null array as input)

3 participants