Add function origin API to replace name-based function checks in the optimizer by alexandreyc · Pull Request #20868 · apache/datafusion

alexandreyc · 2026-03-11T13:14:47Z

Which issue does this PR close?

Closes #18643 (Avoid check function type by matching names in the optimizer).

Rationale for this change

See the issue.

What changes are included in this PR?

I followed the proposal made by @2010YOUY01 in #18643.

Added an is_builtin() -> bool method to AggregateUDF/AggregateUDFImpl and to WindowUDF/WindowUDFImpl
Updated all implementations of those traits
Updated occurrences of matching on function names to additionally check the origin of the function
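
To illustrate the third item, here is a minimal sketch of why a name-only check is fragile and how an additional origin check helps. The trait `AggregateUdfLike`, the structs, and both helper functions are hypothetical stand-ins written for this example; they are not the real DataFusion traits or the code in this PR.

```rust
// Simplified stand-in for an aggregate UDF trait (an assumption for this
// sketch, NOT the real `AggregateUDFImpl`).
trait AggregateUdfLike {
    fn name(&self) -> &str;
    // Hypothetical method mirroring the `is_builtin()` described above.
    fn is_builtin(&self) -> bool;
}

// Stand-in for a function shipped with the engine.
struct BuiltinCount;

// Stand-in for a user-defined function that happens to reuse the name "count".
struct MyCount;

impl AggregateUdfLike for BuiltinCount {
    fn name(&self) -> &str {
        "count"
    }
    fn is_builtin(&self) -> bool {
        true
    }
}

impl AggregateUdfLike for MyCount {
    fn name(&self) -> &str {
        "count"
    }
    fn is_builtin(&self) -> bool {
        false
    }
}

// Name-only check: also matches any user function named "count".
fn is_count_by_name(f: &dyn AggregateUdfLike) -> bool {
    f.name() == "count"
}

// Name plus origin check: only matches the built-in function.
fn is_builtin_count(f: &dyn AggregateUdfLike) -> bool {
    f.is_builtin() && f.name() == "count"
}
```

With these definitions, `is_count_by_name(&MyCount)` is true while `is_builtin_count(&MyCount)` is false, so a rewrite that is only valid for the built-in function would no longer be applied to the user-defined one.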

Are these changes tested?

Not directly but I can't see a relevant way to test this. Suggestions are welcome.

Are there any user-facing changes?

Currently yes, but the change could be made non-breaking, see questions below.

Request for advice

I'm new to the codebase so feel free to challenge this PR. In particular, I'd like to have your opinion on the following items:

Should we make is_builtin have a default implementation that returns false? That would make this change non-breaking for users and slightly simplify this PR. But in return it would be more error-prone when implementing built-in functions.
Should we add the method is_builtin to ScalarUDF/ScalarUDFImpl? I didn't do it because there doesn't seem to be any scalar UDF name matching in the codebase. It might be desirable to add it for the sake of consistency across all kinds of UDF.
Should we replace is_builtin() -> bool by origin() -> UDFOrigin? UDFOrigin would be something like enum { BuiltIn, Spark, UserDefined }. Asking because it's not clear to me if functions in the datafusion_spark crates should be considered built-in or not.

coderfender · 2026-03-13T01:36:18Z

This seems like a major change in DF. Perhaps we could break this into smaller PRs (if that is even possible)?

2010YOUY01 · 2026-03-13T01:41:22Z

Thank you for the help!

> Should we replace is_builtin() -> bool by origin() -> UDFOrigin? UDFOrigin would be something like enum { BuiltIn, Spark, UserDefined }. Asking because it's not clear to me if functions in the datafusion_spark crates should be considered built-in or not.

I think the origin() -> UDFOrigin API is better, since it allows more flexibility for other potential usages. For example, if two functions with the same name but from different dialects are registered in the same context, we may want to check their origin at runtime.

I suggest we first wait several days to see if there are other opinions.
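
The same-name, different-dialect scenario described above could look roughly like the sketch below. `UdfOrigin`, `RegisteredFn`, `sample_registry`, and `find_by_origin` are hypothetical names made up for this example, and the function name "round" is only illustrative; none of this is taken from the DataFusion codebase.

```rust
// Hypothetical origin enum, following the variants proposed in the question.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UdfOrigin {
    BuiltIn,
    Spark,
    UserDefined,
}

// Minimal stand-in for a registered function (an assumption for this sketch).
struct RegisteredFn {
    name: &'static str,
    origin: UdfOrigin,
}

// Illustrative registry in which the same name appears with several origins.
fn sample_registry() -> Vec<RegisteredFn> {
    vec![
        RegisteredFn { name: "round", origin: UdfOrigin::BuiltIn },
        RegisteredFn { name: "round", origin: UdfOrigin::Spark },
        RegisteredFn { name: "my_udf", origin: UdfOrigin::UserDefined },
    ]
}

// Runtime lookup that disambiguates by origin as well as by name.
fn find_by_origin<'a>(
    fns: &'a [RegisteredFn],
    name: &str,
    origin: UdfOrigin,
) -> Option<&'a RegisteredFn> {
    fns.iter().find(|f| f.name == name && f.origin == origin)
}
```

A boolean `is_builtin()` could not tell the two "round" entries apart if both were considered built-in, which is the extra flexibility an enum would provide.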

alexandreyc · 2026-03-16T13:36:24Z

Thanks @2010YOUY01 for your reply.

I updated the PR replacing is_builtin() -> bool by origin() -> UDFOrigin where UDFOrigin is enum { BuiltIn, SparkCompat, UserDefined }.

Also, I added a default implementation for the new method that returns UDFOrigin::UserDefined so that we don't break existing users. This also makes the PR slightly shorter.
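
As a rough sketch of why a default implementation keeps the change non-breaking: an existing implementation that knows nothing about the new method still compiles and reports `UserDefined`, while built-in functions override it. The trait `WindowUdfLike` and both structs are hypothetical stand-ins, not the real `WindowUDFImpl` or the code in this PR; only the enum variants follow the description above.

```rust
// Origin enum with the variants described above; the definition itself is
// an assumption made for this sketch.
#[allow(dead_code)] // `SparkCompat` is not constructed in this small example
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UdfOrigin {
    BuiltIn,
    SparkCompat,
    UserDefined,
}

// Simplified stand-in for a window UDF trait (NOT the real `WindowUDFImpl`).
trait WindowUdfLike {
    fn name(&self) -> &str;

    // Default implementation: implementors that do not override this method
    // are treated as user-defined.
    fn origin(&self) -> UdfOrigin {
        UdfOrigin::UserDefined
    }
}

// Pre-existing user code: written before `origin()` existed, needs no change.
struct ExistingUserFn;

impl WindowUdfLike for ExistingUserFn {
    fn name(&self) -> &str {
        "my_window_fn"
    }
}

// A function shipped with the engine opts in by overriding the default.
struct BuiltinRowNumber;

impl WindowUdfLike for BuiltinRowNumber {
    fn name(&self) -> &str {
        "row_number"
    }
    fn origin(&self) -> UdfOrigin {
        UdfOrigin::BuiltIn
    }
}
```

The trade-off raised earlier in the thread still applies: a built-in function that forgets to override `origin()` would silently report `UserDefined`.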


github-actions bot added the logical-expr, optimizer, core, substrait, proto, functions, ffi, and spark labels Mar 11, 2026

2010YOUY01 changed the title ~~Fix #18643~~ Add function origin API to replace name-based function checks in the optimizer Mar 13, 2026

alexandreyc force-pushed the fix-18643 branch from 7670b0a to c6abaee Compare March 16, 2026 13:32

alexandreyc added 6 commits March 16, 2026 14:47

Add is_builtin method to AggregateUDFImpl (3d28f72)
Update usage of UDF's name to be more robust (22b12fa)
Add is_builtin method to WindowUDFImpl and update usage of UDF's name to be more robust (66d8b2d)
Update to use origin() instead of is_builtin() (1453b6a)
Add default implementation that returns `UDFOrigin::UserDefined` (d69df67)
fix comment (2414421)

alexandreyc force-pushed the fix-18643 branch from 5243bac to 2414421 Compare March 16, 2026 13:48

github-actions bot removed the proto label Mar 16, 2026

alexandreyc wants to merge 6 commits into apache:main from alexandreyc:fix-18643

alexandreyc commented Mar 11, 2026 • edited by alamb
