Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor: Unify Expr::ScalarFunction and Expr::ScalarUDF, introduce unresolved functions by name #8258

Merged
merged 7 commits into from
Nov 26, 2023

Conversation

2010YOUY01
Copy link
Contributor

Which issue does this PR close?

Part of #8157

Rationale for this change

See #8157 for the rationale:
Creation of Exprs for BuiltinScalarFunctions can be done statelessly

let expr1 = abs(lit(-1));
let expr2 = call_fn("abs", vec![lit(-1)]);

However, we can't do that for ScalarUDFs because they are registered within SessionState, it's only possible to do

let ctx = create_context()?;
ctx.register_udf(abs_impl);
let expr2 = ctx.call_fn("abs", vec![lit(-1)]);

For the ongoing plan to migrate functions from BuiltinScalarFunction to ScalarUDF based implementation (ref #8045 ), we would like the ScalarUDF based functions to continue to support the original Expr construction API, so the Expr struct should have a variant for unresolved name, and then we can add a AnalyzerRule to resolve the function.
This PR only include refactor preparation for UDF Expr support.

What changes are included in this PR?

  1. Change Expr::ScalarFunction
// before
pub struct ScalarFunction {
    /// The function
    pub fun: built_in_function::BuiltinScalarFunction,
    /// List of expressions to feed to the functions as arguments
    pub args: Vec<Expr>,
}
// This PR
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum ScalarFunctionDefinition {
    /// Resolved to a `BuiltinScalarFunction`
    /// There is plan to migrate `BuiltinScalarFunction` to UDF-based implementation (issue#8045)
    /// This variant is planned to be removed in long term
    BuiltIn(built_in_function::BuiltinScalarFunction),
    /// Resolved to a user defined function
    UDF(Arc<crate::ScalarUDF>),
    /// A scalar function constructed with name, could be resolved with registered functions during
    /// analyzing.
    Name(Arc<str>),
}

/// ScalarFunction expression invokes a built-in scalar function
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
pub struct ScalarFunction {
    /// The function
    pub func_def: ScalarFunctionDefinition,
    /// List of expressions to feed to the functions as arguments
    pub args: Vec<Expr>,
}
  1. Remove Expr::ScalarUDF

This way we can support unresolved function name variant.
Also, merge the execution path for BuiltinScalarFunction/ScalarUDF, to make future work towards #8045 a bit easier

Are these changes tested?

Covered by existing tests.

Are there any user-facing changes?

Yes, see comments for API changes

@github-actions github-actions bot added sql logical-expr Logical plan and expressions physical-expr Physical Expressions optimizer Optimizer rules core Core datafusion crate substrait labels Nov 17, 2023
/// ScalarFunction expression invokes a built-in scalar function
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
pub struct ScalarFunction {
/// The function
pub fun: built_in_function::BuiltinScalarFunction,
pub func_def: ScalarFunctionDefinition,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the major change

}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the major change

@alamb alamb added the api change Changes the API exposed to users of the crate label Nov 18, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @2010YOUY01 -- I think this looks great 🦾 . I had a few small comments, but otherwise I think it is ready to go.

I want to leave this PR open for a few days to allow people a chance to comment

then we can add a AnalyzerRule to resolve the function.
This PR only include refactor preparation for UDF Expr support.

While waiting, it might help to create another draft PR on top of this one that implements name resolution so people can see how that would work

Some other todos (for myself):

  1. File a ticket to do the same thing for AggregateUDF and WindowUDF (to maintain a consistent interface)

datafusion/expr/src/expr.rs Show resolved Hide resolved
datafusion/expr/src/expr.rs Outdated Show resolved Hide resolved
datafusion/expr/src/expr.rs Outdated Show resolved Hide resolved
ScalarFunction::new_udf(fun, transform_vec(args, &mut transform)?),
),
ScalarFunctionDefinition::Name(_) => {
return internal_err!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is probably ok to support tree walks on unresolved function names. I don't see any reason to throw an error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I believe it's necessary, but I'm not sure how it will be implemented right now, so I plan to leave this change to next PR which resolves function Expr name

datafusion/substrait/src/logical_plan/producer.rs Outdated Show resolved Hide resolved
@2010YOUY01
Copy link
Contributor Author

@alamb Thank you! Review comments are addressed

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @2010YOUY01 -- I had one comment on name() signature but I also think we can do that as a follow on PR. As a reminder I want to leave this open a few more days to allow for comment prior to merging

/// List of expressions to feed to the functions as arguments
pub args: Vec<Expr>,
}

impl ScalarFunctionDefinition {
/// Function's name for display
pub fn name(&self) -> String {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to allow the callsites to decide if they needed to make the string copy -- so passing back &str I think would make for a better API:

Suggested change
pub fn name(&self) -> String {
pub fn name(&self) -> &str {

Expr::ScalarUDF(ScalarUDF { fun, args }) => {
fmt_function(f, fun.name(), false, args, true)
Expr::ScalarFunction(ScalarFunction { func_def, args }) => {
fmt_function(f, &func_def.name(), false, args, true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for example, this callsite doesn't need an owned String, &str would work well

@alamb alamb changed the title Refactor: Unify Expr::ScalarFunction and Expr::ScalarUDF Refactor: Unify Expr::ScalarFunction and Expr::ScalarUDF, introduce unresolved functions by name Nov 20, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great -- thanks again @2010YOUY01

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api change Changes the API exposed to users of the crate core Core datafusion crate logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Physical Expressions sql substrait
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants