Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-9756: [Rust] [DataFusion] Added support for scalar UDFs of arbitrary return types #7974

Closed
wants to merge 13 commits into from

Conversation

jorgecarleitao
Copy link
Member

This PR is done on top of #7971 ,

Its current runtime consequence is that all our math functions now return float32 or float64 depending on their incoming column (and use float32 for other numeric types). Its API consequences is that it allows to register UDFs of variable return types.

This PR hits both physical plans and logical plans, and abstracts a bit the typing of some of our structs. In particular, it introduces a new trait that only cares about the return datatype of an object (PhysicalExpr and LogicalExpr).

@jorgecarleitao jorgecarleitao marked this pull request as draft August 16, 2020 10:39
@jorgecarleitao jorgecarleitao changed the title ARROW-9756: [Rust] [DataFusion] Added support for scalar UDFs of arbitrary types ARROW-9756: [Rust] [DataFusion] Added support for scalar UDFs of arbitrary return types Aug 16, 2020
@jorgecarleitao
Copy link
Member Author

FYI @andygrove @alamb @houqp : this is a draft because it depends on other PRs being reviewed and accepted.

@github-actions
Copy link

@@ -1087,7 +1101,7 @@ impl LogicalPlanBuilder {
/// Apply a projection
pub fn project(&self, expr: Vec<Expr>) -> Result<Self> {
let input_schema = self.plan.schema();
let projected_expr = if expr.contains(&Expr::Wildcard) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to drop partialEq for expressions, and therefore also dropped this since it was strictly not needed.

@@ -87,7 +95,7 @@ impl Debug for ScalarFunctionExpr {
.field("fun", &"<FUNC>")
.field("name", &self.name)
.field("args", &self.args)
.field("return_type", &self.return_type)
//.field("return_type", &self.return_type)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo

@@ -50,7 +58,7 @@ impl Debug for ScalarFunction {
f.debug_struct("ScalarFunction")
.field("name", &self.name)
.field("args", &self.args)
.field("return_type", &self.return_type)
//.field("return_type", &self.return_type)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo

@@ -156,7 +168,7 @@ macro_rules! sum_accumulate {

#[derive(Debug)]
struct SumAccumulator {
sum: Option<ScalarValue>,
pub sum: Option<ScalarValue>,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not needed?

@@ -31,6 +31,7 @@ extern crate sqlparser;

pub mod dataframe;
pub mod datasource;
mod datatyped;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pub so that others can implement a different DataTyped?

@@ -355,7 +355,7 @@ pub enum Expr {
/// List of expressions to feed to the functions as arguments
args: Vec<Expr>,
/// The `DataType` the expression will yield
return_type: DataType,
return_type: ReturnType,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THIS IS A MAJOR CHANGE: it makes Expr hold an Arc<&dyn>, which make them unserializable. I was unable to find another way, unfortunately.

Expr::AggregateFunction { return_type, .. } => Ok(return_type.clone()),
Expr::ScalarFunction {
args, return_type, ..
} => return_type(&args.iter().map(|x| x.as_datatyped()).collect(), schema),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not very beautiful, but was unable to find a better pattern for this :(

match self.schema_provider.get_function_meta(&name) {
Some(fm) => {
let args = if name == "count" {
// optimization to avoid computing expressions
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be moved to another place (an optimizer?). For now I left it where it here as it was.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants