Merged
61 changes: 58 additions & 3 deletions datafusion/optimizer/src/type_coercion.rs
@@ -29,8 +29,8 @@ use datafusion_expr::type_coercion::other::{
};
use datafusion_expr::utils::from_plan;
use datafusion_expr::{
- is_false, is_not_false, is_not_true, is_not_unknown, is_true, is_unknown, Expr,
- LogicalPlan, Operator,
+ is_false, is_not_false, is_not_true, is_not_unknown, is_true, is_unknown,
+ BuiltinScalarFunction, Expr, LogicalPlan, Operator,
};
use datafusion_expr::{ExprSchemable, Signature};
use std::sync::Arc;
@@ -401,6 +401,20 @@ impl ExprRewriter for TypeCoercionRewriter {
}
}
}
Expr::ScalarFunction { fun, args } => match fun {
BuiltinScalarFunction::Concat
| BuiltinScalarFunction::ConcatWithSeparator => {
let new_args = args
Contributor

I wonder if we should do something with LargeUtf8?

Also, would it make sense to check the types before clone()ing them to do a cast that might not be needed?

Contributor Author

> would it make sense to check the types before clone()ing them to do a cast that might not be needed?

I think the cast_to function already performs this check:

fn cast_to<S: ExprSchema>(self, cast_to_type: &DataType, schema: &S) -> Result<Expr> {
    // TODO(kszucs): most of the operations do not validate the type correctness
    // like all of the binary expressions below. Perhaps Expr should track the
    // type of the expression?
    let this_type = self.get_type(schema)?;
    if this_type == *cast_to_type {
        Ok(self)
...
}
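
To make the early return concrete, here is a minimal, self-contained sketch of the same short-circuit. The `DataType` and `Expr` enums are toy stand-ins invented for illustration, not the DataFusion types, and this `cast_to` takes no schema. Note that the caller's `clone()` still happens either way; only the `Cast` wrapper is skipped.

```rust
// Toy model: these types are hypothetical and only mirror the shape of the idea.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    // A literal that carries nothing but its type.
    Literal(DataType),
    // An explicit cast of an inner expression to a target type.
    Cast { expr: Box<Expr>, to: DataType },
}

impl Expr {
    // Type of the expression; no schema is needed in this toy model.
    fn get_type(&self) -> DataType {
        match self {
            Expr::Literal(t) => t.clone(),
            Expr::Cast { to, .. } => to.clone(),
        }
    }

    // Wrap in a cast only when the current type differs from the target.
    fn cast_to(self, target: &DataType) -> Expr {
        if self.get_type() == *target {
            self
        } else {
            Expr::Cast {
                expr: Box::new(self),
                to: target.clone(),
            }
        }
    }
}
```

In this model an expression that is already `Utf8` comes back unchanged, while an `Int32` literal comes back wrapped in `Cast`, which matches the expected plan strings in the test added by this PR.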

Contributor Author

> I wonder if we should do something with LargeUtf8?

This is a good suggestion. I think we could cast to LargeUtf8 when at least one of the arguments has that type, and to Utf8 otherwise.
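
A rough sketch of that rule, assuming it would be decided from the argument types alone. The `DataType` enum and the `concat_target_type` helper are hypothetical names for illustration; neither is part of this PR, which always casts to Utf8.

```rust
// Toy stand-in for the Arrow data types relevant here (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Boolean,
    Int32,
    Utf8,
    LargeUtf8,
}

// Pick the string type all concat arguments would be cast to:
// LargeUtf8 if any argument already has that type, otherwise Utf8.
fn concat_target_type(arg_types: &[DataType]) -> DataType {
    if arg_types.iter().any(|t| *t == DataType::LargeUtf8) {
        DataType::LargeUtf8
    } else {
        DataType::Utf8
    }
}
```

With this rule, `[Utf8, Boolean, Int32]` would still coerce to Utf8, and a single LargeUtf8 argument would widen the whole call.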

Contributor

Is there a type coercion rule for the functions Concat or ConcatWithSeparator?

Type coercion is not yet supported in the logical phase for some expressions: Expr::ScalarFunction, Expr::AggregateFunction, Expr::WindowFunction, and Expr::AggregateUDF. These are to be covered in the follow-up PR for this; see #3582 (comment).

Contributor

I think this issue can be resolved once the type coercion rule is moved to the logical phase.

Contributor

I am not sure what you mean by "type coercion rule for the function Concat or ConcatWithSeparator".

Since it is an Expr::ScalarFunction { fun, args }, it currently gets coerced using data_types: https://github.com/apache/arrow-datafusion/blob/3eb55e9a0510d872f6f7765b1a5f17db46486e45/datafusion/expr/src/type_coercion.rs#L44-L47

Are you suggesting we move the logic that picks the argument types (in this case string) for concat into data_types? (I think this is a good idea, for the record.)
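
For illustration only, here is one way signature-driven coercion could look if the choice of argument types lived next to the function's signature instead of in the rewriter. Everything in this sketch is hypothetical: the `Signature` enum, its variants, and `coerced_types` are simplified inventions and are not the actual `data_types` API.

```rust
// Toy stand-in for Arrow data types (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Boolean,
    Int32,
    Utf8,
}

// Hypothetical, simplified function signature.
enum Signature {
    // Any number of arguments, each coerced to the given type (concat-like).
    VariadicTo(DataType),
    // Arguments must already match this list exactly.
    Exact(Vec<DataType>),
}

// Given the current argument types, return the types they should be cast to,
// or an error if the signature cannot accept them.
fn coerced_types(current: &[DataType], sig: &Signature) -> Result<Vec<DataType>, String> {
    match sig {
        Signature::VariadicTo(t) => Ok(vec![*t; current.len()]),
        Signature::Exact(expected) => {
            if expected.as_slice() == current {
                Ok(expected.clone())
            } else {
                Err(format!("expected {:?}, got {:?}", expected, current))
            }
        }
    }
}
```

A rewriter built on this would ask for the coerced types once and then apply `cast_to` per argument, so concat would need no special case in the match arm.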

Contributor

Yes, we can do it in #3582 (comment).

This PR can be merged first.

.iter()
.map(|e| e.clone().cast_to(&DataType::Utf8, &self.schema))
.collect::<Result<Vec<_>>>()?;
Ok(Expr::ScalarFunction {
fun,
args: new_args,
})
}
fun => Ok(Expr::ScalarFunction { fun, args }),
},
expr => Ok(expr),
}
}
@@ -449,7 +463,7 @@ mod test {
use arrow::datatypes::DataType;
use datafusion_common::{DFField, DFSchema, Result, ScalarValue};
use datafusion_expr::expr_rewriter::ExprRewritable;
- use datafusion_expr::{cast, col, is_true, ColumnarValue};
+ use datafusion_expr::{cast, col, concat, concat_ws, is_true, ColumnarValue};
use datafusion_expr::{
lit,
logical_plan::{EmptyRelation, Projection},
@@ -782,6 +796,47 @@ mod test {
Ok(())
}

#[test]
fn concat_for_type_coercion() -> Result<()> {
    let empty = empty_with_type(DataType::Utf8);
    let args = [col("a"), lit("b"), lit(true), lit(false), lit(13)];

    // concat
    {
        let expr = concat(&args);

        let plan = LogicalPlan::Projection(Projection::try_new(
            vec![expr],
            empty.clone(),
            None,
        )?);
        let rule = TypeCoercion::new();
        let mut config = OptimizerConfig::default();
        let plan = rule.optimize(&plan, &mut config).unwrap();
        assert_eq!(
            "Projection: concat(a, Utf8(\"b\"), CAST(Boolean(true) AS Utf8), CAST(Boolean(false) AS Utf8), CAST(Int32(13) AS Utf8))\n EmptyRelation",
            &format!("{:?}", plan)
        );
    }

    // concat_ws
    {
        let expr = concat_ws("-", &args);

        let plan =
            LogicalPlan::Projection(Projection::try_new(vec![expr], empty, None)?);
        let rule = TypeCoercion::new();
        let mut config = OptimizerConfig::default();
        let plan = rule.optimize(&plan, &mut config).unwrap();
        assert_eq!(
            "Projection: concatwithseparator(Utf8(\"-\"), a, Utf8(\"b\"), CAST(Boolean(true) AS Utf8), CAST(Boolean(false) AS Utf8), CAST(Int32(13) AS Utf8))\n EmptyRelation",
            &format!("{:?}", plan)
        );
    }

    Ok(())
}

fn empty() -> Arc<LogicalPlan> {
Arc::new(LogicalPlan::EmptyRelation(EmptyRelation {
produce_one_row: false,