Skip to content

Conversation

@HaoYang670
Copy link
Contributor

@HaoYang670 HaoYang670 commented Oct 18, 2022

Signed-off-by: remzi 13716567376yh@gmail.com

Which issue does this PR close?

Closes #3856.
Closes #3857.

Rationale for this change

Simplify the concat_ws expression:

  1. fold the whole expression to null if the delimiter is null
  2. filter out null arguments
  3. use concat to replace concat_ws if the delimiter is an empty string
  4. concatenate contiguous literals if the delimiter is a literal.

What changes are included in this PR?

Are there any user-facing changes?

Signed-off-by: remzi <13716567376yh@gmail.com>
@github-actions github-actions bot added logical-expr Logical plan and expressions optimizer Optimizer rules labels Oct 18, 2022
Signed-off-by: remzi <13716567376yh@gmail.com>
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is beautiful @HaoYang670

}
}

impl Literal for &String {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"TIL" lit() 👍

Comment on lines +325 to +354
for arg in args {
match arg {
// filter out null args
Expr::Literal(ScalarValue::Utf8(None) | ScalarValue::LargeUtf8(None)) => {}
Expr::Literal(ScalarValue::Utf8(Some(v)) | ScalarValue::LargeUtf8(Some(v))) => {
match contiguous_scalar {
None => contiguous_scalar = Some(v.to_string()),
Some(mut pre) => {
pre += delimiter;
pre += v;
contiguous_scalar = Some(pre)
}
}
}
Expr::Literal(s) => return Err(DataFusionError::Internal(format!("The scalar {} should be casted to string type during the type coercion.", s))),
// If the arg is not a literal, we should first push the current `contiguous_scalar`
// to the `new_args` and reset it to None.
// Then pushing this arg to the `new_args`.
arg => {
if let Some(val) = contiguous_scalar {
new_args.push(lit(val));
}
new_args.push(arg.clone());
contiguous_scalar = None;
}
}
}
if let Some(val) = contiguous_scalar {
new_args.push(lit(val));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern of creating the contiguous scalar is so similar -- I wonder if it could be extracted out into a function -- perhaps as a follow on PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for reviewing @alamb ♥️

The logic for concat and concat_ws is a little different, because in concat_ws we must consider the delimiter and we can't ignore the empty string literals. I will try to find a way to refactor them.

args: new_args,
}
}
} => simpl_concat(args)?,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

// the delimiter is not a literal
{
let expr = concat_ws(col("c"), vec![lit("a"), null.clone(), lit("b")]);
let expected = concat_ws(col("c"), vec![lit("a"), lit("b")]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so cool!

@andygrove
Copy link
Member

Thanks @HaoYang670

@andygrove andygrove merged commit feff5dc into apache:master Oct 18, 2022
@HaoYang670 HaoYang670 deleted the 3856_3857_optimize_concat_ws branch October 19, 2022 01:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

logical-expr Logical plan and expressions optimizer Optimizer rules

Projects

None yet

3 participants