
[SPARK-37914][SQL] Make RuntimeReplaceable works for AggregateFunction #35213

Closed
wants to merge 7 commits into from

Conversation

beliefer
Contributor

@beliefer beliefer commented Jan 15, 2022

What changes were proposed in this pull request?

Currently, Spark provides RuntimeReplaceable to replace a function with another function. The replacement must be an existing built-in function, so RuntimeReplaceable lets Spark reuse the implementation of built-in functions.
However, RuntimeReplaceable does not work for aggregate functions.
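For context, the pre-existing mechanism works roughly like this (a simplified sketch based on the Spark 3.2-era Catalyst source, not a verbatim excerpt): a RuntimeReplaceable expression keeps its user-facing arguments and carries a `child` that is the equivalent combination of built-in expressions, and the ReplaceExpressions optimizer rule substitutes the whole node with that child.

```scala
// Simplified sketch (abbreviated signatures; details may differ from the
// actual Spark source). NullIf is user-facing sugar; its `child` is built
// from existing built-in expressions, and that child is what actually gets
// evaluated once the optimizer replaces the node.
case class NullIf(left: Expression, right: Expression, child: Expression)
  extends RuntimeReplaceable {

  def this(left: Expression, right: Expression) = {
    this(left, right,
      If(EqualTo(left, right), Literal.create(null, left.dataType), left))
  }

  override def exprsReplaced: Seq[Expression] = Seq(left, right)
}
```

Because the `child` must be an evaluable non-aggregate expression tree, the trait could not previously express a replacement whose child is an AggregateFunction; that is the gap this PR addresses.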

Why are the changes needed?

Make RuntimeReplaceable work for AggregateFunction.

Does this PR introduce any user-facing change?

'No'.
This change is for Spark developers.

How was this patch tested?

Existing tests.

@github-actions github-actions bot added the SQL label Jan 15, 2022
@beliefer
Contributor Author

ping @cloud-fan

@cloud-fan
Contributor

Does this PR introduce any user-facing change?

This is for end-users, not Spark developers. It's not a user-facing change because end-users who only use the DataFrame/SQL APIs can't notice this change.

@@ -366,6 +366,8 @@ trait RuntimeReplaceable extends UnaryExpression with Unevaluable {
// are semantically equal.
override lazy val preCanonicalized: Expression = child.preCanonicalized

def isAggregate: Boolean = child.isInstanceOf[AggregateFunction]
Contributor

More thoughts on this: technically a RuntimeReplaceable can combine one or more expressions, even something like f(x) = max(x) - min(x).

For aggregate functions, it's very hard to reason about f(DISTINCT x) FILTER (WHERE ...) if f(x) combines many expressions, so I think it makes sense to only allow a direct mapping here.

We can make this assumption more explicit here:

def isAggregate: Boolean = {
  if (child.isInstanceOf[AggregateFunction]) {
    true
  } else {
    assert(child.find(_.isInstanceOf[AggregateFunction]).isEmpty)
    false
  }
}

@@ -46,12 +46,11 @@ import org.apache.spark.util.Utils
*/
object ReplaceExpressions extends Rule[LogicalPlan] {
def apply(plan: LogicalPlan): LogicalPlan = plan.transformAllExpressionsWithPruning(
-    _.containsAnyPattern(RUNTIME_REPLACEABLE, COUNT_IF, BOOL_AGG, REGR_COUNT)) {
+    _.containsAnyPattern(RUNTIME_REPLACEABLE, COUNT_IF, BOOL_AGG)) {
case e: RuntimeReplaceable => e.child
case CountIf(predicate) => Count(new NullIf(predicate, Literal.FalseLiteral))
case BoolOr(arg) => Max(arg)
case BoolAnd(arg) => Min(arg)
Contributor

Can we remove the above 3 as well?

Contributor Author

OK
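For context, the BoolOr/BoolAnd rewrites in the diff above are semantics-preserving: under the standard Boolean ordering (false < true), max is logical OR and min is logical AND. A quick plain-Scala sanity check (hypothetical object name, no Spark required):

```scala
// Sanity check that rewriting bool_or -> max and bool_and -> min preserves
// semantics. Scala's Ordering[Boolean] defines false < true, so max over
// Booleans is logical OR and min is logical AND.
object BoolAggRewrite {
  def boolOr(xs: Seq[Boolean]): Boolean  = xs.max   // what Max(arg) computes
  def boolAnd(xs: Seq[Boolean]): Boolean = xs.min   // what Min(arg) computes

  def main(args: Array[String]): Unit = {
    val data = Seq(true, false, false)
    assert(boolOr(data)  == data.exists(identity))  // OR of the column
    assert(boolAnd(data) == data.forall(identity))  // AND of the column
    println("rewrites agree with OR/AND semantics")
  }
}
```

The catch, as the golden-file diff below shows, is that the rewrite leaks into the output schema: the result column is named after the replacement (`min(col)`) rather than the user-facing function (`bool_and(col)`).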

| org.apache.spark.sql.catalyst.expressions.aggregate.BoolOr | any | SELECT any(col) FROM VALUES (true), (false), (false) AS tab(col) | struct<any(col):boolean> |
| org.apache.spark.sql.catalyst.expressions.aggregate.BoolOr | bool_or | SELECT bool_or(col) FROM VALUES (true), (false), (false) AS tab(col) | struct<bool_or(col):boolean> |
| org.apache.spark.sql.catalyst.expressions.aggregate.BoolOr | some | SELECT some(col) FROM VALUES (true), (false), (false) AS tab(col) | struct<some(col):boolean> |
| org.apache.spark.sql.catalyst.expressions.aggregate.BoolAnd | bool_and | SELECT bool_and(col) FROM VALUES (true), (true), (true) AS tab(col) | struct<min(col):boolean> |
Contributor
Do you know how this change was introduced?

-- !query output
org.apache.spark.sql.AnalysisException
cannot resolve 'every('true')' due to data type mismatch: Input to function 'every' should have been boolean, but it's [string].; line 1 pos 11
Contributor
The behavior change is a bit scary. Can you explain it?

@beliefer
Contributor Author

beliefer commented Jan 18, 2022

It seems the output schema is difficult to keep consistent with the previous one.
#35241 is a better way.

@beliefer
Contributor Author

#35534 has been merged.

@beliefer beliefer closed this Feb 24, 2022