[SPARK-44846][SQL] Pull out complex grouping expressions after remove redundant aggregates #42531
PushFoldableIntoBranches in complex grouping expressions may cause bindReference error
@@ -3674,6 +3674,21 @@ class DataFrameSuite extends QueryTest
parameters = Map("viewName" -> "AUTHORIZATION"))
}
}

test("SPARK-44846: PushFoldableIntoBranches in complex grouping expressions " +
case a @ Aggregate(groupingExpressions, _, _)
if !groupingExpressions.forall(_.isInstanceOf[NamedExpression]) => a
This happens because the Aggregate contains complex grouping expressions after RemoveRedundantAggregates. So could we fix it there?
Thanks. In PhysicalAggregation, if an expression is not a NamedExpression, it adds an alias. Also, I'm not sure whether complex expressions are generated anywhere other than RemoveRedundantAggregates. So is it safer to fix PushFoldableIntoBranches, where the impact is smaller?
We had a similar issue before: https://issues.apache.org/jira/browse/SPARK-34581
OK, later I will apply something like PullOutGroupingExpressions whenever RemoveRedundantAggregates actually removes an aggregate.
@@ -96,7 +96,7 @@ class RemoveRedundantAggregatesSuite extends PlanTest {
.groupBy($"a" + $"b")(($"a" + $"b") as "c")
.analyze
val optimized = Optimize.execute(query)
comparePlans(optimized, expected)
comparePlans(optimized, PullOutGroupingExpressions.apply(expected))
val expected = relation
.select($"a", $"b", ($"a" + $"b") as "_groupingexpression")
.groupBy($"_groupingexpression")($"_groupingexpression" as "c")
.analyze
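The shape of the suggested `expected` plan can be illustrated with a small standalone model. This is only a sketch: the types `Expr`, `Plan`, and the function `pullOut` below are hypothetical stand-ins and are not Catalyst classes or the real PullOutGroupingExpressions rule. The toy version handles only a single complex grouping expression directly above a relation.

```scala
object PullOutSketch {
  // Hypothetical toy expression and plan types (not Catalyst's).
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr
  final case class Alias(child: Expr, name: String) extends Expr

  sealed trait Plan
  final case class Relation(output: Seq[Attr]) extends Plan
  final case class Project(projectList: Seq[Expr], child: Plan) extends Plan
  final case class Aggregate(grouping: Seq[Expr], aggregates: Seq[Expr], child: Plan) extends Plan

  // Replace every occurrence of `from` inside `e` with `to`.
  def replace(e: Expr, from: Expr, to: Expr): Expr =
    if (e == from) to
    else e match {
      case Add(l, r)   => Add(replace(l, from, to), replace(r, from, to))
      case Alias(c, n) => Alias(replace(c, from, to), n)
      case other       => other
    }

  // Toy pull-out: compute the complex grouping expression in a Project
  // below the Aggregate, then group by the resulting attribute.
  def pullOut(agg: Aggregate): Plan = agg match {
    case Aggregate(Seq(g), aggs, rel: Relation) if !g.isInstanceOf[Attr] =>
      val ref = Attr("_groupingexpression")
      val project = Project(rel.output :+ Alias(g, ref.name), rel)
      Aggregate(Seq(ref), aggs.map(replace(_, g, ref)), project)
    case other => other
  }
}
```

Applied to a toy equivalent of `groupBy($"a" + $"b")(($"a" + $"b") as "c")`, `pullOut` yields an aggregate that groups by `_groupingexpression` over a projection of `a`, `b`, and `(a + b) as _groupingexpression`, mirroring the plan written above.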
...alyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAggregates.scala
Could you update the PR title and PR description?
Done.
@@ -42,11 +42,12 @@ object RemoveRedundantAggregates extends Rule[LogicalPlan] with AliasHelper {
)

// We might have introduces non-deterministic grouping expression
I have a different idea. I think it's risky to put the new grouping expressions back into the Aggregate, as the grouping expressions contain things from the SELECT list. This is a long-standing issue, and I feel it's better to just create a Project below the Aggregate to calculate the grouping expressions. Other optimizer rules can then merge or eliminate this extra Project if it only contains Attributes.
@cloud-fan Thanks. I'm sorry, I didn't understand. Before this PR, the rule already creates a Project below the Aggregate to pull out nondeterministic grouping expressions. In this PR, I keep that logic and also create a Project below the Aggregate to pull out complex grouping expressions, using the rule PullOutGroupingExpressions. I do not put the new grouping expressions back into the Aggregate.
Let me put it this way: if we can safely remove a distinct-like Aggregate, it means we can just turn it into a Project (e.g. case a: Aggregate => Project(a.aggregateExpressions, a.child)).
The code today merges the distinct-like Aggregate into the upper Aggregate, which is problematic, and we have to pull something out into a Project case by case. This is fragile, and this PR just finds another case.
My proposal is to not merge. Just convert the distinct-like Aggregate to a Project, and let other optimizer rules decide whether we can merge it or not.
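A minimal sketch of that proposal, written against a toy plan model rather than Catalyst: the types and the `toProject` function are hypothetical, and the `isRedundant` predicate is assumed to be supplied by the caller (the real rule's redundancy check is not reproduced here). The point is only that the lower Aggregate becomes a Project and the upper Aggregate is left untouched.

```scala
object DistinctLikeSketch {
  // Hypothetical toy expression and plan types (not Catalyst's).
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr
  final case class Alias(child: Expr, name: String) extends Expr

  sealed trait Plan
  final case class Relation(output: Seq[Attr]) extends Plan
  final case class Project(projectList: Seq[Expr], child: Plan) extends Plan
  final case class Aggregate(grouping: Seq[Expr], aggregateExpressions: Seq[Expr], child: Plan) extends Plan

  // If the lower Aggregate is judged redundant, turn it into a Project over
  // its own child instead of merging its expressions into the upper Aggregate.
  def toProject(plan: Plan, isRedundant: Aggregate => Boolean): Plan = plan match {
    case upper @ Aggregate(_, _, lower: Aggregate) if isRedundant(lower) =>
      upper.copy(child = Project(lower.aggregateExpressions, lower.child))
    case other => other
  }
}
```

In this form the upper Aggregate keeps its original grouping and aggregate expressions, so no expression substitution is needed; whether the extra Project can later be collapsed is left to other rules, as the comment suggests.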
Thank you, I understand, I will resubmit a PR later.
New PR: #42633
What changes were proposed in this pull request?
This PR ensures that the Aggregate does not contain complex grouping expressions after RemoveRedundantAggregates. In the rule RemoveRedundantAggregates, if a redundant aggregate is removed, PullOutGroupingExpressions is then applied to pull the complex grouping expressions out into a Project node under the Aggregate.
Why are the changes needed?
After RemoveRedundantAggregates, the Aggregate may contain complex grouping expressions. If aggregateExpressions has (if / case) branches, groupingExpressions may no longer be a subexpression of aggregateExpressions after the PushFoldableIntoBranches rule runs, which then causes a boundReference error.
For example
Before pr
After pr
Does this PR introduce any user-facing change?
No
How was this patch tested?
UT
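The failure mode described under "Why are the changes needed?" can be sketched with a toy expression model. Everything below is an assumption made for illustration: the types, the `contains` helper, and the simplified `pushIntoBranches` are hypothetical and are not the real PushFoldableIntoBranches rule, and the expressions are not the query from this PR. The sketch shows only that pushing a foldable operand into the branches of an `If` can make a complex grouping expression disappear as a subexpression of the aggregate expression.

```scala
object BindingSketch {
  // Hypothetical toy expression types (not Catalyst's).
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class Lit(value: Int) extends Expr
  final case class Gt(left: Expr, right: Expr) extends Expr
  final case class Mul(left: Expr, right: Expr) extends Expr
  final case class If(cond: Expr, ifTrue: Expr, ifFalse: Expr) extends Expr

  def children(e: Expr): Seq[Expr] = e match {
    case Gt(l, r)    => Seq(l, r)
    case Mul(l, r)   => Seq(l, r)
    case If(c, t, f) => Seq(c, t, f)
    case _           => Nil
  }

  // True if `sub` occurs anywhere inside `e`.
  def contains(e: Expr, sub: Expr): Boolean =
    e == sub || children(e).exists(contains(_, sub))

  // Simplified push: handles only a top-level Mul(If(...), literal).
  def pushIntoBranches(e: Expr): Expr = e match {
    case Mul(If(c, t, f), k: Lit) => If(c, Mul(t, k), Mul(f, k))
    case other                    => other
  }
}
```

With a grouping expression `If(Gt(b, Lit(1)), Lit(1), b)` and an aggregate expression `Mul(grouping, Lit(2))`, `contains` holds before the rewrite and fails afterwards, because the rewritten tree refers to `b` directly instead of to the grouping expression. Pulling the grouping expression out into a Project first makes it a plain attribute, which such a rewrite cannot break apart.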