-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-33131][SQL] Fix grouping sets with having clause can not resolve qualified col name #30029
Conversation
SELECT c1 FROM VALUES (1, 2) as t(c1, c2) GROUP BY GROUPING SETS(t.c1) HAVING t.c1 = 1; | ||
SELECT c1 FROM VALUES (1, 2) as t(c1, c2) GROUP BY CUBE(t.c1) HAVING t.c1 = 1; | ||
SELECT c1 FROM VALUES (1, 2) as t(c1, c2) GROUP BY ROLLUP(t.c1) HAVING t.c1 = 1; | ||
SELECT c1 FROM VALUES (1, 2) as t(c1, c2) GROUP BY t.c1 HAVING t.c1 = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test is just ensure we support it.
@maropu @cloud-fan do you have time to take a look ? |
val exprMap = extraAggExprs.zip( | ||
newChild.asInstanceOf[Aggregate].aggregateExpressions.takeRight( | ||
extraAggExprs.length)).toMap | ||
// We also add the original aggregateExpressions in Map to avoid extraAggExprs is empty. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's wrong if extraAggExprs
is empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we resolved the having condition but extraAggExprs is empty, that means the resolution is in original aggregateExpressions.
We need use original aggregateExpressions mapping having condition NamedExpression to the new since grouping sets constructAggregate will build the new alias for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get it. The comment says
// Since the exprId of extraAggExprs will be changed in the constructed aggregate, and the
// aggregateExpressions keeps the input order. So here we build an exprMap to resolve the
// condition again.
I don't see why we need to include the original aggregateExpressions in exprMap
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment is right if we not return the resolved having condition when exprMap
is empty.
This pr break the behavior that return it. If we not add the original aggregateExpressions, we can't mapping the resolved having condition since exprMap
is empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make the comment clearer?
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Outdated
Show resolved
Hide resolved
@@ -2315,7 +2315,8 @@ class Analyzer( | |||
if (aggregateExpressions.nonEmpty) { | |||
Some(aggregateExpressions.toSeq, transformedAggregateFilter) | |||
} else { | |||
None | |||
// We need to return the resolved having condition | |||
Some(Seq.empty, transformedAggregateFilter.asInstanceOf[Alias].child) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if we don't trim the Alias
here?
|
||
|
||
-- !query | ||
SELECT c1 FROM VALUES (1, 2) as t(c1, c2) GROUP BY t.c1 HAVING t.c1 = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems this test can pass even in master.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this test is just for better coverage.
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #129748 has finished for PR 30029 at commit
|
Seems this way will introduce unexpect thing. We want to return the resolved having condition which is extra expression or exists in grouping expression. Current pr implementation will break such sql
Let me find an another way to solve it. |
d3893b2
to
a4c2028
Compare
Kubernetes integration test starting |
Kubernetes integration test status success |
@@ -470,7 +470,7 @@ class Analyzer( | |||
*/ | |||
private def constructGroupByAlias(groupByExprs: Seq[Expression]): Seq[Alias] = { | |||
groupByExprs.map { | |||
case e: NamedExpression => Alias(e, e.name)() | |||
case e: NamedExpression => Alias(e, e.name)(qualifier = e.qualifier) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah this is a good catch! do we have this bug in 3.0 and 2.4?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, seems it's a long time bug.
Test build #129791 has finished for PR 30029 at commit
|
GA passed, thanks, merging to master! |
@ulysses-you can you send backport PRs for 3.0 and 2.4? thanks! |
@cloud-fan sure will do it tomorrow. |
…resolve qualified col name This is [#30029](#30029) backport for branch-3.0. ### What changes were proposed in this pull request? Correct the resolution of having clause. ### Why are the changes needed? Grouping sets construct new aggregate lost the qualified name of grouping expression. Here is a example: ``` -- Works resolved by `ResolveReferences` select c1 from values (1) as t1(c1) group by grouping sets(t1.c1) having c1 = 1 -- Works because of the extra expression c1 select c1 as c2 from values (1) as t1(c1) group by grouping sets(t1.c1) having t1.c1 = 1 -- Failed select c1 from values (1) as t1(c1) group by grouping sets(t1.c1) having t1.c1 = 1 ``` It wroks with `Aggregate` without grouping sets through `ResolveReferences`, but Grouping sets not works since the exprId has been changed. ### Does this PR introduce _any_ user-facing change? Yes, bug fix. ### How was this patch tested? add test. Closes #30077 from ulysses-you/SPARK-33131-branch-3.0. Authored-by: ulysses <youxiduo@weidian.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
…resolve qualified col name This is [#30029](#30029) backport for branch-2.4. ### What changes were proposed in this pull request? Correct the resolution of having clause. ### Why are the changes needed? Grouping sets construct new aggregate lost the qualified name of grouping expression. Here is a example: ``` -- Works resolved by `ResolveReferences` select c1 from values (1) as t1(c1) group by grouping sets(t1.c1) having c1 = 1 -- Works because of the extra expression c1 select c1 as c2 from values (1) as t1(c1) group by grouping sets(t1.c1) having t1.c1 = 1 -- Failed select c1 from values (1) as t1(c1) group by grouping sets(t1.c1) having t1.c1 = 1 ``` It wroks with `Aggregate` without grouping sets through `ResolveReferences`, but Grouping sets not works since the exprId has been changed. ### Does this PR introduce _any_ user-facing change? Yes, bug fix. ### How was this patch tested? add test. Closes #30075 from ulysses-you/SPARK-33131-branch-2.4. Authored-by: ulysses <youxiduo@weidian.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
…resolve qualified col name This is [apache#30029](apache#30029) backport for branch-3.0. ### What changes were proposed in this pull request? Correct the resolution of having clause. ### Why are the changes needed? Grouping sets construct new aggregate lost the qualified name of grouping expression. Here is a example: ``` -- Works resolved by `ResolveReferences` select c1 from values (1) as t1(c1) group by grouping sets(t1.c1) having c1 = 1 -- Works because of the extra expression c1 select c1 as c2 from values (1) as t1(c1) group by grouping sets(t1.c1) having t1.c1 = 1 -- Failed select c1 from values (1) as t1(c1) group by grouping sets(t1.c1) having t1.c1 = 1 ``` It wroks with `Aggregate` without grouping sets through `ResolveReferences`, but Grouping sets not works since the exprId has been changed. ### Does this PR introduce _any_ user-facing change? Yes, bug fix. ### How was this patch tested? add test. Closes apache#30077 from ulysses-you/SPARK-33131-branch-3.0. Authored-by: ulysses <youxiduo@weidian.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
What changes were proposed in this pull request?
Correct the resolution of having clause.
Why are the changes needed?
Grouping sets construct new aggregate lost the qualified name of grouping expression. Here is a example:
It wroks with
Aggregate
without grouping sets throughResolveReferences
, but Grouping sets not works since the exprId has been changed.Does this PR introduce any user-facing change?
Yes, bug fix.
How was this patch tested?
add test.