-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-16639][SQL] The query with having condition that contains grouping by column should work #14296
Conversation
case e: Expression => | ||
val alias = Alias(e, e.toString)() | ||
aggregateExpressions += alias | ||
alias.toAttribute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like we are blindly pushing all expression into Aggregate
, how about we just push the whole filter condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean we don't replace and push the expressions, but just push the whole filter condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like pushing the whole filter condition brings more problem.
Not the problem of whole filter condition pushdown. But the problem of this approach. I will update this.
Test build #62655 has finished for PR 14296 at commit
|
Test build #62702 has finished for PR 14296 at commit
|
@@ -1207,6 +1207,12 @@ class Analyzer( | |||
val alias = Alias(ae, ae.toString)() | |||
aggregateExpressions += alias | |||
alias.toAttribute | |||
case ne: NamedExpression => ne | |||
case e: Expression if grouping.exists(_.semanticEquals(e)) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should not push down all expressions or the whole expression. Only push down the group by expressions or aggregate expressions.
Test build #62722 has finished for PR 14296 at commit
|
@@ -1207,6 +1207,12 @@ class Analyzer( | |||
val alias = Alias(ae, ae.toString)() | |||
aggregateExpressions += alias | |||
alias.toAttribute | |||
case ne: NamedExpression => ne | |||
case e: Expression if grouping.exists(_.semanticEquals(e)) && | |||
!ResolveGroupingAnalytics.hasGroupingFunction(e) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why !ResolveGroupingAnalytics.hasGroupingFunction(e)
is needed? Do we want a test case for this check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not near my laptop. But pushing grouping function causes test failed in SQLQuerySuite. I remember there is another rule taking care of grouping function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be great if we can figure it out and add comment here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yah. ResolveGroupingAnalytics
performs grouping id and grouping function replacement. We should skip pushdown them here.
@cloud-fan any more comments? Thanks! |
@@ -1207,6 +1207,12 @@ class Analyzer( | |||
val alias = Alias(ae, ae.toString)() | |||
aggregateExpressions += alias | |||
alias.toAttribute | |||
case ne: NamedExpression => ne |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need to replace named expr (attributes, alias)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but case e: Expression if grouping.exists(_.semanticEquals(e))
will filter them out right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing it will cause error in replaceGroupingFunc
. As we create a new attribute based on the grouping column with additional alias, we can't find it in aggregate's group by expressions.
Test build #62808 has finished for PR 14296 at commit
|
case ne: NamedExpression => ne | ||
// Grouping functions are handled in the rule [[ResolveGroupingAnalytics]]. | ||
case e: Expression if grouping.exists(_.semanticEquals(e)) && | ||
!ResolveGroupingAnalytics.hasGroupingFunction(e) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so what will happen if we use grouping function in HAVING? will the query fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean if we don't add this check? Yes. As the grouping function is pushed down to Aggregation, the rule ResolveGroupingAnalytics
can't perform the replacement of grouping function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean with this check. Can we resolve grouping function inside Filter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. SQLQuerySuite has test for with grouping function in HAVING.
ping @cloud-fan Any more comments? Thanks. |
// Replacing [[NamedExpression]] causes the error on [[Grouping]] because the | ||
// grouping column will be new attribute created by adding additional [[Alias]]. | ||
// So we can't find the grouping column and replace it in the rule | ||
// [[ResolveGroupingAnalytics]]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand this comment, can you give a concrete example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When there is a Grouping(col#1)
in the HAVING clause. If we push down it to Aggregate and use an Alias instead, the grouping function become Grouping(col#1#2)
. Then in the rule ResolveGroupingAnalytics
we try to find the index of col in group by expressions, we can't find it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but in the next case, we will check: 1) if it's a grouping column. 2) if it's not grouping function. So how can this happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we do a transform, the inner col of Grouping
will match this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about we do some improve for next case:
- add one more check: the expression should not exist in the chid output
- if the expression is a
NamedExpression
already, don't alias it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated this. Please take a look.
Test build #62961 has finished for PR 14296 at commit
|
thanks, merging to master! |
…ping by column should work ## What changes were proposed in this pull request? The query with having condition that contains grouping by column will be failed during analysis. E.g., create table tbl(a int, b string); select count(b) from tbl group by a + 1 having a + 1 = 2; Having condition should be able to use grouping by column. ## How was this patch tested? Jenkins tests. Author: Liang-Chi Hsieh <simonh@tw.ibm.com> Closes #14296 from viirya/having-contains-grouping-column. (cherry picked from commit 9ade77c) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
also backport it to 2.0 |
Thanks for reviewing this. |
What changes were proposed in this pull request?
The query with having condition that contains grouping by column will be failed during analysis. E.g.,
Having condition should be able to use grouping by column.
How was this patch tested?
Jenkins tests.