[SPARK-11179] [SQL] Push filters through aggregate #9167
…'group by' attribute set
Tests please. Look here for examples.
…'group by' attribute set
Added tests.
We could do a similar thing for window functions.
Rather than checking for complete overlap, can we pull out the expressions that reference only the group-by columns and push those down?
Good point. Incorporated.
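The suggestion above (push down only the conjuncts that touch group-by columns, instead of requiring the whole condition to) can be sketched with a toy expression model. `Expr`, `Attr`, `Lit`, `EqualTo`, `And`, `splitConjuncts`, `canPushWhole` and `partitionForPushdown` are hypothetical, illustrative names, not Catalyst's actual API.

```scala
// Hypothetical toy expression model (NOT Catalyst's classes): just enough to
// split a predicate into conjuncts and ask which attributes each one uses.
sealed trait Expr { def references: Set[String] }
case class Attr(name: String) extends Expr { def references: Set[String] = Set(name) }
case class Lit(value: Int) extends Expr { def references: Set[String] = Set.empty }
case class EqualTo(left: Expr, right: Expr) extends Expr {
  def references: Set[String] = left.references ++ right.references
}
case class And(left: Expr, right: Expr) extends Expr {
  def references: Set[String] = left.references ++ right.references
}

// Flatten nested Ands: And(And(p, q), r) becomes List(p, q, r).
def splitConjuncts(e: Expr): List[Expr] = e match {
  case And(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
  case other     => List(other)
}

// All-or-nothing check: push the whole condition only if every attribute it
// references is a grouping column.
def canPushWhole(cond: Expr, groupingCols: Set[String]): Boolean =
  cond.references.subsetOf(groupingCols)

// Per-conjunct split: conjuncts on grouping columns only can go below the
// aggregate (first list); everything else stays above it (second list).
def partitionForPushdown(cond: Expr, groupingCols: Set[String]): (List[Expr], List[Expr]) =
  splitConjuncts(cond).partition(_.references.subsetOf(groupingCols))

// (c = 2) AND (a = 3) over GROUP BY a, where c is an aggregate output.
val cond = And(EqualTo(Attr("c"), Lit(2)), EqualTo(Attr("a"), Lit(3)))
println(canPushWhole(cond, Set("a")))         // false: c is not a grouping column
println(partitionForPushdown(cond, Set("a"))) // pushes (a = 3), keeps (c = 2)
```

With the all-or-nothing check nothing moves, because the condition as a whole also references `c`; splitting it first still lets `a = 3` be pushed down.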
…'group by' attribute set
ok to test
nit: indentation. I'm not sure we have a strict rule, but no indent is kinda hard to follow. I'd probably try to make it a tree if it fits?
case filter @ Filter(condition,
    aggregate @ Aggregate(groupingExpressions, aggregateExpressions, grandChild)) =>
Or just a 4-space indent?
This looks great, thanks for doing it! Can you clean up the title? If you are feeling ambitious, I'd include query plans before and after the optimization too. These become the commit message when we use our merge tool.
Test build #43995 has finished for PR 9167 at commit
Push conjunctive predicates through Aggregate operators when their references are a subset of the groupingExpressions.
Query plan before optimisation:
Filter ((c#138L = 2) && (a#0 = 3))
 Aggregate [a#0], [a#0,count(b#1) AS c#138L]
  Project [a#0,b#1]
   LocalRelation [a#0,b#1,c#2]
Query plan after optimisation:
Filter (c#138L = 2)
 Aggregate [a#0], [a#0,count(b#1) AS c#138L]
  Filter (a#0 = 3)
   Project [a#0,b#1]
    LocalRelation [a#0,b#1,c#2]
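The before/after plans above can be mimicked with a toy rewrite. This is a minimal sketch, assuming a hypothetical plan model (`Plan`, `Relation`, `Project`, `Aggregate`, `Filter`, `Pred` and `pushFilterThroughAggregate` are illustrative names, not Catalyst's classes) in which the filter condition is already split into conjuncts, each tagged with the attribute names it references, and grouping is on plain columns.

```scala
// Hypothetical toy plan model (NOT Catalyst's classes). Each predicate is a
// conjunct rendered as text plus the set of attribute names it references.
case class Pred(sql: String, refs: Set[String])

sealed trait Plan
case class Relation(output: Seq[String]) extends Plan
case class Project(cols: Seq[String], child: Plan) extends Plan
// Assumption: grouping is on plain columns that keep their names in the output.
case class Aggregate(groupBy: Seq[String], aggs: Seq[String], child: Plan) extends Plan
// `conds` holds the filter condition already split into its conjuncts.
case class Filter(conds: Seq[Pred], child: Plan) extends Plan

// Push the conjuncts that reference only grouping columns below the Aggregate;
// keep the rest (e.g. predicates on aggregate results) above it.
def pushFilterThroughAggregate(plan: Plan): Plan = plan match {
  case Filter(conds, agg @ Aggregate(groupBy, _, child)) =>
    val (pushDown, stayUp) = conds.partition(_.refs.subsetOf(groupBy.toSet))
    if (pushDown.isEmpty) plan // nothing qualifies: leave the plan unchanged
    else {
      val newAgg = agg.copy(child = Filter(pushDown, child))
      if (stayUp.isEmpty) newAgg else Filter(stayUp, newAgg)
    }
  case other => other
}

// The plan from the description: (c = 2) && (a = 3) above a GROUP BY a
// aggregate that computes count(b) AS c.
val scan = Project(Seq("a", "b"), Relation(Seq("a", "b", "c")))
val before = Filter(
  Seq(Pred("c = 2", Set("c")), Pred("a = 3", Set("a"))),
  Aggregate(Seq("a"), Seq("a", "count(b) AS c"), scan))
println(pushFilterThroughAggregate(before))
```

Partitioning per conjunct is what lets `a = 3` move below the Aggregate while `c = 2`, which depends on the aggregated count, stays above it. If every conjunct qualifies, no Filter is left above the Aggregate.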
Thanks for reviewing it, Michael. I addressed one code review comment, fixed a couple of scalastyle issues, and cleaned up the title and description in the latest commit and in this PR.
Test build #44002 has finished for PR 9167 at commit
@marmbrus The test failed again because of a whitespace issue. I have already removed the whitespace from that line, and I don't get the scalastyle error when compiling locally (I don't see any whitespace in the code review either). Am I missing something?
[error] /home/jenkins/workspace/SparkPullRequestBuilder/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala:664:0: Whitespace at end of line
try
You can run the style checks locally with dev/lint-scala.
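As an extra local check, independent of the build tooling, a tiny scan for lines ending in spaces or tabs can help, since editors often hide such characters. This is a minimal sketch: `trailingWhitespaceLines` and `checkFile` are hypothetical helpers, not part of scalastyle or dev/lint-scala, and they only approximate the "Whitespace at end of line" rule.

```scala
import scala.io.Source

// Hypothetical helper, not part of scalastyle or Spark's dev scripts: returns
// the 1-based numbers of lines that end in a space or a tab.
def trailingWhitespaceLines(text: String): Seq[Int] =
  text.split("\n", -1).toSeq.zipWithIndex.collect {
    case (line, idx) if line.endsWith(" ") || line.endsWith("\t") => idx + 1
  }

// Reads a file from disk and reports the offending line numbers.
def checkFile(path: String): Seq[Int] = {
  val src = Source.fromFile(path)
  try trailingWhitespaceLines(src.mkString) finally src.close()
}
```

For example, calling `checkFile` on FilterPushdownSuite.scala would list any lines in that file that still carry trailing spaces or tabs.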
Somehow, I wasn't getting the scalastyle error on my branch with the above two commands. I cloned Apache Spark master, cherry-picked my changes, and then I got the error. I've checked in the whitespace removal. Please test this.
Test build #44041 has finished for PR 9167 at commit
Thanks, merging to master!