
Conversation

@nitin2goyal

Push conjunctive predicates through Aggregate operators when their references are a subset of the groupingExpressions.

Query plan before optimisation:
Filter ((c#138L = 2) && (a#0 = 3))
 Aggregate [a#0], [a#0,count(b#1) AS c#138L]
  Project [a#0,b#1]
   LocalRelation [a#0,b#1,c#2]

Query plan after optimisation:
Filter (c#138L = 2)
 Aggregate [a#0], [a#0,count(b#1) AS c#138L]
  Filter (a#0 = 3)
   Project [a#0,b#1]
    LocalRelation [a#0,b#1,c#2]
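For illustration, a minimal query of the kind that produces a plan like the one above. The data and the SparkSession-based setup are assumed for this sketch, not taken from the PR (the PR itself predates SparkSession):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.count

val spark = SparkSession.builder().master("local[*]").appName("pushdown-demo").getOrCreate()
import spark.implicits._

// A small local relation with columns a, b, c, matching the plan above.
val df = Seq((1, 10, 100), (3, 20, 200), (3, 30, 300)).toDF("a", "b", "c")

// Filter on both the aggregate output (c) and the grouping column (a).
val query = df
  .groupBy("a")
  .agg(count("b").as("c"))
  .filter("c = 2 AND a = 3")

// explain(true) prints the optimized plan; with this rule, the a = 3 predicate appears below the Aggregate.
query.explain(true)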

@marmbrus
Contributor

Tests please. Look here for examples.

@nitin2goyal
Author

Added tests

@hvanhovell
Contributor

We could do a similar thing for window functions.

Contributor

Rather than checking for complete overlap, can we pull out the expressions for the group-by columns and push those down?

Author

Good point. Incorporated.
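For illustration, a small self-contained Scala sketch of the incorporated idea, using simplified stand-in types rather than the actual Catalyst classes: split the filter condition into its conjuncts, push the conjuncts that reference only grouping columns below the Aggregate, and keep the rest above it.

object PushdownSketch extends App {
  // Simplified stand-ins for logical plan nodes; each conjunct is modelled
  // by the set of columns it references.
  sealed trait Plan
  case class Relation(output: Set[String]) extends Plan
  case class Filter(conjuncts: Seq[Set[String]], child: Plan) extends Plan
  case class Aggregate(groupingCols: Set[String], child: Plan) extends Plan

  def pushFilterThroughAggregate(plan: Plan): Plan = plan match {
    case Filter(conjuncts, Aggregate(groupingCols, grandChild)) =>
      // A conjunct may be pushed only if every column it references is a grouping column.
      val (pushDown, stayUp) = conjuncts.partition(_.subsetOf(groupingCols))
      val newChild = if (pushDown.nonEmpty) Filter(pushDown, grandChild) else grandChild
      val newAggregate = Aggregate(groupingCols, newChild)
      if (stayUp.nonEmpty) Filter(stayUp, newAggregate) else newAggregate
    case other => other
  }

  // Mirrors the plans in the description: (c = 2) stays above the Aggregate,
  // while (a = 3) is pushed below it.
  val before = Filter(Seq(Set("c"), Set("a")), Aggregate(Set("a"), Relation(Set("a", "b"))))
  println(pushFilterThroughAggregate(before))
}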

@marmbrus
Contributor

ok to test

Contributor

nit: indentation. I'm not sure we have a strict rule, but no indent is kinda hard to follow. I'd probably try to make it a tree if it fits?

case filter @ Filter(condition,
     aggregate @ Aggregate(groupingExpressions, aggregateExpressions, grandChild)) =>

or just 4 space indent?

@marmbrus
Contributor

This looks great, thanks for doing it!

Can you clean up the title: [SPARK-11179] [SQL] Push filters through aggregate
and the description: Push conjunctive predicates through Aggregate operators when their references are a subset of the groupingExpressions.

If you are feeling ambitious I'd include query plans before and after the optimization too.

These become the commit message when we use our merge tool.

@SparkQA

SparkQA commented Oct 20, 2015

Test build #43995 has finished for PR 9167 at commit 671fbb3.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

nitin2goyal changed the title from "SPARK-11179: Push filters through aggregate if filters are subset of …" to "[SPARK-11179] [SQL] Push filters through aggregate" on Oct 20, 2015
@nitin2goyal
Author

Thanks for reviewing it, Michael. Addressed one code review comment, fixed a couple of scalastyle issues, and cleaned up the title and description in the latest commit and this PR.

@SparkQA

SparkQA commented Oct 20, 2015

Test build #44002 has finished for PR 9167 at commit f422aa8.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@nitin2goyal
Author

@marmbrus The test failed again due to a whitespace issue. I have already removed the whitespace from that line, and I am not getting the scalastyle issue when compiling locally (I don't see any trailing whitespace in the code review either). Am I missing something?

[error] /home/jenkins/workspace/SparkPullRequestBuilder/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala:664:0: Whitespace at end of line

@marmbrus
Contributor

try sbt test:scalastyle

@rxin
Contributor

rxin commented Oct 20, 2015

You can run the style tests locally with

dev/lint-scala

@nitin2goyal
Author

Somehow, I wasn't getting the scalastyle issue on my branch with the above two commands. I cloned the Apache Spark master, cherry-picked my changes, and then I got the error. Checked in the whitespace removal. Please test this.

@SparkQA

SparkQA commented Oct 21, 2015

Test build #44041 has finished for PR 9167 at commit 82fc386.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Contributor

Thanks, merging to master!

asfgit closed this in f62e326 on Oct 21, 2015