Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-7824][SQL]Collapsing operator reordering and constant folding into a single batch to push down the single side. #6351

Closed
wants to merge 27 commits into from

Conversation

pzzs
Copy link
Contributor

@pzzs pzzs commented May 22, 2015

SQL

select * from tableA join tableB on (a > 3 and b = d) or (a > 3 and b = e)

Plan before modify

== Optimized Logical Plan ==
Project [a#293,b#294,c#295,d#296,e#297]
 Join Inner, Some(((a#293 > 3) && ((b#294 = d#296) || (b#294 = e#297))))
  MetastoreRelation default, tablea, None
  MetastoreRelation default, tableb, None

Plan after modify

== Optimized Logical Plan ==
Project [a#293,b#294,c#295,d#296,e#297]
 Join Inner, Some(((b#294 = d#296) || (b#294 = e#297)))
  Filter (a#293 > 3)
   MetastoreRelation default, tablea, None
  MetastoreRelation default, tableb, None

CombineLimits ==> Limit(If(LessThan(ne, le), ne, le), grandChild) and LessThan is in BooleanSimplification , so CombineLimits must before BooleanSimplification and BooleanSimplification must before PushPredicateThroughJoin.

@pzzs
Copy link
Contributor Author

pzzs commented May 23, 2015

This optimizer can void CartesianProduct
Tables

tableA        tableB        tableC
a   int       c    int      f    int    
b   int       d    int      g    int
              e    int

SQL

select * from tableA, tableB, tableC where (d = f and  a = c and b = d) or (d = f and a = c and b = e)

Plan before modify

== Optimized Logical Plan ==
Project [a#306,b#307,c#308,d#309,e#310,f#311,g#312]
 Join Inner, Some((((d#309 = f#311) && (a#306 = c#308)) && ((b#307 = d#309) || (b#307 = e#310))))
  Join Inner, None
   MetastoreRelation default, tablea, None
   MetastoreRelation default, tableb, None
  MetastoreRelation default, tablec, None

== Physical Plan ==
Project [a#306,b#307,c#308,d#309,e#310,f#311,g#312]
 Filter ((a#306 = c#308) && ((b#307 = d#309) || (b#307 = e#310)))
  ShuffledHashJoin [d#309], [f#311], BuildRight
   Exchange (HashPartitioning 5), []
    CartesianProduct
     HiveTableScan [a#306,b#307], (MetastoreRelation default, tablea, None), None
     HiveTableScan [c#308,d#309,e#310], (MetastoreRelation default, tableb, None), None
   Exchange (HashPartitioning 5), []
    HiveTableScan [f#311,g#312], (MetastoreRelation default, tablec, None), None

Plan after modify

== Optimized Logical Plan ==
Project [a#306,b#307,c#308,d#309,e#310,f#311,g#312]
 Join Inner, Some((d#309 = f#311))
  Join Inner, Some(((a#306 = c#308) && ((b#307 = d#309) || (b#307 = e#310))))
   MetastoreRelation default, tablea, None
   MetastoreRelation default, tableb, None
  MetastoreRelation default, tablec, None

== Physical Plan ==
Project [a#306,b#307,c#308,d#309,e#310,f#311,g#312]
 ShuffledHashJoin [d#309], [f#311], BuildRight
  Exchange (HashPartitioning 5), []
   Filter ((b#307 = d#309) || (b#307 = e#310))
    ShuffledHashJoin [a#306], [c#308], BuildRight
     Exchange (HashPartitioning 5), []
      HiveTableScan [a#306,b#307], (MetastoreRelation default, tablea, None), None
     Exchange (HashPartitioning 5), []
      HiveTableScan [c#308,d#309,e#310], (MetastoreRelation default, tableb, None), None
  Exchange (HashPartitioning 5), []
   HiveTableScan [f#311,g#312], (MetastoreRelation default, tablec, None), None

@pzzs
Copy link
Contributor Author

pzzs commented May 26, 2015

@marmbrus @scwf

@scwf
Copy link
Contributor

scwf commented May 26, 2015

Get it. The root cause here is we moved Filter Pushdown before ConstantFolding in 4459514, then we can not simplify boolean condition before filter push down through join, which leads to CartesianProduct.

i think this change is reasonable.

@marmbrus
Copy link
Contributor

Would you get the same result (and possibly more) by collapsing operator reordering and constant folding into a single batch? This seems like a very one-off fix to me.

@pzzs
Copy link
Contributor Author

pzzs commented May 27, 2015

yes, it can work. but this batch will have two different types of optimizers. @marmbrus

@marmbrus
Copy link
Contributor

That is not fundamentally a problem. Honestly some more thought probably needs to be put into the batches. Really the only reasons for splitting are the following:

  • Large batches are inherently more costly as you must go through every rule, even if only a small number are making changes. So if rules will never interact they can be in separate batches.
  • However, large batches are more powerful as there is more opportunity for rules to interact
  • Its possible for rules to undo the result of other rules. In this case they must be in separate batches or it will go back and forth till the limit is reached
  • Another reason for batches is satisfying preconditions. (i.e. a plan must be analyzed before optimizing it).

@pzzs pzzs changed the title [SPARK-7824][SQL] Extracting and/or condition optimizer from BooleanSimplification optimizer and put it before PushPredicateThroughJoin optimizer to push down the single side. [SPARK-7824][SQL]Collapsing operator reordering and constant folding into a single batch to push down the single side. May 27, 2015
@pzzs
Copy link
Contributor Author

pzzs commented May 27, 2015

@marmbrus /cc

@marmbrus
Copy link
Contributor

ok to test

@SparkQA
Copy link

SparkQA commented May 31, 2015

Test build #33867 has finished for PR 6351 at commit a04ffae.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -36,21 +36,20 @@ object DefaultOptimizer extends Optimizer {
// SubQueries are only needed for analysis and can be removed before execution.
Batch("Remove SubQueries", FixedPoint(100),
EliminateSubQueries) ::
Batch("Operator Reordering", FixedPoint(100),
Batch("Operator Optimizations", FixedPoint(100),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about "Operator Reordering and ConstantFolding"

@SparkQA
Copy link

SparkQA commented Jun 1, 2015

Test build #33869 has finished for PR 6351 at commit ae3af6d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 1, 2015

Test build #33875 has finished for PR 6351 at commit f8b9314.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@pzzs
Copy link
Contributor Author

pzzs commented Jun 4, 2015

@marmbrus

@pzzs pzzs closed this Jun 10, 2015
@pzzs pzzs reopened this Jun 10, 2015
@pzzs
Copy link
Contributor Author

pzzs commented Jun 11, 2015

@marmbrus @yhuai @scwf /cc

@marmbrus
Copy link
Contributor

ok to test

@SparkQA
Copy link

SparkQA commented Jun 11, 2015

Test build #34717 has finished for PR 6351 at commit 20de7be.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Copy link
Contributor

Thanks! Merged to master.

@asfgit asfgit closed this in 7914c72 Jun 12, 2015
NullPropagation,
OptimizeIn,
ConstantFolding,
LikeSimplification,
BooleanSimplification,
PushPredicateThroughJoin,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need to change the rule order inside a batch with fixed point. Rules within a single batch shouldn't be sensitive to execution order.

nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
…into a single batch.

SQL
```
select * from tableA join tableB on (a > 3 and b = d) or (a > 3 and b = e)
```
Plan before modify
```
== Optimized Logical Plan ==
Project [a#293,b#294,c#295,d#296,e#297]
 Join Inner, Some(((a#293 > 3) && ((b#294 = d#296) || (b#294 = e#297))))
  MetastoreRelation default, tablea, None
  MetastoreRelation default, tableb, None
```
Plan after modify
```
== Optimized Logical Plan ==
Project [a#293,b#294,c#295,d#296,e#297]
 Join Inner, Some(((b#294 = d#296) || (b#294 = e#297)))
  Filter (a#293 > 3)
   MetastoreRelation default, tablea, None
  MetastoreRelation default, tableb, None
```

CombineLimits ==> Limit(If(LessThan(ne, le), ne, le), grandChild) and LessThan is in BooleanSimplification ,  so CombineLimits  must before BooleanSimplification and BooleanSimplification must before PushPredicateThroughJoin.

Author: Zhongshuai Pei <799203320@qq.com>
Author: DoingDone9 <799203320@qq.com>

Closes apache#6351 from DoingDone9/master and squashes the following commits:

20de7be [Zhongshuai Pei] Update Optimizer.scala
7bc7d28 [Zhongshuai Pei] Merge pull request apache#17 from apache/master
0ba5f42 [Zhongshuai Pei] Update Optimizer.scala
f8b9314 [Zhongshuai Pei] Update FilterPushdownSuite.scala
c529d9f [Zhongshuai Pei] Update FilterPushdownSuite.scala
ae3af6d [Zhongshuai Pei] Update FilterPushdownSuite.scala
a04ffae [Zhongshuai Pei] Update Optimizer.scala
11beb61 [Zhongshuai Pei] Update FilterPushdownSuite.scala
f2ee5fe [Zhongshuai Pei] Update Optimizer.scala
be6b1d5 [Zhongshuai Pei] Update Optimizer.scala
b01e622 [Zhongshuai Pei] Merge pull request apache#15 from apache/master
8df716a [Zhongshuai Pei] Update FilterPushdownSuite.scala
d98bc35 [Zhongshuai Pei] Update FilterPushdownSuite.scala
fa65718 [Zhongshuai Pei] Update Optimizer.scala
ab8e9a6 [Zhongshuai Pei] Merge pull request apache#14 from apache/master
14952e2 [Zhongshuai Pei] Merge pull request apache#13 from apache/master
f03fe7f [Zhongshuai Pei] Merge pull request apache#12 from apache/master
f12fa50 [Zhongshuai Pei] Merge pull request apache#10 from apache/master
f61210c [Zhongshuai Pei] Merge pull request apache#9 from apache/master
34b1a9a [Zhongshuai Pei] Merge pull request apache#8 from apache/master
802261c [DoingDone9] Merge pull request apache#7 from apache/master
d00303b [DoingDone9] Merge pull request apache#6 from apache/master
98b134f [DoingDone9] Merge pull request apache#5 from apache/master
161cae3 [DoingDone9] Merge pull request apache#4 from apache/master
c87e8b6 [DoingDone9] Merge pull request apache#3 from apache/master
cb1852d [DoingDone9] Merge pull request apache#2 from apache/master
c3f046f [DoingDone9] Merge pull request apache#1 from apache/master
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants