[WIP][SPARK-22497][SQL] Project reuse by wangyum · Pull Request #19727 · apache/spark

wangyum · 2017-11-12T16:23:10Z

What changes were proposed in this pull request?

The SQL below scans table1 twice. This PR reuses p1 so that table1 is scanned only once.

WITH p1 AS (SELECT * FROM table1 WHERE key < 100),
s1 AS (SELECT key, count(*) FROM p1 GROUP BY key),
s2 AS (SELECT key, count(*) FROM p1 WHERE key > -100 GROUP BY key)
SELECT s1.* FROM s1 JOIN s2 ON s1.key = s2.key
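Each reference to p1 is expanded into its own copy of the subquery, so table1 is read once per reference. The toy model below counts scans for the inlined form and for a form that evaluates p1 once and shares its rows. It uses plain Scala collections rather than Spark APIs, and `CteScanModel`, `scanTable1`, and the sample rows are made up for illustration.

```scala
// Toy model with plain Scala collections (not Spark APIs); the table rows are made up.
object CteScanModel {
  // Number of times the base table has been read.
  var table1Scans = 0

  private val table1: Seq[Int] = Seq(-200, -50, 10, 10, 150) // the `key` column only

  private def scanTable1(): Seq[Int] = { table1Scans += 1; table1 }

  // p1 AS (SELECT * FROM table1 WHERE key < 100)
  private def p1(): Seq[Int] = scanTable1().filter(_ < 100)

  // SELECT key, count(*) ... GROUP BY key
  private def countByKey(rows: Seq[Int]): Map[Int, Int] =
    rows.groupBy(k => k).map { case (k, vs) => k -> vs.size }

  // s1.* JOIN s2 ON s1.key = s2.key (each side has one row per key)
  private def joinOnKey(s1: Map[Int, Int], s2: Map[Int, Int]): Map[Int, Int] =
    s1.filter { case (k, _) => s2.contains(k) }

  /** CTE inlined: every reference to p1 re-evaluates it, so table1 is scanned twice. */
  def inlined(): Map[Int, Int] =
    joinOnKey(countByKey(p1()), countByKey(p1().filter(_ > -100)))

  /** CTE reused: p1 is evaluated once and its rows are shared by s1 and s2. */
  def reused(): Map[Int, Int] = {
    val p1Rows = p1()
    joinOnKey(countByKey(p1Rows), countByKey(p1Rows.filter(_ > -100)))
  }
}
```

Both functions return `Map(-50 -> 1, 10 -> 2)`. However, `inlined()` increments `table1Scans` by 2, while `reused()` increments it by 1.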

How was this patch tested?

unit tests

SparkQA · 2017-11-12T18:02:38Z

Test build #83744 has finished for PR 19727 at commit 1c458b8.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class ReusedProjectExec(override val output: Seq[Attribute], child: ProjectExec)
case class ReuseProject(conf: SQLConf) extends Rule[SparkPlan]
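The two new classes suggest a rule that replaces a repeated ProjectExec subtree with a reference to its first occurrence. The sketch below shows that kind of rewrite on a hypothetical mini plan tree. `Plan`, `Project`, `ReusedProject`, and `reuseProjects` are illustrative names, not the PR's code. Subtrees are matched by case-class equality rather than by Spark's canonicalized plan comparison.

```scala
// Hypothetical mini plan tree; these are illustrative names, not Spark classes.
object ReuseProjectSketch {
  sealed trait Plan
  final case class Scan(table: String) extends Plan
  final case class Filter(cond: String, child: Plan) extends Plan
  final case class Project(cols: Seq[String], child: Plan) extends Plan
  final case class Join(left: Plan, right: Plan) extends Plan
  // Placeholder that points at the first equal Project instead of repeating it.
  final case class ReusedProject(original: Project) extends Plan

  /** Top-down rewrite: a Project equal to one already seen becomes a ReusedProject. */
  def reuseProjects(plan: Plan): Plan = {
    // Keyed by the original (pre-rewrite) subtree, compared by case-class equality.
    val seen = scala.collection.mutable.Map.empty[Project, Project]
    def go(p: Plan): Plan = p match {
      case proj: Project =>
        seen.get(proj) match {
          case Some(first) => ReusedProject(first)
          case None =>
            val rewritten = proj.copy(child = go(proj.child))
            seen(proj) = rewritten
            rewritten
        }
      case Filter(cond, child) => Filter(cond, go(child))
      case Join(left, right)   => Join(go(left), go(right))
      case leaf                => leaf // Scan and ReusedProject are left unchanged
    }
    go(plan)
  }
}
```

Consider a plan shaped like the query above, where both join branches contain p1's `Project`. Applied to that plan, the rule leaves the first occurrence in place and turns the second into `ReusedProject(p1)`.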

viirya · 2017-11-13T02:57:16Z

Simply reusing ProjectExec doesn't actually reduce the scan. Duplicate execution of CTEs is a well-known issue. I've addressed it before, but there still seems to be no solution that handles all possible cases.
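One way to read this point: a reused node only saves work if its output is materialized somewhere. A plain projection recomputes its rows from its child each time it is executed, so two consumers of the same node still trigger two scans. In Spark, exchange outputs (shuffle files or broadcast data) are the usual point where results are materialized and can be reused. The toy model below contrasts a node that rescans on every call with a wrapper that computes its child once. The classes are hypothetical, not Spark's.

```scala
// Hypothetical toy operators (not Spark classes) showing that sharing a node
// object is not the same as sharing its computed rows.
object NodeReuseSketch {
  // Number of times any ScanProject has read its input.
  var scans = 0

  trait Node { def execute(): Seq[Int] }

  // Reads and filters its input again on every execute() call, like a pipelined scan + project.
  final class ScanProject(data: Seq[Int], pred: Int => Boolean) extends Node {
    def execute(): Seq[Int] = { scans += 1; data.filter(pred) }
  }

  // Computes its child once and then serves the cached rows, standing in for a
  // materialization point such as a shuffle or broadcast.
  final class Materialized(child: Node) extends Node {
    private lazy val rows: Seq[Int] = child.execute()
    def execute(): Seq[Int] = rows
  }

  // Two consumers (like s1 and s2) executing the same node instance.
  def consumeTwice(shared: Node): Int = shared.execute().size + shared.execute().size
}
```

Sharing a `ScanProject` instance between two consumers still costs two scans. Wrapping the same node in `Materialized` brings it down to one.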

gatorsmile · 2017-11-13T21:37:52Z

CTE reuse can cause performance regressions. It is hard to address without taking costs into account.
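The trade-off can be sketched with a made-up cost model. Reusing a CTE means paying for one scan plus writing and re-reading the intermediate result. Inlining pays for one scan per reference, but lets each copy push its own predicates (such as `key > -100` in s2) down into the scan. The cost units and function names below are assumptions for illustration only, not Spark statistics or APIs.

```scala
// Hypothetical cost model: the cost units and function names are illustrative only.
object CteReuseCost {
  /** Inline: each reference scans on its own, possibly with its own pushed-down filter. */
  def inlineCost(scanCostPerRef: Seq[Double]): Double = scanCostPerRef.sum

  /** Reuse: one scan, one write of the intermediate result, one read of it per reference. */
  def reuseCost(scanCost: Double, writeCost: Double, readCost: Double, refs: Int): Double =
    scanCost + writeCost + refs * readCost

  /** Reuse only when it is estimated to be strictly cheaper than inlining. */
  def shouldReuse(scanCostPerRef: Seq[Double], scanCost: Double,
                  writeCost: Double, readCost: Double): Boolean =
    reuseCost(scanCost, writeCost, readCost, scanCostPerRef.size) < inlineCost(scanCostPerRef)
}
```

For example, `shouldReuse(Seq(100.0, 100.0), 100.0, 10.0, 5.0)` is true (120 < 200 units). By contrast, `shouldReuse(Seq(100.0, 5.0), 100.0, 30.0, 10.0)` is false (150 > 105), because a selective pushed-down filter already makes one inlined scan cheap.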

Reuse project · 1c458b8

wangyum closed this Nov 15, 2017

wangyum wants to merge 1 commit into apache:master from wangyum:SPARK-22497
