Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-13732] [SPARK-13797] [SQL] Remove projectList from Window and Eliminate useless Window #11565

Closed
wants to merge 46 commits into from

Conversation

gatorsmile
Copy link
Member

What changes were proposed in this pull request?

projectList is useless. Its value is always the same as the child.output. Remove it from the class Window. Removal can simplify the codes in Analyzer and Optimizer.

This PR is based on the discussion started by @cloud-fan in a separate PR:
#5604 (comment)

This PR also eliminates useless Window.

@cloud-fan @yhuai

How was this patch tested?

Existing test cases cover it.

gatorsmile and others added 30 commits November 13, 2015 14:50
// Prune windowExpressions and child of Window
case p @ Project(_, w: Window) if (w.outputSet -- p.references).nonEmpty =>
val newWindowExprs = w.windowExpressions.filter(p.references.contains)
val newGrandChild = prunedChild(w.child, w.references ++ p.references)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

w.references ++ p.references are not the attributes we need from child right? First, we may filter out some window expressions and w.references is out of date. Second, w.child.output will never contains p.reference, as Window operator only produce window function results.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your points. Yeah, you are right. We should use the newWindowExprs.

For the second part, we still need p.references. After this PR, the output of Window is child.output ++ windowExpression.map(_.toAttribute). Thus, the attributes used in Project have to be kept. Is my understanding right?

How about this?

prunedChild(w.child, p.references ++ AttributeSet(newWindowExprs.flatMap(_.references)))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha, based on your suggestion, I found another rule to eliminate useless Window. : )

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, we also can convert Aggregate to Project if Aggregate does not have any aggregate function.

@SparkQA
Copy link

SparkQA commented Mar 10, 2016

Test build #52815 has finished for PR 11565 at commit 4dd3e66.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile gatorsmile changed the title [SPARK-13732] [SQL] Remove projectList from Window [SPARK-13732] [SPARK-13797] [SPARK-13798] [SQL] Remove projectList from Window and Eliminate useless Aggregate and Window Mar 10, 2016
@@ -384,9 +374,25 @@ object ColumnPruning extends Rule[LogicalPlan] {
// Eliminate no-op Projects
case p @ Project(projectList, child) if sameOutput(child.output, p.output) => child

// Eliminate no-op Window
case w: Window if sameOutput(w.child.output, w.output) => w.child
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

w.windowExpressions.empty looks simplier?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, will do

@gatorsmile gatorsmile changed the title [SPARK-13732] [SPARK-13797] [SPARK-13798] [SQL] Remove projectList from Window and Eliminate useless Aggregate and Window [SPARK-13732] [SPARK-13797] [SQL] Remove projectList from Window and Eliminate useless Aggregate and Window Mar 10, 2016
@SparkQA
Copy link

SparkQA commented Mar 10, 2016

Test build #52818 has finished for PR 11565 at commit 6cf6f44.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 10, 2016

Test build #52820 has finished for PR 11565 at commit 44326f1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

UnspecifiedFrame)).as('window) :: Nil,
'a :: Nil, 'b.asc :: Nil)
.select('a, 'c, 'window).select('a, 'c, 'window, 'window)
.select('a, 'c, 'window).where('window > 1).select('a, 'c).analyze
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's weird to see 3 selects here, we can add CollapseProject rule into the optimizer in this test suite.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. Thanks!

@cloud-fan
Copy link
Contributor

LGTM exception 2 small comments. It will be better if we can add more DSL to build window expression, but I'm ok to do it in follow-ups.

cc @yhuai

@gatorsmile gatorsmile changed the title [SPARK-13732] [SPARK-13797] [SQL] Remove projectList from Window and Eliminate useless Aggregate and Window [SPARK-13732] [SPARK-13797] [SQL] Remove projectList from Window and Eliminate useless Window Mar 10, 2016
@gatorsmile
Copy link
Member Author

Will add more DSL for Window in a separate PR #11635, which is doing predicate push down for Window. Thank you for your detailed review! @cloud-fan

@SparkQA
Copy link

SparkQA commented Mar 10, 2016

Test build #52831 has finished for PR 11565 at commit fc96d84.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// Can't prune the columns on LeafNode
case p @ Project(_, l: LeafNode) => p

// Prune windowExpressions and child of Window
case p @ Project(_, w: Window) if (w.outputSet -- p.references).nonEmpty =>
val newWindowExprs = w.windowExpressions.filter(p.references.contains)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After rethink about it, seems we can still separate it into 2 rules.
We can add a def windowOutputSet = AttributeSet(windowExpressions.map(_.toAttribute)) to Window operator, and change the first case to case p @ Project(_, w: Window) if (w.windowOutputSet -- p.references).nonEmpty => and filter out useless window expressions

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do it.Thanks!

@@ -384,6 +381,9 @@ object ColumnPruning extends Rule[LogicalPlan] {
// Eliminate no-op Projects
case p @ Project(projectList, child) if sameOutput(child.output, p.output) => child

// Eliminate no-op Window
case w: Window if w.windowExpressions.isEmpty => w.child
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this rule near the rule that filter out useless window expressions, which makes people eaiser to understand.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure.

@cloud-fan
Copy link
Contributor

LGTM, thanks for working on it!
will merge after tests pass.

@gatorsmile
Copy link
Member Author

Thank you for your detailed review! Actually, the problem was first found by you. I thought my PR can save your time, but it sounds like you still waste a lot of time on the review. Anyway, really thank you! @cloud-fan

@SparkQA
Copy link

SparkQA commented Mar 11, 2016

Test build #52883 has finished for PR 11565 at commit 6a59b42.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

nvm, that's my job:)
Thanks, merging to master!

@asfgit asfgit closed this in 560489f Mar 11, 2016
@SparkQA
Copy link

SparkQA commented Mar 11, 2016

Test build #52885 has finished for PR 11565 at commit bd35ee7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
…iminate useless Window

#### What changes were proposed in this pull request?

`projectList` is useless. Its value is always the same as the child.output. Remove it from the class `Window`. Removal can simplify the codes in Analyzer and Optimizer.

This PR is based on the discussion started by cloud-fan in a separate PR:
apache#5604 (comment)

This PR also eliminates useless `Window`.

cloud-fan yhuai

#### How was this patch tested?

Existing test cases cover it.

Author: gatorsmile <gatorsmile@gmail.com>
Author: xiaoli <lixiao1983@gmail.com>
Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local>

Closes apache#11565 from gatorsmile/removeProjListWindow.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants