-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-13732] [SPARK-13797] [SQL] Remove projectList from Window and Eliminate useless Window #11565
Conversation
// Prune windowExpressions and child of Window | ||
case p @ Project(_, w: Window) if (w.outputSet -- p.references).nonEmpty => | ||
val newWindowExprs = w.windowExpressions.filter(p.references.contains) | ||
val newGrandChild = prunedChild(w.child, w.references ++ p.references) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
w.references ++ p.references
are not the attributes we need from child right? First, we may filter out some window expressions and w.references
is out of date. Second, w.child.output
will never contains p.reference
, as Window
operator only produce window function results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your points. Yeah, you are right. We should use the newWindowExprs
.
For the second part, we still need p.references
. After this PR, the output
of Window
is child.output ++ windowExpression.map(_.toAttribute)
. Thus, the attributes used in Project
have to be kept. Is my understanding right?
How about this?
prunedChild(w.child, p.references ++ AttributeSet(newWindowExprs.flatMap(_.references)))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haha, based on your suggestion, I found another rule to eliminate useless Window
. : )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then, we also can convert Aggregate
to Project
if Aggregate
does not have any aggregate function.
Test build #52815 has finished for PR 11565 at commit
|
@@ -384,9 +374,25 @@ object ColumnPruning extends Rule[LogicalPlan] { | |||
// Eliminate no-op Projects | |||
case p @ Project(projectList, child) if sameOutput(child.output, p.output) => child | |||
|
|||
// Eliminate no-op Window | |||
case w: Window if sameOutput(w.child.output, w.output) => w.child |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
w.windowExpressions.empty
looks simplier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, will do
Test build #52818 has finished for PR 11565 at commit
|
Test build #52820 has finished for PR 11565 at commit
|
UnspecifiedFrame)).as('window) :: Nil, | ||
'a :: Nil, 'b.asc :: Nil) | ||
.select('a, 'c, 'window).select('a, 'c, 'window, 'window) | ||
.select('a, 'c, 'window).where('window > 1).select('a, 'c).analyze |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's weird to see 3 selects here, we can add CollapseProject
rule into the optimizer in this test suite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do. Thanks!
LGTM exception 2 small comments. It will be better if we can add more DSL to build window expression, but I'm ok to do it in follow-ups. cc @yhuai |
Will add more DSL for |
Test build #52831 has finished for PR 11565 at commit
|
// Can't prune the columns on LeafNode | ||
case p @ Project(_, l: LeafNode) => p | ||
|
||
// Prune windowExpressions and child of Window | ||
case p @ Project(_, w: Window) if (w.outputSet -- p.references).nonEmpty => | ||
val newWindowExprs = w.windowExpressions.filter(p.references.contains) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After rethink about it, seems we can still separate it into 2 rules.
We can add a def windowOutputSet = AttributeSet(windowExpressions.map(_.toAttribute))
to Window
operator, and change the first case to case p @ Project(_, w: Window) if (w.windowOutputSet -- p.references).nonEmpty =>
and filter out useless window expressions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do it.Thanks!
@@ -384,6 +381,9 @@ object ColumnPruning extends Rule[LogicalPlan] { | |||
// Eliminate no-op Projects | |||
case p @ Project(projectList, child) if sameOutput(child.output, p.output) => child | |||
|
|||
// Eliminate no-op Window | |||
case w: Window if w.windowExpressions.isEmpty => w.child |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this rule near the rule that filter out useless window expressions, which makes people eaiser to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure.
LGTM, thanks for working on it! |
Thank you for your detailed review! Actually, the problem was first found by you. I thought my PR can save your time, but it sounds like you still waste a lot of time on the review. Anyway, really thank you! @cloud-fan |
Test build #52883 has finished for PR 11565 at commit
|
nvm, that's my job:) |
Test build #52885 has finished for PR 11565 at commit
|
…iminate useless Window #### What changes were proposed in this pull request? `projectList` is useless. Its value is always the same as the child.output. Remove it from the class `Window`. Removal can simplify the codes in Analyzer and Optimizer. This PR is based on the discussion started by cloud-fan in a separate PR: apache#5604 (comment) This PR also eliminates useless `Window`. cloud-fan yhuai #### How was this patch tested? Existing test cases cover it. Author: gatorsmile <gatorsmile@gmail.com> Author: xiaoli <lixiao1983@gmail.com> Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local> Closes apache#11565 from gatorsmile/removeProjListWindow.
What changes were proposed in this pull request?
projectList
is useless. Its value is always the same as the child.output. Remove it from the classWindow
. Removal can simplify the codes in Analyzer and Optimizer.This PR is based on the discussion started by @cloud-fan in a separate PR:
#5604 (comment)
This PR also eliminates useless
Window
.@cloud-fan @yhuai
How was this patch tested?
Existing test cases cover it.