New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-26390][SQL] ColumnPruning rule should only do column pruning #23343
Conversation
Test build #100274 has finished for PR 23343 at commit
|
@@ -93,7 +93,7 @@ abstract class Optimizer(sessionCatalog: SessionCatalog) | |||
RewriteCorrelatedScalarSubquery, | |||
EliminateSerialization, | |||
RemoveRedundantAliases, | |||
RemoveRedundantProject, | |||
RemoveNoopOperators, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: RemoveUselessOperators
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Noop is fine too. It's no-op, right :-)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it is fine too, I preferred Useless
because they are actually doing something, so they introduce a useless overhead, anyway not a big deal
// Can't prune the columns on LeafNode | ||
case p @ Project(_, _: LeafNode) => p | ||
|
||
// for all other logical plans that inherits the output from it's children | ||
case p @ Project(_, child) => | ||
case p @ Project(_, child) if !child.isInstanceOf[Project] => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in case the child is a project, shall we anyway update it with c.output.filter(allReferences.contains)
? I mean can we instead update the prunedChild
method to check if c
is a Project
and behave accordingly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already handled project over project at L542
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I see, makes sense, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This requires a code comment. I believe the others will ask the same q when we read the problem again.
retest this please |
@@ -34,6 +34,7 @@ class ColumnPruningSuite extends PlanTest { | |||
val batches = Batch("Column pruning", FixedPoint(100), | |||
PushDownPredicate, | |||
ColumnPruning, | |||
RemoveNoopOperators, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this added to remove top Project('b :: Nil ...)
? Without that, this test can be unchanged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
without this, a lot more tests need to be updated...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, I see. :)
ColumnPruning) :: | ||
Batch("Column Pruning", FixedPoint(100), | ||
ColumnPruning, | ||
RemoveNoopOperators) :: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Looks not precise to have RemoveNoopOperators
in Column Pruning
batch, but it is fine as this is just test.
Test build #100284 has finished for PR 23343 at commit
|
In general, we normally do not encourage the code refactoring in optimizer rules. Any change could introduce a perf regression in query planning. For this PR, I think it is safe although it could slightly increase the number of runs in the big optimizer batch. LGTM. Please update the comment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM.
case p @ Project(_, child) if child.sameOutput(p) => child | ||
|
||
// Eliminate no-op Window | ||
case w: Window if w.windowExpressions.isEmpty => w.child | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan . Is this too small to move out as a separate file during this refactoring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's small and it has been here for a while. We can move many rules here to separated files in another PR.
cc @dbtsai |
retest this please |
The final commit is only about adding a comment. I'll merge this to master. Thank you all. |
@dongjoon-hyun thanks for pinging me. I'm working on |
@dbtsai It will make us hard to track the change history. Let us avoid moving the existing rules to new files. |
@gatorsmile for example, we were refactoring I'm wondering when we should or should not do such refactoring. I create a PR #23359 so we can discuss over there. |
|
## What changes were proposed in this pull request? This is a small clean up. By design catalyst rules should be orthogonal: each rule should have its own responsibility. However, the `ColumnPruning` rule does not only do column pruning, but also remove no-op project and window. This PR updates the `RemoveRedundantProject` rule to remove no-op window as well, and clean up the `ColumnPruning` rule to only do column pruning. ## How was this patch tested? existing tests Closes apache#23343 from cloud-fan/column-pruning. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
## What changes were proposed in this pull request? This is a small clean up. By design catalyst rules should be orthogonal: each rule should have its own responsibility. However, the `ColumnPruning` rule does not only do column pruning, but also remove no-op project and window. This PR updates the `RemoveRedundantProject` rule to remove no-op window as well, and clean up the `ColumnPruning` rule to only do column pruning. ## How was this patch tested? existing tests Closes apache#23343 from cloud-fan/column-pruning. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
What changes were proposed in this pull request?
This is a small clean up.
By design catalyst rules should be orthogonal: each rule should have its own responsibility. However, the
ColumnPruning
rule does not only do column pruning, but also remove no-op project and window.This PR updates the
RemoveRedundantProject
rule to remove no-op window as well, and clean up theColumnPruning
rule to only do column pruning.How was this patch tested?
existing tests