[SPARK-24172][SQL]: Push projection and filters once when converting to physical plan. #21262
Closed
This was referenced May 8, 2018
Test build #90346 has finished for PR 21262 at commit
jzhuge pushed a commit to jzhuge/spark that referenced this pull request on Mar 7, 2019

…onversion

This removes the v2 optimizer rule for push-down and instead pushes filters and required columns when converting to a physical plan, as suggested by marmbrus. This makes the v2 relation cleaner because the output and filters do not change in the logical plan.

A side-effect of this change is that the stats from the logical (optimized) plan no longer reflect pushed filters and projection. This is a temporary state, until the planner gathers stats from the physical plan instead.

An alternative to this approach is rdblue@9d3a11e.

The first commit was proposed in apache#21262. This PR replaces apache#21262.

Existing tests.

Author: Ryan Blue <blue@apache.org>

Closes apache#21503 from rdblue/SPARK-24478-move-push-down-to-physical-conversion.

(cherry picked from commit 22daeba)

Conflicts:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownOperatorsToDataSource.scala
rdblue added a commit to rdblue/spark that referenced this pull request on Apr 3, 2019

…onversion

This removes the v2 optimizer rule for push-down and instead pushes filters and required columns when converting to a physical plan, as suggested by marmbrus. This makes the v2 relation cleaner because the output and filters do not change in the logical plan.

A side-effect of this change is that the stats from the logical (optimized) plan no longer reflect pushed filters and projection. This is a temporary state, until the planner gathers stats from the physical plan instead.

An alternative to this approach is 9d3a11e.

The first commit was proposed in apache#21262. This PR replaces apache#21262.

Existing tests.

Author: Ryan Blue <blue@apache.org>

Closes apache#21503 from rdblue/SPARK-24478-move-push-down-to-physical-conversion.
jzhuge pushed a commit to jzhuge/spark that referenced this pull request on Oct 15, 2019
otterc pushed a commit to linkedin/spark that referenced this pull request on Mar 22, 2023

…onversion

This removes the v2 optimizer rule for push-down and instead pushes filters and required columns when converting to a physical plan, as suggested by marmbrus. This makes the v2 relation cleaner because the output and filters do not change in the logical plan.

A side-effect of this change is that the stats from the logical (optimized) plan no longer reflect pushed filters and projection. This is a temporary state, until the planner gathers stats from the physical plan instead.

An alternative to this approach is rdblue@9d3a11e.

The first commit was proposed in apache#21262. This PR replaces apache#21262.

Existing tests.

Author: Ryan Blue <blue@apache.org>

Closes apache#21503 from rdblue/SPARK-24478-move-push-down-to-physical-conversion.

Ref: LIHADOOP-48531
RB=1850239 G=superfriends-reviewers R=zolin,yezhou,latang,fli,mshen A=
What changes were proposed in this pull request?

This removes PushDownOperatorsToDataSource and moves projection and filter push-down to DataSourceV2Strategy. This accomplishes the same goal as #21230 and only runs the push-down once by not using transformUp to traverse the plan.

Unlike #21230, this moves pushdown to the v2 strategy to match the way pushdown happens for other code paths: when creating a physical plan from a logical plan. This was suggested by @marmbrus in #20387, but not implemented at the time. The same concern from that PR still applies to this commit: pushdown is not applied until conversion to a physical plan, so computeStats can't report stats after filtering or projecting.

A benefit of this approach is that the DataSourceV2Relation is simpler and the relation's output is constant.

How was this patch tested?

This uses existing tests.
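
For readers unfamiliar with the idea, the description's approach (push projection and filters once, at logical-to-physical conversion, while leaving the logical relation untouched) can be sketched with a toy plan model. This is only an illustration under stated assumptions: none of the types below are Spark classes, and the names Pred, Relation, ScanExec, sourceAccepts and toPhysical are hypothetical. The sketch also assumes a source that accepts only equality predicates, and handles at most one Project above a stack of Filters.

```scala
import scala.annotation.tailrec

// Toy model only: hypothetical stand-ins, not Spark's plan classes.
final case class Pred(column: String, op: String, value: Int)

sealed trait LogicalPlan
final case class Relation(output: Seq[String]) extends LogicalPlan
final case class Filter(condition: Pred, child: LogicalPlan) extends LogicalPlan
final case class Project(columns: Seq[String], child: LogicalPlan) extends LogicalPlan

sealed trait PhysicalPlan
final case class ScanExec(columns: Seq[String], pushed: Seq[Pred]) extends PhysicalPlan
final case class FilterExec(conditions: Seq[Pred], child: PhysicalPlan) extends PhysicalPlan
final case class ProjectExec(columns: Seq[String], child: PhysicalPlan) extends PhysicalPlan

object PushDownAtPlanning {
  // Assumption for this sketch: the source can evaluate only equality predicates.
  def sourceAccepts(p: Pred): Boolean = p.op == "="

  // Gather the filters stacked directly on top of a relation, outermost first.
  @tailrec
  private def collectFilters(plan: LogicalPlan, acc: Vector[Pred]): (Vector[Pred], Relation) =
    plan match {
      case Filter(cond, child) => collectFilters(child, acc :+ cond)
      case r: Relation         => (acc, r)
      case _: Project          =>
        throw new IllegalArgumentException("nested Project is outside this sketch")
    }

  // Single pass at conversion time; the logical plan itself is never rewritten.
  def toPhysical(plan: LogicalPlan): PhysicalPlan = {
    val (projectCols, filters, relation) = plan match {
      case Project(cols, child) =>
        val (fs, r) = collectFilters(child, Vector.empty)
        (cols, fs, r)
      case other =>
        val (fs, r) = collectFilters(other, Vector.empty)
        (r.output, fs, r)
    }
    // Split filters into those the source takes and those evaluated after the scan.
    val (pushed, postScan) = filters.partition(sourceAccepts)
    // The scan must still produce columns needed by the remaining filters.
    val needed = (projectCols ++ postScan.map(_.column)).distinct
    val scanCols = relation.output.filter(needed.contains)
    val scan: PhysicalPlan = ScanExec(scanCols, pushed)
    val filtered = if (postScan.isEmpty) scan else FilterExec(postScan, scan)
    if (scanCols == projectCols) filtered else ProjectExec(projectCols, filtered)
  }

  def main(args: Array[String]): Unit = {
    val relation = Relation(Seq("id", "name", "age"))
    val logical =
      Project(Seq("name"), Filter(Pred("id", "=", 1), Filter(Pred("age", ">", 30), relation)))
    println(toPhysical(logical))
    // The logical relation still lists every column after planning.
    println(relation.output)
  }
}
```

In this toy model the logical Relation keeps its full output after planning, which mirrors both points in the description: the relation's output is constant, and anything computed from the logical plan alone (such as statistics) cannot see the pruned columns or pushed filters.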