[SPARK-24172][SQL]: Push projection and filters once when converting to physical plan. #21262

Closed

rdblue (Contributor) commented May 8, 2018

What changes were proposed in this pull request?

This removes PushDownOperatorsToDataSource and moves projection and filter push-down to DataSourceV2Strategy. This accomplishes the same goal as #21230, and it runs the push-down only once because it does not use transformUp to traverse the plan.

Unlike #21230, this moves pushdown to the v2 strategy to match the way pushdown happens for other code paths: when creating a physical plan from a logical plan. This was suggested by @marmbrus in #20387, but not implemented at the time. The same concern from that PR still applies to this commit: pushdown is not applied until conversion to a physical plan, so computeStats can't report stats after filtering or projecting.

A benefit of this approach is that the DataSourceV2Relation is simpler and the relation's output is constant.
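
To illustrate the idea, here is a minimal sketch of pushing projection and filters once during conversion to a physical plan. It is a toy model, not Spark's code: `LogicalPlan`, `Relation`, `Filter`, `Project`, `Scan`, and `PushDownOnce` are hypothetical names invented for this example, and filters are plain strings rather than Catalyst expressions.

```scala
// Toy plan nodes (hypothetical; these are not Spark's classes).
sealed trait LogicalPlan
final case class Relation(columns: Seq[String]) extends LogicalPlan
final case class Filter(condition: String, child: LogicalPlan) extends LogicalPlan
final case class Project(columns: Seq[String], child: LogicalPlan) extends LogicalPlan

// Toy physical scan that records what was pushed to the source.
final case class Scan(requiredColumns: Seq[String], pushedFilters: Seq[String])

object PushDownOnce {
  // Walk the operators above the relation in a single pass and collect
  // the outermost projection (if any) and every filter condition.
  // The logical Relation is returned unchanged, so its output stays constant.
  def collect(plan: LogicalPlan): (Option[Seq[String]], Seq[String], Relation) =
    plan match {
      case r: Relation =>
        (None, Nil, r)
      case Filter(cond, child) =>
        val (proj, filters, rel) = collect(child)
        (proj, filters :+ cond, rel)
      case Project(cols, child) =>
        val (_, filters, rel) = collect(child)
        (Some(cols), filters, rel)
    }

  // Push-down happens here, exactly once, when the physical scan is built.
  def plan(logical: LogicalPlan): Scan = {
    val (proj, filters, rel) = collect(logical)
    Scan(proj.getOrElse(rel.columns), filters)
  }
}
```

For example, calling `PushDownOnce.plan` on `Project(Seq("id"), Filter("ts > 10", Relation(Seq("id", "ts", "data"))))` builds a scan that requires only `id` and carries the `ts > 10` filter, while the `Relation` in the logical plan still lists all three columns. That unchanged relation is also why, in this toy model as in the concern above, anything derived from the logical plan cannot see the effect of the pushed filter or projection.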

How was this patch tested?

This uses existing tests.

SparkQA commented May 8, 2018

Test build #90346 has finished for PR 21262 at commit 7497cc2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

rdblue closed this Jun 26, 2018
rdblue deleted the SPARK-24172-v2-pushdown-in-strategy branch June 26, 2018 18:06
jzhuge pushed a commit to jzhuge/spark that referenced this pull request Mar 7, 2019
…onversion

This removes the v2 optimizer rule for push-down and instead pushes filters and required columns when converting to a physical plan, as suggested by marmbrus. This makes the v2 relation cleaner because the output and filters do not change in the logical plan.

A side-effect of this change is that the stats from the logical (optimized) plan no longer reflect pushed filters and projection. This is a temporary state, until the planner gathers stats from the physical plan instead. An alternative to this approach is rdblue@9d3a11e.

The first commit was proposed in apache#21262. This PR replaces apache#21262.

Existing tests.

Author: Ryan Blue <blue@apache.org>

Closes apache#21503 from rdblue/SPARK-24478-move-push-down-to-physical-conversion.

(cherry picked from commit 22daeba)

Conflicts:
	sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala
	sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
	sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownOperatorsToDataSource.scala
rdblue added a commit to rdblue/spark that referenced this pull request Apr 3, 2019
jzhuge pushed a commit to jzhuge/spark that referenced this pull request Oct 15, 2019
otterc pushed a commit to linkedin/spark that referenced this pull request Mar 22, 2023
Ref: LIHADOOP-48531

RB=1850239
G=superfriends-reviewers
R=zolin,yezhou,latang,fli,mshen
A=