Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-24478][SQL][followup] Move projection and filter push down to physical conversion #21574

Closed
wants to merge 3 commits into from

Conversation

cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

This is a followup of #21503, to completely move operator pushdown to the planner rule.

The code are mostly from #21319

How was this patch tested?

existing tests

@cloud-fan
Copy link
Contributor Author

cc @rdblue @gengliangwang

case _ =>
throw new AnalysisException(s"Data source is not readable: $name")
}
private def asReadSupport: ReadSupport = source match {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strip one indentation level to shorten the code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm -1 on cosmetic changes like this. It doesn't make a big improvement, but makes it harder to track down when a functional change happened.

@SparkQA
Copy link

SparkQA commented Jun 15, 2018

Test build #91885 has finished for PR 21574 at commit 1aeb6ab.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Jun 16, 2018

Test build #91937 has finished for PR 21574 at commit 1aeb6ab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor

rxin commented Jun 16, 2018

Does this move actually make sense? It'd destroy stats estimation for partition pruning.

@cloud-fan
Copy link
Contributor Author

According to the dicsussion , we agree that it's better to remove the hacky workaround for mutable DataSourceV2Relation and not having stats estimation for partition pruning. For builtin data source like file source, we can still keep the optimzer rule PruneFileSourcePartitions to make them work.

@rdblue
Copy link
Contributor

rdblue commented Jun 18, 2018

@rxin, we can also add the second pushdown (in the stats visitor) to get better stats with a property to turn it on or off. We're going to add it back in our branch anyway.

*
* @return pushed filter and post-scan filters.
*/
private def pushFilters(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for moving these functions. I considered it in the other commit, but decided to go with fewer changes. I like them here.

case class DataSourceV2Relation(
source: DataSourceV2,
output: Seq[AttributeReference],
options: Map[String, String],
userSpecifiedSchema: Option[StructType] = None)
userSpecifiedSchema: Option[StructType])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this change necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find a place that uses this default value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's because there are few places that create v2 relations so far, but when SQL statements and other paths that don't allow you to supply your own schema are added, I think this will be more common. It's okay to remove it, but I don't see much value in the change and I like to keep non-functional changes to a minimum.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea I agree we should minimum the non-functional changes, but removing dead code is also good to do. This is really a small change, if we do need the default value in the future, it's very easy to add it back.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way, it's up to you.

@@ -106,7 +106,7 @@ case class StreamingDataSourceV2Relation(

object DataSourceV2Relation {
private implicit class SourceHelpers(source: DataSourceV2) {
def asReadSupport: ReadSupport = {
private def asReadSupport: ReadSupport = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: these are effectively private because SourceHelpers is private. If we were to move SourceHelpers or make it public to some other part of Spark, we would have to revert these changes. So I think it is best to rely on the visibility of SourceHelpers instead of making these private.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense

@rdblue
Copy link
Contributor

rdblue commented Jun 18, 2018

+1 (non-binding) assuming that tests pass.

@SparkQA
Copy link

SparkQA commented Jun 18, 2018

Test build #92041 has finished for PR 21574 at commit 13c91df.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 19, 2018

Test build #92049 has finished for PR 21574 at commit 4471422.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

thanks, merging to master!

@asfgit asfgit closed this in 1737d45 Jun 19, 2018
jzhuge pushed a commit to jzhuge/spark that referenced this pull request Mar 7, 2019
…physical conversion

This is a followup of apache#21503, to completely move operator pushdown to the planner rule.

The code are mostly from apache#21319

existing tests

Author: Wenchen Fan <wenchen@databricks.com>

Closes apache#21574 from cloud-fan/followup.

(cherry picked from commit 1737d45)

Conflicts:
	sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala
	sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
rdblue pushed a commit to rdblue/spark that referenced this pull request Apr 3, 2019
…physical conversion

This is a followup of apache#21503, to completely move operator pushdown to the planner rule.

The code are mostly from apache#21319

existing tests

Author: Wenchen Fan <wenchen@databricks.com>

Closes apache#21574 from cloud-fan/followup.
jzhuge pushed a commit to jzhuge/spark that referenced this pull request Oct 15, 2019
…physical conversion

This is a followup of apache#21503, to completely move operator pushdown to the planner rule.

The code are mostly from apache#21319

existing tests

Author: Wenchen Fan <wenchen@databricks.com>

Closes apache#21574 from cloud-fan/followup.

(cherry picked from commit 1737d45)

Conflicts:
	sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala
	sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
otterc pushed a commit to linkedin/spark that referenced this pull request Mar 22, 2023
…physical conversion

This is a followup of apache#21503, to completely move operator pushdown to the planner rule.

The code are mostly from apache#21319

existing tests

Author: Wenchen Fan <wenchen@databricks.com>

Closes apache#21574 from cloud-fan/followup.

Ref: LIHADOOP-48531

RB=1853689
G=superfriends-reviewers
R=zolin,fli,yezhou,mshen,latang
A=
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants