New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-24478][SQL][followup] Move projection and filter push down to physical conversion #21574
Conversation
case _ => | ||
throw new AnalysisException(s"Data source is not readable: $name") | ||
} | ||
private def asReadSupport: ReadSupport = source match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strip one indentation level to shorten the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm -1 on cosmetic changes like this. It doesn't make a big improvement, but makes it harder to track down when a functional change happened.
Test build #91885 has finished for PR 21574 at commit
|
retest this please |
Test build #91937 has finished for PR 21574 at commit
|
Does this move actually make sense? It'd destroy stats estimation for partition pruning. |
According to the dicsussion , we agree that it's better to remove the hacky workaround for mutable |
@rxin, we can also add the second pushdown (in the stats visitor) to get better stats with a property to turn it on or off. We're going to add it back in our branch anyway. |
* | ||
* @return pushed filter and post-scan filters. | ||
*/ | ||
private def pushFilters( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for moving these functions. I considered it in the other commit, but decided to go with fewer changes. I like them here.
case class DataSourceV2Relation( | ||
source: DataSourceV2, | ||
output: Seq[AttributeReference], | ||
options: Map[String, String], | ||
userSpecifiedSchema: Option[StructType] = None) | ||
userSpecifiedSchema: Option[StructType]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this change necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find a place that uses this default value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's because there are few places that create v2 relations so far, but when SQL statements and other paths that don't allow you to supply your own schema are added, I think this will be more common. It's okay to remove it, but I don't see much value in the change and I like to keep non-functional changes to a minimum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea I agree we should minimum the non-functional changes, but removing dead code is also good to do. This is really a small change, if we do need the default value in the future, it's very easy to add it back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either way, it's up to you.
@@ -106,7 +106,7 @@ case class StreamingDataSourceV2Relation( | |||
|
|||
object DataSourceV2Relation { | |||
private implicit class SourceHelpers(source: DataSourceV2) { | |||
def asReadSupport: ReadSupport = { | |||
private def asReadSupport: ReadSupport = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: these are effectively private because SourceHelpers
is private. If we were to move SourceHelpers
or make it public to some other part of Spark, we would have to revert these changes. So I think it is best to rely on the visibility of SourceHelpers
instead of making these private.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense
+1 (non-binding) assuming that tests pass. |
Test build #92041 has finished for PR 21574 at commit
|
Test build #92049 has finished for PR 21574 at commit
|
thanks, merging to master! |
…physical conversion This is a followup of apache#21503, to completely move operator pushdown to the planner rule. The code are mostly from apache#21319 existing tests Author: Wenchen Fan <wenchen@databricks.com> Closes apache#21574 from cloud-fan/followup. (cherry picked from commit 1737d45) Conflicts: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
…physical conversion This is a followup of apache#21503, to completely move operator pushdown to the planner rule. The code are mostly from apache#21319 existing tests Author: Wenchen Fan <wenchen@databricks.com> Closes apache#21574 from cloud-fan/followup.
…physical conversion This is a followup of apache#21503, to completely move operator pushdown to the planner rule. The code are mostly from apache#21319 existing tests Author: Wenchen Fan <wenchen@databricks.com> Closes apache#21574 from cloud-fan/followup. (cherry picked from commit 1737d45) Conflicts: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
…physical conversion This is a followup of apache#21503, to completely move operator pushdown to the planner rule. The code are mostly from apache#21319 existing tests Author: Wenchen Fan <wenchen@databricks.com> Closes apache#21574 from cloud-fan/followup. Ref: LIHADOOP-48531 RB=1853689 G=superfriends-reviewers R=zolin,fli,yezhou,mshen,latang A=
What changes were proposed in this pull request?
This is a followup of #21503, to completely move operator pushdown to the planner rule.
The code are mostly from #21319
How was this patch tested?
existing tests