[SPARK-38999][SQL] Refactor FileSourceScanExec: file scan physical node #36327
@cloud-fan @gengliangwang Can you please help review this?
Can one of the admins verify this patch?
} ++ staticMetrics
}

lazy val inputRDD: RDD[InternalRow] = {
I like this idea of separating planning/metrics stuff vs RDD/execution.
// Number of coalesced buckets.
def optionalNumCoalescedBuckets: Option[Int]
// Output attributes of the scan, including data attributes and partition attributes.
def output: Seq[Attribute]
DataSourceScanExec extends LeafExecNode, and def output: Seq[Attribute] is already declared there.
// Output attributes of the scan, including data attributes and partition attributes.
def output: Seq[Attribute]
// Predicates to use for partition pruning.
def partitionFilters: Seq[Expression]
nit: can we put it near dataFilters?
thanks, merging to master!
What changes were proposed in this pull request?
The PR refactors the FileSourceScanExec case class into a base trait, FileSourceScanLike, which is then subclassed by FileSourceScanExec. FileSourceScanLike contains basic functionality such as metrics and file listing, while FileSourceScanExec contains the execution-specific code.
Why are the changes needed?
Currently, the code for the FileSourceScanExec class, the physical node for file scans, is quite complex and lengthy, which makes it slightly difficult to reason about.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Code refactor, existing tests should suffice.
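The trait/class split described above can be illustrated with a minimal, self-contained sketch. It does not use Spark: SketchAttribute, SketchExpression, SketchScanLike, SketchScanExec, the metadata map and the execute() method are hypothetical stand-ins invented for this illustration, and only the member names output, partitionFilters, dataFilters and optionalNumCoalescedBuckets are taken from the diff excerpts in the conversation. The real trait and class in the PR carry far more logic.

```scala
// Hypothetical stand-ins for Spark's Attribute and Expression types.
final case class SketchAttribute(name: String)
final case class SketchExpression(sql: String)

// Planning-side concerns only: schema, filters and derived metadata.
// This trait deliberately contains no execution logic.
trait SketchScanLike {
  def output: Seq[SketchAttribute]
  def partitionFilters: Seq[SketchExpression]
  def dataFilters: Seq[SketchExpression]
  def optionalNumCoalescedBuckets: Option[Int]

  // Derived lazily from the abstract members, so every subclass gets it for free.
  lazy val metadata: Map[String, String] = Map(
    "PartitionFilters" -> partitionFilters.map(_.sql).mkString("[", ", ", "]"),
    "DataFilters" -> dataFilters.map(_.sql).mkString("[", ", ", "]")
  ) ++ optionalNumCoalescedBuckets.map(n => "CoalescedBuckets" -> n.toString).toSeq
}

// Execution-side concerns: the case class supplies the abstract members
// through its constructor and adds the code that actually produces rows.
final case class SketchScanExec(
    output: Seq[SketchAttribute],
    partitionFilters: Seq[SketchExpression],
    dataFilters: Seq[SketchExpression],
    optionalNumCoalescedBuckets: Option[Int],
    rows: Seq[String]) extends SketchScanLike {

  // Stand-in for building the input RDD: here it just iterates in-memory rows.
  def execute(): Iterator[String] = rows.iterator
}
```

With this shape, code that only needs planning information can depend on the trait, while the case class remains the single place that knows how rows are produced.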