[SPARK-36404][SQL] Support ORC nested column vectorized reader for data source v2 #33626
withSQLConf(SQLConf.ORC_VECTORIZED_READER_NESTED_COLUMN_ENABLED.key -> "true") {
  val readDf = spark.read.orc(path)
  val vectorizationEnabled = readDf.queryExecution.executedPlan.find {
    case scan @ (_: FileSourceScanExec | _: BatchScanExec) => scan.supportsColumnar
Added BatchScanExec here for DS v2, compared to the original test in OrcSourceSuite.scala. Moved the query here so that we can test both DS v1 and v2 via OrcV1QuerySuite and OrcV2QuerySuite, defined below.
Kubernetes integration test starting
Kubernetes integration test status success
+1, LGTM. Thank you, @c21 and @HyukjinKwon.
Merged to master
Thank you @HyukjinKwon and @dongjoon-hyun for the review!
Test build #142008 has finished for PR 33626 at commit
What changes were proposed in this pull request?
We previously added support for nested columns in the ORC vectorized reader for data source v1. Data source v1 and v2 both use the same underlying implementation for the vectorized reader (OrcColumnVector), so we can support data source v2 as well.
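For readers who want to try this outside the test suite, here is a minimal sketch of the same check as a standalone program. It is not part of this PR. It assumes that `spark.sql.orc.enableNestedColumnVectorizedReader` is the key behind `SQLConf.ORC_VECTORIZED_READER_NESTED_COLUMN_ENABLED`, and that setting `spark.sql.sources.useV1SourceList` to an empty string routes ORC reads through data source v2. The object name, app name, temp path, and sample data are made up for illustration.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.FileSourceScanExec
import org.apache.spark.sql.execution.datasources.v2.BatchScanExec

// Hypothetical standalone check; the names here are illustrative only.
object OrcNestedVectorizedCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("orc-nested-vectorized-check")
      // Assumption: an empty V1 source list makes ORC use data source v2.
      .config("spark.sql.sources.useV1SourceList", "")
      // Assumption: this is the key of ORC_VECTORIZED_READER_NESTED_COLUMN_ENABLED.
      .config("spark.sql.orc.enableNestedColumnVectorizedReader", "true")
      .getOrCreate()
    import spark.implicits._

    // Write a small ORC file with one nested (struct) column.
    val path = Files.createTempDirectory("orc-nested").resolve("data").toString
    Seq((1, (10, "a")), (2, (20, "b"))).toDF("id", "nested").write.orc(path)

    // Look for a v1 or v2 scan node that reports columnar (vectorized) output.
    val readDf = spark.read.orc(path)
    val columnarScan = readDf.queryExecution.executedPlan.find {
      case scan @ (_: FileSourceScanExec | _: BatchScanExec) => scan.supportsColumnar
      case _ => false
    }
    println(s"vectorized scan found: ${columnarScan.isDefined}")

    spark.stop()
  }
}
```

Removing the `useV1SourceList` setting should exercise the data source v1 path with the same plan check, which mirrors how the test is shared between the v1 and v2 suites.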
Why are the changes needed?
Improve query performance for ORC data source v2 when reading nested columns.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added test in OrcQuerySuite.scala.