[SPARK-36404][SQL] Support ORC nested column vectorized reader for data source v2 #33626

Closed
wants to merge 1 commit into from


Contributor

c21 commented Aug 3, 2021

What changes were proposed in this pull request?

We previously added support for nested columns in the ORC vectorized reader for data source v1. Data source v1 and v2 use the same underlying implementation for the vectorized reader (OrcColumnVector), so we can support data source v2 as well.
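A minimal sketch of exercising the DS v2 path on a nested column, assuming a Spark build that includes this change and `spark-sql` on the classpath. Setting `spark.sql.sources.useV1SourceList` to an empty string routes ORC reads through data source v2, and `spark.sql.orc.enableNestedColumnVectorizedReader` is the key behind `SQLConf.ORC_VECTORIZED_READER_NESTED_COLUMN_ENABLED`. The dataset, temp path, and app name are illustrative only.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.FileSourceScanExec
import org.apache.spark.sql.execution.datasources.v2.BatchScanExec

// Local session for illustration. An empty useV1SourceList sends ORC through
// data source v2 (BatchScanExec) instead of v1 (FileSourceScanExec).
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("orc-nested-vectorized-v2")
  .config("spark.sql.sources.useV1SourceList", "")
  .config("spark.sql.orc.enableNestedColumnVectorizedReader", "true")
  .getOrCreate()

// Write a small ORC dataset with a struct column `s`.
val path = Files.createTempDirectory("orc-nested").resolve("data").toString
spark.range(10)
  .selectExpr("id", "named_struct('a', id, 'b', cast(id AS STRING)) AS s")
  .write.orc(path)

// Read it back and check whether the scan node runs in columnar (vectorized) mode.
val readDf = spark.read.orc(path)
val vectorizationEnabled = readDf.queryExecution.executedPlan.find {
  case scan @ (_: FileSourceScanExec | _: BatchScanExec) => scan.supportsColumnar
  case _ => false
}.isDefined

println(s"vectorized scan: $vectorizationEnabled")
spark.stop()
```

Matching both scan node types keeps the check valid regardless of which data source version the session is configured to use.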

Why are the changes needed?

Improve query performance for ORC data source v2 when reading nested columns.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added a test in OrcQuerySuite.scala.

github-actions bot added the SQL label Aug 3, 2021
    withSQLConf(SQLConf.ORC_VECTORIZED_READER_NESTED_COLUMN_ENABLED.key -> "true") {
      val readDf = spark.read.orc(path)
      val vectorizationEnabled = readDf.queryExecution.executedPlan.find {
        case scan @ (_: FileSourceScanExec | _: BatchScanExec) => scan.supportsColumnar
        case _ => false
      }.isDefined
      assert(vectorizationEnabled) // the ORC scan (DS v1 or v2) must be columnar
    }
Contributor Author


Compared to the original test in OrcSourceSuite.scala, added BatchScanExec here to cover DS v2. Moved the test here so that both DS v1 and v2 are exercised, via OrcV1QuerySuite and OrcV2QuerySuite defined below.
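The shared-suite pattern described in this comment can be sketched in plain Scala. This is a toy model under stated assumptions: `Plan`, `FileSourceScan`, `BatchScan`, `ColumnarToRow`, and the `*Sketch` suites are hypothetical stand-ins, not Spark's real classes. It shows why one test body, placed in an abstract base with a v1 and a v2 subclass, needs the pattern alternative to match both scan node types.

```scala
// Hypothetical stand-ins for Spark physical plan nodes (not Spark's real classes).
sealed trait Plan {
  def children: Seq[Plan]
  def supportsColumnar: Boolean

  // Pre-order search over the tree, similar in spirit to Spark's TreeNode.find.
  def find(p: Plan => Boolean): Option[Plan] =
    if (p(this)) Some(this)
    else children.iterator.map(_.find(p)).collectFirst { case Some(n) => n }
}

// Leaf scan produced for DS v1 in this toy model.
final case class FileSourceScan(supportsColumnar: Boolean) extends Plan {
  def children: Seq[Plan] = Nil
}

// Leaf scan produced for DS v2 in this toy model.
final case class BatchScan(supportsColumnar: Boolean) extends Plan {
  def children: Seq[Plan] = Nil
}

// Row-producing wrapper that sits above a columnar scan.
final case class ColumnarToRow(child: Plan) extends Plan {
  def children: Seq[Plan] = Seq(child)
  def supportsColumnar: Boolean = false
}

// Shared logic lives in the abstract base; subclasses only pick the DS version.
abstract class OrcQuerySuiteSketch {
  def useV1: Boolean

  // Toy planner: DS v1 yields FileSourceScan, DS v2 yields BatchScan.
  def plan(columnar: Boolean): Plan =
    ColumnarToRow(if (useV1) FileSourceScan(columnar) else BatchScan(columnar))

  // Matching either scan type lets the same check run in both suites.
  def vectorizationEnabled(p: Plan): Boolean = p.find {
    case scan @ (_: FileSourceScan | _: BatchScan) => scan.supportsColumnar
    case _                                         => false
  }.isDefined
}

object OrcV1QuerySuiteSketch extends OrcQuerySuiteSketch { val useV1 = true }
object OrcV2QuerySuiteSketch extends OrcQuerySuiteSketch { val useV1 = false }

// prints true
println(OrcV2QuerySuiteSketch.vectorizationEnabled(OrcV2QuerySuiteSketch.plan(columnar = true)))
```

If `BatchScan` were dropped from the pattern, the v2 suite would report `false` even for a columnar scan, which is the gap that adding BatchScanExec closes.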

SparkQA commented Aug 4, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46520/

SparkQA commented Aug 4, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46520/

Member

dongjoon-hyun left a comment


+1, LGTM. Thank you, @c21 and @HyukjinKwon.
Merged to master.

Contributor Author

c21 commented Aug 4, 2021

Thank you @HyukjinKwon and @dongjoon-hyun for review!

SparkQA commented Aug 4, 2021

Test build #142008 has finished for PR 33626 at commit f58e31e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

4 participants