[SPARK-23456][SPARK-21783] Turn on `native` ORC impl and PPD by default #20634

dongjoon-hyun · 2018-02-17T17:09:19Z

What changes were proposed in this pull request?

Apache Spark 2.3 introduced `native` ORC support with vectorization and many fixes, but shipped it as a non-default option. This PR enables the `native` ORC implementation and predicate pushdown (PPD) by default for Apache Spark 2.4. We will continue to improve and stabilize the ORC data source before Apache Spark 2.4 is released. Eventually, Apache Spark will drop the old Hive-based ORC code.
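For readers who want to see what the default flip means in practice, here is a minimal, dependency-free sketch. It assumes the two settings involved are `spark.sql.orc.impl` and `spark.sql.orc.filterPushdown`, and that their defaults move from `hive`/`false` to `native`/`true`; the values are an illustration of the PR description rather than a quote from the diff. The helper names `changed_settings` and `to_submit_args` are hypothetical and only render `--conf key=value` arguments for `spark-submit`.

```python
# Assumed defaults on each side of this change. The key names are real Spark SQL
# settings, but treat the values as an illustration and check your release docs.
ORC_DEFAULTS_BEFORE = {
    "spark.sql.orc.impl": "hive",
    "spark.sql.orc.filterPushdown": "false",
}
ORC_DEFAULTS_AFTER = {
    "spark.sql.orc.impl": "native",
    "spark.sql.orc.filterPushdown": "true",
}


def changed_settings(before, after):
    """Return the keys whose default differs, mapped to (old, new)."""
    return {k: (before[k], after[k]) for k in before if before[k] != after.get(k)}


def to_submit_args(conf):
    """Render a settings dict as spark-submit `--conf key=value` arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Flags that would pin a job to the pre-change ORC behaviour.
    print(" ".join(to_submit_args(ORC_DEFAULTS_BEFORE)))
```

A job that depends on the Hive-based reader could pass the printed flags to `spark-submit` to keep the earlier behaviour while the `native` path is being verified.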

How was this patch tested?

Passes Jenkins with the existing tests.

SparkQA · 2018-02-17T20:34:23Z

Test build #87525 has finished for PR 20634 at commit bde6818.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile

LGTM

gatorsmile · 2018-02-20T17:15:24Z

Thanks! Merged to master.

dongjoon-hyun · 2018-02-20T18:03:23Z

Thank you, @gatorsmile.

Apache Spark 2.3 introduced `native` ORC support with vectorization and many fixes, but shipped it as a non-default option. This PR enables the `native` ORC implementation and predicate pushdown by default for Apache Spark 2.4. We will continue to improve and stabilize the ORC data source before Apache Spark 2.4 is released. Eventually, Apache Spark will drop the old Hive-based ORC code.

Passes Jenkins with the existing tests.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes apache#20634 from dongjoon-hyun/SPARK-23456.

Change-Id: Ib7ec85d2ae6b96451fd28370ef5f5e3924d10de8

pan3793 · 2023-08-04T08:23:01Z

> ... eventually, Apache Spark will drop old Hive-based ORC code.

@dongjoon-hyun Do you think it's a good time to drop the `spark.sql.orc.impl=hive` implementation in Spark 4.0? If you don't object, I will open a JIRA to track it.

dongjoon-hyun · 2023-08-04T14:52:49Z

Yes, it's a good chance. BTW, IIRC, there was a difference in the Hive ORC CHAR implementation before, so we couldn't remove it because of backward-compatibility issues. Since Spark implements many CHAR features, we need to re-verify that the native implementation covers all legacy Hive-based ORC features, @pan3793.

pan3793 · 2023-08-04T15:22:00Z

@dongjoon-hyun Thanks for your response and detailed comment. I created SPARK-44677 to track the removal work and will start looking into it after 3.5.0 is released.

dongjoon-hyun · 2023-08-04T15:22:49Z

Thank you, @pan3793 !

gatorsmile reviewed Feb 20, 2018

asfgit closed this in 83c0087 Feb 20, 2018

dongjoon-hyun deleted the SPARK-23456 branch February 20, 2018 18:03
