ORC-669. Reduce breaking changes in ReaderImpl.java #547

dongjoon-hyun · 2020-10-01T06:06:56Z

What changes were proposed in this pull request?

Although ReaderImpl is an implementation class, this PR aims to undo two breaking changes in ReaderImpl.java that ORC-520 introduced in Apache ORC 1.6.0 ~ 1.6.5.

Why are the changes needed?

This helps Apache Hive and Spark work with Apache ORC 1.6.x by removing the following breaking changes with a minor and safe revision.

[info] org.apache.spark.sql.hive.orc.OrcHadoopFsRelationSuite *** ABORTED ***
[info]   java.lang.NoSuchFieldError: types
[info]   at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:64)

[info] org.apache.spark.sql.hive.orc.HiveOrcHadoopFsRelationSuite *** ABORTED ***
[info]   java.lang.IllegalAccessError: tried to access field org.apache.orc.impl.ReaderImpl.compressionKind from class org.apache.hadoop.hive.ql.io.orc.ReaderImpl

How was this patch tested?

Manual review, because this only adds back a field and changes the visibility of a field, as in ORC 1.5.x.
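
As a rough illustration of what such a review looks for, the two failures quoted above correspond to two linkage conditions: the field a subclass was compiled against must still exist (otherwise NoSuchFieldError), and it must still be visible to that subclass (otherwise IllegalAccessError). The sketch below checks both with reflection. It is not part of this PR: CompatibleReader, BrokenReader and checkField are hypothetical stand-ins, and the field types are placeholders rather than the real ORC types.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.List;

public class FieldCompatCheck {

    // Hypothetical stand-in for a base class with a 1.5.x-style layout:
    // both fields exist and are visible to subclasses in other packages.
    public static class CompatibleReader {
        protected List<String> types;        // placeholder element type
        protected String compressionKind;    // placeholder type
    }

    // Hypothetical stand-in for a layout that breaks such subclasses:
    // "types" is gone and "compressionKind" is private.
    public static class BrokenReader {
        private String compressionKind;
    }

    /**
     * Returns null when a subclass in another package could still link
     * against the named field, otherwise a short description of the risk.
     */
    public static String checkField(Class<?> cls, String name) {
        try {
            Field field = cls.getDeclaredField(name);
            int modifiers = field.getModifiers();
            if (Modifier.isProtected(modifiers) || Modifier.isPublic(modifiers)) {
                return null;
            }
            return "IllegalAccessError risk: " + name + " is not protected or public";
        } catch (NoSuchFieldException e) {
            return "NoSuchFieldError risk: " + name + " is missing";
        }
    }

    public static void main(String[] args) {
        Class<?>[] candidates = {CompatibleReader.class, BrokenReader.class};
        String[] fields = {"types", "compressionKind"};
        for (Class<?> cls : candidates) {
            for (String name : fields) {
                String problem = checkField(cls, name);
                System.out.println(cls.getSimpleName() + "." + name + ": "
                        + (problem == null ? "ok" : problem));
            }
        }
    }
}
```

A check like this only inspects the declared fields of one class; it does not replace compiling and running the downstream Hive code against the new ORC jar.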

dongjoon-hyun · 2020-10-01T06:10:12Z

Could you review this, @omalley ?

dongjoon-hyun · 2020-10-01T18:44:34Z

@omalley and @alanfgates .

I'm trying to make Apache Spark 3.1 (scheduled for December 2020) use Apache ORC 1.6.6. We need at least:

ORC-669. Reduce breaking changes in ReaderImpl.java (this PR)
[SPARK-33047][BUILD] Upgrade hive-storage-api to 2.7.2

The other things I'm looking at are the following:

ORC's case-insensitive predicate handling change (1.6.x returns more rows than 1.5.x). Although Spark can filter the extra rows out on its side, this may cause a performance regression.
OrcTail API change.

This will help Apache Hive eventually too.

omalley · 2020-10-02T15:40:10Z

Is Spark using the details of ReaderImpl?

dongjoon-hyun · 2020-10-02T15:47:14Z

Thank you for merging, @omalley . The above two are not used in Spark's sql module, but are used by the Hive library in Spark's hive module.

dongjoon-hyun · 2020-10-02T15:47:54Z

If you don't mind, please land this to branch-1.6, too.

Signed-off-by: Owen O'Malley <omalley@apache.org>

omalley · 2020-10-02T17:37:30Z

done

dongjoon-hyun · 2020-10-02T18:04:08Z

Thank you so much!

ORC-669. Reduce breaking changes in ReaderImpl.java

bdbaba7

omalley merged commit c9bcc7a into apache:master Oct 2, 2020

dongjoon-hyun deleted the ORC-669 branch October 2, 2020 15:47

omalley pushed a commit that referenced this pull request Oct 2, 2020

ORC-669. Reduce breaking changes in ReaderImpl.java (#547)

03e5415

Signed-off-by: Owen O'Malley <omalley@apache.org>
