ORC-669. Reduce breaking changes in ReaderImpl.java #547

Merged
merged 1 commit into from Oct 2, 2020

dongjoon-hyun
Member

@dongjoon-hyun dongjoon-hyun commented Oct 1, 2020

What changes were proposed in this pull request?

Although ReaderImpl is an implementation class, this PR aims to reduce two breaking changes in ReaderImpl.java that were introduced by ORC-520 and affect Apache ORC 1.6.0 ~ 1.6.5.

Why are the changes needed?

This helps Apache Hive and Spark work with Apache ORC 1.6.x by removing the following breaking changes with a minor and safe revision.

[info] org.apache.spark.sql.hive.orc.OrcHadoopFsRelationSuite *** ABORTED ***
[info]   java.lang.NoSuchFieldError: types
[info]   at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:64)

[info] org.apache.spark.sql.hive.orc.HiveOrcHadoopFsRelationSuite *** ABORTED ***
[info]   java.lang.IllegalAccessError: tried to access field org.apache.orc.impl.ReaderImpl.compressionKind
from class org.apache.hadoop.hive.ql.io.orc.ReaderImpl

How was this patch tested?

Manual review, because this only adds back a field and changes the visibility of a field to match ORC 1.5.x.
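
The two errors above are the usual symptoms of a downstream subclass compiled against an older base class: one field no longer exists, and one is no longer visible to the subclass. As a minimal sketch of how such a break could be detected with reflection instead of manual review, the example below checks whether a base class still declares a field that a subclass in another package can reach. `CompatibleBase` and `BrokenBase` are hypothetical stand-ins, not the real ORC classes, and the field types are assumptions chosen for illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class FieldCompatCheck {

    // Hypothetical stand-in for a base class with the older shape:
    // both fields exist and are visible to subclasses.
    static class CompatibleBase {
        protected List<String> types = new ArrayList<>();
        protected String compressionKind = "ZLIB";
    }

    // Hypothetical stand-in for a base class with both breaking changes:
    // `types` has been removed and `compressionKind` is private.
    static class BrokenBase {
        private String compressionKind = "ZLIB";
    }

    /**
     * Reports whether a subclass in a different package could still use
     * the named field: "OK", "MISSING" (would surface as NoSuchFieldError
     * at link time), or "INACCESSIBLE" (would surface as IllegalAccessError).
     */
    static String check(Class<?> base, String fieldName) {
        try {
            Field field = base.getDeclaredField(fieldName);
            int modifiers = field.getModifiers();
            // Package-private does not help a subclass in another package,
            // so only public and protected count as accessible here.
            if (Modifier.isPublic(modifiers) || Modifier.isProtected(modifiers)) {
                return "OK";
            }
            return "INACCESSIBLE";
        } catch (NoSuchFieldException e) {
            return "MISSING";
        }
    }

    public static void main(String[] args) {
        for (Class<?> base : new Class<?>[] {CompatibleBase.class, BrokenBase.class}) {
            for (String name : new String[] {"types", "compressionKind"}) {
                System.out.println(base.getSimpleName() + "." + name + ": " + check(base, name));
            }
        }
    }
}
```

A check like this only covers field presence and visibility; it does not verify that a field's type is unchanged, which would also break already-compiled subclasses.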

@dongjoon-hyun
Member Author

Could you review this, @omalley ?

@dongjoon-hyun
Member Author

dongjoon-hyun commented Oct 1, 2020

@omalley and @alanfgates .

I'm trying to make Apache Spark 3.1 (scheduled for December 2020) use Apache ORC 1.6.6. We need at least

The other items I'm looking at are the following:

  • ORC's case-insensitive predicate handling change (1.6.x returns more rows than 1.5.x). Although Spark can filter out the extra rows on its side, this may cause a performance regression.
  • OrcTail API change.

This will help Apache Hive eventually too.

@omalley
Contributor

omalley commented Oct 2, 2020

Is Spark using the details of ReaderImpl?

@omalley omalley merged commit c9bcc7a into apache:master Oct 2, 2020
@dongjoon-hyun
Member Author

Thank you for merging, @omalley . The above two are not used in Spark's sql module, but they are used by the Hive library in Spark's hive module.

@dongjoon-hyun dongjoon-hyun deleted the ORC-669 branch October 2, 2020 15:47
@dongjoon-hyun
Member Author

If you don't mind, please land this on branch-1.6, too.

omalley pushed a commit that referenced this pull request Oct 2, 2020
Signed-off-by: Owen O'Malley <omalley@apache.org>
@omalley
Contributor

omalley commented Oct 2, 2020

done

@dongjoon-hyun
Member Author

Thank you so much!
