Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-40323][BUILD] Update ORC to 1.8.0 #37787

Closed
wants to merge 1 commit into from
Closed

[SPARK-40323][BUILD] Update ORC to 1.8.0 #37787

wants to merge 1 commit into from

Conversation

williamhyun
Copy link
Member

@williamhyun williamhyun commented Sep 4, 2022

What changes were proposed in this pull request?

This PR aims to update ORC to 1.8.0.

Why are the changes needed?

This will bring the latest changes and bug fixes.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the CIs.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @williamhyun .
Merged to master for Apache Spark 3.4.0.

@pan3793
Copy link
Member

pan3793 commented Sep 23, 2022

After this upgrade, do we still need protobuf 2.5.0?

@dongjoon-hyun
Copy link
Member

After this upgrade, do we still need protobuf 2.5.0?

May I ask who is we? For native ORC reader, it's not required in Apache Spark. And, this is a part of Apache Spark community effort which is trying to reduce the dependency of protobuf 2.5.0. However, removing protobuf 2.5.0 from Apache Spark distribution is a different issue.

@pan3793
Copy link
Member

pan3793 commented Sep 23, 2022

May I ask who is we?

I mean the Spark project/community.

And, this is a part of Apache Spark community effort which is trying to reduce the dependency of protobuf 2.5.0.

That's why I ask it in the Spark community.

However, removing protobuf 2.5.0 from Apache Spark distribution is a different issue.

I agree, but it's related to this issue, because I see "ORC-1212 Upgrade protobuf-java to 3.17.3" in the ORC 1.8.0 release note. As far as I know, Spark requires protobuf-java 2.5.0 for Hadoop2 and ORC 1.7 and earlier versions, as Spark already switched to Hadoop Shaded client for Hadoop3, I think maybe Spark can remove protobuf 2.5.0 from the binary release(only for hadoop3 profile).

@dongjoon-hyun
Copy link
Member

Yes, of course, it's technically feasible to try in Hadoop 3 distribution and we will eventually at some releases, @pan3793 .

@pan3793
Copy link
Member

pan3793 commented Feb 9, 2023

This upgrade brings threeten-extra:1.7.1, which has a known issue with JDK 8 bytecode compatibility, see ThreeTen/threeten-extra#226

As a downstream project of Spark, Kyuubi found the same issue https://github.com/apache/kyuubi/actions/runs/4130823342/jobs/7137880315, but I'm not sure if it's a real issue or just does not matter.

I suppose it's not a big deal since all Spark ORC tests passed w/ JDK8.

@pan3793
Copy link
Member

pan3793 commented Feb 9, 2023

I checked the files inside threeten-extra-1.7.1.jar, and only module-info.class has the issue, other classes can be loaded by JDK8 correctly. Thus IMO it's not a big problem.

➜  tmp javap module-info.class
Error: error while reading constant pool for module-info.class: unexpected tag at #5: 19

@dongjoon-hyun
Copy link
Member

Thank you for checking, @pan3793 . You are welcome to make an Apache ORC JIRA or PR to upgrade threeten-extra if needed. I can merge it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants