Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HADOOP-18197. Upgrade protobuf to 3.21.7 #19

Open
wants to merge 3 commits into
base: trunk
Choose a base branch
from

Conversation

steveloughran
Copy link
Contributor

@steveloughran steveloughran commented Jun 8, 2022

This patch bumps up the protobuf version so that Hadoop
is not a vulnerable to CVE-2021-22569.

This also fixes up the parent POM references in the child modules
as IntelliJ requires a full path.

Testing: needs to go through hadoop built with the updated jar and
with its own protobuf version marker updated.
Verified hadoop compiles on a macbook m1.

@steveloughran steveloughran changed the title HADOOP-18197. Upgrade protobuf to 3.20.1 HADOOP-18197. Upgrade protobuf to 3.21.1 Jun 8, 2022
@ayushtkn
Copy link
Member

ayushtkn commented Jun 8, 2022

That said, I don't see why the version number needed to be
included there. We will have to live with that.

That wasn't something we wanted to do that initially, that came as a suggestion in the ML thread.
https://lists.apache.org/thread/v7cqm2bwvrlyhmdl2xo9pg84rvb6t214

Guess as per the ML suggestion, it was to be done for Guava also, but the developer forgot about this, so we have to live without that in guava

Copy link
Contributor

@jojochuang jojochuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename the directory hadoop-shaded-protobuf_3_7 to hadoop-shaded-protobuf_3?


Oh I am sorry. Didn't look at the latest comments before I posted the comments.

@steveloughran
Copy link
Contributor Author

aah, i see the discussion. ok. that complicates life even more. really not sure what to do here.

if we were exporting a module for other to use, that version in module names makes sense. if this is for internal use only then not giving it a name works better.

what to do here?

  1. rename the module and pom artifacts and then have hadoop versions import the protobuf_3_21 module
  2. keep both side by side

if the repackaging retains the names of the paths then after adding a new module with the new version, new compilations will link with the new lib, but old stuff will still work

@ayushtkn
Copy link
Member

ayushtkn commented Jun 9, 2022

Hbase does shade protobuf and doesn't suffix the version I suppose:
https://github.com/apache/hbase-thirdparty/blob/master/hbase-shaded-protobuf/pom.xml#L25

We are using this internally only I guess, the version will be tied with hadoop-thirdparty release version. So, If I have to choose one option, I would choose to keep the name same.

If we choose to keep changing the name for some reason, we should start that with guava as well at least for future releases.

@steveloughran steveloughran marked this pull request as draft June 14, 2022 09:08
@steveloughran
Copy link
Contributor Author

thinking of doing it differently

  • add the 3.21 stuff parallel to the 3.7
  • all our own code switches to the new shaded release
  • delete the old version

this will break anything linked to the old one. I'd thought about leaving it there, but then thought about how you would get a maven build to do that and concluded that "it would get so complex, so fast, it's only justifiable if we know external code uses it. Or that people may want to drop this jar in in place of the previous one?

This patch bumps up the protobuf version so that Hadoop
is not a vulnerable to CVE-2021-22569.

I'm not renaming the module hadoop-shaded-protobuf_3_7
because that significantly complicates imports/upgrading.
That said, I don't see why the version number needed to be
included there. We will have to live with that.

This also fixes up the parent POM references in the child modules
as IntelliJ requires a full path.

Testing: needs to go through hadoop built with the updated jar and
with its own protobuf version marker updated.
Verified hadoop compiles on a macbook m1.
@steveloughran steveloughran force-pushed the feature/HADOOP-18197-protobuf-upgrade branch from 1330c3a to 58fc777 Compare October 10, 2022 13:38
@steveloughran steveloughran force-pushed the feature/HADOOP-18197-protobuf-upgrade branch from 58fc777 to 77339d6 Compare October 10, 2022 15:01
@steveloughran steveloughran marked this pull request as ready for review October 10, 2022 15:12
@steveloughran
Copy link
Contributor Author

the latest iteration imports 3.21.7 sets the protobuf module name to be hadoop-shaded-protobuf_3_21-1.2.0-SNAPSHOT.jar

it cuts out the older 3_7 release entirely.

we could restore that and so do a thirdparty release which published to mvn both artifacts, allowing older builds to still use the latest release.

my PR on hadoop updates for the newer JAR, only

@steveloughran steveloughran changed the title HADOOP-18197. Upgrade protobuf to 3.21.1 HADOOP-18197. Upgrade protobuf to 3.21.7 Oct 11, 2022
@steveloughran
Copy link
Contributor Author

add for @pjfanning to get involved; whole question about whether to use a version number in #21. applies.

  • version in jar name is a PITA when upgrading/re-releasing, as every pom which imports the artifact needs to be patched.
  • but it does make it easier to see what version of parquet/avro is shipped.

i think for hadoop we will have to use a maven property to define the version of the protobuf lib to use...not sure what this does to ide imports, though spark seems to handle this

@tooptoop4
Copy link

where can i grab a copy of this 1.2.0-SNAPSHOT jar? assuming PR test results go somewhere

@steveloughran
Copy link
Contributor Author

@tooptoop4
you need to check out this pr, build locally

I am not doing any active work on this...if you want to take up then everyone will be grateful

@pjfanning
Copy link
Contributor

@steveloughran I've just built a hadoop thirdparty protobuf java based on this PR - with the git conflict resolved and using v3.21.12 instead. I was thinking that if we're going to use v3.21 that we might as well take the latest patch version of that line.

I'm currently building hadoop trunk locally with this jar and all seems ok (so far, at least).

Is there a reason not to merge this or an equivalent PR that I can do if necessary?

I can see that the jar name is a problem especially if we wanted to backport to hadoop 3.3 - but we could start by just upgrading on hadoop trunk and target this only to the hadoop 3.4.0 release.

@steveloughran
Copy link
Contributor Author

I'm on vacation until 2024, so not really commenting.

It might be good if we have the option of updating branch 3.3 in future – it will stop people who haven't upgraded complaining so much

@steveloughran
Copy link
Contributor Author

@pjfanning if you can do a PR with the fixes, I'll review.

@pjfanning
Copy link
Contributor

@steveloughran I created #26

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants