[HUDI-1540] Add commons-codec to spark and utilities bundle jars #2316

Closed
wants to merge 1 commit into from

sbernauer
Contributor

@sbernauer sbernauer commented Dec 9, 2020

What is the purpose of the pull request

Fixes #2239 NoClassDefFoundError: org/apache/hudi/org/apache/commons/codec/binary/Base64

Brief change log

Included commons-codec:commons-codec in the spark and utilities bundle jars
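The change amounts to adding commons-codec to each bundle's maven-shade-plugin artifact set, as described later in this thread. The sketch below is minimal and makes two assumptions: the bundle already declares the org.apache.commons.codec. relocation, as the utilities bundle pom does, and the other includes and plugin settings are omitted. With commons-codec in the artifact set, the shade plugin copies the codec classes into the jar under the relocated package. The rewritten references to org.apache.hudi.org.apache.commons.codec.binary.Base64 should then resolve at runtime.

```xml
<!-- Sketch only: other includes, executions and settings of the real bundle pom are omitted -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <artifactSet>
      <includes>
        <!-- ... existing includes ... -->
        <!-- Bundle commons-codec so the relocated classes actually exist in the jar -->
        <include>commons-codec:commons-codec</include>
      </includes>
    </artifactSet>
    <relocations>
      <!-- Existing relocation: rewrites org.apache.commons.codec.* to the Hudi-shaded package -->
      <relocation>
        <pattern>org.apache.commons.codec.</pattern>
        <shadedPattern>org.apache.hudi.org.apache.commons.codec.</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```

Without the include, the relocation still rewrites references in Hudi's own classes to the shaded package name, but no class is bundled there. That mismatch matches the NoClassDefFoundError reported in #2239.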

Verify this pull request

This pull request is a trivial rework / code cleanup without any test coverage.

Committer checklist

  • Has a corresponding JIRA in PR title & commit -> NOTE: Doesn't exist

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@codecov-io

codecov-io commented Dec 9, 2020

Codecov Report

Merging #2316 (ce5948d) into master (3a91d26) will decrease coverage by 43.80%.
The diff coverage is n/a.


@@             Coverage Diff              @@
##             master   #2316       +/-   ##
============================================
- Coverage     53.49%   9.68%   -43.81%     
+ Complexity     2788      48     -2740     
============================================
  Files           355      53      -302     
  Lines         16169    1930    -14239     
  Branches       1650     230     -1420     
============================================
- Hits           8649     187     -8462     
+ Misses         6819    1730     -5089     
+ Partials        701      13      -688     
Flag                   Coverage Δ             Complexity Δ
hudicli                ?                      ?
hudiclient             ?                      ?
hudicommon             ?                      ?
hudihadoopmr           ?                      ?
hudispark              ?                      ?
huditimelineservice    ?                      ?
hudiutilities          9.68% <ø> (-60.42%)    0.00 <ø> (ø)

Flags with carried forward coverage won't be shown.

Impacted Files                                          Coverage Δ                  Complexity Δ
...va/org/apache/hudi/utilities/IdentitySplitter.java   0.00% <0.00%> (-100.00%)    0.00% <0.00%> (-2.00%)
...va/org/apache/hudi/utilities/schema/SchemaSet.java   0.00% <0.00%> (-100.00%)    0.00% <0.00%> (-3.00%)
...a/org/apache/hudi/utilities/sources/RowSource.java   0.00% <0.00%> (-100.00%)    0.00% <0.00%> (-4.00%)
.../org/apache/hudi/utilities/sources/AvroSource.java   0.00% <0.00%> (-100.00%)    0.00% <0.00%> (-1.00%)
.../org/apache/hudi/utilities/sources/JsonSource.java   0.00% <0.00%> (-100.00%)    0.00% <0.00%> (-1.00%)
...rg/apache/hudi/utilities/sources/CsvDFSSource.java   0.00% <0.00%> (-100.00%)    0.00% <0.00%> (-10.00%)
...g/apache/hudi/utilities/sources/JsonDFSSource.java   0.00% <0.00%> (-100.00%)    0.00% <0.00%> (-4.00%)
...apache/hudi/utilities/sources/JsonKafkaSource.java   0.00% <0.00%> (-100.00%)    0.00% <0.00%> (-6.00%)
...pache/hudi/utilities/sources/ParquetDFSSource.java   0.00% <0.00%> (-100.00%)    0.00% <0.00%> (-5.00%)
...lities/schema/SchemaProviderWithPostProcessor.java   0.00% <0.00%> (-100.00%)    0.00% <0.00%> (-4.00%)
... and 334 more

@vinothchandar vinothchandar self-assigned this Dec 11, 2020
@vinothchandar
Member

@sbernauer, we don't shade commons-codec in these bundles; the intention is to use the copy from the Spark installation.

I'm going to try to trace why it's looking for the shaded class name.

@sbernauer
Contributor Author

Hi @vinothchandar, is this the relocation you are looking for (https://github.com/apache/hudi/blob/master/packaging/hudi-utilities-bundle/pom.xml#L170)?

<relocation>
  <pattern>org.apache.commons.codec.</pattern>
  <shadedPattern>org.apache.hudi.org.apache.commons.codec.</shadedPattern>
</relocation>

@vinothchandar
Member

@sbernauer, I re-kicked the tests.

Yes, I was trying to understand whether that's a leftover relocation. It seems we used to include commons-codec back in the day (link) and moved away from it later.

@vinothchandar
Member

@sbernauer, I wanted to clarify: do you see this on 0.5.3 or only on master? 0.5.3 has been stable for a while and is used with Spark 2.4.0 as well. The integration test environment, for example, uses Spark 2.4 with the same bundles.

I'm wondering whether we can instead remove the relocation from the bundle pom. It should then work with the existing jars in the Spark install.

@sbernauer
Contributor Author

I'm using >= 0.6.0 from the master branch and Spark 3.0.1. Sorry, I can't downgrade to Spark 2.4, but I will try removing the relocation.

@nsivabalan
Contributor

@vinothchandar: is this a release blocker?

@nsivabalan
Contributor

@sbernauer: it looks like the spark-bundle pom was missing the shade relocation. Can you try out the fix and let us know if it works?

@nsivabalan
Contributor

@sbernauer: it looks like the spark-bundle pom was missing the shade relocation. I have updated the patch. Can you try out the fix and let us know if it works?

@nsivabalan
Contributor

Or, if you can give me steps to reproduce (what commands you ran in spark-shell with the spark-bundle jar), I can give it a try.

@nsivabalan nsivabalan changed the title Add commons-codec to spark and utilities bundle jars [HUDI-1540] Add commons-codec to spark and utilities bundle jars Jan 20, 2021
@nsivabalan
Contributor

nsivabalan commented Jan 28, 2021

@sbernauer: I landed another patch in this regard. Can you please check the latest release and let us know whether the issue still persists? If not, we can close this out.

@sbernauer
Contributor Author

Hi @nsivabalan, sorry for missing the notification, and thanks for your work! I get the java.lang.NoClassDefFoundError: org/apache/hudi/org/apache/commons/codec/binary/Base64 when using a Hive Server with Thrift over HTTP:
hoodie.datasource.hive_sync.jdbcurl=jdbc:hive2://hive.{{ .Values.namespace | required "Value namespace is required" }}.svc.cluster.local:10001/default;transportMode=http;ssl=false;httpPath=cliservice;user=hudi;password=notneeded
I updated to hudi-utilities-bundle_2.12-0.7.0.jar and hudi-spark-bundle_2.12-0.7.0.jar and am still getting the NoClassDefFoundError. My originally provided patch, which adds the <include>commons-codec:commons-codec</include> tag to packaging/hudi-spark-bundle/pom.xml and packaging/hudi-utilities-bundle/pom.xml, did indeed work.

@nsivabalan
Contributor

I don't have permission to update this patch, hence I have put up another one: #2562.
@sbernauer: my bad, your original patch was the right one. Sorry to have delayed this so long.

@sbernauer
Contributor Author

No problem, and thanks @nsivabalan for creating the PR!

@sbernauer sbernauer closed this Feb 10, 2021
@nsivabalan
Contributor

@sbernauer: we have closed out #2562. We would appreciate it if you could verify that the fix works.

Successfully merging this pull request may close these issues.

[SUPPORT] NoClassDefFoundError: org/apache/hudi/org/apache/commons/codec/binary/Base64
4 participants