
[BEAM-11055] Update log4j to version 2.14.1 #13073

Merged: 1 commit, merged Apr 28, 2021


@iemejia (Member) commented Oct 12, 2020

@aromanenko-dev (Contributor) left a comment:

Please take a look at the failed tests. Some of them (at least Java PreCommit) failed because of this change.

@iemejia (Member Author) commented Oct 12, 2020:

I missed some places, so it became a bit unstable. Let's see if it is OK now.

@aromanenko-dev (Contributor):

Do we really need so many places where the same version is defined? Wouldn't it be better to extract it into BeamModulePlugin.groovy, since it's shared across different modules?

@iemejia (Member Author) commented Oct 12, 2020:

> Do we really need so many places where the same version is defined? Wouldn't it be better to extract it into BeamModulePlugin.groovy, since it's shared across different modules?

I hesitated to do this because the entries defined in BeamModulePlugin.groovy tend to be read as the list of libraries Beam devs 'can' use, and I did not want to promote the use of log4j, which is not Beam's default logging library but mostly one provided at runtime. If you prefer, I can put it there.

@aromanenko-dev (Contributor):

@iemejia Yes, this is not our default logging library, but since it is used in different modules (and seemingly with the same version everywhere), I'd prefer to extract it, as we do for other libraries. In any case, a module can still override the version if it needs another one.
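A minimal sketch of what the extraction could look like, modelled on the `library.java` map and the `log4j_core` entry quoted later in this thread. The surrounding plugin method, the exact map layout and the module dependency shown in the comment are assumptions for illustration, not Beam's actual code:

```groovy
// Hypothetical excerpt from a plugin's apply(Project project) method, in the style
// of BeamModulePlugin.groovy: the log4j version is defined once and exposed
// through the shared library.java map.
def log4j_version = "2.14.1"

project.ext.library = [
  java: [
    log4j_api : "org.apache.logging.log4j:log4j-api:$log4j_version",
    log4j_core: "org.apache.logging.log4j:log4j-core:$log4j_version",
  ],
]

// A module's build.gradle would then reference the shared coordinate instead of
// repeating the version string (the configuration name is illustrative):
//
//   dependencies {
//     testRuntimeOnly library.java.log4j_core
//   }
```

Because applyJavaNature forces every versioned entry of `library.java` (see the `resolutionStrategy` excerpt quoted later in the thread), a single entry like this would also pin transitive log4j dependencies in every Java module, not only in modules that declare it.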

@@ -23,53 +23,54 @@ package org.apache.beam.gradle
 */
class GrpcVendoring_1_26_0 {

  static def guava_version = "26.0-jre"
Member Author:

This one and the next one only sort the entries, to make the references easier to find.

  static def jboss_marshalling_version = "1.4.11.Final"
  static def jboss_modules_version = "1.1.0.Beta1"
  static def jzlib_version = "1.1.3"
  static def log4j_version = "2.13.3"
Member Author:

Notice that the vendored gRPC dependencies can use different versions, because they don't depend on Beam's main module file, so their versions are defined independently.

@iemejia (Member Author) commented Oct 13, 2020:

Done with the suggested changes. Can you PTAL again, @aromanenko-dev?

@aromanenko-dev (Contributor) commented Oct 14, 2020:

@iemejia Thanks! Could you take a look at the failed tests? It looks like some of them are caused by this change.

@iemejia (Member Author) commented Oct 14, 2020:

Arrghh, by mistake I ran the hcatalog tests locally instead of the hadoop-format ones. Let's hope it is OK this time 🤞

@aromanenko-dev (Contributor):

retest this please

@aromanenko-dev (Contributor):

Unfortunately, there are still failing Java tests; I'm not sure the Python one is related.

@iemejia (Member Author) commented Nov 4, 2020:

This PR is breaking the HCatalog module with a NoClassDefFoundError, and that does not make any sense to me. In the PR I am not even touching that module or, as far as I can see, any of its dependencies, so I am wondering if this is some sort of Gradle craziness.

I tried to run Google's Linkage Checker to find the issue, but the tool configuration also seems to be broken on Beam's current master (double-checked by Alexey) when used with the hcatalog module. @suztomo sorry to bother you, but I am absolutely puzzled by this issue and wondered if you might have a hint about what is going on.

I also suppose you could help us check what is going on with the linkage check, since you mentioned some months ago that it was now working on the HCatalog module. Notice that Beam recently upgraded to Gradle 6.6, so maybe something broke as part of that.

@suztomo (Contributor) commented Nov 4, 2020:

Interesting. Let me see.

org.apache.logging.log4j.util.ReflectionUtil is missing when running the test.

java.lang.NoClassDefFoundError: org/apache/logging/log4j/util/ReflectionUtil
	at org.apache.logging.slf4j.Log4jLoggerFactory.getContext(Log4jLoggerFactory.java:42)
	at org.apache.logging.log4j.spi.AbstractLoggerAdapter.getLogger(AbstractLoggerAdapter.java:48)
	at org.apache.logging.slf4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:29)
	at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:363)
	at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:388)
	at org.apache.hive.hcatalog.common.HCatUtil.<clinit>(HCatUtil.java:83)
	at org.apache.beam.sdk.io.hcatalog.test.EmbeddedMetastoreService.<init>(EmbeddedMetastoreService.java:50)
	at org.apache.beam.sdk.io.hcatalog.HCatalogBeamSchemaTest.setupEmbeddedMetastoreService(HCatalogBeamSchemaTest.java:52)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/14470/testReport/junit/org.apache.beam.sdk.io.hcatalog/HCatalogBeamSchemaTest/classMethod/

Reproduce Problem

The command below reproduces the problem. Strangely, Gradle reports the build as successful:

./gradlew -p ./sdks/java/io/hcatalog test
...
> Task :sdks:java:io:hcatalog:test

org.apache.beam.sdk.io.hcatalog.HCatalogIOTest > classMethod FAILED
    java.lang.NoClassDefFoundError at HCatalogIOTest.java:88
        Caused by: java.lang.ClassNotFoundException at HCatalogIOTest.java:88

org.apache.beam.sdk.io.hcatalog.HCatalogBeamSchemaTest > classMethod FAILED
    java.lang.NoClassDefFoundError at HCatalogBeamSchemaTest.java:52

org.apache.beam.sdk.io.hcatalog.SchemaUtilsTest > testParameterizedTypesToBeamTypes FAILED
    java.lang.NoClassDefFoundError at SchemaUtilsTest.java:43

3 tests completed, 3 failed
There were failing tests. See the report at: file:///usr/local/google/home/suztomo/beam/sdks/java/io/hcatalog/build/reports/tests/test/index.html

Deprecated Gradle features were used in this build, making it incompatible with Gradle 7.0.
Use '--warning-mode all' to show the individual deprecation warnings.
See https://docs.gradle.org/6.6.1/userguide/command_line_interface.html#sec:command_line_warnings

BUILD SUCCESSFUL in 2m 52s
49 actionable tasks: 49 executed
suztomo@suztomo:~/beam$ 

Found ignoreFailures in the build.gradle:

test {
  // TODO: Get tests to run. Known issues:
  //  * calcite-avatica bundles w/o repackaging Jackson (CALCITE-1110)
  //  * hive-exec bundles w/o repackaging Guava (HIVE-13690)
  ignoreFailures true
}

Turning it off.
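A sketch of what turning it off could look like in the hcatalog module's build.gradle. The `ignoreHCatalogTestFailures` project property is hypothetical, added only to show how the old behaviour could stay available as an explicit opt-in:

```groovy
// Sketch: hcatalog's test block with ignoreFailures no longer hard-coded to true,
// so a NoClassDefFoundError fails the task instead of ending in BUILD SUCCESSFUL.
test {
  // ignoreFailures is false by default for Gradle Test tasks. The property name
  // below is hypothetical; passing it restores the old lenient behaviour, e.g.
  //   ./gradlew -p ./sdks/java/io/hcatalog test -PignoreHCatalogTestFailures
  ignoreFailures = project.hasProperty("ignoreHCatalogTestFailures")
}
```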

TestRuntimeClasspath

https://gist.github.com/suztomo/63d6104d054d80370cd9e34f64e4d13e

According to IntelliJ, org.apache.logging.log4j:log4j-core:2.13.3 holds org.apache.logging.log4j.util.ReflectionUtil (edit: this was wrong; the class it holds is org.apache.logging.log4j.core.util.ReflectionUtil). The artifact appears in the dependency graph:

+--- org.apache.hive.hcatalog:hive-hcatalog-core:2.1.0
|    +--- org.apache.hive:hive-cli:2.1.0
|    |    +--- org.apache.hive:hive-common:2.1.0
|    |    |    +--- org.apache.hive:hive-shims:2.1.0
...
|    |    |    +--- org.apache.logging.log4j:log4j-1.2-api:2.4.1
|    |    |    |    +--- org.apache.logging.log4j:log4j-api:2.4.1 -> 2.13.3
|    |    |    |    \--- org.apache.logging.log4j:log4j-core:2.4.1 -> 2.13.3
|    |    |    |         \--- org.apache.logging.log4j:log4j-api:2.13.3
|    |    |    +--- org.apache.logging.log4j:log4j-web:2.4.1
|    |    |    |    +--- org.apache.logging.log4j:log4j-api:2.4.1 -> 2.13.3
|    |    |    |    \--- org.apache.logging.log4j:log4j-core:2.4.1 -> 2.13.3 (*)

I don't know why Gradle does not supply the missing class from org.apache.logging.log4j:log4j-core:2.13.3 when it runs the test.

@iemejia (Member Author) commented Nov 4, 2020:

Thanks for taking a look, @suztomo. I had not seen that ignoreFailures part; weird.
Gradle's behavior is so weird: it shows the dependency but still ignores it.
I think the person who knows Beam's build best is @lukecwik; maybe he understands what is going on here.

@suztomo (Contributor) commented Nov 5, 2020:

With some debug output, I see that Gradle supplies the ReflectionUtil class when the test class runs.

suztomo@8138139#r43916554

I now suspect that EmbeddedMetastoreService runs with a special classpath.

Edit: My IntelliJ is complaining about it too.

Now I know that this case involves two fully-qualified class names:

  • org.apache.logging.log4j.util.ReflectionUtil (the missing class)
  • org.apache.logging.log4j.core.util.ReflectionUtil (available in org.apache.logging.log4j:log4j-core:2.13.3)
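Telling those two names apart by hand is error-prone. A hedged sketch of a throwaway diagnostic task that lists which jars on the module's testRuntimeClasspath contain a given class file, similar in spirit to a `jar tf ... | grep` check; the task name `findClassOnTestClasspath` is hypothetical:

```groovy
// Hypothetical diagnostic task for a module's build.gradle: prints every jar on
// testRuntimeClasspath that contains the class file named in `entry`.
tasks.register("findClassOnTestClasspath") {
  doLast {
    // The class reported missing by the NoClassDefFoundError, as a jar entry path.
    def entry = "org/apache/logging/log4j/util/ReflectionUtil.class"
    def jars = configurations.testRuntimeClasspath.files.findAll { it.name.endsWith(".jar") }
    jars.each { File jar ->
      def zip = new java.util.zip.ZipFile(jar)
      try {
        if (zip.getEntry(entry) != null) {
          println "${entry} found in ${jar.name}"
        }
      } finally {
        zip.close()
      }
    }
    println "Scanned ${jars.size()} jars"
  }
}
```

It would be run with `./gradlew -p ./sdks/java/io/hcatalog findClassOnTestClasspath`; changing `entry` to `org/apache/logging/log4j/core/util/ReflectionUtil.class` would locate the other class instead.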

@iemejia (Member Author) commented Nov 6, 2020:

Argh! I was afraid of that too. It clearly looks like a different-classpath issue, which would explain why it breaks so weirdly.

I think the simplest solution would be to let that module provide its own log4j. The part that still makes no sense to me is how the hcatalog module in Beam ends up resolving the new version of log4j. Can you spot which dependency is overwriting the version? Looking at the module's build.gradle file, I don't see any of hcatalog's Beam dependencies requiring log4j. I have checked the Gradle scan too, and I just cannot see who is doing it.
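A sketch of one way the hcatalog module could provide its own log4j for its tests only: a dependency resolve rule that moves every `org.apache.logging.log4j` artifact on testRuntimeClasspath to a single version. The target 2.4.1 is taken from the Hive 2.1.0 dependency tree above. Whether this rule wins over the plugin-wide `force` depends on Gradle's rule ordering, so it should be confirmed with a dependency report rather than assumed:

```groovy
// Sketch for the hcatalog module's build.gradle: align all Log4j 2 artifacts on the
// test runtime classpath to the version Hive 2.1.0 pulls in, so that log4j-api,
// log4j-core and the Log4j SLF4J binding cannot end up on different versions.
configurations.testRuntimeClasspath {
  resolutionStrategy.eachDependency { DependencyResolveDetails details ->
    if (details.requested.group == "org.apache.logging.log4j") {
      details.useVersion "2.4.1"
      details.because "hive-hcatalog-core 2.1.0 needs ReflectionUtil from log4j-api 2.4.1"
    }
  }
}
```

Running `./gradlew -p ./sdks/java/io/hcatalog dependencies --configuration testRuntimeClasspath` after adding the rule would show whether the log4j entries actually resolve to the pinned version. Compared with forcing log4j-api and log4j-core individually, a group-wide rule also covers artifacts such as the SLF4J binding that might otherwise be left behind.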

@suztomo (Contributor) commented Nov 6, 2020:

@iemejia

> Can you spot which dependency is overwriting the version?

BeamModulePlugin.groovy forces the library versions listed in the file.

    def log4j_version = "2.13.3"
...
        log4j_core                                  : "org.apache.logging.log4j:log4j-core:$log4j_version",
...
          config.resolutionStrategy {
...
            def librariesWithVersion = project.library.java.values().findAll { it.split(':').size() > 2 }
            force librariesWithVersion
          }

log4j-api-2.4.1.jar had the missing class, while log4j-api-2.13.3.jar does not have it.

suztomo-macbookpro44% jar tf log4j-api-2.4.1.jar |grep Reflection
org/apache/logging/log4j/util/ReflectionUtil$PrivateSecurityManager.class
org/apache/logging/log4j/util/ReflectionUtil.class
suztomo-macbookpro44% jar tf log4j-api-2.13.3.jar |grep Reflection
suztomo-macbookpro44% 

It seems that org.apache.logging.slf4j.Log4jLoggerFactory, the caller of the missing class, comes from Log4j 2's SLF4J binding (org.apache.logging.log4j:log4j-slf4j-impl), and that binding does not seem compatible with org.apache.logging.log4j:log4j-api:2.13.3.
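To see such a version mismatch directly, a hypothetical helper task (the name `printLog4jVersions` is an assumption) could print the resolved version of every `org.apache.logging.log4j` module on the test runtime classpath. A sketch using Gradle's resolution result API:

```groovy
// Hypothetical diagnostic task: list the resolved version of each Log4j 2 module
// on testRuntimeClasspath, after all forcing and conflict resolution.
tasks.register("printLog4jVersions") {
  doLast {
    def result = configurations.testRuntimeClasspath.incoming.resolutionResult
    // moduleVersion may be null for components that are not modules, hence the check.
    def log4jModules = result.allComponents.collect { it.moduleVersion }.findAll {
      it != null && it.group == "org.apache.logging.log4j"
    }
    log4jModules.sort { it.name }.each { println "${it.group}:${it.name}:${it.version}" }
  }
}
```

Gradle's built-in report, `./gradlew -p ./sdks/java/io/hcatalog dependencyInsight --configuration testRuntimeClasspath --dependency log4j-api`, gives similar information and also explains why a version was selected (for example, because it was forced).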

@iemejia (Member Author) commented Nov 10, 2020:

@suztomo thanks for the pointer; I was not aware we were forcing dependencies. But does forcing all dependencies even make sense? The exclusion I am doing in the hcatalog module in this PR feels hacky, especially because no Beam module requires log4j; it is only required by transitive dependencies.

I saw your work on BEAM-9542 #11168, but it seems that the change to support the GCP BoM undid it. Was this intentional?

@iemejia iemejia changed the title [BEAM-11055] Update log4j to version 2.13.3 [BEAM-11055] Update log4j to version 2.14.0 Nov 12, 2020
@suztomo (Contributor) commented Nov 12, 2020:

My attempt #11168 was not merged. (As per my comment there, I had to upgrade google-api-client at that time and didn't get a chance to come back to it.)

@iemejia (Member Author) commented Nov 14, 2020:

I see; good to know there is awareness of the issue, thanks @suztomo.

I now wonder if my forcing fix even makes sense, since this PR has two goals: (1) upgrade the dependency, and (2) stop automatic security scanners from reporting this dependency as a security issue. Goal (2) clearly won't be met for the hcatalog module if I force the previous version there, even though this is not a Beam problem at all.

WDYT @aromanenko-dev, shall we upgrade as it is, covering (1) for most of the other modules, knowing that (2) would still be an issue for HCatalog?

@iemejia iemejia changed the title [BEAM-11055] Update log4j to version 2.14.0 [BEAM-11055] Update log4j to version 2.14.1 Apr 27, 2021
@codecov bot commented Apr 27, 2021

Codecov Report

Merging #13073 (1958090) into master (a25261a) will decrease coverage by 0.00%.
The diff coverage is n/a.

❗ Current head 1958090 differs from pull request most recent head 8a24eb8. Consider uploading reports for the commit 8a24eb8 to get more accurate results

@@            Coverage Diff             @@
##           master   #13073      +/-   ##
==========================================
- Coverage   83.64%   83.64%   -0.01%     
==========================================
  Files         443      886     +443     
  Lines       59032   118046   +59014     
==========================================
+ Hits        49376    98735   +49359     
- Misses       9656    19311    +9655     
Impacted Files Coverage Δ
...g/benchmarks/nexmark/queries/nexmark_query_util.py
...rcs/sdks/python/apache_beam/io/aws/s3filesystem.py
...m/examples/snippets/transforms/elementwise/keys.py
...ache_beam/runners/interactive/recording_manager.py
...am/examples/snippets/transforms/aggregation/top.py
...srcs/sdks/python/apache_beam/typehints/__init__.py
...apache_beam/runners/dataflow/internal/apiclient.py
...hon/apache_beam/examples/wordcount_with_metrics.py
.../srcs/sdks/python/apache_beam/coders/coder_impl.py
...ache_beam/runners/interactive/caching/cacheable.py
... and 1319 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a25261a...8a24eb8.

@iemejia iemejia force-pushed the BEAM-11055-log4j-api branch 2 times, most recently from 1958090 to 5f82918 on April 27, 2021 16:57
@@ -41,6 +41,17 @@ test {
  ignoreFailures true
}

configurations.testRuntimeClasspath {
  resolutionStrategy {
    def log4j_version = "2.4.1"
Member Author:

Somehow I had to redefine this here to make it work. It is bad to have it repeated, but it is the only workaround I could find.

Member:

Should this be 2.14.1?

Member Author:

Arghh, what a silly mistake, my apologies. I opened #14751 to upgrade this dependency to the minimum version compatible with the Hive dependencies; I'm wondering if we should also cherry-pick it into the release. Thanks for noticing, @aaltay!

Member Author:

It is a pity we cannot move to the latest log4j version because of Hive's HCatalog dependency constraints. The bad part is that this will still trigger warnings from security analysis tools, but I have no idea how to work around this apart from doing a full upgrade of Hive to the latest version, which does not seem to be simple either; for reference, see BEAM-9351.

Member:

Thank you @iemejia. I do not have an opinion about cherry-picking into the release or not. If it helps, I am not aware of any user issues or bug reports; I noticed this by coincidence.

@aromanenko-dev (Contributor):

Run Java PostCommit

@aromanenko-dev (Contributor) left a comment:

LGTM, thanks!

@aromanenko-dev aromanenko-dev merged commit cbc67c8 into apache:master Apr 28, 2021
@iemejia iemejia deleted the BEAM-11055-log4j-api branch April 28, 2021 10:10
    force "org.apache.logging.log4j:log4j-api:$log4j_version"
    force "org.apache.logging.log4j:log4j-core:$log4j_version"
    force library.java.log4j_api
    force library.java.log4j_core
Member:

I think this is no longer necessary with the change to library in BeamModulePlugin.groovy. applyJavaNature does this for all dependencies in library:

if (config.getName() != "errorprone" && !inDependencyUpdates) {
  config.resolutionStrategy {
    // Filtering versionless coordinates that depend on BOM. Beam project needs to set the
    // versions for only handful libraries when building the project (BEAM-9542).
    def librariesWithVersion = project.library.java.values().findAll { it.split(':').size() > 2 }
    force librariesWithVersion
  }
}


5 participants