
[SPARK-38516][BUILD] Add log4j-core and log4j-api to classpath if active hadoop-provided #35811

Closed
wants to merge 4 commits

Conversation

@wangyum (Member) commented Mar 11, 2022

What changes were proposed in this pull request?

Add log4j-core and log4j-api to the classpath when the hadoop-provided profile is active.

Why are the changes needed?

log4j-core is needed:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/logging/log4j/core/Filter
	at java.lang.Class.getDeclaredMethods0(Native Method)
	at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
	at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
	at java.lang.Class.getMethod0(Class.java:3018)
	at java.lang.Class.getMethod(Class.java:1784)
	at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
	at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.apache.logging.log4j.core.Filter
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 7 more

log4j-api is needed:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/logging/log4j/LogManager
	at org.apache.spark.deploy.yarn.SparkRackResolver.<init>(SparkRackResolver.scala:42)
	at org.apache.spark.deploy.yarn.SparkRackResolver$.get(SparkRackResolver.scala:114)
	at org.apache.spark.scheduler.cluster.YarnScheduler.<init>(YarnScheduler.scala:31)
	at org.apache.spark.scheduler.cluster.YarnClusterManager.createTaskScheduler(YarnClusterManager.scala:35)
	at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2985)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:563)
	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2704)
	at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:953)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:947)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:54)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.<init>(SparkSQLCLIDriver.scala:327)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:159)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.logging.log4j.LogManager
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 26 more

log4j-slf4j-impl is not needed: #35811 (comment)

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manual test:

 ./dev/make-distribution.sh --name SPARK-38516 --tgz  -Phive -Phive-thriftserver  -Pyarn -Phadoop-2 -Phadoop-provided
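A quick way to sanity-check such a build is to confirm that both log4j 2.x jars landed in the distribution's jars/ directory. The sketch below simulates that check with a temporary directory, since the real dist path and jar versions depend on the build; the file names are illustrative, not taken from this PR.

```shell
# Minimal sketch of verifying a hadoop-provided distribution ships the
# log4j 2.x jars. The dist layout here is simulated; in a real build the
# jars/ directory is produced by ./dev/make-distribution.sh.
DIST=$(mktemp -d)
mkdir -p "$DIST/jars"
touch "$DIST/jars/log4j-core-2.17.1.jar" \
      "$DIST/jars/log4j-api-2.17.1.jar" \
      "$DIST/jars/spark-core_2.12-3.3.0.jar"

# With this change, both log4j-core and log4j-api should be present:
ls "$DIST/jars" | grep -E '^log4j-(core|api)-' | sort
# prints:
#   log4j-api-2.17.1.jar
#   log4j-core-2.17.1.jar
```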

@github-actions github-actions bot added the BUILD label Mar 11, 2022
@wangyum wangyum requested a review from viirya March 11, 2022 02:06
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-core</artifactId>
<version>${log4j.version}</version>
<scope>${hadoop.deps.scope}</scope>
</dependency>
<dependency>
<!-- API bridge between log4j 1 and 2 -->
@viirya (Member) commented Mar 11, 2022:
Then I think log4j-slf4j-impl is also needed?

@wangyum (Member, Author):
Yes. log4j-slf4j-impl is also needed:

private def isLog4j2(): Boolean = {
// This distinguishes the log4j 1.2 binding, currently
// org.slf4j.impl.Log4jLoggerFactory, from the log4j 2.0 binding, currently
// org.apache.logging.slf4j.Log4jLoggerFactory
val binderClass = StaticLoggerBinder.getSingleton.getLoggerFactoryClassStr
"org.apache.logging.slf4j.Log4jLoggerFactory".equals(binderClass)
}

@eejbyfeldt (Contributor):
If we include log4j-slf4j-impl, won't we then end up with multiple SLF4J bindings on the classpath, since Hadoop already includes slf4j-log4j12?

@wangyum (Member, Author):
Thank you @eejbyfeldt . Removed it.

@wangyum wangyum changed the title [SPARK-38516][BUILD] Add log4j-core and log4j-api to classpath if active hadoop-provided [SPARK-38516][BUILD] Add log4j-core, log4j-api and log4j-slf4j-impl to classpath if active hadoop-provided Mar 11, 2022
@eejbyfeldt (Contributor) commented Mar 11, 2022

I built this branch with ./dev/make-distribution.sh --tgz --name hadoop-provided-test -Phadoop-provided -Pyarn and then ran the org.apache.spark.examples.SparkPi example with Hadoop 3.3.2. The example job runs fine, but I do get SLF4J warnings about multiple bindings:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:../spark-3.3.0-SNAPSHOT-bin-hadoop-provided-test/jars/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:../hadoop-3.3.2/share/hadoop/common/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

@wangyum wangyum changed the title [SPARK-38516][BUILD] Add log4j-core, log4j-api and log4j-slf4j-impl to classpath if active hadoop-provided [SPARK-38516][BUILD] Add log4j-core and log4j-api to classpath if active hadoop-provided Mar 11, 2022
@eejbyfeldt (Contributor) left a comment:

I built this branch using ./dev/make-distribution.sh --tgz --name hadoop-provided-test -Phadoop-provided -Pyarn and ran the SparkPi example; it ran as expected.

I am a bit surprised that having both log4j 2 and log4j 1.2 on the classpath does not cause problems, so I expected that we should maybe also include log4j-1.2-api in the hadoop-provided build. But when doing that, running SparkPi failed with:

Exception in thread "Executor task launch worker-1" java.lang.NoClassDefFoundError: Could not initialize class org.slf4j.MDC
	at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$setMDCForTask(Executor.scala:751)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:441)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)

So that does not seem like it will work.

@dongjoon-hyun (Member) left a comment:

+1, LGTM. Thank you, @wangyum , @viirya , @HyukjinKwon , @eejbyfeldt .
Merged to master for Apache Spark 3.3.0.

5 participants