[SPARK-43534][BUILD] Add log4j-1.2-api and log4j-slf4j2-impl to classpath if active hadoop-provided #41195

Closed
wangyum wants to merge 4 commits

Conversation

wangyum (Member) commented May 17, 2023

What changes were proposed in this pull request?

This PR adds `log4j-1.2-api` and `log4j-slf4j2-impl` to the classpath when the `hadoop-provided` profile is active.

Why are the changes needed?

To fix a logging issue that appears in `hadoop-provided` builds.

How to reproduce this issue:

  1. Build Spark:

    ./dev/make-distribution.sh --name provided --tgz -Phive -Phive-thriftserver -Pyarn -Phadoop-provided
    tar -zxf spark-3.5.0-SNAPSHOT-bin-provided.tgz 
    
  2. Copy the following jars to spark-3.5.0-SNAPSHOT-bin-provided/jars:

    guava-14.0.1.jar
    hadoop-client-api-3.3.5.jar
    hadoop-client-runtime-3.3.5.jar
    hadoop-shaded-guava-1.1.1.jar
    hadoop-yarn-server-web-proxy-3.3.5.jar
    slf4j-api-2.0.7.jar
    
  3. Add a new log4j2.properties to spark-3.5.0-SNAPSHOT-bin-provided/conf:

    rootLogger.level = info
    rootLogger.appenderRef.file.ref = File
    rootLogger.appenderRef.stderr.ref = console
    
    appender.console.type = Console
    appender.console.name = console
    appender.console.target = SYSTEM_ERR
    appender.console.layout.type = PatternLayout
    appender.console.layout.pattern = %d{yy/MM/dd HH:mm:ss,SSS} %p [%t] %c{2}:%L : %m%n
    
    appender.file.type = RollingFile
    appender.file.name = File
    appender.file.fileName = /tmp/spark/logs/spark.log
    appender.file.filePattern = /tmp/spark/logs/spark.%d{yyyyMMdd-HH}.log
    appender.file.append = true
    appender.file.layout.type = PatternLayout
    appender.file.layout.pattern = %d{yy/MM/dd HH:mm:ss,SSS} %p [%t] %c{2}:%L : %m%n
    appender.file.policies.type = Policies
    appender.file.policies.time.type = TimeBasedTriggeringPolicy
    appender.file.policies.time.interval = 1
    appender.file.policies.time.modulate = true
    appender.file.policies.size.type = SizeBasedTriggeringPolicy
    appender.file.policies.size.size = 256M
    appender.file.strategy.type = DefaultRolloverStrategy
    appender.file.strategy.max = 100
    
  4. Start Spark thriftserver: sbin/start-thriftserver.sh.

  5. The log file is empty: cat /tmp/spark/logs/spark.log.

  6. Copy the following jars to spark-3.5.0-SNAPSHOT-bin-provided/jars:

    log4j-1.2-api-2.20.0.jar
    log4j-slf4j2-impl-2.20.0.jar
    
  7. Restart Spark thriftserver: sbin/start-thriftserver.sh.

  8. The log file is not empty: cat /tmp/spark/logs/spark.log.

This is because the hadoop classpath does not provide these jars, so they are still needed even when `hadoop-provided` is activated.
![image](https://github.com/apache/spark/assets/5399861/d53ff28c-05d7-40d7-891f-069449315217)
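
One way to confirm this (a sketch, assuming a local Hadoop 3.3.x installation with `hadoop` on the PATH) is to expand the hadoop classpath and filter for logging artifacts:

```
# Expand the hadoop classpath (--glob resolves wildcard entries) and
# filter for logging jars. log4j-1.2-api and log4j-slf4j2-impl should
# not appear, which is why the Spark distribution must ship them itself.
hadoop classpath --glob | tr ':' '\n' | grep -iE 'log4j|slf4j'
```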

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manual test.

github-actions bot added the BUILD label May 17, 2023
srowen (Member) commented May 17, 2023

Can you explain this more? Your steps to reproduce require you to remove files from the build, and that causes the problem? But the JARs were there before you removed them. How does removing the scope change the result?

wangyum (Member, Author) commented May 17, 2023

Thank you @srowen. I have updated the PR description.

srowen (Member) commented May 17, 2023

It seems weird that the log4j 2 config works if you add log4j 1.x. Maybe so; I'm just trying to figure out if this is really what's going on and whether we have to let log4j 1.x back in, because then we would have both log4j versions in the distribution.

Kimahriman (Contributor) commented:
Maybe a similar reason to why I made #37694 a while ago? Basically, Spark's logging setup assumes log4j 2, but with hadoop-provided you get 1.x from Hadoop, so you get weird logging behavior. We just have both providers on the class path, and SLF4J seems to pick the 2.x one.
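
A quick sketch to see which SLF4J providers are actually on the class path, assuming it is run from the unpacked distribution root:

```
# SLF4J 2.x discovers providers via ServiceLoader
# (META-INF/services/org.slf4j.spi.SLF4JServiceProvider); 1.x bindings
# use org.slf4j.impl.StaticLoggerBinder. Scan the jars for both.
for j in jars/*.jar; do
  unzip -l "$j" | grep -q 'org.slf4j.spi.SLF4JServiceProvider' \
    && echo "SLF4J 2.x provider: $j"
  unzip -l "$j" | grep -q 'org/slf4j/impl/StaticLoggerBinder.class' \
    && echo "SLF4J 1.x binding:  $j"
done
```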

wangyum requested a review from viirya May 18, 2023 02:38
Comment on lines 762 to 764
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-1.2-api</artifactId>
<version>${log4j.version}</version>
This looks okay. This is the bridge between log4j 1 and log4j 2. In the case of hadoop-provided, we still need it, as the hadoop classpath doesn't provide it.
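
After rebuilding with this change, a quick sanity check that the two jars now land in the distribution (a sketch, reusing the paths from the reproduction steps above):

```
# Both jars should now be present even though -Phadoop-provided was used.
ls spark-3.5.0-SNAPSHOT-bin-provided/jars | grep -E 'log4j-1.2-api|log4j-slf4j2-impl'
```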

srowen (Member) commented May 19, 2023

Seems reasonable then. Let's just get the tests to run again.

Kimahriman (Contributor) commented May 19, 2023

I actually hit a new issue related to this after finally being able to test out Spark 3.4 from the Delta release. Because of the bump to SLF4J 2, it seems `log4j-slf4j2-impl` doesn't get recognized, so it always falls back to log4j/reload4j 1.x for logging. I had to also include `slf4j-api` 2.0.6 to get Spark's built-in log handling (i.e. `SparkContext.setLogLevel`) to work correctly.
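
For reference, one way to pull that jar in (a sketch; `mvn dependency:get` downloads into the local Maven repository, and the `~/.m2` path is an assumption about a default setup):

```
# Fetch slf4j-api 2.0.6 and drop it next to the log4j 2.x jars.
mvn dependency:get -Dartifact=org.slf4j:slf4j-api:2.0.6
cp ~/.m2/repository/org/slf4j/slf4j-api/2.0.6/slf4j-api-2.0.6.jar \
   spark-3.5.0-SNAPSHOT-bin-provided/jars/
```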

wangyum (Member, Author) commented May 20, 2023

@Kimahriman Do you have a way to reproduce it?

Kimahriman (Contributor) commented:
Never mind, it looks like including `log4j-1.2-api` actually fixes the issue; I wasn't including that before. In fact, just including `log4j-1.2-api` fixes the issue I filed as https://issues.apache.org/jira/browse/SPARK-40246. That might not fix custom log4j2 configs, but it at least makes `SparkContext.setLogLevel` work with a hadoop-provided build.
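
A minimal sketch to check that `setLogLevel` actually takes effect, run from the distribution root:

```
# With the bridge wired up, switching to DEBUG should visibly increase
# log output; with the broken setup the call is silently ignored.
echo 'sc.setLogLevel("DEBUG"); spark.range(10).count()' | bin/spark-shell
```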

srowen (Member) commented May 20, 2023

Merged to master

srowen closed this in 5d03950 on May 20, 2023
Kimahriman pushed a commit to Kimahriman/spark that referenced this pull request Jun 20, 2023
[SPARK-43534][BUILD] Add log4j-1.2-api and log4j-slf4j2-impl to classpath if active hadoop-provided
Closes apache#41195 from wangyum/SPARK-43534.

Lead-authored-by: Yuming Wang <wgyumg@gmail.com>
Co-authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Sean Owen <srowen@gmail.com>