[SPARK-43534][BUILD] Add log4j-1.2-api and log4j-slf4j2-impl to classpath if active hadoop-provided #41195
Conversation
Can you explain this more? Your steps to reproduce require you to remove files from the build, and that causes the problem? But the JARs were there before you removed them. How does removing the scope change the result?

Thank you @srowen. I have updated the PR description.

It seems weird that the log4j 2 config works if you add log4j 1.x. Maybe so; just trying to figure out if this is really what's going on and if we have to let log4j 1.x back in, because then we have both log4j versions in the distribution.

Maybe a similar reason to why I made #37694 a while ago? Basically Spark's logging setup assumes log4j 2, but with hadoop-provided you get 1.x from Hadoop, so you get weird logging behavior. We just have both providers on the classpath, and SLF4J seems to pick the 2.x one.
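As a rough way to see which provider SLF4J could pick up, one can list the logging-related jars that ended up in the distribution. This is only a sketch; `SPARK_HOME` and the `/opt/spark` fallback are assumed paths, not something from this PR:

```shell
# List logging-related jars in a Spark distribution's jars/ directory.
# SPARK_HOME is an assumed install location; adjust to your setup.
SPARK_HOME="${SPARK_HOME:-/opt/spark}"
ls "$SPARK_HOME/jars" 2>/dev/null | grep -E 'log4j|slf4j' | sort \
  || echo "no logging jars found under $SPARK_HOME/jars"
```

If both a 1.x binding and `log4j-slf4j2-impl` show up, which one wins depends on SLF4J's provider resolution.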
```xml
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-1.2-api</artifactId>
<version>${log4j.version}</version>
```
This looks okay. This is the bridge between log4j 1.x and log4j 2. In the case of `hadoop-provided`, we still need it because the hadoop classpath doesn't provide it.
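The bridge claim is easy to check: `log4j-1.2-api` ships the legacy `org.apache.log4j` package re-implemented on top of log4j 2. A quick sketch (the jar filename here is the version used in this repro; point it at the copy in your own distribution):

```shell
# The bridge jar supplies the old org.apache.log4j entry points, so legacy
# log4j 1.x callers are routed to log4j 2 at runtime. Confirm the class exists:
JAR="${JAR:-log4j-1.2-api-2.20.0.jar}"
unzip -l "$JAR" 2>/dev/null | grep 'org/apache/log4j/Logger.class' \
  || echo "Logger.class not found (is $JAR present?)"
```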
Seems reasonable then. Let's just get the tests to run again.
Actually hit a new issue related to this after finally being able to test out Spark 3.4 from the Delta release. Because of the bump to slf4j 2, it seems
@Kimahriman Do you have a way to reproduce?
Never mind, it looks like including
Merged to master |
[SPARK-43534][BUILD] Add log4j-1.2-api and log4j-slf4j2-impl to classpath if active hadoop-provided

### What changes were proposed in this pull request?

This PR adds `log4j-1.2-api` and `log4j-slf4j2-impl` to the classpath if `hadoop-provided` is active.

### Why are the changes needed?

To fix a logging issue. How to reproduce it:

1. Build Spark:
   ```
   ./dev/make-distribution.sh --name provided --tgz -Phive -Phive-thriftserver -Pyarn -Phadoop-provided
   tar -zxf spark-3.5.0-SNAPSHOT-bin-provided.tgz
   ```
2. Copy the following jars to spark-3.5.0-SNAPSHOT-bin-provided/jars:
   ```
   guava-14.0.1.jar
   hadoop-client-api-3.3.5.jar
   hadoop-client-runtime-3.3.5.jar
   hadoop-shaded-guava-1.1.1.jar
   hadoop-yarn-server-web-proxy-3.3.5.jar
   slf4j-api-2.0.7.jar
   ```
3. Add a new log4j2.properties to spark-3.5.0-SNAPSHOT-bin-provided/conf:
   ```
   rootLogger.level = info
   rootLogger.appenderRef.file.ref = File
   rootLogger.appenderRef.stderr.ref = console

   appender.console.type = Console
   appender.console.name = console
   appender.console.target = SYSTEM_ERR
   appender.console.layout.type = PatternLayout
   appender.console.layout.pattern = %d{yy/MM/dd HH:mm:ss,SSS} %p [%t] %c{2}:%L : %m%n

   appender.file.type = RollingFile
   appender.file.name = File
   appender.file.fileName = /tmp/spark/logs/spark.log
   appender.file.filePattern = /tmp/spark/logs/spark.%d{yyyyMMdd-HH}.log
   appender.file.append = true
   appender.file.layout.type = PatternLayout
   appender.file.layout.pattern = %d{yy/MM/dd HH:mm:ss,SSS} %p [%t] %c{2}:%L : %m%n
   appender.file.policies.type = Policies
   appender.file.policies.time.type = TimeBasedTriggeringPolicy
   appender.file.policies.time.interval = 1
   appender.file.policies.time.modulate = true
   appender.file.policies.size.type = SizeBasedTriggeringPolicy
   appender.file.policies.size.size = 256M
   appender.file.strategy.type = DefaultRolloverStrategy
   appender.file.strategy.max = 100
   ```
4. Start Spark thriftserver: `sbin/start-thriftserver.sh`.
5. The log file is empty: `cat /tmp/spark/logs/spark.log`.
6. Copy the following jars to spark-3.5.0-SNAPSHOT-bin-provided/jars:
   ```
   log4j-1.2-api-2.20.0.jar
   log4j-slf4j2-impl-2.20.0.jar
   ```
7. Restart Spark thriftserver: `sbin/start-thriftserver.sh`.
8. The log file is not empty: `cat /tmp/spark/logs/spark.log`.

This is because the hadoop classpath does not contain these jars, so they are needed even if `hadoop-provided` is activated.

![image](https://github.com/apache/spark/assets/5399861/d53ff28c-05d7-40d7-891f-069449315217)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual test.

Closes apache#41195 from wangyum/SPARK-43534.

Lead-authored-by: Yuming Wang <wgyumg@gmail.com>
Co-authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
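As a quick sanity check after step 8, one can test whether the file appender actually wrote anything. This is only a sketch: the path comes from the repro's `log4j2.properties`, and `LOG_FILE` is a variable name introduced here for illustration:

```shell
# Check whether the RollingFile appender produced output after restarting
# the thriftserver. The path matches appender.file.fileName in the repro config.
LOG_FILE="${LOG_FILE:-/tmp/spark/logs/spark.log}"
if [ -s "$LOG_FILE" ]; then
  echo "log file has content"
else
  echo "log file missing or empty"
fi
```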