[SPARK-38446][Core] Fix deadlock between ExecutorClassLoader and FileDownloadCallback caused by Log4j #35765
yaooqinn wants to merge 1 commit into apache:master
cc @viirya

    override def onFailure(streamId: String, cause: Throwable): Unit = {
      logDebug(s"Error downloading stream $streamId.", cause)

So, the deadlock happens when we use DEBUG log level?
Log4j calls loadClass again through ExecutorClassLoader when it logs; if the log level is not satisfied, the call is skipped.
Before the deadlock occurs, I see something very similar to #10337 (comment)
Hmm, after I read the description, I think shouldn't we fix it at
thanks, @viirya,
cc @MaxGekk since he is a release manager.
dongjoon-hyun left a comment
+1, LGTM. Since this is a removal of a log statement, there is no functional regression.
Let me merge this; I hope it can unblock the Apache Kyuubi project and the Apache Spark 3.3.0 release.
Thank you, @yaooqinn, @srowen, @viirya, @cloud-fan.
…DownloadCallback caused by Log4j

### What changes were proposed in this pull request?

While `log4j.ignoreTCL/log4j2.ignoreTCL` is false, which is the default, Log4j uses the context ClassLoader of the current thread; see `org.apache.logging.log4j.util.LoaderUtil.loadClass`.

When ExecutorClassLoader tries to load a class remotely through the FileDownload and an error occurs, we log at debug level, and `log4j...LoaderUtil` is then blocked by the class-loading lock that ExecutorClassLoader has acquired.

Fortunately, this only happens when the ThresholdFilter's level is `debug`.

Alternatively, we could set `log4j.ignoreTCL/log4j2.ignoreTCL` to true, but I don't know what else that would cause.

So in this PR, I simply remove the debug log that causes this deadlock.

### Why are the changes needed?

fix deadlock

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

apache/kyuubi#2046 (comment), with a unit test in the Kyuubi project; resolved (https://github.com/apache/incubator-kyuubi/actions/runs/1950222737)

### Additional Resources

[ut.jstack.txt](https://github.com/apache/spark/files/8206457/ut.jstack.txt)

Closes #35765 from yaooqinn/SPARK-38446.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit aef6745)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
thanks @dongjoon-hyun and all
What changes were proposed in this pull request?
While `log4j.ignoreTCL/log4j2.ignoreTCL` is false, which is the default, Log4j uses the context ClassLoader of the current thread; see `org.apache.logging.log4j.util.LoaderUtil.loadClass`.
When ExecutorClassLoader tries to load a class remotely through the FileDownload and an error occurs, we log at debug level, and `log4j...LoaderUtil` is then blocked by the class-loading lock that ExecutorClassLoader has acquired.
Fortunately, this only happens when the ThresholdFilter's level is `debug`.
Alternatively, we could set `log4j.ignoreTCL/log4j2.ignoreTCL` to true, but I don't know what else that would cause.
So in this PR, I simply remove the debug log that causes this deadlock.
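The lock interaction described here can be illustrated with a small standalone sketch. It is not Spark or Log4j code: `DeadlockSketch`, `classLoadingLock`, `downloadFinished`, and `logBeforeNotify` are hypothetical names, a plain `ReentrantLock` stands in for the ExecutorClassLoader class-loading lock, and a `CountDownLatch` stands in for the download signalling its result. It also assumes that the callback notifies the waiting loader only after its log call returns. Timeouts replace the indefinite waits so the sketch terminates instead of hanging.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean
import java.util.concurrent.locks.ReentrantLock

object DeadlockSketch {
  // Returns true if the "loader" thread learned about the failed download in time.
  def run(logBeforeNotify: Boolean): Boolean = {
    val classLoadingLock = new ReentrantLock()    // stand-in for the class-loading lock
    val downloadFinished = new CountDownLatch(1)  // stand-in for the download result signal
    val lockHeld = new CountDownLatch(1)
    val notified = new AtomicBoolean(false)

    // Loader thread: holds the class-loading lock while waiting for the download.
    val loader = new Thread(() => {
      classLoadingLock.lock()
      try {
        lockHeld.countDown()
        // The real wait is unbounded; a timeout keeps this sketch from hanging.
        notified.set(downloadFinished.await(500, TimeUnit.MILLISECONDS))
      } finally classLoadingLock.unlock()
    })

    // Callback thread: reports the failure, optionally "logging" first.
    val callback = new Thread(() => {
      lockHeld.await()
      if (logBeforeNotify) {
        // Stand-in for a debug log whose class lookup needs the same lock.
        if (classLoadingLock.tryLock(2, TimeUnit.SECONDS)) classLoadingLock.unlock()
      }
      downloadFinished.countDown()
    })

    loader.start(); callback.start()
    loader.join(); callback.join()
    notified.get()
  }
}
```

With `logBeforeNotify = true` the callback cannot take the lock until the loader gives up waiting, so the loader is never notified in time; without timeouts, both threads would wait on each other forever. With `logBeforeNotify = false`, which mirrors removing the debug log, the loader is notified immediately.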
Why are the changes needed?
fix deadlock
Does this PR introduce any user-facing change?
no
How was this patch tested?
apache/kyuubi#2046 (comment), with a unit test in the Kyuubi project; resolved (https://github.com/apache/incubator-kyuubi/actions/runs/1950222737)
Additional Resources
ut.jstack.txt