[Spark] Support Spark Structured Logging API #3146
Conversation
Force-pushed from 4f0bcac to a5f2efb, then from a5f2efb to cd4e2d4.
 * All structured logging `keys` used in `MDC` must extend `LogKeyShims`
 */
trait LogKeyShims {
  val name: String = this.toString.toLowerCase(Locale.ROOT)
There is a recent change in https://github.com/apache/spark/pull/46634/files#diff-41ea575e2e1305fa8a92f4e3f549e7b896d97eac0e43e00d3dc65c816846e349R52. Let's wait until that PR is merged (probably this week).
Rebased with this change
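For readers following the diff above, the name-derivation behaviour can be sketched in isolation. A minimal, self-contained sketch; the key objects `TABLE_NAME` and `NUM_FILES` are hypothetical, not keys from this PR:

```scala
import java.util.Locale

// Sketch of the shim trait from the diff above: the MDC key name is
// derived from the implementing object's toString, lowercased.
trait LogKeyShims {
  val name: String = this.toString.toLowerCase(Locale.ROOT)
}

// Hypothetical key objects; a case object's toString is its own name.
case object TABLE_NAME extends LogKeyShims
case object NUM_FILES extends LogKeyShims

object LogKeyNameDemo extends App {
  println(TABLE_NAME.name) // table_name
  println(NUM_FILES.name)  // num_files
}
```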
 * Various keys used for mapped diagnostic contexts (MDC) in logging. All structured logging keys
 * should be defined here for standardization.
 */
object LogKeys {
QQ: I plan to reduce the number of log keys in Apache Spark. Meanwhile, there will be new log keys from Delta in this file. How will we keep it in sync with Apache Spark?
Shall we just store the log keys for Delta logging in this file?
Yeah, we can and should just store the keys that Delta uses. This means potentially duplicating some keys with Spark in this file (for the keys that Delta uses), but that's fine.
I removed the unused keys.
LGTM
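The agreed direction, keeping only the keys Delta actually uses, might look like the following sketch. The key names here are illustrative, not the actual contents of the file:

```scala
import java.util.Locale

// Shim trait, as in the diff above.
trait LogKeyShims {
  val name: String = this.toString.toLowerCase(Locale.ROOT)
}

// Illustrative only: a trimmed LogKeys object holding just the keys Delta
// uses, even if some of them also exist in Apache Spark's own LogKeys.
object LogKeys {
  case object PATH extends LogKeyShims
  case object VERSION extends LogKeyShims
  case object OPERATION extends LogKeyShims
}
```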
import java.util.Locale

// LogKey is part of Spark's Structured Logging API and is not available in Spark 3.5.
trait LogKeyShims {
Why not singular (LogKeyShim)?
This follows the naming convention under https://github.com/delta-io/delta/tree/master/spark/src/main/scala-spark-3.5/shims
 */

/*
 * Copyright (2021) The Delta Lake Project Authors.
Since it's a new file, use 2024
It looks like every file under spark/ always uses 2021.
// MDC is part of Spark's Structured Logging API and is not available in Spark 3.5.
case class MDC(key: LogKeyShims, value: Any) {
  require(!value.isInstanceOf[MessageWithContext],
    "the class of value cannot be MessageWithContext")
s/cannot/must
Please add the type to the error OR remove the error message altogether (since it does not add much anyway).
This is a stripped-down version from https://github.com/apache/spark/blob/bdcb79f23da3d09469910508426a54a78adcbda6/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala#L78 to provide the syntax support, without any functionality.
I prefer to keep it as is if it doesn't fix anything.
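To illustrate the `require` guard under discussion, here is a self-contained sketch. The placeholder `MessageWithContext` and the `PATH` key are invented for the demo, mirroring the shim shapes in the diff:

```scala
import java.util.Locale

// Placeholder mirroring the shim type referenced by the guard.
case class MessageWithContext(message: String)

trait LogKeyShims {
  val name: String = this.toString.toLowerCase(Locale.ROOT)
}
case object PATH extends LogKeyShims

case class MDC(key: LogKeyShims, value: Any) {
  require(!value.isInstanceOf[MessageWithContext],
    "the class of value cannot be MessageWithContext")
}

object MDCGuardDemo extends App {
  // Plain values are accepted.
  val ok = MDC(PATH, "/tmp/delta-table")

  // Wrapping an already-built MessageWithContext fails fast
  // (require throws IllegalArgumentException).
  val rejected =
    try { MDC(PATH, MessageWithContext("oops")); false }
    catch { case _: IllegalArgumentException => true }

  println(s"ok=${ok.value}, rejected=$rejected")
}
```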
  new LogEntry(msgWithCtx)
}

trait LoggingShims extends Logging {
A singular?
ditto
@jaceklaskowski which of your comments are blocking? I see you have used the "Request changes" option.
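For context on what the shimmed call sites look like, here is a rough, self-contained sketch of the idea. The names follow the shims in this PR, but the interpolator body is simplified guesswork: on Spark 3.5 the `log` interpolator can only flatten MDC values into plain text.

```scala
import java.util.Locale

object LoggingShimSketch {
  trait LogKeyShims { val name: String = this.toString.toLowerCase(Locale.ROOT) }
  case object TABLE_NAME extends LogKeyShims
  case class MDC(key: LogKeyShims, value: Any)

  // Simplified stand-in for the 3.5 shim: splice MDC values into the
  // string via standard s-interpolation, discarding the keys.
  implicit class LogStringContext(val sc: StringContext) {
    def log(mdcs: MDC*): String = sc.s(mdcs.map(_.value): _*)
  }
}

object LogInterpolatorDemo extends App {
  import LoggingShimSketch._
  val msg = log"Loaded table ${MDC(TABLE_NAME, "events")}"
  println(msg) // Loaded table events
}
```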
#### Which Delta project/connector is this regarding?
- [X] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description
#3146 added support for spark master structured logging, but broke the test logging for delta-spark against Spark 3.5. This PR fixes that. The broken logging looked like this (https://github.com/delta-io/delta/actions/runs/10856009436/job/30129811815):

```
ERROR StatusConsoleListener Unable to locate plugin type for JsonTemplateLayout
ERROR StatusConsoleListener Unable to locate plugin for JsonTemplateLayout
ERROR StatusConsoleListener Could not create plugin of type class org.apache.logging.log4j.core.appender.FileAppender for element File: java.lang.NullPointerException
java.lang.NullPointerException
	at org.apache.logging.log4j.core.config.plugins.visitors.PluginElementVisitor.findNamedNode(PluginElementVisitor.java:104)
```

## How was this patch tested?
GitHub CI tests.

## Does this PR introduce _any_ user-facing changes?
No.
#### Which Delta project/connector is this regarding?
Spark

## Description
Uber issue: #3145

Support the Spark Structured Logging API by using a shimming layer. Spark Structured Logging is only available in the Spark 4.0 snapshot, so the API needs to be shimmed to compile against Spark 3.5.
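To make the difference concrete, here is a small self-contained sketch; the message and keys are invented for illustration. On Spark master the MDC key/value context is preserved alongside the text, while the Spark 3.5 shim can only emit the flattened string:

```scala
object RenderDemo extends App {
  case class MDC(key: String, value: Any)

  // An interpolated message split into literal parts and MDC arguments.
  val parts = Seq("Committed version ", " to ", "")
  val mdcs  = Seq(MDC("version", 12), MDC("path", "/tmp/delta-table"))

  // Spark 3.5 shim behaviour: flatten everything to plain text.
  val plain =
    parts.head + mdcs.zip(parts.tail).map { case (m, p) => m.value.toString + p }.mkString

  // Spark master behaviour (sketched): keep the key/value context so a
  // structured (e.g. JSON) layout can render it.
  val context = mdcs.map(m => m.key -> m.value.toString).toMap

  println(plain) // Committed version 12 to /tmp/delta-table
  println(context)
}
```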
## How was this patch tested?
- DeltaStructuredLoggingSuite tests that the new API on Spark master can produce structured logs.
- DeltaPatternLoggingSuite tests that the shimming API on Spark 3.5 still produces the same plain-text logs.

## Does this PR introduce any user-facing changes?
No.