
[Spark] Support Spark Structured Logging API #3146

Merged 7 commits into delta-io:master from shim-structured-logging on Jun 13, 2024

Conversation

zedtang
Collaborator

@zedtang zedtang commented May 23, 2024

Which Delta project/connector is this regarding?

- [X] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

Description

Uber issue: #3145

Support the Spark Structured Logging API by using a shimming layer. Spark Structured Logging is only available in Spark 4.0 snapshots, so the API needs to be shimmed to compile against Spark 3.5.
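The shim approach can be sketched roughly as follows. This is a simplified, self-contained illustration, not the PR's actual implementation: the `LogKeyShims` and `MDC` names mirror the diff fragments in this conversation, but the plain-text rendering via a `log` interpolator is an assumption about how the 3.5 shim behaves.

```scala
import java.util.Locale

// Rough sketch of the shim idea: keep the Spark 4.0 structured-logging
// syntax compiling on Spark 3.5, while producing ordinary plain-text output.
object StructuredLoggingShimSketch {
  // All structured logging keys used in MDC extend this trait.
  trait LogKeyShims {
    // Key name derived from the object's toString, lowercased.
    val name: String = this.toString.toLowerCase(Locale.ROOT)
  }

  object LogKeys {
    case object TABLE_NAME extends LogKeyShims
  }

  // On Spark 3.5 the MDC wrapper carries no structure; only the value is used.
  case class MDC(key: LogKeyShims, value: Any)

  // A `log"..."` interpolator that renders plain text, so code written
  // against the structured-logging syntax still compiles on Spark 3.5.
  implicit class LogStringContext(val sc: StringContext) extends AnyVal {
    def log(args: MDC*): String = sc.s(args.map(_.value): _*)
  }

  def sample: String =
    log"Vacuuming table ${MDC(LogKeys.TABLE_NAME, "events")}"
}
```

On Spark master, the same call sites resolve to Spark's real `MDC` and `log` interpolator and emit structured JSON instead.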

How was this patch tested?

- `DeltaStructuredLoggingSuite` tests that the new API on Spark master can produce structured logs.
- `DeltaPatternLoggingSuite` tests that the shim API on Spark 3.5 still produces the same plain-text logs.

Does this PR introduce any user-facing changes?

No.

@zedtang zedtang self-assigned this May 23, 2024
```scala
/**
 * All structured logging `keys` used in `MDC` must extend `LogKeyShims`.
 */
trait LogKeyShims {
  val name: String = this.toString.toLowerCase(Locale.ROOT)
}
```
Contributor
There is a recent change in https://github.com/apache/spark/pull/46634/files#diff-41ea575e2e1305fa8a92f4e3f549e7b896d97eac0e43e00d3dc65c816846e349R52. Let's wait until that PR is merged (probably this week).

Collaborator Author

Rebased with this change

```scala
/**
 * Various keys used for mapped diagnostic contexts (MDC) in logging. All structured logging keys
 * should be defined here for standardization.
 */
object LogKeys {
  // ...
}
```
Contributor

QQ: I plan to reduce the number of log keys in Apache Spark. Meanwhile, there will be new log keys from Delta in this file. How will we keep it in sync with Apache Spark?
Shall we just store the log keys for Delta logging in this file?

Collaborator Author

@zedtang zedtang May 29, 2024

Yeah, we can and should just store the keys that Delta uses. This means potentially duplicating some keys with Spark in this file (for the keys that Delta uses), but that's fine.

I removed the unused keys.

Contributor

@gengliangwang gengliangwang left a comment

LGTM

```scala
import java.util.Locale

// LogKey is part of Spark's Structured Logging API and is not available in Spark 3.5.
trait LogKeyShims {
  // ...
}
```
Contributor

Why not singular (LogKeyShim)?

Collaborator Author

```scala
 */

/*
 * Copyright (2021) The Delta Lake Project Authors.
```
Contributor

Since it's a new file, use 2024

Collaborator Author

It looks like every file under spark/ always uses 2021.

```scala
// MDC is part of Spark's Structured Logging API and is not available in Spark 3.5.
case class MDC(key: LogKeyShims, value: Any) {
  require(!value.isInstanceOf[MessageWithContext],
    "the class of value cannot be MessageWithContext")
}
```
Contributor

s/cannot/must

Please add the type to the error OR remove the error message altogether (since it does not add much anyway).

Collaborator Author

This is a stripped-down version from https://github.com/apache/spark/blob/bdcb79f23da3d09469910508426a54a78adcbda6/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala#L78 to provide the syntax support, without any functionality.
I prefer to keep it as is if it doesn't fix anything.
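The `require` guard under discussion can be exercised with a minimal stand-in. Everything here is a sketch: `MessageWithContext` is a stub (the real class lives in Spark's `Logging` internals), and the key and object names are illustrative.

```scala
import java.util.Locale

object MdcGuardSketch {
  trait LogKeyShims {
    val name: String = this.toString.toLowerCase(Locale.ROOT)
  }
  case object PATH extends LogKeyShims

  // Stub standing in for Spark's MessageWithContext.
  case class MessageWithContext(message: String)

  // Same guard as the shim: an MDC value must not itself be a
  // MessageWithContext. require throws IllegalArgumentException on violation.
  case class MDC(key: LogKeyShims, value: Any) {
    require(!value.isInstanceOf[MessageWithContext],
      "the class of value cannot be MessageWithContext")
  }

  // Returns true when the guard rejects a nested MessageWithContext.
  def guardRejectsNested: Boolean =
    try { MDC(PATH, MessageWithContext("oops")); false }
    catch { case _: IllegalArgumentException => true }
}
```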

```scala
  new LogEntry(msgWithCtx)
}

trait LoggingShims extends Logging {
```
Contributor

A singular?

Collaborator Author

ditto
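How a `LoggingShims`-style trait can forward structured-style calls to plain logging can be sketched with a stand-in `Logging` trait. This is an assumption-laden illustration: the real trait extends Spark's `Logging`, and the capture buffer below exists only so the behavior is observable without a logging backend.

```scala
import scala.collection.mutable

object LoggingShimsSketch {
  // Stand-in for Spark 3.5's plain-text Logging trait.
  trait Logging {
    val captured: mutable.Buffer[String] = mutable.Buffer.empty
    protected def logInfo(msg: => String): Unit = captured += msg
  }

  // Stand-in for the shim's LogEntry wrapper around a rendered message.
  class LogEntry(val message: String)

  // On Spark 3.5 the "structured" overload simply forwards the rendered text
  // to the ordinary plain-text logInfo.
  trait LoggingShims extends Logging {
    protected def logInfo(entry: LogEntry): Unit = logInfo(entry.message)
  }

  object Component extends LoggingShims {
    def run(): Unit = logInfo(new LogEntry("vacuum started"))
  }
}
```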

Collaborator Author

@zedtang zedtang left a comment

@jaceklaskowski which of your comments are blocking? I see you have used the "request changes" option.


@allisonport-db allisonport-db merged commit 7a451dc into delta-io:master Jun 13, 2024
10 checks passed
@zedtang zedtang deleted the shim-structured-logging branch June 13, 2024 22:47
scottsand-db added a commit that referenced this pull request Sep 20, 2024
#### Which Delta project/connector is this regarding?

- [X] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description

#3146 added support for Spark master structured logging, but broke the test logging for delta-spark against Spark 3.5. This PR fixes that.

The broken logging looked like this
(https://github.com/delta-io/delta/actions/runs/10856009436/job/30129811815):

```
ERROR StatusConsoleListener Unable to locate plugin type for JsonTemplateLayout
ERROR StatusConsoleListener Unable to locate plugin for JsonTemplateLayout
ERROR StatusConsoleListener Could not create plugin of type class org.apache.logging.log4j.core.appender.FileAppender for element File: java.lang.NullPointerException
 java.lang.NullPointerException
	at org.apache.logging.log4j.core.config.plugins.visitors.PluginElementVisitor.findNamedNode(PluginElementVisitor.java:104)
```
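For context (an assumption, not taken from this PR's diff): "Unable to locate plugin" errors for `JsonTemplateLayout` are what log4j2 prints when a configuration references that layout but the separate `log4j-layout-template-json` artifact is missing from the test classpath. A hypothetical log4j2 properties fragment of the kind that depends on it:

```
# Hypothetical log4j2 test configuration (appender names are illustrative).
# JsonTemplateLayout is provided by the log4j-layout-template-json artifact;
# without it on the classpath, log4j2 emits the errors shown above.
appender.structured.type = File
appender.structured.name = structured
appender.structured.fileName = target/unit-tests.log
appender.structured.layout.type = JsonTemplateLayout
appender.structured.layout.eventTemplateUri = classpath:LogstashJsonEventLayoutV1.json
```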

## How was this patch tested?

GitHub CI tests.

## Does this PR introduce _any_ user-facing changes?

No.