Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-48134][CORE] Spark core (java side): Migrate error/warn/info with variables to structured logging framework #46390

Closed
wants to merge 7 commits into from

Conversation

panbingkun
Copy link
Contributor

@panbingkun panbingkun commented May 5, 2024

What changes were proposed in this pull request?

The pr aims to
1.migrate error/warn/info in module core with variables to structured logging framework for java side.
2.convert all dependencies on org.slf4j.Logger & org.slf4j.LoggerFactory to org.apache.spark.internal.Logger & org.apache.spark.internal.LoggerFactory, in order to completely prohibit importing org.slf4j.Logger & org.slf4j.LoggerFactory in java code later.

Why are the changes needed?

To enhance Apache Spark's logging system by implementing structured logging.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  • Pass GA.

Was this patch authored or co-authored using generative AI tooling?

No.

… with variables to structured logging framework
@github-actions github-actions bot added the CORE label May 5, 2024
@@ -34,10 +34,26 @@ public class Logger {
this.slf4jLogger = slf4jLogger;
}

public boolean isErrorEnabled() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is found that some logic uses this method during migration.

slf4jLogger.error(msg, arg1, arg2);
}

public void error(String msg, String... args) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the method signature not public void error(String msg, Object... args)
Because it conflicts with public void error(String msg, MDC... mdcs)
When we want to use error(String msg, MDC... mdcs), it will enter error(String msg, Object... args)

@@ -13,13 +13,6 @@
*/
package org.apache.spark.io;

import com.google.common.base.Preconditions;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking this chance, let's adjust the import order of java code

@@ -42,7 +45,7 @@
*/
public class ReadAheadInputStream extends InputStream {

private static final Logger logger = LoggerFactory.getLogger(ReadAheadInputStream.class);
private static final Logger LOGGER = LoggerFactory.getLogger(ReadAheadInputStream.class);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's capitalize variables of type static final.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest avoiding such unnecessary change.
In https://www.slf4j.org/api/org/slf4j/Logger.html, the example code is using logger as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay

requestingConsumer);
if (LOGGER.isDebugEnabled()) {
LOGGER.debug("Task {} acquired {} for {}", String.valueOf(taskAttemptId),
Utils.bytesToString(got), requestingConsumer.toString());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This place does not need to check whether requestingConsumer is null, because it has been checked previously as following:
image

@panbingkun
Copy link
Contributor Author

cc @gengliangwang

@panbingkun panbingkun marked this pull request as ready for review May 6, 2024 10:42
@@ -205,7 +208,7 @@ private void closeUnderlyingInputStreamIfNecessary() {
try {
underlyingInputStream.close();
} catch (IOException e) {
logger.warn(e.getMessage(), e);
LOGGER.warn("{}", e, MDC.of(LogKeys.REASON$.MODULE$, e.getMessage()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

REASON => ERROR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay

MDC.of(LogKeys.THREAD_ID$.MODULE$, Thread.currentThread().getId()),
MDC.of(LogKeys.MEMORY_SIZE$.MODULE$, Utils.bytesToString(getMemoryUsage())),
MDC.of(LogKeys.NUM_SPILL_WRITERS$.MODULE$, spillWriters.size()),
MDC.of(LogKeys.TIME_UNIT$.MODULE$, spillWriters.size() > 1 ? " times" : " time"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SPILL_TIMES?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

dongjoon-hyun pushed a commit that referenced this pull request May 6, 2024
…uctured logging framework

### What changes were proposed in this pull request?

Since we are targeting on migration INFO/WARN/ERROR level logs to structure logging, this PR removes the logDebug and logTrace methods from the JAVA structured logging framework.

### Why are the changes needed?

In the log migration PR #46390, there are unnecessary changes such as updating
```
logger.debug("Task {} need to spill {} for {}", taskAttemptId,
            Utils.bytesToString(required - got), requestingConsumer);
```
to
```
LOGGER.debug("Task {} need to spill {} for {}", String.valueOf(taskAttemptId),
            Utils.bytesToString(required - got), requestingConsumer.toString());
```

With this PR, we can avoid such changes during log migrations.
### Does this PR introduce _any_ user-facing change?

No
### How was this patch tested?

Existing UT.
### Was this patch authored or co-authored using generative AI tooling?

No

Closes #46405 from gengliangwang/updateJavaLog.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
@panbingkun
Copy link
Contributor Author

@gengliangwang
All done.
Thanks.

@@ -17,21 +17,22 @@

package org.apache.spark.util.collection.unsafe.sort;

import java.io.*;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's revert the unnecessary changes in this file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it the import order of the file header, or from org.slf4j.Logger & org.slf4j.LoggerFactory to org.apache.spark.internal.Logger &org.apache.spark.internal.LoggerFactory
The later change is to prohibit the import of org.slf4j.Logger & org.slf4j.LoggerFactory in java code later, so as to prevent subsequent developers from introducingorg.slf4j.Logger & org.slf4j.LoggerFactory again, thus writing inconsistent logic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there is no log migration in this file, why do we need to change the import order here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The import order has been restored.
I actually have an idea to align the import order of java with the import order of scala.
It's OK, I restored it, 😄

logger.info("Spilling data because number of spilledRecords crossed the threshold " +
numElementsForSpillThreshold);
logger.info("Spilling data because number of spilledRecords crossed the threshold {}",
MDC.of(LogKeys.SHUFFLE_SPILL_NUM_ELEMENTS_FORCE_SPILL_THRESHOLD$.MODULE$,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this key is sort of long. How about NUM_ELEMENTS_SPILL_THRESHOLD?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

@gengliangwang gengliangwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except two comments

@gengliangwang
Copy link
Member

Thanks, merging to master

JacobZheng0927 pushed a commit to JacobZheng0927/spark that referenced this pull request May 11, 2024
…uctured logging framework

### What changes were proposed in this pull request?

Since we are targeting on migration INFO/WARN/ERROR level logs to structure logging, this PR removes the logDebug and logTrace methods from the JAVA structured logging framework.

### Why are the changes needed?

In the log migration PR apache#46390, there are unnecessary changes such as updating
```
logger.debug("Task {} need to spill {} for {}", taskAttemptId,
            Utils.bytesToString(required - got), requestingConsumer);
```
to
```
LOGGER.debug("Task {} need to spill {} for {}", String.valueOf(taskAttemptId),
            Utils.bytesToString(required - got), requestingConsumer.toString());
```

With this PR, we can avoid such changes during log migrations.
### Does this PR introduce _any_ user-facing change?

No
### How was this patch tested?

Existing UT.
### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#46405 from gengliangwang/updateJavaLog.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
JacobZheng0927 pushed a commit to JacobZheng0927/spark that referenced this pull request May 11, 2024
… with variables to structured logging framework

### What changes were proposed in this pull request?
The pr aims to
1.migrate `error/warn/info` in module `core` with variables to `structured logging framework` for java side.
2.convert all dependencies on `org.slf4j.Logger & org.slf4j.LoggerFactory` to `org.apache.spark.internal.Logger & org.apache.spark.internal.LoggerFactory`, in order to completely `prohibit` importing `org.slf4j.Logger & org.slf4j.LoggerFactory` in java code later.

### Why are the changes needed?
To enhance Apache Spark's logging system by implementing structured logging.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#46390 from panbingkun/core_java_sl.

Authored-by: panbingkun <panbingkun@baidu.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
2 participants