Skip to content

[SPARK-47578][CORE] Migrate logWarning with variables to structured logging framework#46309

Closed
dtenedor wants to merge 17 commits intoapache:masterfrom
dtenedor:spark-core-log-warn
Closed

[SPARK-47578][CORE] Migrate logWarning with variables to structured logging framework#46309
dtenedor wants to merge 17 commits intoapache:masterfrom
dtenedor:spark-core-log-warn

Conversation

@dtenedor
Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Migrate logWarning with variables of the Spark Core module to structured logging framework. This transforms the logWarning entries of the following API

def logWarning(msg: => String): Unit

to

def logWarning(entry: LogEntry): Unit

Why are the changes needed?

To enhance Apache Spark's logging system by implementing structured logging.

Does this PR introduce any user-facing change?

Yes, Spark core logs will contain additional MDC

How was this patch tested?

Compiler and scala style checks, as well as code review.

Was this patch authored or co-authored using generative AI tooling?

No

Comment thread common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala Outdated
Comment thread core/src/main/scala/org/apache/spark/Dependency.scala Outdated
Comment thread core/src/main/scala/org/apache/spark/SparkConf.scala Outdated
Comment thread core/src/main/scala/org/apache/spark/SparkConf.scala Outdated
Comment thread core/src/main/scala/org/apache/spark/SparkContext.scala Outdated
Comment thread core/src/main/scala/org/apache/spark/api/r/RBackendHandler.scala Outdated
Comment thread core/src/main/scala/org/apache/spark/deploy/master/Master.scala Outdated
Comment thread core/src/main/scala/org/apache/spark/executor/Executor.scala Outdated
Comment thread core/src/main/scala/org/apache/spark/metrics/sink/StatsdReporter.scala Outdated
Comment thread core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala Outdated
Comment thread core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala Outdated
Comment thread core/src/main/scala/org/apache/spark/scheduler/JobWaiter.scala Outdated
Comment thread core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala Outdated
Comment thread core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala Outdated
Comment thread core/src/main/scala/org/apache/spark/util/random/StratifiedSamplingUtils.scala Outdated
Comment thread core/src/main/scala/org/apache/spark/util/random/StratifiedSamplingUtils.scala Outdated
@dtenedor dtenedor requested a review from gengliangwang May 1, 2024 17:31
@dtenedor
Copy link
Copy Markdown
Contributor Author

dtenedor commented May 1, 2024

@gengliangwang thanks for a thorough review. I followed your instructions for every comment, and then just resolved them all to clean up the GitHub conversation history page. Please look again when ready!

Comment thread core/src/main/scala/org/apache/spark/deploy/master/Master.scala Outdated
@gengliangwang
Copy link
Copy Markdown
Member

@dtenedor the test failure looks relevant

Comment thread core/src/main/scala/org/apache/spark/metrics/sink/StatsdReporter.scala Outdated
Comment thread core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala Outdated
@dtenedor
Copy link
Copy Markdown
Contributor Author

dtenedor commented May 2, 2024

I will look at this relevant test failure:

[info] *** 1 TEST FAILED ***
[error] Failed: Total 4098, Failed 1, Errors 0, Passed 4097, Ignored 10, Canceled 2
[error] Failed tests:
[error] 	org.apache.spark.deploy.RPackageUtilsSuite
[error] (core / Test / test) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 3310 s (55:10), completed May 1, 2024, 7:11:37 PM

The test failure was:

[info] - faulty R package shows documentation *** FAILED *** (786 milliseconds)
[info]   /home/runner/work/spark/spark/target/tmp/spark-e2d8b7bd-763a-408f-b5cf-1d303eead1d4/dep1-c.jar contains R source code. Now installing package.
[info]   Building R package with the command: List('/', 'h', 'o', 'm', 'e', '/', 'r', 'u', 
...
[info]   ', ' ', ' ', ' ', ' ') did not contain MessageWithContext("In order for Spark to build R packages that are parts of Spark Packages, there are a few
[info]   requirements. The R source code must be shipped in a jar, with additional Java/Scala
[info]   ...
[info]       ", {"has_r_package"="Spark-HasRPackage"}) (RPackageUtilsSuite.scala:130)
...

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-47578][CORE] Spark Core: Migrate logWarning with variables to structured logging framework [SPARK-47578][CORE] Migrate logWarning with variables to structured logging framework May 2, 2024
@dtenedor dtenedor requested a review from gengliangwang May 2, 2024 21:03
@dtenedor
Copy link
Copy Markdown
Contributor Author

dtenedor commented May 2, 2024

@gengliangwang thanks for your review, responded to comments, please look again.

verbose = true)
val output = lineBuffer.mkString("\n")
assert(output.contains(RPackageUtils.RJarDoc))
assert(output.contains("Building R package"))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fix seems hacky. Is there a better fix for this one?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, this change is not too great.
I reverted the changes to RPackageUtils.scala and this file for now.
I will figure out how to get the R packages installed on my local machine and fix this file in a next step.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created https://issues.apache.org/jira/browse/SPARK-48122 to track it separately.

@dtenedor
Copy link
Copy Markdown
Contributor Author

dtenedor commented May 4, 2024

The two remaining test failures appear to be flaky/unrelated.

@gengliangwang
Copy link
Copy Markdown
Member

Thanks, merging to master

sinaiamonkar-sai pushed a commit to sinaiamonkar-sai/spark that referenced this pull request May 5, 2024
…ogging framework

### What changes were proposed in this pull request?

Migrate logWarning with variables of the Spark Core module to structured logging framework. This transforms the logWarning entries of the following API
```
def logWarning(msg: => String): Unit
```
to
```
def logWarning(entry: LogEntry): Unit
```

### Why are the changes needed?

To enhance Apache Spark's logging system by implementing structured logging.

### Does this PR introduce _any_ user-facing change?

Yes, Spark core logs will contain additional MDC

### How was this patch tested?

Compiler and scala style checks, as well as code review.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#46309 from dtenedor/spark-core-log-warn.

Authored-by: Daniel Tenedorio <daniel.tenedorio@databricks.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants