
[SPARK-47577][CORE][PART1] Migrate logError with variables to structured logging framework #45834

Closed

wants to merge 13 commits
Conversation


@gengliangwang gengliangwang commented Apr 3, 2024

What changes were proposed in this pull request?

Migrate `logError` calls with variables in the core module to the structured logging framework. This is part 1, which transforms `logError` calls of the following API

```
def logError(msg: => String): Unit
```

to

```
def logError(entry: LogEntry): Unit
```
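Conceptually, the new API carries structured key/value context (MDC) alongside the rendered message. Below is a minimal, self-contained sketch of the pattern; the real `LogEntry`, `MDC`, and `log` interpolator live in `org.apache.spark.internal`, and all names and details here are simplified assumptions, not Spark's actual implementation.

```scala
object StructuredLoggingSketch {
  // Standardized keys, analogous to Spark's LogKey enumeration.
  object LogKey extends Enumeration {
    val PARTITION_ID, SHUFFLE_ID = Value
  }

  // An MDC pairs a standardized key with its runtime value.
  final case class MDC(key: LogKey.Value, value: Any)

  // A LogEntry carries both the rendered message and the key/value context.
  final case class LogEntry(message: String, context: Map[String, String]) {
    def +(other: LogEntry): LogEntry =
      LogEntry(message + other.message, context ++ other.context)
  }

  // The log"..." interpolator builds a LogEntry instead of a plain String,
  // capturing each interpolated MDC as structured context.
  implicit class LogStringContext(val sc: StringContext) extends AnyVal {
    def log(args: MDC*): LogEntry = {
      val rendered = sc.s(args.map(_.value): _*)
      LogEntry(rendered, args.map(m => m.key.toString -> m.value.toString).toMap)
    }
  }

  // The new overload accepts a LogEntry rather than a String.
  def logError(entry: LogEntry): Unit =
    println(s"ERROR ${entry.message} | MDC=${entry.context}")

  def demo(shuffleId: Int, partition: Int): LogEntry =
    log"Missing an output location for shuffle ${MDC(LogKey.SHUFFLE_ID, shuffleId)} " +
      log"partition ${MDC(LogKey.PARTITION_ID, partition)}"

  def main(args: Array[String]): Unit = logError(demo(7, 42))
}
```

With this pattern a log aggregator can query on `SHUFFLE_ID=7` directly instead of parsing it out of free text, which is the motivation for the migration.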

Why are the changes needed?

To enhance Apache Spark's logging system by implementing structured logging.

Does this PR introduce any user-facing change?

Yes, Spark core logs will contain additional MDC

How was this patch tested?

Compiler and scala style checks, as well as code review.

Was this patch authored or co-authored using generative AI tooling?

No

gengliangwang (Member Author):

cc @panbingkun @pan3793 @itholic as well

@HyukjinKwon HyukjinKwon changed the title [SPARK-47577][Core][PART1] Migrate logError with variables to structured logging framework [SPARK-47577][CORE][PART1] Migrate logError with variables to structured logging framework Apr 3, 2024
@github-actions github-actions bot added the YARN label Apr 3, 2024
@@ -1736,9 +1738,11 @@ private[spark] object MapOutputTracker extends Logging {

def validateStatus(status: ShuffleOutputStatus, shuffleId: Int, partition: Int) : Unit = {
if (status == null) {
val errorMessage = s"Missing an output location for shuffle $shuffleId partition $partition"
// scalastyle:off line.size.limit
Contributor:

Can we split it into multiple lines? To avoid using // scalastyle:off line.size.limit

Member Author:

If the log is not long, one line makes it easier to read. I suggest we allow both styles.

@@ -21,17 +21,56 @@ package org.apache.spark.internal
* All structured logging keys should be defined here for standardization.
*/
object LogKey extends Enumeration {
val APPLICATION_ID = Value
val EXECUTOR_ID = Value
Contributor:

Perhaps we first need to categorize the keys by business domain, and then sort them alphabetically within each category.

Contributor:

Also, we may need a naming rule: do we use abbreviations or full spellings? For example: APPLICATION_ID or APP_ID.

Otherwise, I worry that this class will soon become very large and some keys will end up with duplicate meanings.

Member Author:

Created #45862
BTW, what do you mean by first category?

Contributor:

I originally planned to group the keys by category first, and then sort alphabetically within each group.
Let me give an example with these keys:

APPLICATION_ID
K8S_ID
MEMOS_ID
MAX_SIZE
MIN_SIZE

If we only sort alphabetically, we get:

APPLICATION_ID
K8S_ID
MAX_SIZE
MEMOS_ID
MIN_SIZE

It's a bit weird for me to see MEMOS_ID between MAX_SIZE and MIN_SIZE.

If we first group by category and then sort alphabetically within each group, we obtain:

# ID
APPLICATION_ID
K8S_ID
MEMOS_ID

# SHUFFLE Value
MAX_SIZE
MIN_SIZE

Just like:

// url functions
expression[UrlEncode]("url_encode"),
expression[UrlDecode]("url_decode"),
expression[ParseUrl]("parse_url"),
// datetime functions
expression[AddMonths]("add_months"),
expression[CurrentDate]("current_date"),
expressionBuilder("curdate", CurDateExpressionBuilder, setAlias = true),

I think that as our log migration work progresses, this class will become larger and larger. If we only sort alphabetically, it is not certain that developers and people searching the logs can quickly find the LogKey they want.
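For reference, the grouped-then-alphabetical layout being discussed could be sketched as follows. The key names here are illustrative placeholders, not necessarily Spark's actual `LogKey` entries.

```scala
// Hypothetical sketch of a LogKey object grouped by category first,
// then sorted alphabetically within each group. Key names are illustrative.
object LogKeySketch {
  object LogKey extends Enumeration {
    // IDs
    val APPLICATION_ID, K8S_ID, MEMOS_ID = Value
    // Shuffle values
    val MAX_SIZE, MIN_SIZE = Value
  }

  def main(args: Array[String]): Unit =
    // Enumeration assigns ids in declaration order, so iterating `values`
    // preserves the grouped layout.
    LogKey.values.foreach(println)
}
```

One trade-off, noted below, is that a simple alphabetical-order unit test can no longer validate the whole file.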

Member Author:

@panbingkun I see. We can add a secondary category later.
In the migration, we should use generic keys and try to control the number of keys, so that the logs are easier to query.

Contributor:

Of course, its disadvantage is that we cannot use a UT like #45857 to fully guarantee that the file stays in alphabetical order; it requires some manual intervention.

Contributor:

> @panbingkun I see. We can have a secondary category later. In the migration, we should use generic keys and try to control the number of keys, so that the logs are easier to query.

Okay.

gengliangwang (Member Author):

@panbingkun @pan3793 @HyukjinKwon Thanks for the review.
Merging to master.

gengliangwang added a commit that referenced this pull request Apr 5, 2024
…red logging framework

### What changes were proposed in this pull request?

Migrate `logError` calls with variables in the core module to the structured logging framework. This is part 2, which transforms `logError` calls of the following API
```
def logError(msg: => String, throwable: Throwable): Unit
```
to
```
def logError(entry: LogEntry, throwable: Throwable): Unit
```

Migration part 1 was in #45834.

### Why are the changes needed?

To enhance Apache Spark's logging system by implementing structured logging.
### Does this PR introduce _any_ user-facing change?

Yes, Spark core logs will contain additional MDC

### How was this patch tested?

Compiler and scala style checks, as well as code review.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45890 from gengliangwang/coreError2.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
@@ -1033,8 +1037,8 @@ private[spark] class TaskSetManager(
info.host, info.executorId, index, failureReason))
numFailures(index) += 1
if (numFailures(index) >= maxTaskFailures) {
logError("Task %d in stage %s failed %d times; aborting job".format(
index, taskSet.id, maxTaskFailures))
logError(log"Task ${MDC(TASK_ID, index)} in stage ${MDC(STAGE_ID, taskSet.id)} failed " +
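The review point below hinges on the distinction between a task's index and its task ID. A minimal sketch of that distinction, with illustrative names (not Spark's actual `TaskInfo` fields or `LogKey` entries): the index identifies the partition slot within a TaskSet and is shared by all retry attempts, while the task ID is unique per launched attempt.

```scala
object TaskIdVsIndexSketch {
  // A TaskSet schedules one task per partition; `index` is the position within
  // the set, while `taskId` is a globally increasing ID per launched attempt.
  final case class TaskInfo(taskId: Long, index: Int, attemptNumber: Int)

  def main(args: Array[String]): Unit = {
    // Two attempts of the same task share the index but not the task ID.
    val attempt0 = TaskInfo(taskId = 100L, index = 3, attemptNumber = 0)
    val attempt1 = TaskInfo(taskId = 117L, index = 3, attemptNumber = 1)
    assert(attempt0.index == attempt1.index)
    assert(attempt0.taskId != attempt1.taskId)
    // So the aborted-job message above refers to the *index*, and the MDC key
    // should reflect that rather than reusing TASK_ID.
    println(s"Task ${attempt1.index} in stage 0.0 failed 2 times; aborting job")
  }
}
```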
Contributor:

Task ID and task index are different things. Here it's the task index. @gengliangwang

Member Author:

You are right. I will find time to revisit all the usages of the TASK_ID log key.

Member:

and TASK_ATTEMPT_ID should be the same as TASK_ID?

Member Author:

@cloud-fan I created #46951
@pan3793 what do you mean by the same?

Contributor:

Yes, these two are the same.
