[SPARK-47581][CORE] SQL catalyst: Migrate logWarning with variables to structured logging framework #45904

dtenedor · 2024-04-05T23:31:41Z

What changes were proposed in this pull request?

Migrate the logWarning calls with variables in the Catalyst module to the structured logging framework. This changes those calls from the following API

def logWarning(msg: => String): Unit

to

def logWarning(entry: LogEntry): Unit
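As a rough illustration of what this signature change means at a call site, here is a minimal self-contained sketch. It is not Spark's implementation: `ApiShapeSketch`, the one-entry `LogKey`, and the simplified `MDC`, `LogEntry`, and `log` interpolator are stand-ins written for this example, and the sketch skips escape processing in the literal parts.

```scala
// Simplified stand-in for the structured logging types; NOT Spark's real implementation.
object ApiShapeSketch {
  // Hypothetical subset of log keys.
  object LogKey extends Enumeration {
    val COLUMN_NAME = Value
  }

  // A variable tagged with the key it is reported under.
  case class MDC(key: LogKey.Value, value: Any)

  // The rendered message plus a key -> value context.
  case class LogEntry(message: String, context: Map[String, String]) {
    def +(other: LogEntry): LogEntry =
      LogEntry(message + other.message, context ++ other.context)
  }

  // Enables log"..." literals whose interpolated arguments must be MDC values.
  implicit class LogStringContext(val sc: StringContext) extends AnyVal {
    def log(args: MDC*): LogEntry = {
      // Interleave the literal parts with the argument values.
      val sb = new StringBuilder(sc.parts.head)
      args.zip(sc.parts.tail).foreach { case (mdc, part) =>
        sb.append(mdc.value).append(part)
      }
      val context = args.map(a => a.key.toString -> a.value.toString).toMap
      LogEntry(sb.toString, context)
    }
  }

  // Old shape: the variable is flattened into the message text.
  def logWarning(msg: => String): Unit = println(s"WARN $msg")

  // New shape: the variable is also available as a keyed field.
  def logWarning(entry: LogEntry): Unit =
    println(s"WARN ${entry.message} ${entry.context}")

  def main(args: Array[String]): Unit = {
    import LogKey._
    val colName = "id"
    logWarning(s"Unknown column $colName")
    logWarning(log"Unknown column ${MDC(COLUMN_NAME, colName)}")
  }
}
```

Both calls render the same message text; the second one additionally carries `COLUMN_NAME -> id` as a separate field, which is what a structured log consumer can query.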

Why are the changes needed?

To enhance Apache Spark's logging system by implementing structured logging.

Does this PR introduce any user-facing change?

Yes. Spark core logs will contain additional MDC fields.

How was this patch tested?

Compiler and scala style checks, as well as code review.

Was this patch authored or co-authored using generative AI tooling?

No

dtenedor · 2024-04-05T23:32:57Z

@gengliangwang here is the structured logging migration for the logWarning calls within Catalyst per https://issues.apache.org/jira/browse/SPARK-47581.

gengliangwang · 2024-04-08T20:46:50Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/StreamingJoinHelper.scala

@@ -284,7 +287,8 @@ object StreamingJoinHelper extends PredicateHelper with Logging {
          Seq(negateIfNeeded(castedLit, negate))
        case a @ _ =>
          logWarning(
-            s"Failed to extract state value watermark from condition $exprToCollectFrom due to $a")
+            log"Failed to extract state value watermark from condition " +
+              log"${MDC(JOIN_CONDITION, exprToCollectFrom)} due to ${MDC(JOIN_CONDITION, a)}")


What is a here? It seems that you are using the same key for two variables.

Sorry about this, I did not know that each variable in a message needs its own unique log key. Fixed now.
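To see why reusing a key matters, consider a simplified model in which the context of a log entry is a key-value map. This is an assumption of the sketch below, not a statement about Spark's internals; `DuplicateKeySketch`, the reduced `LogKey`, and the key name `UNSUPPORTED_EXPR` are hypothetical. Under that assumption, two variables tagged with the same key collapse into one field and the first value is lost from the context.

```scala
// Simplified model of keyed log variables; NOT Spark's real implementation.
object DuplicateKeySketch {
  // Hypothetical subset of log keys; UNSUPPORTED_EXPR is an assumed name.
  object LogKey extends Enumeration {
    val JOIN_CONDITION, UNSUPPORTED_EXPR = Value
  }

  case class MDC(key: LogKey.Value, value: Any)

  case class LogEntry(message: String, context: Map[String, String])

  implicit class LogStringContext(val sc: StringContext) extends AnyVal {
    def log(args: MDC*): LogEntry = {
      val sb = new StringBuilder(sc.parts.head)
      args.zip(sc.parts.tail).foreach { case (mdc, part) =>
        sb.append(mdc.value).append(part)
      }
      // toMap keeps only the last value seen for each key.
      val context = args.map(a => a.key.toString -> a.value.toString).toMap
      LogEntry(sb.toString, context)
    }
  }

  def main(args: Array[String]): Unit = {
    import LogKey._
    val condition = "a.ts > b.ts" // placeholder values for illustration
    val expr = "b.ts"

    // Same key twice: the message is intact, but the context has one field.
    val reused =
      log"condition ${MDC(JOIN_CONDITION, condition)} due to ${MDC(JOIN_CONDITION, expr)}"
    println(reused.context.size)

    // Distinct keys: both variables survive as separate fields.
    val distinct =
      log"condition ${MDC(JOIN_CONDITION, condition)} due to ${MDC(UNSUPPORTED_EXPR, expr)}"
    println(distinct.context.size)
  }
}
```

The rendered message is identical in both cases, so the problem is invisible in plain-text logs and only shows up when the fields are consumed.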

gengliangwang · 2024-04-08T20:50:06Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVHeaderChecker.scala

-                  |Expected: ${fieldNames(i)} but found: ${columnNames(i)}
-                  |$source""".stripMargin)
+              log"""|CSV header does not conform to the schema.
+                    | Header: ${MDC(COLUMN_NAME, columnNames.mkString(", "))}


We can't use COLUMN_NAME for 4 different variables...

gengliangwang · 2024-04-08T20:54:14Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/CharVarcharUtils.scala

-        s" them as string type as same as Spark 3.0 and earlier")
+      logWarning(log"The Spark cast operator does not support char/varchar type and simply treats" +
+        log" them as string type. Please use string type directly to avoid confusion. Otherwise," +
+        log" you can set ${MDC(SQL_CONF_KEY, SQLConf.LEGACY_CHAR_VARCHAR_AS_STRING.key)} " +


Suggested change

-        log" you can set ${MDC(SQL_CONF_KEY, SQLConf.LEGACY_CHAR_VARCHAR_AS_STRING.key)} " +
+        log" you can set ${MDC(CONFIG, SQLConf.LEGACY_CHAR_VARCHAR_AS_STRING.key)} " +

gengliangwang · 2024-04-08T20:55:24Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala

-              s"${l.dataType} to $targetType due to ${e.getMessage}")
+            logWarning(log"Failed to cast default value '${MDC(COLUMN_DEFAULT_VALUE, l)}' " +
+              log"for column ${MDC(COLUMN_NAME, colName)} " +
+              log"from ${MDC(COLUMN_DATA_TYPE, l.dataType)} " +


We can't use COLUMN_DATA_TYPE for two variables here.

gengliangwang · 2024-04-08T20:56:12Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala

+            log"plan had length ${MDC(QUERY_PLAN_LENGTH, length)} " +
+            log"and the maximum is ${MDC(QUERY_PLAN_LENGTH, maxLength)}. This behavior " +
+            log"can be adjusted by setting " +
+            log"'${MDC(QUERY_PLAN_LENGTH, SQLConf.MAX_PLAN_STRING_LENGTH.key)}'.")


Suggested change

-            log"'${MDC(QUERY_PLAN_LENGTH, SQLConf.MAX_PLAN_STRING_LENGTH.key)}'.")
+            log"'${MDC(CONFIG, SQLConf.MAX_PLAN_STRING_LENGTH.key)}'.")
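The recurring review point in this thread (one key per variable within a message) is easy to check mechanically. The following is a small sketch of such a check, not part of this PR: `DuplicateKeyCheck`, `duplicateKeys`, the key name `QUERY_PLAN_LENGTH_MAX`, and the config string are all hypothetical names chosen for the example.

```scala
// Hypothetical helper that flags log keys reused within one message.
object DuplicateKeyCheck {
  // Reduced key set; QUERY_PLAN_LENGTH_MAX is an assumed name for illustration.
  object LogKey extends Enumeration {
    val QUERY_PLAN_LENGTH, QUERY_PLAN_LENGTH_MAX, CONFIG = Value
  }

  case class MDC(key: LogKey.Value, value: Any)

  // Returns every key that tags more than one variable.
  def duplicateKeys(mdcs: Seq[MDC]): Set[LogKey.Value] =
    mdcs.groupBy(_.key).collect { case (k, vs) if vs.size > 1 => k }.toSet

  def main(args: Array[String]): Unit = {
    import LogKey._
    val confKey = "example.max.plan.length" // placeholder, not a real config name

    // Shape of the message as first submitted: one key for three variables.
    val before = Seq(
      MDC(QUERY_PLAN_LENGTH, 120),
      MDC(QUERY_PLAN_LENGTH, 100),
      MDC(QUERY_PLAN_LENGTH, confKey))

    // One possible fix: a separate key per variable, CONFIG for the config name.
    val after = Seq(
      MDC(QUERY_PLAN_LENGTH, 120),
      MDC(QUERY_PLAN_LENGTH_MAX, 100),
      MDC(CONFIG, confKey))

    println(duplicateKeys(before)) // Set(QUERY_PLAN_LENGTH)
    println(duplicateKeys(after))  // Set()
  }
}
```

A check like this could run in a unit test over the arguments of each message, which would catch the cases flagged by hand in the comments above.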

gengliangwang · 2024-04-09T00:17:33Z

common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala

@@ -95,6 +117,8 @@ object LogKey extends Enumeration {
  val SHUFFLE_MERGE_ID = Value
  val SIZE = Value
  val SLEEP_TIME = Value
+  val SLEEP_TIME_SECONDS = Value


This is not used.

gengliangwang · 2024-04-09T05:55:49Z

Thanks, merging to master

panbingkun · 2024-04-09T12:14:59Z

@dtenedor @gengliangwang
Unfortunately, GA's master workflow has failed.
https://github.com/apache/spark/actions/runs/8611002654/job/23597440628

panbingkun · 2024-04-09T12:18:43Z

I submitted a follow-up PR to restore the workflow first:
#45958

### What changes were proposed in this pull request?
The PR aims to restore GA's master workflow.

### Why are the changes needed?
To make GA's master workflow pass again. After #45904, GA's master workflow failed:
https://github.com/apache/spark/actions/runs/8611002654/job/23597440628

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manually tested. Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45958 from panbingkun/SPARK-47581_FOLLOWUP.

Authored-by: panbingkun <panbingkun@baidu.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>

dtenedor · 2024-04-09T17:41:14Z

Thanks @panbingkun for the fix!!

dtenedor added 13 commits April 3, 2024 16:57

commit

f2f9ca8

respond to code review comments

bc30662

sync from master

ba6e1ce

improve code health in Logging.scala

8855128

improve code health in Logging.scala

0493a53

fix scalastyle

7b198b8

respond to code review comments

598c8c1

respond to code review comments

f855def

sync

dc2ef8a

respond to code review comments

5d7b984

commit

2d7d3ff

commit

commit

8883301

commit

8889346

github-actions bot added the SQL label Apr 5, 2024

dtenedor added 2 commits April 8, 2024 11:45

sync

6885080

fix compile

ede21ac

gengliangwang changed the title ~~[SPARK-47581][INFRA] SQL catalyst: Migrate logWarning with variables to structured logging framework~~ [SPARK-47581][CORE] SQL catalyst: Migrate logWarning with variables to structured logging framework Apr 8, 2024

gengliangwang reviewed Apr 8, 2024

respond to code review comments

98ac749

gengliangwang approved these changes Apr 8, 2024

dtenedor added 2 commits April 8, 2024 16:57

fix test

b1bade9

fix test

33a6075

gengliangwang reviewed Apr 9, 2024

Update common/utils/src/main/scala/org/apache/spark/internal/LogKey.s…

f5fb4fd

…cala

gengliangwang closed this in 149ac0f Apr 9, 2024

panbingkun mentioned this pull request Apr 9, 2024

[SPARK-47581][CORE][FOLLOWUP] Fix GA failure #45958

Closed

panbingkun mentioned this pull request Apr 9, 2024

[SPARK-47595][STREAMING] Streaming: Migrate logError with variables to structured logging framework #45910

Closed
