[SPARK-47581][CORE] SQL catalyst: Migrate logWarning with variables to structured logging framework #45904
Conversation
@gengliangwang here is the structured logging migration for the
@@ -284,7 +287,8 @@ object StreamingJoinHelper extends PredicateHelper with Logging {
       Seq(negateIfNeeded(castedLit, negate))
     case a @ _ =>
       logWarning(
-        s"Failed to extract state value watermark from condition $exprToCollectFrom due to $a")
+        log"Failed to extract state value watermark from condition " +
+          log"${MDC(JOIN_CONDITION, exprToCollectFrom)} due to ${MDC(JOIN_CONDITION, a)}")
What is `a` here? It seems that you are using the same key for two variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about this, I did not know the log keys had to be unique. Fixed now.
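As a standalone sketch of why this matters (the stub `LogKey`, `MDC`, and `log` interpolator below are illustrative assumptions, not Spark's actual `org.apache.spark.internal` API, and `WATERMARK_EXPRESSION` is a hypothetical key name): distinct keys keep both variables addressable in the structured context.

```scala
// Stub LogKey enumeration; Spark defines its own (much larger) one.
object LogKey extends Enumeration {
  val JOIN_CONDITION, WATERMARK_EXPRESSION = Value
}

// A tagged message argument: the key names the variable downstream.
case class MDC(key: LogKey.Value, value: Any)

// A structured entry: rendered text plus a key -> value context map.
case class LogEntry(text: String, context: Map[String, String])

// Minimal log"..." interpolator: renders the text and collects MDC pairs.
implicit class LogInterpolator(private val sc: StringContext) {
  def log(args: Any*): LogEntry = {
    val rendered = args.map {
      case MDC(_, v) => v.toString
      case other     => other.toString
    }
    val text = sc.parts.zipAll(rendered, "", "").map { case (p, a) => p + a }.mkString
    val context = args.collect { case MDC(k, v) => k.toString -> v.toString }.toMap
    LogEntry(text, context)
  }
}

val entry =
  log"condition ${MDC(LogKey.JOIN_CONDITION, "a = b")} due to ${MDC(LogKey.WATERMARK_EXPRESSION, "a")}"
// Reusing JOIN_CONDITION for both values would collapse them into a
// single context slot; distinct keys keep both.
```

Because the context is keyed by the MDC key's name, two variables sharing one key would overwrite each other in the map, which is exactly the reviewer's objection.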
-        |Expected: ${fieldNames(i)} but found: ${columnNames(i)}
-        |$source""".stripMargin)
+      log"""|CSV header does not conform to the schema.
+           | Header: ${MDC(COLUMN_NAME, columnNames.mkString(", "))}
We can't use COLUMN_NAME for 4 different variables...
-      s" them as string type as same as Spark 3.0 and earlier")
+    logWarning(log"The Spark cast operator does not support char/varchar type and simply treats" +
+      log" them as string type. Please use string type directly to avoid confusion. Otherwise," +
+      log" you can set ${MDC(SQL_CONF_KEY, SQLConf.LEGACY_CHAR_VARCHAR_AS_STRING.key)} " +
Suggested change:
-    log" you can set ${MDC(SQL_CONF_KEY, SQLConf.LEGACY_CHAR_VARCHAR_AS_STRING.key)} " +
+    log" you can set ${MDC(CONFIG, SQLConf.LEGACY_CHAR_VARCHAR_AS_STRING.key)} " +
-      s"${l.dataType} to $targetType due to ${e.getMessage}")
+    logWarning(log"Failed to cast default value '${MDC(COLUMN_DEFAULT_VALUE, l)}' " +
+      log"for column ${MDC(COLUMN_NAME, colName)} " +
+      log"from ${MDC(COLUMN_DATA_TYPE, l.dataType)} " +
We can't use COLUMN_DATA_TYPE for two variables here.
log"plan had length ${MDC(QUERY_PLAN_LENGTH, length)} " +
log"and the maximum is ${MDC(QUERY_PLAN_LENGTH, maxLength)}. This behavior " +
log"can be adjusted by setting " +
log"'${MDC(QUERY_PLAN_LENGTH, SQLConf.MAX_PLAN_STRING_LENGTH.key)}'.")
Suggested change:
-    log"'${MDC(QUERY_PLAN_LENGTH, SQLConf.MAX_PLAN_STRING_LENGTH.key)}'.")
+    log"'${MDC(CONFIG, SQLConf.MAX_PLAN_STRING_LENGTH.key)}'.")
@@ -95,6 +117,8 @@ object LogKey extends Enumeration {
   val SHUFFLE_MERGE_ID = Value
   val SIZE = Value
   val SLEEP_TIME = Value
+  val SLEEP_TIME_SECONDS = Value
This is not used.
common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala (resolved, outdated)
Thanks, merging to master
@dtenedor @gengliangwang
I submitted a follow-up PR, let it recover first
### What changes were proposed in this pull request?
The pr aims to restore GA's master workflow.

### Why are the changes needed?
Make GA's master workflow happy. After #45904, unfortunately, GA's master workflow has failed: https://github.com/apache/spark/actions/runs/8611002654/job/23597440628 (screenshot of the failing job omitted).

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manually test. Passed GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45958 from panbingkun/SPARK-47581_FOLLOWUP.

Authored-by: panbingkun <panbingkun@baidu.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
Thanks @panbingkun for the fix!!
What changes were proposed in this pull request?
Migrate logWarning calls with variables in the Catalyst module to the structured logging framework. This transforms logWarning entries of the form `logWarning(s"... $variable ...")` into `logWarning(log"... ${MDC(LOG_KEY, variable)} ...")`, where each variable is tagged with a key from the LogKey enumeration (`LOG_KEY` here stands for whichever key fits the variable).
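A minimal before/after sketch of the migration pattern (the `LogKey`, `MDC`, `log`, and `logWarning` definitions below are simplified stand-ins assumed for illustration; Spark's real versions live in `org.apache.spark.internal` and also propagate the MDC context, not just the text):

```scala
// Illustrative stand-ins for Spark's internal logging API.
object LogKey extends Enumeration {
  val COLUMN_DEFAULT_VALUE, COLUMN_NAME = Value
}
case class MDC(key: LogKey.Value, value: Any) {
  // Render as the raw value; Spark additionally records the key in MDC.
  override def toString: String = value.toString
}
implicit class LogStringContext(private val sc: StringContext) {
  // Spark's log"..." builds a message carrying its MDC context;
  // this stub only renders the text so the sketch stays runnable.
  def log(args: Any*): String = sc.s(args: _*)
}
def logWarning(msg: String): Unit = println(s"WARN $msg")

val colName = "age"
val default = "not-a-number"

// Before the migration: plain interpolation, variable identity is lost.
val before = s"Failed to cast default value '$default' for column $colName"

// After the migration: each variable is tagged with its own LogKey.
val after = log"Failed to cast default value '${MDC(LogKey.COLUMN_DEFAULT_VALUE, default)}' " +
  log"for column ${MDC(LogKey.COLUMN_NAME, colName)}"

logWarning(after) // rendered text is unchanged; only the tagging differs
```

The rendered message stays identical for human readers; what changes is that structured log consumers can now look up each variable by its key.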
Why are the changes needed?
To enhance Apache Spark's logging system by implementing structured logging.
Does this PR introduce any user-facing change?
Yes, Spark core logs will contain additional MDC entries.
How was this patch tested?
Compiler and scala style checks, as well as code review.
Was this patch authored or co-authored using generative AI tooling?
No