Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-47581][CORE] SQL catalyst: Migrate logWarning with variables to structured logging framework #45904

Closed
wants to merge 19 commits into from

Conversation

dtenedor
Copy link
Contributor

@dtenedor dtenedor commented Apr 5, 2024

What changes were proposed in this pull request?

Migrate logWarning with variables of the Catalyst module to structured logging framework. This transforms the logWarning entries of the following API

def logWarning(msg: => String): Unit

to

def logWarning(entry: LogEntry): Unit

Why are the changes needed?

To enhance Apache Spark's logging system by implementing structured logging.

Does this PR introduce any user-facing change?

Yes, Spark core logs will contain additional MDC

How was this patch tested?

Compiler and scala style checks, as well as code review.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Apr 5, 2024
@dtenedor
Copy link
Contributor Author

dtenedor commented Apr 5, 2024

@gengliangwang here is the structured logging migration for the logWarning calls within Catalyst per https://issues.apache.org/jira/browse/SPARK-47581.

@gengliangwang gengliangwang changed the title [SPARK-47581][INFRA] SQL catalyst: Migrate logWarning with variables to structured logging framework [SPARK-47581][CORE] SQL catalyst: Migrate logWarning with variables to structured logging framework Apr 8, 2024
@@ -284,7 +287,8 @@ object StreamingJoinHelper extends PredicateHelper with Logging {
Seq(negateIfNeeded(castedLit, negate))
case a @ _ =>
logWarning(
s"Failed to extract state value watermark from condition $exprToCollectFrom due to $a")
log"Failed to extract state value watermark from condition " +
log"${MDC(JOIN_CONDITION, exprToCollectFrom)} due to ${MDC(JOIN_CONDITION, a)}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is a here? It seems that you are using a same key for two variables

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about this, I did not know we had to have unique keys for the log keys. Fixed now.

|Expected: ${fieldNames(i)} but found: ${columnNames(i)}
|$source""".stripMargin)
log"""|CSV header does not conform to the schema.
| Header: ${MDC(COLUMN_NAME, columnNames.mkString(", "))}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't use COLUMN_NAME for 4 different variables...

s" them as string type as same as Spark 3.0 and earlier")
logWarning(log"The Spark cast operator does not support char/varchar type and simply treats" +
log" them as string type. Please use string type directly to avoid confusion. Otherwise," +
log" you can set ${MDC(SQL_CONF_KEY, SQLConf.LEGACY_CHAR_VARCHAR_AS_STRING.key)} " +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
log" you can set ${MDC(SQL_CONF_KEY, SQLConf.LEGACY_CHAR_VARCHAR_AS_STRING.key)} " +
log" you can set ${MDC(CONFIG, SQLConf.LEGACY_CHAR_VARCHAR_AS_STRING.key)} " +

s"${l.dataType} to $targetType due to ${e.getMessage}")
logWarning(log"Failed to cast default value '${MDC(COLUMN_DEFAULT_VALUE, l)}' " +
log"for column ${MDC(COLUMN_NAME, colName)} " +
log"from ${MDC(COLUMN_DATA_TYPE, l.dataType)} " +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can't use COLUMN_DATA_TYPE for two variables here.

log"plan had length ${MDC(QUERY_PLAN_LENGTH, length)} " +
log"and the maximum is ${MDC(QUERY_PLAN_LENGTH, maxLength)}. This behavior " +
log"can be adjusted by setting " +
log"'${MDC(QUERY_PLAN_LENGTH, SQLConf.MAX_PLAN_STRING_LENGTH.key)}'.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
log"'${MDC(QUERY_PLAN_LENGTH, SQLConf.MAX_PLAN_STRING_LENGTH.key)}'.")
log"'${MDC(CONFIG, SQLConf.MAX_PLAN_STRING_LENGTH.key)}'.")

@@ -95,6 +117,8 @@ object LogKey extends Enumeration {
val SHUFFLE_MERGE_ID = Value
val SIZE = Value
val SLEEP_TIME = Value
val SLEEP_TIME_SECONDS = Value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not used.

@gengliangwang
Copy link
Member

Thanks, merging to master

@panbingkun
Copy link
Contributor

panbingkun commented Apr 9, 2024

@dtenedor @gengliangwang
Unfortunately, GA's master workflow has failed.
https://github.com/apache/spark/actions/runs/8611002654/job/23597440628
image

@panbingkun
Copy link
Contributor

I submitted a follow-up PR, let it recover first
#45958

gengliangwang pushed a commit that referenced this pull request Apr 9, 2024
### What changes were proposed in this pull request?
The pr aims to restore GA's master workflow.

### Why are the changes needed?
Make  GA's master workflow happy.
After #45904, unfortunately, GA's master workflow has failed.
https://github.com/apache/spark/actions/runs/8611002654/job/23597440628
<img width="967" alt="image" src="https://github.com/apache/spark/assets/15246973/5a12af6f-8d3d-491b-afdd-61e3bed20f47">

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manually test.
Pas GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45958 from panbingkun/SPARK-47581_FOLLOWUP.

Authored-by: panbingkun <panbingkun@baidu.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
@dtenedor
Copy link
Contributor Author

dtenedor commented Apr 9, 2024

Thanks @panbingkun for the fix!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants