Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-47600][CORE] MLLib: Migrate logInfo with variables to structured logging framework #46151

Closed
wants to merge 9 commits into from

Conversation

zeotuan
Copy link
Contributor

@zeotuan zeotuan commented Apr 21, 2024

The PR aims to migrate logInfo in module MLLib with variables to structured logging framework.

Why are the changes needed?

To enhance Apache Spark's logging system by implementing structured logging.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  • Pass GA.

Was this patch authored or co-authored using generative AI tooling?

No.

@zeotuan zeotuan marked this pull request as ready for review April 22, 2024 05:58
@zeotuan
Copy link
Contributor Author

zeotuan commented Apr 22, 2024

cc @gengliangwang

@@ -550,14 +553,15 @@ class KMeans @Since("1.5.0") (
blocks.unpersist()

val iterationTimeInSeconds = (System.nanoTime() - iterationStartTime) / 1e9
instr.logInfo(f"Iterations took $iterationTimeInSeconds%.3f seconds.")
instr.logInfo(log"Iterations took ${MDC(TIME_UNITS, iterationTimeInSeconds%.3f)} seconds.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: As per #45903, we are using ms for all the time related context

logInfo(s"Stage class: $className")
logInfo(s"Stage uid: ${stage.uid}")
logInfo(log"Stage class: ${MDC(CLASS_NAME, className)}")
logInfo(log"Stage uid: ${MDC(STAGE_ID, stage.uid)}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
logInfo(log"Stage uid: ${MDC(STAGE_ID, stage.uid)}")
logInfo(log"Stage uid: ${MDC(PIPELINE_STAGE_UID, stage.uid)}")

@@ -253,6 +254,13 @@ private[spark] class OptionalInstrumentation private(
}
}

override def logInfo(logEntry: LogEntry): Unit = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so that classes that use the OptionalInstrumentation for logging such as WeightedLeastSquares, CrossValidator can also log MessageWithContext. Else I will get Type Mismatched String => MessageWithContext

} else {
logInfo(s"KMeans converged in $iteration iterations.")
logInfo(log"KMeans converged in " +
log"${MDC(NUM_ITERATIONS, iteration)} iterations.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we merge into one line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merged to one line

@gengliangwang
Copy link
Member

@zeotuan Thanks for the work! LGTM overall except for some minor comments.

@gengliangwang
Copy link
Member

Thanks, merging to master

JacobZheng0927 pushed a commit to JacobZheng0927/spark that referenced this pull request May 11, 2024
…ed logging framework

The PR aims to migrate `logInfo` in module MLLib with variables to structured logging framework.

### Why are the changes needed?

To enhance Apache Spark's logging system by implementing structured logging.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

- Pass GA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#46151 from zeotuan/mllibInfo.

Lead-authored-by: Tuan Pham <Tuan.Pham@wisetechglobal.com>
Co-authored-by: zeotuan <zeotuan@gmailo.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants