[SPARK-49423][CONNECT][SQL] Consolidate Observation in sql/api #47921

hvanhovell · 2024-08-28T23:27:06Z

What changes were proposed in this pull request?

This PR moves Observation into sql/api. For classic, I moved the wiring into the observe method itself, and the required listener is now part of the org.apache.spark.sql.internal package. I have also taken the liberty of getting rid of most of the homegrown locking, replacing it with a promise.

Why are the changes needed?

We are creating a shared interface for the classic and connect Scala DataFrame API. This class is part of that API.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests.

Was this patch authored or co-authored using generative AI tooling?

No.

hvanhovell · 2024-08-28T23:31:41Z

sql/api/src/main/scala/org/apache/spark/sql/Observation.scala

-      }
-    }
+  private[spark] def setMetricsAndNotify(metrics: Row): Boolean = {
+    val metricsMap = metrics.getValuesMap(metrics.schema.map(_.name))


I am still wondering why this needed to be a map...

hvanhovell · 2024-08-28T23:32:21Z

@xupefei can you take a look?

hvanhovell · 2024-08-30T12:01:08Z

Merging to master

### What changes were proposed in this pull request?

This patch proposes a fix for a deadlock bug in `Observation`. It replaces `synchronized` locks with a promise to avoid a deadlock between the `get` and `onFinish` methods.

### Why are the changes needed?

The `Observation` class has evolved several times between Spark 3.5 and Spark 4.0.0. Previously it used a locking mechanism (`synchronized`) between the `get` and `onFinish` methods to coordinate metrics update and retrieval, but that mechanism has a potential deadlock. If `get` is called before `ObservationListener` is triggered to call `onFinish`, `get` waits forever for metrics: it locks the observation object via `synchronized`, so the later `onFinish` call is locked out from updating the metrics.

This locking mechanism was replaced by a promise in #47921, a large refactoring of the observation feature. However, that PR does not mention the deadlock bug, and no bug-fix PR was proposed for earlier versions. So I think the bug was not known and the fix in Spark 4.0.0 was unintentional. The bug is still present in the Spark 3.5 branch.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

The deadlock bug was hit by a customer and is tricky to reproduce in a unit test. This patch should pass existing tests.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #52657 from viirya/fix_observation_deadlock.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
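For readers unfamiliar with the pattern, the promise-based coordination described above can be sketched roughly as follows. This is a minimal illustration and not the actual Spark code: `MetricsHolder` and `MetricsHolderDemo` are hypothetical names, the metrics are passed as a plain `Map` rather than a `Row`, and the `timeout` parameter on `get` is an assumption added for the demo. Only the standard-library `scala.concurrent.Promise` and `Await` are used.

```scala
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._

// Hypothetical stand-in for Observation; not the actual Spark class.
final class MetricsHolder {
  // A promise can be completed at most once and needs no explicit locking.
  private val promise = Promise[Map[String, Any]]()

  // Called by the listener side. Returns true only for the first call
  // that actually completes the promise.
  def setMetricsAndNotify(metrics: Map[String, Any]): Boolean =
    promise.trySuccess(metrics)

  // Called by the reader side. Blocks until the promise is completed,
  // without holding a monitor on this object while waiting.
  def get(timeout: Duration = Duration.Inf): Map[String, Any] =
    Await.result(promise.future, timeout)
}

object MetricsHolderDemo {
  def main(args: Array[String]): Unit = {
    val holder = new MetricsHolder

    // Simulates the listener delivering metrics after the reader
    // has already started waiting.
    val writer = new Thread(() => {
      Thread.sleep(100)
      holder.setMetricsAndNotify(Map("rows" -> 42L))
      ()
    })
    writer.start()

    // The reader is already blocked here when the writer completes the promise.
    val metrics = holder.get(5.seconds)
    println(metrics)
    writer.join()
  }
}
```

Because the reader waits on the future instead of inside a `synchronized` block, the writer never has to acquire a lock held by the reader, which is the property the commit message relies on.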

hvanhovell added 2 commits August 28, 2024 19:17

Consolidate Observation in sql/api

0cdb8be

Docs

01a82cb

github-actions bot added SQL CONNECT labels Aug 28, 2024

hvanhovell commented Aug 28, 2024

MiMa

092816a

github-actions bot added the BUILD label Aug 28, 2024

fix

21803a5

HyukjinKwon approved these changes Aug 29, 2024

Fix python

bb2897f

github-actions bot added the PYTHON label Aug 29, 2024

Fixes...

2008372

asfgit closed this in 528ba0e Aug 30, 2024

viirya mentioned this pull request Oct 19, 2025

[SPARK-53948][SQL][3.5] Fix deadlock in Observation #52657

Closed
