[SPARK-57073][SS][PYTHON][TESTS] Fix flaky test_listener_events_spark_command#56109
Draft
zhengruifeng wants to merge 1 commit into
Draft
[SPARK-57073][SS][PYTHON][TESTS] Fix flaky test_listener_events_spark_command#56109zhengruifeng wants to merge 1 commit into
zhengruifeng wants to merge 1 commit into
Conversation
0c58ed1 to
5c34f7e
Compare
### What changes were proposed in this pull request? Reintroduce the `time.sleep(30)` before reading the listener-written tables in `test_listener_events_spark_command`, and add a sticky comment explaining why the sleep must not be removed or replaced with `@eventually`. ### Why are the changes needed? SPARK-54957 replaced a `time.sleep(60)` with `@eventually(timeout=60, catch_assertions=True)`. `eventually` only retries `AssertionError` by default. `spark.read.table()` on a missing table raises `AnalysisException` (`TABLE_OR_VIEW_NOT_FOUND`), which is not in the caught set, so the decorator aborts on the first attempt instead of polling. The `onQueryTerminated` callback fires asynchronously after `q.stop()`, so the `listener_terminated_events` table is sometimes not yet written when the test first reads it. This makes the test flaky; it has failed on the `Build / Python-only (master, Python 3.12, MacOS26)` scheduled workflow on: - 2026-05-23 — https://github.com/apache/spark/actions/runs/26346300968/job/77556662680 - 2026-05-25 — https://github.com/apache/spark/actions/runs/26423905857/job/77783724134 Both failed with the same exception at the same line. ### Does this PR introduce _any_ user-facing change? No, test-only change. ### How was this patch tested? Existing test `pyspark.sql.tests.connect.streaming.test_parity_listener.StreamingListenerParityTests.test_listener_events_spark_command`. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Claude Opus 4.7
5c34f7e to
7065ea1
Compare
Contributor
|
I think the better way to solve this is to catch the potential exception in eventually. We can either add |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
Reintroduce the
time.sleep(10)before reading the listener-written tables intest_listener_events_spark_command, and add a sticky comment explaining why the sleep must not be removed or replaced with@eventually.Why are the changes needed?
SPARK-54957 replaced a
time.sleep(60)with@eventually(timeout=60, catch_assertions=True).eventuallyonly retriesAssertionErrorby default.spark.read.table()on a missing table raisesAnalysisException(TABLE_OR_VIEW_NOT_FOUND), which is not in the caught set, so the decorator aborts on the first attempt instead of polling.The
onQueryTerminatedcallback fires asynchronously afterq.stop(), so thelistener_terminated_eventstable is sometimes not yet written when the test first reads it. This makes the test flaky; it has failed on theBuild / Python-only (master, Python 3.12, MacOS26)scheduled workflow on:Both failed with the same exception at the same line.
A 10s sleep gives an order-of-magnitude safety margin over the observed sub-second race window, while avoiding the 60s waste that SPARK-54957 was trying to eliminate.
Does this PR introduce any user-facing change?
No, test-only change.
How was this patch tested?
Existing test
pyspark.sql.tests.connect.streaming.test_parity_listener.StreamingListenerParityTests.test_listener_events_spark_command.Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.7