fix(tests): poll for on_timeout middleware in prefork test#154
Merged
Conversation
handle_result() flips the DB status to 'dead' before dispatch_outcome fires on_timeout, so a fast test thread can observe 'dead' and assert on timeouts_seen before the worker reaches the middleware call. Poll for the spy with a small budget instead.
b27c6ae to
4f88e58
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Master CI run 25539963938 failed on
tests/worker/test_prefork.py::test_prefork_kills_hung_task(Python 3.10 / Ubuntu) with:The job was killed correctly (status
dead, error"timed out", well under the 12s budget), but theon_timeoutspy hadn't fired yet.Race: the worker thread, on receiving the timeout result, runs
_wait_for_terminalpolls the DB and returns as soon as it seesdead— which can happen between those two calls. On a fast Ubuntu 3.10 runner the test thread observesdead, races back, and asserts ontimeouts_seenbeforedispatch_outcomereaches the middleware call.Fix: wait for the side effect we actually care about. The
poll_untilfixture intests/conftest.pyalready exists for this exact pattern (used intest_prefork_cancel_running_job_stops_quickly). Replace the bareassert job.id in timeouts_seenwithpoll_until(lambda: job.id in timeouts_seen, timeout=5, …).Five seconds is well above the worst-case window between the DB write and the middleware call (which is microseconds in practice), but it gives slow CI runners enough headroom to never flake again.
The middleware ordering itself isn't a bug —
dispatch_outcomedeliberately runs afterhandle_resultso the DB is the source of truth for status. The test was simply asserting on a moving target.Test plan
uv run python -m pytest tests/worker/test_prefork.py::test_prefork_kills_hung_task -vpasses locallyruff check,mypyclean