test(background): fix flaky approval-wait tests via wait_for_status #2008
Open
ahyangyi wants to merge 1 commit into MoonshotAI:main
Two tests in test_agent_tool.py polled task status with tight 200ms budgets (20 iterations of 10ms sleeps), which flake on slow runners. The status transition goes through an asyncio.create_task + asyncio.to_thread hop in BackgroundAgentRunner._apply_approval_runtime_event, so the wire-visible tool-call publication can race ahead of the on-disk status flip.

Add an event-driven wait_for_status primitive on BackgroundTaskManager: each _mark_task_* writer now calls _notify_status_changed, which resolves any futures registered by concurrent wait_for_status calls. This avoids changing production behavior while giving tests a deterministic observation point for non-terminal transitions (e.g. 'awaiting_approval').

To avoid a lost-wakeup race where a notification fires after the store read but before the future is registered, the waiter registers its future BEFORE reading the store. The post-registration merged_view then either observes the target status (and returns immediately) or the future will be resolved by any subsequent notification.

The waiter removes its future in a finally block so timed-out or cancelled waits do not accumulate stale entries. Because _resolve_status_waiters pops the whole list atomically, the cleanup tolerates the list already being gone; empty lists are dropped so the dict cannot grow unboundedly.

The cross-thread branch of _notify_status_changed checks loop.is_closed() and also wraps call_soon_threadsafe in try/except RuntimeError, so a background agent_runner thread that races with event-loop shutdown cannot surface a spurious error.

Replace the polling loops in:
- test_agent_tool_background_agent_waits_for_approval
- test_task_stop_kills_background_agent_waiting_for_approval
with wait_for_status(task_id, 'awaiting_approval', timeout_s=2).

Add unit tests covering the new primitive: immediate return, transition event wake-up, timeout, thread-boundary notification, predicate form, cleanup on timeout and cancellation, the register-before-read no-lost-wakeup property, and the closed-loop no-op guarantee.
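For readers who want to see the register-before-read and cleanup rules in one place, here is a minimal sketch of the mechanism described in the commit message. It is not the code from this PR: the class name StatusWaiters, the in-memory _status dict (standing in for the on-disk store and merged_view), and the public notify_status_changed name are assumptions made for illustration only.

```python
import asyncio
import threading
from typing import Callable, Dict, List, Optional, Union

Target = Union[str, Callable[[Optional[str]], bool]]


class StatusWaiters:
    """Hypothetical stand-in for the waiter bookkeeping; not the real manager."""

    def __init__(self) -> None:
        self._status: Dict[str, str] = {}  # assumption: replaces the on-disk store
        self._waiters: Dict[str, List[asyncio.Future]] = {}
        self._loop: Optional[asyncio.AbstractEventLoop] = None

    def _resolve_status_waiters(self, task_id: str) -> None:
        # Pop the whole list in one step, then resolve every pending future.
        for fut in self._waiters.pop(task_id, []):
            if not fut.done():
                fut.set_result(None)

    def notify_status_changed(self, task_id: str, status: str) -> None:
        """Write the status, then wake waiters from any thread."""
        self._status[task_id] = status
        loop = self._loop
        if loop is None or loop.is_closed():
            return  # nobody can be waiting on a missing or closed loop
        try:
            running = asyncio.get_running_loop()
        except RuntimeError:
            running = None  # called from a worker thread
        if running is loop:
            self._resolve_status_waiters(task_id)
            return
        try:
            loop.call_soon_threadsafe(self._resolve_status_waiters, task_id)
        except RuntimeError:
            pass  # loop closed between the is_closed() check and the call

    async def wait_for_status(self, task_id: str, target: Target, timeout_s: float) -> bool:
        loop = asyncio.get_running_loop()
        self._loop = loop
        matches = target if callable(target) else (lambda s: s == target)
        deadline = loop.time() + timeout_s
        while True:
            fut = loop.create_future()
            # Register BEFORE reading, so a notification cannot slip in between.
            self._waiters.setdefault(task_id, []).append(fut)
            try:
                if matches(self._status.get(task_id)):
                    return True
                remaining = deadline - loop.time()
                if remaining <= 0:
                    return False
                try:
                    await asyncio.wait_for(fut, remaining)
                except asyncio.TimeoutError:
                    return False
            finally:
                # Tolerate the list already having been popped by a notifier.
                waiters = self._waiters.get(task_id)
                if waiters is not None:
                    if fut in waiters:
                        waiters.remove(fut)
                    if not waiters:
                        del self._waiters[task_id]


async def demo() -> tuple:
    hub = StatusWaiters()
    hub.notify_status_changed("t1", "running")
    # Flip the status from another thread, as a background runner would.
    threading.Timer(0.05, hub.notify_status_changed, args=("t1", "awaiting_approval")).start()
    woke = await hub.wait_for_status("t1", "awaiting_approval", timeout_s=2)
    timed_out = await hub.wait_for_status("t1", "completed", timeout_s=0.05)
    return hub, woke, timed_out
```

In this sketch each wake-up loops back to re-register and re-read, so a notification for an unrelated status change does not end the wait early. Whether the real implementation loops the same way is not stated in the commit message.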
Related Issue
N/A
Description
Two tests in test_agent_tool.py polled task status with tight 200ms budgets (20 iterations of 10ms sleeps), which flake on slow runners. The status transition goes through an asyncio.create_task + asyncio.to_thread hop in BackgroundAgentRunner._apply_approval_runtime_event, so the wire-visible tool-call publication can race ahead of the on-disk status flip.
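As a rough illustration of why the fixed budget flakes, the sketch below reproduces the shape of the old polling loop (20 iterations of 10ms) against a status that is flipped through an asyncio.create_task + asyncio.to_thread hop. The helper names poll_for_status and slow_flip_demo, and the state dict standing in for the on-disk status, are hypothetical and not taken from the test file.

```python
import asyncio
import time
from typing import Callable, Optional


async def poll_for_status(
    read_status: Callable[[], Optional[str]],
    target: str,
    iterations: int = 20,
    interval_s: float = 0.01,
) -> bool:
    """Poll for roughly iterations * interval_s (200ms by default)."""
    for _ in range(iterations):
        if read_status() == target:
            return True
        await asyncio.sleep(interval_s)
    return False


async def slow_flip_demo(delay_s: float) -> bool:
    state = {"status": "running"}

    def flip() -> None:
        # Stand-in for the status write that runs on a worker thread.
        time.sleep(delay_s)
        state["status"] = "awaiting_approval"

    writer = asyncio.create_task(asyncio.to_thread(flip))
    seen = await poll_for_status(lambda: state["status"], "awaiting_approval")
    await writer
    return seen
```

When the simulated write takes longer than the polling budget (for example slow_flip_demo(1.0)), the loop gives up before the status flips, which is the failure mode seen on slow runners.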
Add an event-driven wait_for_status primitive on BackgroundTaskManager: each _mark_task_* writer now calls _notify_status_changed, which resolves any futures registered by concurrent wait_for_status calls. This avoids changing production behavior while giving tests a deterministic observation point for non-terminal transitions (e.g. 'awaiting_approval').
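Replace the polling loops in:
- test_agent_tool_background_agent_waits_for_approval
- test_task_stop_kills_background_agent_waiting_for_approval
with wait_for_status(task_id, 'awaiting_approval', timeout_s=2).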
Add unit tests covering the new primitive: immediate return, transition event wake-up, timeout, thread-boundary notification, and predicate form.
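The unit-test cases listed above could take roughly the following shape. This is a sketch against a hypothetical single-threaded stub (FakeTaskManager and its mark method are invented here), not the tests added in this PR; the 'completed' and 'failed' status names are likewise assumptions, and stale-future cleanup is omitted for brevity.

```python
import asyncio
from typing import Callable, Dict, List, Optional, Union

Target = Union[str, Callable[[Optional[str]], bool]]


class FakeTaskManager:
    """Hypothetical stub exposing only a wait_for_status-like call."""

    def __init__(self) -> None:
        self.status: Dict[str, str] = {}
        self._waiters: Dict[str, List[asyncio.Future]] = {}

    def mark(self, task_id: str, status: str) -> None:
        # Write the status, then wake everyone waiting on this task.
        self.status[task_id] = status
        for fut in self._waiters.pop(task_id, []):
            if not fut.done():
                fut.set_result(None)

    async def wait_for_status(self, task_id: str, target: Target, timeout_s: float) -> bool:
        matches = target if callable(target) else (lambda s: s == target)

        async def _wait() -> None:
            # No await sits between the read and the registration here,
            # so on a single loop a notification cannot be missed.
            while not matches(self.status.get(task_id)):
                fut = asyncio.get_running_loop().create_future()
                self._waiters.setdefault(task_id, []).append(fut)
                await fut

        try:
            await asyncio.wait_for(_wait(), timeout_s)
            return True
        except asyncio.TimeoutError:
            return False


async def check_immediate_return() -> None:
    m = FakeTaskManager()
    m.mark("t", "awaiting_approval")
    assert await m.wait_for_status("t", "awaiting_approval", timeout_s=2)


async def check_transition_wakes_waiter() -> None:
    m = FakeTaskManager()
    m.mark("t", "running")
    asyncio.get_running_loop().call_later(0.05, m.mark, "t", "awaiting_approval")
    assert await m.wait_for_status("t", "awaiting_approval", timeout_s=2)


async def check_timeout() -> None:
    m = FakeTaskManager()
    m.mark("t", "running")
    assert not await m.wait_for_status("t", "awaiting_approval", timeout_s=0.05)


async def check_predicate_form() -> None:
    m = FakeTaskManager()
    m.mark("t", "running")
    asyncio.get_running_loop().call_later(0.05, m.mark, "t", "failed")
    is_terminal = lambda s: s in {"completed", "failed"}
    assert await m.wait_for_status("t", is_terminal, timeout_s=2)
```

The 2-second timeout acts only as an upper bound: a waiter returns as soon as the transition is observed, so a generous budget does not slow down the passing case.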
Checklist
- make gen-changelog to update the changelog.
- make gen-docs to update the user documentation.