Skip to content

test(background): fix flaky approval-wait tests via wait_for_status#2008

Open
ahyangyi wants to merge 1 commit intoMoonshotAI:mainfrom
ahyangyi:head-wait-for-status
Open

test(background): fix flaky approval-wait tests via wait_for_status#2008
ahyangyi wants to merge 1 commit intoMoonshotAI:mainfrom
ahyangyi:head-wait-for-status

Conversation

@ahyangyi
Copy link
Copy Markdown
Contributor

@ahyangyi ahyangyi commented Apr 22, 2026

Related Issue

N/A

Description

Two tests in test_agent_tool.py polled task status with tight 200ms budgets (20 iterations of 10ms sleeps), which flake on slow runners. The status transition goes through an asyncio.create_task + asyncio.to_thread hop in BackgroundAgentRunner._apply_approval_runtime_event, so the wire-visible tool-call publication can race ahead of the on-disk status flip.

Add an event-driven wait_for_status primitive on BackgroundTaskManager: each mark_task* writer now calls _notify_status_changed, which resolves any futures registered by concurrent wait_for_status calls. This avoids changing production behavior while giving tests a deterministic observation point for non-terminal transitions (e.g. 'awaiting_approval').

Replace the polling loops in:

  • test_agent_tool_background_agent_waits_for_approval
  • test_task_stop_kills_background_agent_waiting_for_approval with wait_for_status(task_id, 'awaiting_approval', timeout_s=2).

Add unit tests covering the new primitive: immediate return, transition event wake-up, timeout, thread-boundary notification, and predicate form.

Checklist

  • I have read the CONTRIBUTING document.
  • I have linked the related issue, if any.
  • I have added tests that prove my fix is effective or that my feature works.
  • I have run make gen-changelog to update the changelog.
  • I have run make gen-docs to update the user documentation.

Open in Devin Review

Copilot AI review requested due to automatic review settings April 22, 2026 14:13
chatgpt-codex-connector[bot]

This comment was marked as resolved.

This comment was marked as resolved.

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.

Open in Devin Review

This comment was marked as resolved.

@ahyangyi ahyangyi force-pushed the head-wait-for-status branch 3 times, most recently from 7b0cf9c to 23186ca Compare April 22, 2026 16:40
@ahyangyi ahyangyi requested a review from Copilot April 22, 2026 20:06

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

@ahyangyi ahyangyi force-pushed the head-wait-for-status branch from 9332957 to 4a150ed Compare April 23, 2026 10:44
Two tests in test_agent_tool.py polled task status with tight 200ms
budgets (20 iterations of 10ms sleeps), which flake on slow runners.
The status transition goes through an asyncio.create_task +
asyncio.to_thread hop in BackgroundAgentRunner._apply_approval_runtime_event,
so the wire-visible tool-call publication can race ahead of the
on-disk status flip.

Add an event-driven wait_for_status primitive on BackgroundTaskManager:
each _mark_task_* writer now calls _notify_status_changed, which
resolves any futures registered by concurrent wait_for_status calls.
This avoids changing production behavior while giving tests a
deterministic observation point for non-terminal transitions
(e.g. 'awaiting_approval').

To avoid a lost-wakeup race where a notification fires after the store
read but before the future is registered, the waiter registers its
future BEFORE reading the store. The post-registration merged_view
then either observes the target status (and returns immediately) or
the future will be resolved by any subsequent notification.

The waiter removes its future in a finally block so timed-out or
cancelled waits do not accumulate stale entries. Because
_resolve_status_waiters pops the whole list atomically, the cleanup
tolerates the list already being gone; empty lists are dropped so the
dict cannot grow unboundedly.

The cross-thread branch of _notify_status_changed checks
loop.is_closed() and also wraps call_soon_threadsafe in try/except
RuntimeError, so a background agent_runner thread that races with
event-loop shutdown cannot surface a spurious error.

Replace the polling loops in:
  - test_agent_tool_background_agent_waits_for_approval
  - test_task_stop_kills_background_agent_waiting_for_approval
with wait_for_status(task_id, 'awaiting_approval', timeout_s=2).

Add unit tests covering the new primitive: immediate return, transition
event wake-up, timeout, thread-boundary notification, predicate form,
cleanup on timeout and cancellation, the register-before-read no-
lost-wakeup property, and the closed-loop no-op guarantee.
@ahyangyi ahyangyi force-pushed the head-wait-for-status branch from 5dbabae to 959376d Compare April 24, 2026 06:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants