fix(sdk): poll-driven cross-reasoner pause propagation #562
Merged
The v0.1.80 listener-based propagation never fired in production because
it depended on the AsyncExecutionManager's SSE event-stream loop, which is
gated behind ``enable_event_stream`` (default False) and was not enabled
on any deployed service. Result: parents waiting on an ``app.call`` that
hit a hax-sdk human-approval gate kept ticking wallclock, and the parent
watchdog tripped at exactly the budget despite the visible WAITING state
(reproduced on Railway run_1778268481826_8c9dd544).
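The arithmetic behind the watchdog trip can be sketched as follows. This is a minimal illustration, not the SDK's code: only the method names ``start_pause``, ``end_pause`` and ``total_paused`` appear in this PR, while the ``PauseClock`` implementation, the ``FakeTime`` helper and the 3000 s / 2947 s split of active work are assumptions. The 1253 s gap corresponds to the 20:10:42 to 20:31:35 approval window reported for the Railway run.

```python
class PauseClock:
    """Hypothetical stand-in for the SDK's pause clock.

    Only the method names start_pause / end_pause / total_paused appear in
    this PR; the implementation here is an assumption for illustration.
    """

    def __init__(self, now):
        self._now = now                # injectable time source (seconds)
        self._started_at = now()
        self._pause_began = None       # timestamp of the in-flight pause, if any
        self._paused_total = 0.0

    def start_pause(self):
        if self._pause_began is None:  # repeated calls do not restart the pause
            self._pause_began = self._now()

    def end_pause(self):
        if self._pause_began is not None:
            self._paused_total += self._now() - self._pause_began
            self._pause_began = None

    def total_paused(self):
        in_flight = 0.0 if self._pause_began is None else self._now() - self._pause_began
        return self._paused_total + in_flight

    def active_elapsed(self):
        # Wallclock since start minus all paused time.
        return (self._now() - self._started_at) - self.total_paused()


class FakeTime:
    """Manually advanced time source so the example is deterministic."""

    def __init__(self):
        self.t = 0.0

    def __call__(self):
        return self.t


def simulate(propagate_pause, budget_s=7200.0):
    time_src = FakeTime()
    clock = PauseClock(time_src)
    time_src.t += 3000.0               # active work (illustrative split)
    if propagate_pause:
        clock.start_pause()
    time_src.t += 1253.0               # child WAITING on approval
    if propagate_pause:
        clock.end_pause()
    time_src.t += 2947.0               # more active work; wallclock is now 7200 s
    active = clock.active_elapsed()
    return active, active >= budget_s  # (active seconds, watchdog tripped?)


print(simulate(propagate_pause=False))  # (7200.0, True): pause never propagated
print(simulate(propagate_pause=True))   # (5947.0, False)
```

When the pause is never started, ``total_paused()`` stays at 0 and active time equals wallclock, so the watchdog trips exactly at the budget even though part of that time was spent waiting on approval.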
Replace the SSE-listener mechanism with a poll-driven toggle inside
``wait_for_result``: when the awaited child's status reads as WAITING
(updated unconditionally by the existing polling task), pause the parent's
pause-clock; when it reads back as not-WAITING, end the pause. A finally
block closes any in-flight pause if we exit via terminal/timeout. No SSE
subscription required, no listener registry, no refcount machinery.
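A rough sketch of the poll-driven toggle described above, under stated assumptions: the function below is a simplified stand-in for ``wait_for_result``, not the SDK's implementation. The ``Execution`` dataclass, the ``RecordingClock`` helper, the status strings and the terminal-status set are all hypothetical; only the start/end/finally behaviour is taken from the description.

```python
import asyncio
from dataclasses import dataclass

WAITING = "waiting"                                # hypothetical status value
TERMINAL = {"succeeded", "failed", "cancelled"}    # hypothetical terminal set


@dataclass
class Execution:
    status: str = "running"


class RecordingClock:
    """Hypothetical pause clock that only records the calls it receives."""

    def __init__(self):
        self.events = []
        self.paused = False

    def start_pause(self):
        self.paused = True
        self.events.append("start")

    def end_pause(self):
        self.paused = False
        self.events.append("end")


async def wait_for_result(executions, execution_id, pause_clock, poll_interval=0.0):
    paused = False
    try:
        while True:
            # Status is written by a separate polling task; we only read it.
            status = executions[execution_id].status
            if status in TERMINAL:
                return status
            if status == WAITING and not paused:
                pause_clock.start_pause()
                paused = True
            elif status != WAITING and paused:
                pause_clock.end_pause()
                paused = False
            await asyncio.sleep(poll_interval)
    finally:
        # Close an in-flight pause on terminal / timeout / cancellation exits.
        if paused:
            pause_clock.end_pause()


async def demo(resume_before_finish):
    executions = {"child": Execution()}
    clock = RecordingClock()
    waiter = asyncio.create_task(wait_for_result(executions, "child", clock))
    executions["child"].status = WAITING           # what the polling task would do
    while not clock.paused:                        # wait until the loop has seen it
        await asyncio.sleep(0)
    if resume_before_finish:
        executions["child"].status = "running"
        while clock.paused:
            await asyncio.sleep(0)
    executions["child"].status = "succeeded"
    return await waiter, clock.events


print(asyncio.run(demo(resume_before_finish=True)))   # ('succeeded', ['start', 'end'])
print(asyncio.run(demo(resume_before_finish=False)))  # ('succeeded', ['start', 'end'])
```

In the second run the child goes straight from WAITING to a terminal status, so the pause is closed by the ``finally`` block rather than by the in-loop toggle.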
Removes from ``async_execution_manager.py``:
- ``register_status_listener`` / ``_status_listeners`` / ``_fire_status_listeners``
- the ``execution.waiting`` event-type override (the WAITING-status branch
in ``_handle_event_stream_payload`` stays for the case where SSE is on)
Removes from ``agent.py``:
- ``_on_child_status_change`` and the ``_waiting_children`` /
``_parent_paused_children`` registries it consumed
- the listener registration in ``call()``; the ``pause_clock`` kwarg
pass-through (the actual fix) is preserved
Tests: the previous tests poked ``_on_child_status_change`` and
``_handle_event_stream_payload`` directly, which bypassed the
``enable_event_stream`` gate and never exercised the production data path.
The new tests drive the production path: they update ``_executions[id].status``
the same way the polling task does and assert that ``wait_for_result``
toggles the parent clock and survives a long WAITING window.
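The shape of such tests might look like the sketch below. It is illustrative only: ``FakeClock`` and this simplified ``wait_for_result`` are hypothetical stand-ins, and charging ``wait_timeout`` against active (non-paused) time is an assumption made here so that a long WAITING window can be survived; the SDK's actual timeout accounting is not shown in this PR.

```python
import asyncio


class FakeClock:
    """Hypothetical pause clock over manually advanced time (seconds)."""

    def __init__(self):
        self.now = 0.0
        self.paused_total = 0.0
        self._pause_began = None

    def start_pause(self):
        if self._pause_began is None:
            self._pause_began = self.now

    def end_pause(self):
        if self._pause_began is not None:
            self.paused_total += self.now - self._pause_began
            self._pause_began = None

    def in_pause(self):
        return self._pause_began is not None

    def active_elapsed(self):
        in_flight = self.now - self._pause_began if self.in_pause() else 0.0
        return self.now - self.paused_total - in_flight


async def wait_for_result(execution, clock, wait_timeout):
    paused = False
    try:
        while True:
            status = execution["status"]
            if status == "succeeded":
                return status
            if status == "waiting" and not paused:
                clock.start_pause()
                paused = True
            elif status != "waiting" and paused:
                clock.end_pause()
                paused = False
            # Assumption: the timeout is measured in active time only.
            if clock.active_elapsed() >= wait_timeout:
                raise TimeoutError
            await asyncio.sleep(0)
    finally:
        if paused:
            clock.end_pause()


async def long_waiting_window():
    execution, clock = {"status": "running"}, FakeClock()
    task = asyncio.create_task(wait_for_result(execution, clock, wait_timeout=10))
    execution["status"] = "waiting"        # updated the way the polling task would
    while not clock.in_pause():
        await asyncio.sleep(0)
    clock.now += 60                        # WAITING window longer than wait_timeout
    await asyncio.sleep(0)                 # let the loop poll during the window
    execution["status"] = "succeeded"
    return await task, clock.paused_total


async def cancelled_while_waiting():
    execution, clock = {"status": "waiting"}, FakeClock()
    task = asyncio.create_task(wait_for_result(execution, clock, wait_timeout=10))
    while not clock.in_pause():
        await asyncio.sleep(0)
    clock.now += 5
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return clock.in_pause(), clock.paused_total


print(asyncio.run(long_waiting_window()))      # ('succeeded', 60.0)
print(asyncio.run(cancelled_while_waiting()))  # (False, 5.0)
```

Both scenarios go through the status field rather than calling a listener directly, which is the property the earlier tests lacked.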
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Performance
✓ No regressions detected
📊 Coverage gate
✅ Gate passed. No surface regressed past the allowed threshold and the aggregate stayed above the floor.
📐 Patch coverage gate (threshold: 80% on lines this PR touches)
✅ Patch gate passed. Every surface whose lines were touched by this PR has patch coverage at or above the threshold.
AbirAbbas added a commit to Agent-Field/SWE-AF that referenced this pull request on May 9, 2026
Picks up the poll-driven cross-reasoner pause propagation fix shipped in agentfield v0.1.81 (Agent-Field/agentfield#562). v0.1.80 attempted the same fix via an SSE listener that was gated behind ``enable_event_stream`` (default off) and never fired in production. This was observed on a long implement_from_issue run that timed out at exactly the 7200s wallclock budget despite a long hax-sdk approval gap.
Bumping the constraint string is required to bust the Docker pip-install layer cache; otherwise the cached layer would keep restoring 0.1.80 even once 0.1.81 is on PyPI.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
The v0.1.80 listener-based pause propagation never fired in production because it depended on the SDK's SSE event-stream subscription, which is gated behind ``enable_event_stream`` (default ``False``) and not enabled on any deployed service.
Reproduced on Railway run ``run_1778268481826_8c9dd544``: ``implement_from_issue`` timed out at exactly 7200s wallclock despite a ~21min hax-sdk approval gap clearly visible in the SWE-AF logs (20:10:42 → 20:31:35) and a second silent gap before the timeout.
Why the listener didn't fire
- The control plane does publish ``execution_waiting`` to the bus (verified in ``execute_approval.go``).
- The SDK subscribes to the SSE stream only when ``enable_event_stream=True``. The default is ``False`` and no service overrides it. Verified via ``railway ssh ... env``: no service has the var set, and the only subscriber on the live SSE bus is the user's browser.
- As a result, ``register_status_listener`` callbacks were never invoked, ``pause_clock.start_pause()`` was never called, ``total_paused()`` stayed at 0, ``active_elapsed == wallclock``, and the watchdog tripped at the budget.
The previous tests poked ``_on_child_status_change`` and ``_handle_event_stream_payload`` directly, which bypassed the ``enable_event_stream`` gate entirely. They passed; production was broken.
Fix
Replace the SSE-listener mechanism with a poll-driven toggle inside ``wait_for_result``:
- The existing polling task already updates ``_executions[id].status`` from control-plane responses (``_update_execution_from_status``).
- ``wait_for_result`` now reads that status each loop iteration. When it reads as ``WAITING``, the parent's ``pause_clock.start_pause()`` is called; when it reads back as anything else, ``end_pause()`` is called. A ``finally`` block closes any in-flight pause if we exit via terminal / timeout / exception.
- No SSE subscription is required; the WAITING-status branch in ``_handle_event_stream_payload`` remains for when ``enable_event_stream`` is on.
Removed
- ``async_execution_manager.py``: ``register_status_listener``, ``_status_listeners``, ``_fire_status_listeners``, the ``execution.waiting`` event-type override (the WAITING-status branch in ``_handle_event_stream_payload`` stays for when SSE is on).
- ``agent.py``: ``_on_child_status_change``, ``_waiting_children``, ``_parent_paused_children``, the listener registration in ``call()``. The ``pause_clock`` kwarg pass-through (the actual fix) is preserved.
Net: −387 / +274 lines across 4 files.
Tests
Replaced the listener tests with integration tests that drive the production data path: they update ``_executions[id].status`` the same way the polling task does, then assert ``wait_for_result`` toggles the parent clock and survives a long ``WAITING`` window. Includes a regression for the headline scenario (``WAITING`` window > ``wait_timeout``, must complete) and a finally-block check for cancelled-while-waiting.
Test plan
- ``cd sdk/python && ruff check .``: clean
- ``cd sdk/python && python -m pytest --no-cov``: 1482 passed, 4 skipped (integration tests requiring server sources, pre-existing)
- Run an ``implement_from_issue`` flow that hits the hax-sdk gate; the parent should not time out at wallclock 2hr if active work < 2hr
🤖 Generated with Claude Code