refactor(context): make ToolContext.abort_controller non-optional#140
Merged
Conversation
The previous ``Any | None = None`` default papered over a hazard class that masked the ESC-into-subagent regression PR #135 fixed: every caller that owns the per-run cancellation lifecycle (TUI bridge, REPL engine) had to remember to plumb the controller onto the context, and a forgotten plumbing call silently degraded ESC propagation rather than failing loudly. Use ``field(default_factory=AbortController)`` so every fresh context carries a real (untripped) controller. Downstream readers (subagent inheritance, streaming executor, Bash supervisor, tool execution, tool hooks, tasks_v2 abort fast-path) drop the defensive ``or AbortController()`` and ``if abort_ctrl and …`` guards that existed only to compensate for the impossible-now ``None`` state. The TUI bridge's ``_finish()`` now installs a fresh controller instead of clearing to ``None`` so the dataclass invariant holds across runs. Three new contract tests pin the invariants: default exists, default is per-context (not shared), explicit override still wins. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
ed5c9d7 to
a0f9e4c
Compare
This was referenced May 15, 2026
ericleepi314
added a commit
that referenced
this pull request
May 15, 2026
Even after PRs #135/#140/#141/#143 wired ESC through subagents, headless SIGINT, the non-optional ``tool_context.abort_controller`` contract, and ripgrep mid-search, the user-visible ESC latency was still 20+ seconds when the model emitted a multi-tool_use response. Root cause: the provider's streaming HTTP read had no way to observe the abort controller. ``query.py`` passed no abort plumbing to ``chat_stream_response``; the only existing interrupt point was the ``on_text_chunk`` callback, which never fires for a turn that emits tool_use blocks without intervening text. The model could spend ~20s generating eight parallel ``Write`` blocks, and only after the stream returned naturally would the outer query loop check the abort signal and yield "Interrupted by user" for all eight. Thread an ``AbortSignal`` through ``chat_stream_response``. In the Anthropic provider, register a listener that calls ``stream.response.close()`` (the same close pattern the existing ``StreamWatchdog`` uses for idle timeout). Closing the underlying HTTP response causes the SDK's blocking socket read to raise in the consumer thread; the provider catches it, detects the abort via the signal state (not the exception class — different SDK versions raise different classes), and re-raises ``AbortError``. Three defenses against the registration race: * Pre-call fast-path bails when the signal was already tripped at call entry — skips the round-trip entirely. * Register-then-recheck after entering the stream context — closes the sub-microsecond window between an ``aborted`` read and a ``add_listener`` append where ``_fire`` could snapshot the listener list and silently drop our newly-appended listener. * Post-with-block recheck — surfaces a signal that fires between ``__exit__`` and ``return``. Plumbing: * ``query.py``: ``_call_model_sync`` grows ``abort_signal`` keyword and forwards to the provider; call site passes ``params.abort_controller.signal``. New ``except AbortError: raise`` in ``_call_model_sync`` and ``except AbortError: pass`` in the query loop route the cancel to the existing post-API abort-handling block in one place. * ``agent_loop.py``: ``_call_provider_for_turn`` grows ``cancel_signal`` and forwards as ``abort_signal=`` to the provider; ``run_agent_loop`` passes through. * ``minimax_provider.py`` / ``openai_compatible.py``: pre-call fast-path for parity (mid-stream listener for these providers is a future PR — same underlying anthropic SDK for Minimax makes it a small lift). Five regression tests pin the contract: pre-aborted fast-path skips ``_ensure_client``, mid-stream abort closes the stream and raises ``AbortError`` within <1s (vs the 20s+ symptom this PR fixes), unaborted streams return normally, ``abort_signal=None`` is a no-op for legacy callers, and the listener detaches after normal completion. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
ToolContext.abort_controller: AbortController = field(default_factory=AbortController)— every fresh context carries an untripped controller; bridge/engine still overwrite with their per-run controller, but tests, SDK callers, and any forgotten call site land on a safe defaultor AbortController()/if abort_ctrl and …defensive guards in 6 downstream readers (sibling+per-tool controllers, abort fast-paths, hook gates)Why
PR #135 wired
tool_context.abort_controllerso ESC actually reaches subagents and Bash. But the field stayed typedAny | Nonewith aNonedefault — meaning a future caller that forgets to plumb the controller would re-introduce the exact regression we just fixed, with the same silent-failure shape. This refactor tightens the contract so that's impossible.(Originally merged stacked-on-top-of #135 as PR #137; this PR re-applies the same content on top of current
mainafter #135 squash-merged and #136 landed.)Changes
src/tool_system/context.py—abort_controller: AbortController = field(default_factory=AbortController)src/agent/subagent_context.py— drop the impossible "parent has no controller → mint fresh" branch; the chain collapses toexplicit override > share parent's > child of parent'ssrc/agent/run_agent.py— same; sync subagents now reliably share with parent (which is exactly the path that broke ESC propagation before)src/services/tool_execution/streaming_executor.py— dropor AbortController()defensive wrapper; simplify truthiness guards in_get_abort_reasonand_on_tool_abort; drop now-unusedAbortControllerimportsrc/services/tool_execution/tool_execution.py— drop truthiness guard in the dispatch fast-abortsrc/services/tool_execution/tool_hooks.py— same, for PreToolUse hook cancellationsrc/tool_system/tools/bash/bash_tool.py— dropgetattr(..., None)indirection in_get_abort_signalsrc/tool_system/tools/tasks_v2.py— dropgetattr(..., None)indirection in the TaskOutput abort fast-pathsrc/tui/agent_bridge.py—_finish()installs a fresh controller instead of clearing toNone(the field is no longer optional)tests/test_esc_cancel_propagation.py— updated three assertions for the new contract (None→ "fresh, untripped controller")tests/test_tool_context_abort_default.py(new) — pins three contract invariants: default exists, per-context isolation, explicit override winsHook executor (
src/hooks/hook_executor.py) and stop hooks (src/query/stop_hooks.py) keep theirgetattr(..., None)defensive pattern because theirtool_use_context: Anysignatures accept mock contexts from tests and external callers; tightening them would expand scope.Test plan
🤖 Generated with Claude Code