feat(grep,glob): abort ripgrep mid-search instead of waiting out timeout#143
Merged
Conversation
…out (#142) Before this fix, the ripgrep wrapper used ``subprocess.run(timeout=20)``. A SIGINT/ESC that tripped the abort controller mid-search had to wait out the full 20-second timeout before the subprocess returned and the agent loop could observe the cancellation. On a large repo Glob and Grep felt exactly like the pre-PR-#135 Bash supervisor — "ESC is ignored for 20+ seconds." Add ``_run_rg_with_abort`` that mirrors bash_tool's pattern: Popen + 50ms poll loop watching both the deadline and an ``AbortSignal``, SIGTERM → 2s grace → SIGKILL. New ``RipgrepAbortedError`` is distinct from ``RipgrepTimeoutError`` because the two outcomes warrant different downstream handling — timeout surfaces partial results (useful to the agent), abort drops them (the user already cancelled). Glob and Grep pass ``context.abort_controller.signal`` to the helper and re-raise ``AbortError`` on ``RipgrepAbortedError`` so the agent loop's ``except AbortError: raise`` branch from PR #135 unwinds to the outer cancel boundary. ``ripgrep()``'s ``abort_signal=None`` default keeps SDK consumers of the bare helper working without changes. Six regression tests pin the contract: bare helper sanity / pre-call trip / abort-tripped-subprocess-returns-promptly (deterministic event handshake against ``sleep`` — also exercises the supervisor without requiring ripgrep installed) / Glob abort → AbortError / Grep abort → AbortError / unavailable-rg path unchanged. Read, WebFetch, WebSearch and MCP are deliberately deferred: * Read: ``open().read()`` returns in <1s for typical files; the agent loop's pre-dispatch ``_check_cancel`` already short-circuits abort fired between tools. * WebFetch/WebSearch: ``urllib.request.urlopen(timeout=15)`` is bounded at 15s; a watchdog-thread approach is bigger surgery best done as a separate PR. * MCP: out-of-process JSON-RPC needs transport-level cancellation — cross-cutting change, separate PR. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
5 tasks
ericleepi314
added a commit
that referenced
this pull request
May 15, 2026
Even after PRs #135/#140/#141/#143 wired ESC through subagents, headless SIGINT, the non-optional ``tool_context.abort_controller`` contract, and ripgrep mid-search, the user-visible ESC latency was still 20+ seconds when the model emitted a multi-tool_use response. Root cause: the provider's streaming HTTP read had no way to observe the abort controller. ``query.py`` passed no abort plumbing to ``chat_stream_response``; the only existing interrupt point was the ``on_text_chunk`` callback, which never fires for a turn that emits tool_use blocks without intervening text. The model could spend ~20s generating eight parallel ``Write`` blocks, and only after the stream returned naturally would the outer query loop check the abort signal and yield "Interrupted by user" for all eight. Thread an ``AbortSignal`` through ``chat_stream_response``. In the Anthropic provider, register a listener that calls ``stream.response.close()`` (the same close pattern the existing ``StreamWatchdog`` uses for idle timeout). Closing the underlying HTTP response causes the SDK's blocking socket read to raise in the consumer thread; the provider catches it, detects the abort via the signal state (not the exception class — different SDK versions raise different classes), and re-raises ``AbortError``. Three defenses against the registration race: * Pre-call fast-path bails when the signal was already tripped at call entry — skips the round-trip entirely. * Register-then-recheck after entering the stream context — closes the sub-microsecond window between an ``aborted`` read and a ``add_listener`` append where ``_fire`` could snapshot the listener list and silently drop our newly-appended listener. * Post-with-block recheck — surfaces a signal that fires between ``__exit__`` and ``return``. Plumbing: * ``query.py``: ``_call_model_sync`` grows ``abort_signal`` keyword and forwards to the provider; call site passes ``params.abort_controller.signal``. New ``except AbortError: raise`` in ``_call_model_sync`` and ``except AbortError: pass`` in the query loop route the cancel to the existing post-API abort-handling block in one place. * ``agent_loop.py``: ``_call_provider_for_turn`` grows ``cancel_signal`` and forwards as ``abort_signal=`` to the provider; ``run_agent_loop`` passes through. * ``minimax_provider.py`` / ``openai_compatible.py``: pre-call fast-path for parity (mid-stream listener for these providers is a future PR — same underlying anthropic SDK for Minimax makes it a small lift). Five regression tests pin the contract: pre-aborted fast-path skips ``_ensure_client``, mid-stream abort closes the stream and raises ``AbortError`` within <1s (vs the 20s+ symptom this PR fixes), unaborted streams return normally, ``abort_signal=None`` is a no-op for legacy callers, and the listener detaches after normal completion. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Glob/Grepusedsubprocess.run(timeout=20)for ripgrep — a SIGINT/ESC that tripped the abort controller had to wait out the full 20s timeout before the agent loop noticedRipgrepAbortedErroris distinct fromRipgrepTimeoutError: timeout surfaces partial results (useful to the agent), abort drops them (user already cancelled — surfacing them would be noise)Why
PR #135 wired
tool_context.abort_controllerso subagents and Bash honor ESC. PR #141 brought the same to headless. Other long-running tools (Read, Glob, Grep, WebFetch, WebSearch, MCP) still don't poll the signal — they return promptly only at natural completion. This PR closes the gap for the search tools, the most visible offenders: searching a large monorepo can pin the user's terminal for 20s after they ESC.Changes
src/tool_system/utils/ripgrep.py:_run_rg_with_abort(argv, *, timeout_s, abort_signal)— Popen-based supervisor mirroringbash_tool._run_bash_with_abort(50ms poll, SIGTERM → 2s grace → SIGKILL,start_new_session=True/ WindowsCREATE_NEW_PROCESS_GROUPfor process-group teardown)RipgrepAbortedError(message, partial_results=...)raised when the signal tripsripgrep(...)grows anabort_signal: AbortSignal | None = Nonekeyword param; default-None preserves previous semantics exactly for SDK consumerssrc/tool_system/tools/glob.py+src/tool_system/tools/grep.py:context.abort_controller.signaltoripgrep(...)RipgrepAbortedErrorand re-raiseAbortError(...)so the agent loop'sexcept AbortError: raisebranch (PR fix(esc): propagate cancel signal into tool_context for subagents #135) unwinds to the outer cancel boundary instead of synthesizing a partial tool resulttests/test_ripgrep_abort.py(new) — 6 regression tests:RipgrepAbortedErrorfastsleep 60(no real ripgrep required — also serves as CI coverage on machines withoutrginstalled)AbortErrorAbortErrorRipgrepUnavailableErrorstill raised correctlyTest plan
Grepagainst a large monorepo, press ESC — should unwind within ~1s instead of 20sOut of scope (deferred to separate follow-ups)
_check_cancelalready short-circuits abort fired between tools.urllib.request.urlopen(timeout=15). Cleanest abort requires a watchdog thread that closes the socket — separate PR.Originally PR #142 stacked on PR #141; re-opened against
mainafter #141 merged but #142's content stayed on the stacked branch.🤖 Generated with Claude Code