Skip to content

feat(grep,glob): abort ripgrep mid-search instead of waiting out timeout#142

Merged
ericleepi314 merged 1 commit into
feat/headless-sigint-mid-toolfrom
feat/per-tool-abort-polling
May 15, 2026
Merged

feat(grep,glob): abort ripgrep mid-search instead of waiting out timeout#142
ericleepi314 merged 1 commit into
feat/headless-sigint-mid-toolfrom
feat/per-tool-abort-polling

Conversation

@ericleepi314
Copy link
Copy Markdown
Collaborator

Summary

  • Before: Glob/Grep used subprocess.run(timeout=20) for ripgrep — a SIGINT/ESC that tripped the abort controller had to wait out the full 20s timeout before the agent loop noticed
  • After: Popen + 50ms poll loop (mirrors bash_tool's pattern from PR fix(esc): propagate cancel signal into tool_context for subagents #135) watches both the deadline and the abort signal; SIGTERM → 2s grace → SIGKILL on abort, returning within ~150ms instead of 20s
  • New RipgrepAbortedError is distinct from RipgrepTimeoutError: timeout surfaces partial results (useful to the agent), abort drops them (user already cancelled — surfacing them would be noise)

Why

PR #135 wired tool_context.abort_controller so subagents and Bash honor ESC. Other long-running tools (Read, Glob, Grep, WebFetch, WebSearch, MCP) still don't poll the signal — they'd return promptly only at natural completion. This PR closes the gap for the search tools, the most visible offenders: searching a large monorepo can pin the user's terminal for 20s after they ESC.

Changes

  • src/tool_system/utils/ripgrep.py:
    • New _run_rg_with_abort(argv, *, timeout_s, abort_signal) — Popen-based supervisor mirroring bash_tool._run_bash_with_abort (50ms poll, SIGTERM → 2s grace → SIGKILL, start_new_session=True / Windows CREATE_NEW_PROCESS_GROUP for process-group teardown)
    • New RipgrepAbortedError(message, partial_results=...) raised when the signal trips
    • ripgrep(...) grows an abort_signal: AbortSignal | None = None keyword param; default-None preserves previous semantics exactly for SDK consumers
  • src/tool_system/tools/glob.py + src/tool_system/tools/grep.py:
  • tests/test_ripgrep_abort.py (new) — 6 regression tests:
    • Bare helper sanity: never-tripped signal preserves existing semantics
    • Pre-tripped signal raises RipgrepAbortedError fast
    • Deterministic abort-mid-subprocess test using sleep 60 (no real ripgrep required — also serves as CI coverage on machines without rg installed)
    • Glob abort → AbortError
    • Grep abort → AbortError
    • Unavailable-rg path: RipgrepUnavailableError still raised correctly

Test plan

  • 12 directly related tests pass (6 new + headless)
  • 4404 broader related tests pass
  • Critic subagent review: APPROVE (after applying recommended deterministic test variant + minor type/comment cleanups)
  • Manual: in a real Claude Code session, kick off Grep against a large monorepo, press ESC — should unwind within ~1s instead of 20s

Out of scope (deferred to separate follow-ups)

  • Read: typical reads complete in <1s; agent loop's pre-dispatch _check_cancel already short-circuits abort fired between tools.
  • WebFetch / WebSearch: bounded by urllib.request.urlopen(timeout=15). Cleanest abort requires a watchdog thread that closes the socket — separate PR.
  • MCP tools: out-of-process JSON-RPC; transport-level cancellation is a cross-cutting change across the entire MCP client.

Dependencies

🤖 Generated with Claude Code

Before this fix, the ripgrep wrapper used ``subprocess.run(timeout=20)``.
A SIGINT/ESC that tripped the abort controller mid-search had to wait
out the full 20-second timeout before the subprocess returned and the
agent loop could observe the cancellation. On a large repo Glob and
Grep felt exactly like the pre-PR-#135 Bash supervisor — "ESC is
ignored for 20+ seconds."

Add ``_run_rg_with_abort`` that mirrors bash_tool's pattern: Popen +
50ms poll loop watching both the deadline and an ``AbortSignal``,
SIGTERM → 2s grace → SIGKILL. New ``RipgrepAbortedError`` is distinct
from ``RipgrepTimeoutError`` because the two outcomes warrant
different downstream handling — timeout surfaces partial results
(useful to the agent), abort drops them (the user already cancelled).

Glob and Grep pass ``context.abort_controller.signal`` to the helper
and re-raise ``AbortError`` on ``RipgrepAbortedError`` so the agent
loop's ``except AbortError: raise`` branch from PR #135 unwinds to the
outer cancel boundary. ``ripgrep()``'s ``abort_signal=None`` default
keeps SDK consumers of the bare helper working without changes.

Six regression tests pin the contract: bare helper sanity / pre-call
trip / abort-tripped-subprocess-returns-promptly (deterministic event
handshake against ``sleep`` — also exercises the supervisor without
requiring ripgrep installed) / Glob abort → AbortError / Grep abort →
AbortError / unavailable-rg path unchanged.

Read, WebFetch, WebSearch and MCP are deliberately deferred:
* Read: ``open().read()`` returns in <1s for typical files; the agent
  loop's pre-dispatch ``_check_cancel`` already short-circuits abort
  fired between tools.
* WebFetch/WebSearch: ``urllib.request.urlopen(timeout=15)`` is bounded
  at 15s; a watchdog-thread approach is bigger surgery best done as a
  separate PR.
* MCP: out-of-process JSON-RPC needs transport-level cancellation —
  cross-cutting change, separate PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@ericleepi314 ericleepi314 merged commit 2406dac into feat/headless-sigint-mid-tool May 15, 2026
ericleepi314 added a commit that referenced this pull request May 15, 2026
…out (#142)

Before this fix, the ripgrep wrapper used ``subprocess.run(timeout=20)``.
A SIGINT/ESC that tripped the abort controller mid-search had to wait
out the full 20-second timeout before the subprocess returned and the
agent loop could observe the cancellation. On a large repo Glob and
Grep felt exactly like the pre-PR-#135 Bash supervisor — "ESC is
ignored for 20+ seconds."

Add ``_run_rg_with_abort`` that mirrors bash_tool's pattern: Popen +
50ms poll loop watching both the deadline and an ``AbortSignal``,
SIGTERM → 2s grace → SIGKILL. New ``RipgrepAbortedError`` is distinct
from ``RipgrepTimeoutError`` because the two outcomes warrant
different downstream handling — timeout surfaces partial results
(useful to the agent), abort drops them (the user already cancelled).

Glob and Grep pass ``context.abort_controller.signal`` to the helper
and re-raise ``AbortError`` on ``RipgrepAbortedError`` so the agent
loop's ``except AbortError: raise`` branch from PR #135 unwinds to the
outer cancel boundary. ``ripgrep()``'s ``abort_signal=None`` default
keeps SDK consumers of the bare helper working without changes.

Six regression tests pin the contract: bare helper sanity / pre-call
trip / abort-tripped-subprocess-returns-promptly (deterministic event
handshake against ``sleep`` — also exercises the supervisor without
requiring ripgrep installed) / Glob abort → AbortError / Grep abort →
AbortError / unavailable-rg path unchanged.

Read, WebFetch, WebSearch and MCP are deliberately deferred:
* Read: ``open().read()`` returns in <1s for typical files; the agent
  loop's pre-dispatch ``_check_cancel`` already short-circuits abort
  fired between tools.
* WebFetch/WebSearch: ``urllib.request.urlopen(timeout=15)`` is bounded
  at 15s; a watchdog-thread approach is bigger surgery best done as a
  separate PR.
* MCP: out-of-process JSON-RPC needs transport-level cancellation —
  cross-cutting change, separate PR.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
ericleepi314 added a commit that referenced this pull request May 15, 2026
…out (#142) (#143)

Before this fix, the ripgrep wrapper used ``subprocess.run(timeout=20)``.
A SIGINT/ESC that tripped the abort controller mid-search had to wait
out the full 20-second timeout before the subprocess returned and the
agent loop could observe the cancellation. On a large repo Glob and
Grep felt exactly like the pre-PR-#135 Bash supervisor — "ESC is
ignored for 20+ seconds."

Add ``_run_rg_with_abort`` that mirrors bash_tool's pattern: Popen +
50ms poll loop watching both the deadline and an ``AbortSignal``,
SIGTERM → 2s grace → SIGKILL. New ``RipgrepAbortedError`` is distinct
from ``RipgrepTimeoutError`` because the two outcomes warrant
different downstream handling — timeout surfaces partial results
(useful to the agent), abort drops them (the user already cancelled).

Glob and Grep pass ``context.abort_controller.signal`` to the helper
and re-raise ``AbortError`` on ``RipgrepAbortedError`` so the agent
loop's ``except AbortError: raise`` branch from PR #135 unwinds to the
outer cancel boundary. ``ripgrep()``'s ``abort_signal=None`` default
keeps SDK consumers of the bare helper working without changes.

Six regression tests pin the contract: bare helper sanity / pre-call
trip / abort-tripped-subprocess-returns-promptly (deterministic event
handshake against ``sleep`` — also exercises the supervisor without
requiring ripgrep installed) / Glob abort → AbortError / Grep abort →
AbortError / unavailable-rg path unchanged.

Read, WebFetch, WebSearch and MCP are deliberately deferred:
* Read: ``open().read()`` returns in <1s for typical files; the agent
  loop's pre-dispatch ``_check_cancel`` already short-circuits abort
  fired between tools.
* WebFetch/WebSearch: ``urllib.request.urlopen(timeout=15)`` is bounded
  at 15s; a watchdog-thread approach is bigger surgery best done as a
  separate PR.
* MCP: out-of-process JSON-RPC needs transport-level cancellation —
  cross-cutting change, separate PR.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant