feat(grep,glob): abort ripgrep mid-search instead of waiting out timeout by ericleepi314 · Pull Request #142 · agentforce314/clawcodex

ericleepi314 · 2026-05-15T09:20:10Z

Summary

Before: Glob/Grep used subprocess.run(timeout=20) for ripgrep — a SIGINT/ESC that tripped the abort controller had to wait out the full 20s timeout before the agent loop noticed
After: Popen + 50ms poll loop (mirrors bash_tool's pattern from PR fix(esc): propagate cancel signal into tool_context for subagents #135) watches both the deadline and the abort signal; SIGTERM → 2s grace → SIGKILL on abort, returning within ~150ms instead of 20s
New RipgrepAbortedError is distinct from RipgrepTimeoutError: timeout surfaces partial results (useful to the agent), abort drops them (user already cancelled — surfacing them would be noise)

Why

PR #135 wired tool_context.abort_controller so subagents and Bash honor ESC. Other long-running tools (Read, Glob, Grep, WebFetch, WebSearch, MCP) still don't poll the signal — they'd return promptly only at natural completion. This PR closes the gap for the search tools, the most visible offenders: searching a large monorepo can pin the user's terminal for 20s after they ESC.

Changes

src/tool_system/utils/ripgrep.py:
- New _run_rg_with_abort(argv, *, timeout_s, abort_signal) — Popen-based supervisor mirroring bash_tool._run_bash_with_abort (50ms poll, SIGTERM → 2s grace → SIGKILL, start_new_session=True / Windows CREATE_NEW_PROCESS_GROUP for process-group teardown)
- New RipgrepAbortedError(message, partial_results=...) raised when the signal trips
- ripgrep(...) grows an abort_signal: AbortSignal | None = None keyword param; default-None preserves previous semantics exactly for SDK consumers
src/tool_system/tools/glob.py + src/tool_system/tools/grep.py:
- Pass context.abort_controller.signal to ripgrep(...)
- Catch RipgrepAbortedError and re-raise AbortError(...) so the agent loop's except AbortError: raise branch (PR fix(esc): propagate cancel signal into tool_context for subagents #135) unwinds to the outer cancel boundary instead of synthesizing a partial tool result
tests/test_ripgrep_abort.py (new) — 6 regression tests:
- Bare helper sanity: never-tripped signal preserves existing semantics
- Pre-tripped signal raises RipgrepAbortedError fast
- Deterministic abort-mid-subprocess test using sleep 60 (no real ripgrep required — also serves as CI coverage on machines without rg installed)
- Glob abort → AbortError
- Grep abort → AbortError
- Unavailable-rg path: RipgrepUnavailableError still raised correctly

Test plan

12 directly related tests pass (6 new + headless)
4404 broader related tests pass
Critic subagent review: APPROVE (after applying recommended deterministic test variant + minor type/comment cleanups)
Manual: in a real Claude Code session, kick off Grep against a large monorepo, press ESC — should unwind within ~1s instead of 20s

Out of scope (deferred to separate follow-ups)

Read: typical reads complete in <1s; agent loop's pre-dispatch _check_cancel already short-circuits abort fired between tools.
WebFetch / WebSearch: bounded by urllib.request.urlopen(timeout=15). Cleanest abort requires a watchdog thread that closes the socket — separate PR.
MCP tools: out-of-process JSON-RPC; transport-level cancellation is a cross-cutting change across the entire MCP client.

Dependencies

Stacked on PR feat(headless): SIGINT mid-tool cancels via AbortController #141 (headless SIGINT). Rebase onto main once feat(headless): SIGINT mid-tool cancels via AbortController #141 merges. Originally PR feat(grep,glob): abort ripgrep mid-search instead of waiting out timeout #139 stacked on PR feat(headless): SIGINT mid-tool cancels via AbortController #138.

🤖 Generated with Claude Code

Before this fix, the ripgrep wrapper used ``subprocess.run(timeout=20)``. A SIGINT/ESC that tripped the abort controller mid-search had to wait out the full 20-second timeout before the subprocess returned and the agent loop could observe the cancellation. On a large repo Glob and Grep felt exactly like the pre-PR-#135 Bash supervisor — "ESC is ignored for 20+ seconds." Add ``_run_rg_with_abort`` that mirrors bash_tool's pattern: Popen + 50ms poll loop watching both the deadline and an ``AbortSignal``, SIGTERM → 2s grace → SIGKILL. New ``RipgrepAbortedError`` is distinct from ``RipgrepTimeoutError`` because the two outcomes warrant different downstream handling — timeout surfaces partial results (useful to the agent), abort drops them (the user already cancelled). Glob and Grep pass ``context.abort_controller.signal`` to the helper and re-raise ``AbortError`` on ``RipgrepAbortedError`` so the agent loop's ``except AbortError: raise`` branch from PR #135 unwinds to the outer cancel boundary. ``ripgrep()``'s ``abort_signal=None`` default keeps SDK consumers of the bare helper working without changes. Six regression tests pin the contract: bare helper sanity / pre-call trip / abort-tripped-subprocess-returns-promptly (deterministic event handshake against ``sleep`` — also exercises the supervisor without requiring ripgrep installed) / Glob abort → AbortError / Grep abort → AbortError / unavailable-rg path unchanged. Read, WebFetch, WebSearch and MCP are deliberately deferred: * Read: ``open().read()`` returns in <1s for typical files; the agent loop's pre-dispatch ``_check_cancel`` already short-circuits abort fired between tools. * WebFetch/WebSearch: ``urllib.request.urlopen(timeout=15)`` is bounded at 15s; a watchdog-thread approach is bigger surgery best done as a separate PR. * MCP: out-of-process JSON-RPC needs transport-level cancellation — cross-cutting change, separate PR. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…out (#142) Before this fix, the ripgrep wrapper used ``subprocess.run(timeout=20)``. A SIGINT/ESC that tripped the abort controller mid-search had to wait out the full 20-second timeout before the subprocess returned and the agent loop could observe the cancellation. On a large repo Glob and Grep felt exactly like the pre-PR-#135 Bash supervisor — "ESC is ignored for 20+ seconds." Add ``_run_rg_with_abort`` that mirrors bash_tool's pattern: Popen + 50ms poll loop watching both the deadline and an ``AbortSignal``, SIGTERM → 2s grace → SIGKILL. New ``RipgrepAbortedError`` is distinct from ``RipgrepTimeoutError`` because the two outcomes warrant different downstream handling — timeout surfaces partial results (useful to the agent), abort drops them (the user already cancelled). Glob and Grep pass ``context.abort_controller.signal`` to the helper and re-raise ``AbortError`` on ``RipgrepAbortedError`` so the agent loop's ``except AbortError: raise`` branch from PR #135 unwinds to the outer cancel boundary. ``ripgrep()``'s ``abort_signal=None`` default keeps SDK consumers of the bare helper working without changes. Six regression tests pin the contract: bare helper sanity / pre-call trip / abort-tripped-subprocess-returns-promptly (deterministic event handshake against ``sleep`` — also exercises the supervisor without requiring ripgrep installed) / Glob abort → AbortError / Grep abort → AbortError / unavailable-rg path unchanged. Read, WebFetch, WebSearch and MCP are deliberately deferred: * Read: ``open().read()`` returns in <1s for typical files; the agent loop's pre-dispatch ``_check_cancel`` already short-circuits abort fired between tools. * WebFetch/WebSearch: ``urllib.request.urlopen(timeout=15)`` is bounded at 15s; a watchdog-thread approach is bigger surgery best done as a separate PR. * MCP: out-of-process JSON-RPC needs transport-level cancellation — cross-cutting change, separate PR. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…out (#142) (#143) Before this fix, the ripgrep wrapper used ``subprocess.run(timeout=20)``. A SIGINT/ESC that tripped the abort controller mid-search had to wait out the full 20-second timeout before the subprocess returned and the agent loop could observe the cancellation. On a large repo Glob and Grep felt exactly like the pre-PR-#135 Bash supervisor — "ESC is ignored for 20+ seconds." Add ``_run_rg_with_abort`` that mirrors bash_tool's pattern: Popen + 50ms poll loop watching both the deadline and an ``AbortSignal``, SIGTERM → 2s grace → SIGKILL. New ``RipgrepAbortedError`` is distinct from ``RipgrepTimeoutError`` because the two outcomes warrant different downstream handling — timeout surfaces partial results (useful to the agent), abort drops them (the user already cancelled). Glob and Grep pass ``context.abort_controller.signal`` to the helper and re-raise ``AbortError`` on ``RipgrepAbortedError`` so the agent loop's ``except AbortError: raise`` branch from PR #135 unwinds to the outer cancel boundary. ``ripgrep()``'s ``abort_signal=None`` default keeps SDK consumers of the bare helper working without changes. Six regression tests pin the contract: bare helper sanity / pre-call trip / abort-tripped-subprocess-returns-promptly (deterministic event handshake against ``sleep`` — also exercises the supervisor without requiring ripgrep installed) / Glob abort → AbortError / Grep abort → AbortError / unavailable-rg path unchanged. Read, WebFetch, WebSearch and MCP are deliberately deferred: * Read: ``open().read()`` returns in <1s for typical files; the agent loop's pre-dispatch ``_check_cancel`` already short-circuits abort fired between tools. * WebFetch/WebSearch: ``urllib.request.urlopen(timeout=15)`` is bounded at 15s; a watchdog-thread approach is bigger surgery best done as a separate PR. * MCP: out-of-process JSON-RPC needs transport-level cancellation — cross-cutting change, separate PR. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

ericleepi314 merged commit 2406dac into feat/headless-sigint-mid-tool May 15, 2026

ericleepi314 mentioned this pull request May 15, 2026

feat(grep,glob): abort ripgrep mid-search instead of waiting out timeout #143

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(grep,glob): abort ripgrep mid-search instead of waiting out timeout#142

feat(grep,glob): abort ripgrep mid-search instead of waiting out timeout#142
ericleepi314 merged 1 commit into
feat/headless-sigint-mid-toolfrom
feat/per-tool-abort-polling

ericleepi314 commented May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ericleepi314 commented May 15, 2026

Summary

Why

Changes

Test plan

Out of scope (deferred to separate follow-ups)

Dependencies

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant