feat(engine): bound claude -p phases with a timeout, cost cap, and heartbeat#53
Merged
Merged
Conversation
…artbeat Every token-spending claude -p phase (build, the test/security gates, simplify, retrospective) now runs under a per-phase wall-clock timeout (FL_PHASE_TIMEOUT, default 1200s) via GNU timeout, so a throttled or wedged hosted call can no longer hang the run indefinitely. A timed-out gate has its failure file synthesized so the loop retries instead of mis-reading the unfinished gate as a pass; a timed-out post-green /code-simplify discards its partial edits and ships the already-green tip. Where timeout is unavailable the calls run unbounded rather than failing. Adds opt-in FL_MAX_BUDGET_USD, passed as --max-budget-usd to each phase, to bound a runaway agentic loop by cost (the plan proposed --max-turns, but the pinned CLI exposes --max-budget-usd). Adds an off-TTY progress heartbeat: long phases emit a plain "still running (Nm)" tick every FL_HEARTBEAT_SECS (default 60) and STATUS.md stamps each running phase's start time, so a slow-but-progressing phase is distinguishable from a hung one. Adds bats coverage: a timed-out gate is marked failed (no hang), a simplify timeout ships the green tip, --max-budget-usd is passed through only when set, and a slow phase emits a heartbeat tick. Closes #48.
8913ad2 to
a9d0b1a
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes #48. Bounds every token-spending
claude -pphase so a throttled or wedged hostedcall can no longer hang a
feature-looprun indefinitely, and makes a slow-but-progressingphase distinguishable from a hung one.
What changed
Per-phase timeout (
FL_PHASE_TIMEOUT, default 1200s) — everyclaude -pphase (build,the test/security gates, simplify, retrospective) runs under GNU
timeout -k 30.and retries — rather than mis-reading "no failure file written" as a pass.
/code-simplifydiscards its partial edits and ships thealready-green tip (matching the recommendation in engine: claude -p phases have no timeout or progress heartbeat — one stalled ~29 min (ISSUE-43 run) #48 itself).
timeoutisn't on PATH (e.g. a bare macOS host), calls run unbounded rather thanfailing — and the prefix array is guarded so an empty expansion can't abort under
set -uon macOS's Bash 3.2.Cost cap (
FL_MAX_BUDGET_USD, opt-in) — when set, passed as--max-budget-usdto eachphase, bounding a runaway agentic loop by cost. The issue proposed
--max-turns, but thepinned CLI (2.1.156) exposes
--max-budget-usd, not--max-turns; this implements thebound with the flag the CLI actually offers. Unset = no cap, so a guessed default can't
truncate a legitimate multi-step phase.
Progress heartbeat (
FL_HEARTBEAT_SECS, default 60) — off-TTY (piped/CI/Docker), wherethe spinner is a no-op, long phases emit a plain
… still running (Nm)tick, and STATUS.mdstamps each running phase's start time. (Mitigation #4 in the issue — structured 429
parsing — is intentionally not added: the CLI logs no structured rate-limit signal to key
on; the elapsed tick + the timed-out phase's captured log tail surface throttling instead.)
Test plan
the green tip,
--max-budget-usdis passed through only when set, and a slow phase emitsa heartbeat tick. The two timeout tests
skipgracefully wheretimeoutis unavailable.on this macOS host —
script(1)differs from Linux — and fail identically onmain;they pass in CI.
Provenance
Implemented by an autonomous
feature-looprun on #48 (dogfooding), then rebased ontov0.4.2, squashed to a single clean commit, AI-attribution trailers stripped (per repo
convention), and reviewed by hand — including a Bash 3.2 portability pass and confirming
--max-budget-usdis a real CLI flag.