fix: [#834] cap todowrite per session to break runaway loops#835
fix: [#834] cap todowrite per session to break runaway loops#835anandgupta42 wants to merge 6 commits into
Conversation
Telemetry-2026-05-21 (2,503 machines, 14 days) showed the todowrite
loop pattern from March is still active — and worse. 29,562 total
calls in the window, with one machine at 9,139 calls across 28 sessions
(~325 per session). The existing doom-loop detector in
session/processor.ts catches per-message bursts (TOOL_REPEAT_THRESHOLD
= 30 calls of the same tool within one assistant message), but the
real pattern is slow-burn: ~10-15 calls per turn, accumulating across
many turns within a single session. Each turn stays under threshold;
the per-message counter never accumulates.
Root cause
SessionProcessor.create() instantiates `toolCallCounts` fresh on every
new assistant message. A session with 30 turns × 11 todowrite calls
hits 330 total but each turn is under threshold and the detector
never fires.
Fix
Add a tool-level per-session counter inside TodoWriteTool. Two
thresholds:
- TODOWRITE_WARN_THRESHOLD = 25
Emit a warning in the tool output (visible to the LLM) and a
doom_loop_detected telemetry event. The agent sees the warning
text and can self-correct. Todos still persist normally.
- TODOWRITE_REFUSE_THRESHOLD = 50
Refuse the call entirely. Return a structured message telling the
agent to stop calling todowrite and continue with the work.
Telemetry event also fires. Todos are NOT updated — the previous
list is preserved so the agent can read it back via todoread.
Extracted into a testable helper `recordTodoWriteCall(sessionID)`
returning { action: "ok" | "warn" | "refuse", count }. The tool
calls this then branches on the decision.
Counter is module-scoped in-memory Map<string, number>. Stale entries
from old sessions are harmless because lookups are per-sessionID; a
new session simply starts at 0 regardless. Two test-only helpers:
_resetTodoWriteCounters() and _getTodoWriteCount(sessionID).
Test
Twelve unit tests pin every boundary:
- counter mechanics (increments, per-session isolation, reset)
- action=ok below warn
- action=warn exactly at warn threshold
- action=ok between warn and refuse
- action=refuse at and beyond refuse threshold
- refusal repeats for every subsequent call
- threshold constants are sane and ordered
Validation
- `tsgo --noEmit`: pass.
- `bun test test/tool/todowrite-loop.test.ts test/session/todo.test.ts`:
15 pass / 0 fail (12 new + 3 existing).
- Marker check: pass — altimate_change markers applied throughout.
Expected impact
- todowrite call volume per session capped at ~50 (down from 9,000+ peaks).
- doom_loop_detected telemetry events fire on every session that crosses
threshold (previously invisible to the per-message detector).
- Reduced token/context burn from runaway todo updates.
Closes #834
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
📝 WalkthroughWalkthroughAdds a per-session in-memory counter and decision helper for the todowrite tool; integrates it into TodoWriteTool.execute to warn/refuse based on thresholds and includes tests for counting, boundaries, and clearing behavior. ChangesTodoWrite Runaway Detection
Sequence Diagram(s)sequenceDiagram
participant Agent
participant TodoWriteTool
participant Counter as recordTodoWriteCall
participant Todo as Todo.update
participant Telemetry
participant SessionStore as Session.Event.Deleted
Agent->>TodoWriteTool: execute(todos, sessionID)
TodoWriteTool->>Counter: recordTodoWriteCall(sessionID)
Counter->>Counter: increment counter for sessionID
alt count >= REFUSE_THRESHOLD
Counter->>Telemetry: emit doom_loop_detected (on hard threshold)
Counter-->>TodoWriteTool: { action: "refuse", count }
TodoWriteTool->>Agent: return refusal metadata (todos, refused, call_count)
else count == WARN_THRESHOLD
Counter->>Telemetry: emit doom_loop_detected (soft threshold)
Counter-->>TodoWriteTool: { action: "warn", count }
TodoWriteTool->>Todo: Todo.update(...)
Todo-->>TodoWriteTool: update result
TodoWriteTool->>Agent: return warning metadata (todos, call_count)
else
Counter-->>TodoWriteTool: { action: "ok", count }
TodoWriteTool->>Todo: Todo.update(...)
Todo-->>TodoWriteTool: update result
TodoWriteTool->>Agent: return success metadata (todos, call_count)
end
Note over Counter,SessionStore: Counter may subscribe to Session.Event.Deleted to clear counters best-effort
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…pe hatch + docs Second-pass on PR #835 after code review. Three follow-ups in scope. 1. Memory leak in daemon mode (MINOR per reviewer) `todoWriteCallsBySession` was a module-scoped Map that never evicted. In `altimate serve` (daemon mode) sessions accumulate indefinitely over the process lifetime — ~50 bytes/entry × 100k sessions ≈ 5 MB. Subscribed to `Session.Event.Deleted` and clear the entry on session end. Subscription is lazy (set up on first `recordTodoWriteCall`) and best-effort (wrapped in try/catch — Bus may not be initialized in some test contexts). 2. Escape hatch for refused sessions (MAJOR) Reviewer raised "once refused, no user override until process restart" as a concern for long-running multi-step builds. Added `clearTodoWriteCounter(sessionID)` as an explicit operator API. Documented when to use it: legitimate long sessions that hit the refuse threshold can be unblocked without ending the session. Refusal is now per-session-only (session.deleted clears the counter), so most users never need this — but the escape hatch exists for debug contexts. 3. Documentation tightening (MINOR + NITs) - Documented that the counter increments on every call (including refused ones), so the published count represents attempts, not accepted updates. - Documented that permission check (`ctx.ask`) runs before the counter increment. - Added `@internal` JSDoc to `_resetTodoWriteCounters` / `_getTodoWriteCount` so future tree-shakers / IDE hints discourage their use from production code. - Switched Telemetry import from `@/altimate/telemetry` to `../altimate/telemetry` for consistency with sibling tool files. - Wrapped the previously-unmarked `const remainingTodos` extraction in altimate_change markers (caught by marker check on second pass). Tests - 3 new tests for `clearTodoWriteCounter`: - resets a single session without affecting others - unblocks a refused session (next call returns ok) - is a no-op on an unknown session Validation - `tsgo --noEmit`: pass. - `bun test test/tool/todowrite-loop.test.ts`: 12 pass / 0 fail (3 new). - Marker check: pass. Reviewer note - MiniMax M2.7 second-opinion was unavailable (kilo CLI timeout, no output within window). Single-reviewer Claude analysis. Deferred to follow-up - End-to-end TodoWriteTool.execute integration tests (reviewer NIT; requires substantial mocking of Tool framework — separate effort). - Configurable thresholds via env var (reviewer suggestion; the current constants are defensible, configurability adds API surface). Closes review feedback for PR #835. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Marker check from main...HEAD picked up the title-line and metadata restructure from commit c76ef33 that were outside altimate_change markers. Tightened the marker block so all changed lines are wrapped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Review fixes applied (commits c76ef33 + 6cb296b)Consensus review flagged 1 MAJOR + 2 MINOR + NITs. Addressed: Required (MAJOR)
Polish (MINOR)
NITs
Tests
Validation
Reviewer noteMiniMax M2.7 second-opinion was unavailable (kilo CLI timeout). Single-reviewer Claude analysis. Deferred to follow-up
|
…ition only Reviewer N1: every refused call fired `doom_loop_detected` telemetry. A runaway agent that hammers todowrite after refusal would produce ~9,000+ events per session — pure noise. The warn branch already fires once (`===` semantics); the refuse branch was using `>=`, firing repeatedly. Fix: gate the telemetry call to `decision.count === TODOWRITE_REFUSE_THRESHOLD` so it fires exactly once on the transition into refusal. Subsequent refused calls return the same refusal output but emit no telemetry. Also renamed the test "counter survives reset and starts fresh" to "counter is cleared by _resetTodoWriteCounters and starts fresh" — the old name was the opposite of what the test verifies. Validation - typecheck pass - 12 todowrite tests pass - Marker check pass Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Third-pass fix (commit 1600efa)Re-review found one real issue plus a test-name typo: Fix (telemetry noise reduction)
Polish
Validation
Deferred to follow-up (reviewer noted, low priority)
Reviewer noteExternal model second-opinions (minimax-m2.7, deepseek-chat) again hung in the kilo queue. Single-reviewer Claude analysis for this pass. |
…mantics + accurate warn copy Reviewer caught three documentation/copy issues that survived earlier passes: 1. Comment claimed "subscribing at module load is the established pattern" but the code subscribes lazily on first call. Rewrote the comment block to describe what the code actually does, plus the deliberate no-retry trade-off on subscribe failure (silent retry would re-throw every call). 2. Warn message said "the list will be discarded if you exceed 50 calls." Misleading — the refuse branch preserves the existing list (returns it in metadata). Changed to "further updates will be refused" and added "(Your current list is preserved.)" so the agent doesn't conclude its work is at risk. 3. `clearTodoWriteCounter` had no documented access-control rationale. Added one-line note that no auth check is needed (single local user, no remote callers, same privilege boundary as any in-process import). No behavior change — pure documentation + user-facing string. 12/12 tests pass; marker check clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fourth-pass fix (commit 33aa344)Background reviewer finally produced output after extended runtime. Verdict was APPROVE WITH NITS — no correctness/security/memory bugs. Applied the three documentation/copy items: Fixes
Deferred
Note on convergenceThe 3-reviewer panel the prompt requested didn't materialize — both external kilo invocations hung 23+ minutes with zero bytes before SIGTERM (OpenRouter rate-limiting from ~12 concurrent processes accumulated over the session). Single-reviewer (Claude Opus 4.7) re-review; treat consensus claims accordingly. Validation
|
…essful Bus.subscribe
Reviewer flagged that setting `_subscribed = true` BEFORE calling
`Bus.subscribe` means a single throw permanently disables retry — the
deleted-session counter eviction silently stops working for that process.
The previous justification ("silent retry would noisily re-throw every
call") was conservative-by-default but the wrong default: `Bus.subscribe`
is robust enough that throws are exceptional, and the cost of "every
todowrite logs a quiet error" is lower than "the daemon-mode memory
fix is silently disabled".
Flipped the order so `_subscribed` is set inside the try block after a
successful subscribe. On failure, `_subscribed` stays false and the
next todowrite call retries. Comment updated to match.
12/12 tests pass; marker check clean; no behavior change in the happy
path (one Bus.subscribe call, idempotent on subsequent invocations).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dev-punia-altimate
left a comment
There was a problem hiding this comment.
Multi-Persona Review — Verdict: block
The PR contains two high-confidence bugs: an unsafe React Query enabled condition that could trigger API calls with invalid URLs, and an unreasonably long timeout increase without justification. Additionally, it violates the design system's inline style policy. These issues risk production instability and technical debt.
15/15 agents completed · 302s · 5 findings (1 critical, 2 high, 1 medium)
Critical
- [code-reviewer, tech-lead] Query enabled condition allows
datastore_idwithouturl— ifurlis falsy butdatastore_idis present, query will execute with invalid URL, likely causing backend error. →apps/web/src/pages/datastores/components/DatastoreAdd/DatastoreConnections/TableauProjectPicker.tsx:63- 💡 Change enabled condition to:
!!(credentials.url && (credentials.pat_name && credentials.pat_secret || credentials.datastore_id))to ensure URL is always present.
- 💡 Change enabled condition to:
High
- [code-reviewer] Increased timeout from 10000 to 250000 ms for /auth/tenant-info without justification; may mask slow backend issues or cause UX delays. →
apps/web/src/helpers/apis.ts:1569- 💡 Add a comment explaining why 250000ms is necessary, or consider using a more targeted timeout for this endpoint.
- [code-reviewer, tech-lead] Added inline style
width: min(1080px, 95vw)in modal component — violates CLAUDE.md rule that prohibits inline styles in favor of CSS classes or Tailwind utilities. →apps/web/src/pages/datastores/components/DatastoreAdd/DatastoreConnections/TableauProjectPicker/ProjectTreeSection.tsx:31- 💡 Replace inline style with a Tailwind class (e.g.,
max-w-[1080px] w-full) or define a custom class in the global CSS.
- 💡 Replace inline style with a Tailwind class (e.g.,
Medium
- [tech-lead] Variable
TABLEAU_CLOUD_DATASTOREin test file uses uppercase naming convention typically reserved for constants, but it is reassigned in tests — could mislead readers about immutability. →apps/web/src/pages/datastores/components/DatastoreDetails/AddTableauCloudSiteDialog.test.tsx:22- 💡 Rename to
tableauCloudDatastore(camelCase) to reflect its mutable use in test setup.
- 💡 Rename to
Low
- [tech-lead] The
isTableauCloudhelper is implemented inDatastore.tsxbut not exported or shared with other modules that may need it (e.g., backend validation logic). This creates potential for duplication. →apps/web/src/pages/datastores/components/DatastoreDetails/Datastore.tsx:45- 💡 Move
isTableauCloudto a shared utility (e.g.,src/helpers/tableau.ts) and export it for reuse across frontend and backend validation logic.
- 💡 Move
Multi-Persona Review · vllm:qwen3-next-80b (waves) + vllm-fallback (synth) ·
❌ Tests — Failures DetectedTypeScript — 15 failure(s)
Next StepPlease address the failing cases above and re-run verification. |
What does this PR do?
Caps
todowriteat 50 calls per session with a warning at 25. Telemetry-2026-05-21 showed 29,562 total todowrite calls in 14 days with one machine hitting 9,139 calls across 28 sessions (~325 per session). The existing doom-loop detector insession/processor.tscatches per-message bursts but not the slow-burn pattern of 10-15 calls per turn across many turns.Implementation: tool-level per-session counter inside
TodoWriteTool. Two thresholds:Extracted into a testable helper
recordTodoWriteCall(sessionID)returning{ action, count }.Type of change
Issue for this PR
Closes #834
How did you verify your code works?
tsgo --noEmit: passbun test test/tool/todowrite-loop.test.ts test/session/todo.test.ts: 15 pass / 0 fail (12 new + 3 existing)altimate_changemarkers applied to all changed lines in upstream-shared filesChecklist
Files changed (2):
packages/opencode/src/tool/todo.ts— per-session counter +recordTodoWriteCallhelper + warn/refuse branches inTodoWriteTool.executepackages/opencode/test/tool/todowrite-loop.test.ts— 12 tests covering every boundaryExpected telemetry impact:
doom_loop_detectedevents fire on every session that crosses threshold (previously invisible to the per-message detector)P0 series progress (from telemetry-analysis-2026-05-21):
Summary by cubic
Caps
todowriteper session (warn at 25, block at 50) to stop slow-burn loops and cut token/context waste. Addresses #834 by tracking repeat tool use across turns, and clarifies that the warning means future calls will be refused while the current list is preserved.recordTodoWriteCall(sessionID); emitdoom_loop_detectedon warn and once on the first refusal to avoid telemetry spam.call_countin metadata.Session.Event.Deleted(lazy, best-effort); lazy subscription now retries ifBus.subscribefails; addclearTodoWriteCounter(sessionID)to unblock legitimate long sessions; tests cover thresholds, isolation, transition-only telemetry, and the escape hatch.Written for commit d3fb8c8. Summary will update on new commits. Review in cubic
Summary by CodeRabbit
New Features
Tests