Skip to content

feat(agent-mode): tool_response envelope + tool_call.id correlation#730

Merged
kelsonpw merged 34 commits into
mainfrom
feat/agent-mode-tool-response
May 15, 2026
Merged

feat(agent-mode): tool_response envelope + tool_call.id correlation#730
kelsonpw merged 34 commits into
mainfrom
feat/agent-mode-tool-response

Conversation

@kelsonpw
Copy link
Copy Markdown
Member

@kelsonpw kelsonpw commented May 11, 2026

Summary

Stacks on #723. Audit signal: 3-of-5 agent-mode subagents flagged this as the biggest parent-agent UX cliff in the wire today.

Today the wizard emits tool_call at PreToolUse but nothing at PostToolUse for non-write tools. Orchestrators (Claude Code, Cursor, Codex) see "Bash: pnpm install" and then radio silence until the next tool_call. Claude Code can render the action but can't render the outcome, duration, or output preview. The accumulator recordToolOutcome() already runs at agent-ui.ts:1001 but emits nothing — it's only consumed by the eventual tool_call_summary rollup.

This PR closes the gap with two changes:

1. tool_response envelope (data_version: 1)

Emitted at PostToolUse for every tool the inner agent calls. Pairs with the preceding tool_call via the SDK-provided tool_use_id correlation field.

  • outcome: 'success' | 'error' | 'denied'
  • durationMs: prefers the SDK's duration_ms; falls back to a tool_use_id-keyed PreToolUse wall-clock delta (bounded map, max 256 entries)
  • exitCode: Bash only
  • contentHead: first 1024 bytes of stdout/stderr/output, then redactString
  • errorMessage: 256 bytes, sanitized
  • summary: 120 chars (mirrors tool_call.summary)
  • isError: derived boolean for orchestrators that don't enumerate the outcome enum
  • All free-form text caps live in EVENT_DATA_VERSIONS.tool_response neighbours as exported constants — the regression suite reads them directly so a future bump only needs one edit

2. tool_call.id correlation field (tool_call bumped to data_version: 2)

Mirrors the SDK's tool_use_id. v1 readers continue to work (field is optional in the type signature); strict-correlation consumers branch on data_version >= 2.

Sample envelopes

Success path (Bash):

{"v":1,"@timestamp":"2026-05-11T07:50:00.123Z","type":"progress","message":"tool_response: Bash -> success (842ms)","data":{"event":"tool_response","tool":"Bash","id":"toolu_01abc","outcome":"success","durationMs":842,"exitCode":0,"contentHead":"+ pnpm install\n\nLockfile is up to date, resolution step is skipped\nProgress: resolved 1234, reused 1234, downloaded 0, added 0, done","isError":false,"summary":"pnpm install"},"data_version":1}

Error path (Edit):

{"v":1,"@timestamp":"2026-05-11T07:50:01.456Z","type":"progress","message":"tool_response: Edit -> error (15ms)","data":{"event":"tool_response","tool":"Edit","id":"toolu_01def","outcome":"error","durationMs":15,"contentHead":"File has not been read yet. Read it first before writing to it.","isError":true,"errorMessage":"File has not been read yet. Read it first before writing to it.","summary":"edit src/foo.ts"},"data_version":1}

Safety

  • PostToolUse emission is wrapped in try/catch — observation must NEVER block tool execution
  • Zod schema rejects malformed payloads at the boundary (logged to disk, not stdout)
  • redactString applied to contentHead and errorMessage so env-value secrets (e.g. WIZARD_OAUTH_TOKEN, Bearer ...) can't leak
  • truncateToBytes is UTF-8-safe — won't split multi-byte codepoints
  • Closure-local tool_use_id -> start-time map is capped at 256 entries (FIFO eviction) so a pathological SDK input can't leak forever

Test plan

  • pnpm exec tsc --noEmit — clean
  • pnpm exec eslint src/ --cache --max-warnings=0 — only the pre-existing EventPlanFullScreen.test.tsx:9 warning (known on main)
  • pnpm exec prettier --check '{lib,src,test}/**/*.ts' — clean
  • pnpm exec vitest run --pool=forks --maxWorkers=13974 / 3974 tests pass across 262 test files
  • 26 new tests in src/ui/__tests__/agent-ui-tool-response.test.ts covering: envelope shape, data_version stamping, tool_call.idtool_response.id correlation invariant, failure path with truncated errorMessage, success path with truncated + redacted contentHead, denied outcome, duration monotonicity, schema rejection, no-op on LoggingUI, and pure helpers (truncateToBytes UTF-8 boundary, extractToolUseId snake_case + camelCase, extractToolContentHead for string / {stdout,stderr,exitCode} / content[] shapes)
  • Updated existing agent-ui.test.ts > stamps data_version on tool_call to read from the registry so future bumps don't churn this assertion

🤖 Generated with Claude Code


Note

Medium Risk
Medium risk because it expands the NDJSON wire protocol (new events and data_version bumps) and changes inner-agent lifecycle emission logic, which could affect orchestrator parsers and audit trails if consumers assume older shapes.

Overview
Improves agent-mode NDJSON observability by adding a tool_response envelope emitted at PostToolUse for every tool call, correlated to tool_call via a new optional tool_call.id (SDK tool_use_id), and by gating write-tool audit events so failures emit file_change_failed and no longer incorrectly emit file_change_applied.

Expands the protocol with additional orchestrator-facing signals: tool_call_summary rollups, attempt_started retry boundaries, stall_status coaching tiers, cold_start_breakdown phase timings, progress_estimate for multi-item operations, mcp_status lifecycle transitions, current_file rollups, and run_resumed on checkpoint restore.

Upgrades needs_input to data_version: 3 by replacing the old English responseSchema map with a JSON Schema 2020-12 fragment (including a shared appIdResponseSchema factory), adds decisionId correlation support (including manual projects list agent JSON output), and introduces a waiting_for_user type-only alias for needs_input. Extensive new regression tests pin the new wire shapes and versioning.

Reviewed by Cursor Bugbot for commit 7b6ca71. Bugbot is set up for automated code reviews on this repo. Configure here.

kelsonpw and others added 26 commits May 10, 2026 22:46
Today the SIGINT handler only registers in the TUI branch — agent + CI
runs get hard-killed by Node's default behaviour, so an orchestrator
parsing the NDJSON stream sees an abrupt EOF instead of a clean
`run_completed: cancelled` envelope. No checkpoint is saved on the way
out, either.

Extract the handler into `installAbortSignalHandler` (shared helper in
`src/lib/graceful-exit.ts`) and wire it into the agent + CI branches in
`src/commands/default.ts` immediately after the session is built. The
TUI keeps its inline registration because it needs direct access to the
Ink store's `setCommandFeedback`.

`WizardAbortOptions` gains a `reason` field so SIGINT (and downstream
PRs: LOCK_HELD, AUTH_REQUIRED midRun) can pass a stable machine-readable
code to the terminal `run_completed` envelope, instead of relying on
the sanitized human-readable `message`.
The 2s graceful-exit timer in `performGracefulExit` called
`process.exit(130)` with no terminal NDJSON event — a parent agent
parsing the stream saw an abrupt EOF that's indistinguishable from a
crash. Add an `emitRunCompleted({ outcome: 'cancelled', exitCode: 130,
reason: 'sigint' })` call before the timer fires so AgentUI lands the
terminal envelope on stdout (InkUI / LoggingUI no-op as before).

Regression test in `graceful-exit.test.ts` spies on the UI and asserts
the call lands before the 2s timer.
When a long agent run hits an LLM-gateway 401 (or an Amplitude OAuth
expiry) the wizard previously aborted with a generic `error` envelope
plus a TUI-only OutroScreen prompt. Agent-mode orchestrators saw a
plain error and had to guess at whether files were preserved or what
command to re-run.

Wire the AUTH_ERROR path in `agent-runner.ts` to call
`emitAuthRequired` BEFORE `wizardAbort` with the structured payload:
mid-run discriminator, `preserveFiles: true`, `partialProgress` flags
from `classifyAgentOutcome`, and the canonical login + resume argv.

Extends the auth_required wire shape with `midRun`, `preserveFiles`,
`partialProgress`, `authSubkind`, plus two new reason discriminators
(`amplitude_token_expired`, `gateway_token_expired`). Bumps
`EVENT_DATA_VERSIONS.auth_required` from 1 -> 2; existing v1 callers
keep working since every new field is optional.

`emitAuthRequired` graduates to the WizardUI interface (optional) so
business logic can call it through `getUI()` without an AgentUI cast.
InkUI / LoggingUI no-op as designed.
After AUTH_RETRY_LIMIT consecutive 401-flavoured api_retry messages
the wizard short-circuits the SDK's ~3-minute retry storm and aborts.
Today that boundary fires `controller.abort('auth_failed')` silently
— orchestrators only see the downstream AUTH_ERROR routing and can't
distinguish "single 401, transient" from "we tried twice, stuck".

Add `lifecycle: auth_retry_exhausted` event with `attempts` + `subkind`
discriminator (`amplitude` / `llm-gateway`). Wired into the SDK
retry-loop boundary in `agent-interface.ts` immediately before
`controller.abort('auth_failed')`. AgentUI implements; InkUI /
LoggingUI no-op (the TUI's AUTH_ERROR outro covers the human path).

New WizardUI optional method `emitAuthRetryExhausted`. Registered in
`EVENT_DATA_VERSIONS` at v1. Extended the existing AUTH_RETRY_LIMIT
test in `agent-interface.test.ts` to assert the emit fires exactly
once with the expected payload.
…emit

Two fixes that together harden the NDJSON wire contract:

1. `data.event` discriminator on anonymous error envelopes. Previously
   `AgentUI.setRunError` emitted `data: { name, recoverable,
   suggestedAction }` with no `event` field — every other event family
   uses `data.event` as the discriminator. Orchestrators filtering on
   `data.event` saw nothing for run-aborting errors. Add
   `event: 'run_error'` and register `run_error: 1` in
   `EVENT_DATA_VERSIONS` so the envelope ships with `data_version`.

2. Zod-validate every envelope at the emit boundary. A new
   `validateEnvelopeOrLog` runs each event through a Zod schema (v=1,
   ISO timestamp, valid `type`, string `message`) plus a coherence
   check that asserts `data_version === EVENT_DATA_VERSIONS[event]`
   when the discriminator is set. Failures route to `logToFile` — never
   throw, never stderr (Ink owns stdout in TUI mode).

Regression tests in `agent-ui.test.ts`: a property-style check that
every documented emit method produces a wire-valid envelope, plus a
focused test pinning the `event=run_error` shape.
Every AgentErrorType branch in agent-runner.ts emitted a generic
`error` envelope with no machine-readable code, then routed to
wizardAbort with a human-readable message. Orchestrators couldn't
distinguish "retry your launch" (GATEWAY_DOWN 5xx blip) from
"upgrade wizard" (GATEWAY_INVALID_REQUEST schema rejection) from
"back off" (RATE_LIMIT) without parsing message strings.

Adds a new optional `emitRunError(data)` method on WizardUI with a
typed `code` discriminator covering GATEWAY_DOWN /
GATEWAY_INVALID_REQUEST / RATE_LIMIT / API_ERROR / MCP_MISSING /
RESOURCE_MISSING plus an optional `mcpServer` field (wizard-tools vs
amplitude-wizard) and a `recoverable` hint. AgentUI implements; InkUI
/ LoggingUI no-op.

Wired into every AgentErrorType branch in agent-runner.ts immediately
before the corresponding wizardAbort. The envelope reuses the
`run_error` discriminator + data_version=1 registered in the previous
commit, so consumers branching on `data.event` + `data.code` get a
stable contract.

Regression tests in `agent-ui.test.ts` parameterize over every code
and assert the emitted envelope shape.
`wizard apply` exited with `ExitCode.INVALID_ARGS=2` whenever the
per-project apply-lock (`acquireApplyLock`) detected an in-flight
holder — an orchestrator routinely interprets exit 2 as "bad flags"
and doesn't retry. Lock contention is automatically retryable.

Add `ExitCode.LOCK_HELD=14` and route the apply-lock collision path
in `src/commands/apply.ts` to it. Emit a terminal `run_completed:
{ outcome: 'error', exitCode: 14, reason: 'lock_held' }` envelope
before `process.exit` so an orchestrator parsing the NDJSON stream
sees the canonical signal instead of an abrupt EOF.

Regression test in `src/commands/__tests__/apply-lock-held.test.ts`
pins the numeric code at 14 (orchestrator contract) and asserts the
holder shape that the handler ships in the `apply_refused` envelope.
Today an orchestrator parsing the NDJSON stream has to infer coarse
progress from the firehose of tool_call / status / progress events.
Adds a dedicated `lifecycle: run_phase` event that transits five
fixed states in order: cold_start -> agent_running -> finalizing ->
completed | error.

Wiring (each is best-effort, never blocks the run):
- cold_start: first cold-start activity in agent-runner.ts
- agent_running: first PreToolUse in inner-lifecycle.ts (the
  emitter dedups so subsequent tool calls don't spam the stream)
- finalizing: just before seedPostAgentSteps in agent-runner.ts
- completed: in wizardSuccessExit before run_completed
- error: in wizardAbort before run_completed

New optional `emitRunPhase` on WizardUI. AgentUI implements with
deduplication; InkUI / LoggingUI no-op (their UX already implies
phase via the journey stepper / log line cadence). Registered
`run_phase: 1` in `EVENT_DATA_VERSIONS`.

Regression tests in agent-ui.test.ts: per-phase envelope shape,
dedup behaviour for repeated calls, and the wire-level ordering
invariant.
The SIGINT handler routes through wizardAbort via a dynamic import.
In production wizardAbort calls process.exit and never resolves, but
vitest's strict-exit guard throws a synthetic exception when
process.exit is called in a test — surfacing as an unhandled rejection
on the test stream. Wrap the runner in try/catch so the rejection
stops at the boundary; production behaviour is unchanged because the
throw path doesn't fire there.
PR B subagent's final commit skipped a prettier pass on these five
files. Pure whitespace / wrap reflows — no semantic change.
The TUI's "Discovered facts" chips (vertical, app type, framework,
package manager, region) only rendered in Ink. Mirror them onto the
agent-mode NDJSON stream as `progress: discovery_fact { id, label,
value, discoveredAt }` so parent agents (Claude Code, Cursor, Codex)
can render the same chips without parsing every status line.

Hooks into the existing `publishDiscoveryFact` in agent-runner.ts —
the abstract `WizardUI.pushDiscoveryFact` now reaches AgentUI as a
typed emit instead of a no-op. Stable `id` lets orchestrators upsert
chips on retry without flicker.

Adds `discovery_fact: 1` to EVENT_DATA_VERSIONS and a regression
test pinning envelope shape + version stamping.
Every Edit/Write tool call already emits its own `tool_call` event
plus a `file_change_planned`. That's great for an audit log but
noisy for parent agents that just want a "now editing X" header.

Add `progress: current_file { path, relativePath, operation }` —
the coarse rollup orchestrators subscribe to instead of parsing
every fine-grained tool event. Debounced at the wire boundary
(250ms per identical (path, op) tuple) so a tight edit chain
collapses into one event.

`relativePath` is computed against `installDir` so the wire ships a
short label; absolute path preserved for audit. Wired from the
existing inner-lifecycle PreToolUse hook; AgentUI debounces, InkUI /
LoggingUI no-op (optional WizardUI method).

Adds `current_file: 1` to EVENT_DATA_VERSIONS and regression tests
covering envelope shape, version stamping, and the
(path, op) debounce / non-debounce paths.
The wizard already detects stalls (no SDK traffic for N seconds) and
shows TUI hints. Mirror that escalation onto NDJSON so parent agents
can render the same coaching cadence.

`progress: stall_status { tier, durationMs, lastActivity, hint? }`
fires at 10s / 30s / 60s of silence, mapped to `noticed` /
`concerning` / `critical`. Per-tier dedup at the AgentUI boundary
(`resetStallStatus()` clears the gate when activity resumes), so
each tier hits the wire at most once per stall window.

Wired from the existing 10s heartbeat tick in agent-interface — the
same loop that emits `heartbeat` now also derives the silence-window
tier via `deriveStallTier()` (pure helper, regression-tested). An
`onActivity()` shim resets `lastActivityAt` from `onStatus` /
non-`stream_event` SDK messages, so a long tool-call interlude isn't
spuriously stamped as "concerning".

InkUI / LoggingUI no-op via the optional WizardUI methods.
Adds `stall_status: 1` to EVENT_DATA_VERSIONS and tests covering
envelope shape, tier dedup, reset semantics, and the pure helper's
threshold boundaries.
When the wizard restarts from a checkpoint (post-crash, post-SIGINT,
post-token-expiry), the existing `checkpoint_loaded` event carries
only the file age — useful for policy gates but not for UX. Add
`lifecycle: run_resumed { from_checkpoint_at, last_phase,
restored_state_summary }` as the orchestrator-facing companion so
parent agents can render "continuing from where we left off"
instead of "fresh run".

Wired from `loadCheckpoint()` immediately after `emitCheckpointLoaded`.
The summary is composed from the typed checkpoint fields (region,
org, project, env, framework, intro flag) — pre-redacted, no
free-form user input. `lastPhase` defaults to `'unknown'` because
the checkpoint deliberately drops `runPhase` from its restored set
(so it re-evaluates on resume).

InkUI / LoggingUI no-op via the optional WizardUI method.
Adds `run_resumed: 1` to EVENT_DATA_VERSIONS and tests covering
envelope shape, the 'unknown' last_phase fallback, and version
stamping.
Previously the inner-lifecycle PostToolUse hook unconditionally
emitted `file_change_applied` for any Edit/Write/MultiEdit even when
the tool itself reported `is_error: true`. That falsely advertised
the write as successful on the orchestrator's audit trail.

Add `error: file_change_failed { path, operation, errorClass,
errorMessage }`. Gated on the `tool_response.is_error` /
`tool_response.error` shape via the new `extractToolFailureMessage()`
helper — when a failure is detected, AgentUI emits the typed error
event and the success-side `recordFileChangeApplied` + ledger
post-write are skipped (the pre-write ledger entry stays so a
cancelled run can still roll back).

`errorClass` discriminates the common failure modes:
  - `permission`  — EACCES / "permission denied" / write_refused
  - `not_found`   — ENOENT / "no such file"
  - `syntax`      — Edit/MultiEdit string-match failures
  - `generic`     — fallback

InkUI / LoggingUI no-op via the optional WizardUI method.
Adds `file_change_failed: 1` to EVENT_DATA_VERSIONS, regression
tests for the wire shape, the classifier's match precedence
(syntax wins over not_found for "String to replace not found"),
and the inner-lifecycle gate ensuring failed writes don't
double-emit as applied.
…se content

Final CI-gate cleanup for the v2 protocol stack:
- prettier-formatted the v2 event interfaces, AgentUI emitters, and
  the regression test file
- narrowed `obj.content[0]` to `unknown` in `extractToolFailureMessage`
  to silence `@typescript-eslint/no-unsafe-assignment`; runtime checks
  unchanged (we already gate on `typeof first === 'object'` before any
  property read)
…t/decision_auto correlation

Introduce a process-local counter that mints stable `dec_<NNN>` ids and
extend the `needs_input` + `decision_auto` data shapes with an optional
`decisionId` field. Bumps both events to `data_version: 2` so consumers
that want strict request/response correlation can branch on
`data_version >= 2`; v1 readers continue to work because the field is
optional at the consumer end.

The generator (`nextDecisionId`) uses a monotonic counter rather than a
UUID so a transcript replayed offline reads the SAME ids it read live —
critical for deterministic log analysis. Wraps at 999 isn't a concern
for single wizard runs (the wizard never asks hundreds of questions).

Test-only `__resetDecisionIdCounterForTests` keeps regression tests
deterministic across files without exposing counter mutation to
production code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… and decision_auto pair

Wires the new correlation id through the four `needs_input` callsites
in `AgentUI`:

  - `promptConfirm`           → code='confirm', recommended='yes'
  - `promptChoice`            → code='choice', recommended=options[0]
  - `promptEventPlan`         → code='event_plan', recommended='approved'
  - env-picker auto-fallback  → code='environment_selection'

`emitNeedsInput` now returns the freshly-minted `decisionId`; each
caller captures it locally and threads it into the matching
`emitDecisionAuto` call. The env-picker previously never emitted a
`decision_auto` for its auto-select fallback path — adding one here
closes the request/response correlation loop for that prompt too.

The env-var-resolved `promptEventPlan` short-circuit (where
`AMPLITUDE_WIZARD_EVENT_PLAN_DECISION` answers the prompt without ever
emitting `needs_input`) is intentionally unchanged — there's no request
envelope to correlate against, which is the documented contract: the
ABSENCE of `needs_input + decision_auto` for the same code signals
"resolved by user-driven flag." The stdin-resolved env-picker path is
also unchanged: an orchestrator that wrote the answer to stdin already
knows which request it answered, and the wire today has no
`decision_human` envelope to annotate (would be a new envelope, which
this PR is explicitly scoped to avoid).

Regression coverage in `agent-ui-decision-id.test.ts`:

  - Generator emits `dec_NNN` zero-padded, monotonic across calls
  - `promptConfirm` and `promptChoice` request/response pairs share id
  - Two back-to-back `confirm` prompts mint DISTINCT ids — the bug this
    PR exists to fix (orchestrators previously relied on timing +
    `code` heuristics, which fails for same-code prompts in series)
  - `emitNeedsInput` returns the minted id and honors an explicit
    override (forward-compat for paged emitters that want stable ids
    across pages)
  - Both events stamp `data_version: 2`

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…or needs_input

PR B2 deferred reconciling the two names ("needs_input" on the wire,
"waiting for user" in human-facing copy and early design notes) because
`NeedsInputData` already provided the typed schema. The deferral was
correct — there is no separate wire event to ship — but orchestrators
reading protocol docs still see both spellings and have no way to
import the one that matches their mental model.

This commit adds three type-identical re-exports in `agent-events.ts`:
  - `WaitingForUserData<V>`     → `NeedsInputData<V>`
  - `WaitingForUserWireData<V>` → `NeedsInputWireData<V>`
  - `WaitingForUserEvent<V>`    → `NeedsInputEvent<V>`

Pure type aliases — no new emit path, no `EVENT_DATA_VERSIONS` entry
(documented as a load-bearing absence — adding one would require a
separate emitter + schema + tests). The wire-format `event`
discriminator remains `'needs_input'` so existing subscribers keep
working.

Regression test pins both directions of the assignability proof, the
wire-event discriminator invariant, and the absence of a
`EVENT_DATA_VERSIONS.waiting_for_user` registry entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…patterns

PR B2 shipped `file_change_failed` with four errorClass values
(`permission` / `not_found` / `syntax` / `generic`). Orchestrators
parsing the stream couldn't distinguish a transient timeout from a
permanent failure — both fell into `generic` and burned retry budget.

This commit:

  1. Adds `'timeout'` to the `FileChangeErrorClass` union, documented
     as the transient-by-definition variant. Retry-aware orchestrators
     can re-issue the write without changing input.

  2. Extends `classifyFileChangeError` patterns:
       - `permission`: + EPERM, + read-only file system / EROFS
       - `not_found`:  + "does not exist"
       - `syntax`:     + "did not match", + "unexpected token", + "invalid JSON"
       - `timeout`:    NEW — ETIMEDOUT, "timed out", "timeout",
                       "deadline exceeded"

  3. Pins the classifier ordering so `timeout` wins over `not_found`
     when both signals appear in the message (the SDK occasionally
     wraps an ETIMEDOUT with secondary "not found in cache" text).

Wire-shape change: the `FileChangeErrorClass` union grew. Orchestrators
already keyed off the literal string — adding a new variant is a
breaking enum change in principle, but the existing `data_version: 1`
on `file_change_failed` doesn't bump because we are ADDING, not
RENAMING. New consumers branch on the wider set; old consumers fall
through their default arm. The schema doc was updated to flag the
"adding a variant is still a data_version bump in principle" rule.

`emitFileChangeFailed` now consumes the exported `FileChangeErrorClass`
type rather than an inline literal — a future widening lives in one
place. `wizard-ui.ts` mirrors the wider union on the optional method.

Regression tests cover all new patterns + the timeout-over-not_found
ordering guard. Existing tests continue to pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR B (#714) shipped `transient_retry`, the "wizard decided to retry, sleeping
Ns" decision envelope. Orchestrators parsing the stream could see retries
decided, but had no way to tell when the retry attempt actually BEGAN — the
backoff sleep (up to 30s) and proactive bearer refresh both fell into a
silent gap. A consumer rendering "retrying… (attempt 2/4)" had to guess at
the boundary from timing.

This commit adds `attempt_started`, emitted at the top of each outer-loop
iteration in `agent-interface.ts`, AFTER the backoff sleep has elapsed,
AFTER the proactive bearer refresh has run, but BEFORE the inner SDK query
fires. Paired with `transient_retry`, orchestrators see:

  transient_retry { attempt: 2, nextRetryInMs: 2500, reason: 'stall' }
  ...2.5s of silence...
  attempt_started { attemptNumber: 2, backoffMs: 2500, reason: 'stall_retry' }
  ...inner work...

Schema additions in `agent-events.ts`:
  - `AttemptStartedReason` — `'cold_start' | 'stall_retry' | 'auth_refresh' |
    'network_retry'`
  - `AttemptStartedData` — `{ attemptNumber, totalBudget, reason, backoffMs? }`
  - `EVENT_DATA_VERSIONS.attempt_started: 1`
  - Added to `InnerAgentLifecycleData` union

Wire-up in `agent-interface.ts`:
  - Stash `attemptStartedReason` + `attemptStartedBackoffMs` at the retry
    decision points (already-existing `lastRetryReason` mapping).
  - Track `bearerRotatedThisAttempt` in the `refreshGatewayBearer` callback
    so an attempt starting with a freshly-rotated token reports
    `'auth_refresh'` instead of the underlying stall/network reason.
  - Emit AFTER the bearer refresh but BEFORE the SDK query so the envelope
    fires when work actually begins.
  - Cold-start attempt fires with `reason: 'cold_start'` and OMITS `backoffMs`
    on the wire (orchestrators read absence as "no preceding sleep").

`emitAttemptStarted` is wired as an optional method on `WizardUI` —
InkUI / LoggingUI no-op (their existing retry banner / log line already
covers their UX), so orchestrators are the only consumers.

Regression coverage: 7 new tests on `agent-ui-v2-events.test.ts` covering
the envelope shape, the data_version stamp, all four reason values, the
cold-start no-backoffMs invariant, the explicit-0 round-trip, and the
non-AgentUI no-op contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…on boundaries

Long-running multi-item operations (post-agent step queue, multi-editor MCP
install, event-plan write) emit a stream of fine-grained per-item events
(`post_agent_step`, `tool_call`). Orchestrators that want a single progress
bar have to reconstruct the rollup from those events — track the seeded
total, count completions, compute the percent themselves. The same logic
gets repeated in every consumer.

This commit adds `progress_estimate` — the canonical
`(stage, current, total, percent)` rollup. Orchestrators subscribe to
this single event and ignore the fine-grained stream when all they want
is a progress bar.

Schema additions in `agent-events.ts`:
  - `ProgressEstimateData` — `{ event, stage, current, total, percent }`
  - `EVENT_DATA_VERSIONS.progress_estimate: 1`
  - `buildProgressEstimate(stage, current, total)` — pure helper that
    clamps current to `[0, total]`, floors fractional values, and returns
    null for zero-item operations (no `progress_estimate` on the wire
    when there's no work to do).

Wire-up:
  - `AgentUI.emitProgressEstimate` — new emitter, delegates to
    `buildProgressEstimate` for the clamp/floor/null logic.
  - `AgentUI.setPostAgentStep` — emit at terminal-state transitions
    (`completed` / `skipped`) only. `in_progress` / `pending` don't bump
    so a bouncing step can't lower the percent. Set-based dedup
    guarantees the count is monotone across duplicate completions.
  - `AgentUI.seedPostAgentSteps` — track ids + done-set so the rollup
    can compute total locally. A second seed call resets the tracking
    so the new queue doesn't inherit the old queue's progress.
  - `addMCPServer` (mcp-install step) — emit one rollup per editor
    after each successful `addServer`. Failures don't bump (orchestrators
    infer "stuck at X/N" from absence + error envelope).
  - `WizardUI.emitProgressEstimate?` — optional. InkUI / LoggingUI
    no-op (their existing journey stepper / panel covers the UX).

Stages emitted today:
  - `'post_agent_steps'` — post-agent queue advance
  - `'mcp_install'`      — multi-editor MCP install loop

Additional stages (e.g. `'event_plan_write'`) land as new long-running
operations get wired.

Regression coverage: 13 new tests on `agent-ui-v2-events.test.ts`
covering the envelope shape, the data_version stamp, terminal-state-only
transitions, idempotent re-emission, queue reset on re-seed, the
zero-total no-op, current clamping in both directions, and the pure
helper's full behaviour matrix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Today the wizard emits a coarse `run_phase: cold_start` lifecycle event
when cold start begins. Parent agents driving the spinner know the
wizard IS cold-starting but have zero visibility into WHICH phase
consumed the time. Cold start is 5-30s of perceived silence on the
spinner, and the lion's share is one of a handful of identifiable
phases (skill staging, package-manager probing, framework preflight,
MCP bootstrap, gateway probe).

This commit adds `cold_start_breakdown` — a `{phase, startedAt,
finishedAt, durationMs}` envelope emitted at the END of each of the
five cold-start phase boundaries. Parent agents subscribe and can:

  - render which phase is CURRENTLY active during the spinner
    (update UI on each fire),
  - surface "your cold start is slow because phase X took Ys"
    diagnostics on a hung run,
  - aggregate per-phase timings across runs for performance tracking
    without re-parsing log lines.

Critical: each phase is wrapped in a `try/finally` boundary so the
breakdown event always ships, even when the phase throws —
orchestrators see a timing breadcrumb for the failing phase (paired
with the subsequent thrown error envelope) instead of a silent gap on
the wire.

Schema additions in `agent-events.ts`:
  - `ColdStartPhase` — the five canonical phase ids
  - `ColdStartBreakdownData` — the wire shape
  - `EVENT_DATA_VERSIONS.cold_start_breakdown: 1`
  - `buildColdStartBreakdown(phase, startedAt, finishedAt)` — pure
    helper that floors `durationMs` at 0 (guards a non-monotonic
    clock from shipping a negative duration) and bumps `finishedAt`
    up to `startedAt` so the invariant `finishedAt >= startedAt`
    holds on the wire.

Wire-up:
  - `AgentUI.emitColdStartBreakdown` — new emitter, delegates to
    `buildColdStartBreakdown` for the clamp logic.
  - `WizardUI.emitColdStartBreakdown?` — optional. InkUI / LoggingUI
    no-op (the TUI already surfaces the labeled cold-start activity
    line; CI logs the per-phase step messages).
  - `agent-runner.ts` — wraps skill_staging, package_manager_detection,
    framework_detection in `try/finally`.
  - `agent-interface.ts` — wraps gateway_probe (`checkGatewayLiveness`)
    and mcp_bootstrap (`createWizardToolsServer`) in `try/finally`.

Regression coverage: 11 new tests covering the envelope shape, the
data_version stamp, per-phase emission for all five phases, the
non-monotonic-clock floor, the pure helper's full behaviour matrix,
the WizardUI optional-method contract, and the critical
"emission survives a thrown phase" path. Total file: 64 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ries

Today the wizard ships one fine-grained `tool_call` NDJSON envelope per
PreToolUse (30-200 per run). Parent agents that want a "tool usage"
summary at completion have to maintain their own running aggregate. The
wizard already has all the data — it should emit the rollup.

This PR adds a `tool_call_summary { totalCalls, byTool, byOutcome,
durationMsTotal, durationMsAvg, topToolByCount? }` event emitted at two
boundaries:

  1. Phase finalize — before `run_phase: finalizing` so an orchestrator
     can render the inner-agent tool summary before the post-agent
     steps section appears.
  2. Terminal exit — inside `wizardSuccessExit` / `wizardAbort` (after
     post-agent steps), so the orchestrator always sees a final
     cumulative rollup covering the whole run.

Dedup-safety: the emitter no-ops when the payload signature is
identical to the previous emission (so a duplicate boundary call
doesn't double-emit). `totalCalls === 0` is suppressed at the wire —
absence of the event means no tools were called.

Implementation:

  - `ToolCallStats` pure accumulator in `agent-events.ts` (per-tool
    counts, per-outcome counts, FIFO-by-tool duration pairing).
  - `AgentUI.emitToolCall` increments the accumulator; new
    `AgentUI.recordToolOutcome` is called from the PostToolUse hook in
    `inner-lifecycle.ts` for ALL tools (not just write tools) so the
    outcome breakdown is faithful.
  - `WizardUI.emitToolCallSummary?(): void` — optional method; AgentUI
    implements, InkUI/LoggingUI no-op.
  - `data_version: 1` registered in `EVENT_DATA_VERSIONS`.
  - Emit callsites in `agent-runner.ts` (before finalize) and
    `wizard-abort.ts` (`wizardSuccessExit` + `wizardAbort`) wrapped in
    try/catch so the rollup can never block the phase transition or
    exit path.

Regression tests cover the pure accumulator (counting, outcome math,
duration FIFO pairing, non-monotonic clock floor, orphaned outcomes,
top-tool tie-breaking), the wire envelope (shape, data_version, dedup
guard, zero-call suppression), and the inner-lifecycle wiring
(success and error outcomes flowing through PostToolUse).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…or_install

Parent agents (Claude Code, Codex, Cursor) parsing the wizard's
NDJSON stream had no visibility into MCP server state transitions:
the in-process wizard-tools server bootstrapped silently during
cold start, and the editor-install flow silently detected (or
didn't) supported editors and silently installed (or skipped) the
editor MCP config. mcp_status fills that gap with a
{ server, state, transition_ts, detail? } envelope emitted at every
state boundary for both lifecycles.

Wiring:

  wizard_tools server (in-process MCP that the inner agent calls):
    - available — emitted in agent-interface.ts after
      createWizardToolsServer() resolves successfully
    - failed    — emitted in the same try/catch before the boot
      error rethrows, so the lifecycle event lands on the wire
      before any subsequent run_error envelope

  editor_install flow (wizard-mcp install into the user's editor):
    - not_applicable   — no supported clients detected on this
      machine
    - install_skipped  — CI mode short-circuit OR user declined
      the install prompt in McpScreen
    - needs_user_choice — multiple editors detected; awaiting the
      user's pick in McpScreen
    - installed        — at least one client took the config
    - failed           — write-time error (permissions, disk
      full) OR every client install failed

Schema: EVENT_DATA_VERSIONS.mcp_status = 1. Adds MCPStatusData /
MCPStatusServer / MCPStatusState type aliases to agent-events.ts;
optional emitMcpStatus on WizardUI (AgentUI implements with Zod
defense-in-depth on the payload, InkUI / LoggingUI no-op).

Tests: agent-ui-mcp-status.test.ts pins the envelope shape, all
seven (server, state) transitions, the Zod guard rejecting invalid
literals, no-op behaviour on LoggingUI, and stream silence on
non-MCP-related runs. mcp-status-wiring.test.ts covers the
addMCPServerToClientsStep call sites that don't require
host-filesystem probes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Today the wizard emits `tool_call` at PreToolUse but NOTHING at
PostToolUse for non-write tools — orchestrators (Claude Code, Cursor,
Codex) see "Bash: pnpm install" and then radio silence until the
next `tool_call`. Three independent audit subagents flagged this as
the biggest parent-agent UX cliff in the wire today.

This PR closes the gap with:

  - `tool_response` envelope (data_version: 1) emitted at PostToolUse
    for EVERY tool the inner agent calls, carrying `outcome`
    (success/error/denied), `durationMs`, optional `exitCode` (Bash),
    a 1024-byte-bounded `contentHead`, `isError`, a 256-byte-bounded
    `errorMessage`, and a 120-char-bounded `summary`.

  - `tool_call.id` correlation field (mirrors the SDK's
    `tool_use_id`). Bumps `tool_call` to data_version: 2 — v1 readers
    continue to work because the field is optional; strict-correlation
    consumers branch on `data_version >= 2`.

  - All free-form text fields go through `redactString` at the emit
    boundary so a misbehaving Bash command can't leak env-value
    secrets. New `truncateToBytes` helper handles UTF-8-safe byte-
    bounded truncation (won't split multi-byte codepoints).

  - PostToolUse emission is wrapped in try/catch — observation must
    NEVER block tool execution.

Stacks on PR #723 (feat/agent-mode-mcp-status).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kelsonpw kelsonpw requested a review from a team as a code owner May 11, 2026 14:49
Copy link
Copy Markdown
Contributor

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Orphaned JSDoc separated from its function by insertions
    • Moved the JSDoc block for extractToolFailureMessage from above extractToolUseId to directly above extractToolFailureMessage where it belongs.

Create PR

Or push these changes by commenting:

@cursor push 93b6b4782a
Preview (93b6b4782a)
diff --git a/src/lib/inner-lifecycle.ts b/src/lib/inner-lifecycle.ts
--- a/src/lib/inner-lifecycle.ts
+++ b/src/lib/inner-lifecycle.ts
@@ -102,14 +102,6 @@
 }
 
 /**
- * Inspect a PostToolUse hook input for a tool-result error. The SDK
- * surfaces failures via either `tool_response.is_error` /
- * `tool_response.error` (newer hook shape) or a stringified result
- * containing common error markers. Returns the sanitized message when
- * a failure is detected, null when the tool succeeded. Pure — no
- * I/O, no throws.
- */
-/**
  * Extract the SDK's `tool_use_id` from a hook input — the stable
  * correlation key that pairs a PreToolUse `tool_call` with its
  * matching PostToolUse `tool_response`. The SDK consistently surfaces
@@ -216,6 +208,14 @@
   return { content: null, exitCode: undefined };
 }
 
+/**
+ * Inspect a PostToolUse hook input for a tool-result error. The SDK
+ * surfaces failures via either `tool_response.is_error` /
+ * `tool_response.error` (newer hook shape) or a stringified result
+ * containing common error markers. Returns the sanitized message when
+ * a failure is detected, null when the tool succeeded. Pure — no
+ * I/O, no throws.
+ */
 export function extractToolFailureMessage(
   input: Record<string, unknown>,
 ): string | null {

You can send follow-ups to the cloud agent here.

Comment thread src/lib/inner-lifecycle.ts
…response

# Conflicts:
#	src/lib/__tests__/graceful-exit.test.ts
#	src/lib/agent-events.ts
#	src/lib/agent-runner.ts
#	src/lib/graceful-exit.ts
#	src/lib/inner-lifecycle.ts
#	src/ui/agent-ui.ts
#	src/utils/wizard-abort.ts
@kelsonpw kelsonpw force-pushed the feat/agent-mode-tool-response branch from 6c325e5 to 74e67fd Compare May 14, 2026 18:03
@kelsonpw kelsonpw changed the base branch from feat/agent-mode-mcp-status to main May 14, 2026 18:05
…hema 2020-12 fragments (#732)

* feat(agent-mode): replace responseSchema English strings with JSON Schema 2020-12 fragments

Audit #3 (Codex/non-Anthropic orchestrator POV) flagged the v2
needs_input.responseSchema shape as a vendor-portability blocker:

    responseSchema: { appId: 'string (required, from choices[].value)' }

Non-Claude orchestrators (Codex, GPT-5, Mistral) couldn't programmatically
validate stdin payloads against the English-in-JSON string — they had to
run an LLM over the description to decide whether `{ appId: '769610' }`
was a legal response. That violates the "perfectly suited for ... codex,
anything that runs this against code bases via a parent agent" charter.

Replaces the shape with a proper JSON Schema 2020-12 fragment so any
orchestrator can load `ajv` / `jsonschema` and validate directly:

    responseSchema: {
      \$schema: 'https://json-schema.org/draft/2020-12/schema',
      type: 'object',
      properties: {
        appId: {
          type: 'string',
          pattern: '^\\\\d+$',
          description: 'Numeric Amplitude app ID — ...',
        },
      },
      required: ['appId'],
    }

Changes:
- Add ResponseSchemaFragment + JsonSchemaProperty exports to agent-events.
- Update NeedsInputData / NeedsInputWireData to use the structured type.
- Bump EVENT_DATA_VERSIONS.needs_input from 2 to 3 (BREAKING for any
  consumer that parsed the prior English strings — most orchestrators
  round-trip via resumeFlags and are unaffected).
- Update emitter callsites in src/ui/agent-ui.ts (legacy `prompt` +
  structured `needs_input` for environment_selection) and
  src/commands/projects.ts (project_selection).
- Use `pattern: '^\\d+\$'` rather than `enum` because both callsites
  set `allowManualEntry: true` — a closed enum would contradict the
  manualEntry contract that lets orchestrators submit an above-cap
  app-id.
- New regression suite (agent-ui-response-schema.test.ts, 9 tests)
  pinning the wire shape, data_version: 3 stamp, validateEnvelopeOrLog
  round-trip, and an inline snapshot for the env-selection fragment.
- Update agent-ui-decision-id.test.ts to match the new v3 stamp.

Stacks on #730 (feat/agent-mode-tool-response).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(bugbot): stamp data_version on manual projects list needs_input envelope

PR B12 bumped `EVENT_DATA_VERSIONS.needs_input` to 3 for the JSON
Schema 2020-12 responseSchema shape. The `wizard projects list
--agent` envelope at `commands/projects.ts:59` is manually
constructed via `JSON.stringify` and writes directly to
`process.stdout`, bypassing the `emit()` helper that auto-stamps
`data_version` from the registry. Result: the new v3 envelope
ships with no `data_version` field at all — orchestrators that
branch on `data_version >= 3` for the new shape treat this
envelope as pre-v3 and misinterpret the JSON Schema object as
the old English-string format.

Import `EVENT_DATA_VERSIONS` and read `.needs_input` at emit
time so future bumps (v4, v5…) stay in sync without touching
this file.

* fix(bugbot): include decisionId on manual projects list needs_input envelope

The previous fix stamped `data_version: EVENT_DATA_VERSIONS.needs_input`
(currently 3) but omitted the `decisionId` field. `NeedsInputWireData`
declares `decisionId: string` as required since v2 — orchestrators
that branch on `data_version >= 2` expect `data.decisionId` for
correlation with the matching `decision_auto` / `user_input_received`
partner.

Manual envelope construction bypasses `emitNeedsInput` (which mints
the id internally), so mint via `nextDecisionId()` directly.

* refactor(bugbot): extract appIdResponseSchema factory to prevent wire drift

The same JSON Schema 2020-12 fragment (`$schema`, `type: 'object'`,
`properties.appId.{type:'string', pattern:'^\\d+$', description}`,
`required: ['appId']`) was hand-written identically in three places:
- agent-ui.ts env-selection legacy `prompt` event emit
- agent-ui.ts env-selection canonical `emitNeedsInput`
- commands/projects.ts manual envelope

Two of three were byte-identical. A future change (drop `pattern` for
a real `enum`, add a property, bump dialect) lands in three places
silently or — worse — in one place, drifting the wire contract.

Extracted `appIdResponseSchema(description)` as a typed factory in
`agent-events.ts` alongside the `ResponseSchemaFragment` interface.
Only the human-facing `description` varies between callsites; the
structural parts are now one place. All 9 response-schema tests
still pass.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Redundant extractToolFailureMessage call in PostToolUse hook
    • Replaced the redundant second extractToolFailureMessage(input) call at line 578 with the already-computed outcomeFailureMessage variable from line 470, eliminating both the redundancy and the inconsistent error handling.

Create PR

Or push these changes by commenting:

@cursor push e70c5bb870
Preview (e70c5bb870)
diff --git a/src/lib/inner-lifecycle.ts b/src/lib/inner-lifecycle.ts
--- a/src/lib/inner-lifecycle.ts
+++ b/src/lib/inner-lifecycle.ts
@@ -154,8 +154,8 @@
     typeof input.tool_response !== 'undefined'
       ? input.tool_response
       : typeof input.tool_result !== 'undefined'
-      ? input.tool_result
-      : null;
+        ? input.tool_result
+        : null;
   if (result === null || result === undefined) {
     return { content: null, exitCode: undefined };
   }
@@ -224,8 +224,8 @@
     typeof input.tool_response !== 'undefined'
       ? input.tool_response
       : typeof input.tool_result !== 'undefined'
-      ? input.tool_result
-      : null;
+        ? input.tool_result
+        : null;
   if (result === null || result === undefined) return null;
   if (typeof result === 'object') {
     const obj = result as Record<string, unknown>;
@@ -334,14 +334,14 @@
       typeof input.tool_name === 'string'
         ? input.tool_name
         : typeof input.toolName === 'string'
-        ? input.toolName
-        : 'unknown';
+          ? input.toolName
+          : 'unknown';
     const toolInput =
       typeof input.tool_input !== 'undefined'
         ? input.tool_input
         : typeof input.toolInput !== 'undefined'
-        ? input.toolInput
-        : null;
+          ? input.toolInput
+          : null;
     const summary = summarizeToolInput(toolName, toolInput);
     // Capture the SDK's `tool_use_id` so the PostToolUse `tool_response`
     // envelope can correlate with this `tool_call`. Stored in a closure-
@@ -411,8 +411,8 @@
         typeof obj.file_path === 'string'
           ? obj.file_path
           : typeof obj.path === 'string'
-          ? obj.path
-          : null;
+            ? obj.path
+            : null;
       if (path) {
         try {
           getUI().recordFileChangePlanned({ path, operation });
@@ -454,8 +454,8 @@
       typeof input.tool_name === 'string'
         ? input.tool_name
         : typeof input.toolName === 'string'
-        ? input.toolName
-        : 'unknown';
+          ? input.toolName
+          : 'unknown';
 
     // Record the run-level tool-call outcome for ALL tools (not just
     // write tools) so `tool_call_summary` carries an accurate
@@ -524,8 +524,8 @@
           typeof input.tool_input !== 'undefined'
             ? input.tool_input
             : typeof input.toolInput !== 'undefined'
-            ? input.toolInput
-            : null;
+              ? input.toolInput
+              : null;
         const responseSummary = summarizeToolInput(
           toolName,
           toolInputForSummary,
@@ -556,8 +556,8 @@
       typeof input.tool_input !== 'undefined'
         ? input.tool_input
         : typeof input.toolInput !== 'undefined'
-        ? input.toolInput
-        : null;
+          ? input.toolInput
+          : null;
     const obj =
       toolInput && typeof toolInput === 'object'
         ? (toolInput as Record<string, unknown>)
@@ -566,8 +566,8 @@
       typeof obj.file_path === 'string'
         ? obj.file_path
         : typeof obj.path === 'string'
-        ? obj.path
-        : null;
+          ? obj.path
+          : null;
     if (path) {
       // v2 protocol: gate on tool_result outcome. If the write tool
       // surfaced an error, emit `file_change_failed` and SKIP
@@ -575,14 +575,13 @@
       // successful write to the orchestrator's audit trail. The
       // pre-write entry stays in the rollback ledger so a cancelled
       // run can still restore the original on-disk state.
-      const failureMessage = extractToolFailureMessage(input);
-      if (failureMessage !== null) {
+      if (outcomeFailureMessage !== null) {
         try {
           getUI().emitFileChangeFailed?.({
             path,
             operation,
-            errorClass: classifyFileChangeError(failureMessage),
-            errorMessage: failureMessage,
+            errorClass: classifyFileChangeError(outcomeFailureMessage),
+            errorMessage: outcomeFailureMessage,
           });
         } catch {
           // See preToolUse — same defensive swallow.

You can send follow-ups to the cloud agent here.

Comment thread src/lib/inner-lifecycle.ts Outdated
@kelsonpw
Copy link
Copy Markdown
Member Author

@cursor push e70c5bb

Copy link
Copy Markdown
Contributor

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: extractToolFailureMessage false positive on is_error: false with error field
    • Added obj.is_error !== false guard to the error-string fallback branch so that when is_error is explicitly false, the error string is ignored and the tool result is not misclassified as a failure.

Create PR

Or push these changes by commenting:

@cursor push 5318fdd275
Preview (5318fdd275)
diff --git a/src/lib/inner-lifecycle.ts b/src/lib/inner-lifecycle.ts
--- a/src/lib/inner-lifecycle.ts
+++ b/src/lib/inner-lifecycle.ts
@@ -154,8 +154,8 @@
     typeof input.tool_response !== 'undefined'
       ? input.tool_response
       : typeof input.tool_result !== 'undefined'
-        ? input.tool_result
-        : null;
+      ? input.tool_result
+      : null;
   if (result === null || result === undefined) {
     return { content: null, exitCode: undefined };
   }
@@ -224,13 +224,18 @@
     typeof input.tool_response !== 'undefined'
       ? input.tool_response
       : typeof input.tool_result !== 'undefined'
-        ? input.tool_result
-        : null;
+      ? input.tool_result
+      : null;
   if (result === null || result === undefined) return null;
   if (typeof result === 'object') {
     const obj = result as Record<string, unknown>;
     // Newer SDK shape: `{ is_error: true, error: 'msg' }` or `{ error: 'msg' }`.
-    if (obj.is_error === true || (obj.error && typeof obj.error === 'string')) {
+    // When `is_error` is explicitly `false`, the boolean flag is authoritative
+    // and the `error` string (possibly an informational warning) is ignored.
+    if (
+      obj.is_error === true ||
+      (obj.is_error !== false && obj.error && typeof obj.error === 'string')
+    ) {
       const msg = typeof obj.error === 'string' ? obj.error : 'tool error';
       return msg;
     }
@@ -334,14 +339,14 @@
       typeof input.tool_name === 'string'
         ? input.tool_name
         : typeof input.toolName === 'string'
-          ? input.toolName
-          : 'unknown';
+        ? input.toolName
+        : 'unknown';
     const toolInput =
       typeof input.tool_input !== 'undefined'
         ? input.tool_input
         : typeof input.toolInput !== 'undefined'
-          ? input.toolInput
-          : null;
+        ? input.toolInput
+        : null;
     const summary = summarizeToolInput(toolName, toolInput);
     // Capture the SDK's `tool_use_id` so the PostToolUse `tool_response`
     // envelope can correlate with this `tool_call`. Stored in a closure-
@@ -411,8 +416,8 @@
         typeof obj.file_path === 'string'
           ? obj.file_path
           : typeof obj.path === 'string'
-            ? obj.path
-            : null;
+          ? obj.path
+          : null;
       if (path) {
         try {
           getUI().recordFileChangePlanned({ path, operation });
@@ -454,8 +459,8 @@
       typeof input.tool_name === 'string'
         ? input.tool_name
         : typeof input.toolName === 'string'
-          ? input.toolName
-          : 'unknown';
+        ? input.toolName
+        : 'unknown';
 
     // Record the run-level tool-call outcome for ALL tools (not just
     // write tools) so `tool_call_summary` carries an accurate
@@ -524,8 +529,8 @@
           typeof input.tool_input !== 'undefined'
             ? input.tool_input
             : typeof input.toolInput !== 'undefined'
-              ? input.toolInput
-              : null;
+            ? input.toolInput
+            : null;
         const responseSummary = summarizeToolInput(
           toolName,
           toolInputForSummary,
@@ -556,8 +561,8 @@
       typeof input.tool_input !== 'undefined'
         ? input.tool_input
         : typeof input.toolInput !== 'undefined'
-          ? input.toolInput
-          : null;
+        ? input.toolInput
+        : null;
     const obj =
       toolInput && typeof toolInput === 'object'
         ? (toolInput as Record<string, unknown>)
@@ -566,8 +571,8 @@
       typeof obj.file_path === 'string'
         ? obj.file_path
         : typeof obj.path === 'string'
-          ? obj.path
-          : null;
+        ? obj.path
+        : null;
     if (path) {
       // v2 protocol: gate on tool_result outcome. If the write tool
       // surfaced an error, emit `file_change_failed` and SKIP

You can send follow-ups to the cloud agent here.

Reviewed by Cursor Bugbot for commit 3b9469b. Configure here.

Comment thread src/lib/inner-lifecycle.ts
@kelsonpw
Copy link
Copy Markdown
Member Author

@cursor push 5318fdd

@kelsonpw kelsonpw merged commit f49da82 into main May 15, 2026
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants