feat(agent-mode): tool_response envelope + tool_call.id correlation#730
Conversation
Today the SIGINT handler only registers in the TUI branch — agent + CI runs get hard-killed by Node's default behaviour, so an orchestrator parsing the NDJSON stream sees an abrupt EOF instead of a clean `run_completed: cancelled` envelope. No checkpoint is saved on the way out, either. Extract the handler into `installAbortSignalHandler` (shared helper in `src/lib/graceful-exit.ts`) and wire it into the agent + CI branches in `src/commands/default.ts` immediately after the session is built. The TUI keeps its inline registration because it needs direct access to the Ink store's `setCommandFeedback`. `WizardAbortOptions` gains a `reason` field so SIGINT (and downstream PRs: LOCK_HELD, AUTH_REQUIRED midRun) can pass a stable machine-readable code to the terminal `run_completed` envelope, instead of relying on the sanitized human-readable `message`.
The 2s graceful-exit timer in `performGracefulExit` called
`process.exit(130)` with no terminal NDJSON event — a parent agent
parsing the stream saw an abrupt EOF that's indistinguishable from a
crash. Add an `emitRunCompleted({ outcome: 'cancelled', exitCode: 130,
reason: 'sigint' })` call before the timer fires so AgentUI lands the
terminal envelope on stdout (InkUI / LoggingUI no-op as before).
Regression test in `graceful-exit.test.ts` spies on the UI and asserts
the call lands before the 2s timer.
When a long agent run hits an LLM-gateway 401 (or an Amplitude OAuth expiry) the wizard previously aborted with a generic `error` envelope plus a TUI-only OutroScreen prompt. Agent-mode orchestrators saw a plain error and had to guess at whether files were preserved or what command to re-run. Wire the AUTH_ERROR path in `agent-runner.ts` to call `emitAuthRequired` BEFORE `wizardAbort` with the structured payload: mid-run discriminator, `preserveFiles: true`, `partialProgress` flags from `classifyAgentOutcome`, and the canonical login + resume argv. Extends the auth_required wire shape with `midRun`, `preserveFiles`, `partialProgress`, `authSubkind`, plus two new reason discriminators (`amplitude_token_expired`, `gateway_token_expired`). Bumps `EVENT_DATA_VERSIONS.auth_required` from 1 -> 2; existing v1 callers keep working since every new field is optional. `emitAuthRequired` graduates to the WizardUI interface (optional) so business logic can call it through `getUI()` without an AgentUI cast. InkUI / LoggingUI no-op as designed.
After AUTH_RETRY_LIMIT consecutive 401-flavoured api_retry messages
the wizard short-circuits the SDK's ~3-minute retry storm and aborts.
Today that boundary fires `controller.abort('auth_failed')` silently
— orchestrators only see the downstream AUTH_ERROR routing and can't
distinguish "single 401, transient" from "we tried twice, stuck".
Add `lifecycle: auth_retry_exhausted` event with `attempts` + `subkind`
discriminator (`amplitude` / `llm-gateway`). Wired into the SDK
retry-loop boundary in `agent-interface.ts` immediately before
`controller.abort('auth_failed')`. AgentUI implements; InkUI /
LoggingUI no-op (the TUI's AUTH_ERROR outro covers the human path).
New WizardUI optional method `emitAuthRetryExhausted`. Registered in
`EVENT_DATA_VERSIONS` at v1. Extended the existing AUTH_RETRY_LIMIT
test in `agent-interface.test.ts` to assert the emit fires exactly
once with the expected payload.
…emit
Two fixes that together harden the NDJSON wire contract:
1. `data.event` discriminator on anonymous error envelopes. Previously
`AgentUI.setRunError` emitted `data: { name, recoverable,
suggestedAction }` with no `event` field — every other event family
uses `data.event` as the discriminator. Orchestrators filtering on
`data.event` saw nothing for run-aborting errors. Add
`event: 'run_error'` and register `run_error: 1` in
`EVENT_DATA_VERSIONS` so the envelope ships with `data_version`.
2. Zod-validate every envelope at the emit boundary. A new
`validateEnvelopeOrLog` runs each event through a Zod schema (v=1,
ISO timestamp, valid `type`, string `message`) plus a coherence
check that asserts `data_version === EVENT_DATA_VERSIONS[event]`
when the discriminator is set. Failures route to `logToFile` — never
throw, never stderr (Ink owns stdout in TUI mode).
Regression tests in `agent-ui.test.ts`: a property-style check that
every documented emit method produces a wire-valid envelope, plus a
focused test pinning the `event=run_error` shape.
Every AgentErrorType branch in agent-runner.ts emitted a generic `error` envelope with no machine-readable code, then routed to wizardAbort with a human-readable message. Orchestrators couldn't distinguish "retry your launch" (GATEWAY_DOWN 5xx blip) from "upgrade wizard" (GATEWAY_INVALID_REQUEST schema rejection) from "back off" (RATE_LIMIT) without parsing message strings. Adds a new optional `emitRunError(data)` method on WizardUI with a typed `code` discriminator covering GATEWAY_DOWN / GATEWAY_INVALID_REQUEST / RATE_LIMIT / API_ERROR / MCP_MISSING / RESOURCE_MISSING plus an optional `mcpServer` field (wizard-tools vs amplitude-wizard) and a `recoverable` hint. AgentUI implements; InkUI / LoggingUI no-op. Wired into every AgentErrorType branch in agent-runner.ts immediately before the corresponding wizardAbort. The envelope reuses the `run_error` discriminator + data_version=1 registered in the previous commit, so consumers branching on `data.event` + `data.code` get a stable contract. Regression tests in `agent-ui.test.ts` parameterize over every code and assert the emitted envelope shape.
`wizard apply` exited with `ExitCode.INVALID_ARGS=2` whenever the
per-project apply-lock (`acquireApplyLock`) detected an in-flight
holder — an orchestrator routinely interprets exit 2 as "bad flags"
and doesn't retry. Lock contention is automatically retryable.
Add `ExitCode.LOCK_HELD=14` and route the apply-lock collision path
in `src/commands/apply.ts` to it. Emit a terminal `run_completed:
{ outcome: 'error', exitCode: 14, reason: 'lock_held' }` envelope
before `process.exit` so an orchestrator parsing the NDJSON stream
sees the canonical signal instead of an abrupt EOF.
Regression test in `src/commands/__tests__/apply-lock-held.test.ts`
pins the numeric code at 14 (orchestrator contract) and asserts the
holder shape that the handler ships in the `apply_refused` envelope.
Today an orchestrator parsing the NDJSON stream has to infer coarse progress from the firehose of tool_call / status / progress events. Adds a dedicated `lifecycle: run_phase` event that transits five fixed states in order: cold_start -> agent_running -> finalizing -> completed | error. Wiring (each is best-effort, never blocks the run): - cold_start: first cold-start activity in agent-runner.ts - agent_running: first PreToolUse in inner-lifecycle.ts (the emitter dedups so subsequent tool calls don't spam the stream) - finalizing: just before seedPostAgentSteps in agent-runner.ts - completed: in wizardSuccessExit before run_completed - error: in wizardAbort before run_completed New optional `emitRunPhase` on WizardUI. AgentUI implements with deduplication; InkUI / LoggingUI no-op (their UX already implies phase via the journey stepper / log line cadence). Registered `run_phase: 1` in `EVENT_DATA_VERSIONS`. Regression tests in agent-ui.test.ts: per-phase envelope shape, dedup behaviour for repeated calls, and the wire-level ordering invariant.
The SIGINT handler routes through wizardAbort via a dynamic import. In production wizardAbort calls process.exit and never resolves, but vitest's strict-exit guard throws a synthetic exception when process.exit is called in a test — surfacing as an unhandled rejection on the test stream. Wrap the runner in try/catch so the rejection stops at the boundary; production behaviour is unchanged because the throw path doesn't fire there.
PR B subagent's final commit skipped a prettier pass on these five files. Pure whitespace / wrap reflows — no semantic change.
The TUI's "Discovered facts" chips (vertical, app type, framework,
package manager, region) only rendered in Ink. Mirror them onto the
agent-mode NDJSON stream as `progress: discovery_fact { id, label,
value, discoveredAt }` so parent agents (Claude Code, Cursor, Codex)
can render the same chips without parsing every status line.
Hooks into the existing `publishDiscoveryFact` in agent-runner.ts —
the abstract `WizardUI.pushDiscoveryFact` now reaches AgentUI as a
typed emit instead of a no-op. Stable `id` lets orchestrators upsert
chips on retry without flicker.
Adds `discovery_fact: 1` to EVENT_DATA_VERSIONS and a regression
test pinning envelope shape + version stamping.
Every Edit/Write tool call already emits its own `tool_call` event
plus a `file_change_planned`. That's great for an audit log but
noisy for parent agents that just want a "now editing X" header.
Add `progress: current_file { path, relativePath, operation }` —
the coarse rollup orchestrators subscribe to instead of parsing
every fine-grained tool event. Debounced at the wire boundary
(250ms per identical (path, op) tuple) so a tight edit chain
collapses into one event.
`relativePath` is computed against `installDir` so the wire ships a
short label; absolute path preserved for audit. Wired from the
existing inner-lifecycle PreToolUse hook; AgentUI debounces, InkUI /
LoggingUI no-op (optional WizardUI method).
Adds `current_file: 1` to EVENT_DATA_VERSIONS and regression tests
covering envelope shape, version stamping, and the
(path, op) debounce / non-debounce paths.
The wizard already detects stalls (no SDK traffic for N seconds) and
shows TUI hints. Mirror that escalation onto NDJSON so parent agents
can render the same coaching cadence.
`progress: stall_status { tier, durationMs, lastActivity, hint? }`
fires at 10s / 30s / 60s of silence, mapped to `noticed` /
`concerning` / `critical`. Per-tier dedup at the AgentUI boundary
(`resetStallStatus()` clears the gate when activity resumes), so
each tier hits the wire at most once per stall window.
Wired from the existing 10s heartbeat tick in agent-interface — the
same loop that emits `heartbeat` now also derives the silence-window
tier via `deriveStallTier()` (pure helper, regression-tested). An
`onActivity()` shim resets `lastActivityAt` from `onStatus` /
non-`stream_event` SDK messages, so a long tool-call interlude isn't
spuriously stamped as "concerning".
InkUI / LoggingUI no-op via the optional WizardUI methods.
Adds `stall_status: 1` to EVENT_DATA_VERSIONS and tests covering
envelope shape, tier dedup, reset semantics, and the pure helper's
threshold boundaries.
When the wizard restarts from a checkpoint (post-crash, post-SIGINT,
post-token-expiry), the existing `checkpoint_loaded` event carries
only the file age — useful for policy gates but not for UX. Add
`lifecycle: run_resumed { from_checkpoint_at, last_phase,
restored_state_summary }` as the orchestrator-facing companion so
parent agents can render "continuing from where we left off"
instead of "fresh run".
Wired from `loadCheckpoint()` immediately after `emitCheckpointLoaded`.
The summary is composed from the typed checkpoint fields (region,
org, project, env, framework, intro flag) — pre-redacted, no
free-form user input. `lastPhase` defaults to `'unknown'` because
the checkpoint deliberately drops `runPhase` from its restored set
(so it re-evaluates on resume).
InkUI / LoggingUI no-op via the optional WizardUI method.
Adds `run_resumed: 1` to EVENT_DATA_VERSIONS and tests covering
envelope shape, the 'unknown' last_phase fallback, and version
stamping.
Previously the inner-lifecycle PostToolUse hook unconditionally
emitted `file_change_applied` for any Edit/Write/MultiEdit even when
the tool itself reported `is_error: true`. That falsely advertised
the write as successful on the orchestrator's audit trail.
Add `error: file_change_failed { path, operation, errorClass,
errorMessage }`. Gated on the `tool_response.is_error` /
`tool_response.error` shape via the new `extractToolFailureMessage()`
helper — when a failure is detected, AgentUI emits the typed error
event and the success-side `recordFileChangeApplied` + ledger
post-write are skipped (the pre-write ledger entry stays so a
cancelled run can still roll back).
`errorClass` discriminates the common failure modes:
- `permission` — EACCES / "permission denied" / write_refused
- `not_found` — ENOENT / "no such file"
- `syntax` — Edit/MultiEdit string-match failures
- `generic` — fallback
InkUI / LoggingUI no-op via the optional WizardUI method.
Adds `file_change_failed: 1` to EVENT_DATA_VERSIONS, regression
tests for the wire shape, the classifier's match precedence
(syntax wins over not_found for "String to replace not found"),
and the inner-lifecycle gate ensuring failed writes don't
double-emit as applied.
…se content Final CI-gate cleanup for the v2 protocol stack: - prettier-formatted the v2 event interfaces, AgentUI emitters, and the regression test file - narrowed `obj.content[0]` to `unknown` in `extractToolFailureMessage` to silence `@typescript-eslint/no-unsafe-assignment`; runtime checks unchanged (we already gate on `typeof first === 'object'` before any property read)
…t/decision_auto correlation Introduce a process-local counter that mints stable `dec_<NNN>` ids and extend the `needs_input` + `decision_auto` data shapes with an optional `decisionId` field. Bumps both events to `data_version: 2` so consumers that want strict request/response correlation can branch on `data_version >= 2`; v1 readers continue to work because the field is optional at the consumer end. The generator (`nextDecisionId`) uses a monotonic counter rather than a UUID so a transcript replayed offline reads the SAME ids it read live — critical for deterministic log analysis. Wraps at 999 isn't a concern for single wizard runs (the wizard never asks hundreds of questions). Test-only `__resetDecisionIdCounterForTests` keeps regression tests deterministic across files without exposing counter mutation to production code. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… and decision_auto pair
Wires the new correlation id through the four `needs_input` callsites
in `AgentUI`:
- `promptConfirm` → code='confirm', recommended='yes'
- `promptChoice` → code='choice', recommended=options[0]
- `promptEventPlan` → code='event_plan', recommended='approved'
- env-picker auto-fallback → code='environment_selection'
`emitNeedsInput` now returns the freshly-minted `decisionId`; each
caller captures it locally and threads it into the matching
`emitDecisionAuto` call. The env-picker previously never emitted a
`decision_auto` for its auto-select fallback path — adding one here
closes the request/response correlation loop for that prompt too.
The env-var-resolved `promptEventPlan` short-circuit (where
`AMPLITUDE_WIZARD_EVENT_PLAN_DECISION` answers the prompt without ever
emitting `needs_input`) is intentionally unchanged — there's no request
envelope to correlate against, which is the documented contract: the
ABSENCE of `needs_input + decision_auto` for the same code signals
"resolved by user-driven flag." The stdin-resolved env-picker path is
also unchanged: an orchestrator that wrote the answer to stdin already
knows which request it answered, and the wire today has no
`decision_human` envelope to annotate (would be a new envelope, which
this PR is explicitly scoped to avoid).
Regression coverage in `agent-ui-decision-id.test.ts`:
- Generator emits `dec_NNN` zero-padded, monotonic across calls
- `promptConfirm` and `promptChoice` request/response pairs share id
- Two back-to-back `confirm` prompts mint DISTINCT ids — the bug this
PR exists to fix (orchestrators previously relied on timing +
`code` heuristics, which fails for same-code prompts in series)
- `emitNeedsInput` returns the minted id and honors an explicit
override (forward-compat for paged emitters that want stable ids
across pages)
- Both events stamp `data_version: 2`
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…or needs_input
PR B2 deferred reconciling the two names ("needs_input" on the wire,
"waiting for user" in human-facing copy and early design notes) because
`NeedsInputData` already provided the typed schema. The deferral was
correct — there is no separate wire event to ship — but orchestrators
reading protocol docs still see both spellings and have no way to
import the one that matches their mental model.
This commit adds three type-identical re-exports in `agent-events.ts`:
- `WaitingForUserData<V>` → `NeedsInputData<V>`
- `WaitingForUserWireData<V>` → `NeedsInputWireData<V>`
- `WaitingForUserEvent<V>` → `NeedsInputEvent<V>`
Pure type aliases — no new emit path, no `EVENT_DATA_VERSIONS` entry
(documented as a load-bearing absence — adding one would require a
separate emitter + schema + tests). The wire-format `event`
discriminator remains `'needs_input'` so existing subscribers keep
working.
Regression test pins both directions of the assignability proof, the
wire-event discriminator invariant, and the absence of a
`EVENT_DATA_VERSIONS.waiting_for_user` registry entry.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…patterns
PR B2 shipped `file_change_failed` with four errorClass values
(`permission` / `not_found` / `syntax` / `generic`). Orchestrators
parsing the stream couldn't distinguish a transient timeout from a
permanent failure — both fell into `generic` and burned retry budget.
This commit:
1. Adds `'timeout'` to the `FileChangeErrorClass` union, documented
as the transient-by-definition variant. Retry-aware orchestrators
can re-issue the write without changing input.
2. Extends `classifyFileChangeError` patterns:
- `permission`: + EPERM, + read-only file system / EROFS
- `not_found`: + "does not exist"
- `syntax`: + "did not match", + "unexpected token", + "invalid JSON"
- `timeout`: NEW — ETIMEDOUT, "timed out", "timeout",
"deadline exceeded"
3. Pins the classifier ordering so `timeout` wins over `not_found`
when both signals appear in the message (the SDK occasionally
wraps an ETIMEDOUT with secondary "not found in cache" text).
Wire-shape change: the `FileChangeErrorClass` union grew. Orchestrators
already keyed off the literal string — adding a new variant is a
breaking enum change in principle, but the existing `data_version: 1`
on `file_change_failed` doesn't bump because we are ADDING, not
RENAMING. New consumers branch on the wider set; old consumers fall
through their default arm. The schema doc was updated to flag the
"adding a variant is still a data_version bump in principle" rule.
`emitFileChangeFailed` now consumes the exported `FileChangeErrorClass`
type rather than an inline literal — a future widening lives in one
place. `wizard-ui.ts` mirrors the wider union on the optional method.
Regression tests cover all new patterns + the timeout-over-not_found
ordering guard. Existing tests continue to pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR B (#714) shipped `transient_retry`, the "wizard decided to retry, sleeping Ns" decision envelope. Orchestrators parsing the stream could see retries decided, but had no way to tell when the retry attempt actually BEGAN — the backoff sleep (up to 30s) and proactive bearer refresh both fell into a silent gap. A consumer rendering "retrying… (attempt 2/4)" had to guess at the boundary from timing. This commit adds `attempt_started`, emitted at the top of each outer-loop iteration in `agent-interface.ts`, AFTER the backoff sleep has elapsed, AFTER the proactive bearer refresh has run, but BEFORE the inner SDK query fires. Paired with `transient_retry`, orchestrators see: transient_retry { attempt: 2, nextRetryInMs: 2500, reason: 'stall' } ...2.5s of silence... attempt_started { attemptNumber: 2, backoffMs: 2500, reason: 'stall_retry' } ...inner work... Schema additions in `agent-events.ts`: - `AttemptStartedReason` — `'cold_start' | 'stall_retry' | 'auth_refresh' | 'network_retry'` - `AttemptStartedData` — `{ attemptNumber, totalBudget, reason, backoffMs? }` - `EVENT_DATA_VERSIONS.attempt_started: 1` - Added to `InnerAgentLifecycleData` union Wire-up in `agent-interface.ts`: - Stash `attemptStartedReason` + `attemptStartedBackoffMs` at the retry decision points (already-existing `lastRetryReason` mapping). - Track `bearerRotatedThisAttempt` in the `refreshGatewayBearer` callback so an attempt starting with a freshly-rotated token reports `'auth_refresh'` instead of the underlying stall/network reason. - Emit AFTER the bearer refresh but BEFORE the SDK query so the envelope fires when work actually begins. - Cold-start attempt fires with `reason: 'cold_start'` and OMITS `backoffMs` on the wire (orchestrators read absence as "no preceding sleep"). `emitAttemptStarted` is wired as an optional method on `WizardUI` — InkUI / LoggingUI no-op (their existing retry banner / log line already covers their UX), so orchestrators are the only consumers. Regression coverage: 7 new tests on `agent-ui-v2-events.test.ts` covering the envelope shape, the data_version stamp, all four reason values, the cold-start no-backoffMs invariant, the explicit-0 round-trip, and the non-AgentUI no-op contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…on boundaries
Long-running multi-item operations (post-agent step queue, multi-editor MCP
install, event-plan write) emit a stream of fine-grained per-item events
(`post_agent_step`, `tool_call`). Orchestrators that want a single progress
bar have to reconstruct the rollup from those events — track the seeded
total, count completions, compute the percent themselves. The same logic
gets repeated in every consumer.
This commit adds `progress_estimate` — the canonical
`(stage, current, total, percent)` rollup. Orchestrators subscribe to
this single event and ignore the fine-grained stream when all they want
is a progress bar.
Schema additions in `agent-events.ts`:
- `ProgressEstimateData` — `{ event, stage, current, total, percent }`
- `EVENT_DATA_VERSIONS.progress_estimate: 1`
- `buildProgressEstimate(stage, current, total)` — pure helper that
clamps current to `[0, total]`, floors fractional values, and returns
null for zero-item operations (no `progress_estimate` on the wire
when there's no work to do).
Wire-up:
- `AgentUI.emitProgressEstimate` — new emitter, delegates to
`buildProgressEstimate` for the clamp/floor/null logic.
- `AgentUI.setPostAgentStep` — emit at terminal-state transitions
(`completed` / `skipped`) only. `in_progress` / `pending` don't bump
so a bouncing step can't lower the percent. Set-based dedup
guarantees the count is monotone across duplicate completions.
- `AgentUI.seedPostAgentSteps` — track ids + done-set so the rollup
can compute total locally. A second seed call resets the tracking
so the new queue doesn't inherit the old queue's progress.
- `addMCPServer` (mcp-install step) — emit one rollup per editor
after each successful `addServer`. Failures don't bump (orchestrators
infer "stuck at X/N" from absence + error envelope).
- `WizardUI.emitProgressEstimate?` — optional. InkUI / LoggingUI
no-op (their existing journey stepper / panel covers the UX).
Stages emitted today:
- `'post_agent_steps'` — post-agent queue advance
- `'mcp_install'` — multi-editor MCP install loop
Additional stages (e.g. `'event_plan_write'`) land as new long-running
operations get wired.
Regression coverage: 13 new tests on `agent-ui-v2-events.test.ts`
covering the envelope shape, the data_version stamp, terminal-state-only
transitions, idempotent re-emission, queue reset on re-seed, the
zero-total no-op, current clamping in both directions, and the pure
helper's full behaviour matrix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Today the wizard emits a coarse `run_phase: cold_start` lifecycle event
when cold start begins. Parent agents driving the spinner know the
wizard IS cold-starting but have zero visibility into WHICH phase
consumed the time. Cold start is 5-30s of perceived silence on the
spinner, and the lion's share is one of a handful of identifiable
phases (skill staging, package-manager probing, framework preflight,
MCP bootstrap, gateway probe).
This commit adds `cold_start_breakdown` — a `{phase, startedAt,
finishedAt, durationMs}` envelope emitted at the END of each of the
five cold-start phase boundaries. Parent agents subscribe and can:
- render which phase is CURRENTLY active during the spinner
(update UI on each fire),
- surface "your cold start is slow because phase X took Ys"
diagnostics on a hung run,
- aggregate per-phase timings across runs for performance tracking
without re-parsing log lines.
Critical: each phase is wrapped in a `try/finally` boundary so the
breakdown event always ships, even when the phase throws —
orchestrators see a timing breadcrumb for the failing phase (paired
with the subsequent thrown error envelope) instead of a silent gap on
the wire.
Schema additions in `agent-events.ts`:
- `ColdStartPhase` — the five canonical phase ids
- `ColdStartBreakdownData` — the wire shape
- `EVENT_DATA_VERSIONS.cold_start_breakdown: 1`
- `buildColdStartBreakdown(phase, startedAt, finishedAt)` — pure
helper that floors `durationMs` at 0 (guards a non-monotonic
clock from shipping a negative duration) and bumps `finishedAt`
up to `startedAt` so the invariant `finishedAt >= startedAt`
holds on the wire.
Wire-up:
- `AgentUI.emitColdStartBreakdown` — new emitter, delegates to
`buildColdStartBreakdown` for the clamp logic.
- `WizardUI.emitColdStartBreakdown?` — optional. InkUI / LoggingUI
no-op (the TUI already surfaces the labeled cold-start activity
line; CI logs the per-phase step messages).
- `agent-runner.ts` — wraps skill_staging, package_manager_detection,
framework_detection in `try/finally`.
- `agent-interface.ts` — wraps gateway_probe (`checkGatewayLiveness`)
and mcp_bootstrap (`createWizardToolsServer`) in `try/finally`.
Regression coverage: 11 new tests covering the envelope shape, the
data_version stamp, per-phase emission for all five phases, the
non-monotonic-clock floor, the pure helper's full behaviour matrix,
the WizardUI optional-method contract, and the critical
"emission survives a thrown phase" path. Total file: 64 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ries
Today the wizard ships one fine-grained `tool_call` NDJSON envelope per
PreToolUse (30-200 per run). Parent agents that want a "tool usage"
summary at completion have to maintain their own running aggregate. The
wizard already has all the data — it should emit the rollup.
This PR adds a `tool_call_summary { totalCalls, byTool, byOutcome,
durationMsTotal, durationMsAvg, topToolByCount? }` event emitted at two
boundaries:
1. Phase finalize — before `run_phase: finalizing` so an orchestrator
can render the inner-agent tool summary before the post-agent
steps section appears.
2. Terminal exit — inside `wizardSuccessExit` / `wizardAbort` (after
post-agent steps), so the orchestrator always sees a final
cumulative rollup covering the whole run.
Dedup-safety: the emitter no-ops when the payload signature is
identical to the previous emission (so a duplicate boundary call
doesn't double-emit). `totalCalls === 0` is suppressed at the wire —
absence of the event means no tools were called.
Implementation:
- `ToolCallStats` pure accumulator in `agent-events.ts` (per-tool
counts, per-outcome counts, FIFO-by-tool duration pairing).
- `AgentUI.emitToolCall` increments the accumulator; new
`AgentUI.recordToolOutcome` is called from the PostToolUse hook in
`inner-lifecycle.ts` for ALL tools (not just write tools) so the
outcome breakdown is faithful.
- `WizardUI.emitToolCallSummary?(): void` — optional method; AgentUI
implements, InkUI/LoggingUI no-op.
- `data_version: 1` registered in `EVENT_DATA_VERSIONS`.
- Emit callsites in `agent-runner.ts` (before finalize) and
`wizard-abort.ts` (`wizardSuccessExit` + `wizardAbort`) wrapped in
try/catch so the rollup can never block the phase transition or
exit path.
Regression tests cover the pure accumulator (counting, outcome math,
duration FIFO pairing, non-monotonic clock floor, orphaned outcomes,
top-tool tie-breaking), the wire envelope (shape, data_version, dedup
guard, zero-call suppression), and the inner-lifecycle wiring
(success and error outcomes flowing through PostToolUse).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…or_install
Parent agents (Claude Code, Codex, Cursor) parsing the wizard's
NDJSON stream had no visibility into MCP server state transitions:
the in-process wizard-tools server bootstrapped silently during
cold start, and the editor-install flow silently detected (or
didn't) supported editors and silently installed (or skipped) the
editor MCP config. mcp_status fills that gap with a
{ server, state, transition_ts, detail? } envelope emitted at every
state boundary for both lifecycles.
Wiring:
wizard_tools server (in-process MCP that the inner agent calls):
- available — emitted in agent-interface.ts after
createWizardToolsServer() resolves successfully
- failed — emitted in the same try/catch before the boot
error rethrows, so the lifecycle event lands on the wire
before any subsequent run_error envelope
editor_install flow (wizard-mcp install into the user's editor):
- not_applicable — no supported clients detected on this
machine
- install_skipped — CI mode short-circuit OR user declined
the install prompt in McpScreen
- needs_user_choice — multiple editors detected; awaiting the
user's pick in McpScreen
- installed — at least one client took the config
- failed — write-time error (permissions, disk
full) OR every client install failed
Schema: EVENT_DATA_VERSIONS.mcp_status = 1. Adds MCPStatusData /
MCPStatusServer / MCPStatusState type aliases to agent-events.ts;
optional emitMcpStatus on WizardUI (AgentUI implements with Zod
defense-in-depth on the payload, InkUI / LoggingUI no-op).
Tests: agent-ui-mcp-status.test.ts pins the envelope shape, all
seven (server, state) transitions, the Zod guard rejecting invalid
literals, no-op behaviour on LoggingUI, and stream silence on
non-MCP-related runs. mcp-status-wiring.test.ts covers the
addMCPServerToClientsStep call sites that don't require
host-filesystem probes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Today the wizard emits `tool_call` at PreToolUse but NOTHING at
PostToolUse for non-write tools — orchestrators (Claude Code, Cursor,
Codex) see "Bash: pnpm install" and then radio silence until the
next `tool_call`. Three independent audit subagents flagged this as
the biggest parent-agent UX cliff in the wire today.
This PR closes the gap with:
- `tool_response` envelope (data_version: 1) emitted at PostToolUse
for EVERY tool the inner agent calls, carrying `outcome`
(success/error/denied), `durationMs`, optional `exitCode` (Bash),
a 1024-byte-bounded `contentHead`, `isError`, a 256-byte-bounded
`errorMessage`, and a 120-char-bounded `summary`.
- `tool_call.id` correlation field (mirrors the SDK's
`tool_use_id`). Bumps `tool_call` to data_version: 2 — v1 readers
continue to work because the field is optional; strict-correlation
consumers branch on `data_version >= 2`.
- All free-form text fields go through `redactString` at the emit
boundary so a misbehaving Bash command can't leak env-value
secrets. New `truncateToBytes` helper handles UTF-8-safe byte-
bounded truncation (won't split multi-byte codepoints).
- PostToolUse emission is wrapped in try/catch — observation must
NEVER block tool execution.
Stacks on PR #723 (feat/agent-mode-mcp-status).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Orphaned JSDoc separated from its function by insertions
- Moved the JSDoc block for extractToolFailureMessage from above extractToolUseId to directly above extractToolFailureMessage where it belongs.
Or push these changes by commenting:
@cursor push 93b6b4782a
Preview (93b6b4782a)
diff --git a/src/lib/inner-lifecycle.ts b/src/lib/inner-lifecycle.ts
--- a/src/lib/inner-lifecycle.ts
+++ b/src/lib/inner-lifecycle.ts
@@ -102,14 +102,6 @@
}
/**
- * Inspect a PostToolUse hook input for a tool-result error. The SDK
- * surfaces failures via either `tool_response.is_error` /
- * `tool_response.error` (newer hook shape) or a stringified result
- * containing common error markers. Returns the sanitized message when
- * a failure is detected, null when the tool succeeded. Pure — no
- * I/O, no throws.
- */
-/**
* Extract the SDK's `tool_use_id` from a hook input — the stable
* correlation key that pairs a PreToolUse `tool_call` with its
* matching PostToolUse `tool_response`. The SDK consistently surfaces
@@ -216,6 +208,14 @@
return { content: null, exitCode: undefined };
}
+/**
+ * Inspect a PostToolUse hook input for a tool-result error. The SDK
+ * surfaces failures via either `tool_response.is_error` /
+ * `tool_response.error` (newer hook shape) or a stringified result
+ * containing common error markers. Returns the sanitized message when
+ * a failure is detected, null when the tool succeeded. Pure — no
+ * I/O, no throws.
+ */
export function extractToolFailureMessage(
input: Record<string, unknown>,
): string | null {You can send follow-ups to the cloud agent here.
…response # Conflicts: # src/lib/__tests__/graceful-exit.test.ts # src/lib/agent-events.ts # src/lib/agent-runner.ts # src/lib/graceful-exit.ts # src/lib/inner-lifecycle.ts # src/ui/agent-ui.ts # src/utils/wizard-abort.ts
6c325e5 to
74e67fd
Compare
…hema 2020-12 fragments (#732) * feat(agent-mode): replace responseSchema English strings with JSON Schema 2020-12 fragments Audit #3 (Codex/non-Anthropic orchestrator POV) flagged the v2 needs_input.responseSchema shape as a vendor-portability blocker: responseSchema: { appId: 'string (required, from choices[].value)' } Non-Claude orchestrators (Codex, GPT-5, Mistral) couldn't programmatically validate stdin payloads against the English-in-JSON string — they had to run an LLM over the description to decide whether `{ appId: '769610' }` was a legal response. That violates the "perfectly suited for ... codex, anything that runs this against code bases via a parent agent" charter. Replaces the shape with a proper JSON Schema 2020-12 fragment so any orchestrator can load `ajv` / `jsonschema` and validate directly: responseSchema: { \$schema: 'https://json-schema.org/draft/2020-12/schema', type: 'object', properties: { appId: { type: 'string', pattern: '^\\\\d+$', description: 'Numeric Amplitude app ID — ...', }, }, required: ['appId'], } Changes: - Add ResponseSchemaFragment + JsonSchemaProperty exports to agent-events. - Update NeedsInputData / NeedsInputWireData to use the structured type. - Bump EVENT_DATA_VERSIONS.needs_input from 2 to 3 (BREAKING for any consumer that parsed the prior English strings — most orchestrators round-trip via resumeFlags and are unaffected). - Update emitter callsites in src/ui/agent-ui.ts (legacy `prompt` + structured `needs_input` for environment_selection) and src/commands/projects.ts (project_selection). - Use `pattern: '^\\d+\$'` rather than `enum` because both callsites set `allowManualEntry: true` — a closed enum would contradict the manualEntry contract that lets orchestrators submit an above-cap app-id. - New regression suite (agent-ui-response-schema.test.ts, 9 tests) pinning the wire shape, data_version: 3 stamp, validateEnvelopeOrLog round-trip, and an inline snapshot for the env-selection fragment. - Update agent-ui-decision-id.test.ts to match the new v3 stamp. Stacks on #730 (feat/agent-mode-tool-response). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(bugbot): stamp data_version on manual projects list needs_input envelope PR B12 bumped `EVENT_DATA_VERSIONS.needs_input` to 3 for the JSON Schema 2020-12 responseSchema shape. The `wizard projects list --agent` envelope at `commands/projects.ts:59` is manually constructed via `JSON.stringify` and writes directly to `process.stdout`, bypassing the `emit()` helper that auto-stamps `data_version` from the registry. Result: the new v3 envelope ships with no `data_version` field at all — orchestrators that branch on `data_version >= 3` for the new shape treat this envelope as pre-v3 and misinterpret the JSON Schema object as the old English-string format. Import `EVENT_DATA_VERSIONS` and read `.needs_input` at emit time so future bumps (v4, v5…) stay in sync without touching this file. * fix(bugbot): include decisionId on manual projects list needs_input envelope The previous fix stamped `data_version: EVENT_DATA_VERSIONS.needs_input` (currently 3) but omitted the `decisionId` field. `NeedsInputWireData` declares `decisionId: string` as required since v2 — orchestrators that branch on `data_version >= 2` expect `data.decisionId` for correlation with the matching `decision_auto` / `user_input_received` partner. Manual envelope construction bypasses `emitNeedsInput` (which mints the id internally), so mint via `nextDecisionId()` directly. * refactor(bugbot): extract appIdResponseSchema factory to prevent wire drift The same JSON Schema 2020-12 fragment (`$schema`, `type: 'object'`, `properties.appId.{type:'string', pattern:'^\\d+$', description}`, `required: ['appId']`) was hand-written identically in three places: - agent-ui.ts env-selection legacy `prompt` event emit - agent-ui.ts env-selection canonical `emitNeedsInput` - commands/projects.ts manual envelope Two of three were byte-identical. A future change (drop `pattern` for a real `enum`, add a property, bump dialect) lands in three places silently or — worse — in one place, drifting the wire contract. Extracted `appIdResponseSchema(description)` as a typed factory in `agent-events.ts` alongside the `ResponseSchemaFragment` interface. Only the human-facing `description` varies between callsites; the structural parts are now one place. All 9 response-schema tests still pass. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Redundant
extractToolFailureMessagecall in PostToolUse hook- Replaced the redundant second
extractToolFailureMessage(input)call at line 578 with the already-computedoutcomeFailureMessagevariable from line 470, eliminating both the redundancy and the inconsistent error handling.
- Replaced the redundant second
Or push these changes by commenting:
@cursor push e70c5bb870
Preview (e70c5bb870)
diff --git a/src/lib/inner-lifecycle.ts b/src/lib/inner-lifecycle.ts
--- a/src/lib/inner-lifecycle.ts
+++ b/src/lib/inner-lifecycle.ts
@@ -154,8 +154,8 @@
typeof input.tool_response !== 'undefined'
? input.tool_response
: typeof input.tool_result !== 'undefined'
- ? input.tool_result
- : null;
+ ? input.tool_result
+ : null;
if (result === null || result === undefined) {
return { content: null, exitCode: undefined };
}
@@ -224,8 +224,8 @@
typeof input.tool_response !== 'undefined'
? input.tool_response
: typeof input.tool_result !== 'undefined'
- ? input.tool_result
- : null;
+ ? input.tool_result
+ : null;
if (result === null || result === undefined) return null;
if (typeof result === 'object') {
const obj = result as Record<string, unknown>;
@@ -334,14 +334,14 @@
typeof input.tool_name === 'string'
? input.tool_name
: typeof input.toolName === 'string'
- ? input.toolName
- : 'unknown';
+ ? input.toolName
+ : 'unknown';
const toolInput =
typeof input.tool_input !== 'undefined'
? input.tool_input
: typeof input.toolInput !== 'undefined'
- ? input.toolInput
- : null;
+ ? input.toolInput
+ : null;
const summary = summarizeToolInput(toolName, toolInput);
// Capture the SDK's `tool_use_id` so the PostToolUse `tool_response`
// envelope can correlate with this `tool_call`. Stored in a closure-
@@ -411,8 +411,8 @@
typeof obj.file_path === 'string'
? obj.file_path
: typeof obj.path === 'string'
- ? obj.path
- : null;
+ ? obj.path
+ : null;
if (path) {
try {
getUI().recordFileChangePlanned({ path, operation });
@@ -454,8 +454,8 @@
typeof input.tool_name === 'string'
? input.tool_name
: typeof input.toolName === 'string'
- ? input.toolName
- : 'unknown';
+ ? input.toolName
+ : 'unknown';
// Record the run-level tool-call outcome for ALL tools (not just
// write tools) so `tool_call_summary` carries an accurate
@@ -524,8 +524,8 @@
typeof input.tool_input !== 'undefined'
? input.tool_input
: typeof input.toolInput !== 'undefined'
- ? input.toolInput
- : null;
+ ? input.toolInput
+ : null;
const responseSummary = summarizeToolInput(
toolName,
toolInputForSummary,
@@ -556,8 +556,8 @@
typeof input.tool_input !== 'undefined'
? input.tool_input
: typeof input.toolInput !== 'undefined'
- ? input.toolInput
- : null;
+ ? input.toolInput
+ : null;
const obj =
toolInput && typeof toolInput === 'object'
? (toolInput as Record<string, unknown>)
@@ -566,8 +566,8 @@
typeof obj.file_path === 'string'
? obj.file_path
: typeof obj.path === 'string'
- ? obj.path
- : null;
+ ? obj.path
+ : null;
if (path) {
// v2 protocol: gate on tool_result outcome. If the write tool
// surfaced an error, emit `file_change_failed` and SKIP
@@ -575,14 +575,13 @@
// successful write to the orchestrator's audit trail. The
// pre-write entry stays in the rollback ledger so a cancelled
// run can still restore the original on-disk state.
- const failureMessage = extractToolFailureMessage(input);
- if (failureMessage !== null) {
+ if (outcomeFailureMessage !== null) {
try {
getUI().emitFileChangeFailed?.({
path,
operation,
- errorClass: classifyFileChangeError(failureMessage),
- errorMessage: failureMessage,
+ errorClass: classifyFileChangeError(outcomeFailureMessage),
+ errorMessage: outcomeFailureMessage,
});
} catch {
// See preToolUse — same defensive swallow.You can send follow-ups to the cloud agent here.
…ureMessage call Applied via @cursor push command
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed:
extractToolFailureMessagefalse positive onis_error: falsewitherrorfield- Added
obj.is_error !== falseguard to the error-string fallback branch so that whenis_erroris explicitlyfalse, theerrorstring is ignored and the tool result is not misclassified as a failure.
- Added
Or push these changes by commenting:
@cursor push 5318fdd275
Preview (5318fdd275)
diff --git a/src/lib/inner-lifecycle.ts b/src/lib/inner-lifecycle.ts
--- a/src/lib/inner-lifecycle.ts
+++ b/src/lib/inner-lifecycle.ts
@@ -154,8 +154,8 @@
typeof input.tool_response !== 'undefined'
? input.tool_response
: typeof input.tool_result !== 'undefined'
- ? input.tool_result
- : null;
+ ? input.tool_result
+ : null;
if (result === null || result === undefined) {
return { content: null, exitCode: undefined };
}
@@ -224,13 +224,18 @@
typeof input.tool_response !== 'undefined'
? input.tool_response
: typeof input.tool_result !== 'undefined'
- ? input.tool_result
- : null;
+ ? input.tool_result
+ : null;
if (result === null || result === undefined) return null;
if (typeof result === 'object') {
const obj = result as Record<string, unknown>;
// Newer SDK shape: `{ is_error: true, error: 'msg' }` or `{ error: 'msg' }`.
- if (obj.is_error === true || (obj.error && typeof obj.error === 'string')) {
+ // When `is_error` is explicitly `false`, the boolean flag is authoritative
+ // and the `error` string (possibly an informational warning) is ignored.
+ if (
+ obj.is_error === true ||
+ (obj.is_error !== false && obj.error && typeof obj.error === 'string')
+ ) {
const msg = typeof obj.error === 'string' ? obj.error : 'tool error';
return msg;
}
@@ -334,14 +339,14 @@
typeof input.tool_name === 'string'
? input.tool_name
: typeof input.toolName === 'string'
- ? input.toolName
- : 'unknown';
+ ? input.toolName
+ : 'unknown';
const toolInput =
typeof input.tool_input !== 'undefined'
? input.tool_input
: typeof input.toolInput !== 'undefined'
- ? input.toolInput
- : null;
+ ? input.toolInput
+ : null;
const summary = summarizeToolInput(toolName, toolInput);
// Capture the SDK's `tool_use_id` so the PostToolUse `tool_response`
// envelope can correlate with this `tool_call`. Stored in a closure-
@@ -411,8 +416,8 @@
typeof obj.file_path === 'string'
? obj.file_path
: typeof obj.path === 'string'
- ? obj.path
- : null;
+ ? obj.path
+ : null;
if (path) {
try {
getUI().recordFileChangePlanned({ path, operation });
@@ -454,8 +459,8 @@
typeof input.tool_name === 'string'
? input.tool_name
: typeof input.toolName === 'string'
- ? input.toolName
- : 'unknown';
+ ? input.toolName
+ : 'unknown';
// Record the run-level tool-call outcome for ALL tools (not just
// write tools) so `tool_call_summary` carries an accurate
@@ -524,8 +529,8 @@
typeof input.tool_input !== 'undefined'
? input.tool_input
: typeof input.toolInput !== 'undefined'
- ? input.toolInput
- : null;
+ ? input.toolInput
+ : null;
const responseSummary = summarizeToolInput(
toolName,
toolInputForSummary,
@@ -556,8 +561,8 @@
typeof input.tool_input !== 'undefined'
? input.tool_input
: typeof input.toolInput !== 'undefined'
- ? input.toolInput
- : null;
+ ? input.toolInput
+ : null;
const obj =
toolInput && typeof toolInput === 'object'
? (toolInput as Record<string, unknown>)
@@ -566,8 +571,8 @@
typeof obj.file_path === 'string'
? obj.file_path
: typeof obj.path === 'string'
- ? obj.path
- : null;
+ ? obj.path
+ : null;
if (path) {
// v2 protocol: gate on tool_result outcome. If the write tool
// surfaced an error, emit `file_change_failed` and SKIPYou can send follow-ups to the cloud agent here.
Reviewed by Cursor Bugbot for commit 3b9469b. Configure here.
…s false Applied via @cursor push command


Summary
Stacks on #723. Audit signal: 3-of-5 agent-mode subagents flagged this as the biggest parent-agent UX cliff in the wire today.
Today the wizard emits
tool_callat PreToolUse but nothing at PostToolUse for non-write tools. Orchestrators (Claude Code, Cursor, Codex) see "Bash: pnpm install" and then radio silence until the nexttool_call. Claude Code can render the action but can't render the outcome, duration, or output preview. The accumulatorrecordToolOutcome()already runs atagent-ui.ts:1001but emits nothing — it's only consumed by the eventualtool_call_summaryrollup.This PR closes the gap with two changes:
1.
tool_responseenvelope (data_version: 1)Emitted at PostToolUse for every tool the inner agent calls. Pairs with the preceding
tool_callvia the SDK-providedtool_use_idcorrelation field.outcome:'success' | 'error' | 'denied'durationMs: prefers the SDK'sduration_ms; falls back to atool_use_id-keyed PreToolUse wall-clock delta (bounded map, max 256 entries)exitCode: Bash onlycontentHead: first 1024 bytes of stdout/stderr/output, thenredactStringerrorMessage: 256 bytes, sanitizedsummary: 120 chars (mirrorstool_call.summary)isError: derived boolean for orchestrators that don't enumerate the outcome enumEVENT_DATA_VERSIONS.tool_responseneighbours as exported constants — the regression suite reads them directly so a future bump only needs one edit2.
tool_call.idcorrelation field (tool_callbumped todata_version: 2)Mirrors the SDK's
tool_use_id. v1 readers continue to work (field is optional in the type signature); strict-correlation consumers branch ondata_version >= 2.Sample envelopes
Success path (Bash):
{"v":1,"@timestamp":"2026-05-11T07:50:00.123Z","type":"progress","message":"tool_response: Bash -> success (842ms)","data":{"event":"tool_response","tool":"Bash","id":"toolu_01abc","outcome":"success","durationMs":842,"exitCode":0,"contentHead":"+ pnpm install\n\nLockfile is up to date, resolution step is skipped\nProgress: resolved 1234, reused 1234, downloaded 0, added 0, done","isError":false,"summary":"pnpm install"},"data_version":1}Error path (Edit):
{"v":1,"@timestamp":"2026-05-11T07:50:01.456Z","type":"progress","message":"tool_response: Edit -> error (15ms)","data":{"event":"tool_response","tool":"Edit","id":"toolu_01def","outcome":"error","durationMs":15,"contentHead":"File has not been read yet. Read it first before writing to it.","isError":true,"errorMessage":"File has not been read yet. Read it first before writing to it.","summary":"edit src/foo.ts"},"data_version":1}Safety
redactStringapplied tocontentHeadanderrorMessageso env-value secrets (e.g.WIZARD_OAUTH_TOKEN,Bearer ...) can't leaktruncateToBytesis UTF-8-safe — won't split multi-byte codepointstool_use_id-> start-time map is capped at 256 entries (FIFO eviction) so a pathological SDK input can't leak foreverTest plan
pnpm exec tsc --noEmit— cleanpnpm exec eslint src/ --cache --max-warnings=0— only the pre-existingEventPlanFullScreen.test.tsx:9warning (known on main)pnpm exec prettier --check '{lib,src,test}/**/*.ts'— cleanpnpm exec vitest run --pool=forks --maxWorkers=1— 3974 / 3974 tests pass across 262 test filessrc/ui/__tests__/agent-ui-tool-response.test.tscovering: envelope shape,data_versionstamping,tool_call.id↔tool_response.idcorrelation invariant, failure path with truncatederrorMessage, success path with truncated + redactedcontentHead,deniedoutcome, duration monotonicity, schema rejection, no-op onLoggingUI, and pure helpers (truncateToBytesUTF-8 boundary,extractToolUseIdsnake_case + camelCase,extractToolContentHeadfor string /{stdout,stderr,exitCode}/content[]shapes)agent-ui.test.ts > stamps data_version on tool_callto read from the registry so future bumps don't churn this assertion🤖 Generated with Claude Code
Note
Medium Risk
Medium risk because it expands the NDJSON wire protocol (new events and
data_versionbumps) and changes inner-agent lifecycle emission logic, which could affect orchestrator parsers and audit trails if consumers assume older shapes.Overview
Improves agent-mode NDJSON observability by adding a
tool_responseenvelope emitted atPostToolUsefor every tool call, correlated totool_callvia a new optionaltool_call.id(SDKtool_use_id), and by gating write-tool audit events so failures emitfile_change_failedand no longer incorrectly emitfile_change_applied.Expands the protocol with additional orchestrator-facing signals:
tool_call_summaryrollups,attempt_startedretry boundaries,stall_statuscoaching tiers,cold_start_breakdownphase timings,progress_estimatefor multi-item operations,mcp_statuslifecycle transitions,current_filerollups, andrun_resumedon checkpoint restore.Upgrades
needs_inputtodata_version: 3by replacing the old EnglishresponseSchemamap with a JSON Schema 2020-12 fragment (including a sharedappIdResponseSchemafactory), addsdecisionIdcorrelation support (including manualprojects listagent JSON output), and introduces awaiting_for_usertype-only alias forneeds_input. Extensive new regression tests pin the new wire shapes and versioning.Reviewed by Cursor Bugbot for commit 7b6ca71. Bugbot is set up for automated code reviews on this repo. Configure here.