feat(agent-mode): tool_response envelope + tool_call.id correlation by kelsonpw · Pull Request #730 · amplitude/wizard

kelsonpw · 2026-05-11T14:49:50Z

Summary

Stacks on #723. Audit signal: 3-of-5 agent-mode subagents flagged this as the biggest parent-agent UX cliff in the wire today.

Today the wizard emits tool_call at PreToolUse but nothing at PostToolUse for non-write tools. Orchestrators (Claude Code, Cursor, Codex) see "Bash: pnpm install" and then radio silence until the next tool_call. Claude Code can render the action but can't render the outcome, duration, or output preview. The accumulator recordToolOutcome() already runs at agent-ui.ts:1001 but emits nothing — it's only consumed by the eventual tool_call_summary rollup.

This PR closes the gap with two changes:

1. `tool_response` envelope (`data_version: 1`)

Emitted at PostToolUse for every tool the inner agent calls. Pairs with the preceding tool_call via the SDK-provided tool_use_id correlation field.

outcome: 'success' | 'error' | 'denied'
durationMs: prefers the SDK's duration_ms; falls back to a tool_use_id-keyed PreToolUse wall-clock delta (bounded map, max 256 entries)
exitCode: Bash only
contentHead: first 1024 bytes of stdout/stderr/output, then redactString
errorMessage: 256 bytes, sanitized
summary: 120 chars (mirrors tool_call.summary)
isError: derived boolean for orchestrators that don't enumerate the outcome enum
All free-form text caps live in EVENT_DATA_VERSIONS.tool_response neighbours as exported constants — the regression suite reads them directly so a future bump only needs one edit

2. `tool_call.id` correlation field (`tool_call` bumped to `data_version: 2`)

Mirrors the SDK's tool_use_id. v1 readers continue to work (field is optional in the type signature); strict-correlation consumers branch on data_version >= 2.

Sample envelopes

Success path (Bash):

{"v":1,"@timestamp":"2026-05-11T07:50:00.123Z","type":"progress","message":"tool_response: Bash -> success (842ms)","data":{"event":"tool_response","tool":"Bash","id":"toolu_01abc","outcome":"success","durationMs":842,"exitCode":0,"contentHead":"+ pnpm install\n\nLockfile is up to date, resolution step is skipped\nProgress: resolved 1234, reused 1234, downloaded 0, added 0, done","isError":false,"summary":"pnpm install"},"data_version":1}

Error path (Edit):

{"v":1,"@timestamp":"2026-05-11T07:50:01.456Z","type":"progress","message":"tool_response: Edit -> error (15ms)","data":{"event":"tool_response","tool":"Edit","id":"toolu_01def","outcome":"error","durationMs":15,"contentHead":"File has not been read yet. Read it first before writing to it.","isError":true,"errorMessage":"File has not been read yet. Read it first before writing to it.","summary":"edit src/foo.ts"},"data_version":1}

Safety

PostToolUse emission is wrapped in try/catch — observation must NEVER block tool execution
Zod schema rejects malformed payloads at the boundary (logged to disk, not stdout)
redactString applied to contentHead and errorMessage so env-value secrets (e.g. WIZARD_OAUTH_TOKEN, Bearer ...) can't leak
truncateToBytes is UTF-8-safe — won't split multi-byte codepoints
Closure-local tool_use_id -> start-time map is capped at 256 entries (FIFO eviction) so a pathological SDK input can't leak forever

Test plan

pnpm exec tsc --noEmit — clean
pnpm exec eslint src/ --cache --max-warnings=0 — only the pre-existing EventPlanFullScreen.test.tsx:9 warning (known on main)
pnpm exec prettier --check '{lib,src,test}/**/*.ts' — clean
pnpm exec vitest run --pool=forks --maxWorkers=1 — 3974 / 3974 tests pass across 262 test files
26 new tests in src/ui/__tests__/agent-ui-tool-response.test.ts covering: envelope shape, data_version stamping, tool_call.id ↔ tool_response.id correlation invariant, failure path with truncated errorMessage, success path with truncated + redacted contentHead, denied outcome, duration monotonicity, schema rejection, no-op on LoggingUI, and pure helpers (truncateToBytes UTF-8 boundary, extractToolUseId snake_case + camelCase, extractToolContentHead for string / {stdout,stderr,exitCode} / content[] shapes)
Updated existing agent-ui.test.ts > stamps data_version on tool_call to read from the registry so future bumps don't churn this assertion

🤖 Generated with Claude Code

Note

Medium Risk
Medium risk because it expands the NDJSON wire protocol (new events and data_version bumps) and changes inner-agent lifecycle emission logic, which could affect orchestrator parsers and audit trails if consumers assume older shapes.

Overview
Improves agent-mode NDJSON observability by adding a tool_response envelope emitted at PostToolUse for every tool call, correlated to tool_call via a new optional tool_call.id (SDK tool_use_id), and by gating write-tool audit events so failures emit file_change_failed and no longer incorrectly emit file_change_applied.

Expands the protocol with additional orchestrator-facing signals: tool_call_summary rollups, attempt_started retry boundaries, stall_status coaching tiers, cold_start_breakdown phase timings, progress_estimate for multi-item operations, mcp_status lifecycle transitions, current_file rollups, and run_resumed on checkpoint restore.

Upgrades needs_input to data_version: 3 by replacing the old English responseSchema map with a JSON Schema 2020-12 fragment (including a shared appIdResponseSchema factory), adds decisionId correlation support (including manual projects list agent JSON output), and introduces a waiting_for_user type-only alias for needs_input. Extensive new regression tests pin the new wire shapes and versioning.

^{Reviewed by Cursor Bugbot for commit 7b6ca71. Bugbot is set up for automated code reviews on this repo. Configure here.}

Today the SIGINT handler only registers in the TUI branch — agent + CI runs get hard-killed by Node's default behaviour, so an orchestrator parsing the NDJSON stream sees an abrupt EOF instead of a clean `run_completed: cancelled` envelope. No checkpoint is saved on the way out, either. Extract the handler into `installAbortSignalHandler` (shared helper in `src/lib/graceful-exit.ts`) and wire it into the agent + CI branches in `src/commands/default.ts` immediately after the session is built. The TUI keeps its inline registration because it needs direct access to the Ink store's `setCommandFeedback`. `WizardAbortOptions` gains a `reason` field so SIGINT (and downstream PRs: LOCK_HELD, AUTH_REQUIRED midRun) can pass a stable machine-readable code to the terminal `run_completed` envelope, instead of relying on the sanitized human-readable `message`.

The 2s graceful-exit timer in `performGracefulExit` called `process.exit(130)` with no terminal NDJSON event — a parent agent parsing the stream saw an abrupt EOF that's indistinguishable from a crash. Add an `emitRunCompleted({ outcome: 'cancelled', exitCode: 130, reason: 'sigint' })` call before the timer fires so AgentUI lands the terminal envelope on stdout (InkUI / LoggingUI no-op as before). Regression test in `graceful-exit.test.ts` spies on the UI and asserts the call lands before the 2s timer.

When a long agent run hits an LLM-gateway 401 (or an Amplitude OAuth expiry) the wizard previously aborted with a generic `error` envelope plus a TUI-only OutroScreen prompt. Agent-mode orchestrators saw a plain error and had to guess at whether files were preserved or what command to re-run. Wire the AUTH_ERROR path in `agent-runner.ts` to call `emitAuthRequired` BEFORE `wizardAbort` with the structured payload: mid-run discriminator, `preserveFiles: true`, `partialProgress` flags from `classifyAgentOutcome`, and the canonical login + resume argv. Extends the auth_required wire shape with `midRun`, `preserveFiles`, `partialProgress`, `authSubkind`, plus two new reason discriminators (`amplitude_token_expired`, `gateway_token_expired`). Bumps `EVENT_DATA_VERSIONS.auth_required` from 1 -> 2; existing v1 callers keep working since every new field is optional. `emitAuthRequired` graduates to the WizardUI interface (optional) so business logic can call it through `getUI()` without an AgentUI cast. InkUI / LoggingUI no-op as designed.

After AUTH_RETRY_LIMIT consecutive 401-flavoured api_retry messages the wizard short-circuits the SDK's ~3-minute retry storm and aborts. Today that boundary fires `controller.abort('auth_failed')` silently — orchestrators only see the downstream AUTH_ERROR routing and can't distinguish "single 401, transient" from "we tried twice, stuck". Add `lifecycle: auth_retry_exhausted` event with `attempts` + `subkind` discriminator (`amplitude` / `llm-gateway`). Wired into the SDK retry-loop boundary in `agent-interface.ts` immediately before `controller.abort('auth_failed')`. AgentUI implements; InkUI / LoggingUI no-op (the TUI's AUTH_ERROR outro covers the human path). New WizardUI optional method `emitAuthRetryExhausted`. Registered in `EVENT_DATA_VERSIONS` at v1. Extended the existing AUTH_RETRY_LIMIT test in `agent-interface.test.ts` to assert the emit fires exactly once with the expected payload.

…emit Two fixes that together harden the NDJSON wire contract: 1. `data.event` discriminator on anonymous error envelopes. Previously `AgentUI.setRunError` emitted `data: { name, recoverable, suggestedAction }` with no `event` field — every other event family uses `data.event` as the discriminator. Orchestrators filtering on `data.event` saw nothing for run-aborting errors. Add `event: 'run_error'` and register `run_error: 1` in `EVENT_DATA_VERSIONS` so the envelope ships with `data_version`. 2. Zod-validate every envelope at the emit boundary. A new `validateEnvelopeOrLog` runs each event through a Zod schema (v=1, ISO timestamp, valid `type`, string `message`) plus a coherence check that asserts `data_version === EVENT_DATA_VERSIONS[event]` when the discriminator is set. Failures route to `logToFile` — never throw, never stderr (Ink owns stdout in TUI mode). Regression tests in `agent-ui.test.ts`: a property-style check that every documented emit method produces a wire-valid envelope, plus a focused test pinning the `event=run_error` shape.

Every AgentErrorType branch in agent-runner.ts emitted a generic `error` envelope with no machine-readable code, then routed to wizardAbort with a human-readable message. Orchestrators couldn't distinguish "retry your launch" (GATEWAY_DOWN 5xx blip) from "upgrade wizard" (GATEWAY_INVALID_REQUEST schema rejection) from "back off" (RATE_LIMIT) without parsing message strings. Adds a new optional `emitRunError(data)` method on WizardUI with a typed `code` discriminator covering GATEWAY_DOWN / GATEWAY_INVALID_REQUEST / RATE_LIMIT / API_ERROR / MCP_MISSING / RESOURCE_MISSING plus an optional `mcpServer` field (wizard-tools vs amplitude-wizard) and a `recoverable` hint. AgentUI implements; InkUI / LoggingUI no-op. Wired into every AgentErrorType branch in agent-runner.ts immediately before the corresponding wizardAbort. The envelope reuses the `run_error` discriminator + data_version=1 registered in the previous commit, so consumers branching on `data.event` + `data.code` get a stable contract. Regression tests in `agent-ui.test.ts` parameterize over every code and assert the emitted envelope shape.

`wizard apply` exited with `ExitCode.INVALID_ARGS=2` whenever the per-project apply-lock (`acquireApplyLock`) detected an in-flight holder — an orchestrator routinely interprets exit 2 as "bad flags" and doesn't retry. Lock contention is automatically retryable. Add `ExitCode.LOCK_HELD=14` and route the apply-lock collision path in `src/commands/apply.ts` to it. Emit a terminal `run_completed: { outcome: 'error', exitCode: 14, reason: 'lock_held' }` envelope before `process.exit` so an orchestrator parsing the NDJSON stream sees the canonical signal instead of an abrupt EOF. Regression test in `src/commands/__tests__/apply-lock-held.test.ts` pins the numeric code at 14 (orchestrator contract) and asserts the holder shape that the handler ships in the `apply_refused` envelope.

Today an orchestrator parsing the NDJSON stream has to infer coarse progress from the firehose of tool_call / status / progress events. Adds a dedicated `lifecycle: run_phase` event that transits five fixed states in order: cold_start -> agent_running -> finalizing -> completed | error. Wiring (each is best-effort, never blocks the run): - cold_start: first cold-start activity in agent-runner.ts - agent_running: first PreToolUse in inner-lifecycle.ts (the emitter dedups so subsequent tool calls don't spam the stream) - finalizing: just before seedPostAgentSteps in agent-runner.ts - completed: in wizardSuccessExit before run_completed - error: in wizardAbort before run_completed New optional `emitRunPhase` on WizardUI. AgentUI implements with deduplication; InkUI / LoggingUI no-op (their UX already implies phase via the journey stepper / log line cadence). Registered `run_phase: 1` in `EVENT_DATA_VERSIONS`. Regression tests in agent-ui.test.ts: per-phase envelope shape, dedup behaviour for repeated calls, and the wire-level ordering invariant.

The SIGINT handler routes through wizardAbort via a dynamic import. In production wizardAbort calls process.exit and never resolves, but vitest's strict-exit guard throws a synthetic exception when process.exit is called in a test — surfacing as an unhandled rejection on the test stream. Wrap the runner in try/catch so the rejection stops at the boundary; production behaviour is unchanged because the throw path doesn't fire there.

PR B subagent's final commit skipped a prettier pass on these five files. Pure whitespace / wrap reflows — no semantic change.

The TUI's "Discovered facts" chips (vertical, app type, framework, package manager, region) only rendered in Ink. Mirror them onto the agent-mode NDJSON stream as `progress: discovery_fact { id, label, value, discoveredAt }` so parent agents (Claude Code, Cursor, Codex) can render the same chips without parsing every status line. Hooks into the existing `publishDiscoveryFact` in agent-runner.ts — the abstract `WizardUI.pushDiscoveryFact` now reaches AgentUI as a typed emit instead of a no-op. Stable `id` lets orchestrators upsert chips on retry without flicker. Adds `discovery_fact: 1` to EVENT_DATA_VERSIONS and a regression test pinning envelope shape + version stamping.

Every Edit/Write tool call already emits its own `tool_call` event plus a `file_change_planned`. That's great for an audit log but noisy for parent agents that just want a "now editing X" header. Add `progress: current_file { path, relativePath, operation }` — the coarse rollup orchestrators subscribe to instead of parsing every fine-grained tool event. Debounced at the wire boundary (250ms per identical (path, op) tuple) so a tight edit chain collapses into one event. `relativePath` is computed against `installDir` so the wire ships a short label; absolute path preserved for audit. Wired from the existing inner-lifecycle PreToolUse hook; AgentUI debounces, InkUI / LoggingUI no-op (optional WizardUI method). Adds `current_file: 1` to EVENT_DATA_VERSIONS and regression tests covering envelope shape, version stamping, and the (path, op) debounce / non-debounce paths.

The wizard already detects stalls (no SDK traffic for N seconds) and shows TUI hints. Mirror that escalation onto NDJSON so parent agents can render the same coaching cadence. `progress: stall_status { tier, durationMs, lastActivity, hint? }` fires at 10s / 30s / 60s of silence, mapped to `noticed` / `concerning` / `critical`. Per-tier dedup at the AgentUI boundary (`resetStallStatus()` clears the gate when activity resumes), so each tier hits the wire at most once per stall window. Wired from the existing 10s heartbeat tick in agent-interface — the same loop that emits `heartbeat` now also derives the silence-window tier via `deriveStallTier()` (pure helper, regression-tested). An `onActivity()` shim resets `lastActivityAt` from `onStatus` / non-`stream_event` SDK messages, so a long tool-call interlude isn't spuriously stamped as "concerning". InkUI / LoggingUI no-op via the optional WizardUI methods. Adds `stall_status: 1` to EVENT_DATA_VERSIONS and tests covering envelope shape, tier dedup, reset semantics, and the pure helper's threshold boundaries.

When the wizard restarts from a checkpoint (post-crash, post-SIGINT, post-token-expiry), the existing `checkpoint_loaded` event carries only the file age — useful for policy gates but not for UX. Add `lifecycle: run_resumed { from_checkpoint_at, last_phase, restored_state_summary }` as the orchestrator-facing companion so parent agents can render "continuing from where we left off" instead of "fresh run". Wired from `loadCheckpoint()` immediately after `emitCheckpointLoaded`. The summary is composed from the typed checkpoint fields (region, org, project, env, framework, intro flag) — pre-redacted, no free-form user input. `lastPhase` defaults to `'unknown'` because the checkpoint deliberately drops `runPhase` from its restored set (so it re-evaluates on resume). InkUI / LoggingUI no-op via the optional WizardUI method. Adds `run_resumed: 1` to EVENT_DATA_VERSIONS and tests covering envelope shape, the 'unknown' last_phase fallback, and version stamping.

Previously the inner-lifecycle PostToolUse hook unconditionally emitted `file_change_applied` for any Edit/Write/MultiEdit even when the tool itself reported `is_error: true`. That falsely advertised the write as successful on the orchestrator's audit trail. Add `error: file_change_failed { path, operation, errorClass, errorMessage }`. Gated on the `tool_response.is_error` / `tool_response.error` shape via the new `extractToolFailureMessage()` helper — when a failure is detected, AgentUI emits the typed error event and the success-side `recordFileChangeApplied` + ledger post-write are skipped (the pre-write ledger entry stays so a cancelled run can still roll back). `errorClass` discriminates the common failure modes: - `permission` — EACCES / "permission denied" / write_refused - `not_found` — ENOENT / "no such file" - `syntax` — Edit/MultiEdit string-match failures - `generic` — fallback InkUI / LoggingUI no-op via the optional WizardUI method. Adds `file_change_failed: 1` to EVENT_DATA_VERSIONS, regression tests for the wire shape, the classifier's match precedence (syntax wins over not_found for "String to replace not found"), and the inner-lifecycle gate ensuring failed writes don't double-emit as applied.

…se content Final CI-gate cleanup for the v2 protocol stack: - prettier-formatted the v2 event interfaces, AgentUI emitters, and the regression test file - narrowed `obj.content[0]` to `unknown` in `extractToolFailureMessage` to silence `@typescript-eslint/no-unsafe-assignment`; runtime checks unchanged (we already gate on `typeof first === 'object'` before any property read)

…t/decision_auto correlation Introduce a process-local counter that mints stable `dec_<NNN>` ids and extend the `needs_input` + `decision_auto` data shapes with an optional `decisionId` field. Bumps both events to `data_version: 2` so consumers that want strict request/response correlation can branch on `data_version >= 2`; v1 readers continue to work because the field is optional at the consumer end. The generator (`nextDecisionId`) uses a monotonic counter rather than a UUID so a transcript replayed offline reads the SAME ids it read live — critical for deterministic log analysis. Wraps at 999 isn't a concern for single wizard runs (the wizard never asks hundreds of questions). Test-only `__resetDecisionIdCounterForTests` keeps regression tests deterministic across files without exposing counter mutation to production code. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… and decision_auto pair Wires the new correlation id through the four `needs_input` callsites in `AgentUI`: - `promptConfirm` → code='confirm', recommended='yes' - `promptChoice` → code='choice', recommended=options[0] - `promptEventPlan` → code='event_plan', recommended='approved' - env-picker auto-fallback → code='environment_selection' `emitNeedsInput` now returns the freshly-minted `decisionId`; each caller captures it locally and threads it into the matching `emitDecisionAuto` call. The env-picker previously never emitted a `decision_auto` for its auto-select fallback path — adding one here closes the request/response correlation loop for that prompt too. The env-var-resolved `promptEventPlan` short-circuit (where `AMPLITUDE_WIZARD_EVENT_PLAN_DECISION` answers the prompt without ever emitting `needs_input`) is intentionally unchanged — there's no request envelope to correlate against, which is the documented contract: the ABSENCE of `needs_input + decision_auto` for the same code signals "resolved by user-driven flag." The stdin-resolved env-picker path is also unchanged: an orchestrator that wrote the answer to stdin already knows which request it answered, and the wire today has no `decision_human` envelope to annotate (would be a new envelope, which this PR is explicitly scoped to avoid). Regression coverage in `agent-ui-decision-id.test.ts`: - Generator emits `dec_NNN` zero-padded, monotonic across calls - `promptConfirm` and `promptChoice` request/response pairs share id - Two back-to-back `confirm` prompts mint DISTINCT ids — the bug this PR exists to fix (orchestrators previously relied on timing + `code` heuristics, which fails for same-code prompts in series) - `emitNeedsInput` returns the minted id and honors an explicit override (forward-compat for paged emitters that want stable ids across pages) - Both events stamp `data_version: 2` Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…or needs_input PR B2 deferred reconciling the two names ("needs_input" on the wire, "waiting for user" in human-facing copy and early design notes) because `NeedsInputData` already provided the typed schema. The deferral was correct — there is no separate wire event to ship — but orchestrators reading protocol docs still see both spellings and have no way to import the one that matches their mental model. This commit adds three type-identical re-exports in `agent-events.ts`: - `WaitingForUserData<V>` → `NeedsInputData<V>` - `WaitingForUserWireData<V>` → `NeedsInputWireData<V>` - `WaitingForUserEvent<V>` → `NeedsInputEvent<V>` Pure type aliases — no new emit path, no `EVENT_DATA_VERSIONS` entry (documented as a load-bearing absence — adding one would require a separate emitter + schema + tests). The wire-format `event` discriminator remains `'needs_input'` so existing subscribers keep working. Regression test pins both directions of the assignability proof, the wire-event discriminator invariant, and the absence of a `EVENT_DATA_VERSIONS.waiting_for_user` registry entry. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…patterns PR B2 shipped `file_change_failed` with four errorClass values (`permission` / `not_found` / `syntax` / `generic`). Orchestrators parsing the stream couldn't distinguish a transient timeout from a permanent failure — both fell into `generic` and burned retry budget. This commit: 1. Adds `'timeout'` to the `FileChangeErrorClass` union, documented as the transient-by-definition variant. Retry-aware orchestrators can re-issue the write without changing input. 2. Extends `classifyFileChangeError` patterns: - `permission`: + EPERM, + read-only file system / EROFS - `not_found`: + "does not exist" - `syntax`: + "did not match", + "unexpected token", + "invalid JSON" - `timeout`: NEW — ETIMEDOUT, "timed out", "timeout", "deadline exceeded" 3. Pins the classifier ordering so `timeout` wins over `not_found` when both signals appear in the message (the SDK occasionally wraps an ETIMEDOUT with secondary "not found in cache" text). Wire-shape change: the `FileChangeErrorClass` union grew. Orchestrators already keyed off the literal string — adding a new variant is a breaking enum change in principle, but the existing `data_version: 1` on `file_change_failed` doesn't bump because we are ADDING, not RENAMING. New consumers branch on the wider set; old consumers fall through their default arm. The schema doc was updated to flag the "adding a variant is still a data_version bump in principle" rule. `emitFileChangeFailed` now consumes the exported `FileChangeErrorClass` type rather than an inline literal — a future widening lives in one place. `wizard-ui.ts` mirrors the wider union on the optional method. Regression tests cover all new patterns + the timeout-over-not_found ordering guard. Existing tests continue to pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PR B (#714) shipped `transient_retry`, the "wizard decided to retry, sleeping Ns" decision envelope. Orchestrators parsing the stream could see retries decided, but had no way to tell when the retry attempt actually BEGAN — the backoff sleep (up to 30s) and proactive bearer refresh both fell into a silent gap. A consumer rendering "retrying… (attempt 2/4)" had to guess at the boundary from timing. This commit adds `attempt_started`, emitted at the top of each outer-loop iteration in `agent-interface.ts`, AFTER the backoff sleep has elapsed, AFTER the proactive bearer refresh has run, but BEFORE the inner SDK query fires. Paired with `transient_retry`, orchestrators see: transient_retry { attempt: 2, nextRetryInMs: 2500, reason: 'stall' } ...2.5s of silence... attempt_started { attemptNumber: 2, backoffMs: 2500, reason: 'stall_retry' } ...inner work... Schema additions in `agent-events.ts`: - `AttemptStartedReason` — `'cold_start' | 'stall_retry' | 'auth_refresh' | 'network_retry'` - `AttemptStartedData` — `{ attemptNumber, totalBudget, reason, backoffMs? }` - `EVENT_DATA_VERSIONS.attempt_started: 1` - Added to `InnerAgentLifecycleData` union Wire-up in `agent-interface.ts`: - Stash `attemptStartedReason` + `attemptStartedBackoffMs` at the retry decision points (already-existing `lastRetryReason` mapping). - Track `bearerRotatedThisAttempt` in the `refreshGatewayBearer` callback so an attempt starting with a freshly-rotated token reports `'auth_refresh'` instead of the underlying stall/network reason. - Emit AFTER the bearer refresh but BEFORE the SDK query so the envelope fires when work actually begins. - Cold-start attempt fires with `reason: 'cold_start'` and OMITS `backoffMs` on the wire (orchestrators read absence as "no preceding sleep"). `emitAttemptStarted` is wired as an optional method on `WizardUI` — InkUI / LoggingUI no-op (their existing retry banner / log line already covers their UX), so orchestrators are the only consumers. Regression coverage: 7 new tests on `agent-ui-v2-events.test.ts` covering the envelope shape, the data_version stamp, all four reason values, the cold-start no-backoffMs invariant, the explicit-0 round-trip, and the non-AgentUI no-op contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…on boundaries Long-running multi-item operations (post-agent step queue, multi-editor MCP install, event-plan write) emit a stream of fine-grained per-item events (`post_agent_step`, `tool_call`). Orchestrators that want a single progress bar have to reconstruct the rollup from those events — track the seeded total, count completions, compute the percent themselves. The same logic gets repeated in every consumer. This commit adds `progress_estimate` — the canonical `(stage, current, total, percent)` rollup. Orchestrators subscribe to this single event and ignore the fine-grained stream when all they want is a progress bar. Schema additions in `agent-events.ts`: - `ProgressEstimateData` — `{ event, stage, current, total, percent }` - `EVENT_DATA_VERSIONS.progress_estimate: 1` - `buildProgressEstimate(stage, current, total)` — pure helper that clamps current to `[0, total]`, floors fractional values, and returns null for zero-item operations (no `progress_estimate` on the wire when there's no work to do). Wire-up: - `AgentUI.emitProgressEstimate` — new emitter, delegates to `buildProgressEstimate` for the clamp/floor/null logic. - `AgentUI.setPostAgentStep` — emit at terminal-state transitions (`completed` / `skipped`) only. `in_progress` / `pending` don't bump so a bouncing step can't lower the percent. Set-based dedup guarantees the count is monotone across duplicate completions. - `AgentUI.seedPostAgentSteps` — track ids + done-set so the rollup can compute total locally. A second seed call resets the tracking so the new queue doesn't inherit the old queue's progress. - `addMCPServer` (mcp-install step) — emit one rollup per editor after each successful `addServer`. Failures don't bump (orchestrators infer "stuck at X/N" from absence + error envelope). - `WizardUI.emitProgressEstimate?` — optional. InkUI / LoggingUI no-op (their existing journey stepper / panel covers the UX). Stages emitted today: - `'post_agent_steps'` — post-agent queue advance - `'mcp_install'` — multi-editor MCP install loop Additional stages (e.g. `'event_plan_write'`) land as new long-running operations get wired. Regression coverage: 13 new tests on `agent-ui-v2-events.test.ts` covering the envelope shape, the data_version stamp, terminal-state-only transitions, idempotent re-emission, queue reset on re-seed, the zero-total no-op, current clamping in both directions, and the pure helper's full behaviour matrix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Today the wizard emits a coarse `run_phase: cold_start` lifecycle event when cold start begins. Parent agents driving the spinner know the wizard IS cold-starting but have zero visibility into WHICH phase consumed the time. Cold start is 5-30s of perceived silence on the spinner, and the lion's share is one of a handful of identifiable phases (skill staging, package-manager probing, framework preflight, MCP bootstrap, gateway probe). This commit adds `cold_start_breakdown` — a `{phase, startedAt, finishedAt, durationMs}` envelope emitted at the END of each of the five cold-start phase boundaries. Parent agents subscribe and can: - render which phase is CURRENTLY active during the spinner (update UI on each fire), - surface "your cold start is slow because phase X took Ys" diagnostics on a hung run, - aggregate per-phase timings across runs for performance tracking without re-parsing log lines. Critical: each phase is wrapped in a `try/finally` boundary so the breakdown event always ships, even when the phase throws — orchestrators see a timing breadcrumb for the failing phase (paired with the subsequent thrown error envelope) instead of a silent gap on the wire. Schema additions in `agent-events.ts`: - `ColdStartPhase` — the five canonical phase ids - `ColdStartBreakdownData` — the wire shape - `EVENT_DATA_VERSIONS.cold_start_breakdown: 1` - `buildColdStartBreakdown(phase, startedAt, finishedAt)` — pure helper that floors `durationMs` at 0 (guards a non-monotonic clock from shipping a negative duration) and bumps `finishedAt` up to `startedAt` so the invariant `finishedAt >= startedAt` holds on the wire. Wire-up: - `AgentUI.emitColdStartBreakdown` — new emitter, delegates to `buildColdStartBreakdown` for the clamp logic. - `WizardUI.emitColdStartBreakdown?` — optional. InkUI / LoggingUI no-op (the TUI already surfaces the labeled cold-start activity line; CI logs the per-phase step messages). - `agent-runner.ts` — wraps skill_staging, package_manager_detection, framework_detection in `try/finally`. - `agent-interface.ts` — wraps gateway_probe (`checkGatewayLiveness`) and mcp_bootstrap (`createWizardToolsServer`) in `try/finally`. Regression coverage: 11 new tests covering the envelope shape, the data_version stamp, per-phase emission for all five phases, the non-monotonic-clock floor, the pure helper's full behaviour matrix, the WizardUI optional-method contract, and the critical "emission survives a thrown phase" path. Total file: 64 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ries Today the wizard ships one fine-grained `tool_call` NDJSON envelope per PreToolUse (30-200 per run). Parent agents that want a "tool usage" summary at completion have to maintain their own running aggregate. The wizard already has all the data — it should emit the rollup. This PR adds a `tool_call_summary { totalCalls, byTool, byOutcome, durationMsTotal, durationMsAvg, topToolByCount? }` event emitted at two boundaries: 1. Phase finalize — before `run_phase: finalizing` so an orchestrator can render the inner-agent tool summary before the post-agent steps section appears. 2. Terminal exit — inside `wizardSuccessExit` / `wizardAbort` (after post-agent steps), so the orchestrator always sees a final cumulative rollup covering the whole run. Dedup-safety: the emitter no-ops when the payload signature is identical to the previous emission (so a duplicate boundary call doesn't double-emit). `totalCalls === 0` is suppressed at the wire — absence of the event means no tools were called. Implementation: - `ToolCallStats` pure accumulator in `agent-events.ts` (per-tool counts, per-outcome counts, FIFO-by-tool duration pairing). - `AgentUI.emitToolCall` increments the accumulator; new `AgentUI.recordToolOutcome` is called from the PostToolUse hook in `inner-lifecycle.ts` for ALL tools (not just write tools) so the outcome breakdown is faithful. - `WizardUI.emitToolCallSummary?(): void` — optional method; AgentUI implements, InkUI/LoggingUI no-op. - `data_version: 1` registered in `EVENT_DATA_VERSIONS`. - Emit callsites in `agent-runner.ts` (before finalize) and `wizard-abort.ts` (`wizardSuccessExit` + `wizardAbort`) wrapped in try/catch so the rollup can never block the phase transition or exit path. Regression tests cover the pure accumulator (counting, outcome math, duration FIFO pairing, non-monotonic clock floor, orphaned outcomes, top-tool tie-breaking), the wire envelope (shape, data_version, dedup guard, zero-call suppression), and the inner-lifecycle wiring (success and error outcomes flowing through PostToolUse). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…or_install Parent agents (Claude Code, Codex, Cursor) parsing the wizard's NDJSON stream had no visibility into MCP server state transitions: the in-process wizard-tools server bootstrapped silently during cold start, and the editor-install flow silently detected (or didn't) supported editors and silently installed (or skipped) the editor MCP config. mcp_status fills that gap with a { server, state, transition_ts, detail? } envelope emitted at every state boundary for both lifecycles. Wiring: wizard_tools server (in-process MCP that the inner agent calls): - available — emitted in agent-interface.ts after createWizardToolsServer() resolves successfully - failed — emitted in the same try/catch before the boot error rethrows, so the lifecycle event lands on the wire before any subsequent run_error envelope editor_install flow (wizard-mcp install into the user's editor): - not_applicable — no supported clients detected on this machine - install_skipped — CI mode short-circuit OR user declined the install prompt in McpScreen - needs_user_choice — multiple editors detected; awaiting the user's pick in McpScreen - installed — at least one client took the config - failed — write-time error (permissions, disk full) OR every client install failed Schema: EVENT_DATA_VERSIONS.mcp_status = 1. Adds MCPStatusData / MCPStatusServer / MCPStatusState type aliases to agent-events.ts; optional emitMcpStatus on WizardUI (AgentUI implements with Zod defense-in-depth on the payload, InkUI / LoggingUI no-op). Tests: agent-ui-mcp-status.test.ts pins the envelope shape, all seven (server, state) transitions, the Zod guard rejecting invalid literals, no-op behaviour on LoggingUI, and stream silence on non-MCP-related runs. mcp-status-wiring.test.ts covers the addMCPServerToClientsStep call sites that don't require host-filesystem probes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Today the wizard emits `tool_call` at PreToolUse but NOTHING at PostToolUse for non-write tools — orchestrators (Claude Code, Cursor, Codex) see "Bash: pnpm install" and then radio silence until the next `tool_call`. Three independent audit subagents flagged this as the biggest parent-agent UX cliff in the wire today. This PR closes the gap with: - `tool_response` envelope (data_version: 1) emitted at PostToolUse for EVERY tool the inner agent calls, carrying `outcome` (success/error/denied), `durationMs`, optional `exitCode` (Bash), a 1024-byte-bounded `contentHead`, `isError`, a 256-byte-bounded `errorMessage`, and a 120-char-bounded `summary`. - `tool_call.id` correlation field (mirrors the SDK's `tool_use_id`). Bumps `tool_call` to data_version: 2 — v1 readers continue to work because the field is optional; strict-correlation consumers branch on `data_version >= 2`. - All free-form text fields go through `redactString` at the emit boundary so a misbehaving Bash command can't leak env-value secrets. New `truncateToBytes` helper handles UTF-8-safe byte- bounded truncation (won't split multi-byte codepoints). - PostToolUse emission is wrapped in try/catch — observation must NEVER block tool execution. Stacks on PR #723 (feat/agent-mode-mcp-status). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

✅ Fixed: Orphaned JSDoc separated from its function by insertions
- Moved the JSDoc block for extractToolFailureMessage from above extractToolUseId to directly above extractToolFailureMessage where it belongs.

Or push these changes by commenting:

@cursor push 93b6b4782a

Preview (93b6b4782a)

diff --git a/src/lib/inner-lifecycle.ts b/src/lib/inner-lifecycle.ts
--- a/src/lib/inner-lifecycle.ts
+++ b/src/lib/inner-lifecycle.ts
@@ -102,14 +102,6 @@
 }
 
 /**
- * Inspect a PostToolUse hook input for a tool-result error. The SDK
- * surfaces failures via either `tool_response.is_error` /
- * `tool_response.error` (newer hook shape) or a stringified result
- * containing common error markers. Returns the sanitized message when
- * a failure is detected, null when the tool succeeded. Pure — no
- * I/O, no throws.
- */
-/**
  * Extract the SDK's `tool_use_id` from a hook input — the stable
  * correlation key that pairs a PreToolUse `tool_call` with its
  * matching PostToolUse `tool_response`. The SDK consistently surfaces
@@ -216,6 +208,14 @@
   return { content: null, exitCode: undefined };
 }
 
+/**
+ * Inspect a PostToolUse hook input for a tool-result error. The SDK
+ * surfaces failures via either `tool_response.is_error` /
+ * `tool_response.error` (newer hook shape) or a stringified result
+ * containing common error markers. Returns the sanitized message when
+ * a failure is detected, null when the tool succeeded. Pure — no
+ * I/O, no throws.
+ */
 export function extractToolFailureMessage(
   input: Record<string, unknown>,
 ): string | null {

_{You can send follow-ups to the cloud agent here.}

…response # Conflicts: # src/lib/__tests__/graceful-exit.test.ts # src/lib/agent-events.ts # src/lib/agent-runner.ts # src/lib/graceful-exit.ts # src/lib/inner-lifecycle.ts # src/ui/agent-ui.ts # src/utils/wizard-abort.ts

…hema 2020-12 fragments (#732) * feat(agent-mode): replace responseSchema English strings with JSON Schema 2020-12 fragments Audit #3 (Codex/non-Anthropic orchestrator POV) flagged the v2 needs_input.responseSchema shape as a vendor-portability blocker: responseSchema: { appId: 'string (required, from choices[].value)' } Non-Claude orchestrators (Codex, GPT-5, Mistral) couldn't programmatically validate stdin payloads against the English-in-JSON string — they had to run an LLM over the description to decide whether `{ appId: '769610' }` was a legal response. That violates the "perfectly suited for ... codex, anything that runs this against code bases via a parent agent" charter. Replaces the shape with a proper JSON Schema 2020-12 fragment so any orchestrator can load `ajv` / `jsonschema` and validate directly: responseSchema: { \$schema: 'https://json-schema.org/draft/2020-12/schema', type: 'object', properties: { appId: { type: 'string', pattern: '^\\\\d+$', description: 'Numeric Amplitude app ID — ...', }, }, required: ['appId'], } Changes: - Add ResponseSchemaFragment + JsonSchemaProperty exports to agent-events. - Update NeedsInputData / NeedsInputWireData to use the structured type. - Bump EVENT_DATA_VERSIONS.needs_input from 2 to 3 (BREAKING for any consumer that parsed the prior English strings — most orchestrators round-trip via resumeFlags and are unaffected). - Update emitter callsites in src/ui/agent-ui.ts (legacy `prompt` + structured `needs_input` for environment_selection) and src/commands/projects.ts (project_selection). - Use `pattern: '^\\d+\$'` rather than `enum` because both callsites set `allowManualEntry: true` — a closed enum would contradict the manualEntry contract that lets orchestrators submit an above-cap app-id. - New regression suite (agent-ui-response-schema.test.ts, 9 tests) pinning the wire shape, data_version: 3 stamp, validateEnvelopeOrLog round-trip, and an inline snapshot for the env-selection fragment. - Update agent-ui-decision-id.test.ts to match the new v3 stamp. Stacks on #730 (feat/agent-mode-tool-response). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(bugbot): stamp data_version on manual projects list needs_input envelope PR B12 bumped `EVENT_DATA_VERSIONS.needs_input` to 3 for the JSON Schema 2020-12 responseSchema shape. The `wizard projects list --agent` envelope at `commands/projects.ts:59` is manually constructed via `JSON.stringify` and writes directly to `process.stdout`, bypassing the `emit()` helper that auto-stamps `data_version` from the registry. Result: the new v3 envelope ships with no `data_version` field at all — orchestrators that branch on `data_version >= 3` for the new shape treat this envelope as pre-v3 and misinterpret the JSON Schema object as the old English-string format. Import `EVENT_DATA_VERSIONS` and read `.needs_input` at emit time so future bumps (v4, v5…) stay in sync without touching this file. * fix(bugbot): include decisionId on manual projects list needs_input envelope The previous fix stamped `data_version: EVENT_DATA_VERSIONS.needs_input` (currently 3) but omitted the `decisionId` field. `NeedsInputWireData` declares `decisionId: string` as required since v2 — orchestrators that branch on `data_version >= 2` expect `data.decisionId` for correlation with the matching `decision_auto` / `user_input_received` partner. Manual envelope construction bypasses `emitNeedsInput` (which mints the id internally), so mint via `nextDecisionId()` directly. * refactor(bugbot): extract appIdResponseSchema factory to prevent wire drift The same JSON Schema 2020-12 fragment (`$schema`, `type: 'object'`, `properties.appId.{type:'string', pattern:'^\\d+$', description}`, `required: ['appId']`) was hand-written identically in three places: - agent-ui.ts env-selection legacy `prompt` event emit - agent-ui.ts env-selection canonical `emitNeedsInput` - commands/projects.ts manual envelope Two of three were byte-identical. A future change (drop `pattern` for a real `enum`, add a property, bump dialect) lands in three places silently or — worse — in one place, drifting the wire contract. Extracted `appIdResponseSchema(description)` as a typed factory in `agent-events.ts` alongside the `ResponseSchemaFragment` interface. Only the human-facing `description` varies between callsites; the structural parts are now one place. All 9 response-schema tests still pass. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

✅ Fixed: Redundant extractToolFailureMessage call in PostToolUse hook
- Replaced the redundant second extractToolFailureMessage(input) call at line 578 with the already-computed outcomeFailureMessage variable from line 470, eliminating both the redundancy and the inconsistent error handling.

Or push these changes by commenting:

@cursor push e70c5bb870

Preview (e70c5bb870)

diff --git a/src/lib/inner-lifecycle.ts b/src/lib/inner-lifecycle.ts
--- a/src/lib/inner-lifecycle.ts
+++ b/src/lib/inner-lifecycle.ts
@@ -154,8 +154,8 @@
     typeof input.tool_response !== 'undefined'
       ? input.tool_response
       : typeof input.tool_result !== 'undefined'
-      ? input.tool_result
-      : null;
+        ? input.tool_result
+        : null;
   if (result === null || result === undefined) {
     return { content: null, exitCode: undefined };
   }
@@ -224,8 +224,8 @@
     typeof input.tool_response !== 'undefined'
       ? input.tool_response
       : typeof input.tool_result !== 'undefined'
-      ? input.tool_result
-      : null;
+        ? input.tool_result
+        : null;
   if (result === null || result === undefined) return null;
   if (typeof result === 'object') {
     const obj = result as Record<string, unknown>;
@@ -334,14 +334,14 @@
       typeof input.tool_name === 'string'
         ? input.tool_name
         : typeof input.toolName === 'string'
-        ? input.toolName
-        : 'unknown';
+          ? input.toolName
+          : 'unknown';
     const toolInput =
       typeof input.tool_input !== 'undefined'
         ? input.tool_input
         : typeof input.toolInput !== 'undefined'
-        ? input.toolInput
-        : null;
+          ? input.toolInput
+          : null;
     const summary = summarizeToolInput(toolName, toolInput);
     // Capture the SDK's `tool_use_id` so the PostToolUse `tool_response`
     // envelope can correlate with this `tool_call`. Stored in a closure-
@@ -411,8 +411,8 @@
         typeof obj.file_path === 'string'
           ? obj.file_path
           : typeof obj.path === 'string'
-          ? obj.path
-          : null;
+            ? obj.path
+            : null;
       if (path) {
         try {
           getUI().recordFileChangePlanned({ path, operation });
@@ -454,8 +454,8 @@
       typeof input.tool_name === 'string'
         ? input.tool_name
         : typeof input.toolName === 'string'
-        ? input.toolName
-        : 'unknown';
+          ? input.toolName
+          : 'unknown';
 
     // Record the run-level tool-call outcome for ALL tools (not just
     // write tools) so `tool_call_summary` carries an accurate
@@ -524,8 +524,8 @@
           typeof input.tool_input !== 'undefined'
             ? input.tool_input
             : typeof input.toolInput !== 'undefined'
-            ? input.toolInput
-            : null;
+              ? input.toolInput
+              : null;
         const responseSummary = summarizeToolInput(
           toolName,
           toolInputForSummary,
@@ -556,8 +556,8 @@
       typeof input.tool_input !== 'undefined'
         ? input.tool_input
         : typeof input.toolInput !== 'undefined'
-        ? input.toolInput
-        : null;
+          ? input.toolInput
+          : null;
     const obj =
       toolInput && typeof toolInput === 'object'
         ? (toolInput as Record<string, unknown>)
@@ -566,8 +566,8 @@
       typeof obj.file_path === 'string'
         ? obj.file_path
         : typeof obj.path === 'string'
-        ? obj.path
-        : null;
+          ? obj.path
+          : null;
     if (path) {
       // v2 protocol: gate on tool_result outcome. If the write tool
       // surfaced an error, emit `file_change_failed` and SKIP
@@ -575,14 +575,13 @@
       // successful write to the orchestrator's audit trail. The
       // pre-write entry stays in the rollback ledger so a cancelled
       // run can still restore the original on-disk state.
-      const failureMessage = extractToolFailureMessage(input);
-      if (failureMessage !== null) {
+      if (outcomeFailureMessage !== null) {
         try {
           getUI().emitFileChangeFailed?.({
             path,
             operation,
-            errorClass: classifyFileChangeError(failureMessage),
-            errorMessage: failureMessage,
+            errorClass: classifyFileChangeError(outcomeFailureMessage),
+            errorMessage: outcomeFailureMessage,
           });
         } catch {
           // See preToolUse — same defensive swallow.

_{You can send follow-ups to the cloud agent here.}

kelsonpw · 2026-05-14T21:53:58Z

@cursor push e70c5bb

…ureMessage call Applied via @cursor push command

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix prepared a fix for the issue found in the latest run.

✅ Fixed: extractToolFailureMessage false positive on is_error: false with error field
- Added obj.is_error !== false guard to the error-string fallback branch so that when is_error is explicitly false, the error string is ignored and the tool result is not misclassified as a failure.

Or push these changes by commenting:

@cursor push 5318fdd275

Preview (5318fdd275)

diff --git a/src/lib/inner-lifecycle.ts b/src/lib/inner-lifecycle.ts
--- a/src/lib/inner-lifecycle.ts
+++ b/src/lib/inner-lifecycle.ts
@@ -154,8 +154,8 @@
     typeof input.tool_response !== 'undefined'
       ? input.tool_response
       : typeof input.tool_result !== 'undefined'
-        ? input.tool_result
-        : null;
+      ? input.tool_result
+      : null;
   if (result === null || result === undefined) {
     return { content: null, exitCode: undefined };
   }
@@ -224,13 +224,18 @@
     typeof input.tool_response !== 'undefined'
       ? input.tool_response
       : typeof input.tool_result !== 'undefined'
-        ? input.tool_result
-        : null;
+      ? input.tool_result
+      : null;
   if (result === null || result === undefined) return null;
   if (typeof result === 'object') {
     const obj = result as Record<string, unknown>;
     // Newer SDK shape: `{ is_error: true, error: 'msg' }` or `{ error: 'msg' }`.
-    if (obj.is_error === true || (obj.error && typeof obj.error === 'string')) {
+    // When `is_error` is explicitly `false`, the boolean flag is authoritative
+    // and the `error` string (possibly an informational warning) is ignored.
+    if (
+      obj.is_error === true ||
+      (obj.is_error !== false && obj.error && typeof obj.error === 'string')
+    ) {
       const msg = typeof obj.error === 'string' ? obj.error : 'tool error';
       return msg;
     }
@@ -334,14 +339,14 @@
       typeof input.tool_name === 'string'
         ? input.tool_name
         : typeof input.toolName === 'string'
-          ? input.toolName
-          : 'unknown';
+        ? input.toolName
+        : 'unknown';
     const toolInput =
       typeof input.tool_input !== 'undefined'
         ? input.tool_input
         : typeof input.toolInput !== 'undefined'
-          ? input.toolInput
-          : null;
+        ? input.toolInput
+        : null;
     const summary = summarizeToolInput(toolName, toolInput);
     // Capture the SDK's `tool_use_id` so the PostToolUse `tool_response`
     // envelope can correlate with this `tool_call`. Stored in a closure-
@@ -411,8 +416,8 @@
         typeof obj.file_path === 'string'
           ? obj.file_path
           : typeof obj.path === 'string'
-            ? obj.path
-            : null;
+          ? obj.path
+          : null;
       if (path) {
         try {
           getUI().recordFileChangePlanned({ path, operation });
@@ -454,8 +459,8 @@
       typeof input.tool_name === 'string'
         ? input.tool_name
         : typeof input.toolName === 'string'
-          ? input.toolName
-          : 'unknown';
+        ? input.toolName
+        : 'unknown';
 
     // Record the run-level tool-call outcome for ALL tools (not just
     // write tools) so `tool_call_summary` carries an accurate
@@ -524,8 +529,8 @@
           typeof input.tool_input !== 'undefined'
             ? input.tool_input
             : typeof input.toolInput !== 'undefined'
-              ? input.toolInput
-              : null;
+            ? input.toolInput
+            : null;
         const responseSummary = summarizeToolInput(
           toolName,
           toolInputForSummary,
@@ -556,8 +561,8 @@
       typeof input.tool_input !== 'undefined'
         ? input.tool_input
         : typeof input.toolInput !== 'undefined'
-          ? input.toolInput
-          : null;
+        ? input.toolInput
+        : null;
     const obj =
       toolInput && typeof toolInput === 'object'
         ? (toolInput as Record<string, unknown>)
@@ -566,8 +571,8 @@
       typeof obj.file_path === 'string'
         ? obj.file_path
         : typeof obj.path === 'string'
-          ? obj.path
-          : null;
+        ? obj.path
+        : null;
     if (path) {
       // v2 protocol: gate on tool_result outcome. If the write tool
       // surfaced an error, emit `file_change_failed` and SKIP

_{You can send follow-ups to the cloud agent here.}

^{Reviewed by Cursor Bugbot for commit 3b9469b. Configure here.}

…s false

kelsonpw · 2026-05-15T21:36:22Z

@cursor push 5318fdd

…s false Applied via @cursor push command

kelsonpw and others added 26 commits May 10, 2026 22:46

style: prettier formatting fixes for CI lint pass

fdef630

PR B subagent's final commit skipped a prettier pass on these five files. Pure whitespace / wrap reflows — no semantic change.

kelsonpw requested a review from a team as a code owner May 11, 2026 14:49

cursor Bot reviewed May 11, 2026

View reviewed changes

Comment thread src/lib/inner-lifecycle.ts

This was referenced May 11, 2026

feat(agent-mode): replace responseSchema English strings with JSON Schema 2020-12 fragments #732

Merged

feat(agent): surface log + run-dir paths on wizard_capabilities #742

Merged

kelsonpw force-pushed the feat/agent-mode-tool-response branch from 6c325e5 to 74e67fd Compare May 14, 2026 18:03

kelsonpw changed the base branch from feat/agent-mode-mcp-status to main May 14, 2026 18:05

cursor Bot reviewed May 14, 2026

View reviewed changes

Comment thread src/lib/inner-lifecycle.ts Outdated

cursoragent and others added 2 commits May 14, 2026 21:54

fix: reuse outcomeFailureMessage instead of redundant extractToolFail…

c3e2e44

…ureMessage call Applied via @cursor push command

Merge branch 'main' into feat/agent-mode-tool-response

3b9469b

cursor Bot reviewed May 14, 2026

View reviewed changes

Comment thread src/lib/inner-lifecycle.ts

kelsonpw and others added 3 commits May 14, 2026 15:03

fix(lint): remove duplicate blank lines in wizard-abort.ts

382bebf

fix(lint): apply prettier to inner-lifecycle.ts ternary indents

7d68774

fix: prevent extractToolFailureMessage false positive when is_error i…

5318fdd

…s false

fix: prevent extractToolFailureMessage false positive when is_error i…

7b6ca71

…s false Applied via @cursor push command

kelsonpw merged commit f49da82 into main May 15, 2026
13 checks passed

amplitude-release-bot Bot mentioned this pull request May 15, 2026

chore(main): release wizard 1.18.0 #694

Open

kelsonpw mentioned this pull request May 16, 2026

feat(agent-mode): model_used event for orchestrator model-awareness #726

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(agent-mode): tool_response envelope + tool_call.id correlation#730

feat(agent-mode): tool_response envelope + tool_call.id correlation#730
kelsonpw merged 34 commits into
mainfrom
feat/agent-mode-tool-response

kelsonpw commented May 11, 2026 •

edited by cursor Bot

Loading

Uh oh!

cursor Bot left a comment •

edited

Loading

Uh oh!

Uh oh!

cursor Bot left a comment •

edited

Loading

Uh oh!

Uh oh!

kelsonpw commented May 14, 2026

Uh oh!

cursor Bot left a comment •

edited

Loading

Uh oh!

Uh oh!

kelsonpw commented May 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

kelsonpw commented May 11, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

1. tool_response envelope (data_version: 1)

2. tool_call.id correlation field (tool_call bumped to data_version: 2)

Sample envelopes

Safety

Test plan

Uh oh!

cursor Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

cursor Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

kelsonpw commented May 14, 2026

Uh oh!

cursor Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

kelsonpw commented May 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

kelsonpw commented May 11, 2026 •

edited by cursor Bot

Loading

1. `tool_response` envelope (`data_version: 1`)

2. `tool_call.id` correlation field (`tool_call` bumped to `data_version: 2`)

cursor Bot left a comment •

edited

Loading

cursor Bot left a comment •

edited

Loading

cursor Bot left a comment •

edited

Loading