feat(tui): v2 PR 5 — full screen-tree redesign + IA + render-cost teardown (stacks on #693) by kelsonpw · Pull Request #696 · amplitude/wizard

kelsonpw · 2026-05-09T17:18:53Z

TL;DR

Unique to this PR: full TUI v2 screen-tree polish — shared lifecycle glyph vocabulary, Operator Overview reframing of /status, mode badge in HeaderBar, /help slash command with mid-run grouping, and the useWizardSelector slice-hook infrastructure for render-cost discipline.
Depends on: #693 (PR 4: wiring + supervisor + live refresh). Stack order: #689 → #690 → #691 → #693 → #696. Final PR in the v2 stack.
Most load-bearing test: src/ui/tui/screens/__tests__/StatusOverlayScreen.glyphs.test.tsx — pins the canonical glyph palette, the "what's blocking" summary headline, and the mode-badge + Blocked-vs-Running distinction.
If you read one section, read this one: "Glyph palette (canonical vocabulary)" — every primary surface in the TUI now sources its state-symbol from src/ui/tui/utils/lifecycle-display.ts. Silent drift trips a unit test.

How to review without running

The diff is large. Below are the 7 files where most reviewable behavior lives, with what each one shows. Skim these in order to get the substance of the PR without checking out the branch.

src/ui/tui/utils/lifecycle-display.ts — Canonical glyph + label + color vocabulary mapping TaskLifecycle → {glyph, color, label}. This is the source-of-truth that all primary surfaces (operator overview, progress list, header) consume. The exhaustive switch keeps drift caught at compile time.
src/ui/tui/utils/__tests__/mode-badge.test.ts — Spec-as-test for mode-badge resolution priority (agent > ci > nested > mcp-server > suppressed in plain interactive). Reads top-to-bottom as the resolution rules; covers CLAUDECODE=1 and CLAUDE_CODE_ENTRYPOINT envs.
src/ui/tui/screens/StatusOverlayScreen.tsx — The Operator Overview itself. Sectioned by Session / Primary work / Background / Pending choices / Pending verifications / MCP capabilities / Owned artifacts / Next action. Includes the new "what's blocking the run?" headline summary.
src/ui/tui/screens/__tests__/StatusOverlayScreen.glyphs.test.tsx — Regression pins for the glyph palette, the summary headline ("Waiting on N choices…", "N primary task(s) in flight", etc.), the Blocked vs. Running distinction (⏸ red vs. › violet), and mode-badge rendering.
src/ui/tui/hooks/useWizardSelector.ts + src/ui/tui/hooks/__tests__/useWizardSelector.test.tsx — Slice-hook infrastructure with shallowArrayEqual / shallowObjectEqual. The render-cost ceiling test (src/ui/tui/__tests__/render-cost.test.tsx) demonstrates the ~60% render-budget cut for slice-subscribed components versus whole-store subscribers.
src/ui/tui/components/HeaderBar.tsx + src/ui/tui/components/__tests__/HeaderBar.modeBadge.test.tsx — Mode badge placement and conditional suppression in plain interactive mode.
src/ui/tui/__tests__/console-commands-help.test.ts — Locks down the /help text contract: which commands are listed, how they're grouped (always-available vs. paused-during-run), and the hanging-indent multi-line feedback rendering.

Recommended runtime check: run pnpm try --install-dir=<some-test-app>, then type /status in the slash prompt. This confirms the Operator Overview renders with the new glyphs and summary headline, and that the mode badge appears in the header.

Stacks on #693 → #691 → #690 → #689. Merge after PRs 1+2+3+4.

Problem

PRs 1–4 fixed the substrate (durable orchestration store, lifecycle, choice/verification primitives, supervisor with PID + heartbeats, live file-watcher refresh). The TUI surface was solid but unfinished:

Some screens needed terminal resize to redraw after transitions.
Prompts could disappear or get re-asked after a durable answer.
"Success" UI showed up while a manual verification was still pending.
Background agents were hard to distinguish from user-directed work.
/status was usable but not refreshable while open.
The slash-command bar was easy to miss during active runs.

PR 5 turns the TUI from "screens that mostly work" into a serious operator interface: coherent IA, shared glyph vocabulary, actionable operator overview, and render-cost discipline.

IA redesign

Three-zone layout (top → bottom):

Header (≤ 2 rows): JourneyStepper + identity + mode badge.
Body (flex, dominant): active screen content.
Chrome (≤ 3 rows): inline status pill, KeyHintBar, slash prompt.

The mode badge surfaces [agent] / [ci] / [nested] / [mcp-server] (suppressed in plain interactive mode). Resolution priority is documented in docs/tui-v2.md.
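The resolution priority can be sketched as a small pure function. This is a minimal illustration, not the code in mode-badge.ts: the `ModeInputs` shape and the `agent` / `ci` / `mcpServer` flags are hypothetical stand-ins for however the wizard detects those modes, and only the priority order, the two environment variables, and the `interactive` key come from this PR.

```typescript
// Hypothetical input shape: how agent / ci / mcp-server are detected is assumed.
type ModeKey = 'agent' | 'ci' | 'nested' | 'mcp-server' | 'interactive';

interface ModeInputs {
  agent?: boolean; // assumed: true when --agent is passed
  ci?: boolean; // assumed: true when --ci is passed
  mcpServer?: boolean; // assumed: true when running as the MCP server
  env: Record<string, string | undefined>;
}

// Priority from the PR text: agent > ci > nested > mcp-server > interactive.
function resolveModeSketch(inputs: ModeInputs): ModeKey {
  if (inputs.agent) return 'agent';
  if (inputs.ci) return 'ci';
  const nested =
    inputs.env.CLAUDECODE === '1' || Boolean(inputs.env.CLAUDE_CODE_ENTRYPOINT);
  if (nested) return 'nested';
  if (inputs.mcpServer) return 'mcp-server';
  return 'interactive';
}

// The badge is suppressed (null) in plain interactive mode.
function badgeText(key: ModeKey): string | null {
  return key === 'interactive' ? null : `[${key}]`;
}
```

Taking `env` as a parameter rather than reading the process environment directly keeps the sketch testable without mutating global state.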

ASCII layout — 3 viewports

Wide (142×41)

✓ Welcome ─ ✓ Auth ─ ● Setup ←  ─ ○ Verify ─ ○ Done
Amplitude Wizard  [agent]                                                    · Acme / Web App / Production
─────────────────────────────────────────────────────────────────────────────────────────────────────────

Tasks                                                                       Discovered facts
✓ Detect framework                                                          · framework=Next.js
› Install Amplitude                                                         · package_manager=pnpm
○ Plan and approve events                                                   · TypeScript=yes
○ Wire up event tracking
  · Reading package.json
  · Running pnpm add @amplitude/analytics-browser

Standard (100×30)

✓ Welcome ─ ✓ Auth ─ ● Setup ←  ─ ○ Verify ─ ○ Done
Amplitude Wizard  [agent]                                · Acme / WebApp / Prod
──────────────────────────────────────────────────────────────────────────────

Tasks
✓ Detect framework
› Install Amplitude
  · Reading package.json
○ Plan and approve events
○ Wire up event tracking
──────────────────────────────────────────────────────────────────────────────
◆ Status: pnpm install (3.4s)
Tab=ask  ←/→=tabs  Ctrl+C=cancel
❯ Press / for commands

Narrow (80×24)

✓ ─ ✓ ─ ● ─ ○ ─ ○
Amplitude Wizard  [agent]                       · Acme / WebApp
─────────────────────────────────────────────────────────────────
Tasks
✓ Detect framework
› Installing Amplitude
○ Plan events
○ Wire up
─────────────────────────────────────────────────────────────────
◆ pnpm add … (3.4s)
Ctrl+C=cancel
❯ /

Glyph palette (canonical vocabulary)

Every primary surface shares one vocabulary:

State	Glyph	Color	Meaning
Queued	`○`	muted	Created, awaiting start
Running	`›`	violet	Actively executing
Waiting	`…`	blue	Paused on user choice / verification
Blocked	`⏸`	red	Cannot proceed (auth, network, dep)
Completed	`✓`	success	Terminal: success
Failed	`✗`	red	Terminal: failure
Cancelled	`⊘`	amber	Terminal: cancelled by user
Superseded	`⮕`	muted	Terminal: replaced by another task

Lives in src/ui/tui/utils/lifecycle-display.ts, sourced from TaskLifecycle. Pinned by unit tests so silent drift trips a test.
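For readers who have not opened lifecycle-display.ts, the table above maps naturally onto an exhaustive switch. The sketch below is an approximation: the string-union `TaskLifecycle` and the color names as plain strings are assumptions (the real module imports `TaskLifecycle` from the orchestration layer and may use theme tokens), while the glyphs and labels are copied from the table.

```typescript
// Assumed representation: the real TaskLifecycle may be an enum.
type TaskLifecycle =
  | 'queued' | 'running' | 'waiting' | 'blocked'
  | 'completed' | 'failed' | 'cancelled' | 'superseded';

interface LifecycleDisplay {
  glyph: string;
  color: string;
  label: string;
}

function lifecycleDisplaySketch(state: TaskLifecycle): LifecycleDisplay {
  switch (state) {
    case 'queued': return { glyph: '○', color: 'muted', label: 'Queued' };
    case 'running': return { glyph: '›', color: 'violet', label: 'Running' };
    case 'waiting': return { glyph: '…', color: 'blue', label: 'Waiting' };
    case 'blocked': return { glyph: '⏸', color: 'red', label: 'Blocked' };
    case 'completed': return { glyph: '✓', color: 'success', label: 'Completed' };
    case 'failed': return { glyph: '✗', color: 'red', label: 'Failed' };
    case 'cancelled': return { glyph: '⊘', color: 'amber', label: 'Cancelled' };
    case 'superseded': return { glyph: '⮕', color: 'muted', label: 'Superseded' };
    default: {
      // If a new lifecycle state is added without a case, this assignment
      // stops compiling, which is how drift is caught at build time.
      const unreachable: never = state;
      throw new Error(`Unhandled lifecycle state: ${String(unreachable)}`);
    }
  }
}
```

The `never` assignment in the default branch is what makes the switch exhaustive at compile time; the unit tests then pin the concrete glyph and color values.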

Screen tree changes

Kept: Welcome, Auth phase screens, Setup/Run/Outro, all 6 overlays.
Reframed: StatusOverlayScreen → "Operator Overview". Sectioned by Session / Primary work / Background / Pending choices / Pending verifications / MCP capabilities / Owned artifacts / Next action. Live-refresh via PR 4's useOrchestrationStore hook so a sibling shell running wizard choice answer … updates the open overlay without close+re-open.
Augmented: HeaderBar gets a mode badge.

Prompt UX contract

Every choice prompt renders the full contract:

Field	Source
Why-asking	`Choice.whyAsking`
Options + descriptions	`Choice.options[]`
Recommended option	`Choice.recommendedOptionId`
Safe-default option	`Choice.safeDefaultOptionId`
Reversible	`Choice.reversible`
Requires human	`Choice.requiresHuman`
Consequence if skipped	`Choice.consequenceIfSkipped`
Resume command	`Choice.resumeCommand`
Skip safety	derived: safe-default + !requires-human + reversible

ChoiceCheckpointBanner renders the full block; the operator overview renders an inline condensed version that still includes every field.
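The derived "Skip safety" row can be read as a simple predicate over the other fields. The sketch below assumes field types (the PR lists only the field names), and `isSafeToSkip` is a hypothetical helper name.

```typescript
// Field names come from the contract table; the types are assumptions.
interface ChoiceSketch {
  whyAsking: string;
  options: { id: string; description: string }[];
  recommendedOptionId?: string;
  safeDefaultOptionId?: string;
  reversible: boolean;
  requiresHuman: boolean;
  consequenceIfSkipped: string;
  resumeCommand: string;
}

// Skip safety = has a safe default AND does not require a human AND is reversible.
function isSafeToSkip(choice: ChoiceSketch): boolean {
  return (
    choice.safeDefaultOptionId !== undefined &&
    !choice.requiresHuman &&
    choice.reversible
  );
}
```

Under this reading, a choice that lacks a safe default is never skippable, even when it is reversible.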

Slash command coherence

New /help command lists every registered command grouped by "available anytime" vs "available before/after a setup run". When a run is active, the second group renames itself "paused while a setup run is active (Ctrl+C to cancel, then retry)" so the user knows exactly why a command can't fire and what to do.
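A rough sketch of the grouping rule, for orientation only. The `SlashCommandSketch` shape, the `availableDuringRun` flag, and the output layout are hypothetical; the two group titles are quoted from the description above.

```typescript
// Hypothetical command shape: the real registry may look different.
interface SlashCommandSketch {
  name: string;
  summary: string;
  availableDuringRun: boolean;
}

function renderHelpSketch(commands: SlashCommandSketch[], runActive: boolean): string {
  const anytime = commands.filter((c) => c.availableDuringRun);
  const gated = commands.filter((c) => !c.availableDuringRun);
  // The second group renames itself while a run is active.
  const gatedTitle = runActive
    ? 'paused while a setup run is active (Ctrl+C to cancel, then retry)'
    : 'available before/after a setup run';
  const section = (title: string, items: SlashCommandSketch[]): string[] =>
    items.length === 0
      ? []
      : [`${title}:`, ...items.map((c) => `  ${c.name}  ${c.summary}`)];
  return [...section('available anytime', anytime), ...section(gatedTitle, gated)].join('\n');
}
```

Which real commands fall into which group is not stated in this PR description, so any concrete command list passed to the sketch is illustrative.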

Multi-line command feedback (e.g. /help, /diagnostics) now renders with a hanging indent so it reads as one coherent block, not several disconnected lines.
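Hanging indentation itself is a few lines of string handling. The helper below is an illustrative sketch (the function name and the marker parameter are assumptions, not the ConsoleView implementation): the first line carries the marker and every continuation line is padded to the marker's width.

```typescript
// Prefix the first line with a marker and pad continuation lines to match,
// so multi-line feedback reads as one block.
function hangingIndent(text: string, marker: string = '> '): string {
  const pad = ' '.repeat(marker.length);
  return text
    .split('\n')
    .map((line, index) => (index === 0 ? marker + line : pad + line))
    .join('\n');
}
```

Note that `marker.length` counts UTF-16 code units, so a marker containing wide or astral glyphs would need a display-width calculation instead.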

Render-cost teardown

New useWizardSelector(store, selector, isEqual?) slice hook. Components subscribed to a slice no longer rerender for unrelated store ticks. shallowArrayEqual and shallowObjectEqual exported alongside.

Render-cost benchmark fixture (src/ui/tui/__tests__/render-cost.test.tsx):

Subscriber type	Renders (3 task transitions + 5 status bumps)
Whole-store	8+ renders
Tasks slice	3 renders
Status slice	5 renders

In this scenario, slicing cuts renders from 8+ to 3 for the tasks-slice subscriber (about 60% fewer) and from 8+ to 5 for the status-slice subscriber (about 40% fewer). The infrastructure ships in PR 5; migrating individual subscribers is incremental.
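The counts in the table can be reproduced without React by counting change notifications on a toy store. Everything below is a sketch under stated assumptions: `createStore`, `subscribeSlice`, and the `State` shape are hypothetical, and notifications stand in for renders. The two equality helpers follow the behavior described in this PR, including the own-key check from the Bugbot fix.

```typescript
function shallowArrayEqual<T>(a: readonly T[], b: readonly T[]): boolean {
  if (a.length !== b.length) return false;
  return a.every((value, i) => Object.is(value, b[i]));
}

function shallowObjectEqual(
  a: Record<string, unknown>,
  b: Record<string, unknown>,
): boolean {
  const ak = Object.keys(a);
  const bk = Object.keys(b);
  if (ak.length !== bk.length) return false;
  for (const k of ak) {
    // Own-key check: same key count with different key names must be unequal.
    if (!Object.prototype.hasOwnProperty.call(b, k) || !Object.is(a[k], b[k])) return false;
  }
  return true;
}

interface State {
  tasks: string[];
  status: string;
}

// Toy store (hypothetical): notifies every listener on every update.
function createStore(initial: State) {
  let state = initial;
  const listeners = new Set<() => void>();
  return {
    getState: (): State => state,
    setState(patch: Partial<State>): void {
      state = { ...state, ...patch };
      listeners.forEach((listener) => listener());
    },
    subscribe(listener: () => void): () => void {
      listeners.add(listener);
      return () => {
        listeners.delete(listener);
      };
    },
  };
}

// Slice subscription: fires only when the selected value changes.
function subscribeSlice<T>(
  store: ReturnType<typeof createStore>,
  selector: (s: State) => T,
  onChange: (value: T) => void,
  isEqual: (a: T, b: T) => boolean = Object.is,
): () => void {
  let last = selector(store.getState());
  return store.subscribe(() => {
    const next = selector(store.getState());
    if (!isEqual(last, next)) {
      last = next;
      onChange(next);
    }
  });
}

// 3 task transitions + 5 status bumps, as in the benchmark fixture.
function runScenario(): { whole: number; tasks: number; status: number } {
  const store = createStore({ tasks: ['queued'], status: 's0' });
  const counts = { whole: 0, tasks: 0, status: 0 };
  store.subscribe(() => { counts.whole += 1; });
  subscribeSlice(store, (s) => s.tasks, () => { counts.tasks += 1; }, shallowArrayEqual);
  subscribeSlice(store, (s) => s.status, () => { counts.status += 1; });
  for (const t of ['running', 'waiting', 'completed']) store.setState({ tasks: [t] });
  for (let i = 1; i <= 5; i += 1) store.setState({ status: `s${i}` });
  return counts;
}
```

In this toy model the whole-store listener fires 8 times, the tasks slice 3 times, and the status slice 5 times, matching the table. Real React render counts can be higher (the table says "8+") because of mount renders and parent re-renders.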

Bugs addressed (with regression-test refs)

Mode-badge invisibility for nested-agent invocations — CLAUDECODE=1 / CLAUDE_CODE_ENTRYPOINT now surface as [nested] in the header and in the operator overview. Test: src/ui/tui/utils/__tests__/mode-badge.test.ts, src/ui/tui/screens/__tests__/StatusOverlayScreen.glyphs.test.tsx.
Operator overview blocked-vs-running confusion — Blocked state now renders the distinct ⏸ glyph + red color, separate from running › + violet. Test: StatusOverlayScreen.glyphs.test.tsx.
Multi-line slash feedback wrapping awkwardly — feedback now hanging-indents continuation lines instead of single-line truncation. Covered by console-commands-help.test.ts (text contract) and existing ConsoleView tests.
No "what's blocking the run?" headline in the operator overview — added a 1-line summary that resolves to "Waiting on N choices from you" / "Waiting on N manual verifications" / "N primary task(s) in flight" / "Session active — no pending action" / "No active session". Test: StatusOverlayScreen.glyphs.test.tsx::summary headline.
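The headline resolution in the last item above can be sketched as a priority chain. The `OverviewCounts` shape is hypothetical, and the precedence order is an assumption inferred from the order in which the strings are listed; the strings themselves are quoted from this PR (with N substituted), and singular/plural handling is not modeled.

```typescript
// Hypothetical inputs: the real screen derives these from the orchestration store.
interface OverviewCounts {
  hasSession: boolean;
  pendingChoices: number;
  pendingVerifications: number;
  primaryTasksInFlight: number;
}

// Assumed precedence: no session, then choices, verifications, primary tasks, idle.
function summaryHeadlineSketch(c: OverviewCounts): string {
  if (!c.hasSession) return 'No active session';
  if (c.pendingChoices > 0) return `Waiting on ${c.pendingChoices} choices from you`;
  if (c.pendingVerifications > 0) {
    return `Waiting on ${c.pendingVerifications} manual verifications`;
  }
  if (c.primaryTasksInFlight > 0) return `${c.primaryTasksInFlight} primary task(s) in flight`;
  // \u2014 is the dash character used in the original headline string.
  return 'Session active \u2014 no pending action';
}
```

Putting user-facing blockers (choices, verifications) ahead of in-flight tasks means the headline always names the thing the operator can act on first.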

The brief's other bug categories (success-while-pending, prompt re-ask after durable answer, prompts that disappear) are already covered by PR 3's ManualVerificationRibbon integration and the existing ChoiceCheckpoint.test.tsx / OutroScreen.verificationRibbon.test.tsx regression tests, which pass on PR 5.

Tests added

40 new tests over the base 3949 (3989/3989 vitest):

5 — lifecycle-display vocabulary
9 — mode-badge env resolution
6 — /help text generation
5 — HeaderBar mode badge rendering
4 + 3 — useWizardSelector primitives + render-cost ceiling
7 — StatusOverlayScreen glyph palette + summary + mode badge
1 — StatusOverlayScreen Operator Overview reframing (existing test updated)

Backward compatibility

All existing slash commands continue to work the same way; /help is additive.
/status overlay's data shape is unchanged from PR 3; only the rendering reorganized.
--agent, --ci, --json, manifest, plan, apply, verify, MCP server, v: 1 envelope, exit codes — all unchanged.
Mode badge is suppressed in plain interactive mode, preserving the prior header look for the most common case.
ProgressList still uses a blank gutter for pending rows rather than the canonical ○ glyph (deliberate UX trade-off — see comment in ProgressList.tsx).

Known limitations & follow-ups

The render-cost helper (useWizardSelector) ships in this PR; migrating every subscriber to it is out of scope for PR 5 and will happen incrementally.
The operator overview is rendered via the same overlay infrastructure as before; it doesn't yet support keyboard-actionable choice resolution from inside the overlay.

Test plan

pnpm exec vitest run --pool=forks --maxWorkers=1 — 277 files / 3989 tests pass
pnpm test:bdd — 100/100 scenarios pass
pnpm build (TypeScript + smoke test) — clean
pnpm lint — clean (only pre-existing warning unchanged)

🤖 Generated with Claude Code

Note

Medium Risk
Adds new orchestration-related CLI commands and MCP tool surfaces plus changes to credential-resolution signaling, which can affect automation and operator workflows if schemas/exit codes drift. Most changes are additive and covered by new schema-validated smoke tests, but they touch user-facing command routing and external contracts.

Overview
Introduces a durable orchestration inspection surface across the CLI and the external MCP server, adding tasks/task/sessions/session/resume/orchestration status plus choice and verification subcommands with documented extended exit codes and JSON envelopes.

Wires a first “beachhead” mirror for agent-mode environment selection into the orchestration store (records an environment_selection choice and marks it answered), and updates credential resolution to return a discriminated outcome that is logged by bin.ts for debuggability.

Expands docs/README with /status and TUI v2/operator concepts, adds shared orchestration CLI helpers (install-dir resolution, JSON error envelopes), and adds extensive new tests (CLI smoke tests against real binary, MCP-server orchestration tool parity, SSE frame suppression, and per-run cache memoization).

Reviewed by Cursor Bugbot for commit 6210b95. Bugbot is set up for automated code reviews on this repo.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


Bugbot Autofix prepared a fix for the issue found in the latest run.

✅ Fixed: Primary tasks render glyph twice (double glyph)
- Removed the StateBadge component (which rendered its own glyph) from the primary tasks section and inlined just the label text, matching the pattern used by background tasks to render the glyph exactly once per row.

Or push these changes by commenting:

@cursor push 97259f1ad4

Preview (97259f1ad4)

diff --git a/src/ui/tui/screens/StatusOverlayScreen.tsx b/src/ui/tui/screens/StatusOverlayScreen.tsx
--- a/src/ui/tui/screens/StatusOverlayScreen.tsx
+++ b/src/ui/tui/screens/StatusOverlayScreen.tsx
@@ -56,24 +56,13 @@
 import { ChoiceStatus } from '../../../lib/orchestration/checkpoints/choices.js';
 import { VerificationStatus } from '../../../lib/orchestration/checkpoints/verifications.js';
 import { resolveMode } from '../utils/mode-badge.js';
-import {
-  lifecycleDisplay,
-  type LifecycleDisplay,
-} from '../utils/lifecycle-display.js';
+import { lifecycleDisplay } from '../utils/lifecycle-display.js';
 import { TaskLifecycle } from '../../../lib/orchestration/lifecycle.js';
 
 interface StatusOverlayScreenProps {
   store: WizardStore;
 }
 
-/** Compact "glyph + label" badge used in every section. */
-const StateBadge = ({ display }: { display: LifecycleDisplay }) => (
-  <Text color={display.color} bold>
-    {display.glyph}{' '}
-    <Text color={display.color}>{display.label}</Text>
-  </Text>
-);
-
 /**
  * Section header — bold, secondary color, with a count badge.
  * Pulled out so the operator overview's many sections share the same
@@ -262,7 +251,7 @@
                 {display.glyph}{' '}
               </Text>
               <Text color={Colors.body}>
-                <StateBadge display={display} /> — {t.label}
+                <Text color={display.color}>{display.label}</Text> — {t.label}
               </Text>
             </Box>
           );


cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


Bugbot Autofix prepared a fix for the issue found in the latest run.

✅ Fixed: shallowObjectEqual fails for objects with different keys
- Added Object.prototype.hasOwnProperty.call(b, k) check before the value comparison so objects with the same key count but different key names are correctly detected as unequal.

Or push these changes by commenting:

@cursor push 931ca6d84a

Preview (931ca6d84a)

diff --git a/src/ui/tui/hooks/useWizardSelector.ts b/src/ui/tui/hooks/useWizardSelector.ts
--- a/src/ui/tui/hooks/useWizardSelector.ts
+++ b/src/ui/tui/hooks/useWizardSelector.ts
@@ -107,7 +107,8 @@
   const bk = Object.keys(b);
   if (ak.length !== bk.length) return false;
   for (const k of ak) {
-    if (!Object.is(a[k], b[k])) return false;
+    if (!Object.prototype.hasOwnProperty.call(b, k) || !Object.is(a[k], b[k]))
+      return false;
   }
   return true;
 }


…e task label (#695)

PR #688 made the inline status pill flush with the content area, sitting directly under the Tasks list. Tier 6 of the resolver returns the in-progress canonical task's `activeForm`, which is the same string ProgressList already renders for that row above the pill. The result was a visible duplicate: the Tasks list showed `› Detecting project setup` and the pill below showed `◇ Detecting project setup`.

Fix: in `resolveRunStatusPill`, suppress tier 6 by returning `undefined` whenever any canonical task is in_progress. Higher-priority tiers (file writes, tool activity, event-plan-await, currentActivity, post-agent steps) keep firing because they carry signal the Tasks list does not show. Tier 7 (`pushStatus` cold-start fallback) is also skipped while a canonical task is in_progress to prevent stale narration leaking in once tier 6 stops covering for it.

Tests: pinned the new contract in `run-status-pill.test.ts` (suppression, no over-suppression of tiers 1-5, the screenshot scenario, and the "no tier 7 leak" guard), updated `RunScreen.statusPill.test.tsx` and `RunScreen.spacing.test.tsx` to match.

* perf(build): bundle wizard via tsup for faster cold-start

Replaces the per-file `tsc` JS emit with a single tsup-driven bundle so cold-start parses one file (`dist/bin.js`) instead of resolving and loading 343 individual modules from `dist/src/`. Type declarations are still emitted via a separate `tsc --emitDeclarationOnly` pass so the package's `.d.ts` surface is unchanged.

Lazy-loads the heaviest externals on the cold-start path:
- axios in src/utils/urls.ts (only used in detectRegionFromToken)
- axios in src/utils/oauth.ts (only used in OAuth exchanges)
- axios + apiClient in src/lib/api.ts (cached promise, first GraphQL call pays the import cost)
- fast-glob in src/utils/environment.ts (only used in detectEnvVarPrefix during framework detection)

Profile-instrumented numbers: cumulative require time drops from ~1.5 s / 1034 calls to ~0.25 s / 626 calls. Wall-clock --version median drops 260ms -> 250ms on a fast Mac (Node startup is the fixed-cost floor); savings are larger on slower hardware where IO dominates.

Smoke tests cover --version, status --json, and mcp serve against the bundled artifact so regressions in the publish path are caught in vitest.

Build is deterministic: two consecutive `pnpm build:bundle` runs produce byte-identical bin.js and bin.js.map.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(bugbot): clear cached lazy-import promises on rejection

The new lazy-load patterns for `axiosModulePromise`/`apiClientPromise` (in `lib/api.ts`), `axiosPromise` (in `utils/urls.ts`), and `fgPromise` (in `utils/environment.ts`) cached the dynamic-import promise via `??=` but never cleared the cache on rejection. A transient `import()` failure (broken install, partial filesystem, transient I/O) would poison every subsequent caller in the process with the same stale rejection forever.

Switch to the same null-on-catch pattern already used by `loadDefaultDriver` in `agent-driver.ts`: store the cached promise, wire a `.catch()` that nulls the cache and re-throws so callers still see the original error, and let the next call retry the import cleanly.

* test(bugbot): pin lazy-import rejection-clearing contract on detectRegionFromToken

Add a regression test that mocks axios's `.default` getter to throw on the first call and asserts `detectRegionFromToken` re-attempts the import after a working axios is doMock'd in. Without the rejection-clearing branch in `loadAxios`, the second call would replay the cached rejection instead of returning a region.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
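The null-on-catch pattern described in the fix(bugbot) commit above can be sketched generically. `lazyWithRetry` is a hypothetical helper name, and the loader is any function returning a promise (a dynamic `import()` in the real code); the sketch is not the wizard's `loadAxios` or `loadDefaultDriver`.

```typescript
// Cache the in-flight or resolved promise, but clear the cache on rejection so
// a transient failure does not poison every later caller.
function lazyWithRetry<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | null = null;
  return () => {
    if (cached !== null) return cached;
    const attempt = load().catch((err: unknown) => {
      cached = null; // next call retries the load
      throw err; // current callers still see the original error
    });
    cached = attempt;
    return attempt;
  };
}
```

Compared with a bare `cached ??= load()`, the only difference is the `.catch()` that resets the cache; concurrent callers during a single attempt still share one promise.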

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix prepared a fix for the issue found in the latest run.

✅ Fixed: StatusOverlayScreen shows mode badge in interactive mode
- Added a showBadge guard (mode.key !== 'interactive') to conditionally render the mode badge in StatusOverlayScreen, matching the existing suppression logic in HeaderBar.

Or push these changes by commenting:

@cursor push a111fe2fd9

Preview (a111fe2fd9)

diff --git a/src/ui/tui/screens/StatusOverlayScreen.tsx b/src/ui/tui/screens/StatusOverlayScreen.tsx
--- a/src/ui/tui/screens/StatusOverlayScreen.tsx
+++ b/src/ui/tui/screens/StatusOverlayScreen.tsx
@@ -157,6 +157,7 @@
 
   const lsp = data.status.lastStoppingPoint;
   const mode = resolveMode();
+  const showBadge = mode.key !== 'interactive';
 
   // Split active tasks into "primary" (running/waiting/blocked) and
   // "background" (everything else among the active set — supervisor's
@@ -205,10 +206,14 @@
           <Text bold color={Colors.accent}>
             {Icons.diamond} Operator overview
           </Text>
-          <Text color={Colors.subtle}> {Icons.dot} </Text>
-          <Text color={mode.color} bold>
-            [{mode.label}]
-          </Text>
+          {showBadge && (
+            <>
+              <Text color={Colors.subtle}> {Icons.dot} </Text>
+              <Text color={mode.color} bold>
+                [{mode.label}]
+              </Text>
+            </>
+          )}
         </Box>
         <Text color={Colors.body}>{summary}</Text>
         <Text color={Colors.muted}>


Reviewed by Cursor Bugbot for commit 8a800c0.

…)" (#697) This reverts commit 06506f7.

…d the revised plan renders verbatim (#701)

The user reported typing "lowercase the event names" on the event-plan approval screen, but the revised plan came back with the same TitleCase names. Two root causes:

1. `confirm_event_plan` always force-Title-Cased every name via `normalizeEventName`. When the LLM dutifully revised with "user signed up", the wizard slammed it back to "User Signed Up" before the user ever saw the revised plan. The same Title-Cased names were then persisted to `.amplitude/events.json` and shipped into the eventual `track()` calls, which is exactly the display-vs-implementation drift the user noticed ("when they get implemented they don't show the same as this").
2. After Enter, EventPlanFullScreen vanished and RunScreen rendered as if nothing had happened. No "Revising plan with your feedback…" beat, so the user concluded their note was dropped on the floor.

Fix:
- `normalizeEventName` is now non-destructive on already-multi-word inputs (lowercase, UPPERCASE, Sentence case all pass through). It still repairs schema-violating shapes (snake_case, kebab-case, camelCase, dotted, single-token).
- Tool schema description and `confirm-event-plan-contract.md` now say "default to Title Case, but honor explicit user feedback that asks for a different casing convention".
- New `revisingEventPlan` flag on `WizardStore`, set by `resolveEventPlan({decision:'revised'})` and cleared by the next `promptEventPlan`. New tier 3b in `resolveRunStatusPill` surfaces "Revising plan with your feedback…" until the LLM lands the next prompt.

Tests:
- `normalizeEventName`: preserves intentional lowercase/UPPERCASE/Sentence case on already-multi-word inputs; still repairs snake/kebab/camel/dotted.
- `confirm_event_plan feedback round-trip`: feedback string returned verbatim to the agent ("feedback: lowercase the event names"); LLM payloads pass through to `promptEventPlan` and to `events.json` without re-casing.
- `run-status-pill` tier 3b: "Revising plan with your feedback…" lights up after revised, clears on next prompt, doesn't fire on approve/skip.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
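One plausible reading of the new `normalizeEventName` contract, as a sketch. The multi-word test (any whitespace in the name), the trimming, and the tokenizing regexes are assumptions; the real function may treat mixed inputs such as "user signed_up" differently.

```typescript
function titleCaseWords(words: string[]): string {
  return words
    .filter((w) => w.length > 0)
    .map((w) => w.charAt(0).toUpperCase() + w.slice(1).toLowerCase())
    .join(' ');
}

function normalizeEventNameSketch(raw: string): string {
  const name = raw.trim();
  // Already multi-word: keep the author's casing untouched.
  if (/\s/.test(name)) return name;
  // Single token: split camelCase boundaries and snake/kebab/dotted separators,
  // then fall back to the default Title Case.
  const words = name
    .replace(/([a-z0-9])([A-Z])/g, '$1 $2')
    .split(/[\s_.\-]+/);
  return titleCaseWords(words);
}
```

With this sketch, "user signed up" and "USER SIGNED UP" pass through unchanged, while "user_signed_up", "user-signed-up", "userSignedUp", and "user.signed.up" all become "User Signed Up".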

…e LLM an…" (#704) This reverts commit 8409487.

…assification of mid-stream 400s (#705)

The `[legacy] DEBUG Agent result with error: API Error: 400 ...` log line was dumping the full failing SSE response body (hundreds of `event: message_start` / `data: {"type":"content_block_delta",...}` framing lines plus `partial_json` `tool_use` deltas) into the user-visible TUI Logs tab when the Anthropic gateway terminated a streaming response with a 4xx (most commonly `400 terminated` mid-stream; Sentry #7442894144).

Two leaks: the SDK Result-message branch in `agent-interface.ts` called `logToFile('Agent result with error:', message.result)` with no truncation, AND `agent-runner.ts`'s `abortOnApiError` / `GATEWAY_DOWN` / soft-error branches interpolated the raw `rawMessage` straight into user-facing copy and Sentry context. Both surfaced the SSE body verbatim; past sessions surfaced 50KB+ `log.message` strings polluting orchestrator context.

Add `suppressSseFrames(message)` and `sanitizeErrorMessageForLog` helpers to `agent-events.ts` that:
- detect runs of Anthropic SSE protocol frames (event:/data:/bare-JSON forms for the eight known stream-event subtypes)
- collapse each run into a single `[N SSE frames suppressed]` marker
- cap the result at `MAX_LOG_MESSAGE_LENGTH` (existing 2KB budget)
- preserve any non-frame content (real errors / stack traces riding alongside the protocol noise survive; same defense as the existing `stripStreamEventNoise` / `partitionHookBridgeRace` pair)

Apply the sanitizer at every callsite that logs / interpolates an agent error string: the two `logToFile('Agent result with error:', message.result)` paths in `agent-interface.ts`, and the GATEWAY_DOWN / GATEWAY_INVALID_REQUEST / API_ERROR / RATE_LIMIT branches plus the soft-error pushStatus path in `agent-runner.ts`. Classification still runs against the raw form (the `400 terminated` regex matches the head of the message, not the SSE body); only logging / user-surface paths take the sanitized form.

Tests: `agent-events-sse-suppression.test.ts` (10 cases: fast-path no-op, contiguous-block collapse, inline-prefix split, real-error preservation, singular vs plural wording, bare-JSON form, unknown event-type passthrough, oversized-input pipeline) covers the matcher end-to-end and the truncation cap.

Verdict: pre-existing leak in `agent-interface.ts:4220` going back to the original SDK Result handler; unrelated to #698 (which only adds TUI per-event status; doesn't touch error logging).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
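A simplified, line-based sketch of the suppression idea follows. Several things are assumptions: the list of eight event types is taken from the public Anthropic streaming event names and may not match the wizard's list, `MAX_LOG_MESSAGE_LENGTH` is set to 2048 to approximate the "2KB budget", each matching line is counted as one frame, and the inline-prefix case mentioned in the tests is not handled.

```typescript
const MAX_LOG_MESSAGE_LENGTH = 2048; // assumed value for the "2KB budget"

// Assumed list: public Anthropic streaming event types.
const STREAM_EVENT_TYPES = [
  'message_start', 'content_block_start', 'content_block_delta', 'content_block_stop',
  'message_delta', 'message_stop', 'ping', 'error',
];

const types = STREAM_EVENT_TYPES.join('|');
// Matches `event: <type>` lines, plus `data: {"type":"<type>"...` and bare-JSON lines.
const FRAME_LINE = new RegExp(
  `^\\s*(?:event:\\s*(?:${types})\\s*$|(?:data:\\s*)?\\{"type":"(?:${types})")`,
);

function suppressSseFramesSketch(message: string): string {
  const out: string[] = [];
  let run = 0;
  const flush = (): void => {
    if (run > 0) {
      out.push(`[${run} SSE ${run === 1 ? 'frame' : 'frames'} suppressed]`);
      run = 0;
    }
  };
  for (const line of message.split('\n')) {
    if (FRAME_LINE.test(line)) {
      run += 1;
    } else if (line.trim() === '' && run > 0) {
      continue; // blank separators inside a run are swallowed, not counted
    } else {
      flush();
      out.push(line); // real error text survives untouched
    }
  }
  flush();
  return out.join('\n').slice(0, MAX_LOG_MESSAGE_LENGTH);
}
```

Unknown event types fall through to the non-frame branch, so an unexpected `event: something_else` line is preserved rather than silently dropped.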

…nv-selection instead of silent defer (#703)

* fix(self-heal): unstall second run after .amplitude/ wipe - surface env-selection instead of silent defer

After `git reset --hard` wipes `<installDir>/.amplitude/project-binding.json`, the second wizard run gets to credential resolution, finds 2 environments via `fetchAmplitudeUser`, and "defers": it populates `pendingOrgs` and returns `Promise<void>`. To any non-TUI caller (or anyone tailing the per-project log) the void return is indistinguishable from "credentials ready"; the wizard parks at "Detecting your project setup" with no diagnostic and no in-band signal a downstream surface can branch on to route the user to the env picker / emit `auth_required: env_selection_failed`.

Make the resolver return a discriminated `ResolveCredentialsResult` so the deferred path is load-bearing instead of silent:
- 'resolved': credentials populated.
- 'needs_user_choice': pendingOrgs set; carries kind + envsWithKey count so the agent-mode rejection envelope can quote it.
- 'api_key_notice': fetch succeeded but no envs had keys.
- 'unauthenticated': no usable token; caller routes to fresh OAuth.
- 'ci_env_token': WIZARD_OAUTH_TOKEN env-var path won.

The TUI bin path now logs the outcome at INFO so a tail of `log.txt` shows a concrete reason ("needs_user_choice / environment_selection / envsWithKey=2") instead of just the previous deferring log line. Existing callers in `commands/helpers.ts` (CI / agent path) and `commands/dashboard.ts` ignore the return value; they already branch on `session.credentials` / `session.pendingOrgs`, so behaviour is unchanged for them and the new contract is purely additive.

Regression tests in `credential-resolution.test.ts`:
- Multi-env defer scenario races resolveCredentials against a 1s timeout to prove no hang, asserts the `'needs_user_choice'` outcome with envsWithKey=2, and locks down that pendingOrgs + pending tokens are populated for the env picker. Pre-fix the void return would have been `undefined` and the await would never have surfaced the deferred state explicitly.
- Sanity siblings cover the 'resolved' (cached API key), 'unauthenticated' (no stored token), and 'api_key_notice' (admin-only project) outcomes.

Updated the cli.test.ts mock to return `{ outcome: 'unauthenticated' }` (the closest analogue to the pre-fix `undefined`) so 17 test-timeout regressions on the TUI-auth-task / feature-discovery suites stay green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: stamp deferredEnvCount on filter-mismatch path to avoid unauthenticated misclassification

Applied via @cursor push command

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
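The discriminated result described in this commit might look roughly like the union below. The `outcome` discriminant and the five outcome names come from the commit text; the exact field types, and the `describeOutcome` helper that mimics the quoted INFO log line, are assumptions.

```typescript
// Sketch of the result union; payload fields beyond `outcome` are assumed.
type ResolveCredentialsResultSketch =
  | { outcome: 'resolved' }
  | { outcome: 'needs_user_choice'; kind: string; envsWithKey: number }
  | { outcome: 'api_key_notice' }
  | { outcome: 'unauthenticated' }
  | { outcome: 'ci_env_token' };

// Hypothetical formatter for the log line quoted in the commit message.
function describeOutcome(result: ResolveCredentialsResultSketch): string {
  switch (result.outcome) {
    case 'needs_user_choice':
      return `${result.outcome} / ${result.kind} / envsWithKey=${result.envsWithKey}`;
    case 'resolved':
    case 'api_key_notice':
    case 'unauthenticated':
    case 'ci_env_token':
      return result.outcome;
    default: {
      // Compile-time guard: adding an outcome without a case fails here.
      const unreachable: never = result;
      throw new Error(`Unhandled outcome: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

Because callers switch on `outcome`, the deferred path can no longer be mistaken for success the way a bare `Promise<void>` could.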

…/resume commands (PR 1 of 3) Introduce src/lib/orchestration/ — a durable, file-backed orchestration store that becomes the source of truth for sessions, tasks, subagents, ownership, and last-stopping-point. Adds six new read-only CLI commands (tasks/task/sessions/session/resume/orchestration status), each emitting Zod-validated JSON envelopes for outer agents. Foundation only. Legacy WizardSession remains the live in-memory surface; PR 2 wires checkpoints + MCP-app lifecycle, PR 3 retires duplicate state and ships the TUI redesign + MCP-server tool parity. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

resolveCommonOpts now passes argv.installDir through resolveInstallDir so quoted/env-sourced `~` actually expands instead of being treated as a literal directory name.

The resume --execute path now imports spawn from utils/cross-platform-spawn so the npm-installed `amplitude-wizard` .cmd shim resolves on Windows. Node's built-in spawn does not consult PATHEXT and would fail with ENOENT for every Windows user invoking `wizard resume --execute`.

`wizard resume <session-id>` validated the requested session existed but then called `computeLastStoppingPoint(installDir)` without scoping, which always derived its next action from the *most recently created active session* — not the one the user asked about. The envelope's `sessionId` field still echoed the requested ID, so the command/description shown could describe a different session entirely. `computeLastStoppingPoint` now accepts an optional `sessionId` that restricts both session metadata and task buckets to that session. The resume command threads the resolved session id through, and an added test pins the scoping behavior against a two-session fixture.

…kpoints + MCP-app lifecycle

Stacks on PR 1 (#689). Adds three typed checkpoint surfaces on top of the v2 orchestration foundation:
- Choice: typed user-choice records with stable promptId for de-dup, requiresHuman automation gate, and full status transitions (pending → answered/expired/cancelled/superseded).
- Verification: manual out-of-band verification records with status transitions (pending → passed/failed/skipped, skipped/failed may recover to passed; passed/skipped/failed may supersede).
- McpAppCapability: durable lifecycle for every MCP-app capability with an anti-nag invariant: install_skipped → needs_user_choice REQUIRES a non-empty lastStateChangeReason.

New CLI commands:
- wizard choice list / show / answer (with --confirm-human gate)
- wizard verification list / show / mark

Wires last-stopping-point's pendingChoices / pendingMcpActions / pendingManualVerifications arrays to read real records (was [] in PR 1).

Two callsites instrumented as the PR 2 wiring beachhead:
- env-selection in src/commands/helpers.ts (Choice mirror + answer)
- event-plan-approval in src/lib/wizard-tools.ts (Verification mirror)

Adds 42 tests across choices/verifications/mcp-app-lifecycle/last-stopping-point/CLI. No TUI changes (deferred to PR 3); no MCP-server tool changes (deferred to PR 3).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The validation array did not include 'all', so passing the documented '--status all' opt-out exited with INVALID_ARGS before reaching the 'statusRaw === all' branch on line 132. Mirror the verification list guard: skip enum validation when statusRaw === 'all' and skip the cast so it stays undefined for the listChoices call.
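A minimal sketch of the guard, assuming the status values listed for Choice records (pending, answered, expired, cancelled, superseded). The `parseStatusFlag` name and the result shape are hypothetical; the real command emits `INVALID_ARGS` through its own helpers.

```typescript
const CHOICE_STATUSES = ["pending", "answered", "expired", "cancelled", "superseded"] as const;
type ChoiceStatus = (typeof CHOICE_STATUSES)[number];

type ParsedStatus =
  | { ok: true; status: ChoiceStatus | undefined }
  | { ok: false; error: string };

function parseStatusFlag(statusRaw: string): ParsedStatus {
  // 'all' is the opt-out: skip enum validation and leave the filter undefined.
  if (statusRaw === "all") return { ok: true, status: undefined };
  const match = CHOICE_STATUSES.find((s) => s === statusRaw);
  if (match === undefined) {
    return { ok: false, error: `INVALID_ARGS: unknown status "${statusRaw}"` };
  }
  return { ok: true, status: match };
}
```

Checking for `'all'` before the enum lookup keeps the validation list free of a value that is not a real record status.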

…sionAt

`TERMINAL_VERIFICATION_STATUSES` claimed `Failed` and `Skipped` were terminal, but the allowed-transitions table explicitly permits `Failed → Passed | Superseded` and `Skipped → Passed | Failed | Superseded`. That contradicts the Choice convention (terminal = no forward transitions other than re-supersede) and `last-stopping-point` already treats `Failed` as actionable via `pendingManualVerifications`. Reduce the terminal set to `Passed` and `Superseded`.

`transitionMcpCapability` set `userDecision = 'pending'` on a transition back to `NeedsUserChoice` but left a stale `userDecisionAt` from the previous installed/skipped state. Consumers checking `userDecisionAt !== null` would incorrectly conclude a decision had been made. Null it out alongside the pending decision.
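One way to keep a terminal set from contradicting the transitions table is to derive it. The commit itself hand-reduces the constant, so the sketch below is an alternative illustration, not the author's code. The lowercase status strings and the `ALLOWED` name are assumptions; the transitions follow the ones quoted in the commit messages.

```typescript
type VerificationStatus = "pending" | "passed" | "failed" | "skipped" | "superseded";

// Allowed forward transitions, as described in the commit text.
const ALLOWED: Record<VerificationStatus, readonly VerificationStatus[]> = {
  pending: ["passed", "failed", "skipped"],
  passed: ["superseded"],
  failed: ["passed", "superseded"],
  skipped: ["passed", "failed", "superseded"],
  superseded: [],
};

// Terminal = no forward transitions other than (re-)supersede.
function isTerminal(status: VerificationStatus): boolean {
  return ALLOWED[status].every((next) => next === "superseded");
}

const TERMINAL = (Object.keys(ALLOWED) as VerificationStatus[]).filter(isTerminal);
```

With the table above, `TERMINAL` contains only `passed` and `superseded`, matching the reduced set the commit describes.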

- `resume --execute` now attaches `child.on('error', …)` before `exit`. Previously a synchronous spawn failure (ENOENT, EACCES, missing PATH entry on Windows) fired an unhandled `error` event, which Node's EventEmitter rethrows, crashing the CLI with a stack trace instead of producing a clean message + GENERAL_ERROR exit.
- `saveStore` was calling `ensureDir(dirname(path))` and then `ensureDir(getRunDir(installDir))`; both resolve to the same run directory because `getOrchestrationStoreFile()` is defined as `join(getRunDir(installDir), 'orchestration.json')`. Drop the second call and the now-unused `getRunDir` import.
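The first fix can be illustrated with Node's `child_process.spawn`. This is a sketch under assumptions: the `runChild` name is hypothetical, and exit code `1` stands in for whatever numeric value `GENERAL_ERROR` has in the real CLI.

```typescript
import { spawn } from "node:child_process";

interface ChildResult { code: number; message?: string }

function runChild(command: string, args: string[]): Promise<ChildResult> {
  return new Promise((resolve) => {
    const child = spawn(command, args, { stdio: "ignore" });
    // Attach 'error' first: without a listener, a failed spawn (for example
    // ENOENT) surfaces as an unhandled 'error' event and crashes the process.
    child.on("error", (err) => {
      resolve({ code: 1, message: `failed to start ${command}: ${err.message}` });
    });
    // Normal completion path; a null code (killed by signal) is treated as failure.
    child.on("exit", (code) => {
      resolve({ code: code ?? 1 });
    });
  });
}
```

A caller can then print `message` and exit with `code` instead of letting a stack trace reach the user.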

- `computeLastStoppingPoint` already filtered tasks by `options.sessionId` but read the full unfiltered `file.choices` / `file.mcpCapabilities` / `file.verifications` arrays. `wizard resume <session-id>` could surface pending checkpoints belonging to a different (more recently active) session, producing a misleading `nextAction`. Filter each by the session-link field on the record (`linkedSessionId` for choices and MCP capabilities, `blockingSessionId` for verifications) so all four buckets stay consistent with the requested session.
- `choice.ts` and `verification.ts` had inline `resolveCommonOpts` / `emitJson` / `emitJsonError` that omitted the `resolveInstallDir` call done correctly in `orchestration.ts`. A user passing `--install-dir ~/myapp` would resolve to `<cwd>/~/myapp` instead of the home-relative path, silently writing to the wrong store. Extract the helpers to a shared `orchestration-common.ts` and switch all three command modules to it so the `resolveInstallDir` fix applies uniformly and future drift is impossible.
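The `~/myapp` failure mode in the second bullet comes from joining a tilde path onto the working directory without expanding it. The sketch below shows the expansion step only. It is not the real `resolveInstallDir`: the `expandInstallDir` name, the injected `homeDir` / `cwd` parameters, and the POSIX-only path handling are all assumptions made to keep the example self-contained.

```typescript
// Hypothetical helper; real code would more likely use os.homedir() and node:path.
function expandInstallDir(raw: string, homeDir: string, cwd: string): string {
  if (raw === "~") return homeDir;
  // Expand a leading "~/" to the home directory instead of treating it as relative.
  if (raw.startsWith("~/")) return `${homeDir}/${raw.slice(2)}`;
  // Absolute paths pass through unchanged.
  if (raw.startsWith("/")) return raw;
  // Everything else is relative to the current working directory.
  return `${cwd}/${raw}`;
}

const expanded = expandInstallDir("~/myapp", "/home/me", "/work");
const relative = expandInstallDir("myapp", "/home/me", "/work");
```

Routing every command module through one shared helper means the tilde branch cannot be forgotten in a single module again.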

`deriveNextAction` builds an `inspect_failure` next-action when the most recent task has stopped. The structured `command` array uses the configurable `cliPrefix` (sourced from `args.invocation`, which flows from `options.cliInvocation` on `computeLastStoppingPoint`), but the inline shell hint embedded in `description` was templating the hardcoded module-level `CLI_INVOCATION` constant. A custom invocation (e.g. an alternate `wizard` symlink, or a test harness overriding the binary name) would surface a description that says \`amplitude-wizard task <id>\` while the JSON payload's `command` points at the configured executable. Use `cliPrefix.join(' ')` for both so the human and machine views always agree.
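A small sketch of the "one source for both views" idea. The `NextAction` shape, the `buildInspectFailure` name, and the description wording are hypothetical; the `task <id>` subcommand form follows the hint quoted above.

```typescript
interface NextAction {
  kind: "inspect_failure";
  command: string[];
  description: string;
}

function buildInspectFailure(cliPrefix: string[], taskId: string): NextAction {
  // Build the structured command once from the configurable prefix...
  const command = [...cliPrefix, "task", taskId];
  // ...and derive the human-readable hint from the same tokens, so the text
  // and the JSON payload cannot name different executables.
  return {
    kind: "inspect_failure",
    command,
    description: `Inspect the stopped task with \`${command.join(" ")}\``,
  };
}

const action = buildInspectFailure(["my-wizard"], "t1");
```

With a custom prefix such as `["my-wizard"]`, both `action.command` and `action.description` refer to `my-wizard`, never to a hardcoded default.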

`resumeCommand` is the human-facing copy-pasteable form of `nextAction.command`. It was built with `nextAction.command.join(' ')`, which silently corrupts paths containing spaces (e.g. an `installDir` of `/Users/me/my project` would land in the shell as two separate words). The structured `command` array stayed correct, but the string the user is invited to paste into a terminal would fail. Add a small `shellJoin` / `shellQuote` helper that wraps tokens with shell metacharacters or whitespace in single quotes (with the standard `'\''` close/escape/reopen dance for embedded single quotes). Tokens that are already shell-safe stay unquoted so the common case stays readable.
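The quoting scheme described above can be sketched as follows. The function names match the commit message, but the body is an illustration: the exact set of characters treated as shell-safe is an assumption, chosen conservatively for POSIX shells.

```typescript
// Characters that need no quoting in a POSIX shell (assumed, conservative set).
const SHELL_SAFE = /^[A-Za-z0-9_\/:=.,@%+-]+$/;

function shellQuote(token: string): string {
  // Already-safe tokens stay unquoted so the common case remains readable.
  if (SHELL_SAFE.test(token)) return token;
  // Wrap in single quotes; an embedded ' becomes '\'' (close, escape, reopen).
  return `'${token.replace(/'/g, `'\\''`)}'`;
}

function shellJoin(tokens: readonly string[]): string {
  return tokens.map(shellQuote).join(" ");
}

const pasted = shellJoin(["wizard", "resume", "--install-dir", "/Users/me/my project"]);
```

Here `pasted` keeps the path as a single shell word, whereas a plain `join(' ')` would split it at the space. The empty string is quoted as `''` because it does not match the safe pattern.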

`HeaderBar` already gates the mode badge on `resolved.key !== 'interactive'`, so the default interactive run never sees a stray `[interactive]` chip. `StatusOverlayScreen` rendered the badge unconditionally in its header, so opening `/status` during a normal interactive run printed `[interactive]` next to "Operator overview" — which the brief explicitly says is noise. Mirror HeaderBar's gate and add tests covering both branches (suppressed in interactive, visible in agent mode). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The Progress tab on `feat/v2-tui-redesign` used a single-line `wrap="truncate-end"` row to display every planned event name joined by commas:

    ◆ Events: Nf User Signed Up, Nf User Signed In, Nf Use…

Once the agent fills in 10+ events, the line truncates to "…" even though the screen has a full column of empty rows below it, because the active task list collapses to ~5 lines once Wiring is the focused step. The user can't audit the plan they just approved without flipping to /events.

Render one event per row with name + description, soft-wrapped. The bullet (`·`) lines up with the existing diamond glyph column. The `(N events)` count gives an at-a-glance scale check.

No data-model change: `PlannedEvent` is still `{name, description}`. Per-event lifecycle status (queued → in_progress → done) is the shape of #698 against `main` and is a separate change.

27 RunScreen tests still pass; no test asserted on the comma-join shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
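The row-per-event layout can be sketched as plain strings, independent of the TUI framework. `PlannedEvent` is `{name, description}` as stated above; the `renderEventRows` and `wrapWords` names, the header wording, and the `name: description` row format are assumptions for illustration.

```typescript
interface PlannedEvent { name: string; description: string }

// Greedy word wrap to a maximum width (words longer than the width are not split).
function wrapWords(text: string, width: number): string[] {
  const lines: string[] = [];
  let current = "";
  for (const word of text.split(/\s+/).filter(Boolean)) {
    if (current === "") current = word;
    else if (current.length + 1 + word.length <= width) current += " " + word;
    else {
      lines.push(current);
      current = word;
    }
  }
  if (current !== "") lines.push(current);
  return lines;
}

function renderEventRows(events: PlannedEvent[], width: number): string[] {
  const rows = [`◆ Events (${events.length} events)`];
  for (const e of events) {
    const body = wrapWords(`${e.name}: ${e.description}`, Math.max(1, width - 2));
    // The bullet sits in the glyph column; continuation lines are indented to match.
    body.forEach((line, i) => rows.push((i === 0 ? "· " : "  ") + line));
  }
  return rows;
}

const rows = renderEventRows(
  [
    { name: "User Signed Up", description: "Fired after signup" },
    { name: "User Signed In", description: "Fired after login" },
  ],
  60,
);
```

Each event gets its own row, so a long plan grows downward into the empty rows instead of truncating on one line.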

The previous ReportViewer wrapped each visible line in `<Text wrap="truncate">`, which silently dropped the right edge of any line wider than the content area. Events-table rows and code blocks with long paths or JSON payloads got a stray "…" decoration and the user had no way to read what was clipped.

Add ANSI-aware horizontal panning so the user can shift the visible window left/right with `h`/`l` (or arrow keys), keep colour codes intact across the slice, and surface the full LogViewer-style key hint footer (↑↓/jk scroll · h/l pan · g/G top/bottom · 0 reset · Esc close). The pan offset is clamped to the widest line so users can't scroll into empty whitespace.

Also include a regression test covering: (a) horizontal pan offset shifts visible content, (b) lines wider than the content area are no longer clipped at the right edge, (c) the key-hint footer surfaces all documented controls, and (d) `0` resets the pan offset.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
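A minimal sketch of ANSI-aware panning, assuming only SGR colour sequences (`ESC [ … m`) appear in the text and that every other UTF-16 code unit occupies one column. Wide characters and surrogate pairs are ignored here. The `panLine` and `clampOffset` names are hypothetical.

```typescript
const SGR_STICKY = /\x1b\[[0-9;]*m/y; // matches only at lastIndex
const SGR_GLOBAL = /\x1b\[[0-9;]*m/g;

// Return the visible window [offset, offset + width) of a line while keeping
// every colour code, so styling state is preserved across the slice.
function panLine(line: string, offset: number, width: number): string {
  let out = "";
  let col = 0; // visible column of the next printable character
  let i = 0;
  while (i < line.length) {
    SGR_STICKY.lastIndex = i;
    const m = SGR_STICKY.exec(line);
    if (m) {
      out += m[0]; // escape sequences take no columns; always keep them
      i += m[0].length;
      continue;
    }
    if (col >= offset && col < offset + width) out += line.charAt(i);
    col++;
    i++;
  }
  return out;
}

// Clamp the pan offset so the window never scrolls past the widest line.
function clampOffset(lines: string[], offset: number, width: number): number {
  const widest = Math.max(0, ...lines.map((l) => l.replace(SGR_GLOBAL, "").length));
  return Math.min(Math.max(0, offset), Math.max(0, widest - width));
}

const panned = panLine("\x1b[31mhello\x1b[0m world", 3, 5);
```

In this example `panned` still opens with the red colour code even though the window starts mid-word, which is the property plain `slice` would lose.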

…added/removed (#707)

The event-plan approval flow used to feel like list replacement. Each time the user pressed `[F] give feedback`, the agent's revised plan silently overwrote the prior list, and the user lost the context of what they had just said and what the AI changed in response. The recent "keep event names snake cased and prefixed" feedback round made this visible: the AI applied the prefix but the user couldn't tell from the new screen alone whether snake_case had also landed (it had not, because the display normalizer was Title-Casing, but the lack of an explicit "AI: revised plan +N -M" signal meant nothing about the conversation was legible).

Add a round-history layer so the screen can render the back-and-forth:

* Store now keeps `EventPlanRound[]`, one entry per `promptEventPlan` call. Each round carries the AI's plan + the user feedback (if any) that produced it. Cleared on `approved` / `skipped`; persists across `revised` rounds.
* `pendingPlanFeedback` (instance field) buffers feedback typed in one `[F]` decision and pairs it with the next `promptEventPlan` from the agent. Single-pair carry, so no leakage across runs.
* EventPlanFullScreen renders a conversational header on rounds ≥ 2:
  - "You: <quoted feedback>"
  - "AI: revised plan +N added · −M removed" (with green/red counts)
* Per-row diff markers when a prior round exists:
  - `+` (green) for events new to this round
  - `−` (red, struck-through) for events the AI dropped
  - bullet (`·`) for unchanged events

  Diff is by name (description regen on revision is expected).

Round 1 still renders the original "Suggested events for your app" title; the convo affordance only appears once there's actually a conversation to render.

Tests:

* 3 new cases in EventPlanFullScreen.test.tsx: round-2 quote+delta rendering, round-1 fallback, history clear on approve.
* All 174 existing store tests + 6 existing screen tests stay green.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
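The by-name diff between two rounds can be sketched as below. The `diffRounds` name and the result shape are hypothetical, and placing dropped events after the current ones is an assumption about ordering; the three markers and the "diff is by name" rule come from the commit message.

```typescript
interface PlannedEvent { name: string; description: string }
type Marker = "+" | "−" | "·";
interface DiffRow { marker: Marker; name: string }
interface RoundDiff { rows: DiffRow[]; added: number; removed: number }

function diffRounds(prev: PlannedEvent[], next: PlannedEvent[]): RoundDiff {
  const prevNames = new Set(prev.map((e) => e.name));
  const nextNames = new Set(next.map((e) => e.name));
  // Current events: new names get "+", names seen in the prior round get "·".
  // Descriptions are ignored because they are expected to be regenerated.
  const rows: DiffRow[] = next.map(
    (e): DiffRow => ({ marker: prevNames.has(e.name) ? "·" : "+", name: e.name }),
  );
  // Events present before but missing now are shown as dropped.
  for (const e of prev) {
    if (!nextNames.has(e.name)) rows.push({ marker: "−", name: e.name });
  }
  return {
    rows,
    added: rows.filter((r) => r.marker === "+").length,
    removed: rows.filter((r) => r.marker === "−").length,
  };
}

const diff = diffRounds(
  [
    { name: "signup", description: "old text" },
    { name: "login", description: "old text" },
  ],
  [
    { name: "login", description: "regenerated text" },
    { name: "checkout", description: "new event" },
  ],
);
```

The `added` and `removed` counts are what a header line such as "AI: revised plan +N added · −M removed" would display.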

kelsonpw · 2026-05-13T23:35:17Z

V2 PR 5 (full TUI v2 screen-tree redesign) — replaced by V3 polish work landed and pending across the TUI v3 chain. Closing per direction change. Audit by subagent a209c551541d85df3.

kelsonpw requested a review from a team as a code owner May 9, 2026 17:18

cursor Bot reviewed May 9, 2026

Comment thread src/ui/tui/screens/StatusOverlayScreen.tsx

kelsonpw force-pushed the feat/v2-widen-and-retire branch from 19ef9dd to b4d4e59 Compare May 9, 2026 17:23

kelsonpw force-pushed the feat/v2-tui-redesign branch from ff44bb5 to db7a96c Compare May 9, 2026 17:25

kelsonpw force-pushed the feat/v2-widen-and-retire branch from 2fb5920 to 5f4273d Compare May 9, 2026 17:45

kelsonpw force-pushed the feat/v2-tui-redesign branch from bd3bee6 to ca03480 Compare May 9, 2026 17:46

cursor Bot reviewed May 9, 2026

Comment thread src/ui/tui/hooks/useWizardSelector.ts

kelsonpw mentioned this pull request May 9, 2026

feat(orchestration): v2 foundation — durable state, lifecycle, status/resume commands (PR 1 of 3) #689

Merged

11 tasks

kelsonpw force-pushed the feat/v2-tui-redesign branch from a45ea7d to 4551d44 Compare May 9, 2026 20:05

kelsonpw force-pushed the feat/v2-tui-redesign branch from 4551d44 to 8a800c0 Compare May 9, 2026 20:36

cursor Bot reviewed May 9, 2026

Comment thread src/ui/tui/screens/StatusOverlayScreen.tsx

kelsonpw and others added 17 commits May 9, 2026 13:40

Revert "perf(build): bundle wizard via tsup for faster cold-start (#692)" (#697)

7412eb0

This reverts commit 06506f7.

Revert "fix(event-plan): feedback now actually round-trips through the LLM an…" (#704)

2d7f52b

This reverts commit 8409487.

chore: prettier --write for orchestration files

d642db3

fix: drop unnecessary non-null assertion on session.id

42dc60b

kelsonpw force-pushed the feat/v2-tui-redesign branch from 781c24b to 2c93edd Compare May 10, 2026 14:25

kelsonpw mentioned this pull request May 10, 2026

feat(tui): conversational event-plan approval (stacks on #696) #707

Merged

4 tasks

kelsonpw and others added 2 commits May 10, 2026 07:48

kelsonpw closed this May 13, 2026

kelsonpw wants to merge 42 commits into feat/v2-widen-and-retire from feat/v2-tui-redesign

TL;DR

How to review without running

Problem

IA redesign

ASCII layout — 3 viewports

Wide (142×41)

Standard (100×30)

Narrow (80×24)

Glyph palette (canonical vocabulary)

Screen tree changes

Prompt UX contract

Slash command coherence

Render-cost teardown

Bugs addressed (with regression-test refs)

Tests added

Backward compatibility

Known limitations & follow-ups

Test plan
