feat(tui): v2 PR 5 — full screen-tree redesign + IA + render-cost teardown (stacks on #693)#696

Closed
kelsonpw wants to merge 42 commits into feat/v2-widen-and-retire from feat/v2-tui-redesign

@kelsonpw kelsonpw commented May 9, 2026

TL;DR

How to review without running

The diff is large. Below are the 7 files where most reviewable behavior lives, with what each one shows. Skim these in order to get the substance of the PR without checking out the branch.

  1. src/ui/tui/utils/lifecycle-display.ts: Canonical glyph + label + color vocabulary mapping TaskLifecycle → {glyph, color, label}. This is the source of truth that all primary surfaces (operator overview, progress list, header) consume. The exhaustive switch catches drift at compile time.
  2. src/ui/tui/utils/__tests__/mode-badge.test.ts: Spec-as-test for mode-badge resolution priority (agent > ci > nested > mcp-server > suppressed in plain interactive). Reads top-to-bottom as the resolution rules; covers CLAUDECODE=1 and CLAUDE_CODE_ENTRYPOINT envs.
  3. src/ui/tui/screens/StatusOverlayScreen.tsx: The Operator Overview itself. Sectioned by Session / Primary work / Background / Pending choices / Pending verifications / MCP capabilities / Owned artifacts / Next action. Includes the new "what's blocking the run?" headline summary.
  4. src/ui/tui/screens/__tests__/StatusOverlayScreen.glyphs.test.tsx: Regression pins for the glyph palette, the summary headline ("Waiting on N choices…", "N primary task(s) in flight", etc.), the Blocked vs. Running distinction (red vs. violet), and mode-badge rendering.
  5. src/ui/tui/hooks/useWizardSelector.ts + src/ui/tui/hooks/__tests__/useWizardSelector.test.tsx: Slice-hook infrastructure with shallowArrayEqual / shallowObjectEqual. The render-cost ceiling test (src/ui/tui/__tests__/render-cost.test.tsx) demonstrates up to a ~60% render-budget cut for slice-subscribed components versus whole-store subscribers.
  6. src/ui/tui/components/HeaderBar.tsx + src/ui/tui/components/__tests__/HeaderBar.modeBadge.test.tsx: Mode badge placement and conditional suppression in plain interactive mode.
  7. src/ui/tui/__tests__/console-commands-help.test.ts: Locks down the /help text contract: which commands are listed, how they're grouped (always-available vs. paused-during-run), and the hanging-indent multi-line feedback rendering.

Recommended runtime check: pnpm try --install-dir=<some-test-app> and press /status — confirms the Operator Overview renders with the new glyphs, summary headline, and mode badge in the header.


Stacks on #693, #691, #690, #689. Merge after PRs 1+2+3+4.

Problem

PRs 1–4 fixed the substrate (durable orchestration store, lifecycle, choice/verification primitives, supervisor with PID + heartbeats, live file-watcher refresh). The TUI surface was solid but unfinished:

  • Some screens needed terminal resize to redraw after transitions.
  • Prompts could disappear or get re-asked after a durable answer.
  • "Success" UI showed up while a manual verification was still pending.
  • Background agents were hard to distinguish from user-directed work.
  • /status was usable but not refreshable while open.
  • The slash-command bar was easy to miss during active runs.

PR 5 turns the TUI from "screens that mostly work" into a serious operator interface: coherent IA, shared glyph vocabulary, actionable operator overview, and render-cost discipline.

IA redesign

Three-zone layout (top → bottom):

  1. Header (≤ 2 rows): JourneyStepper + identity + mode badge.
  2. Body (flex, dominant): active screen content.
  3. Chrome (≤ 3 rows): inline status pill, KeyHintBar, slash prompt.

The mode badge surfaces [agent] / [ci] / [nested] / [mcp-server] (suppressed in plain interactive mode). Resolution priority is documented in docs/tui-v2.md.
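
The resolution priority described above (agent > ci > nested > mcp-server, suppressed in plain interactive) can be sketched roughly as follows. This is an illustration, not the code in mode-badge.ts: the `ModeInputs` shape and the `resolveModeKey` / `modeBadge` names are hypothetical, and how the agent, ci, and mcp-server modes are actually detected is assumed to be a set of pre-computed booleans. Only the CLAUDECODE=1 / CLAUDE_CODE_ENTRYPOINT signals come from the PR text.

```typescript
// Hypothetical sketch of mode-badge resolution priority.
type ModeKey = 'agent' | 'ci' | 'nested' | 'mcp-server' | 'interactive';

interface ModeInputs {
  agent: boolean; // assumed: true when the wizard runs in agent mode
  ci: boolean; // assumed: true when the wizard runs in CI mode
  mcpServer: boolean; // assumed: true when running as the MCP server
  env: Record<string, string | undefined>; // environment passed in explicitly
}

function resolveModeKey(inputs: ModeInputs): ModeKey {
  if (inputs.agent) return 'agent';
  if (inputs.ci) return 'ci';
  // Nested-agent invocation: CLAUDECODE=1 or any CLAUDE_CODE_ENTRYPOINT value.
  const nested =
    inputs.env['CLAUDECODE'] === '1' ||
    Boolean(inputs.env['CLAUDE_CODE_ENTRYPOINT']);
  if (nested) return 'nested';
  if (inputs.mcpServer) return 'mcp-server';
  return 'interactive';
}

// The badge is suppressed (null) in plain interactive mode.
function modeBadge(key: ModeKey): string | null {
  return key === 'interactive' ? null : `[${key}]`;
}

const exampleBadge = modeBadge(
  resolveModeKey({ agent: false, ci: false, mcpServer: false, env: { CLAUDECODE: '1' } }),
); // "[nested]"
```

Passing the environment in as a plain record keeps the sketch testable without touching the real process environment.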

ASCII layout — 3 viewports

Wide (142×41)

✓ Welcome ─ ✓ Auth ─ ● Setup ←  ─ ○ Verify ─ ○ Done
Amplitude Wizard  [agent]                                                    · Acme / Web App / Production
─────────────────────────────────────────────────────────────────────────────────────────────────────────

Tasks                                                                       Discovered facts
✓ Detect framework                                                          · framework=Next.js
› Install Amplitude                                                         · package_manager=pnpm
○ Plan and approve events                                                   · TypeScript=yes
○ Wire up event tracking
  · Reading package.json
  · Running pnpm add @amplitude/analytics-browser

Standard (100×30)

✓ Welcome ─ ✓ Auth ─ ● Setup ←  ─ ○ Verify ─ ○ Done
Amplitude Wizard  [agent]                                · Acme / WebApp / Prod
──────────────────────────────────────────────────────────────────────────────

Tasks
✓ Detect framework
› Install Amplitude
  · Reading package.json
○ Plan and approve events
○ Wire up event tracking
──────────────────────────────────────────────────────────────────────────────
◆ Status: pnpm install (3.4s)
Tab=ask  ←/→=tabs  Ctrl+C=cancel
❯ Press / for commands

Narrow (80×24)

✓ ─ ✓ ─ ● ─ ○ ─ ○
Amplitude Wizard  [agent]                       · Acme / WebApp
─────────────────────────────────────────────────────────────────
Tasks
✓ Detect framework
› Installing Amplitude
○ Plan events
○ Wire up
─────────────────────────────────────────────────────────────────
◆ pnpm add … (3.4s)
Ctrl+C=cancel
❯ /

Glyph palette (canonical vocabulary)

Every primary surface shares one vocabulary:

| State | Glyph | Color | Meaning |
|---|---|---|---|
| Queued | | muted | Created, awaiting start |
| Running | | violet | Actively executing |
| Waiting | | blue | Paused on user choice / verification |
| Blocked | | red | Cannot proceed (auth, network, dep) |
| Completed | | success | Terminal: success |
| Failed | | red | Terminal: failure |
| Cancelled | | amber | Terminal: cancelled by user |
| Superseded | | muted | Terminal: replaced by another task |

Lives in src/ui/tui/utils/lifecycle-display.ts, sourced from TaskLifecycle. Pinned by unit tests so silent drift trips a test.
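
A minimal sketch of how an exhaustive switch can keep such a vocabulary from drifting, assuming `TaskLifecycle` is a string union (the real type may be an enum or const object). The colors and labels follow the table above; the glyph characters are ASCII stand-ins chosen for this example only and are not the PR's actual glyphs.

```typescript
// Assumed shape: the real TaskLifecycle lives in lib/orchestration/lifecycle.
type TaskLifecycle =
  | 'queued'
  | 'running'
  | 'waiting'
  | 'blocked'
  | 'completed'
  | 'failed'
  | 'cancelled'
  | 'superseded';

interface LifecycleDisplay {
  glyph: string; // ASCII stand-ins below, NOT the canonical glyphs
  color: string;
  label: string;
}

function lifecycleDisplay(state: TaskLifecycle): LifecycleDisplay {
  switch (state) {
    case 'queued':
      return { glyph: '.', color: 'muted', label: 'Queued' };
    case 'running':
      return { glyph: '>', color: 'violet', label: 'Running' };
    case 'waiting':
      return { glyph: '?', color: 'blue', label: 'Waiting' };
    case 'blocked':
      return { glyph: '!', color: 'red', label: 'Blocked' };
    case 'completed':
      return { glyph: '+', color: 'success', label: 'Completed' };
    case 'failed':
      return { glyph: 'x', color: 'red', label: 'Failed' };
    case 'cancelled':
      return { glyph: '-', color: 'amber', label: 'Cancelled' };
    case 'superseded':
      return { glyph: '~', color: 'muted', label: 'Superseded' };
    default: {
      // If a new state is added to the union and not handled above,
      // this assignment stops compiling.
      const unreachable: never = state;
      throw new Error(`Unhandled lifecycle state: ${String(unreachable)}`);
    }
  }
}
```

The `never` assignment in the default branch is what turns a forgotten state into a compile error rather than a silently missing glyph.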

Screen tree changes

  • Kept: Welcome, Auth phase screens, Setup/Run/Outro, all 6 overlays.
  • Reframed: StatusOverlayScreen → "Operator Overview". Sectioned by Session / Primary work / Background / Pending choices / Pending verifications / MCP capabilities / Owned artifacts / Next action. Live-refresh via PR 4's useOrchestrationStore hook so a sibling shell running wizard choice answer … updates the open overlay without close+re-open.
  • Augmented: HeaderBar gets a mode badge.

Prompt UX contract

Every choice prompt renders the full contract:

| Field | Source |
|---|---|
| Why-asking | Choice.whyAsking |
| Options + descriptions | Choice.options[] |
| Recommended option | Choice.recommendedOptionId |
| Safe-default option | Choice.safeDefaultOptionId |
| Reversible | Choice.reversible |
| Requires human | Choice.requiresHuman |
| Consequence if skipped | Choice.consequenceIfSkipped |
| Resume command | Choice.resumeCommand |
| Skip safety | derived: safe-default + !requires-human + reversible |

ChoiceCheckpointBanner renders the full block; the operator overview renders an inline condensed version that still includes every field.
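
The derived "skip safety" field can be read as a conjunction of three of the other fields. The sketch below assumes a `Choice` shape inferred from the field names in the contract above; which fields are optional, the `ChoiceOption` shape, and the `isSafeToSkip` name are assumptions. The extra check that the safe default refers to an existing option is also an addition for this example.

```typescript
// Assumed shapes, inferred from the contract's field names.
interface ChoiceOption {
  id: string;
  label: string;
  description?: string;
}

interface Choice {
  whyAsking: string;
  options: ChoiceOption[];
  recommendedOptionId?: string;
  safeDefaultOptionId?: string;
  reversible: boolean;
  requiresHuman: boolean;
  consequenceIfSkipped?: string;
  resumeCommand?: string;
}

// Skip safety = safe-default present AND not requires-human AND reversible.
function isSafeToSkip(choice: Choice): boolean {
  const hasSafeDefault =
    choice.safeDefaultOptionId !== undefined &&
    choice.options.some((o) => o.id === choice.safeDefaultOptionId);
  return hasSafeDefault && !choice.requiresHuman && choice.reversible;
}

const envChoice: Choice = {
  whyAsking: 'Two environments have API keys.',
  options: [
    { id: 'prod', label: 'Production' },
    { id: 'dev', label: 'Development' },
  ],
  recommendedOptionId: 'prod',
  safeDefaultOptionId: 'dev',
  reversible: true,
  requiresHuman: false,
};
const envChoiceSkippable = isSafeToSkip(envChoice); // true
```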

Slash command coherence

New /help command lists every registered command grouped by "available anytime" vs "available before/after a setup run". When a run is active, the second group renames itself "paused while a setup run is active (Ctrl+C to cancel, then retry)" so the user knows exactly why a command can't fire and what to do.
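
A rough sketch of the grouping behavior, assuming each registered command carries a flag saying whether it is paused during a run. The `SlashCommand` shape, the `buildHelpText` name, the output layout, and the placement of the example commands in each group are hypothetical; only the three group titles are taken from the description above.

```typescript
interface SlashCommand {
  name: string;
  description: string;
  pausedDuringRun: boolean; // assumed flag; the real registry may differ
}

function buildHelpText(commands: SlashCommand[], runActive: boolean): string {
  const anytime = commands.filter((c) => !c.pausedDuringRun);
  const gated = commands.filter((c) => c.pausedDuringRun);
  // The second group renames itself while a run is active.
  const gatedTitle = runActive
    ? 'paused while a setup run is active (Ctrl+C to cancel, then retry)'
    : 'available before/after a setup run';
  const section = (title: string, items: SlashCommand[]): string[] =>
    items.length === 0
      ? []
      : [`${title}:`, ...items.map((c) => `  ${c.name}  ${c.description}`)];
  return [...section('available anytime', anytime), ...section(gatedTitle, gated)].join('\n');
}

// Example grouping is illustrative only.
const exampleCommands: SlashCommand[] = [
  { name: '/help', description: 'List commands', pausedDuringRun: false },
  { name: '/diagnostics', description: 'Show diagnostics', pausedDuringRun: true },
];
const helpDuringRun = buildHelpText(exampleCommands, true);
```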

Multi-line command feedback (e.g. /help, /diagnostics) now renders with a hanging indent so it reads as one coherent block, not several disconnected lines.
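
One way to produce a hanging indent is to pad every continuation line by the width of the first line's prefix. The helper below is a sketch with a hypothetical name; it assumes the prefix's display width equals its string length, which holds for ASCII prefixes but not necessarily for wide glyphs.

```typescript
// Prefix the first line and pad continuation lines to the same column.
function hangingIndent(message: string, prefix: string): string {
  const pad = ' '.repeat(prefix.length);
  return message
    .split('\n')
    .map((line, index) => (index === 0 ? prefix + line : pad + line))
    .join('\n');
}

const feedback = hangingIndent('available anytime:\n/help\n/status', '> ');
// feedback is:
// > available anytime:
//   /help
//   /status
```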

Render-cost teardown

New useWizardSelector(store, selector, isEqual?) slice hook. Components subscribed to a slice no longer rerender for unrelated store ticks. shallowArrayEqual and shallowObjectEqual exported alongside.

Render-cost benchmark fixture (src/ui/tui/__tests__/render-cost.test.tsx):

| Subscriber type | 3 task transitions + 5 status bumps |
|---|---|
| Whole-store | 8+ renders |
| Tasks slice | 3 renders |
| Status slice | 5 renders |

In this scenario, slicing cuts the Tasks-slice subscriber from 8+ renders to 3 (~60% fewer) and the Status-slice subscriber from 8+ to 5 (~40% fewer). The infrastructure ships in PR 5; migrating individual subscribers is incremental.
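
The counting behind the benchmark can be reproduced without React. The sketch below is not the PR's hook: `TinyStore` and `subscribeSlice` are hypothetical stand-ins that only model the comparison step a selector hook performs before triggering a rerender, and the `WizardState` shape is assumed.

```typescript
interface WizardState {
  tasks: string[];
  status: string;
}

// Minimal store: every setState notifies every listener.
class TinyStore {
  private state: WizardState = { tasks: [], status: '' };
  private listeners = new Set<() => void>();
  getState(): WizardState {
    return this.state;
  }
  setState(patch: Partial<WizardState>): void {
    this.state = { ...this.state, ...patch };
    this.listeners.forEach((listener) => listener());
  }
  subscribe(listener: () => void): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }
}

function shallowArrayEqual<T>(a: readonly T[], b: readonly T[]): boolean {
  if (a === b) return true;
  if (a.length !== b.length) return false;
  return a.every((value, index) => Object.is(value, b[index]));
}

// Fires onChange only when the selected slice changes under isEqual.
function subscribeSlice<S>(
  store: TinyStore,
  selector: (state: WizardState) => S,
  onChange: () => void,
  isEqual: (a: S, b: S) => boolean = Object.is,
): () => void {
  let previous = selector(store.getState());
  return store.subscribe(() => {
    const next = selector(store.getState());
    if (!isEqual(previous, next)) {
      previous = next;
      onChange();
    }
  });
}

const store = new TinyStore();
const renders = { whole: 0, tasks: 0, status: 0 };
store.subscribe(() => { renders.whole += 1; });
subscribeSlice(store, (s) => s.tasks, () => { renders.tasks += 1; }, shallowArrayEqual);
subscribeSlice(store, (s) => s.status, () => { renders.status += 1; });

// 3 task transitions + 5 status bumps, as in the benchmark scenario.
store.setState({ tasks: ['detect'] });
store.setState({ tasks: ['detect', 'install'] });
store.setState({ tasks: ['detect', 'install', 'plan'] });
for (let i = 1; i <= 5; i += 1) store.setState({ status: `tick ${i}` });
// renders is now { whole: 8, tasks: 3, status: 5 }
```

The whole-store listener fires on all eight updates, while each slice listener fires only when its own slice changes.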

Bugs addressed (with regression-test refs)

  • Mode-badge invisibility for nested-agent invocations. CLAUDECODE=1 / CLAUDE_CODE_ENTRYPOINT now surface as [nested] in the header and in the operator overview. Test: src/ui/tui/utils/__tests__/mode-badge.test.ts, src/ui/tui/screens/__tests__/StatusOverlayScreen.glyphs.test.tsx.
  • Operator overview blocked-vs-running confusion. The Blocked state now renders its own distinct glyph in red, separate from the Running glyph in violet. Test: StatusOverlayScreen.glyphs.test.tsx.
  • Multi-line slash feedback wrapping awkwardly — feedback now hanging-indents continuation lines instead of single-line truncation. Covered by console-commands-help.test.ts (text contract) and existing ConsoleView tests.
  • No "what's blocking the run?" headline in the operator overview — added a 1-line summary that resolves to "Waiting on N choices from you" / "Waiting on N manual verifications" / "N primary task(s) in flight" / "Session active — no pending action" / "No active session". Test: StatusOverlayScreen.glyphs.test.tsx::summary headline.
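
The one-line summary described in the last bullet can be sketched as a small priority function. The strings are the ones quoted above; the order in which the cases are checked, the `OverviewCounts` shape, and the function name are assumptions for illustration.

```typescript
interface OverviewCounts {
  sessionActive: boolean;
  pendingChoices: number;
  pendingVerifications: number;
  primaryInFlight: number;
}

// Assumed priority: no session, then choices, verifications, in-flight tasks.
function summaryHeadline(counts: OverviewCounts): string {
  if (!counts.sessionActive) return 'No active session';
  if (counts.pendingChoices > 0) {
    return `Waiting on ${counts.pendingChoices} choices from you`;
  }
  if (counts.pendingVerifications > 0) {
    return `Waiting on ${counts.pendingVerifications} manual verifications`;
  }
  if (counts.primaryInFlight > 0) {
    return `${counts.primaryInFlight} primary task(s) in flight`;
  }
  // \u2014 is the dash character used in the PR's wording of this headline.
  return 'Session active \u2014 no pending action';
}

const headline = summaryHeadline({
  sessionActive: true,
  pendingChoices: 2,
  pendingVerifications: 1,
  primaryInFlight: 3,
}); // "Waiting on 2 choices from you"
```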

The brief's other bug categories (success-while-pending, prompt re-ask after durable answer, prompts that disappear) are already covered by PR 3's ManualVerificationRibbon integration and the existing ChoiceCheckpoint.test.tsx / OutroScreen.verificationRibbon.test.tsx regression tests, which pass on PR 5.

Tests added

40 new tests over the base 3949 (3989/3989 vitest):

  • 5 — lifecycle-display vocabulary
  • 9 — mode-badge env resolution
  • 6 — /help text generation
  • 5 — HeaderBar mode badge rendering
  • 4 + 3 — useWizardSelector primitives + render-cost ceiling
  • 7 — StatusOverlayScreen glyph palette + summary + mode badge
  • 1 — StatusOverlayScreen Operator Overview reframing (existing test updated)

Backward compatibility

  • All existing slash commands continue to work the same way; /help is additive.
  • /status overlay's data shape is unchanged from PR 3; only the rendering reorganized.
  • --agent, --ci, --json, manifest, plan, apply, verify, MCP server, v: 1 envelope, exit codes — all unchanged.
  • Mode badge is suppressed in plain interactive mode, preserving the prior header look for the most common case.
  • ProgressList still uses a blank gutter for pending rows rather than the canonical glyph (deliberate UX trade-off — see comment in ProgressList.tsx).

Known limitations & follow-ups

  • The render-cost helpers exist (useWizardSelector); migrating every subscriber over is out of scope for PR 5. The infrastructure is in place; the migration is incremental.
  • The operator overview is rendered via the same overlay infrastructure as before; it doesn't yet support keyboard-actionable choice resolution from inside the overlay.

Test plan

  • pnpm exec vitest run --pool=forks --maxWorkers=1 — 277 files / 3989 tests pass
  • pnpm test:bdd — 100/100 scenarios pass
  • pnpm build (TypeScript + smoke test) — clean
  • pnpm lint — clean (only pre-existing warning unchanged)

🤖 Generated with Claude Code


Note

Medium Risk
Adds new orchestration-related CLI commands and MCP tool surfaces plus changes to credential-resolution signaling, which can affect automation and operator workflows if schemas/exit codes drift. Most changes are additive and covered by new schema-validated smoke tests, but they touch user-facing command routing and external contracts.

Overview
Introduces a durable orchestration inspection surface across the CLI and the external MCP server, adding tasks/task/sessions/session/resume/orchestration status plus choice and verification subcommands with documented extended exit codes and JSON envelopes.

Wires a first “beachhead” mirror for agent-mode environment selection into the orchestration store (records an environment_selection choice and marks it answered), and updates credential resolution to return a discriminated outcome that is logged by bin.ts for debuggability.

Expands docs/README with /status and TUI v2/operator concepts, adds shared orchestration CLI helpers (install-dir resolution, JSON error envelopes), and adds extensive new tests (CLI smoke tests against real binary, MCP-server orchestration tool parity, SSE frame suppression, and per-run cache memoization).

Reviewed by Cursor Bugbot for commit 6210b95.

@kelsonpw kelsonpw requested a review from a team as a code owner May 9, 2026 17:18

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Primary tasks render glyph twice (double glyph)
    • Removed the StateBadge component (which rendered its own glyph) from the primary tasks section and inlined just the label text, matching the pattern used by background tasks to render the glyph exactly once per row.

Or push these changes by commenting:

@cursor push 97259f1ad4
Preview (97259f1ad4)
diff --git a/src/ui/tui/screens/StatusOverlayScreen.tsx b/src/ui/tui/screens/StatusOverlayScreen.tsx
--- a/src/ui/tui/screens/StatusOverlayScreen.tsx
+++ b/src/ui/tui/screens/StatusOverlayScreen.tsx
@@ -56,24 +56,13 @@
 import { ChoiceStatus } from '../../../lib/orchestration/checkpoints/choices.js';
 import { VerificationStatus } from '../../../lib/orchestration/checkpoints/verifications.js';
 import { resolveMode } from '../utils/mode-badge.js';
-import {
-  lifecycleDisplay,
-  type LifecycleDisplay,
-} from '../utils/lifecycle-display.js';
+import { lifecycleDisplay } from '../utils/lifecycle-display.js';
 import { TaskLifecycle } from '../../../lib/orchestration/lifecycle.js';
 
 interface StatusOverlayScreenProps {
   store: WizardStore;
 }
 
-/** Compact "glyph + label" badge used in every section. */
-const StateBadge = ({ display }: { display: LifecycleDisplay }) => (
-  <Text color={display.color} bold>
-    {display.glyph}{' '}
-    <Text color={display.color}>{display.label}</Text>
-  </Text>
-);
-
 /**
  * Section header — bold, secondary color, with a count badge.
  * Pulled out so the operator overview's many sections share the same
@@ -262,7 +251,7 @@
                 {display.glyph}{' '}
               </Text>
               <Text color={Colors.body}>
-                <StateBadge display={display} /> — {t.label}
+                <Text color={display.color}>{display.label}</Text> — {t.label}
               </Text>
             </Box>
           );

Comment thread src/ui/tui/screens/StatusOverlayScreen.tsx
@kelsonpw kelsonpw force-pushed the feat/v2-widen-and-retire branch from 19ef9dd to b4d4e59 Compare May 9, 2026 17:23
@kelsonpw kelsonpw force-pushed the feat/v2-tui-redesign branch from ff44bb5 to db7a96c Compare May 9, 2026 17:25
@kelsonpw kelsonpw force-pushed the feat/v2-widen-and-retire branch from 2fb5920 to 5f4273d Compare May 9, 2026 17:45
@kelsonpw kelsonpw force-pushed the feat/v2-tui-redesign branch from bd3bee6 to ca03480 Compare May 9, 2026 17:46

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: shallowObjectEqual fails for objects with different keys
    • Added Object.prototype.hasOwnProperty.call(b, k) check before the value comparison so objects with the same key count but different key names are correctly detected as unequal.

Or push these changes by commenting:

@cursor push 931ca6d84a
Preview (931ca6d84a)
diff --git a/src/ui/tui/hooks/useWizardSelector.ts b/src/ui/tui/hooks/useWizardSelector.ts
--- a/src/ui/tui/hooks/useWizardSelector.ts
+++ b/src/ui/tui/hooks/useWizardSelector.ts
@@ -107,7 +107,8 @@
   const bk = Object.keys(b);
   if (ak.length !== bk.length) return false;
   for (const k of ak) {
-    if (!Object.is(a[k], b[k])) return false;
+    if (!Object.prototype.hasOwnProperty.call(b, k) || !Object.is(a[k], b[k]))
+      return false;
   }
   return true;
 }
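
To see why the `hasOwnProperty` check matters, consider two objects with the same number of keys but different key names, where the values involved are `undefined`. The standalone sketch below mirrors the before and after shapes of the loop in the diff; the function names and the `Rec` alias are made up for this example.

```typescript
type Rec = Record<string, unknown>;

// Pre-fix shape: compares values only, so a missing key reads as undefined.
function shallowObjectEqualBefore(a: Rec, b: Rec): boolean {
  const ak = Object.keys(a);
  const bk = Object.keys(b);
  if (ak.length !== bk.length) return false;
  for (const k of ak) {
    if (!Object.is(a[k], b[k])) return false;
  }
  return true;
}

// Post-fix shape: also requires that b actually owns each key of a.
function shallowObjectEqualAfter(a: Rec, b: Rec): boolean {
  const ak = Object.keys(a);
  const bk = Object.keys(b);
  if (ak.length !== bk.length) return false;
  for (const k of ak) {
    if (!Object.prototype.hasOwnProperty.call(b, k) || !Object.is(a[k], b[k])) {
      return false;
    }
  }
  return true;
}

const left: Rec = { x: undefined };
const right: Rec = { y: undefined };
const before = shallowObjectEqualBefore(left, right); // true (false positive)
const after = shallowObjectEqualAfter(left, right); // false
```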

Comment thread src/ui/tui/hooks/useWizardSelector.ts
…e task label (#695)

PR #688 made the inline status pill flush with the content area, sitting
directly under the Tasks list. Tier 6 of the resolver returns the
in-progress canonical task's `activeForm` — which is the SAME string
ProgressList already renders for that row above the pill. The result
was a visible duplicate: the Tasks list showed `› Detecting project
setup` and the pill below showed `◇ Detecting project setup`.

Fix: in `resolveRunStatusPill`, suppress tier 6 by returning `undefined`
whenever any canonical task is in_progress. Higher-priority tiers (file
writes, tool activity, event-plan-await, currentActivity, post-agent
steps) keep firing because they carry signal the Tasks list does NOT
show. Tier 7 (`pushStatus` cold-start fallback) is also skipped while a
canonical task is in_progress to prevent stale narration leaking in
once tier 6 stops covering for it.

Tests: pinned the new contract in `run-status-pill.test.ts` (suppression,
no over-suppression of tiers 1-5, the screenshot scenario, and the
"no tier 7 leak" guard), updated `RunScreen.statusPill.test.tsx` and
`RunScreen.spacing.test.tsx` to match.
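
The suppression rule can be sketched as a tiered resolver. This is a simplified model rather than the real `resolveRunStatusPill`: tiers 1-5 are collapsed into an ordered list of optional strings, and the `PillInputs` shape and task statuses other than `in_progress` are assumptions.

```typescript
interface CanonicalTask {
  activeForm: string;
  status: 'pending' | 'in_progress' | 'completed'; // only in_progress is from the commit text
}

interface PillInputs {
  // Tiers 1-5 in priority order (file writes, tool activity, ...);
  // undefined means that tier has nothing to show.
  higherPriority: Array<string | undefined>;
  tasks: CanonicalTask[];
  pushStatus?: string; // tier 7 cold-start fallback
}

function resolvePillSketch(inputs: PillInputs): string | undefined {
  for (const text of inputs.higherPriority) {
    if (text !== undefined) return text;
  }
  // Tier 6 would repeat the in-progress task's activeForm, which the
  // Tasks list already shows, so it is suppressed. Tier 7 is skipped too,
  // so stale narration cannot leak in while a task is in progress.
  if (inputs.tasks.some((task) => task.status === 'in_progress')) {
    return undefined;
  }
  return inputs.pushStatus;
}

const duplicateScenario = resolvePillSketch({
  higherPriority: [undefined, undefined],
  tasks: [{ activeForm: 'Detecting project setup', status: 'in_progress' }],
  pushStatus: 'Starting up',
}); // undefined: no duplicate pill under the Tasks list
```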
@kelsonpw kelsonpw force-pushed the feat/v2-tui-redesign branch from a45ea7d to 4551d44 Compare May 9, 2026 20:05
* perf(build): bundle wizard via tsup for faster cold-start

Replaces the per-file `tsc` JS emit with a single tsup-driven bundle so
cold-start parses one file (`dist/bin.js`) instead of resolving and
loading 343 individual modules from `dist/src/`. Type declarations are
still emitted via a separate `tsc --emitDeclarationOnly` pass so the
package's `.d.ts` surface is unchanged.

Lazy-loads the heaviest externals on the cold-start path:
  - axios in src/utils/urls.ts (only used in detectRegionFromToken)
  - axios in src/utils/oauth.ts (only used in OAuth exchanges)
  - axios + apiClient in src/lib/api.ts (cached promise, first GraphQL
    call pays the import cost)
  - fast-glob in src/utils/environment.ts (only used in
    detectEnvVarPrefix during framework detection)

Profile-instrumented numbers: cumulative require time drops from
~1.5 s / 1034 calls to ~0.25 s / 626 calls. Wall-clock --version
median drops 260ms -> 250ms on a fast Mac (Node startup is the
fixed-cost floor); savings are larger on slower hardware where IO
dominates.

Smoke tests cover --version, status --json, and mcp serve against
the bundled artifact so regressions in the publish path are caught
in vitest.

Build is deterministic — two consecutive `pnpm build:bundle` runs
produce byte-identical bin.js and bin.js.map.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(bugbot): clear cached lazy-import promises on rejection

The new lazy-load patterns for `axiosModulePromise`/`apiClientPromise`
(in `lib/api.ts`), `axiosPromise` (in `utils/urls.ts`), and `fgPromise`
(in `utils/environment.ts`) cached the dynamic-import promise via `??=`
but never cleared the cache on rejection. A transient `import()` failure
(broken install, partial filesystem, transient I/O) would poison every
subsequent caller in the process with the same stale rejection forever.

Switch to the same null-on-catch pattern already used by
`loadDefaultDriver` in `agent-driver.ts`: store the cached promise, wire
a `.catch()` that nulls the cache and re-throws so callers still see the
original error, and let the next call retry the import cleanly.

* test(bugbot): pin lazy-import rejection-clearing contract on detectRegionFromToken

Add a regression test that mocks axios's `.default` getter to throw on
the first call and asserts `detectRegionFromToken` re-attempts the
import after a working axios is doMock'd in. Without the
rejection-clearing branch in `loadAxios`, the second call would replay
the cached rejection instead of returning a region.
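
The null-on-catch pattern described above can be sketched generically. `lazyOnce` is a hypothetical helper written for this example, not a function from the codebase, and the loader is injected so the sketch does not depend on any real module.

```typescript
// Cache the in-flight promise, but clear the cache if it rejects so the
// next caller retries instead of replaying a stale rejection.
function lazyOnce<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | null = null;
  return () => {
    if (cached !== null) return cached;
    const attempt = load().catch((err: unknown) => {
      cached = null; // drop the poisoned promise
      throw err; // callers still see the original error
    });
    cached = attempt;
    return attempt;
  };
}

// Usage: a stand-in loader that fails once, then succeeds.
let loadCalls = 0;
const loadClient = lazyOnce(async () => {
  loadCalls += 1;
  if (loadCalls === 1) throw new Error('transient import failure');
  return { name: 'client-stand-in' };
});
```

With this shape, concurrent callers during a successful load share one promise, and a failed load costs only the callers that were already waiting on it.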

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kelsonpw kelsonpw force-pushed the feat/v2-tui-redesign branch from 4551d44 to 8a800c0 Compare May 9, 2026 20:36

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: StatusOverlayScreen shows mode badge in interactive mode
    • Added a showBadge guard (mode.key !== 'interactive') to conditionally render the mode badge in StatusOverlayScreen, matching the existing suppression logic in HeaderBar.

Or push these changes by commenting:

@cursor push a111fe2fd9
Preview (a111fe2fd9)
diff --git a/src/ui/tui/screens/StatusOverlayScreen.tsx b/src/ui/tui/screens/StatusOverlayScreen.tsx
--- a/src/ui/tui/screens/StatusOverlayScreen.tsx
+++ b/src/ui/tui/screens/StatusOverlayScreen.tsx
@@ -157,6 +157,7 @@
 
   const lsp = data.status.lastStoppingPoint;
   const mode = resolveMode();
+  const showBadge = mode.key !== 'interactive';
 
   // Split active tasks into "primary" (running/waiting/blocked) and
   // "background" (everything else among the active set — supervisor's
@@ -205,10 +206,14 @@
           <Text bold color={Colors.accent}>
             {Icons.diamond} Operator overview
           </Text>
-          <Text color={Colors.subtle}> {Icons.dot} </Text>
-          <Text color={mode.color} bold>
-            [{mode.label}]
-          </Text>
+          {showBadge && (
+            <>
+              <Text color={Colors.subtle}> {Icons.dot} </Text>
+              <Text color={mode.color} bold>
+                [{mode.label}]
+              </Text>
+            </>
+          )}
         </Box>
         <Text color={Colors.body}>{summary}</Text>
         <Text color={Colors.muted}>

Reviewed by Cursor Bugbot for commit 8a800c0.

Comment thread src/ui/tui/screens/StatusOverlayScreen.tsx
kelsonpw and others added 17 commits May 9, 2026 13:40
…d the revised plan renders verbatim (#701)

The user reported typing "lowercase the event names" on the event-plan
approval screen — but the revised plan came back with the same TitleCase
names. Two root causes:

1. `confirm_event_plan` always force-Title-Cased every name via
   `normalizeEventName`. When the LLM dutifully revised with
   "user signed up", the wizard slammed it back to "User Signed Up"
   before the user ever saw the revised plan. The same Title-Cased
   names were then persisted to `.amplitude/events.json` and shipped
   into the eventual `track()` calls — exactly the display-vs-implementation
   drift the user noticed ("when they get implemented they don't show
   the same as this").

2. After Enter, EventPlanFullScreen vanished and RunScreen rendered as
   if nothing had happened. No "Revising plan with your feedback…"
   beat, so the user concluded their note was dropped on the floor.

Fix:
- `normalizeEventName` is now non-destructive on already-multi-word
  inputs (lowercase, UPPERCASE, Sentence case all pass through). It
  still repairs schema-violating shapes (snake_case, kebab-case,
  camelCase, dotted, single-token).
- Tool schema description and `confirm-event-plan-contract.md` now
  say "default to Title Case, but honor explicit user feedback that
  asks for a different casing convention".
- New `revisingEventPlan` flag on `WizardStore`, set by
  `resolveEventPlan({decision:'revised'})` and cleared by the next
  `promptEventPlan`. New tier 3b in `resolveRunStatusPill` surfaces
  "Revising plan with your feedback…" until the LLM lands the next
  prompt.
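
The non-destructive behavior of `normalizeEventName` can be approximated as below. This is a sketch under assumptions, not the wizard's implementation: it treats "already multi-word" as "contains whitespace", and it repairs every other shape to Title Case, which is the stated default but may not match the real repair rules exactly.

```typescript
function normalizeEventNameSketch(raw: string): string {
  const name = raw.trim();
  // Already multi-word: pass through so user-requested casing survives.
  if (/\s/.test(name)) return name;
  // Otherwise split camelCase humps and snake/kebab/dotted separators,
  // then Title Case each word.
  const words = name
    .replace(/([a-z0-9])([A-Z])/g, '$1 $2')
    .split(/[_\-.\s]+/)
    .filter((word) => word.length > 0);
  return words
    .map((word) => word.charAt(0).toUpperCase() + word.slice(1).toLowerCase())
    .join(' ');
}

const keptLowercase = normalizeEventNameSketch('user signed up'); // "user signed up"
const repairedSnake = normalizeEventNameSketch('user_signed_up'); // "User Signed Up"
const repairedCamel = normalizeEventNameSketch('userSignedUp'); // "User Signed Up"
```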

Tests:
- `normalizeEventName` — preserves intentional lowercase/UPPERCASE/Sentence
  case on already-multi-word inputs; still repairs snake/kebab/camel/dotted.
- `confirm_event_plan feedback round-trip` — feedback string returned
  verbatim to the agent ("feedback: lowercase the event names"); LLM
  payloads pass through to `promptEventPlan` and to `events.json`
  without re-casing.
- `run-status-pill` tier 3b — "Revising plan with your feedback…" lights
  up after revised, clears on next prompt, doesn't fire on approve/skip.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…assification of mid-stream 400s (#705)

The `[legacy] DEBUG Agent result with error: API Error: 400 ...` log
line was dumping the full failing SSE response body — hundreds of
`event: message_start` / `data: {"type":"content_block_delta",...}`
framing lines plus `partial_json` `tool_use` deltas — into the
user-visible TUI Logs tab when the Anthropic gateway terminated a
streaming response with a 4xx (most commonly `400 terminated`
mid-stream; Sentry #7442894144).

Two leaks: the SDK Result-message branch in `agent-interface.ts`
called `logToFile('Agent result with error:', message.result)` with
no truncation, AND `agent-runner.ts`'s `abortOnApiError` /
`GATEWAY_DOWN` / soft-error branches interpolated the raw `rawMessage`
straight into user-facing copy and Sentry context. Both surfaced the
SSE body verbatim — past sessions surfaced 50KB+ `log.message`
strings polluting orchestrator context.

Add `suppressSseFrames(message)` and `sanitizeErrorMessageForLog`
helpers to `agent-events.ts` that:
  - detect runs of Anthropic SSE protocol frames (event:/data:/bare-JSON
    forms for the eight known stream-event subtypes)
  - collapse each run into a single `[N SSE frames suppressed]` marker
  - cap the result at `MAX_LOG_MESSAGE_LENGTH` (existing 2KB budget)
  - preserve any non-frame content (real errors / stack traces riding
    alongside the protocol noise survive — same defense as the
    existing `stripStreamEventNoise` / `partitionHookBridgeRace` pair)

Apply the sanitizer at every callsite that logs / interpolates an
agent error string: the two `logToFile('Agent result with error:',
message.result)` paths in `agent-interface.ts`, and the GATEWAY_DOWN /
GATEWAY_INVALID_REQUEST / API_ERROR / RATE_LIMIT branches plus the
soft-error pushStatus path in `agent-runner.ts`. Classification still
runs against the raw form (the `400 terminated` regex matches the head
of the message, not the SSE body) — only logging / user-surface paths
take the sanitized form.

Tests: `agent-events-sse-suppression.test.ts` (10 cases — fast-path
no-op, contiguous-block collapse, inline-prefix split, real-error
preservation, singular vs plural wording, bare-JSON form, unknown
event-type passthrough, oversized-input pipeline) covers the matcher
end-to-end and the truncation cap.
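
A simplified sketch of the frame-collapsing idea follows. It only recognizes lines beginning with `event:` or `data:` and counts each such line as one frame; the real helper is described as also handling bare-JSON forms and a fixed set of stream-event subtypes, which this sketch omits. The 2048 value for the cap and the singular marker wording are assumptions.

```typescript
const MAX_LOG_MESSAGE_LENGTH = 2048; // assumed value for the "2KB budget"

function isSseFrameLine(line: string): boolean {
  return /^\s*(event|data):/.test(line);
}

function suppressSseFramesSketch(message: string): string {
  const out: string[] = [];
  let run = 0;
  const flush = (): void => {
    if (run > 0) {
      out.push(`[${run} SSE ${run === 1 ? 'frame' : 'frames'} suppressed]`);
      run = 0;
    }
  };
  for (const line of message.split('\n')) {
    if (isSseFrameLine(line)) {
      run += 1;
      continue;
    }
    // Blank separator lines inside a run do not break the run.
    if (run > 0 && line.trim() === '') continue;
    flush();
    out.push(line); // non-frame content (real errors) is preserved
  }
  flush();
  const joined = out.join('\n');
  return joined.length > MAX_LOG_MESSAGE_LENGTH
    ? joined.slice(0, MAX_LOG_MESSAGE_LENGTH)
    : joined;
}

const sanitized = suppressSseFramesSketch(
  [
    'API Error: 400 terminated',
    'event: message_start',
    'data: {"type":"message_start"}',
    '',
    'event: content_block_delta',
    'data: {"type":"content_block_delta"}',
    'TypeError: boom',
  ].join('\n'),
);
// sanitized is:
// API Error: 400 terminated
// [4 SSE frames suppressed]
// TypeError: boom
```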

Verdict: pre-existing leak in `agent-interface.ts:4220` going back to
the original SDK Result handler; unrelated to #698 (which only adds
TUI per-event status; doesn't touch error logging).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nv-selection instead of silent defer (#703)

* fix(self-heal): unstall second run after .amplitude/ wipe — surface env-selection instead of silent defer

After `git reset --hard` wipes `<installDir>/.amplitude/project-binding.json`,
the second wizard run gets to credential resolution, finds 2 environments via
`fetchAmplitudeUser`, and "defers" — populates `pendingOrgs` and returns
`Promise<void>`. To any non-TUI caller (or anyone tailing the per-project
log) the void return is indistinguishable from "credentials ready"; the
wizard parks at "Detecting your project setup" with no diagnostic and no
in-band signal a downstream surface can branch on to route the user to the
env picker / emit `auth_required: env_selection_failed`.

Make the resolver return a discriminated `ResolveCredentialsResult` so
the deferred path is load-bearing instead of silent:

  - 'resolved'             — credentials populated.
  - 'needs_user_choice'    — pendingOrgs set; carries kind +
                             envsWithKey count so the agent-mode
                             rejection envelope can quote it.
  - 'api_key_notice'       — fetch succeeded but no envs had keys.
  - 'unauthenticated'      — no usable token; caller routes to fresh OAuth.
  - 'ci_env_token'         — WIZARD_OAUTH_TOKEN env-var path won.

The TUI bin path now logs the outcome at INFO so a tail of `log.txt`
shows a concrete reason ("needs_user_choice / environment_selection /
envsWithKey=2") instead of just the previous deferring log line.
Existing callers in `commands/helpers.ts` (CI / agent path) and
`commands/dashboard.ts` ignore the return value — they already branch
on `session.credentials` / `session.pendingOrgs`, so behaviour is
unchanged for them; the new contract is purely additive.
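
The discriminated result can be sketched as a TypeScript union. The `outcome` field name and the five outcome strings come from the commit message; the exact payload types (`kind` as a plain string, `envsWithKey` as a number) and the `describeOutcome` helper are assumptions used to show how a caller could build the log line quoted above.

```typescript
type ResolveCredentialsResult =
  | { outcome: 'resolved' }
  | { outcome: 'needs_user_choice'; kind: string; envsWithKey: number }
  | { outcome: 'api_key_notice' }
  | { outcome: 'unauthenticated' }
  | { outcome: 'ci_env_token' };

// Hypothetical formatter for the INFO log line.
function describeOutcome(result: ResolveCredentialsResult): string {
  switch (result.outcome) {
    case 'needs_user_choice':
      return `${result.outcome} / ${result.kind} / envsWithKey=${result.envsWithKey}`;
    case 'resolved':
    case 'api_key_notice':
    case 'unauthenticated':
    case 'ci_env_token':
      return result.outcome;
    default: {
      // Compile-time guard: a new outcome must be handled above.
      const unreachable: never = result;
      throw new Error(`Unhandled outcome: ${JSON.stringify(unreachable)}`);
    }
  }
}

const deferredLine = describeOutcome({
  outcome: 'needs_user_choice',
  kind: 'environment_selection',
  envsWithKey: 2,
}); // "needs_user_choice / environment_selection / envsWithKey=2"
```

Because the old return type was `void`, callers that ignore the value keep compiling unchanged, which matches the "purely additive" claim.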

Regression tests in `credential-resolution.test.ts`:
  - Multi-env defer scenario races resolveCredentials against a
    1s timeout to prove no hang, asserts the
    `'needs_user_choice'` outcome with envsWithKey=2, and locks
    down that pendingOrgs + pending tokens are populated for the
    env picker. Pre-fix the void return would have been
    `undefined` and the await would never have surfaced the
    deferred state explicitly.
  - Sanity siblings cover the 'resolved' (cached API key),
    'unauthenticated' (no stored token), and 'api_key_notice'
    (admin-only project) outcomes.

Updated the cli.test.ts mock to return
`{ outcome: 'unauthenticated' }` (the closest analogue to the
pre-fix `undefined`) so 17 test-timeout regressions on the
TUI-auth-task / feature-discovery suites stay green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: stamp deferredEnvCount on filter-mismatch path to avoid unauthenticated misclassification

Applied via @cursor push command

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
…/resume commands (PR 1 of 3)

Introduce src/lib/orchestration/ — a durable, file-backed orchestration
store that becomes the source of truth for sessions, tasks, subagents,
ownership, and last-stopping-point. Adds six new read-only CLI commands
(tasks/task/sessions/session/resume/orchestration status), each emitting
Zod-validated JSON envelopes for outer agents.

Foundation only. Legacy WizardSession remains the live in-memory
surface; PR 2 wires checkpoints + MCP-app lifecycle, PR 3 retires
duplicate state and ships the TUI redesign + MCP-server tool parity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

resolveCommonOpts now passes argv.installDir through resolveInstallDir
so quoted/env-sourced `~` actually expands instead of being treated
as a literal directory name.

The resume --execute path now imports spawn from utils/cross-platform-
spawn so the npm-installed `amplitude-wizard` .cmd shim resolves on
Windows. Node's built-in spawn does not consult PATHEXT and would fail
with ENOENT for every Windows user invoking `wizard resume --execute`.

`wizard resume <session-id>` validated the requested session existed but
then called `computeLastStoppingPoint(installDir)` without scoping, which
always derived its next action from the *most recently created active
session* — not the one the user asked about. The envelope's `sessionId`
field still echoed the requested ID, so the command/description shown
could describe a different session entirely.

`computeLastStoppingPoint` now accepts an optional `sessionId` that
restricts both session metadata and task buckets to that session. The
resume command threads the resolved session id through, and an added
test pins the scoping behavior against a two-session fixture.
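
The scoping fix can be pictured roughly as follows. This is a sketch only: the record shapes (`SessionRecord`, `TaskRecord`, `StoreFile`) and the `selectScope` helper are hypothetical and are not taken from the orchestration store's actual schema.

```typescript
// Hypothetical record shapes, for illustration only.
interface SessionRecord { id: string; createdAt: number; active: boolean }
interface TaskRecord { id: string; sessionId: string }
interface StoreFile { sessions: SessionRecord[]; tasks: TaskRecord[] }

interface Scope { session: SessionRecord | undefined; tasks: TaskRecord[] }

// With a sessionId, scope to exactly that session. Without one, fall back to
// the most recently created active session (the pre-fix behaviour for all calls).
function selectScope(file: StoreFile, sessionId?: string): Scope {
  const session =
    sessionId !== undefined
      ? file.sessions.find((s) => s.id === sessionId)
      : file.sessions
          .filter((s) => s.active)
          .sort((a, b) => b.createdAt - a.createdAt)[0];
  if (session === undefined) {
    return { session: undefined, tasks: [] };
  }
  const tasks = file.tasks.filter((t) => t.sessionId === session.id);
  return { session, tasks };
}

// Two-session fixture: the requested session is the older one.
const fixture: StoreFile = {
  sessions: [
    { id: "s1", createdAt: 1, active: true },
    { id: "s2", createdAt: 2, active: true },
  ],
  tasks: [
    { id: "t1", sessionId: "s1" },
    { id: "t2", sessionId: "s2" },
  ],
};
console.log(selectScope(fixture, "s1").tasks.map((t) => t.id));
```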
…kpoints + MCP-app lifecycle

Stacks on PR 1 (#689). Adds three typed checkpoint surfaces on top of
the v2 orchestration foundation:

- Choice — typed user-choice records with stable promptId for de-dup,
  requiresHuman automation gate, and full status transitions
  (pending → answered/expired/cancelled/superseded).
- Verification — manual out-of-band verification records with
  status transitions (pending → passed/failed/skipped, skipped/failed
  may recover to passed; passed/skipped/failed may supersede).
- McpAppCapability — durable lifecycle for every MCP-app capability
  with an anti-nag invariant: install_skipped → needs_user_choice
  REQUIRES a non-empty lastStateChangeReason.

New CLI commands:
- wizard choice list / show / answer (with --confirm-human gate)
- wizard verification list / show / mark

Wires last-stopping-point's pendingChoices / pendingMcpActions /
pendingManualVerifications arrays to read real records (was [] in PR 1).
Two callsites instrumented as the PR 2 wiring beachhead:
- env-selection in src/commands/helpers.ts (Choice mirror + answer)
- event-plan-approval in src/lib/wizard-tools.ts (Verification mirror)

Adds 42 tests across choices/verifications/mcp-app-lifecycle/last-stopping-point/CLI.
No TUI changes (deferred to PR 3); no MCP-server tool changes (deferred to PR 3).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The validation array did not include 'all', so passing the documented
'--status all' opt-out exited with INVALID_ARGS before reaching the
'statusRaw === all' branch on line 132. Mirror the verification list
guard: skip enum validation when statusRaw === 'all' and skip the cast
so it stays undefined for the listChoices call.
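
A sketch of the guard, assuming the status names listed for Choice records in the PR 2 description. `parseStatusFilter` and its result type are hypothetical; the real command code is not reproduced here.

```typescript
// Status names as listed in the PR 2 description for Choice records.
const CHOICE_STATUSES = ["pending", "answered", "expired", "cancelled", "superseded"] as const;
type ChoiceStatus = (typeof CHOICE_STATUSES)[number];

type ParsedStatus =
  | { ok: true; status: ChoiceStatus | undefined }
  | { ok: false; error: "INVALID_ARGS" };

// Hypothetical parser: 'all' bypasses enum validation and maps to `undefined`,
// which the listing call is assumed to treat as "no status filter".
function parseStatusFilter(statusRaw: string): ParsedStatus {
  if (statusRaw === "all") {
    return { ok: true, status: undefined };
  }
  const match = CHOICE_STATUSES.find((s) => s === statusRaw);
  if (match === undefined) {
    return { ok: false, error: "INVALID_ARGS" };
  }
  return { ok: true, status: match };
}

console.log(parseStatusFilter("all"), parseStatusFilter("bogus"));
```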
…sionAt

`TERMINAL_VERIFICATION_STATUSES` claimed `Failed` and `Skipped` were
terminal, but the allowed-transitions table explicitly permits
`Failed → Passed | Superseded` and `Skipped → Passed | Failed |
Superseded`. That contradicts the Choice convention (terminal = no
forward transitions other than re-supersede) and `last-stopping-point`
already treats `Failed` as actionable via `pendingManualVerifications`.
Reduce the terminal set to `Passed` and `Superseded`.
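
One way to keep the two from drifting again is to derive the terminal set from the transitions table instead of maintaining it by hand. The sketch below is an assumption, not the PR's code: the table is reconstructed from the transitions named in these commit messages and may omit edges the real one has.

```typescript
type VerificationStatus = "pending" | "passed" | "failed" | "skipped" | "superseded";

// Reconstructed from the commit text; the real table may contain more edges.
const ALLOWED_TRANSITIONS: Record<VerificationStatus, readonly VerificationStatus[]> = {
  pending: ["passed", "failed", "skipped"],
  passed: ["superseded"],
  failed: ["passed", "superseded"],
  skipped: ["passed", "failed", "superseded"],
  superseded: [],
};

// Terminal = no forward transitions other than being superseded.
function isTerminal(status: VerificationStatus): boolean {
  return ALLOWED_TRANSITIONS[status].every((next) => next === "superseded");
}

const TERMINAL_STATUSES = (Object.keys(ALLOWED_TRANSITIONS) as VerificationStatus[]).filter(isTerminal);
console.log(TERMINAL_STATUSES);
```

With this table, `failed` and `skipped` drop out of the derived set because each can still move to `passed`.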

`transitionMcpCapability` set `userDecision = 'pending'` on a transition
back to `NeedsUserChoice` but left a stale `userDecisionAt` from the
previous installed/skipped state. Consumers checking
`userDecisionAt !== null` would incorrectly conclude a decision had been
made. Null it out alongside the pending decision.
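
A sketch of the corrected transition, combined with the anti-nag invariant from the PR 2 description. The `Capability` shape, the non-`pending` decision values, and `toNeedsUserChoice` are hypothetical; only the state names `install_skipped` / `needs_user_choice` and the fields `userDecision`, `userDecisionAt`, and `lastStateChangeReason` come from the text above.

```typescript
// Hypothetical, trimmed-down capability record.
interface Capability {
  state: "needs_user_choice" | "installed" | "install_skipped";
  userDecision: "pending" | "accepted" | "skipped";
  userDecisionAt: string | null;
  lastStateChangeReason: string;
}

function toNeedsUserChoice(cap: Capability, reason: string): Capability {
  // Anti-nag invariant: re-asking after a skip needs a stated reason.
  if (cap.state === "install_skipped" && reason.trim() === "") {
    throw new Error("install_skipped -> needs_user_choice requires a non-empty reason");
  }
  return {
    ...cap,
    state: "needs_user_choice",
    userDecision: "pending",
    // Clear the timestamp too, so `userDecisionAt !== null` keeps meaning
    // "a decision has been made".
    userDecisionAt: null,
    lastStateChangeReason: reason,
  };
}

const skippedCap: Capability = {
  state: "install_skipped",
  userDecision: "skipped",
  userDecisionAt: "2026-05-09T00:00:00Z",
  lastStateChangeReason: "user skipped install",
};
console.log(toNeedsUserChoice(skippedCap, "capability is now required"));
```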
- `resume --execute` now attaches `child.on('error', …)` before `exit`.
  Previously a synchronous spawn failure (ENOENT, EACCES, missing PATH
  entry on Windows) fired an unhandled `error` event, which Node's
  EventEmitter rethrows — crashing the CLI with a stack trace instead
  of producing a clean message + GENERAL_ERROR exit.
- `saveStore` was calling `ensureDir(dirname(path))` and then
  `ensureDir(getRunDir(installDir))` — both resolve to the same run
  directory because `getOrchestrationStoreFile()` is defined as
  `join(getRunDir(installDir), 'orchestration.json')`. Drop the second
  call and the now-unused `getRunDir` import.
- `computeLastStoppingPoint` already filtered tasks by `options.sessionId`
  but read the full unfiltered `file.choices` / `file.mcpCapabilities` /
  `file.verifications` arrays. `wizard resume <session-id>` could surface
  pending checkpoints belonging to a different (more recently active)
  session, producing a misleading `nextAction`. Filter each by the
  session-link field on the record (`linkedSessionId` for choices and
  MCP capabilities, `blockingSessionId` for verifications) so all four
  buckets stay consistent with the requested session.
- `choice.ts` and `verification.ts` had inline `resolveCommonOpts` /
  `emitJson` / `emitJsonError` that omitted the `resolveInstallDir`
  call done correctly in `orchestration.ts`. A user passing
  `--install-dir ~/myapp` would resolve to `<cwd>/~/myapp` instead of
  the home-relative path, silently writing to the wrong store. Extract
  the helpers to a shared `orchestration-common.ts` and switch all
  three command modules to it so the `resolveInstallDir` fix applies
  uniformly and future drift is impossible.
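
The first bullet, attaching `error` before `exit`, might look roughly like the sketch below. `ChildLike`, `RunResult`, and `watchChild` are hypothetical names; `ChildLike` is a deliberately small structural type standing in for the child-process object, so the wiring can be shown without actually spawning anything.

```typescript
// Minimal structural stand-in for the child process (an assumption for this sketch).
interface ChildLike {
  on(event: string, listener: (arg: unknown) => void): unknown;
}

type RunResult = { ok: true } | { ok: false; message: string };

function watchChild(child: ChildLike, done: (result: RunResult) => void): void {
  let settled = false;
  const finish = (result: RunResult): void => {
    if (settled) return; // report only the first outcome
    settled = true;
    done(result);
  };
  // Registered first: an `error` event with no listener is rethrown by the emitter.
  child.on("error", (err) => {
    const reason = err instanceof Error ? err.message : String(err);
    finish({ ok: false, message: `could not start child: ${reason}` });
  });
  child.on("exit", (code) => {
    finish(code === 0 ? { ok: true } : { ok: false, message: `child exited with code ${String(code)}` });
  });
}

// Fake child that just records its listeners, to exercise the wiring.
const listeners = new Map<string, (arg: unknown) => void>();
const fakeChild: ChildLike = {
  on: (event, listener) => {
    listeners.set(event, listener);
  },
};
const results: RunResult[] = [];
watchChild(fakeChild, (r) => results.push(r));
listeners.get("error")?.(new Error("spawn amplitude-wizard ENOENT"));
console.log(results);
```

The caller can then map a failed result to a clean message and exit code instead of letting the stack trace escape.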
`deriveNextAction` builds an `inspect_failure` next-action when the
most recent task has stopped. The structured `command` array uses
the configurable `cliPrefix` (sourced from `args.invocation`, which
flows from `options.cliInvocation` on `computeLastStoppingPoint`),
but the inline shell hint embedded in `description` was templating
the hardcoded module-level `CLI_INVOCATION` constant. A custom
invocation (e.g. an alternate `wizard` symlink, or a test harness
overriding the binary name) would surface a description that says
\`amplitude-wizard task <id>\` while the JSON payload's `command`
points at the configured executable. Use `cliPrefix.join(' ')` for
both so the human and machine views always agree.

`resumeCommand` is the human-facing copy-pasteable form of
`nextAction.command`. It was built with `nextAction.command.join(' ')`,
which silently corrupts paths containing spaces (e.g. an `installDir`
of `/Users/me/my project` would land in the shell as two separate
words). The structured `command` array stayed correct, but the string
the user is invited to paste into a terminal would fail.

Add a small `shellJoin` / `shellQuote` helper that wraps tokens with
shell metacharacters or whitespace in single quotes (with the standard
`'\''` close/escape/reopen dance for embedded single quotes). Tokens
that are already shell-safe stay unquoted so the common case stays
readable.
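
A minimal sketch of what such a helper could look like. The function names match the ones mentioned above, but the bodies and the exact set of characters treated as shell-safe are assumptions, not the PR's implementation.

```typescript
// Characters assumed safe to leave unquoted in a POSIX shell word.
const SHELL_SAFE = /^[A-Za-z0-9_@%+=:,.\/-]+$/;

function shellQuote(token: string): string {
  if (token === "") {
    return "''"; // an empty argument must still appear as a word
  }
  if (SHELL_SAFE.test(token)) {
    return token; // common case stays readable
  }
  // Close the quote, emit an escaped single quote, reopen: ' -> '\''
  return `'${token.replace(/'/g, `'\\''`)}'`;
}

function shellJoin(tokens: readonly string[]): string {
  return tokens.map(shellQuote).join(" ");
}

console.log(shellJoin(["amplitude-wizard", "resume", "--install-dir", "/Users/me/my project"]));
// prints: amplitude-wizard resume --install-dir '/Users/me/my project'
```

Single quotes are used because a POSIX shell performs no expansion inside them, so the only character that needs special handling is the single quote itself.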
`HeaderBar` already gates the mode badge on `resolved.key !==
'interactive'`, so the default interactive run never sees a stray
`[interactive]` chip. `StatusOverlayScreen` rendered the badge
unconditionally in its header, so opening `/status` during a normal
interactive run printed `[interactive]` next to "Operator overview"
— which the brief explicitly says is noise. Mirror HeaderBar's gate
and add tests covering both branches (suppressed in interactive,
visible in agent mode).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kelsonpw kelsonpw force-pushed the feat/v2-tui-redesign branch from 781c24b to 2c93edd Compare May 10, 2026 14:25
The Progress tab on `feat/v2-tui-redesign` used a single-line
`wrap="truncate-end"` row to display every planned event name
joined by commas:

  ◆ Events: Nf User Signed Up, Nf User Signed In, Nf Use…

Once the agent fills in 10+ events, the line truncates to "…"
even though the screen has a full column of empty rows below it
— the active task list collapses to ~5 lines once Wiring is the
focused step. The user can't audit the plan they just approved
without flipping to /events.

Render one event per row with name + description, soft-wrapped.
The bullet (`·`) lines up with the existing diamond glyph
column. The `(N events)` count gives an at-a-glance scale check.

No data-model change — `PlannedEvent` is still `{name, description}`.
Per-event lifecycle status (queued → in_progress → done) is what #698
against `main` covers, and is a separate change.

27 RunScreen tests still pass; no test asserted on the comma-join
shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kelsonpw and others added 2 commits May 10, 2026 07:48
The previous ReportViewer wrapped each visible line in
`<Text wrap="truncate">`, which silently dropped the right edge of any
line wider than the content area — events-table rows and code blocks
with long paths or JSON payloads got a stray "…" decoration and the
user had no way to read what was clipped.

Add ANSI-aware horizontal panning so the user can shift the visible
window left/right with `h`/`l` (or arrow keys), keep colour codes
intact across the slice, and surface the full LogViewer-style key
hint footer (↑↓/jk scroll · h/l pan · g/G top/bottom · 0 reset · Esc
close). The pan offset is clamped to the widest line so users can't
scroll into empty whitespace.
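
A rough sketch of the slicing and clamping idea, under simplifying assumptions: only SGR colour sequences (`ESC [ … m`) are recognised, and every visible character is assumed to be one column wide. `sliceAnsi`, `visibleWidth`, and `clampPan` are hypothetical names, not the ReportViewer's actual helpers.

```typescript
// Matches SGR colour sequences such as "\u001b[31m"; other escapes are not handled.
const SGR_PATTERN = /\u001b\[[0-9;]*m/g;

function visibleWidth(line: string): number {
  return line.replace(SGR_PATTERN, "").length;
}

// Returns the SGR sequence starting at `index`, or null if there is none.
function readSgrAt(line: string, index: number): string | null {
  if (line.charAt(index) !== "\u001b" || line.charAt(index + 1) !== "[") return null;
  let end = index + 2;
  while (end < line.length && /[0-9;]/.test(line.charAt(end))) end++;
  return line.charAt(end) === "m" ? line.slice(index, end + 1) : null;
}

// Keep every colour code (so styling carries into the window) but only the
// visible characters whose column falls in [start, start + width).
function sliceAnsi(line: string, start: number, width: number): string {
  let out = "";
  let col = 0;
  let sawEscape = false;
  for (let i = 0; i < line.length; ) {
    const sgr = readSgrAt(line, i);
    if (sgr !== null) {
      out += sgr;
      sawEscape = true;
      i += sgr.length;
      continue;
    }
    if (col >= start && col < start + width) out += line.charAt(i);
    col++;
    i++;
  }
  return sawEscape ? out + "\u001b[0m" : out; // reset so colour never leaks out
}

// Clamp the pan offset so the window never moves past the widest line.
function clampPan(offset: number, lines: readonly string[], contentWidth: number): number {
  const widest = lines.reduce((max, l) => Math.max(max, visibleWidth(l)), 0);
  return Math.min(Math.max(0, offset), Math.max(0, widest - contentWidth));
}

console.log(JSON.stringify(sliceAnsi("\u001b[31mhello\u001b[0m world", 3, 5)));
```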

Also include a regression test covering: (a) horizontal pan offset
shifts visible content, (b) lines wider than the content area are no
longer clipped at the right edge, (c) the key-hint footer surfaces
all documented controls, and (d) `0` resets the pan offset.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…added/removed (#707)

The event-plan approval flow used to feel like list replacement. Each
time the user pressed `[F] give feedback`, the agent's revised plan
silently overwrote the prior list, so the user lost the context of what
they had just said and what the AI changed in response. The recent
"keep event names snake cased and prefixed" feedback round made this
visible: the AI applied the prefix, but the user couldn't tell from the
new screen alone whether snake_case had also landed. It had not (the
display normalizer was Title-Casing), and without an explicit
"AI: revised plan +N -M" signal nothing about the conversation was
legible.

Add a round-history layer so the screen can render the back-and-forth:

* Store now keeps `EventPlanRound[]`, one entry per `promptEventPlan`
  call. Each round carries the AI's plan + the user feedback (if any)
  that produced it. Cleared on `approved` / `skipped`; persists across
  `revised` rounds.
* `pendingPlanFeedback` (instance field) buffers feedback typed in one
  `[F]` decision and pairs it with the next `promptEventPlan` from the
  agent. Single-pair carry — no leakage across runs.
* EventPlanFullScreen renders a conversational header on rounds ≥ 2:
  - "You: <quoted feedback>"
  - "AI: revised plan +N added · −M removed" (with green/red counts)
* Per-row diff markers when a prior round exists:
  - `+` (green) for events new to this round
  - `−` (red, struck-through) for events the AI dropped
  - bullet (`·`) for unchanged events
  Diff is by name (description regen on revision is expected).
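
The by-name diff between two rounds can be sketched as below. `PlannedEvent` uses the `{name, description}` shape mentioned earlier in this PR; `diffRounds`, `DiffRow`, and the choice to list removed events after the current ones are assumptions for illustration.

```typescript
interface PlannedEvent { name: string; description: string }

type DiffMarker = "+" | "−" | "·";
interface DiffRow { marker: DiffMarker; event: PlannedEvent }
interface RoundDiff { rows: DiffRow[]; added: number; removed: number }

// Compare by name only: a regenerated description does not count as a change.
function diffRounds(prev: readonly PlannedEvent[], next: readonly PlannedEvent[]): RoundDiff {
  const prevNames = new Set(prev.map((e) => e.name));
  const nextNames = new Set(next.map((e) => e.name));
  const rows: DiffRow[] = next.map((event) => ({
    marker: prevNames.has(event.name) ? "·" : "+",
    event,
  }));
  // Dropped events are appended after the current plan (ordering is an assumption).
  for (const event of prev) {
    if (!nextNames.has(event.name)) rows.push({ marker: "−", event });
  }
  return {
    rows,
    added: rows.filter((r) => r.marker === "+").length,
    removed: rows.filter((r) => r.marker === "−").length,
  };
}

const round1: PlannedEvent[] = [
  { name: "user_signed_up", description: "Fired after signup" },
  { name: "user_signed_in", description: "Fired after login" },
];
const round2: PlannedEvent[] = [
  { name: "user_signed_in", description: "Fired when a session starts" },
  { name: "plan_upgraded", description: "Fired on upgrade" },
];
const roundDiff = diffRounds(round1, round2);
console.log(`AI: revised plan +${roundDiff.added} added · −${roundDiff.removed} removed`);
```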

Round 1 still renders the original "Suggested events for your app"
title — the convo affordance only appears once there's actually a
conversation to render.

Tests:
* 3 new cases in EventPlanFullScreen.test.tsx — round-2 quote+delta
  rendering, round-1 fallback, history clear on approve.
* All 174 existing store tests + 6 existing screen tests stay green.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kelsonpw

V2 PR 5 (full TUI v2 screen-tree redesign) has been superseded by the V3 polish work, some landed and some pending, across the TUI v3 chain. Closing per direction change. Audit by subagent a209c551541d85df3.

@kelsonpw kelsonpw closed this May 13, 2026
This was referenced May 14, 2026