feat(tui): v3 foundation — low-risk consensus wins from 5-subagent audit#713
Closed
kelsonpw wants to merge 45 commits into
Closed
feat(tui): v3 foundation — low-risk consensus wins from 5-subagent audit#713kelsonpw wants to merge 45 commits into
kelsonpw wants to merge 45 commits into
Conversation
…/resume commands (PR 1 of 3) Introduce src/lib/orchestration/ — a durable, file-backed orchestration store that becomes the source of truth for sessions, tasks, subagents, ownership, and last-stopping-point. Adds six new read-only CLI commands (tasks/task/sessions/session/resume/orchestration status), each emitting Zod-validated JSON envelopes for outer agents. Foundation only. Legacy WizardSession remains the live in-memory surface; PR 2 wires checkpoints + MCP-app lifecycle, PR 3 retires duplicate state and ships the TUI redesign + MCP-server tool parity. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
resolveCommonOpts now passes argv.installDir through resolveInstallDir so quoted/env-sourced `~` actually expands instead of being treated as a literal directory name. The resume --execute path now imports spawn from utils/cross-platform- spawn so the npm-installed `amplitude-wizard` .cmd shim resolves on Windows. Node's built-in spawn does not consult PATHEXT and would fail with ENOENT for every Windows user invoking `wizard resume --execute`.
`wizard resume <session-id>` validated the requested session existed but then called `computeLastStoppingPoint(installDir)` without scoping, which always derived its next action from the *most recently created active session* — not the one the user asked about. The envelope's `sessionId` field still echoed the requested ID, so the command/description shown could describe a different session entirely. `computeLastStoppingPoint` now accepts an optional `sessionId` that restricts both session metadata and task buckets to that session. The resume command threads the resolved session id through, and an added test pins the scoping behavior against a two-session fixture.
…kpoints + MCP-app lifecycle Stacks on PR 1 (#689). Adds three typed checkpoint surfaces on top of the v2 orchestration foundation: - Choice — typed user-choice records with stable promptId for de-dup, requiresHuman automation gate, and full status transitions (pending → answered/expired/cancelled/superseded). - Verification — manual out-of-band verification records with status transitions (pending → passed/failed/skipped, skipped/failed may recover to passed; passed/skipped/failed may supersede). - McpAppCapability — durable lifecycle for every MCP-app capability with an anti-nag invariant: install_skipped → needs_user_choice REQUIRES a non-empty lastStateChangeReason. New CLI commands: - wizard choice list / show / answer (with --confirm-human gate) - wizard verification list / show / mark Wires last-stopping-point's pendingChoices / pendingMcpActions / pendingManualVerifications arrays to read real records (was [] in PR 1). Two callsites instrumented as the PR 2 wiring beachhead: - env-selection in src/commands/helpers.ts (Choice mirror + answer) - event-plan-approval in src/lib/wizard-tools.ts (Verification mirror) Adds 42 tests across choices/verifications/mcp-app-lifecycle/last-stopping-point/CLI. No TUI changes (deferred to PR 3); no MCP-server tool changes (deferred to PR 3). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The validation array did not include 'all', so passing the documented '--status all' opt-out exited with INVALID_ARGS before reaching the 'statusRaw === all' branch on line 132. Mirror the verification list guard: skip enum validation when statusRaw === 'all' and skip the cast so it stays undefined for the listChoices call.
…sionAt `TERMINAL_VERIFICATION_STATUSES` claimed `Failed` and `Skipped` were terminal, but the allowed-transitions table explicitly permits `Failed → Passed | Superseded` and `Skipped → Passed | Failed | Superseded`. That contradicts the Choice convention (terminal = no forward transitions other than re-supersede) and `last-stopping-point` already treats `Failed` as actionable via `pendingManualVerifications`. Reduce the terminal set to `Passed` and `Superseded`. `transitionMcpCapability` set `userDecision = 'pending'` on a transition back to `NeedsUserChoice` but left a stale `userDecisionAt` from the previous installed/skipped state. Consumers checking `userDecisionAt !== null` would incorrectly conclude a decision had been made. Null it out alongside the pending decision.
- `resume --execute` now attaches `child.on('error', …)` before `exit`.
Previously a synchronous spawn failure (ENOENT, EACCES, missing PATH
entry on Windows) fired an unhandled `error` event, which Node's
EventEmitter rethrows — crashing the CLI with a stack trace instead
of producing a clean message + GENERAL_ERROR exit.
- `saveStore` was calling `ensureDir(dirname(path))` and then
`ensureDir(getRunDir(installDir))` — both resolve to the same run
directory because `getOrchestrationStoreFile()` is defined as
`join(getRunDir(installDir), 'orchestration.json')`. Drop the second
call and the now-unused `getRunDir` import.
- `computeLastStoppingPoint` already filtered tasks by `options.sessionId` but read the full unfiltered `file.choices` / `file.mcpCapabilities` / `file.verifications` arrays. `wizard resume <session-id>` could surface pending checkpoints belonging to a different (more recently active) session, producing a misleading `nextAction`. Filter each by the session-link field on the record (`linkedSessionId` for choices and MCP capabilities, `blockingSessionId` for verifications) so all four buckets stay consistent with the requested session. - `choice.ts` and `verification.ts` had inline `resolveCommonOpts` / `emitJson` / `emitJsonError` that omitted the `resolveInstallDir` call done correctly in `orchestration.ts`. A user passing `--install-dir ~/myapp` would resolve to `<cwd>/~/myapp` instead of the home-relative path, silently writing to the wrong store. Extract the helpers to a shared `orchestration-common.ts` and switch all three command modules to it so the `resolveInstallDir` fix applies uniformly and future drift is impossible.
`deriveNextAction` builds an `inspect_failure` next-action when the
most recent task has stopped. The structured `command` array uses
the configurable `cliPrefix` (sourced from `args.invocation`, which
flows from `options.cliInvocation` on `computeLastStoppingPoint`),
but the inline shell hint embedded in `description` was templating
the hardcoded module-level `CLI_INVOCATION` constant. A custom
invocation (e.g. an alternate `wizard` symlink, or a test harness
overriding the binary name) would surface a description that says
\`amplitude-wizard task <id>\` while the JSON payload's `command`
points at the configured executable. Use `cliPrefix.join(' ')` for
both so the human and machine views always agree.
`resumeCommand` is the human-facing copy-pasteable form of
`nextAction.command`. It was built with `nextAction.command.join(' ')`,
which silently corrupts paths containing spaces (e.g. an `installDir`
of `/Users/me/my project` would land in the shell as two separate
words). The structured `command` array stayed correct, but the string
the user is invited to paste into a terminal would fail.
Add a small `shellJoin` / `shellQuote` helper that wraps tokens with
shell metacharacters or whitespace in single quotes (with the standard
`'\''` close/escape/reopen dance for embedded single quotes). Tokens
that are already shell-safe stay unquoted so the common case stays
readable.
…estration commands The structured `lsp.nextAction.command` array is correct, but joining it with ` ` for the human-facing `Resume:` line splits paths-with-spaces into multiple shell words. Switch every human-display callsite to `lsp.resumeCommand`, which already shellQuotes via `shellJoin`. Mirrors the fix that was applied to the LSP envelope, but covers the remaining `wizard resume` and `wizard orchestration status` print paths.
…ty + perf hot-paths + resilience Stacks on #690 (which stacks on #689). Merge after PRs 1 + 2. PR 3 lands the state-driven foundation that the broader v2 TUI redesign will sit on. Five concerns, all additive — every PR 1 + PR 2 surface keeps working unchanged. A. TUI v2 wiring — `/status` overlay renders the same data `wizard orchestration status --json` emits, sectioned for human reading. ManualVerificationRibbon mounts on OutroScreen so success- looking UI cannot appear while a verification is pending. ChoiceCheckpointBanner is a reusable primitive for surfacing typed Choice records with the full UX contract (why-asking, recommended, safe-default, reversibility, consequence-if-skipped). B. MCP-server tool parity — every read-only orchestration CLI command now has a matching MCP tool. Both surfaces call into the same builders in `src/lib/orchestration/envelopes.ts`, so output is byte-for-byte identical (modulo `generatedAt`). Server stays read-only by design — mutators stay on the CLI. C. Perf hot-paths — `withReadCache(fn)` amortises store reads across builders inside one command/tool invocation. `per-run-cache.ts` memoises repeated `gh pr view` / MCP-availability calls within a single run. D. Bugs found and fixed — - success-looking UI while blocked on a verification → ribbon - choices asked again after a durable answer → addChoice de-dup (covered in PR 2; regression test added) - skipped MCP apps not remembered → covered by anti-nag invariant (PR 2; surfaced via /status) Background agents continuing after cancellation: out of scope — call out as known limitation. E. Resilience — token-expired-during-long-task. agent-runner's AUTH_ERROR branch now mirrors the K/R question to a durable Choice (kind=keep_or_revert_files) plus a manual_pr_test Verification. `wizard status --json` thereafter shows `nextAction.kind === 'await_user_choice'`. F. Tests — 40+ new tests: - envelope schema parity (CLI ↔ MCP tool) - StatusOverlay rendering all sections - ChoiceCheckpoint UX contract (every required field surfaced) - OutroScreen verification ribbon regression - per-run-cache (memoize / memoizeAsync / invalidate) - auth-error resilience (Choice + Verification + LSP shape) - perf-status-cold (internal-cold-start bound) All 3919 unit tests pass; 100/100 BDD scenarios pass. G. Docs — extended `docs/orchestration.md` with PR 3 sections (TUI integration model, envelopes layer, MCP tool parity table, perf measurements, resilience flow). New `docs/agent-consumability.md` covers CLI / MCP / NDJSON consumption with worked examples (Claude Code, Cursor, CI bots, watchdogs). README + CLAUDE.md updated. Out of scope (future PRs): - Full TUI screen-tree redesign / information-architecture refactor. - Widening the Choice/Verification wiring beachhead beyond env-selection + event-plan-approval. - Retiring legacy `WizardSession`. - esbuild-bundled CLI for sub-200ms cold-start. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…bol instance Three bugbot findings on PR 3: 1. Auth-error mirror addVerification was not deduplicated against duplicate AUTH_ERROR fires, mirroring the existing addChoice findPendingChoice guard. Add a listVerifications(kind+session+ pending) check and a regression test pinning verification idempotency on the second mirror call. 2. envelopes.ts symKey used Symbol.prototype.toString which returns 'Symbol(<description>)' — NOT unique per instance. Replace the string-keyed read-cache with a Map<symbol, Map<installDir, file>> keyed by the actual Symbol object. The withReadCache cleanup collapses to a single readCache.delete(key); nested or async scopes no longer alias to the same key. 3. buildStatusEnvelope did not forward cacheKey, so the /status overlay triggered an extra independent store.read() per render. Forward the cacheKey through readStoreCached and pass the cached file into computeLastStoppingPoint via a new optional storeFile override. StatusOverlayScreen now passes cacheKey through to buildStatusEnvelope.
…reads
`buildLastStoppingPointEnvelope` and `buildResumeEnvelope` accepted
`BuilderOpts.cacheKey` but called `computeLastStoppingPoint` without
threading the cached snapshot through, so callers that passed
`cacheKey` expecting `withReadCache`-shared semantics silently
triggered an independent disk read. Both builders now route through
`readStoreCached` and pass `{ storeFile }` into
`computeLastStoppingPoint`.
The `status` and `tasks` command handlers were also computing
`computeLastStoppingPoint(installDir)` and `store.listTasks(...)`
unconditionally before branching on `--json`, even though those values
were only used in the human path. Move the precomputation into the
non-JSON branch so the JSON hot path stays read-once via the envelope
builder. Added an envelopes test that spies on `store.read()` and
asserts both LSP and resume builders incur zero extra reads inside a
`withReadCache` scope.
`buildResumeEnvelope` previously called
`computeLastStoppingPoint(opts.installDir, { storeFile: file })` without
forwarding `opts.sessionId`. The CLI's human-readable resume path
correctly scopes the LSP to the requested session, but the JSON path
delegated to this builder and computed the LSP for the most-recently
active session instead — so `wizard resume <session-id> --json` could
emit the wrong `command` / `description` when multiple sessions
existed and the requested one wasn't the most recent. Pass `sessionId`
through so both paths resolve identically.
…cation list `wizard choice list` and `wizard verification list` were calling `store.listChoices(...)` / `store.listVerifications(...)` before the `if (opts.jsonOutput)` branch. The JSON path then re-reads the same data internally via `buildChoicesEnvelope` / `buildVerificationsEnvelope`, so the first read was wasted on a hot agent path that's expected to stay tight. Move the local list call into the human-readable `else` branch — mirrors the same fix already applied to `status` and `tasks`.
The `resume` command computed `lsp = computeLastStoppingPoint(...)` unconditionally before the JSON / human branch split, then `buildResumeEnvelope` called `computeLastStoppingPoint` again internally — duplicating both the LSP traversal and the underlying store read on the JSON hot path. The same pattern was already fixed for `status` and `tasks` (and now `choice list` / `verification list`) but the `resume` handler was missed. Wrap the LSP lookup in an `ensureResumeCommand()` lazy accessor so the JSON path takes a single read (via `buildResumeEnvelope`), the human path computes the LSP once for the printed command, and `--execute` reuses whichever was already computed.
…awn for gh
- `wizard sessions` was calling `store.listSessions()` before the JSON
branch even though `buildSessionsEnvelope` re-reads internally; same
with `wizard sessions show <id>` which pre-computed
`store.listTasks({ sessionId })`. Move both into the human `else`
branch — completes the "skip pre-read on JSON path" sweep across
every list/show handler in this file.
- `runGh` in `per-run-cache.ts` imported `spawn` directly from
`node:child_process`. On Windows, `gh` installed via Scoop or
Chocolatey is exposed as a `.cmd` shim, which Node's built-in spawn
refuses to resolve (it doesn't consult PATHEXT). Switch to the
project's existing `cross-platform-spawn` wrapper — drop-in via
the typed re-export, no other call-site changes needed.
…dundancy + supervisor + live status refresh (stacks on #691) Stacks on #691 → #690 → #689. Merge after PRs 1+2+3. ## Summary - **Beachhead widening**: centralized `record*Choice` / `record*Verification` helpers in `src/lib/orchestration/wiring.ts` and wired them through every major user-choice and manual-verification surface in the wizard (MCP install, Slack, region select, OAuth browser login, project creation, dashboard setup, event-plan revision, logout). Existing TUI screens / agent prompts continue to drive the user-facing flow; the orchestration store mirror is ADDITIVE so outer agents inspecting `wizard status --json` see typed records. Mirror failures swallow + log so they NEVER break the user-facing path. - **WizardSession boundary**: docblock at the top of `wizard-session.ts` now spells out the contract — `WizardSession` = transient TUI display state; `OrchestrationStore` = durable orchestration state; never duplicate fields between them. Audit table in `docs/orchestration.md` (PR 4 section) walks every field. PR 4 deletes zero fields by design — the redundant *concept* (Subagent / Task / Ownership double-bookkeeping) was already avoided in PR 1; PR 4 cements the contract for PR 5's screen-tree redesign. - **Background-agent supervision**: new `Supervisor` class in `src/lib/orchestration/supervisor.ts`. Tracks subprocess PIDs that map to `Subagent` rows, writes `<runDir>/heartbeats/<pid>.txt` every 5s, SIGTERMs on SIGINT/SIGTERM (with 5s grace before SIGKILL), reaps stale heartbeats (>30s old + PID gone) by transitioning the rooted Task to `cancelled`. Startup recovery transitions orphaned-but-running Tasks to `failed: 'process gone'`. Eliminates the "stopped agents shown as running" drift. - **Live `/status` refresh**: new `watchOrchestrationStore` (debounced 200ms, watches the parent dir to survive `atomicWriteJSON`'s rename) + `useOrchestrationStore` React hook. `StatusOverlayScreen` plumbs the hook in so the overlay re-renders when a sibling shell mutates the store via `wizard choice answer`, `wizard verification mark`, etc. ## Tests +30 tests (3919 → 3949 vitest, 100/100 BDD): - 20 wiring tests (each `record*` helper, dedup invariant, answerByPromptId, anti-nag re-record, verification mark-passed contract) - 5 supervisor tests (track + heartbeat write, terminateAll + signal/marking, stale-heartbeat reap, recoverOrphanedSubagents, untrack) - 5 watcher tests (write fires onChange, debounce coalesces a burst, dispose idempotency, no-fire-after-dispose, late-mount before file exists) All test surfaces: - `pnpm exec vitest run --pool=forks --maxWorkers=1` → 3949/3949 - `pnpm test:bdd` → 100/100 - `pnpm build` → green - `pnpm lint` → green (1 pre-existing warning unchanged) - `pnpm exec tsc --noEmit -p tsconfig.json` → clean ## Backward compatibility - No public-contract changes. `wizard status --json`, `wizard choice list`, `wizard verification list`, and the MCP server's read-only tools all keep emitting the same envelope shapes; PR 4 just produces *more* records in them. - AI-SDK migration unaffected — no fields removed from `WizardSession`. ## Known limitations - TUI screen-tree redesign still PR 5. - Cold-start bundling still a follow-up. - Some less-trafficked prompt surfaces (the inner agent's `choose` tool, per-tool MCP auth confirmations) intentionally keep their existing transient-text path. The audit table in `docs/orchestration.md` documents what was wired and what was skipped (and why). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- The signal-handler comment in `Supervisor.ensureSignalHandlers` claimed
the wizard "terminate every tracked child synchronously, then re-raise
the signal." Neither is accurate: `terminateAll` is async and we never
call `process.kill(process.pid, sig)`. The clean-exit guarantee comes
from independent `process.once` listeners installed elsewhere — Node
fires every registered listener for the same signal. Rewrite the
comment to match what the code actually does so future maintainers
don't rely on a re-raise that never happens.
- `setDataIngestionConfirmed` was performing two separate
`await import('../../lib/orchestration/checkpoints/verifications.js')`
calls (one for `VerificationStatus`, one for `asVerificationId`).
Combine them into a single destructured import to drop a redundant
microtask + cache lookup.
…iple
The mirror that records the user's environment pick into the
orchestration store as an answered Choice was matching the user's
selection back to a choice option via `o.label.endsWith('/ ' + env)`.
Labels are formatted as `"orgName / projectName / envName"`, so two
orgs or two projects each carrying a "Production" environment would
collide — `Array.find` returned the first matching triple regardless
of which the user actually picked, recording the wrong optionId in
the orchestration store. Outer agents reading `wizard choice show`
would see an answer that pointed at a different env than the wizard
itself proceeded with.
Carry the (orgId, projectId, envName) triple alongside the public
{id, label, description} option shape (Zod's default `.strip()` drops
the extra fields before persistence) and match on the full triple.
The `selection` object returned by `promptEnvironmentSelection` already
carries `{ orgId, projectId, env }`, so this is exact, not heuristic.
…pe MCP promptId - The orchestration watcher's polling fallback was clearing its interval *before* calling `tryAttach()`. If the parent dir existed but `fs.watch` threw on the second poll (EMFILE from too many open fds, EACCES, EBUSY) the watcher entered a permanently broken state with no active handle and no retry — `/status` overlays silently stopped refreshing. Only stop polling once `tryAttach` actually succeeds. - The MCP `promptId` normalization (`client.toLowerCase().replace(/\s+/g, '_')`) was independently reimplemented in `McpScreen.tsx` (two locations) alongside the canonical version in `wiring.ts`. Drift between them was a silent failure mode — `answerChoiceByPromptId` returns null on miss, which the call sites swallow by design, so a typo would leave the choice in `pending` forever. Export `mcpInstallPromptId(client)` from `wiring.ts` and route both `McpScreen.tsx` call sites through it.
… listener `ensureSignalHandlers` registers `SIGINT` / `SIGTERM` listeners with `process.once` (so each fires at most once) and gates re-installation on a `signalHandlersInstalled` boolean. The boolean was being set to `true` and never cleared — so once a signal arrived and consumed the `once` listener, the flag stayed permanently `true` and any subsequent `track()` call short-circuited without re-installing the handler. If the process survives that first signal (a higher-level listener keeps it alive, a CI harness sends a stray SIGINT, etc.), the next signal would land with no `terminateAll` cleanup — leaving the still tracked subagent processes to be orphaned on the OS. Reset `signalHandlersInstalled = false` inside the handler so the next `ensureSignalHandlers()` call re-registers a fresh pair.
…turns null Builders re-read the orchestration store internally, so a TOCTOU race between the existence check above and the envelope build can produce a null envelope. Previously the JSON path emitted an error payload but fell through to a 0 exit, masking the failure from orchestrators that key on exit codes. Now each null branch (choice show / task show / session show / verification show) calls process.exit with the matching code (CHOICE_NOT_FOUND, INVALID_ARGS, VERIFICATION_NOT_FOUND).
`recordDashboardCorrectnessVerification` was being called twice for the same dashboard URL: once directly in `create-dashboard.ts` and again via `store.setChecklistDashboardUrl` (triggered by `ui.setDashboardUrl`). Verifications have no dedup guard, so outer agents and `/status` saw two pending entries for the same URL. Drop the direct call in `create-dashboard.ts` — the store-side mirror already records it for both the cached-existing-dashboard path and the fallback create path. The `dashboard_setup` choice still recorded here since the store doesn't mirror that.
…rdown (stacks on #693) Stacks on #693 → #691 → #690 → #689. Merge after PRs 1+2+3+4. PR 5 turns the TUI from "screens that mostly work" into a serious operator interface with a coherent IA, shared glyph vocabulary, and render-cost discipline. IA redesign: - Three-zone layout (header / body / chrome). - Header: JourneyStepper + identity + mode badge. Mode badge surfaces agent / ci / nested / mcp-server states; suppressed in plain interactive mode. - Operator Overview screen (`/status`) reframed: title + mode badge + 1-line summary, then sectioned by Session / Primary work / Background / Pending choices / Pending verifications / MCP capabilities / Owned artifacts / Next action. Live-refresh on orchestration store mutations via PR 4's file-watcher hook. Glyph palette (canonical vocabulary): ○ queued · › running · … waiting · ⏸ blocked · ✓ completed ✗ failed · ⊘ cancelled · ⮕ superseded Centralized in `src/ui/tui/utils/lifecycle-display.ts` so a future "swap one glyph" change is a one-line edit, not a hunt across the screen tree. Pinned by unit tests so silent drift trips a test. Slash command coherence: - New `/help` command lists every registered command grouped by "available anytime" vs "available before/after a setup run". When a run is active, the second group is renamed "paused while a setup run is active (Ctrl+C to cancel, then retry)" so the user knows exactly why a command can't fire. - Multi-line command feedback (e.g. /help, /diagnostics) renders with hanging indent so it reads as one block. Render-cost teardown: - New `useWizardSelector(store, selector, isEqual?)` slice hook. Components subscribed to a slice no longer rerender for unrelated store ticks. `shallowArrayEqual` and `shallowObjectEqual` exported for the common case. - Render-cost benchmark fixture pins the contract: 3 task transitions + 5 status bumps → tasks slice 3 renders, status slice 5 renders, whole-store subscriber 8+ renders. Slicing cuts each subscriber's render budget by ~60%. Tests added (40 over the base 3949): - lifecycle-display vocabulary (5) - mode-badge env resolution (9) - /help text generation (6) - HeaderBar mode badge rendering (5) - useWizardSelector primitives + render-cost ceiling (4 + 3) - StatusOverlayScreen glyph palette + summary + mode badge (7) - StatusOverlayScreen Operator Overview reframing (existing test updated to match new section names) (1) Build, lint, vitest (3989/3989), BDD (100/100) all green. Backward compatibility: - All existing slash commands continue to work the same way; /help is additive. - /status overlay's data shape is unchanged from PR 3; only the rendering reorganized. - --agent, --ci, --json, manifest, plan, apply, verify, MCP server, v: 1 envelope, exit codes — all unchanged. - Mode badge is suppressed in plain interactive mode, preserving the prior header look for the most common case. - ProgressList still uses a blank gutter for `pending` rows rather than the canonical ○ glyph (deliberate UX trade-off — see comment in ProgressList.tsx). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tasks
`StateBadge` already renders `glyph + label`, so the primary-tasks row was
prefixing the lifecycle glyph twice (e.g. `› › Running — Detect framework`).
Drop the redundant standalone `<Text>{glyph}</Text>` and let the badge own
the prefix, matching the background-tasks row's single-glyph rendering.
The previous implementation only iterated keys of `a` and compared via
property access. Two objects with the same key count but disjoint key
names — e.g. `{x: undefined, y: 1}` vs `{z: undefined, y: 1}` — would
pass because `b.x` returned `undefined === a.x`. Add a
`hasOwnProperty` guard on `b` so disjoint key sets compare unequal,
plus a regression test.
`HeaderBar` already gates the mode badge on `resolved.key !== 'interactive'`, so the default interactive run never sees a stray `[interactive]` chip. `StatusOverlayScreen` rendered the badge unconditionally in its header, so opening `/status` during a normal interactive run printed `[interactive]` next to "Operator overview" — which the brief explicitly says is noise. Mirror HeaderBar's gate and add tests covering both branches (suppressed in interactive, visible in agent mode). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Progress tab on `feat/v2-tui-redesign` used a single-line
`wrap="truncate-end"` row to display every planned event name
joined by commas:
◆ Events: Nf User Signed Up, Nf User Signed In, Nf Use…
Once the agent fills in 10+ events, the line truncates to "…"
even though the screen has a full column of empty rows below it
— the active task list collapses to ~5 lines once Wiring is the
focused step. The user can't audit the plan they just approved
without flipping to /events.
Render one event per row with name + description, soft-wrapped.
The bullet (`·`) lines up with the existing diamond glyph
column. The `(N events)` count gives an at-a-glance scale check.
No data-model change — `PlannedEvent` is still `{name, description}`.
Per-event lifecycle status (queued → in_progress → done) is the
shape of #698 against `main` and is a separate change.
27 RunScreen tests still pass; no test asserted on the comma-join
shape.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous ReportViewer wrapped each visible line in `<Text wrap="truncate">`, which silently dropped the right edge of any line wider than the content area — events-table rows and code blocks with long paths or JSON payloads got a stray "…" decoration and the user had no way to read what was clipped. Add ANSI-aware horizontal panning so the user can shift the visible window left/right with `h`/`l` (or arrow keys), keep colour codes intact across the slice, and surface the full LogViewer-style key hint footer (↑↓/jk scroll · h/l pan · g/G top/bottom · 0 reset · Esc close). The pan offset is clamped to the widest line so users can't scroll into empty whitespace. Also include a regression test covering: (a) horizontal pan offset shifts visible content, (b) lines wider than the content area are no longer clipped at the right edge, (c) the key-hint footer surfaces all documented controls, and (d) `0` resets the pan offset. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…added/removed (#707) The event-plan approval flow used to feel like list replacement. Each time the user pressed `[F] give feedback`, the agent's revised plan silently overwrote the prior list — the user lost the context of what they had just said and what the AI changed in response. The recent "keep event names snake cased and prefixed" feedback round made this visible: the AI applied the prefix but the user couldn't tell from the new screen alone whether snake_case had also landed (it had not — the display normalizer was Title-Casing — but the lack of an explicit "AI: revised plan +N -M" signal meant nothing about the conversation was legible). Add a round-history layer so the screen can render the back-and-forth: * Store now keeps `EventPlanRound[]`, one entry per `promptEventPlan` call. Each round carries the AI's plan + the user feedback (if any) that produced it. Cleared on `approved` / `skipped`; persists across `revised` rounds. * `pendingPlanFeedback` (instance field) buffers feedback typed in one `[F]` decision and pairs it with the next `promptEventPlan` from the agent. Single-pair carry — no leakage across runs. * EventPlanFullScreen renders a conversational header on rounds ≥ 2: - "You: <quoted feedback>" - "AI: revised plan +N added · −M removed" (with green/red counts) * Per-row diff markers when a prior round exists: - `+` (green) for events new to this round - `−` (red, struck-through) for events the AI dropped - bullet (`·`) for unchanged events Diff is by name (description regen on revision is expected). Round 1 still renders the original "Suggested events for your app" title — the convo affordance only appears once there's actually a conversation to render. Tests: * 3 new cases in EventPlanFullScreen.test.tsx — round-2 quote+delta rendering, round-1 fallback, history clear on approve. * All 174 existing store tests + 6 existing screen tests stay green. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Tab autocomplete LCP includes description-matched commands
- Filtered the LCP computation to only include commands whose
cmdstarts with the typed prefix (query), excluding description-only matches that were diluting the longest common prefix.
- Filtered the LCP computation to only include commands whose
Or push these changes by commenting:
@cursor push a9b6914430
Preview (a9b6914430)
diff --git a/src/ui/tui/primitives/SlashCommandInput.tsx b/src/ui/tui/primitives/SlashCommandInput.tsx
--- a/src/ui/tui/primitives/SlashCommandInput.tsx
+++ b/src/ui/tui/primitives/SlashCommandInput.tsx
@@ -147,7 +147,11 @@
// /debug + /diagnostics, but the picker stays open so they can
// disambiguate with the next keystroke).
if (isSlashMode && filtered.length > 0) {
- const lcp = longestCommonPrefix(filtered.map((c) => c.cmd));
+ const prefixMatches = filtered.filter((c) =>
+ c.cmd.slice(1).toLowerCase().startsWith(query),
+ );
+ if (prefixMatches.length === 0) return;
+ const lcp = longestCommonPrefix(prefixMatches.map((c) => c.cmd));
if (lcp.length > value.length) {
setValue(lcp);
setSelectedIndex(0);You can send follow-ups to the cloud agent here.
…istration Screen.Options has been dead code: only referenced in screen-registry as a null component, never appeared in any flow pipeline. Drop the enum entry and its registry placeholder to reduce surface area before v3 work lands.
…ion) Inside the slash-command picker, Tab used to be silently dropped (if (key.tab) return) — a missed UX affordance. CLI shells autocomplete Tab to the longest common prefix of remaining candidates; users have trained on that behavior for decades. Wire it up: /d + Tab → /d (LCP of /debug + /diagnostics) /diag + Tab → /diagnostics (single candidate, fully completed) /zzz + Tab → /zzz (no candidates, no-op) The picker stays open after Tab so the next keystroke disambiguates; the dropdown's selection state is unchanged. Pure helper longestCommonPrefix is exported for unit testing.
The Events tab was conditionally spread only when eventPlan.length > 0.
The agent's first event plan landing would re-order the tab strip
mid-run (insert a new tab between Progress and Logs), so users already
navigated to Logs would see Logs jump to a new index and the highlight
drift to a different tab.
Drop the conditional spread. EventPlanViewer already renders an
empty-state placeholder ('Waiting for the agent to propose events…')
when the plan is empty, so the cost of always rendering the tab is one
soft placeholder line behind a tab the user opts into.
… still works) Snake was a fourth tab on the RunScreen strip alongside Progress / Events / Logs — a workspace-grade slot for an easter egg. Drop it from the tab strip; users can still summon it via /snake (Overlay.Snake is unchanged in screen-registry.tsx and the slash command still dispatches it). Result: the ←/→ tab traversal cycles only through work surfaces, and the chrome's visual weight drops by one cell.
…m ProgressTab InlineEventPlan duplicated the Events tab — it summarized the same PlannedEvent list the dedicated tab now always renders (commit 4). The Progress tab is for active work, not the plan readout. ConditionalTips existed solely to render a Stripe doc-link banner under the task list when DiscoveredFeature.Stripe was present. The Discovery Feed is the right home for that surface; the link moves to a 'stripe' DiscoveryFact pushed by agent-runner alongside framework / region / package-manager, where it can rely on the resolved zone + selectedOrgId being present. Net: 50+ lines deleted from RunScreen, six unused imports gone, the Progress tab's vertical real estate is exclusively task + finalizing work.
…ules-of-hooks) The 'reportExists' useEffect (one-shot fs.existsSync to decide whether to show the bug-report picker option) lived AFTER the if (!outroData) return <FinishingUp /> early-return site. React's rules of hooks require the same hook call order on every render: the fallback render called one fewer hook than the happy path, so any session-state transition that flipped outroData from null → non-null would trip 'Rendered more hooks than during the previous render'. Hoist the reportPath derivation + the useEffect above the early return. Add a regression test that mounts with outroData=null, transitions to Success, and asserts no rules-of-hooks console.error fires.
Pre-fix copy: 'We'll wire the Amplitude MCP into Claude Code, Cursor,
Claude Desktop, and other AI tools you have installed.' That list was
hardcoded BEFORE detection ran, so users with none of those tools
installed saw the wizard promise integrations it couldn't deliver.
New copy:
- Lead paragraph is the value prop only: 'Ask Amplitude questions
from your editor — e.g. show me yesterday's signups.'
- Detected clients surface in the existing Phase.Ask 'Detected: …'
line — unchanged surface, but now the first thing the user sees
naming a client is grounded in actual detection.
- Phase.None copy gains a fallback hint: 'You can still install via
amplitude-wizard mcp serve.' so users without supported editors
know they can wire MCP up later without re-running the wizard.
Snapshot regenerated accordingly.
Bugbot finding on PR #713 (commit 50511fc): `longestCommonPrefix` is computed over `filtered`, but `filtered` includes commands matched by description text — not just by command-name prefix. Concrete failure case the PR explicitly promises: - User types `/diag` - `filtered` = [/diagnostics, /debug] — because /debug's description contains "diagnostic" - LCP across [/diagnostics, /debug] = `/d` - Pre-fix: `/d`.length (2) < `/diag`.length (5), so the Tab branch's `if (lcp.length > value.length)` guard becomes false and Tab is a no-op when the user explicitly wants /diagnostics filled in. Fix: filter LCP candidates to commands whose `cmd` starts with the current input value BEFORE computing the LCP. Description matches still participate in the picker render (so users can find /debug by typing "diag"), but they don't dilute Tab autocomplete. Regression test in SlashCommandInput.tab.test.tsx pins the exact failure case: cmds [/debug (desc contains diag), /diagnostics], input `/diag`, Tab → input becomes `/diagnostics`. Pre-fix this test would have been a no-op assertion failure.
50511fc to
a296997
Compare
kelsonpw
added a commit
that referenced
this pull request
May 11, 2026
Same finding as #713 — `filtered` includes description matches. Naive LCP across [/diagnostics, /debug] (where /debug's desc contains 'diag…') is `/d`, so `/diag` + Tab is a no-op when the user clearly wants /diagnostics completed. Filter LCP candidates by cmd-prefix BEFORE computing.
Contributor
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Duplicated
longestCommonPrefixutility function- Extracted the duplicated function to a shared module at
src/ui/tui/utils/longest-common-prefix.tsand updated bothPathInput.tsxandSlashCommandInput.tsxto import and re-export from it.
- Extracted the duplicated function to a shared module at
Or push these changes by commenting:
@cursor push 8925239752
Preview (8925239752)
diff --git a/src/ui/tui/components/PathInput.tsx b/src/ui/tui/components/PathInput.tsx
--- a/src/ui/tui/components/PathInput.tsx
+++ b/src/ui/tui/components/PathInput.tsx
@@ -31,7 +31,10 @@
import { useState, useMemo, useRef, useEffect } from 'react';
import { readdirSync, type Dirent } from 'node:fs';
import { homedir } from 'node:os';
-import { isAbsolute as pathIsAbsolute, resolve as pathResolve } from 'node:path';
+import {
+ isAbsolute as pathIsAbsolute,
+ resolve as pathResolve,
+} from 'node:path';
import { useScreenInput } from '../hooks/useScreenInput.js';
import { Colors, Icons } from '../styles.js';
@@ -142,9 +145,7 @@
// Strip the trailing slash so `expandTilde` + `path.resolve` work
// on the directory itself rather than on a path with a trailing
// separator that happens to land on a file later.
- const withoutTrailingSlash = stem.endsWith('/')
- ? stem.slice(0, -1)
- : stem;
+ const withoutTrailingSlash = stem.endsWith('/') ? stem.slice(0, -1) : stem;
// A bare "~" expands to homedir. A bare "/" stays as the root.
if (withoutTrailingSlash === '') return '/';
if (withoutTrailingSlash === '~') return homedir();
@@ -205,24 +206,8 @@
return { stem, partial, candidates };
}
-/**
- * Compute the longest common prefix of an array of strings. Used when
- * Tab finds multiple matches: shells fill the prompt with the LCP and
- * leave the rest to the user.
- */
-export function longestCommonPrefix(items: string[]): string {
- if (items.length === 0) return '';
- if (items.length === 1) return items[0];
- let prefix = items[0];
- for (let i = 1; i < items.length; i++) {
- const s = items[i];
- let j = 0;
- while (j < prefix.length && j < s.length && prefix[j] === s[j]) j++;
- prefix = prefix.slice(0, j);
- if (prefix === '') return '';
- }
- return prefix;
-}
+import { longestCommonPrefix } from '../utils/longest-common-prefix.js';
+export { longestCommonPrefix };
/**
* Replace the trailing segment of `input` with `candidateName`, optionally
@@ -251,7 +236,9 @@
const [value, setValue] = useState(() => shortenHomePath(initialValue));
// Caret position (number of chars from the start). We render a
// block cursor at this index.
- const [cursor, setCursor] = useState(() => shortenHomePath(initialValue).length);
+ const [cursor, setCursor] = useState(
+ () => shortenHomePath(initialValue).length,
+ );
// Last validation error, cleared when the user types.
const [error, setError] = useState<string | null>(null);
@@ -365,7 +352,11 @@
setCandidates(result.candidates);
baselineRef.current = value;
const first = direction === 1 ? 0 : result.candidates.length - 1;
- const previewed = applyCompletion(value, result.candidates[first].name, false);
+ const previewed = applyCompletion(
+ value,
+ result.candidates[first].name,
+ false,
+ );
setValue(previewed);
setCursor(previewed.length);
setCycleIndex(first);
diff --git a/src/ui/tui/primitives/SlashCommandInput.tsx b/src/ui/tui/primitives/SlashCommandInput.tsx
--- a/src/ui/tui/primitives/SlashCommandInput.tsx
+++ b/src/ui/tui/primitives/SlashCommandInput.tsx
@@ -33,28 +33,8 @@
);
}
-/**
- * Returns the longest common prefix shared by every string in `values`.
- * Returns the input itself if `values` has a single entry, and `''` if
- * the input is empty. Used by Tab autocomplete: typing `/d` then Tab
- * extends the input to `/d` (no common prefix beyond the current input
- * across /debug, /diagnostics), while `/diag` + Tab extends to
- * `/diagnostics`.
- */
-export function longestCommonPrefix(values: string[]): string {
- if (values.length === 0) return '';
- if (values.length === 1) return values[0];
- let prefix = values[0];
- for (let i = 1; i < values.length; i++) {
- const candidate = values[i];
- let j = 0;
- const maxLen = Math.min(prefix.length, candidate.length);
- while (j < maxLen && prefix[j] === candidate[j]) j++;
- prefix = prefix.slice(0, j);
- if (prefix === '') return '';
- }
- return prefix;
-}
+import { longestCommonPrefix } from '../utils/longest-common-prefix.js';
+export { longestCommonPrefix };
interface SlashCommandInputProps {
commands?: SlashCommand[];
@@ -155,13 +135,9 @@
// is `/d`, shorter than `/diag` itself — so Tab would be a
// no-op when the user explicitly wants /diagnostics filled in.
if (isSlashMode && filtered.length > 0) {
- const lcpCandidates = filtered.filter((c) =>
- c.cmd.startsWith(value),
- );
+ const lcpCandidates = filtered.filter((c) => c.cmd.startsWith(value));
if (lcpCandidates.length > 0) {
- const lcp = longestCommonPrefix(
- lcpCandidates.map((c) => c.cmd),
- );
+ const lcp = longestCommonPrefix(lcpCandidates.map((c) => c.cmd));
if (lcp.length > value.length) {
setValue(lcp);
setSelectedIndex(0);
diff --git a/src/ui/tui/utils/longest-common-prefix.ts b/src/ui/tui/utils/longest-common-prefix.ts
new file mode 100644
--- /dev/null
+++ b/src/ui/tui/utils/longest-common-prefix.ts
@@ -1,0 +1,20 @@
+/**
+ * Returns the longest common prefix shared by every string in `items`.
+ * Returns the input itself if `items` has a single entry, and `''` if
+ * the input is empty. Used by Tab autocomplete in both slash-command
+ * and path-completion contexts.
+ */
+export function longestCommonPrefix(items: string[]): string {
+ if (items.length === 0) return '';
+ if (items.length === 1) return items[0];
+ let prefix = items[0];
+ for (let i = 1; i < items.length; i++) {
+ const candidate = items[i];
+ let j = 0;
+ const maxLen = Math.min(prefix.length, candidate.length);
+ while (j < maxLen && prefix[j] === candidate[j]) j++;
+ prefix = prefix.slice(0, j);
+ if (prefix === '') return '';
+ }
+ return prefix;
+}You can send follow-ups to the cloud agent here.
Reviewed by Cursor Bugbot for commit a296997. Configure here.
…licating Bugbot finding: SlashCommandInput.tsx redefined longestCommonPrefix locally while components/PathInput.tsx already exports a semantically identical function (used by file-path autocomplete). Same algorithm, same signature — single source of truth. - Drop the local definition in SlashCommandInput.tsx - Import `longestCommonPrefix` from `../components/PathInput.js` - Drop the duplicate-test block in SlashCommandInput.test.ts (the PathInput tests already cover the helper; the SlashCommandInput.tab e2e test still pins Tab autocomplete behavior end-to-end)
Trivial trailing/duplicate-blank-line removals that prettier wanted. Local prettier-write surfaced exactly the two files CI flagged.
Closed
5 tasks
Member
Author
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.


Why this PR exists
Five expert TUI subagents independently audited the codebase. This PR ships the intersection of their consensus low-risk recommendations — boring, defensible wins that pave the way for the v3 redesign work tracked elsewhere without touching its surface area.
Every change is its own atomic commit so a reviewer can bisect, revert one without the others, or cherry-pick into a hotfix branch.
The 8 changes
refactor(tui): remove unused Screen.Options enumsrc/ui/tui/flows.ts,src/ui/tui/screen-registry.tsxfeat(tui): register /help slash commanddocs/flows.mdandCLAUDE.mdboth promise/help; the COMMANDS registry never had it. Add registration + dispatcher with categorized cheatsheetsrc/ui/tui/console-commands.ts,src/ui/tui/components/ConsoleView.tsxfeat(tui): /<Tab> autocompletes to longest common prefix/d→ no-op,/diag→/diagnostics)src/ui/tui/primitives/SlashCommandInput.tsxfeat(tui): pin Events tab to RunScreen alwayseventPlan.length > 0made the tab strip re-index mid-run when the agent's first plan landed. EventPlanViewer already has an empty-state placeholdersrc/ui/tui/screens/RunScreen.tsxrefactor(tui): demote Snake to overlay-only/snakeslash command +Overlay.Snakestill worksrc/ui/tui/screens/RunScreen.tsxrefactor(tui): remove redundant InlineEventPlan + ConditionalTips'stripe'DiscoveryFact pushed by agent-runnersrc/ui/tui/screens/RunScreen.tsx,src/lib/agent-runner.tsfix(tui): hoist OutroScreen useEffect above early returnif (!outroData) return <FinishingUp />— rules-of-hooks violation. TheoutroData: null → successtransition would trip React's "Rendered more hooks than" errorsrc/ui/tui/screens/OutroScreen.tsxfix(tui): McpScreen leads with value prop, names only detected clientsamplitude-wizard mcp servefallback hintsrc/ui/tui/screens/McpScreen.tsxTest coverage added
Each behavioral change has a pinning test next to its source:
src/ui/tui/__tests__/console-commands.test.ts—/help+/snakeregistry +getHelpTextcategorized outputsrc/ui/tui/primitives/__tests__/SlashCommandInput.test.ts—longestCommonPrefixunit testssrc/ui/tui/primitives/__tests__/SlashCommandInput.tab.test.tsx— end-to-end Tab autocomplete via stdinsrc/ui/tui/screens/__tests__/RunScreen.spacing.test.tsx— Events tab always present, Snake absent from strip, ConditionalTips banner gone, EventPlanViewer renders on plan flipsrc/ui/tui/screens/__tests__/OutroScreen.hookOrder.test.tsx—outroData: null → successtransition produces noconsole.errormatching "Rendered .* hooks than"src/ui/tui/screens/__tests__/McpScreen.copy.test.tsx— Phase.Detecting names no client, Phase.Ask shows detected list, Phase.None hints at the fallback subcommandExisting snapshot regenerated:
src/ui/tui/screens/__tests__/__snapshots__/McpScreen.snap.test.tsx.snap.Gates
pnpm exec tsc --noEmit— cleanpnpm exec eslint src/ --cache— 0 errors, 1 pre-existing warning inEventPlanFullScreen.test.tsx(unrelateddefaultFlowimport that's been on main since fix(p0): render event-plan as full-screen overlay #626; not touched here)pnpm build+ smoke — cleanpnpm exec vitest run --pool=forks --maxWorkers=1— 3830 tests pass across 257 filesOut-of-scope (deferred to follow-up PRs)
The 5-subagent audit also flagged the following higher-blast-radius items. They will each be their own PR so the review surface stays small:
ErrorFallbacktext with a recover/exit picker)🤖 Generated with Claude Code
Note
Low Risk
Primarily TUI UX/flow cleanups and test additions, with minimal behavioral impact beyond command input handling and small UI routing changes.
Overview
Improves the TUI Run experience by keeping the
Eventstab always present (preventing tab index shifts mid-run) and moving Snake out of the tab strip to overlay-only access via/snake.Adds Tab-based slash-command autocomplete in
SlashCommandInput(longest common prefix, ignoring description-only matches) and pins related behavior with new tests.Moves the Stripe “connect as data source” affordance from a RunScreen banner into
agent-runneras astripediscovery fact, fixes a rules-of-hooks early-return issue inOutroScreen, and refreshesMcpScreencopy to avoid naming editors pre-detection (with a fallback hint to useamplitude-wizard mcp serve).Reviewed by Cursor Bugbot for commit 49c6f17. Bugbot is set up for automated code reviews on this repo. Configure here.