
feat(orchestration): v2 foundation — durable state, lifecycle, status/resume commands (PR 1 of 3)#689

Merged
kelsonpw merged 15 commits into main from feat/v2-orchestration-foundation on May 13, 2026

@kelsonpw kelsonpw commented May 9, 2026

TL;DR


Problem

Wizard's WizardSession (src/lib/wizard-session.ts) is the de-facto orchestration state, but it's an in-memory snapshot held in a single process — there's no durable, machine-readable surface for "what's running, what stopped, what's the resume command." Outer agents that wrap the wizard scrape TUI text or grep log.ndjson. Status JSON exists for a few specific commands (wizard status, wizard projects list) but each shape is ad-hoc — there is no unified envelope.

Task lifecycle is implicit in the tasks array (booleans done / error). Last-stopping-point is partially captured by session-checkpoint.ts but only stores intro-phase state, not active task ownership of branches/PRs/worktrees.

Why this is "PR 1 of 3"

This PR introduces a single durable orchestration store that becomes the source of truth for sessions, tasks, subagents, ownership, last-stopping-point, and structured task results.

Foundation only. The TUI redesign, the user-choice / verification / MCP-app lifecycle plumbing, and the MCP-server tool surface are deferred to PRs 2 and 3:

  • PR 1 (this PR): durable schema, lifecycle, store, six new read-only CLI commands, last-stopping-point derivation. Legacy WizardSession stays live alongside.
  • PR 2: widens PendingCheckpoint with concrete schemas, routes existing user-choice / event-plan-confirm / MCP-app prompt sites through the store. The pendingChoices / pendingMcpActions / pendingManualVerifications arrays start carrying real content.
  • PR 3: TUI v2 reads from the store as its source of truth, retires duplicate state, surfaces the new MCP-server read-only tools so outer coding agents can call them as typed tools instead of parsing CLI stdout.

Architecture is documented in docs/orchestration.md.

State model

                 ┌────────────────────────────────────────┐
                 │             Session                    │
                 │  id, installDir, status, goal, branch  │
                 └─────────────┬──────────────────────────┘
                               │ 1..n
                               ▼
                 ┌────────────────────────────────────────┐
                 │              Task                      │
                 │ id, sessionId, parentTaskId,           │
                 │ label, state, ownership[], result      │
                 └─────────┬───────────────────┬──────────┘
                           │ 1..n              │ 1..n
                           ▼                   ▼
            ┌──────────────────────┐  ┌──────────────────┐
            │      Subagent        │  │   Ownership      │
            │ kind, rootTaskId     │  │ kind, …          │
            └──────────────────────┘  └──────────────────┘

Task lifecycle

  ┌──────────┐        ┌──────────┐        ┌────────────┐
  │  queued  │──────► │ running  │──────► │ completed  │ (terminal)
  └──────────┘        └────┬─────┘        └────────────┘
       │                   ├──► waiting_for_user ◄──┐
       │                   ├──► blocked          ◄──┤
       └─► cancelled       └──► failed              │
                                                    │
   any non-terminal  ──────► superseded ────────────┘

Implemented in src/lib/orchestration/lifecycle.ts. The assertTransition(taskId, from, to) helper is the trust boundary: illegal transitions throw IllegalTaskTransitionError rather than corrupting persisted state.
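
For reviewers who want a concrete picture of the trust boundary, here is a minimal sketch of a table-driven transition check. The state names come from the diagram above; the exact edge table, the choice to treat failed and cancelled as terminal, and the error message format are assumptions for illustration and may not match lifecycle.ts.

```typescript
// Hypothetical sketch: the edge table is read off the diagram and is an assumption.
type TaskState =
  | "queued" | "running" | "waiting_for_user" | "blocked"
  | "completed" | "failed" | "cancelled" | "superseded";

// Assumed edges. "superseded" is reachable from every non-terminal state;
// terminal states have no outbound edges.
const ALLOWED: Record<TaskState, readonly TaskState[]> = {
  queued: ["running", "cancelled", "superseded"],
  running: ["completed", "waiting_for_user", "blocked", "failed", "superseded"],
  waiting_for_user: ["running", "superseded"],
  blocked: ["running", "superseded"],
  completed: [],
  failed: [],
  cancelled: [],
  superseded: [],
};

class IllegalTaskTransitionError extends Error {
  constructor(taskId: string, from: TaskState, to: TaskState) {
    super(`Illegal task transition for ${taskId}: ${from} -> ${to}`);
    this.name = "IllegalTaskTransitionError";
  }
}

// Throws instead of letting an illegal edge reach persisted state.
// Identity transitions (from === to) are rejected as well.
function assertTransition(taskId: string, from: TaskState, to: TaskState): void {
  if (from === to || !ALLOWED[from].includes(to)) {
    throw new IllegalTaskTransitionError(taskId, from, to);
  }
}
```

Keeping the edges in one table means the exhaustive transition-matrix test can iterate over every (from, to) pair and compare against the same data.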

New commands

| Command | Purpose |
| --- | --- |
| wizard tasks | List tasks; filter with --state, --session-id |
| wizard task <id> | Inspect a single task |
| wizard sessions | List sessions |
| wizard session <id> | Inspect a session and its tasks |
| wizard resume <session-id> | Print (or run with --execute) the resume command |
| wizard orchestration status | Print the LastStoppingPoint snapshot |

Every command supports --json (auto-enabled when stdout isn't a TTY) and validates its JSON payload against a Zod envelope schema before writing. A regression in the producer surfaces as a thrown ZodError on the producer side rather than silent corruption downstream.
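
The two behaviours described above (auto-JSON when piped, validate before write) can be sketched as follows. This is an illustration only: wantsJson, parseEnvelope, and emitJson are hypothetical names, the precedence of --human over TTY detection is an assumption, and the hand-written guard stands in for the real Zod schema so the snippet has no dependencies.

```typescript
interface OutputFlags { json?: boolean; human?: boolean }

// Decide the output mode from flags and TTY state (assumed precedence).
function wantsJson(flags: OutputFlags, stdoutIsTTY: boolean): boolean {
  if (flags.json) return true;   // explicit --json wins
  if (flags.human) return false; // assumed override for piped human output
  return !stdoutIsTTY;           // piped stdout defaults to JSON
}

interface Envelope { v: 1; type: string; generatedAt: string }

// Stand-in for the Zod envelope schema: throws when the payload is malformed.
function parseEnvelope(value: unknown): Envelope {
  const e = value as Partial<Envelope> | null;
  if (!e || e.v !== 1 || typeof e.type !== "string" || typeof e.generatedAt !== "string") {
    throw new Error("envelope failed validation");
  }
  return e as Envelope;
}

// Validate first, then write, so a producer regression never reaches stdout.
function emitJson(payload: unknown, write: (line: string) => void): void {
  const envelope = parseEnvelope(payload);
  write(JSON.stringify(envelope));
}
```

In a real command, the caller would pass `Boolean(process.stdout.isTTY)` and a writer around `process.stdout.write`.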

Example human output

$ amplitude-wizard orchestration status
Store: /Users/me/.amplitude/wizard/runs/3d8f2a1b9c4e/orchestration.json (generated 2026-05-09T12:34:56.789Z)
Active session: session_a3f9e7c2d1b48a09f5c6
Goal:           set up Amplitude in nextjs
Branch:         feat/amplitude-setup
Active tasks:           1
Stopped tasks (24h):    0
Recently completed:     2

Next action: A task is waiting for user input: review the proposed events.json.
Resume:      amplitude-wizard --install-dir /Users/me/myapp

Example JSON envelope (wizard orchestration status --json)

{
  "v": 1,
  "type": "orchestration_status",
  "generatedAt": "2026-05-09T12:34:56.789Z",
  "installDir": "/Users/me/myapp",
  "storePath": "/Users/me/.amplitude/wizard/runs/3d8f2a1b9c4e/orchestration.json",
  "storeExists": true,
  "lastStoppingPoint": {
    "generatedAt": 1778330096789,
    "currentSessionId": "session_a3f9e7c2d1b48a09f5c6",
    "currentGoal": "set up Amplitude in nextjs",
    "currentBranch": "feat/amplitude-setup",
    "currentWorktree": "/Users/me/myapp",
    "activeTasks": [
      {
        "id": "task_b1c2d3e4f5a6b7c8d9e0",
        "sessionId": "session_a3f9e7c2d1b48a09f5c6",
        "label": "event plan confirmation",
        "state": "waiting_for_user",
        "ownership": [],
        "subagentKind": "instrumentation",
        "createdAt": 1778330090000,
        "updatedAt": 1778330091000,
        "startedAt": 1778330090500,
        "waitingFor": {
          "id": "cp_event_plan",
          "kind": "event_plan_confirm",
          "summary": "review the proposed events.json",
          "enteredAt": 1778330091000
        }
      }
    ],
    "stoppedTasks": [],
    "recentlyCompletedTasks": [],
    "relevantOwnership": [],
    "pendingChoices": [
      {
        "id": "cp_event_plan",
        "kind": "event_plan_confirm",
        "summary": "review the proposed events.json",
        "enteredAt": 1778330091000
      }
    ],
    "pendingMcpActions": [],
    "pendingManualVerifications": [],
    "nextAction": {
      "kind": "await_user_choice",
      "description": "A task is waiting for user input: review the proposed events.json.",
      "command": ["amplitude-wizard", "--install-dir", "/Users/me/myapp"]
    },
    "resumeCommand": "amplitude-wizard --install-dir /Users/me/myapp"
  }
}
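
To show how an outer agent might consume this envelope instead of scraping TUI text, here is a hedged sketch. The field names are taken from the example above; the Decision type and the decide function are hypothetical, and the mapping from nextAction.kind to an action is an assumption about one reasonable consumer policy.

```typescript
// Subset of the envelope shown above; unlisted fields are ignored.
interface NextAction { kind: string; description: string; command?: string[] }
interface StatusEnvelope {
  v: number;
  type: string;
  storeExists: boolean;
  lastStoppingPoint?: { nextAction: NextAction };
}

// Hypothetical consumer-side decision type.
type Decision =
  | { action: "nothing" }
  | { action: "ask_human"; prompt: string }
  | { action: "run"; argv: string[] };

function decide(raw: string): Decision {
  const env = JSON.parse(raw) as StatusEnvelope;
  if (env.v !== 1 || env.type !== "orchestration_status") {
    throw new Error(`unexpected envelope: v=${env.v} type=${env.type}`);
  }
  const next = env.lastStoppingPoint?.nextAction;
  if (!env.storeExists || !next || next.kind === "none") return { action: "nothing" };
  // Never auto-answer on the user's behalf; surface the prompt instead.
  if (next.kind === "await_user_choice") return { action: "ask_human", prompt: next.description };
  return next.command ? { action: "run", argv: next.command } : { action: "nothing" };
}
```

Using the command array rather than the resumeCommand string avoids re-splitting a shell string on the consumer side.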

Schemas

All persisted shapes have a runtime Zod validator in src/lib/orchestration/schemas.ts. The on-disk envelope carries an explicit version: 1 literal — a version-mismatched file returns kind: 'corrupt' from loadStore so readers can distinguish "no store yet" from "found a store but couldn't parse it." See docs/orchestration.md for the full schema reference.
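
A minimal sketch of the three-way load result described above. Only kind: 'corrupt' is named in this PR; the 'missing' and 'ok' variant names, the reason strings, and the parseStore helper are assumptions, and the version check stands in for the full Zod schema.

```typescript
import * as fs from "node:fs";

type StoreFile = { version: 1; [key: string]: unknown };
type LoadResult =
  | { kind: "missing" } // assumed name for "no store yet"
  | { kind: "corrupt"; reason: string }
  | { kind: "ok"; file: StoreFile }; // assumed name for the success case

// Classify raw file contents without touching the filesystem.
function parseStore(text: string): LoadResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch {
    return { kind: "corrupt", reason: "invalid JSON" };
  }
  const version = (parsed as { version?: unknown } | null)?.version;
  if (version !== 1) return { kind: "corrupt", reason: "unexpected version" };
  return { kind: "ok", file: parsed as StoreFile };
}

// A missing file is a normal state, not an error.
function loadStore(path: string): LoadResult {
  if (!fs.existsSync(path)) return { kind: "missing" };
  return parseStore(fs.readFileSync(path, "utf8"));
}
```

Returning a tagged result instead of throwing lets read-only commands render "no store yet" as an empty snapshot while still reporting a corrupt file distinctly.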

Exit codes

The new commands extend the existing ExitCode contract — see docs/exit-codes.md:

| Code | When |
| --- | --- |
| 0 | command succeeded |
| 1 | unexpected error reading the store / serialising output |
| 2 | invalid args: bad --state value, malformed task_<id> / session_<id> prefix, unknown id |
| 130 | SIGINT during the command |

These commands are read-only and never trigger auth flows or network calls, so codes 3 / 4 / 10 / 11 / 12 / 13 / 20 are not reachable.
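
As an illustration of how the code-2 cases might be decided, here is a small sketch. The numeric values come from the table above, and GENERAL_ERROR / INVALID_ARGS appear in this PR's diffs; SUCCESS and USER_CANCELLED are assumed names, and the id pattern (prefix plus 20 lowercase hex characters) is inferred from the example ids rather than taken from the schema.

```typescript
const ExitCode = {
  SUCCESS: 0,          // assumed name
  GENERAL_ERROR: 1,
  INVALID_ARGS: 2,
  USER_CANCELLED: 130, // assumed name for SIGINT
} as const;

// Assumed id shape, matching the examples in this description.
const ID_PATTERN = /^(task|session)_[0-9a-f]{20}$/;

// Hypothetical helper: map an id lookup to the exit code a command would use.
function exitCodeForLookup(
  id: string,
  expected: "task" | "session",
  knownIds: ReadonlySet<string>,
): number {
  const match = ID_PATTERN.exec(id);
  if (!match || match[1] !== expected) return ExitCode.INVALID_ARGS; // malformed prefix
  if (!knownIds.has(id)) return ExitCode.INVALID_ARGS;               // unknown id
  return ExitCode.SUCCESS;
}
```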

Performance / cost

Each task transition triggers ≤ 1 atomic write of a small JSON file (temp-file + rename via atomicWriteJSON, mode 0o600). A typical wizard run touches a few hundred transitions; the orchestration file therefore stays well under the I/O budget the wizard already spends on runs/<hash>/log.ndjson writes. PR 3 will add a debounced in-memory cache for high-frequency call sites if needed.
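
For context, the temp-file + rename pattern looks roughly like this. The sketch reuses the atomicWriteJSON name and the { mode } option seen in this PR, but the body is an assumption about how such a helper is typically written, not a copy of the real one.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical stand-in for atomicWriteJSON: write to a temp file in the same
// directory, then rename over the target so readers never see a partial file.
function atomicWriteJSON(target: string, value: unknown, opts: { mode: number }): void {
  const tmp = path.join(
    path.dirname(target),
    `.${path.basename(target)}.${process.pid}.tmp`,
  );
  try {
    fs.writeFileSync(tmp, JSON.stringify(value, null, 2), { mode: opts.mode });
    // rename is atomic on POSIX when both paths are on the same filesystem
    fs.renameSync(tmp, target);
  } catch (err) {
    fs.rmSync(tmp, { force: true }); // leave no orphan temp file behind
    throw err;
  }
}
```

Placing the temp file next to the target keeps both paths on one filesystem, which is what makes the rename step safe; a failed write leaves the previous file untouched.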

Backward compatibility

  • WizardSession is untouched. Every existing call site continues to read from / write to the in-memory snapshot.
  • The new orchestration store is mirrored from a single high-leverage hook (session start in src/run.ts). Mirror failures are logged and swallowed — they cannot block the wizard run.
  • All existing CLI flags, env vars, exit codes, NDJSON envelope (v: 1), and MCP server behavior are preserved.
  • One new hidden yargs shadow flag (--cache-dir) was added so .strict() accepts the existing AMPLITUDE_WIZARD_CACHE_DIR env-var auto-mapping on every command (the env var was already read by src/utils/storage-paths.ts:getCacheRoot; this PR just unblocks .strict()).
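
The "logged and swallowed" mirror behaviour can be pictured with a small wrapper like the one below. mirrorBestEffort and its log callback are hypothetical names used for illustration; the actual hook in src/run.ts may be structured differently.

```typescript
// Hypothetical wrapper: run a mirror step, log failures, never rethrow.
function mirrorBestEffort(
  label: string,
  step: () => void,
  log: (message: string) => void,
): boolean {
  try {
    step();
    return true;
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    log(`orchestration mirror failed (${label}): ${reason}`);
    return false; // the wizard run continues regardless
  }
}
```

Returning a boolean rather than throwing keeps the legacy WizardSession path authoritative while the store is still a mirror.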

Tests added (48 total, all passing)

  • src/lib/orchestration/__tests__/lifecycle.test.ts — 10 tests; exhaustive transition matrix, identity rejection, terminal-state outbound rejection, illegal-transition error message contract.
  • src/lib/orchestration/__tests__/schemas.test.ts — 14 tests; round-trip tests for Task, Session, Subagent, Ownership, TaskResult, OrchestrationStoreFile, StatusEnvelope. Negative cases for malformed ids, unknown lifecycle states, broken discriminated unions, wrong version literal.
  • src/lib/orchestration/__tests__/store.test.ts — 11 tests; create/list/transition round-trips, illegal-transition rejection, terminal result stamping, idempotent ownership, atomic-write durability (a thrown write leaves the prior file intact and orphan-free), corrupt store handling for invalid JSON and schema mismatch.
  • src/lib/orchestration/__tests__/last-stopping-point.test.ts — 5 tests; empty store snapshot, populated-store grouping, auth-blocked → fix_auth next-action, 24-hour window expiry, ownership aggregation.
  • src/commands/__tests__/orchestration.test.ts — 8 CLI smoke tests spawning the real bin.ts against a tmp install dir. Validates JSON output of every new command against its Zod envelope schema. Checks exit codes for both happy-path and error cases (bad state filter, nonexistent ids).

Test plan

  • pnpm exec vitest run --pool=forks --maxWorkers=1 src/lib/orchestration/ src/commands/__tests__/orchestration.test.ts — 48/48 passing
  • pnpm exec vitest run --pool=forks --maxWorkers=1 — full suite 3828/3828 passing
  • pnpm test:bdd — 100 scenarios / 445 steps passing
  • pnpm lint — clean (one pre-existing warning in EventPlanFullScreen.test.tsx, untouched)
  • pnpm build — compiles + smoke test passes

Manual verification

  • Run node dist/bin.js orchestration status --json in a fresh project — empty store, nextAction.kind === "none".
  • Run a normal wizard session, then re-invoke orchestration status — observe a populated store, currentSessionId set.
  • Run wizard tasks --state running — filter takes effect.
  • Run wizard task <bogus-id> — exit code 2.
  • Run wizard resume <session-id> — prints (does not execute) a resume command. Pass --execute to actually invoke it.
  • Inspect ~/.amplitude/wizard/runs/<hash>/orchestration.json — file mode is 0o600, JSON validates against OrchestrationStoreFileSchema.

Known limitations (deferred)

  • TUI hasn't been wired to read from the orchestration store — src/ui/tui/ is unchanged. (PR 3.)
  • The MCP server (amplitude-wizard mcp serve) hasn't gained read-only tools that wrap the store. (PR 3.)
  • Existing tasks.push(...) sites in src/ui/tui/store.ts and src/lib/wizard-session.ts continue to use the legacy in-memory shape. The orchestration store is mirrored only from session start in src/run.ts. (PR 2 widens this; PR 3 retires the duplicate.)
  • pendingChoices / pendingMcpActions / pendingManualVerifications are stub arrays in PR 1 — the schema is stable but the producer sites land in PR 2.

Follow-up roadmap

  • PR 2 — Concrete PendingCheckpoint schemas; route existing user-choice / event-plan-confirm / MCP-app prompt sites through the store; checkpoints + MCP-app lifecycle.
  • PR 3 — TUI v2 reads from the store; retire duplicated state; MCP-server tool parity (read-only orchestration tools); perf / batched writes.

🤖 Generated with Claude Code


Note

Medium Risk
Adds a new file-backed orchestration store plus multiple new CLI commands and JSON contracts; while largely read-only, it introduces new persistence and exit-path behavior that could affect tooling and session startup in edge cases (permissions/corrupt state).

Overview
Introduces a new durable v2 orchestration store under src/lib/orchestration/ (file-backed JSON, atomic writes, Zod-validated schemas) with an explicit TaskLifecycle state machine and computeLastStoppingPoint derivation for “what to do next”/resume hints.

Adds six new read-only inspection commands (tasks, task <id>, sessions, session <id>, resume <session-id> [--execute], orchestration status) that emit Zod-validated JSON envelopes (auto-JSON when piped) and standardized exit codes; wires them into bin.ts and exports.

Mirrors wizard session start into the orchestration store from src/run.ts (best-effort, errors logged and swallowed), makes CLI_INVOCATION a constant npx @amplitude/wizard for stable resume commands, adds a hidden --cache-dir passthrough for AMPLITUDE_WIZARD_CACHE_DIR, and includes extensive new unit + CLI smoke tests plus new docs (docs/orchestration.md, docs/exit-codes.md).

Reviewed by Cursor Bugbot for commit be4dbdd. Bugbot is set up for automated code reviews on this repo.

…/resume commands (PR 1 of 3)

Introduce src/lib/orchestration/ — a durable, file-backed orchestration
store that becomes the source of truth for sessions, tasks, subagents,
ownership, and last-stopping-point. Adds six new read-only CLI commands
(tasks/task/sessions/session/resume/orchestration status), each emitting
Zod-validated JSON envelopes for outer agents.

Foundation only. Legacy WizardSession remains the live in-memory
surface; PR 2 wires checkpoints + MCP-app lifecycle, PR 3 retires
duplicate state and ships the TUI redesign + MCP-server tool parity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kelsonpw kelsonpw requested a review from a team as a code owner May 9, 2026 15:10

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Raw spawn from node:child_process breaks Windows
    • Replaced await import('node:child_process') with await import('../utils/cross-platform-spawn.js') so the amplitude-wizard .cmd shim is resolved correctly on Windows.
  • ✅ Fixed: User-provided --install-dir missing tilde expansion
    • Added resolveInstallDir(argv.installDir) in resolveCommonOpts to properly expand tilde and resolve the path, matching the pattern used by the dashboard command.

Preview (e5add8f93d)
diff --git a/src/commands/orchestration.ts b/src/commands/orchestration.ts
--- a/src/commands/orchestration.ts
+++ b/src/commands/orchestration.ts
@@ -45,7 +45,8 @@
   json?: boolean;
   human?: boolean;
 }): Promise<CommonOpts> {
-  const installDir = argv.installDir ?? process.cwd();
+  const { resolveInstallDir } = await import('../utils/install-dir.js');
+  const installDir = resolveInstallDir(argv.installDir);
   const { resolveMode } = await import('../lib/mode-config.js');
   const { jsonOutput } = resolveMode({
     json: argv.json,
@@ -583,7 +584,7 @@
         if (execute) {
           // Spawn the resume command. Default behavior is "print only" for
           // safety — orchestrators that want auto-execution opt in.
-          const { spawn } = await import('node:child_process');
+          const { spawn } = await import('../utils/cross-platform-spawn.js');
           const [cmd, ...rest] = command;
           if (!cmd) {
             if (opts.jsonOutput)

resolveCommonOpts now passes argv.installDir through resolveInstallDir
so quoted/env-sourced `~` actually expands instead of being treated
as a literal directory name.

The resume --execute path now imports spawn from utils/cross-platform-
spawn so the npm-installed `amplitude-wizard` .cmd shim resolves on
Windows. Node's built-in spawn does not consult PATHEXT and would fail
with ENOENT for every Windows user invoking `wizard resume --execute`.
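
For readers unfamiliar with the tilde issue: the shell expands an unquoted `~`, but a quoted or env-sourced value reaches the process verbatim. A rough sketch of the kind of expansion resolveInstallDir performs is below; the body is an assumption for illustration (including the optional cwd parameter), not the helper's actual source.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical sketch of resolveInstallDir; the real helper may differ.
function resolveInstallDir(input: string | undefined, cwd: string = process.cwd()): string {
  if (!input) return cwd; // no flag given: fall back to the working directory
  let expanded = input;
  if (input === "~") {
    expanded = os.homedir();
  } else if (input.startsWith("~/") || input.startsWith("~\\")) {
    expanded = path.join(os.homedir(), input.slice(2));
  }
  // Relative paths resolve against cwd; absolute paths pass through normalized.
  return path.resolve(cwd, expanded);
}
```

Only a leading `~` or `~/` is expanded here; forms such as `~otheruser/` are left alone in this sketch.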
kelsonpw added a commit that referenced this pull request May 9, 2026
…kpoints + MCP-app lifecycle

Stacks on PR 1 (#689). Adds three typed checkpoint surfaces on top of
the v2 orchestration foundation:

- Choice — typed user-choice records with stable promptId for de-dup,
  requiresHuman automation gate, and full status transitions
  (pending → answered/expired/cancelled/superseded).
- Verification — manual out-of-band verification records with
  status transitions (pending → passed/failed/skipped, skipped/failed
  may recover to passed; passed/skipped/failed may supersede).
- McpAppCapability — durable lifecycle for every MCP-app capability
  with an anti-nag invariant: install_skipped → needs_user_choice
  REQUIRES a non-empty lastStateChangeReason.

New CLI commands:
- wizard choice list / show / answer (with --confirm-human gate)
- wizard verification list / show / mark

Wires last-stopping-point's pendingChoices / pendingMcpActions /
pendingManualVerifications arrays to read real records (was [] in PR 1).
Two callsites instrumented as the PR 2 wiring beachhead:
- env-selection in src/commands/helpers.ts (Choice mirror + answer)
- event-plan-approval in src/lib/wizard-tools.ts (Verification mirror)

Adds 42 tests across choices/verifications/mcp-app-lifecycle/last-stopping-point/CLI.
No TUI changes (deferred to PR 3); no MCP-server tool changes (deferred to PR 3).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
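
The Verification transitions and the anti-nag invariant listed in this commit message can be sketched as data plus a guard. The status names and the invariant are taken from the message above; the function names and the exact table encoding are hypothetical.

```typescript
type VerificationStatus = "pending" | "passed" | "failed" | "skipped" | "superseded";

// Edges as described: pending -> passed/failed/skipped; skipped/failed may
// recover to passed; passed/skipped/failed may be superseded.
const VERIFICATION_NEXT: Record<VerificationStatus, readonly VerificationStatus[]> = {
  pending: ["passed", "failed", "skipped"],
  passed: ["superseded"],
  failed: ["passed", "superseded"],
  skipped: ["passed", "superseded"],
  superseded: [],
};

function canTransitionVerification(from: VerificationStatus, to: VerificationStatus): boolean {
  return VERIFICATION_NEXT[from].includes(to);
}

// Anti-nag invariant: re-prompting after a skip requires a recorded reason.
function assertAntiNag(from: string, to: string, lastStateChangeReason?: string): void {
  const reprompt = from === "install_skipped" && to === "needs_user_choice";
  if (reprompt && !lastStateChangeReason?.trim()) {
    throw new Error("install_skipped -> needs_user_choice requires a non-empty lastStateChangeReason");
  }
}
```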

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Resume command ignores session-id for action derivation
    • Added an optional sessionId parameter to computeLastStoppingPoint so it looks up the specific session and scopes task filtering to that session, and updated the resume command handler to pass the user-specified session ID.

Preview (13e12425d3)
diff --git a/src/commands/orchestration.ts b/src/commands/orchestration.ts
--- a/src/commands/orchestration.ts
+++ b/src/commands/orchestration.ts
@@ -559,7 +559,9 @@
           else getUI().log.error(`Session ${sessionIdRaw} not found`);
           process.exit(ExitCode.INVALID_ARGS);
         }
-        const lsp = computeLastStoppingPoint(opts.installDir);
+        const lsp = computeLastStoppingPoint(opts.installDir, {
+          sessionId: sessionId!,
+        });
         const command = lsp.nextAction.command;
         const description = lsp.nextAction.description;
 

diff --git a/src/lib/orchestration/last-stopping-point.ts b/src/lib/orchestration/last-stopping-point.ts
--- a/src/lib/orchestration/last-stopping-point.ts
+++ b/src/lib/orchestration/last-stopping-point.ts
@@ -16,7 +16,13 @@
 import { execFileSync } from 'node:child_process';
 
 import { TaskLifecycle, isActive } from './lifecycle';
-import type { LastStoppingPoint, NextAction, Ownership, Task } from './state';
+import type {
+  LastStoppingPoint,
+  NextAction,
+  Ownership,
+  SessionId,
+  Task,
+} from './state';
 import { getOrchestrationStore } from './store';
 import { CLI_INVOCATION } from '../../commands/context';
 
@@ -182,23 +188,27 @@
  */
 export function computeLastStoppingPoint(
   installDir: string,
-  options?: { now?: number; cliInvocation?: string[] },
+  options?: { now?: number; cliInvocation?: string[]; sessionId?: SessionId },
 ): LastStoppingPoint {
   const now = options?.now ?? Date.now();
   const cutoff = now - TWENTY_FOUR_HOURS_MS;
   const store = getOrchestrationStore(installDir);
   const file = store.read();
 
-  const session =
-    file.sessions
-      .filter((s) => s.status === 'active')
-      .sort((a, b) => b.createdAt - a.createdAt)[0] ?? null;
+  const session = options?.sessionId
+    ? file.sessions.find((s) => s.id === options.sessionId) ?? null
+    : file.sessions
+        .filter((s) => s.status === 'active')
+        .sort((a, b) => b.createdAt - a.createdAt)[0] ?? null;
 
   const branch = session?.branch ?? tryDetectBranch(installDir);
   const worktree = session?.worktree ?? tryDetectWorktree(installDir);
 
+  const scopeToSession = (t: Task): boolean =>
+    !options?.sessionId || t.sessionId === options.sessionId;
+
   const activeTasks = file.tasks
-    .filter((t) => isActive(t.state))
+    .filter((t) => isActive(t.state) && scopeToSession(t))
     .sort((a, b) => b.updatedAt - a.updatedAt);
   const stoppedTasks = file.tasks
     .filter(
@@ -206,11 +216,17 @@
         (t.state === TaskLifecycle.Failed ||
           t.state === TaskLifecycle.Cancelled ||
           t.state === TaskLifecycle.Superseded) &&
-        t.updatedAt >= cutoff,
+        t.updatedAt >= cutoff &&
+        scopeToSession(t),
     )
     .sort((a, b) => b.updatedAt - a.updatedAt);
   const recentlyCompletedTasks = file.tasks
-    .filter((t) => t.state === TaskLifecycle.Completed && t.updatedAt >= cutoff)
+    .filter(
+      (t) =>
+        t.state === TaskLifecycle.Completed &&
+        t.updatedAt >= cutoff &&
+        scopeToSession(t),
+    )
     .sort((a, b) => b.updatedAt - a.updatedAt);
 
   // Aggregate ownership across all live tasks plus recently-stopped tasks

kelsonpw added a commit that referenced this pull request May 9, 2026
…ty + perf hot-paths + resilience

Stacks on #690 (which stacks on #689). Merge after PRs 1 + 2.

PR 3 lands the state-driven foundation that the broader v2 TUI redesign
will sit on. Five concerns, all additive — every PR 1 + PR 2 surface
keeps working unchanged.

A. TUI v2 wiring — `/status` overlay renders the same data
   `wizard orchestration status --json` emits, sectioned for human
   reading. ManualVerificationRibbon mounts on OutroScreen so success-
   looking UI cannot appear while a verification is pending.
   ChoiceCheckpointBanner is a reusable primitive for surfacing typed
   Choice records with the full UX contract (why-asking, recommended,
   safe-default, reversibility, consequence-if-skipped).

B. MCP-server tool parity — every read-only orchestration CLI command
   now has a matching MCP tool. Both surfaces call into the same
   builders in `src/lib/orchestration/envelopes.ts`, so output is
   byte-for-byte identical (modulo `generatedAt`). Server stays
   read-only by design — mutators stay on the CLI.

C. Perf hot-paths — `withReadCache(fn)` amortises store reads across
   builders inside one command/tool invocation. `per-run-cache.ts`
   memoises repeated `gh pr view` / MCP-availability calls within a
   single run.

D. Bugs found and fixed —
   - success-looking UI while blocked on a verification → ribbon
   - choices asked again after a durable answer → addChoice de-dup
     (covered in PR 2; regression test added)
   - skipped MCP apps not remembered → covered by anti-nag invariant
     (PR 2; surfaced via /status)
   Background agents continuing after cancellation: out of scope —
   call out as known limitation.

E. Resilience — token-expired-during-long-task. agent-runner's
   AUTH_ERROR branch now mirrors the K/R question to a durable Choice
   (kind=keep_or_revert_files) plus a manual_pr_test Verification.
   `wizard status --json` thereafter shows
   `nextAction.kind === 'await_user_choice'`.

F. Tests — 40+ new tests:
   - envelope schema parity (CLI ↔ MCP tool)
   - StatusOverlay rendering all sections
   - ChoiceCheckpoint UX contract (every required field surfaced)
   - OutroScreen verification ribbon regression
   - per-run-cache (memoize / memoizeAsync / invalidate)
   - auth-error resilience (Choice + Verification + LSP shape)
   - perf-status-cold (internal-cold-start bound)
   All 3919 unit tests pass; 100/100 BDD scenarios pass.

G. Docs — extended `docs/orchestration.md` with PR 3 sections (TUI
   integration model, envelopes layer, MCP tool parity table, perf
   measurements, resilience flow). New `docs/agent-consumability.md`
   covers CLI / MCP / NDJSON consumption with worked examples (Claude
   Code, Cursor, CI bots, watchdogs). README + CLAUDE.md updated.

Out of scope (future PRs):
- Full TUI screen-tree redesign / information-architecture refactor.
- Widening the Choice/Verification wiring beachhead beyond
  env-selection + event-plan-approval.
- Retiring legacy `WizardSession`.
- esbuild-bundled CLI for sub-200ms cold-start.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
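
A per-run cache of the kind item C describes might look like the sketch below. The method names mirror the test list in item F (memoize / memoizeAsync / invalidate); the class name, the string keys, and the policy of not caching rejected promises are assumptions.

```typescript
// Hypothetical per-run cache; lives for one wizard run and is then discarded.
class PerRunCache {
  private readonly values = new Map<string, unknown>();

  memoize<T>(key: string, compute: () => T): T {
    if (!this.values.has(key)) this.values.set(key, compute());
    return this.values.get(key) as T;
  }

  memoizeAsync<T>(key: string, compute: () => Promise<T>): Promise<T> {
    // Cache the promise itself so concurrent callers share one in-flight call.
    if (!this.values.has(key)) {
      const pending = compute().catch((err: unknown) => {
        this.values.delete(key); // assumed policy: do not cache failures
        throw err;
      });
      this.values.set(key, pending);
    }
    return this.values.get(key) as Promise<T>;
  }

  // Drop one key, or everything when no key is given.
  invalidate(key?: string): void {
    if (key === undefined) this.values.clear();
    else this.values.delete(key);
  }
}
```

A caller would wrap an expensive lookup such as a `gh pr view` invocation in memoizeAsync with a key derived from its arguments.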

kelsonpw commented May 9, 2026

Bugbot triage pass complete: 0 stale, 2 live (both fixed in e1bad84 — Windows spawn + tilde expansion), 0 defensible.

`wizard resume <session-id>` validated the requested session existed but
then called `computeLastStoppingPoint(installDir)` without scoping, which
always derived its next action from the *most recently created active
session* — not the one the user asked about. The envelope's `sessionId`
field still echoed the requested ID, so the command/description shown
could describe a different session entirely.

`computeLastStoppingPoint` now accepts an optional `sessionId` that
restricts both session metadata and task buckets to that session. The
resume command threads the resolved session id through, and an added
test pins the scoping behavior against a two-session fixture.

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Missing error handler on spawned child process
    • Added a child.on('error', ...) handler that logs the error (respecting JSON output mode) and exits with ExitCode.GENERAL_ERROR, preventing an uncaught exception if the spawned binary fails to launch.
  • ✅ Fixed: Redundant identical ensureDir calls in saveStore
    • Removed the redundant ensureDir(getRunDir(next.installDir)) call since ensureDir(dirname(path)) already creates the same directory, and removed the now-unused getRunDir import.

Preview (e57619e3cb)
diff --git a/src/commands/orchestration.ts b/src/commands/orchestration.ts
--- a/src/commands/orchestration.ts
+++ b/src/commands/orchestration.ts
@@ -563,7 +563,7 @@
         // and description belong to the session the user asked for, not the
         // most-recently-active session in the store.
         const lsp = computeLastStoppingPoint(opts.installDir, {
-          sessionId: session!.id,
+          sessionId: session.id,
         });
         const command = lsp.nextAction.command;
         const description = lsp.nextAction.description;
@@ -609,6 +609,15 @@
             process.exit(ExitCode.GENERAL_ERROR);
           }
           const child = spawn(cmd, rest, { stdio: 'inherit' });
+          child.on('error', (err) => {
+            if (opts.jsonOutput)
+              emitJsonError(`Failed to spawn resume command: ${err.message}`);
+            else
+              getUI().log.error(
+                `Failed to spawn resume command: ${err.message}`,
+              );
+            process.exit(ExitCode.GENERAL_ERROR);
+          });
           child.on('exit', (code) => {
             process.exit(code ?? 0);
           });

diff --git a/src/lib/orchestration/store.ts b/src/lib/orchestration/store.ts
--- a/src/lib/orchestration/store.ts
+++ b/src/lib/orchestration/store.ts
@@ -43,7 +43,6 @@
 import { OrchestrationStoreFileSchema } from './schemas';
 import { TaskLifecycle, assertTransition, isTerminal } from './lifecycle';
 import { getOrchestrationStoreFile } from './storage-paths';
-import { getRunDir } from '../../utils/storage-paths';
 import { dirname } from 'node:path';
 
 // ── Id helpers ────────────────────────────────────────────────────────
@@ -155,7 +154,6 @@
   OrchestrationStoreFileSchema.parse(next);
   const path = getOrchestrationStoreFile(next.installDir);
   ensureDir(dirname(path));
-  ensureDir(getRunDir(next.installDir));
   atomicWriteJSON(path, next, { mode: 0o600 });
 }

kelsonpw added a commit that referenced this pull request May 9, 2026
…ty + perf hot-paths + resilience

Stacks on #690 (which stacks on #689). Merge after PRs 1 + 2.

PR 3 lands the state-driven foundation that the broader v2 TUI redesign
will sit on. Five concerns, all additive — every PR 1 + PR 2 surface
keeps working unchanged.

A. TUI v2 wiring — `/status` overlay renders the same data
   `wizard orchestration status --json` emits, sectioned for human
   reading. ManualVerificationRibbon mounts on OutroScreen so success-
   looking UI cannot appear while a verification is pending.
   ChoiceCheckpointBanner is a reusable primitive for surfacing typed
   Choice records with the full UX contract (why-asking, recommended,
   safe-default, reversibility, consequence-if-skipped).

B. MCP-server tool parity — every read-only orchestration CLI command
   now has a matching MCP tool. Both surfaces call into the same
   builders in `src/lib/orchestration/envelopes.ts`, so output is
   byte-for-byte identical (modulo `generatedAt`). Server stays
   read-only by design — mutators stay on the CLI.

C. Perf hot-paths — `withReadCache(fn)` amortises store reads across
   builders inside one command/tool invocation. `per-run-cache.ts`
   memoises repeated `gh pr view` / MCP-availability calls within a
   single run.

D. Bugs found and fixed —
   - success-looking UI while blocked on a verification → ribbon
   - choices asked again after a durable answer → addChoice de-dup
     (covered in PR 2; regression test added)
   - skipped MCP apps not remembered → covered by anti-nag invariant
     (PR 2; surfaced via /status)
   Background agents continuing after cancellation: out of scope —
   call out as known limitation.

E. Resilience — token-expired-during-long-task. agent-runner's
   AUTH_ERROR branch now mirrors the K/R question to a durable Choice
   (kind=keep_or_revert_files) plus a manual_pr_test Verification.
   `wizard status --json` thereafter shows
   `nextAction.kind === 'await_user_choice'`.

F. Tests — 40+ new tests:
   - envelope schema parity (CLI ↔ MCP tool)
   - StatusOverlay rendering all sections
   - ChoiceCheckpoint UX contract (every required field surfaced)
   - OutroScreen verification ribbon regression
   - per-run-cache (memoize / memoizeAsync / invalidate)
   - auth-error resilience (Choice + Verification + LSP shape)
   - perf-status-cold (internal-cold-start bound)
   All 3919 unit tests pass; 100/100 BDD scenarios pass.

G. Docs — extended `docs/orchestration.md` with PR 3 sections (TUI
   integration model, envelopes layer, MCP tool parity table, perf
   measurements, resilience flow). New `docs/agent-consumability.md`
   covers CLI / MCP / NDJSON consumption with worked examples (Claude
   Code, Cursor, CI bots, watchdogs). README + CLAUDE.md updated.
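
The per-run memoisation described in section C can be sketched as follows. This is a minimal illustration under assumptions, not the contents of `per-run-cache.ts`: the `PerRunCache` class, its method names, and the `pr:689` key are hypothetical, chosen only to mirror the memoize / invalidate behaviours named in the test list.

```typescript
// Hypothetical sketch of a per-run cache. Names are illustrative and are
// not taken from `per-run-cache.ts`.
class PerRunCache {
  private readonly values = new Map<string, unknown>();

  // Return the cached value for `key`, computing it on first use only.
  memoize<T>(key: string, compute: () => T): T {
    if (!this.values.has(key)) {
      this.values.set(key, compute());
    }
    return this.values.get(key) as T;
  }

  // Drop one entry so the next `memoize` call recomputes it.
  invalidate(key: string): void {
    this.values.delete(key);
  }
}

// Usage: an expensive lookup (a stand-in for something like `gh pr view`)
// runs once per run, however many builders ask for it.
const cache = new PerRunCache();
let lookups = 0;
const fetchPrState = (): string => {
  lookups += 1;
  return 'open';
};

cache.memoize('pr:689', fetchPrState);
cache.memoize('pr:689', fetchPrState);
const lookupsBeforeInvalidate = lookups; // 1: second call hit the cache

cache.invalidate('pr:689');
cache.memoize('pr:689', fetchPrState);
const lookupsAfterInvalidate = lookups; // 2: recomputed after invalidate
```

Scoping the cache to a single object that lives for one run keeps stale data from leaking across invocations without needing a time-based expiry.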

Out of scope (future PRs):
- Full TUI screen-tree redesign / information-architecture refactor.
- Widening the Choice/Verification wiring beachhead beyond
  env-selection + event-plan-approval.
- Retiring legacy `WizardSession`.
- esbuild-bundled CLI for sub-200ms cold-start.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kelsonpw added 4 commits May 9, 2026 10:06
- `resume --execute` now attaches `child.on('error', …)` before `exit`.
  Previously a spawn failure (ENOENT, EACCES, missing PATH entry on
  Windows) emitted an `error` event with no listener attached, which
  Node's EventEmitter throws, crashing the CLI with a stack trace instead
  of producing a clean message + GENERAL_ERROR exit.
- `saveStore` was calling `ensureDir(dirname(path))` and then
  `ensureDir(getRunDir(installDir))` — both resolve to the same run
  directory because `getOrchestrationStoreFile()` is defined as
  `join(getRunDir(installDir), 'orchestration.json')`. Drop the second
  call and the now-unused `getRunDir` import.
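
The first fix above can be sketched as follows. This is a minimal illustration under assumptions, not the code behind `resume --execute`: the `runAndReport` and `describeSpawnError` helpers, the message text, and the value `1` for `GENERAL_ERROR` are hypothetical. `spawn` and the `error` / `exit` events come from Node's `child_process` module.

```typescript
import { spawn } from 'node:child_process';

// Assumed exit code for illustration. The wizard's real GENERAL_ERROR
// value is not shown in this PR excerpt.
const GENERAL_ERROR = 1;

// Turn a spawn failure into a short, user-facing message.
function describeSpawnError(command: string, err: NodeJS.ErrnoException): string {
  const code = err.code ?? 'UNKNOWN';
  return `Could not start "${command}" (${code}): ${err.message}`;
}

// Run a command and resolve with its exit code. The `error` listener is
// attached before `exit`, so a failed spawn (for example ENOENT) is
// reported cleanly instead of surfacing as an unhandled `error` event.
function runAndReport(command: string, args: string[]): Promise<number> {
  return new Promise((resolve) => {
    const child = spawn(command, args, { stdio: 'inherit' });
    child.on('error', (err: NodeJS.ErrnoException) => {
      console.error(describeSpawnError(command, err));
      resolve(GENERAL_ERROR);
    });
    child.on('exit', (code) => {
      // `code` is null when the child was terminated by a signal.
      resolve(code ?? GENERAL_ERROR);
    });
  });
}
```

Resolving the promise from both listeners is harmless: a promise settles once, so whichever event arrives first decides the result.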

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Description uses hardcoded CLI_INVOCATION instead of configurable invocation
    • Replaced the hardcoded CLI_INVOCATION on line 170 with cliPrefix.join(' ') so the description string uses the same configurable invocation as the command array.

Preview (b9a4e38b5d)
diff --git a/src/lib/orchestration/last-stopping-point.ts b/src/lib/orchestration/last-stopping-point.ts
--- a/src/lib/orchestration/last-stopping-point.ts
+++ b/src/lib/orchestration/last-stopping-point.ts
@@ -167,7 +167,9 @@
     const recent = args.stoppedTasks[0];
     return {
       kind: 'inspect_failure',
-      description: `Most recent stop: ${recent.label} (${recent.state}). Inspect with \`${CLI_INVOCATION} task ${recent.id}\`.`,
+      description: `Most recent stop: ${recent.label} (${
+        recent.state
+      }). Inspect with \`${cliPrefix.join(' ')} task ${recent.id}\`.`,
       command: [...cliPrefix, 'task', recent.id, ...installDirArgs],
     };
   }

`deriveNextAction` builds an `inspect_failure` next-action when the
most recent task has stopped. The structured `command` array uses
the configurable `cliPrefix` (sourced from `args.invocation`, which
flows from `options.cliInvocation` on `computeLastStoppingPoint`),
but the inline shell hint embedded in `description` was templating
the hardcoded module-level `CLI_INVOCATION` constant. A custom
invocation (e.g. an alternate `wizard` symlink, or a test harness
overriding the binary name) would surface a description that says
`amplitude-wizard task <id>` while the JSON payload's `command`
points at the configured executable. Use `cliPrefix.join(' ')` for
both so the human and machine views always agree.
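
The idea of deriving both views from one prefix can be sketched as follows. This is an illustration under assumptions rather than the body of `deriveNextAction`: `buildInspectAction`, the `NextAction` interface, and the sample values are hypothetical, and the `installDirArgs` tail seen in the diff is omitted for brevity.

```typescript
// Hypothetical shape. Only `kind`, `description`, and `command` appear in
// the diff above.
interface NextAction {
  kind: 'inspect_failure';
  description: string;
  command: string[];
}

// Build the human-readable hint and the machine-readable argv from the
// same `cliPrefix`, so the two cannot disagree.
function buildInspectAction(
  cliPrefix: readonly string[],
  task: { id: string; label: string; state: string },
): NextAction {
  const cliInline = cliPrefix.join(' ');
  return {
    kind: 'inspect_failure',
    description: `Most recent stop: ${task.label} (${task.state}). Inspect with \`${cliInline} task ${task.id}\`.`,
    command: [...cliPrefix, 'task', task.id],
  };
}

// A custom invocation shows up in both the description and the argv.
const action = buildInspectAction(['npx', 'my-wizard'], {
  id: 't_1',
  label: 'Install SDK',
  state: 'failed',
});
```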
kelsonpw added a commit that referenced this pull request May 9, 2026
…dundancy + supervisor + live status refresh (stacks on #691)

Stacks on #691 → #690 → #689. Merge after PRs 1+2+3.

## Summary

- **Beachhead widening**: centralized `record*Choice` / `record*Verification`
  helpers in `src/lib/orchestration/wiring.ts` and wired them through every
  major user-choice and manual-verification surface in the wizard (MCP install,
  Slack, region select, OAuth browser login, project creation, dashboard
  setup, event-plan revision, logout). Existing TUI screens / agent prompts
  continue to drive the user-facing flow; the orchestration store mirror is
  ADDITIVE so outer agents inspecting `wizard status --json` see typed records.
  Mirror failures swallow + log so they NEVER break the user-facing path.
- **WizardSession boundary**: docblock at the top of `wizard-session.ts`
  now spells out the contract — `WizardSession` = transient TUI display
  state; `OrchestrationStore` = durable orchestration state; never duplicate
  fields between them. Audit table in `docs/orchestration.md` (PR 4 section)
  walks every field. PR 4 deletes zero fields by design — the redundant
  *concept* (Subagent / Task / Ownership double-bookkeeping) was already
  avoided in PR 1; PR 4 cements the contract for PR 5's screen-tree redesign.
- **Background-agent supervision**: new `Supervisor` class in
  `src/lib/orchestration/supervisor.ts`. Tracks subprocess PIDs that map to
  `Subagent` rows, writes `<runDir>/heartbeats/<pid>.txt` every 5s,
  SIGTERMs on SIGINT/SIGTERM (with 5s grace before SIGKILL), reaps stale
  heartbeats (>30s old + PID gone) by transitioning the rooted Task to
  `cancelled`. Startup recovery transitions orphaned-but-running Tasks to
  `failed: 'process gone'`. Eliminates the "stopped agents shown as running"
  drift.
- **Live `/status` refresh**: new `watchOrchestrationStore` (debounced 200ms,
  watches the parent dir to survive `atomicWriteJSON`'s rename) +
  `useOrchestrationStore` React hook. `StatusOverlayScreen` plumbs the hook
  in so the overlay re-renders when a sibling shell mutates the store via
  `wizard choice answer`, `wizard verification mark`, etc.
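
The stale-heartbeat rule in the supervision bullet above (older than 30s and PID gone) can be sketched as follows. This is a minimal illustration, not the `Supervisor` class: `isPidAlive` and `shouldReap` are hypothetical helper names. `process.kill(pid, 0)` is the standard Node way to probe for a process without signalling it.

```typescript
// Threshold taken from the summary above: a heartbeat is stale after 30s.
const STALE_AFTER_MS = 30_000;

// `process.kill(pid, 0)` sends no signal; it only checks whether the
// process exists. EPERM means it exists but belongs to another user.
function isPidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    return (err as NodeJS.ErrnoException).code === 'EPERM';
  }
}

// A heartbeat is reaped only when it is old AND its process is gone, so a
// slow but living subagent is never cancelled by the reaper.
function shouldReap(lastBeatMs: number, nowMs: number, pidAlive: boolean): boolean {
  return nowMs - lastBeatMs > STALE_AFTER_MS && !pidAlive;
}
```

Requiring both conditions is the conservative choice: a missed heartbeat alone may only mean the event loop was busy.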

## Tests

+30 tests (3919 → 3949 vitest, 100/100 BDD):
- 20 wiring tests (each `record*` helper, dedup invariant, answerByPromptId,
  anti-nag re-record, verification mark-passed contract)
- 5 supervisor tests (track + heartbeat write, terminateAll + signal/marking,
  stale-heartbeat reap, recoverOrphanedSubagents, untrack)
- 5 watcher tests (write fires onChange, debounce coalesces a burst,
  dispose idempotency, no-fire-after-dispose, late-mount before file exists)

All test surfaces:
- `pnpm exec vitest run --pool=forks --maxWorkers=1` → 3949/3949
- `pnpm test:bdd` → 100/100
- `pnpm build` → green
- `pnpm lint` → green (1 pre-existing warning unchanged)
- `pnpm exec tsc --noEmit -p tsconfig.json` → clean

## Backward compatibility

- No public-contract changes. `wizard status --json`, `wizard choice list`,
  `wizard verification list`, and the MCP server's read-only tools all keep
  emitting the same envelope shapes; PR 4 just produces *more* records in
  them.
- AI-SDK migration unaffected — no fields removed from `WizardSession`.

## Known limitations

- TUI screen-tree redesign still PR 5.
- Cold-start bundling still a follow-up.
- Some less-trafficked prompt surfaces (the inner agent's `choose` tool,
  per-tool MCP auth confirmations) intentionally keep their existing
  transient-text path. The audit table in `docs/orchestration.md` documents
  what was wired and what was skipped (and why).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Resume command string breaks on paths with spaces
    • Added a shellJoin helper that single-quotes any argv element containing whitespace or shell metacharacters, and replaced all five command.join(' ') call sites in last-stopping-point.ts and orchestration.ts with it.

Preview (de7da72a3c)
diff --git a/src/commands/orchestration.ts b/src/commands/orchestration.ts
--- a/src/commands/orchestration.ts
+++ b/src/commands/orchestration.ts
@@ -536,7 +536,7 @@
         const { getOrchestrationStore } = await import(
           '../lib/orchestration/store.js'
         );
-        const { computeLastStoppingPoint } = await import(
+        const { computeLastStoppingPoint, shellJoin } = await import(
           '../lib/orchestration/last-stopping-point.js'
         );
         const { ResumeEnvelopeSchema } = await import(
@@ -583,7 +583,7 @@
         } else {
           const ui = getUI();
           ui.log.info(description);
-          ui.log.info(`Resume: ${chalk.bold(command.join(' '))}`);
+          ui.log.info(`Resume: ${chalk.bold(shellJoin(command))}`);
           if (!execute) {
             ui.log.info(
               chalk.dim('(pass --execute to invoke this command directly)'),
@@ -666,7 +666,7 @@
               const { getOrchestrationStore } = await import(
                 '../lib/orchestration/store.js'
               );
-              const { computeLastStoppingPoint } = await import(
+              const { computeLastStoppingPoint, shellJoin } = await import(
                 '../lib/orchestration/last-stopping-point.js'
               );
               const { StatusEnvelopeSchema } = await import(
@@ -696,7 +696,7 @@
                   );
                   ui.log.info(lsp.nextAction.description);
                   ui.log.info(
-                    `Resume: ${chalk.bold(lsp.nextAction.command.join(' '))}`,
+                    `Resume: ${chalk.bold(shellJoin(lsp.nextAction.command))}`,
                   );
                 } else {
                   ui.log.info(
@@ -729,7 +729,7 @@
                   ui.log.info(`Next action: ${lsp.nextAction.description}`);
                   ui.log.info(
                     `Resume:      ${chalk.bold(
-                      lsp.nextAction.command.join(' '),
+                      shellJoin(lsp.nextAction.command),
                     )}`,
                   );
                 }

diff --git a/src/lib/orchestration/last-stopping-point.ts b/src/lib/orchestration/last-stopping-point.ts
--- a/src/lib/orchestration/last-stopping-point.ts
+++ b/src/lib/orchestration/last-stopping-point.ts
@@ -66,6 +66,18 @@
   }
 }
 
+/**
+ * Join an argv array into a copy-pasteable shell string, quoting any
+ * arguments that contain whitespace or shell metacharacters.
+ */
+export function shellJoin(argv: readonly string[]): string {
+  return argv
+    .map((arg) =>
+      /[\s"'\\`$!#&|;()<>]/.test(arg) ? `'${arg.replace(/'/g, "'\\''")}'` : arg,
+    )
+    .join(' ');
+}
+
 function dedupeOwnership(ownership: Ownership[]): Ownership[] {
   const seen = new Set<string>();
   const out: Ownership[] = [];
@@ -171,7 +183,7 @@
     // `CLI_INVOCATION` here meant a custom invocation (e.g. test harness,
     // alternate `wizard` symlink) would print the wrong command name in
     // the human-readable hint while emitting the correct one in JSON.
-    const cliInline = cliPrefix.join(' ');
+    const cliInline = shellJoin(cliPrefix);
     return {
       kind: 'inspect_failure',
       description: `Most recent stop: ${recent.label} (${recent.state}). Inspect with \`${cliInline} task ${recent.id}\`.`,
@@ -296,6 +308,6 @@
     pendingMcpActions,
     pendingManualVerifications,
     nextAction,
-    resumeCommand: nextAction.command.join(' '),
+    resumeCommand: shellJoin(nextAction.command),
   };
 }

kelsonpw added a commit that referenced this pull request May 9, 2026
…rdown (stacks on #693)

Stacks on #693 → #691 → #690 → #689. Merge after PRs 1+2+3+4.

PR 5 turns the TUI from "screens that mostly work" into a serious
operator interface with a coherent IA, shared glyph vocabulary, and
render-cost discipline.

IA redesign:
- Three-zone layout (header / body / chrome).
- Header: JourneyStepper + identity + mode badge. Mode badge surfaces
  agent / ci / nested / mcp-server states; suppressed in plain
  interactive mode.
- Operator Overview screen (`/status`) reframed: title + mode badge
  + 1-line summary, then sectioned by Session / Primary work /
  Background / Pending choices / Pending verifications / MCP
  capabilities / Owned artifacts / Next action. Live-refresh on
  orchestration store mutations via PR 4's file-watcher hook.

Glyph palette (canonical vocabulary):
  ○ queued · › running · … waiting · ⏸ blocked · ✓ completed
  ✗ failed · ⊘ cancelled · ⮕ superseded
Centralized in `src/ui/tui/utils/lifecycle-display.ts` so a future
"swap one glyph" change is a one-line edit, not a hunt across the
screen tree. Pinned by unit tests so silent drift trips a test.
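
A centralised glyph table of this kind might look like the sketch below. It is an assumption-laden illustration, not the contents of `lifecycle-display.ts`: the `LifecycleState` union, `LIFECYCLE_GLYPHS`, and `glyphFor` are hypothetical names, and only the state-to-glyph pairs are taken from the palette listed above.

```typescript
// State names follow the palette above. The real lifecycle type may differ.
type LifecycleState =
  | 'queued'
  | 'running'
  | 'waiting'
  | 'blocked'
  | 'completed'
  | 'failed'
  | 'cancelled'
  | 'superseded';

// Single source of truth: `Record` forces an entry for every state, so
// adding a state without a glyph is a compile error.
const LIFECYCLE_GLYPHS: Record<LifecycleState, string> = {
  queued: '○',
  running: '›',
  waiting: '…',
  blocked: '⏸',
  completed: '✓',
  failed: '✗',
  cancelled: '⊘',
  superseded: '⮕',
};

function glyphFor(state: LifecycleState): string {
  return LIFECYCLE_GLYPHS[state];
}
```

A unit test that pins each pair, as the commit describes, then catches accidental edits to the table.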

Slash command coherence:
- New `/help` command lists every registered command grouped by
  "available anytime" vs "available before/after a setup run".
  When a run is active, the second group is renamed "paused while
  a setup run is active (Ctrl+C to cancel, then retry)" so the user
  knows exactly why a command can't fire.
- Multi-line command feedback (e.g. /help, /diagnostics) renders
  with hanging indent so it reads as one block.

Render-cost teardown:
- New `useWizardSelector(store, selector, isEqual?)` slice hook.
  Components subscribed to a slice no longer rerender for unrelated
  store ticks. `shallowArrayEqual` and `shallowObjectEqual` exported
  for the common case.
- Render-cost benchmark fixture pins the contract: 3 task transitions
  + 5 status bumps → tasks slice 3 renders, status slice 5 renders,
  whole-store subscriber 8+ renders. Against that baseline of 8,
  slicing cuts renders by roughly 60% for the tasks slice and roughly
  40% for the status slice.
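
The slice-subscription idea can be sketched without React as follows. This is an illustration under assumptions, not the `useWizardSelector` hook itself: `TinyStore`, `subscribeToSlice`, and the `State` shape are hypothetical, and `shallowArrayEqual` here is a plausible reading of the exported helper rather than its actual source. The usage replays the benchmark's 3 task transitions and 5 status bumps.

```typescript
interface State {
  tasks: readonly string[];
  status: string;
}
type Listener = () => void;

// Element-wise identity comparison for arrays.
function shallowArrayEqual<T>(a: readonly T[], b: readonly T[]): boolean {
  return a.length === b.length && a.every((item, i) => Object.is(item, b[i]));
}

// Hypothetical minimal store: every setState notifies every listener.
class TinyStore {
  private state: State;
  private readonly listeners = new Set<Listener>();

  constructor(initial: State) {
    this.state = initial;
  }
  getState(): State {
    return this.state;
  }
  setState(patch: Partial<State>): void {
    this.state = { ...this.state, ...patch };
    this.listeners.forEach((listener) => listener());
  }
  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }
}

// Notify `onChange` only when the selected slice changes under `isEqual`.
function subscribeToSlice<S>(
  store: TinyStore,
  selector: (state: State) => S,
  onChange: (slice: S) => void,
  isEqual: (a: S, b: S) => boolean = Object.is,
): () => void {
  let current = selector(store.getState());
  return store.subscribe(() => {
    const next = selector(store.getState());
    if (!isEqual(current, next)) {
      current = next;
      onChange(next);
    }
  });
}

const store = new TinyStore({ tasks: [], status: 'idle' });
const counts = { tasks: 0, status: 0, whole: 0 };
subscribeToSlice(store, (s) => s.tasks, () => { counts.tasks += 1; }, shallowArrayEqual);
subscribeToSlice(store, (s) => s.status, () => { counts.status += 1; });
store.subscribe(() => { counts.whole += 1; });

for (let i = 0; i < 3; i++) {
  store.setState({ tasks: [...store.getState().tasks, `t${i}`] });
}
for (let i = 0; i < 5; i++) {
  store.setState({ status: `step-${i}` });
}
// counts is now { tasks: 3, status: 5, whole: 8 }
```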

Tests added (40 over the base 3949):
- lifecycle-display vocabulary (5)
- mode-badge env resolution (9)
- /help text generation (6)
- HeaderBar mode badge rendering (5)
- useWizardSelector primitives + render-cost ceiling (4 + 3)
- StatusOverlayScreen glyph palette + summary + mode badge (7)
- StatusOverlayScreen Operator Overview reframing (existing test
  updated to match new section names) (1)

Build, lint, vitest (3989/3989), BDD (100/100) all green.

Backward compatibility:
- All existing slash commands continue to work the same way; /help is
  additive.
- /status overlay's data shape is unchanged from PR 3; only the
  rendering reorganized.
- --agent, --ci, --json, manifest, plan, apply, verify, MCP server,
  v: 1 envelope, exit codes — all unchanged.
- Mode badge is suppressed in plain interactive mode, preserving the
  prior header look for the most common case.
- ProgressList still uses a blank gutter for `pending` rows rather
  than the canonical ○ glyph (deliberate UX trade-off — see comment
  in ProgressList.tsx).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kelsonpw added 2 commits May 9, 2026 10:31
`resumeCommand` is the human-facing copy-pasteable form of
`nextAction.command`. It was built with `nextAction.command.join(' ')`,
which silently corrupts paths containing spaces (e.g. an `installDir`
of `/Users/me/my project` would land in the shell as two separate
words). The structured `command` array stayed correct, but the string
the user is invited to paste into a terminal would fail.

Add a small `shellJoin` / `shellQuote` helper that wraps tokens with
shell metacharacters or whitespace in single quotes (with the standard
`'\''` close/escape/reopen dance for embedded single quotes). Tokens
that are already shell-safe stay unquoted so the common case stays
readable.
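
A worked example of the quoting behaviour described above. This sketch is an assumption, not the helper that landed in the PR: it uses an allowlist of known-safe characters (the same safe set Python's `shlex.quote` uses) rather than a list of metacharacters to catch, and the `--install-dir` flag in the usage line is hypothetical.

```typescript
// Characters that never need quoting in a POSIX shell word. Anything
// outside this allowlist, and the empty string, gets single-quoted.
const SAFE_WORD = /^[A-Za-z0-9_\/.:=@%+,-]+$/;

function shellQuote(arg: string): string {
  if (SAFE_WORD.test(arg)) return arg;
  // Close the quote, emit an escaped single quote, reopen: ' becomes '\''
  return `'${arg.replace(/'/g, `'\\''`)}'`;
}

// Join an argv array into one copy-pasteable shell string.
function shellJoin(argv: readonly string[]): string {
  return argv.map(shellQuote).join(' ');
}

const resumeCommand = shellJoin([
  'wizard',
  'resume',
  '--install-dir', // hypothetical flag name, for illustration only
  '/Users/me/my project',
]);
// resumeCommand: wizard resume --install-dir '/Users/me/my project'
```

An allowlist fails closed: a character nobody thought about, such as `*` or `~`, is quoted by default instead of slipping through.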
…estration commands

The structured `lsp.nextAction.command` array is correct, but joining it with
` ` for the human-facing `Resume:` line splits paths-with-spaces into multiple
shell words. Switch every human-display callsite to `lsp.resumeCommand`, which
already shellQuotes via `shellJoin`. Mirrors the fix that was applied to the
LSP envelope, but covers the remaining `wizard resume` and `wizard
orchestration status` print paths.
kelsonpw added a commit that referenced this pull request May 9, 2026
…kpoints + MCP-app lifecycle

Stacks on PR 1 (#689). Adds three typed checkpoint surfaces on top of
the v2 orchestration foundation:

- Choice — typed user-choice records with stable promptId for de-dup,
  requiresHuman automation gate, and full status transitions
  (pending → answered/expired/cancelled/superseded).
- Verification — manual out-of-band verification records with
  status transitions (pending → passed/failed/skipped, skipped/failed
  may recover to passed; passed/skipped/failed may supersede).
- McpAppCapability — durable lifecycle for every MCP-app capability
  with an anti-nag invariant: install_skipped → needs_user_choice
  REQUIRES a non-empty lastStateChangeReason.
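
The Verification transitions and the anti-nag invariant listed above can be sketched as a transition table plus a guard. This is an illustration under assumptions, not the PR's schema: `VERIFICATION_TRANSITIONS`, `canTransitionVerification`, and `assertMcpTransition` are hypothetical names, and the table encodes only the transitions spelled out in this commit message.

```typescript
type VerificationStatus = 'pending' | 'passed' | 'failed' | 'skipped' | 'superseded';

// Allowed moves, as described above: pending resolves to a result,
// failed/skipped may recover to passed, and any result may be superseded.
const VERIFICATION_TRANSITIONS: Record<VerificationStatus, readonly VerificationStatus[]> = {
  pending: ['passed', 'failed', 'skipped'],
  passed: ['superseded'],
  failed: ['passed', 'superseded'],
  skipped: ['passed', 'superseded'],
  superseded: [],
};

function canTransitionVerification(from: VerificationStatus, to: VerificationStatus): boolean {
  return VERIFICATION_TRANSITIONS[from].includes(to);
}

// Anti-nag guard: re-asking about a skipped MCP app needs a stated reason.
function assertMcpTransition(
  from: string,
  to: string,
  lastStateChangeReason: string | undefined,
): void {
  const reAsking = from === 'install_skipped' && to === 'needs_user_choice';
  if (reAsking && !lastStateChangeReason?.trim()) {
    throw new Error(
      'install_skipped → needs_user_choice requires a non-empty lastStateChangeReason',
    );
  }
}
```

Keeping the rules in a data table makes the allowed moves easy to audit and to pin with tests.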

New CLI commands:
- wizard choice list / show / answer (with --confirm-human gate)
- wizard verification list / show / mark

Wires last-stopping-point's pendingChoices / pendingMcpActions /
pendingManualVerifications arrays to read real records (was [] in PR 1).
Two callsites instrumented as the PR 2 wiring beachhead:
- env-selection in src/commands/helpers.ts (Choice mirror + answer)
- event-plan-approval in src/lib/wizard-tools.ts (Verification mirror)

Adds 42 tests across choices/verifications/mcp-app-lifecycle/last-stopping-point/CLI.
No TUI changes (deferred to PR 3); no MCP-server tool changes (deferred to PR 3).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kelsonpw added a commit that referenced this pull request May 9, 2026
…dundancy + supervisor + live status refresh (stacks on #691)

Stacks on #691 → #690 → #689. Merge after PRs 1+2+3.

## Summary

- **Beachhead widening**: centralized `record*Choice` / `record*Verification`
  helpers in `src/lib/orchestration/wiring.ts` and wired them through every
  major user-choice and manual-verification surface in the wizard (MCP install,
  Slack, region select, OAuth browser login, project creation, dashboard
  setup, event-plan revision, logout). Existing TUI screens / agent prompts
  continue to drive the user-facing flow; the orchestration store mirror is
  ADDITIVE so outer agents inspecting `wizard status --json` see typed records.
  Mirror failures are swallowed and logged so they NEVER break the user-facing path.
- **WizardSession boundary**: docblock at the top of `wizard-session.ts`
  now spells out the contract — `WizardSession` = transient TUI display
  state; `OrchestrationStore` = durable orchestration state; never duplicate
  fields between them. Audit table in `docs/orchestration.md` (PR 4 section)
  walks every field. PR 4 deletes zero fields by design — the redundant
  *concept* (Subagent / Task / Ownership double-bookkeeping) was already
  avoided in PR 1; PR 4 cements the contract for PR 5's screen-tree redesign.
- **Background-agent supervision**: new `Supervisor` class in
  `src/lib/orchestration/supervisor.ts`. Tracks subprocess PIDs that map to
  `Subagent` rows, writes `<runDir>/heartbeats/<pid>.txt` every 5s,
  SIGTERMs on SIGINT/SIGTERM (with 5s grace before SIGKILL), reaps stale
  heartbeats (>30s old + PID gone) by transitioning the rooted Task to
  `cancelled`. Startup recovery transitions orphaned-but-running Tasks to
  `failed: 'process gone'`. Eliminates the "stopped agents shown as running"
  drift.
- **Live `/status` refresh**: new `watchOrchestrationStore` (debounced 200ms,
  watches the parent dir to survive `atomicWriteJSON`'s rename) +
  `useOrchestrationStore` React hook. `StatusOverlayScreen` plumbs the hook
  in so the overlay re-renders when a sibling shell mutates the store via
  `wizard choice answer`, `wizard verification mark`, etc.
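
The stale-heartbeat rule in the supervision bullet can be read as a small pure predicate. The sketch below is illustrative only: `shouldReap` and its parameters are hypothetical names, and only the 30s threshold and the "stale AND PID gone" condition come from the description above.

```typescript
// ">30s old" comes from the PR description; the constant name is an assumption.
const STALE_AFTER_MS = 30_000;

// Hypothetical helper: a tracked subagent is reaped only when BOTH hold:
// its heartbeat file is older than the threshold and its PID no longer exists.
function shouldReap(
  lastHeartbeatMs: number,
  nowMs: number,
  pidAlive: boolean,
): boolean {
  const heartbeatIsStale = nowMs - lastHeartbeatMs > STALE_AFTER_MS;
  return heartbeatIsStale && !pidAlive;
}
```

Requiring both conditions means a busy but live process that misses a few 5s heartbeats is not cancelled by mistake. How the real `Supervisor` probes liveness is not shown in this PR text; in Node, `process.kill(pid, 0)` is one common way to do it.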

## Tests

+30 tests (3919 → 3949 vitest, 100/100 BDD):
- 20 wiring tests (each `record*` helper, dedup invariant, answerByPromptId,
  anti-nag re-record, verification mark-passed contract)
- 5 supervisor tests (track + heartbeat write, terminateAll + signal/marking,
  stale-heartbeat reap, recoverOrphanedSubagents, untrack)
- 5 watcher tests (write fires onChange, debounce coalesces a burst,
  dispose idempotency, no-fire-after-dispose, late-mount before file exists)
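
The watcher behaviours listed above (a burst coalesces into one `onChange`, nothing fires after dispose) can be sketched with a trailing-edge debounce. This is a minimal illustration, not the code of `watchOrchestrationStore`: the `Timer` interface and the `debounce` helper are hypothetical, and the timer is injected so the behaviour can be exercised without real clocks.

```typescript
// Hypothetical injectable timer so tests can substitute a fake.
interface Timer {
  set: (fn: () => void, ms: number) => unknown;
  clear: (handle: unknown) => void;
}

// Trailing-edge debounce: every trigger restarts the wait window, so a burst
// of file events yields a single onChange once the burst has settled.
function debounce(onChange: () => void, waitMs: number, timer: Timer) {
  let handle: unknown;
  let disposed = false;
  return {
    trigger(): void {
      if (disposed) return; // no-fire-after-dispose
      if (handle !== undefined) timer.clear(handle);
      handle = timer.set(() => {
        handle = undefined;
        onChange();
      }, waitMs);
    },
    dispose(): void {
      // Safe to call more than once.
      disposed = true;
      if (handle !== undefined) timer.clear(handle);
      handle = undefined;
    },
  };
}
```

In real use the timer would be `{ set: setTimeout, clear: clearTimeout }` wrapped to match the interface, with `waitMs` set to the 200ms mentioned in the summary.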

All test surfaces:
- `pnpm exec vitest run --pool=forks --maxWorkers=1` → 3949/3949
- `pnpm test:bdd` → 100/100
- `pnpm build` → green
- `pnpm lint` → green (1 pre-existing warning unchanged)
- `pnpm exec tsc --noEmit -p tsconfig.json` → clean

## Backward compatibility

- No public-contract changes. `wizard status --json`, `wizard choice list`,
  `wizard verification list`, and the MCP server's read-only tools all keep
  emitting the same envelope shapes; PR 4 just produces *more* records in
  them.
- AI-SDK migration unaffected — no fields removed from `WizardSession`.

## Known limitations

- TUI screen-tree redesign still PR 5.
- Cold-start bundling still a follow-up.
- Some less-trafficked prompt surfaces (the inner agent's `choose` tool,
  per-tool MCP auth confirmations) intentionally keep their existing
  transient-text path. The audit table in `docs/orchestration.md` documents
  what was wired and what was skipped (and why).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kelsonpw added a commit that referenced this pull request May 9, 2026
…rdown (stacks on #693)

Stacks on #693 → #691 → #690 → #689. Merge after PRs 1+2+3+4.

PR 5 turns the TUI from "screens that mostly work" into a serious
operator interface with a coherent IA, shared glyph vocabulary, and
render-cost discipline.

IA redesign:
- Three-zone layout (header / body / chrome).
- Header: JourneyStepper + identity + mode badge. Mode badge surfaces
  agent / ci / nested / mcp-server states; suppressed in plain
  interactive mode.
- Operator Overview screen (`/status`) reframed: title + mode badge
  + 1-line summary, then sectioned by Session / Primary work /
  Background / Pending choices / Pending verifications / MCP
  capabilities / Owned artifacts / Next action. Live-refresh on
  orchestration store mutations via PR 4's file-watcher hook.

Glyph palette (canonical vocabulary):
  ○ queued · › running · … waiting · ⏸ blocked · ✓ completed
  ✗ failed · ⊘ cancelled · ⮕ superseded
Centralized in `src/ui/tui/utils/lifecycle-display.ts` so a future
"swap one glyph" change is a one-line edit, not a hunt across the
screen tree. Pinned by unit tests so silent drift trips a test.
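
A centralized glyph table of this kind might be shaped as below. This is a sketch, not the contents of `lifecycle-display.ts`: the names `LifecycleState`, `LIFECYCLE_GLYPHS`, and `glyphFor` are assumptions, while the state names and glyphs are copied from the palette above.

```typescript
// State names and glyphs are taken from the palette in the commit message;
// the type and constant names are hypothetical.
type LifecycleState =
  | 'queued'
  | 'running'
  | 'waiting'
  | 'blocked'
  | 'completed'
  | 'failed'
  | 'cancelled'
  | 'superseded';

// Record<LifecycleState, string> makes the compiler reject a missing state,
// which is one way to get the "one-line edit" property described above.
const LIFECYCLE_GLYPHS: Readonly<Record<LifecycleState, string>> = {
  queued: '○',
  running: '›',
  waiting: '…',
  blocked: '⏸',
  completed: '✓',
  failed: '✗',
  cancelled: '⊘',
  superseded: '⮕',
};

function glyphFor(state: LifecycleState): string {
  return LIFECYCLE_GLYPHS[state];
}
```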

Slash command coherence:
- New `/help` command lists every registered command grouped by
  "available anytime" vs "available before/after a setup run".
  When a run is active, the second group is renamed "paused while
  a setup run is active (Ctrl+C to cancel, then retry)" so the user
  knows exactly why a command can't fire.
- Multi-line command feedback (e.g. /help, /diagnostics) renders
  with hanging indent so it reads as one block.

Render-cost teardown:
- New `useWizardSelector(store, selector, isEqual?)` slice hook.
  Components subscribed to a slice no longer rerender for unrelated
  store ticks. `shallowArrayEqual` and `shallowObjectEqual` exported
  for the common case.
- Render-cost benchmark fixture pins the contract: 3 task transitions
  + 5 status bumps → tasks slice 3 renders, status slice 5 renders,
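  whole-store subscriber 8+ renders. Relative to the whole-store
  subscriber, slicing cuts renders by roughly 60% for the tasks slice
  (3 vs 8+) and roughly 40% for the status slice (5 vs 8+).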
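
The slice-subscription idea behind `useWizardSelector` can be shown without React. The sketch below is an assumption-laden stand-in: `createStore`, `subscribeToSlice`, and the `WizardState` shape are hypothetical, and only the `(store, selector, isEqual?)` argument order and the 3 / 5 / 8 notification counts come from the text above.

```typescript
// Hypothetical state shape, just enough to mirror the benchmark fixture.
interface WizardState {
  tasks: readonly string[];
  status: number;
}
type Listener = () => void;

function createStore(initial: WizardState) {
  let state = initial;
  const listeners = new Set<Listener>();
  return {
    getState: (): WizardState => state,
    setState(next: WizardState): void {
      state = next;
      listeners.forEach((listener) => listener());
    },
    subscribe(listener: Listener): () => void {
      listeners.add(listener);
      return () => {
        listeners.delete(listener);
      };
    },
  };
}

// Notify only when the selected slice changes according to `isEqual`.
function subscribeToSlice<S>(
  store: ReturnType<typeof createStore>,
  selector: (state: WizardState) => S,
  onChange: (slice: S) => void,
  isEqual: (a: S, b: S) => boolean = Object.is,
): () => void {
  let last = selector(store.getState());
  return store.subscribe(() => {
    const next = selector(store.getState());
    if (isEqual(last, next)) return; // unrelated store tick: skip
    last = next;
    onChange(next);
  });
}
```

With this shape, three task updates followed by five status updates notify the tasks subscriber three times and the status subscriber five times, while a plain `store.subscribe` listener fires on all eight.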

Tests added (40 over the base 3949):
- lifecycle-display vocabulary (5)
- mode-badge env resolution (9)
- /help text generation (6)
- HeaderBar mode badge rendering (5)
- useWizardSelector primitives + render-cost ceiling (4 + 3)
- StatusOverlayScreen glyph palette + summary + mode badge (7)
- StatusOverlayScreen Operator Overview reframing (existing test
  updated to match new section names) (1)

Build, lint, vitest (3989/3989), BDD (100/100) all green.

Backward compatibility:
- All existing slash commands continue to work the same way; /help is
  additive.
- /status overlay's data shape is unchanged from PR 3; only the
  rendering was reorganized.
- --agent, --ci, --json, manifest, plan, apply, verify, MCP server,
  v: 1 envelope, exit codes — all unchanged.
- Mode badge is suppressed in plain interactive mode, preserving the
  prior header look for the most common case.
- ProgressList still uses a blank gutter for `pending` rows rather
  than the canonical ○ glyph (deliberate UX trade-off — see comment
  in ProgressList.tsx).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kelsonpw added a commit that referenced this pull request May 10, 2026
…kpoints + MCP-app lifecycle

Stacks on PR 1 (#689). Adds three typed checkpoint surfaces on top of
the v2 orchestration foundation:

- Choice — typed user-choice records with stable promptId for de-dup,
  requiresHuman automation gate, and full status transitions
  (pending → answered/expired/cancelled/superseded).
- Verification — manual out-of-band verification records with
  status transitions (pending → passed/failed/skipped, skipped/failed
  may recover to passed; passed/skipped/failed may supersede).
- McpAppCapability — durable lifecycle for every MCP-app capability
  with an anti-nag invariant: install_skipped → needs_user_choice
  REQUIRES a non-empty lastStateChangeReason.
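
The Verification transitions and the anti-nag invariant described above can be written down as a small table plus a guard. This is a sketch under assumptions: the identifiers `VERIFICATION_TRANSITIONS`, `canTransition`, and `assertMcpTransition` are hypothetical, and the table encodes only the transitions the text names (for example, it does not assume `pending` may go straight to `superseded`).

```typescript
type VerificationStatus =
  | 'pending'
  | 'passed'
  | 'failed'
  | 'skipped'
  | 'superseded';

// pending -> passed/failed/skipped; skipped/failed may recover to passed;
// passed/skipped/failed may be superseded. Nothing else is allowed here.
const VERIFICATION_TRANSITIONS: Readonly<
  Record<VerificationStatus, readonly VerificationStatus[]>
> = {
  pending: ['passed', 'failed', 'skipped'],
  passed: ['superseded'],
  failed: ['passed', 'superseded'],
  skipped: ['passed', 'superseded'],
  superseded: [],
};

function canTransition(from: VerificationStatus, to: VerificationStatus): boolean {
  return VERIFICATION_TRANSITIONS[from].indexOf(to) !== -1;
}

// Anti-nag guard: re-asking about a skipped MCP app needs a stated reason.
// State names are from the PR text; the function itself is illustrative.
function assertMcpTransition(
  from: string,
  to: string,
  lastStateChangeReason?: string,
): void {
  const reAsking = from === 'install_skipped' && to === 'needs_user_choice';
  if (reAsking && !lastStateChangeReason?.trim()) {
    throw new Error(
      'install_skipped -> needs_user_choice requires a non-empty lastStateChangeReason',
    );
  }
}
```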

New CLI commands:
- wizard choice list / show / answer (with --confirm-human gate)
- wizard verification list / show / mark

Wires last-stopping-point's pendingChoices / pendingMcpActions /
pendingManualVerifications arrays to read real records (was [] in PR 1).
Two callsites instrumented as the PR 2 wiring beachhead:
- env-selection in src/commands/helpers.ts (Choice mirror + answer)
- event-plan-approval in src/lib/wizard-tools.ts (Verification mirror)

Adds 42 tests across choices/verifications/mcp-app-lifecycle/last-stopping-point/CLI.
No TUI changes (deferred to PR 3); no MCP-server tool changes (deferred to PR 3).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kelsonpw added a commit that referenced this pull request May 10, 2026
…ty + perf hot-paths + resilience

Stacks on #690 (which stacks on #689). Merge after PRs 1 + 2.

PR 3 lands the state-driven foundation that the broader v2 TUI redesign
will sit on. Five concerns, all additive — every PR 1 + PR 2 surface
keeps working unchanged.

A. TUI v2 wiring — `/status` overlay renders the same data
   `wizard orchestration status --json` emits, sectioned for human
   reading. ManualVerificationRibbon mounts on OutroScreen so success-
   looking UI cannot appear while a verification is pending.
   ChoiceCheckpointBanner is a reusable primitive for surfacing typed
   Choice records with the full UX contract (why-asking, recommended,
   safe-default, reversibility, consequence-if-skipped).

B. MCP-server tool parity — every read-only orchestration CLI command
   now has a matching MCP tool. Both surfaces call into the same
   builders in `src/lib/orchestration/envelopes.ts`, so output is
   byte-for-byte identical (modulo `generatedAt`). Server stays
   read-only by design — mutators stay on the CLI.

C. Perf hot-paths — `withReadCache(fn)` amortises store reads across
   builders inside one command/tool invocation. `per-run-cache.ts`
   memoises repeated `gh pr view` / MCP-availability calls within a
   single run.

D. Bugs found and fixed —
   - success-looking UI while blocked on a verification → ribbon
   - choices asked again after a durable answer → addChoice de-dup
     (covered in PR 2; regression test added)
   - skipped MCP apps not remembered → covered by anti-nag invariant
     (PR 2; surfaced via /status)
   Background agents continuing after cancellation: out of scope —
   call out as known limitation.

E. Resilience — token-expired-during-long-task. agent-runner's
   AUTH_ERROR branch now mirrors the K/R question to a durable Choice
   (kind=keep_or_revert_files) plus a manual_pr_test Verification.
   `wizard status --json` thereafter shows
   `nextAction.kind === 'await_user_choice'`.

F. Tests — 40+ new tests:
   - envelope schema parity (CLI ↔ MCP tool)
   - StatusOverlay rendering all sections
   - ChoiceCheckpoint UX contract (every required field surfaced)
   - OutroScreen verification ribbon regression
   - per-run-cache (memoize / memoizeAsync / invalidate)
   - auth-error resilience (Choice + Verification + LSP shape)
   - perf-status-cold (internal-cold-start bound)
   All 3919 unit tests pass; 100/100 BDD scenarios pass.

G. Docs — extended `docs/orchestration.md` with PR 3 sections (TUI
   integration model, envelopes layer, MCP tool parity table, perf
   measurements, resilience flow). New `docs/agent-consumability.md`
   covers CLI / MCP / NDJSON consumption with worked examples (Claude
   Code, Cursor, CI bots, watchdogs). README + CLAUDE.md updated.
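
For the resilience flow in item E, an outer agent might check the status envelope along these lines. This is a hedged sketch: `StatusEnvelope` and `needsUserChoice` are hypothetical names, and the only fields assumed are `v` and `nextAction.kind`, both of which appear in these commit messages. The full envelope schema is not reproduced here.

```typescript
// Hypothetical partial view of the envelope; real envelopes carry more fields.
interface StatusEnvelope {
  v: number;
  nextAction?: { kind: string };
}

// Takes the stdout of `wizard status --json` and reports whether the wizard
// is parked on a durable user choice (e.g. after an AUTH_ERROR).
function needsUserChoice(stdout: string): boolean {
  const parsed: unknown = JSON.parse(stdout);
  if (typeof parsed !== 'object' || parsed === null) return false;
  const envelope = parsed as Partial<StatusEnvelope>;
  return envelope.nextAction?.kind === 'await_user_choice';
}
```

An agent that sees `true` here would stop retrying and surface the pending choice to a human instead of parsing TUI text.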

Out of scope (future PRs):
- Full TUI screen-tree redesign / information-architecture refactor.
- Widening the Choice/Verification wiring beachhead beyond
  env-selection + event-plan-approval.
- Retiring legacy `WizardSession`.
- esbuild-bundled CLI for sub-200ms cold-start.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kelsonpw added a commit that referenced this pull request May 10, 2026
…dundancy + supervisor + live status refresh (stacks on #691)

kelsonpw added a commit that referenced this pull request May 10, 2026
…rdown (stacks on #693)

kelsonpw added 3 commits May 12, 2026 16:26
…andle shape

`globalThis.clearInterval` has a strict `string | number | Timeout |
undefined` parameter; `opts.clearInterval` (declared `(handle: unknown)
=> void` so tests can override with a fake) is broader. Without an
explicit type annotation, the union `opts.clearInterval ??
globalThis.clearInterval` collapses to the strict shape and rejects
the `pollHandle: unknown` we pass at call sites.

Pin both `setIntervalFn` and `clearIntervalFn` to the options-interface
signature so the test override remains usable AND the inferred type
accepts `unknown` handles. No runtime change.

Persisted `orchestration_status` JSON (and every other user-facing
message that uses `CLI_INVOCATION`) was emitting `amplitude-wizard
--install-dir X` as the suggested resume command. That only works when
the user has explicitly run `npm install -g @amplitude/wizard` — npx
users see the hint and get "command not found".

The previous detection logic (`/_npx/` in argv[1] OR
`npm_command=exec`) caught most real-world npx invocations but missed
common ones: `pnpm try:prod`, `node dist/bin.js`, any wrapper that
strips npm env vars. The persisted `resumeCommand` from such a run
was then wrong for the user reading it back later.

Hardcoding to `npx @amplitude/wizard` is universally correct: it works
when npx defers to a global install AND when it fetches from the
registry. Removes a class of "command not found" support tickets.
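
To make the reasoning concrete, here is a rough reconstruction of the old heuristic next to the hardcoded replacement. The function names `looksLikeNpx` and `buildResumeCommand` are hypothetical; only the two detection signals, the `--install-dir` flag, and the `npx @amplitude/wizard` string come from the commit message.

```typescript
// Old approach (reconstructed from the description, not the real code):
// treat the run as npx only if argv[1] contains /_npx/ or npm_command=exec.
function looksLikeNpx(
  argv1: string | undefined,
  env: Record<string, string | undefined>,
): boolean {
  return (argv1 ?? '').indexOf('/_npx/') !== -1 || env.npm_command === 'exec';
}

// New approach: always suggest the npx form, no detection needed.
const CLI_INVOCATION = 'npx @amplitude/wizard';

// Illustrative only: no shell quoting is applied to installDir here.
function buildResumeCommand(installDir: string): string {
  return `${CLI_INVOCATION} --install-dir ${installDir}`;
}
```

A run started as `node dist/bin.js` with npm variables stripped fails both checks in `looksLikeNpx`, which is the case where the old logic persisted a resume hint the user could not run.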

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Unhandled errors in resolveCommonOpts silently crash commands
    • Moved resolveCommonOpts inside the try/catch block in all six command handlers and used optional chaining (opts?.jsonOutput) in catch blocks so errors are properly reported with structured output and ExitCode.GENERAL_ERROR.

Or push these changes by commenting:

@cursor push 7f5c80d4c4
Preview (7f5c80d4c4)
diff --git a/src/commands/orchestration.ts b/src/commands/orchestration.ts
--- a/src/commands/orchestration.ts
+++ b/src/commands/orchestration.ts
@@ -129,12 +129,13 @@
     }),
   handler: (argv) => {
     void (async () => {
-      const opts = await resolveCommonOpts({
-        installDir: argv['install-dir'] as string | undefined,
-        json: argv.json as boolean | undefined,
-        human: argv.human as boolean | undefined,
-      });
+      let opts: CommonOpts | undefined;
       try {
+        opts = await resolveCommonOpts({
+          installDir: argv['install-dir'] as string | undefined,
+          json: argv.json as boolean | undefined,
+          human: argv.human as boolean | undefined,
+        });
         const { getOrchestrationStore } = await import(
           '../lib/orchestration/store.js'
         );
@@ -201,7 +202,7 @@
         process.exit(ExitCode.SUCCESS);
       } catch (e) {
         const message = e instanceof Error ? e.message : String(e);
-        if (opts.jsonOutput) emitJsonError(`tasks listing failed: ${message}`);
+        if (opts?.jsonOutput) emitJsonError(`tasks listing failed: ${message}`);
         else getUI().log.error(`Tasks listing failed: ${message}`);
         process.exit(ExitCode.GENERAL_ERROR);
       }
@@ -229,13 +230,14 @@
       }),
   handler: (argv) => {
     void (async () => {
-      const opts = await resolveCommonOpts({
-        installDir: argv['install-dir'] as string | undefined,
-        json: argv.json as boolean | undefined,
-        human: argv.human as boolean | undefined,
-      });
+      let opts: CommonOpts | undefined;
       const idRaw = argv.id as string;
       try {
+        opts = await resolveCommonOpts({
+          installDir: argv['install-dir'] as string | undefined,
+          json: argv.json as boolean | undefined,
+          human: argv.human as boolean | undefined,
+        });
         const { getOrchestrationStore } = await import(
           '../lib/orchestration/store.js'
         );
@@ -314,7 +316,7 @@
         process.exit(ExitCode.SUCCESS);
       } catch (e) {
         const message = e instanceof Error ? e.message : String(e);
-        if (opts.jsonOutput) emitJsonError(`task lookup failed: ${message}`);
+        if (opts?.jsonOutput) emitJsonError(`task lookup failed: ${message}`);
         else getUI().log.error(`Task lookup failed: ${message}`);
         process.exit(ExitCode.GENERAL_ERROR);
       }
@@ -336,12 +338,13 @@
     }),
   handler: (argv) => {
     void (async () => {
-      const opts = await resolveCommonOpts({
-        installDir: argv['install-dir'] as string | undefined,
-        json: argv.json as boolean | undefined,
-        human: argv.human as boolean | undefined,
-      });
+      let opts: CommonOpts | undefined;
       try {
+        opts = await resolveCommonOpts({
+          installDir: argv['install-dir'] as string | undefined,
+          json: argv.json as boolean | undefined,
+          human: argv.human as boolean | undefined,
+        });
         const { getOrchestrationStore } = await import(
           '../lib/orchestration/store.js'
         );
@@ -397,7 +400,7 @@
         process.exit(ExitCode.SUCCESS);
       } catch (e) {
         const message = e instanceof Error ? e.message : String(e);
-        if (opts.jsonOutput)
+        if (opts?.jsonOutput)
           emitJsonError(`sessions listing failed: ${message}`);
         else getUI().log.error(`Sessions listing failed: ${message}`);
         process.exit(ExitCode.GENERAL_ERROR);
@@ -426,13 +429,14 @@
       }),
   handler: (argv) => {
     void (async () => {
-      const opts = await resolveCommonOpts({
-        installDir: argv['install-dir'] as string | undefined,
-        json: argv.json as boolean | undefined,
-        human: argv.human as boolean | undefined,
-      });
+      let opts: CommonOpts | undefined;
       const idRaw = argv.id as string;
       try {
+        opts = await resolveCommonOpts({
+          installDir: argv['install-dir'] as string | undefined,
+          json: argv.json as boolean | undefined,
+          human: argv.human as boolean | undefined,
+        });
         const { getOrchestrationStore } = await import(
           '../lib/orchestration/store.js'
         );
@@ -490,7 +494,8 @@
         process.exit(ExitCode.SUCCESS);
       } catch (e) {
         const message = e instanceof Error ? e.message : String(e);
-        if (opts.jsonOutput) emitJsonError(`session lookup failed: ${message}`);
+        if (opts?.jsonOutput)
+          emitJsonError(`session lookup failed: ${message}`);
         else getUI().log.error(`Session lookup failed: ${message}`);
         process.exit(ExitCode.GENERAL_ERROR);
       }
@@ -525,14 +530,15 @@
       }),
   handler: (argv) => {
     void (async () => {
-      const opts = await resolveCommonOpts({
-        installDir: argv['install-dir'] as string | undefined,
-        json: argv.json as boolean | undefined,
-        human: argv.human as boolean | undefined,
-      });
+      let opts: CommonOpts | undefined;
       const sessionIdRaw = argv['session-id'] as string;
       const execute = Boolean(argv.execute);
       try {
+        opts = await resolveCommonOpts({
+          installDir: argv['install-dir'] as string | undefined,
+          json: argv.json as boolean | undefined,
+          human: argv.human as boolean | undefined,
+        });
         const { getOrchestrationStore } = await import(
           '../lib/orchestration/store.js'
         );
@@ -619,7 +625,7 @@
           // CLI failure.
           child.on('error', (err) => {
             const message = err instanceof Error ? err.message : String(err);
-            if (opts.jsonOutput)
+            if (opts?.jsonOutput)
               emitJsonError(`Failed to spawn resume command: ${message}`);
             else
               getUI().log.error(`Failed to spawn resume command: ${message}`);
@@ -633,7 +639,7 @@
         process.exit(ExitCode.SUCCESS);
       } catch (e) {
         const message = e instanceof Error ? e.message : String(e);
-        if (opts.jsonOutput) emitJsonError(`resume failed: ${message}`);
+        if (opts?.jsonOutput) emitJsonError(`resume failed: ${message}`);
         else getUI().log.error(`Resume failed: ${message}`);
         process.exit(ExitCode.GENERAL_ERROR);
       }
@@ -660,12 +666,13 @@
           }),
         (argv) => {
           void (async () => {
-            const opts = await resolveCommonOpts({
-              installDir: argv['install-dir'],
-              json: argv.json as boolean | undefined,
-              human: argv.human as boolean | undefined,
-            });
+            let opts: CommonOpts | undefined;
             try {
+              opts = await resolveCommonOpts({
+                installDir: argv['install-dir'],
+                json: argv.json as boolean | undefined,
+                human: argv.human as boolean | undefined,
+              });
               const { getOrchestrationStore } = await import(
                 '../lib/orchestration/store.js'
               );
@@ -734,7 +741,7 @@
               process.exit(ExitCode.SUCCESS);
             } catch (e) {
               const message = e instanceof Error ? e.message : String(e);
-              if (opts.jsonOutput)
+              if (opts?.jsonOutput)
                 emitJsonError(`orchestration status failed: ${message}`);
               else getUI().log.error(`Orchestration status failed: ${message}`);
               process.exit(ExitCode.GENERAL_ERROR);

Reviewed by Cursor Bugbot for commit be4dbdd.

else getUI().log.error(`Tasks listing failed: ${message}`);
process.exit(ExitCode.GENERAL_ERROR);
}
})();

Unhandled errors in resolveCommonOpts silently crash commands

Low Severity

In all six command handlers, resolveCommonOpts is called outside the try/catch block. If it rejects (e.g., process.cwd() throws because the working directory was deleted, or a dynamic import fails), the error propagates out of the void (async () => { … })() wrapper as an unhandled promise rejection. The process terminates with exit code 1 and no structured error output (emitJsonError or getUI().log.error) instead of the expected ExitCode.GENERAL_ERROR with a descriptive message. Moving resolveCommonOpts inside the try block (with a fallback for opts.jsonOutput) would make error reporting consistent with other failure paths.

@kelsonpw kelsonpw merged commit 5761ef7 into main May 13, 2026
11 checks passed
kelsonpw added a commit that referenced this pull request May 13, 2026
…lers (#759)

resolveCommonOpts was awaited OUTSIDE the try block in each of:
tasksCommand, taskCommand, sessionsCommand, sessionCommand,
resumeCommand, and `orchestration status`. If it rejected
(process.cwd() failed, dynamic import failed, or an unexpected throw
from mode-config / install-dir resolution), the rejection propagated
out of the `void (async () => {...})()` wrapper as an unhandled
promise. Node terminated with exit 1 and no structured error output,
instead of the expected ExitCode.GENERAL_ERROR + emitJsonError /
log.error path that the rest of the handler honors.

Lift the call inside try and use optional chaining `opts?.jsonOutput`
in catch blocks so a failure during resolveCommonOpts itself routes
through the same JSON-error / human-error reporting as any other
failure in the command body.

Bugbot finding on PR #689 (merged); fixing post-merge.
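
Stripped of the yargs plumbing, the pattern this commit applies can be sketched as follows. Everything here is a simplified stand-in: `runHandler`, the `Reporter` interface, and the string exit labels are hypothetical, and the real `ExitCode` values are not shown in this PR text.

```typescript
interface CommonOpts {
  jsonOutput: boolean;
}
interface Reporter {
  json: (message: string) => void;
  human: (message: string) => void;
}
// Labels instead of numbers, since the numeric ExitCode values are unknown here.
type Exit = 'SUCCESS' | 'GENERAL_ERROR';

async function runHandler(
  resolveOpts: () => Promise<CommonOpts>,
  body: (opts: CommonOpts) => Promise<void>,
  report: Reporter,
): Promise<Exit> {
  // Declared outside try so the catch block can see it, assigned inside try
  // so a rejection from resolveOpts is caught like any other failure.
  let opts: CommonOpts | undefined;
  try {
    opts = await resolveOpts();
    await body(opts);
    return 'SUCCESS';
  } catch (e) {
    const message = e instanceof Error ? e.message : String(e);
    // opts is still undefined if resolveOpts itself failed; optional chaining
    // then falls back to the human-readable path.
    if (opts?.jsonOutput) report.json(message);
    else report.human(message);
    return 'GENERAL_ERROR';
  }
}
```

One consequence of this shape is that a failure inside option resolution is always reported in human form, even when `--json` was requested, because the JSON preference has not been resolved yet at that point.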