feat(orchestration): v2 foundation — durable state, lifecycle, status/resume commands (PR 1 of 3)#689
…/resume commands (PR 1 of 3)

Introduce src/lib/orchestration/, a durable, file-backed orchestration store that becomes the source of truth for sessions, tasks, subagents, ownership, and last-stopping-point. Adds six new read-only CLI commands (tasks/task/sessions/session/resume/orchestration status), each emitting Zod-validated JSON envelopes for outer agents.

Foundation only. Legacy WizardSession remains the live in-memory surface; PR 2 wires checkpoints + MCP-app lifecycle, PR 3 retires duplicate state and ships the TUI redesign + MCP-server tool parity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: Raw `spawn` from `node:child_process` breaks Windows
  - Replaced `await import('node:child_process')` with `await import('../utils/cross-platform-spawn.js')` so the `amplitude-wizard.cmd` shim is resolved correctly on Windows.
- ✅ Fixed: User-provided `--install-dir` missing tilde expansion
  - Added `resolveInstallDir(argv.installDir)` in `resolveCommonOpts` to properly expand tilde and resolve the path, matching the pattern used by the dashboard command.
Or push these changes by commenting:
@cursor push e5add8f93d
Preview (e5add8f93d)
diff --git a/src/commands/orchestration.ts b/src/commands/orchestration.ts
--- a/src/commands/orchestration.ts
+++ b/src/commands/orchestration.ts
@@ -45,7 +45,8 @@
json?: boolean;
human?: boolean;
}): Promise<CommonOpts> {
- const installDir = argv.installDir ?? process.cwd();
+ const { resolveInstallDir } = await import('../utils/install-dir.js');
+ const installDir = resolveInstallDir(argv.installDir);
const { resolveMode } = await import('../lib/mode-config.js');
const { jsonOutput } = resolveMode({
json: argv.json,
@@ -583,7 +584,7 @@
if (execute) {
// Spawn the resume command. Default behavior is "print only" for
// safety — orchestrators that want auto-execution opt in.
- const { spawn } = await import('node:child_process');
+ const { spawn } = await import('../utils/cross-platform-spawn.js');
const [cmd, ...rest] = command;
if (!cmd) {
if (opts.jsonOutput)
resolveCommonOpts now passes argv.installDir through resolveInstallDir so quoted/env-sourced `~` actually expands instead of being treated as a literal directory name.

The resume --execute path now imports spawn from utils/cross-platform-spawn so the npm-installed `amplitude-wizard` .cmd shim resolves on Windows. Node's built-in spawn does not consult PATHEXT and would fail with ENOENT for every Windows user invoking `wizard resume --execute`.
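A minimal sketch of the tilde handling described above. The real `resolveInstallDir` is not shown in this thread, so `expandInstallDir` is a hypothetical stand-in; it assumes the helper expands a leading `~` and falls back to the current working directory when the flag is absent.

```typescript
import { homedir } from 'node:os';
import { join, resolve } from 'node:path';

// Hypothetical stand-in for resolveInstallDir; the real helper may differ.
function expandInstallDir(raw: string | undefined): string {
  // No flag given: fall back to the current working directory.
  if (raw === undefined || raw === '') return process.cwd();
  // A shell expands an unquoted `~`, but a quoted or env-sourced value
  // reaches the CLI as a literal character, so expand it here.
  if (raw === '~') return homedir();
  if (raw.startsWith('~/') || raw.startsWith('~\\')) {
    return join(homedir(), raw.slice(2));
  }
  // Anything else is resolved against the current working directory.
  return resolve(raw);
}
```

With this shape, `expandInstallDir('~/proj')` lands under the user's home directory rather than in a directory literally named `~`.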
…kpoints + MCP-app lifecycle

Stacks on PR 1 (#689). Adds three typed checkpoint surfaces on top of the v2 orchestration foundation:
- Choice: typed user-choice records with stable promptId for de-dup, requiresHuman automation gate, and full status transitions (pending → answered/expired/cancelled/superseded).
- Verification: manual out-of-band verification records with status transitions (pending → passed/failed/skipped; skipped/failed may recover to passed; passed/skipped/failed may supersede).
- McpAppCapability: durable lifecycle for every MCP-app capability with an anti-nag invariant: install_skipped → needs_user_choice REQUIRES a non-empty lastStateChangeReason.

New CLI commands:
- wizard choice list / show / answer (with --confirm-human gate)
- wizard verification list / show / mark

Wires last-stopping-point's pendingChoices / pendingMcpActions / pendingManualVerifications arrays to read real records (was [] in PR 1).

Two callsites instrumented as the PR 2 wiring beachhead:
- env-selection in src/commands/helpers.ts (Choice mirror + answer)
- event-plan-approval in src/lib/wizard-tools.ts (Verification mirror)

Adds 42 tests across choices/verifications/mcp-app-lifecycle/last-stopping-point/CLI. No TUI changes (deferred to PR 3); no MCP-server tool changes (deferred to PR 3).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
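The status transitions and the anti-nag invariant listed above can be sketched as plain lookup tables. This is an illustration only: the type names, table names, and `assertMcpTransition` are hypothetical, and treating the non-pending Choice states as terminal is an assumption the description does not confirm.

```typescript
type ChoiceStatus = 'pending' | 'answered' | 'expired' | 'cancelled' | 'superseded';
type VerificationStatus = 'pending' | 'passed' | 'failed' | 'skipped' | 'superseded';

// Assumption: every non-pending Choice status is terminal.
const CHOICE_TRANSITIONS: Record<ChoiceStatus, readonly ChoiceStatus[]> = {
  pending: ['answered', 'expired', 'cancelled', 'superseded'],
  answered: [],
  expired: [],
  cancelled: [],
  superseded: [],
};

// Encodes the Verification rules exactly as the description states them.
const VERIFICATION_TRANSITIONS: Record<VerificationStatus, readonly VerificationStatus[]> = {
  pending: ['passed', 'failed', 'skipped'],
  passed: ['superseded'],
  failed: ['passed', 'superseded'],
  skipped: ['passed', 'superseded'],
  superseded: [],
};

function canTransition<S extends string>(
  table: Record<S, readonly S[]>,
  from: S,
  to: S,
): boolean {
  return table[from].includes(to);
}

// Anti-nag invariant: re-asking after a skip needs a recorded reason.
function assertMcpTransition(from: string, to: string, lastStateChangeReason?: string): void {
  const reasonMissing = !lastStateChangeReason || lastStateChangeReason.trim() === '';
  if (from === 'install_skipped' && to === 'needs_user_choice' && reasonMissing) {
    throw new Error('install_skipped -> needs_user_choice requires lastStateChangeReason');
  }
}
```

Keeping the rules in data rather than in scattered conditionals makes it cheap to test every edge, which fits a PR that pins its lifecycle with unit tests.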
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Resume command ignores session-id for action derivation
  - Added an optional `sessionId` parameter to `computeLastStoppingPoint` so it looks up the specific session and scopes task filtering to that session, and updated the `resume` command handler to pass the user-specified session ID.
Or push these changes by commenting:
@cursor push 13e12425d3
Preview (13e12425d3)
diff --git a/src/commands/orchestration.ts b/src/commands/orchestration.ts
--- a/src/commands/orchestration.ts
+++ b/src/commands/orchestration.ts
@@ -559,7 +559,9 @@
else getUI().log.error(`Session ${sessionIdRaw} not found`);
process.exit(ExitCode.INVALID_ARGS);
}
- const lsp = computeLastStoppingPoint(opts.installDir);
+ const lsp = computeLastStoppingPoint(opts.installDir, {
+ sessionId: sessionId!,
+ });
const command = lsp.nextAction.command;
const description = lsp.nextAction.description;
diff --git a/src/lib/orchestration/last-stopping-point.ts b/src/lib/orchestration/last-stopping-point.ts
--- a/src/lib/orchestration/last-stopping-point.ts
+++ b/src/lib/orchestration/last-stopping-point.ts
@@ -16,7 +16,13 @@
import { execFileSync } from 'node:child_process';
import { TaskLifecycle, isActive } from './lifecycle';
-import type { LastStoppingPoint, NextAction, Ownership, Task } from './state';
+import type {
+ LastStoppingPoint,
+ NextAction,
+ Ownership,
+ SessionId,
+ Task,
+} from './state';
import { getOrchestrationStore } from './store';
import { CLI_INVOCATION } from '../../commands/context';
@@ -182,23 +188,27 @@
*/
export function computeLastStoppingPoint(
installDir: string,
- options?: { now?: number; cliInvocation?: string[] },
+ options?: { now?: number; cliInvocation?: string[]; sessionId?: SessionId },
): LastStoppingPoint {
const now = options?.now ?? Date.now();
const cutoff = now - TWENTY_FOUR_HOURS_MS;
const store = getOrchestrationStore(installDir);
const file = store.read();
- const session =
- file.sessions
- .filter((s) => s.status === 'active')
- .sort((a, b) => b.createdAt - a.createdAt)[0] ?? null;
+ const session = options?.sessionId
+ ? file.sessions.find((s) => s.id === options.sessionId) ?? null
+ : file.sessions
+ .filter((s) => s.status === 'active')
+ .sort((a, b) => b.createdAt - a.createdAt)[0] ?? null;
const branch = session?.branch ?? tryDetectBranch(installDir);
const worktree = session?.worktree ?? tryDetectWorktree(installDir);
+ const scopeToSession = (t: Task): boolean =>
+ !options?.sessionId || t.sessionId === options.sessionId;
+
const activeTasks = file.tasks
- .filter((t) => isActive(t.state))
+ .filter((t) => isActive(t.state) && scopeToSession(t))
.sort((a, b) => b.updatedAt - a.updatedAt);
const stoppedTasks = file.tasks
.filter(
@@ -206,11 +216,17 @@
(t.state === TaskLifecycle.Failed ||
t.state === TaskLifecycle.Cancelled ||
t.state === TaskLifecycle.Superseded) &&
- t.updatedAt >= cutoff,
+ t.updatedAt >= cutoff &&
+ scopeToSession(t),
)
.sort((a, b) => b.updatedAt - a.updatedAt);
const recentlyCompletedTasks = file.tasks
- .filter((t) => t.state === TaskLifecycle.Completed && t.updatedAt >= cutoff)
+ .filter(
+ (t) =>
+ t.state === TaskLifecycle.Completed &&
+ t.updatedAt >= cutoff &&
+ scopeToSession(t),
+ )
.sort((a, b) => b.updatedAt - a.updatedAt);
// Aggregate ownership across all live tasks plus recently-stopped tasks
…ty + perf hot-paths + resilience

Stacks on #690 (which stacks on #689). Merge after PRs 1 + 2.

PR 3 lands the state-driven foundation that the broader v2 TUI redesign will sit on. Five concerns, all additive; every PR 1 + PR 2 surface keeps working unchanged.

A. TUI v2 wiring: the `/status` overlay renders the same data `wizard orchestration status --json` emits, sectioned for human reading. ManualVerificationRibbon mounts on OutroScreen so success-looking UI cannot appear while a verification is pending. ChoiceCheckpointBanner is a reusable primitive for surfacing typed Choice records with the full UX contract (why-asking, recommended, safe-default, reversibility, consequence-if-skipped).

B. MCP-server tool parity: every read-only orchestration CLI command now has a matching MCP tool. Both surfaces call into the same builders in `src/lib/orchestration/envelopes.ts`, so output is byte-for-byte identical (modulo `generatedAt`). Server stays read-only by design; mutators stay on the CLI.

C. Perf hot-paths: `withReadCache(fn)` amortises store reads across builders inside one command/tool invocation. `per-run-cache.ts` memoises repeated `gh pr view` / MCP-availability calls within a single run.

D. Bugs found and fixed:
- success-looking UI while blocked on a verification → ribbon
- choices asked again after a durable answer → addChoice de-dup (covered in PR 2; regression test added)
- skipped MCP apps not remembered → covered by anti-nag invariant (PR 2; surfaced via /status)

Background agents continuing after cancellation: out of scope; called out as a known limitation.

E. Resilience: token-expired-during-long-task. agent-runner's AUTH_ERROR branch now mirrors the K/R question to a durable Choice (kind=keep_or_revert_files) plus a manual_pr_test Verification. `wizard status --json` thereafter shows `nextAction.kind === 'await_user_choice'`.

F. Tests: 40+ new tests:
- envelope schema parity (CLI ↔ MCP tool)
- StatusOverlay rendering all sections
- ChoiceCheckpoint UX contract (every required field surfaced)
- OutroScreen verification ribbon regression
- per-run-cache (memoize / memoizeAsync / invalidate)
- auth-error resilience (Choice + Verification + LSP shape)
- perf-status-cold (internal-cold-start bound)

All 3919 unit tests pass; 100/100 BDD scenarios pass.

G. Docs: extended `docs/orchestration.md` with PR 3 sections (TUI integration model, envelopes layer, MCP tool parity table, perf measurements, resilience flow). New `docs/agent-consumability.md` covers CLI / MCP / NDJSON consumption with worked examples (Claude Code, Cursor, CI bots, watchdogs). README + CLAUDE.md updated.

Out of scope (future PRs):
- Full TUI screen-tree redesign / information-architecture refactor.
- Widening the Choice/Verification wiring beachhead beyond env-selection + event-plan-approval.
- Retiring legacy `WizardSession`.
- esbuild-bundled CLI for sub-200ms cold-start.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
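To make the `withReadCache(fn)` idea in section C concrete, here is a rough, synchronous sketch. Only the name `withReadCache` comes from the description; `readStore`, `readFromDisk`, the `StoreFile` shape, and the two builders are hypothetical, and the real implementation may handle async work and invalidation differently.

```typescript
type StoreFile = { tasks: string[] };

let diskReads = 0; // counts simulated disk hits for the demo
let cached: StoreFile | null = null;
let cacheDepth = 0; // greater than 0 while inside withReadCache

// Stand-in for the expensive file read plus schema parse.
function readFromDisk(): StoreFile {
  diskReads += 1;
  return { tasks: ['t1', 't2'] };
}

// Builders call this instead of touching the disk directly.
function readStore(): StoreFile {
  if (cacheDepth === 0) return readFromDisk();
  if (cached === null) cached = readFromDisk();
  return cached;
}

// Every readStore() call made while fn runs shares one snapshot.
function withReadCache<T>(fn: () => T): T {
  cacheDepth += 1;
  try {
    return fn();
  } finally {
    cacheDepth -= 1;
    // Drop the snapshot when the outermost scope exits.
    if (cacheDepth === 0) cached = null;
  }
}

// Two hypothetical builders that each need the store.
const buildTaskCount = (): number => readStore().tasks.length;
const buildTaskList = (): string => readStore().tasks.join(',');
```

Outside the wrapper, calling both builders reads the store twice; inside `withReadCache(() => [buildTaskCount(), buildTaskList()])` they share a single read. Scoping the cache to one invocation avoids serving stale data to the next command.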
Bugbot triage pass complete: 0 stale, 2 live (both fixed in e1bad84: Windows spawn + tilde expansion), 0 defensible.
`wizard resume <session-id>` validated the requested session existed but then called `computeLastStoppingPoint(installDir)` without scoping, which always derived its next action from the *most recently created active session* — not the one the user asked about. The envelope's `sessionId` field still echoed the requested ID, so the command/description shown could describe a different session entirely. `computeLastStoppingPoint` now accepts an optional `sessionId` that restricts both session metadata and task buckets to that session. The resume command threads the resolved session id through, and an added test pins the scoping behavior against a two-session fixture.
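The scoping rule can be exercised against a two-session fixture in isolation. The sketch below mirrors the selection logic from the diff with simplified, hypothetical `Session` and `Task` shapes; it is not the project's test.

```typescript
type Session = { id: string; status: 'active' | 'ended'; createdAt: number };
type Task = { id: string; sessionId: string };

// With a sessionId, look up that exact session; otherwise fall back to
// the most recently created active session.
function pickSession(sessions: Session[], sessionId?: string): Session | null {
  if (sessionId) return sessions.find((s) => s.id === sessionId) ?? null;
  const active = sessions
    .filter((s) => s.status === 'active')
    .sort((a, b) => b.createdAt - a.createdAt);
  return active[0] ?? null;
}

// With a sessionId, keep only that session's tasks.
function scopeTasks(tasks: Task[], sessionId?: string): Task[] {
  return tasks.filter((t) => !sessionId || t.sessionId === sessionId);
}

// Two-session fixture: "new" is the most recent active session.
const sessions: Session[] = [
  { id: 'old', status: 'active', createdAt: 1 },
  { id: 'new', status: 'active', createdAt: 2 },
];
const tasks: Task[] = [
  { id: 'a', sessionId: 'old' },
  { id: 'b', sessionId: 'new' },
];
```

Asking for `old` now yields the `old` session and only task `a`, whereas the unscoped call still resolves to `new`.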
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: Missing error handler on spawned child process
  - Added a `child.on('error', ...)` handler that logs the error (respecting JSON output mode) and exits with `ExitCode.GENERAL_ERROR`, preventing an uncaught exception if the spawned binary fails to launch.
- ✅ Fixed: Redundant identical `ensureDir` calls in `saveStore`
  - Removed the redundant `ensureDir(getRunDir(next.installDir))` call since `ensureDir(dirname(path))` already creates the same directory, and removed the now-unused `getRunDir` import.
Or push these changes by commenting:
@cursor push e57619e3cb
Preview (e57619e3cb)
diff --git a/src/commands/orchestration.ts b/src/commands/orchestration.ts
--- a/src/commands/orchestration.ts
+++ b/src/commands/orchestration.ts
@@ -563,7 +563,7 @@
// and description belong to the session the user asked for, not the
// most-recently-active session in the store.
const lsp = computeLastStoppingPoint(opts.installDir, {
- sessionId: session!.id,
+ sessionId: session.id,
});
const command = lsp.nextAction.command;
const description = lsp.nextAction.description;
@@ -609,6 +609,15 @@
process.exit(ExitCode.GENERAL_ERROR);
}
const child = spawn(cmd, rest, { stdio: 'inherit' });
+ child.on('error', (err) => {
+ if (opts.jsonOutput)
+ emitJsonError(`Failed to spawn resume command: ${err.message}`);
+ else
+ getUI().log.error(
+ `Failed to spawn resume command: ${err.message}`,
+ );
+ process.exit(ExitCode.GENERAL_ERROR);
+ });
child.on('exit', (code) => {
process.exit(code ?? 0);
});
diff --git a/src/lib/orchestration/store.ts b/src/lib/orchestration/store.ts
--- a/src/lib/orchestration/store.ts
+++ b/src/lib/orchestration/store.ts
@@ -43,7 +43,6 @@
import { OrchestrationStoreFileSchema } from './schemas';
import { TaskLifecycle, assertTransition, isTerminal } from './lifecycle';
import { getOrchestrationStoreFile } from './storage-paths';
-import { getRunDir } from '../../utils/storage-paths';
import { dirname } from 'node:path';
// ── Id helpers ────────────────────────────────────────────────────────
@@ -155,7 +154,6 @@
OrchestrationStoreFileSchema.parse(next);
const path = getOrchestrationStoreFile(next.installDir);
ensureDir(dirname(path));
- ensureDir(getRunDir(next.installDir));
atomicWriteJSON(path, next, { mode: 0o600 });
}
- `resume --execute` now attaches `child.on('error', …)` before `exit`.
Previously a spawn failure (ENOENT, EACCES, missing PATH
entry on Windows) fired an unhandled `error` event, which Node's
EventEmitter rethrows — crashing the CLI with a stack trace instead
of producing a clean message + GENERAL_ERROR exit.
- `saveStore` was calling `ensureDir(dirname(path))` and then
`ensureDir(getRunDir(installDir))` — both resolve to the same run
directory because `getOrchestrationStoreFile()` is defined as
`join(getRunDir(installDir), 'orchestration.json')`. Drop the second
call and the now-unused `getRunDir` import.
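The first bullet can be illustrated with a small promise wrapper. `runChild` is a hypothetical helper written for this sketch, and it uses Node's built-in `spawn` rather than the project's cross-platform wrapper.

```typescript
import { spawn } from 'node:child_process';

// Resolves with the child's exit code, or rejects with a readable
// message when the binary cannot be launched at all.
function runChild(cmd: string, args: string[]): Promise<number> {
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args, { stdio: 'inherit' });
    // Without this listener a launch failure (for example ENOENT) is an
    // unhandled 'error' event, which Node turns into a thrown exception.
    child.on('error', (err) => {
      reject(new Error(`Failed to spawn ${cmd}: ${err.message}`));
    });
    child.on('exit', (code) => {
      resolve(code ?? 0);
    });
  });
}
```

A caller can then map the rejection to a clean log line and a non-zero exit code instead of a stack trace.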
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Description uses hardcoded CLI_INVOCATION instead of configurable invocation
  - Replaced the hardcoded `CLI_INVOCATION` on line 170 with `cliPrefix.join(' ')` so the description string uses the same configurable invocation as the command array.
Or push these changes by commenting:
@cursor push b9a4e38b5d
Preview (b9a4e38b5d)
diff --git a/src/lib/orchestration/last-stopping-point.ts b/src/lib/orchestration/last-stopping-point.ts
--- a/src/lib/orchestration/last-stopping-point.ts
+++ b/src/lib/orchestration/last-stopping-point.ts
@@ -167,7 +167,9 @@
const recent = args.stoppedTasks[0];
return {
kind: 'inspect_failure',
- description: `Most recent stop: ${recent.label} (${recent.state}). Inspect with \`${CLI_INVOCATION} task ${recent.id}\`.`,
+ description: `Most recent stop: ${recent.label} (${
+ recent.state
+ }). Inspect with \`${cliPrefix.join(' ')} task ${recent.id}\`.`,
command: [...cliPrefix, 'task', recent.id, ...installDirArgs],
};
}
`deriveNextAction` builds an `inspect_failure` next-action when the
most recent task has stopped. The structured `command` array uses
the configurable `cliPrefix` (sourced from `args.invocation`, which
flows from `options.cliInvocation` on `computeLastStoppingPoint`),
but the inline shell hint embedded in `description` was templating
the hardcoded module-level `CLI_INVOCATION` constant. A custom
invocation (e.g. an alternate `wizard` symlink, or a test harness
overriding the binary name) would surface a description that says
`amplitude-wizard task <id>` while the JSON payload's `command`
points at the configured executable. Use `cliPrefix.join(' ')` for
both so the human and machine views always agree.
…dundancy + supervisor + live status refresh (stacks on #691)

Stacks on #691 → #690 → #689. Merge after PRs 1+2+3.

## Summary
- **Beachhead widening**: centralized `record*Choice` / `record*Verification` helpers in `src/lib/orchestration/wiring.ts` and wired them through every major user-choice and manual-verification surface in the wizard (MCP install, Slack, region select, OAuth browser login, project creation, dashboard setup, event-plan revision, logout). Existing TUI screens / agent prompts continue to drive the user-facing flow; the orchestration store mirror is ADDITIVE so outer agents inspecting `wizard status --json` see typed records. Mirror failures swallow + log so they NEVER break the user-facing path.
- **WizardSession boundary**: docblock at the top of `wizard-session.ts` now spells out the contract: `WizardSession` = transient TUI display state; `OrchestrationStore` = durable orchestration state; never duplicate fields between them. Audit table in `docs/orchestration.md` (PR 4 section) walks every field. PR 4 deletes zero fields by design; the redundant *concept* (Subagent / Task / Ownership double-bookkeeping) was already avoided in PR 1, and PR 4 cements the contract for PR 5's screen-tree redesign.
- **Background-agent supervision**: new `Supervisor` class in `src/lib/orchestration/supervisor.ts`. Tracks subprocess PIDs that map to `Subagent` rows, writes `<runDir>/heartbeats/<pid>.txt` every 5s, SIGTERMs on SIGINT/SIGTERM (with 5s grace before SIGKILL), reaps stale heartbeats (>30s old + PID gone) by transitioning the rooted Task to `cancelled`. Startup recovery transitions orphaned-but-running Tasks to `failed: 'process gone'`. Eliminates the "stopped agents shown as running" drift.
- **Live `/status` refresh**: new `watchOrchestrationStore` (debounced 200ms, watches the parent dir to survive `atomicWriteJSON`'s rename) + `useOrchestrationStore` React hook. `StatusOverlayScreen` plumbs the hook in so the overlay re-renders when a sibling shell mutates the store via `wizard choice answer`, `wizard verification mark`, etc.

## Tests
+30 tests (3919 → 3949 vitest, 100/100 BDD):
- 20 wiring tests (each `record*` helper, dedup invariant, answerByPromptId, anti-nag re-record, verification mark-passed contract)
- 5 supervisor tests (track + heartbeat write, terminateAll + signal/marking, stale-heartbeat reap, recoverOrphanedSubagents, untrack)
- 5 watcher tests (write fires onChange, debounce coalesces a burst, dispose idempotency, no-fire-after-dispose, late-mount before file exists)

All test surfaces:
- `pnpm exec vitest run --pool=forks --maxWorkers=1` → 3949/3949
- `pnpm test:bdd` → 100/100
- `pnpm build` → green
- `pnpm lint` → green (1 pre-existing warning unchanged)
- `pnpm exec tsc --noEmit -p tsconfig.json` → clean

## Backward compatibility
- No public-contract changes. `wizard status --json`, `wizard choice list`, `wizard verification list`, and the MCP server's read-only tools all keep emitting the same envelope shapes; PR 4 just produces *more* records in them.
- AI-SDK migration unaffected: no fields removed from `WizardSession`.

## Known limitations
- TUI screen-tree redesign still PR 5.
- Cold-start bundling still a follow-up.
- Some less-trafficked prompt surfaces (the inner agent's `choose` tool, per-tool MCP auth confirmations) intentionally keep their existing transient-text path. The audit table in `docs/orchestration.md` documents what was wired and what was skipped (and why).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
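The stale-heartbeat rule from the supervision bullet (older than 30s and PID gone) reduces to a small predicate. The names below are hypothetical and the real `Supervisor` is not shown in this thread; the sketch only assumes the thresholds quoted in the description.

```typescript
// From the PR description: a heartbeat is stale when it is more than
// 30s old AND the process is gone.
const STALE_AFTER_MS = 30_000;

// Signal 0 delivers nothing; process.kill only checks that the PID exists.
function isPidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but belongs to another user.
    return (err as { code?: string }).code === 'EPERM';
  }
}

// Pure decision function, so it can be tested without real processes.
function shouldReap(heartbeatAtMs: number, nowMs: number, pidAlive: boolean): boolean {
  return nowMs - heartbeatAtMs > STALE_AFTER_MS && !pidAlive;
}
```

Requiring both conditions guards against reaping a live agent whose heartbeat write was merely delayed.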
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Resume command string breaks on paths with spaces
  - Added a `shellJoin` helper that single-quotes any argv element containing whitespace or shell metacharacters, and replaced all five `command.join(' ')` call sites in `last-stopping-point.ts` and `orchestration.ts` with it.
Or push these changes by commenting:
@cursor push de7da72a3c
Preview (de7da72a3c)
diff --git a/src/commands/orchestration.ts b/src/commands/orchestration.ts
--- a/src/commands/orchestration.ts
+++ b/src/commands/orchestration.ts
@@ -536,7 +536,7 @@
const { getOrchestrationStore } = await import(
'../lib/orchestration/store.js'
);
- const { computeLastStoppingPoint } = await import(
+ const { computeLastStoppingPoint, shellJoin } = await import(
'../lib/orchestration/last-stopping-point.js'
);
const { ResumeEnvelopeSchema } = await import(
@@ -583,7 +583,7 @@
} else {
const ui = getUI();
ui.log.info(description);
- ui.log.info(`Resume: ${chalk.bold(command.join(' '))}`);
+ ui.log.info(`Resume: ${chalk.bold(shellJoin(command))}`);
if (!execute) {
ui.log.info(
chalk.dim('(pass --execute to invoke this command directly)'),
@@ -666,7 +666,7 @@
const { getOrchestrationStore } = await import(
'../lib/orchestration/store.js'
);
- const { computeLastStoppingPoint } = await import(
+ const { computeLastStoppingPoint, shellJoin } = await import(
'../lib/orchestration/last-stopping-point.js'
);
const { StatusEnvelopeSchema } = await import(
@@ -696,7 +696,7 @@
);
ui.log.info(lsp.nextAction.description);
ui.log.info(
- `Resume: ${chalk.bold(lsp.nextAction.command.join(' '))}`,
+ `Resume: ${chalk.bold(shellJoin(lsp.nextAction.command))}`,
);
} else {
ui.log.info(
@@ -729,7 +729,7 @@
ui.log.info(`Next action: ${lsp.nextAction.description}`);
ui.log.info(
`Resume: ${chalk.bold(
- lsp.nextAction.command.join(' '),
+ shellJoin(lsp.nextAction.command),
)}`,
);
}
diff --git a/src/lib/orchestration/last-stopping-point.ts b/src/lib/orchestration/last-stopping-point.ts
--- a/src/lib/orchestration/last-stopping-point.ts
+++ b/src/lib/orchestration/last-stopping-point.ts
@@ -66,6 +66,18 @@
}
}
+/**
+ * Join an argv array into a copy-pasteable shell string, quoting any
+ * arguments that contain whitespace or shell metacharacters.
+ */
+export function shellJoin(argv: readonly string[]): string {
+ return argv
+ .map((arg) =>
+ /[\s"'\\`$!#&|;()<>]/.test(arg) ? `'${arg.replace(/'/g, "'\\''")}'` : arg,
+ )
+ .join(' ');
+}
+
function dedupeOwnership(ownership: Ownership[]): Ownership[] {
const seen = new Set<string>();
const out: Ownership[] = [];
@@ -171,7 +183,7 @@
// `CLI_INVOCATION` here meant a custom invocation (e.g. test harness,
// alternate `wizard` symlink) would print the wrong command name in
// the human-readable hint while emitting the correct one in JSON.
- const cliInline = cliPrefix.join(' ');
+ const cliInline = shellJoin(cliPrefix);
return {
kind: 'inspect_failure',
description: `Most recent stop: ${recent.label} (${recent.state}). Inspect with \`${cliInline} task ${recent.id}\`.`,
@@ -296,6 +308,6 @@
pendingMcpActions,
pendingManualVerifications,
nextAction,
- resumeCommand: nextAction.command.join(' '),
+ resumeCommand: shellJoin(nextAction.command),
};
}
…rdown (stacks on #693)

Stacks on #693 → #691 → #690 → #689. Merge after PRs 1+2+3+4.

PR 5 turns the TUI from "screens that mostly work" into a serious operator interface with a coherent IA, shared glyph vocabulary, and render-cost discipline.

IA redesign:
- Three-zone layout (header / body / chrome).
- Header: JourneyStepper + identity + mode badge. Mode badge surfaces agent / ci / nested / mcp-server states; suppressed in plain interactive mode.
- Operator Overview screen (`/status`) reframed: title + mode badge + 1-line summary, then sectioned by Session / Primary work / Background / Pending choices / Pending verifications / MCP capabilities / Owned artifacts / Next action. Live-refresh on orchestration store mutations via PR 4's file-watcher hook.

Glyph palette (canonical vocabulary):
  ○ queued · › running · … waiting · ⏸ blocked · ✓ completed
  ✗ failed · ⊘ cancelled · ⮕ superseded

Centralized in `src/ui/tui/utils/lifecycle-display.ts` so a future "swap one glyph" change is a one-line edit, not a hunt across the screen tree. Pinned by unit tests so silent drift trips a test.

Slash command coherence:
- New `/help` command lists every registered command grouped by "available anytime" vs "available before/after a setup run". When a run is active, the second group is renamed "paused while a setup run is active (Ctrl+C to cancel, then retry)" so the user knows exactly why a command can't fire.
- Multi-line command feedback (e.g. /help, /diagnostics) renders with hanging indent so it reads as one block.

Render-cost teardown:
- New `useWizardSelector(store, selector, isEqual?)` slice hook. Components subscribed to a slice no longer rerender for unrelated store ticks. `shallowArrayEqual` and `shallowObjectEqual` exported for the common case.
- Render-cost benchmark fixture pins the contract: 3 task transitions + 5 status bumps → tasks slice 3 renders, status slice 5 renders, whole-store subscriber 8+ renders. Slicing cuts each subscriber's render budget by ~60%.

Tests added (40 over the base 3949):
- lifecycle-display vocabulary (5)
- mode-badge env resolution (9)
- /help text generation (6)
- HeaderBar mode badge rendering (5)
- useWizardSelector primitives + render-cost ceiling (4 + 3)
- StatusOverlayScreen glyph palette + summary + mode badge (7)
- StatusOverlayScreen Operator Overview reframing (existing test updated to match new section names) (1)

Build, lint, vitest (3989/3989), BDD (100/100) all green.

Backward compatibility:
- All existing slash commands continue to work the same way; /help is additive.
- /status overlay's data shape is unchanged from PR 3; only the rendering reorganized.
- --agent, --ci, --json, manifest, plan, apply, verify, MCP server, v: 1 envelope, exit codes: all unchanged.
- Mode badge is suppressed in plain interactive mode, preserving the prior header look for the most common case.
- ProgressList still uses a blank gutter for `pending` rows rather than the canonical ○ glyph (deliberate UX trade-off; see comment in ProgressList.tsx).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
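The render-cost contract (3 task transitions + 5 status bumps) can be reproduced without React. The store and `subscribeSlice` below are hypothetical stand-ins for the slice-subscription idea behind `useWizardSelector`; each counter plays the role of a component's render count.

```typescript
type State = { tasks: string[]; status: number };
type Listener = () => void;

function createStore(initial: State) {
  let state = initial;
  const listeners = new Set<Listener>();
  return {
    getState: (): State => state,
    setState(patch: Partial<State>): void {
      state = { ...state, ...patch };
      listeners.forEach((l) => l());
    },
    subscribe(l: Listener): () => void {
      listeners.add(l);
      return () => { listeners.delete(l); };
    },
  };
}

// Calls onChange only when the selected slice actually changes.
function subscribeSlice<T>(
  store: ReturnType<typeof createStore>,
  selector: (s: State) => T,
  onChange: (slice: T) => void,
  isEqual: (a: T, b: T) => boolean = Object.is,
): () => void {
  let last = selector(store.getState());
  return store.subscribe(() => {
    const next = selector(store.getState());
    if (!isEqual(last, next)) {
      last = next;
      onChange(next);
    }
  });
}

const store = createStore({ tasks: [], status: 0 });
let taskRenders = 0;
let statusRenders = 0;
let wholeRenders = 0;
subscribeSlice(store, (s) => s.tasks, () => { taskRenders += 1; });
subscribeSlice(store, (s) => s.status, () => { statusRenders += 1; });
store.subscribe(() => { wholeRenders += 1; });

// 3 task transitions followed by 5 status bumps.
for (let i = 0; i < 3; i++) {
  store.setState({ tasks: [...store.getState().tasks, `t${i}`] });
}
for (let i = 0; i < 5; i++) {
  store.setState({ status: store.getState().status + 1 });
}
// taskRenders is 3, statusRenders is 5, wholeRenders is 8.
```

The default `Object.is` comparison works here because untouched slices keep their reference across `setState`; selectors that build a new array or object on every call would need a shallow comparator instead.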
`resumeCommand` is the human-facing copy-pasteable form of
`nextAction.command`. It was built with `nextAction.command.join(' ')`,
which silently corrupts paths containing spaces (e.g. an `installDir`
of `/Users/me/my project` would land in the shell as two separate
words). The structured `command` array stayed correct, but the string
the user is invited to paste into a terminal would fail.
Add a small `shellJoin` / `shellQuote` helper that wraps tokens with
shell metacharacters or whitespace in single quotes (with the standard
`'\''` close/escape/reopen dance for embedded single quotes). Tokens
that are already shell-safe stay unquoted so the common case stays
readable.
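A worked example of the quoting rule, using the same character class as the `shellJoin` in the diff earlier in this thread. Splitting out `shellQuote` follows the commit message's naming; its exact signature in the codebase is an assumption.

```typescript
// Characters that force quoting: whitespace plus common shell
// metacharacters (same set as the regex in the diff).
const NEEDS_QUOTING = /[\s"'\\`$!#&|;()<>]/;

function shellQuote(arg: string): string {
  if (!NEEDS_QUOTING.test(arg)) return arg; // already shell-safe
  // Close the quote, emit an escaped single quote, reopen: ' becomes '\''
  return `'${arg.replace(/'/g, "'\\''")}'`;
}

function shellJoin(argv: readonly string[]): string {
  return argv.map(shellQuote).join(' ');
}

const resume = shellJoin(['wizard', 'resume', '--install-dir', '/Users/me/my project']);
// resume is: wizard resume --install-dir '/Users/me/my project'

const tricky = shellQuote("it's");
// tricky is: 'it'\''s'
```

Single quotes are a reasonable choice here because POSIX shells perform no expansion inside them, so `$`, backticks, and `!` in a path stay literal. Note that this quoting targets POSIX-style shells; it is not valid for `cmd.exe`.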
…estration commands

The structured `lsp.nextAction.command` array is correct, but joining it with ` ` for the human-facing `Resume:` line splits paths-with-spaces into multiple shell words. Switch every human-display callsite to `lsp.resumeCommand`, which already shell-quotes via `shellJoin`. Mirrors the fix that was applied to the LSP envelope, but covers the remaining `wizard resume` and `wizard orchestration status` print paths.
…ty + perf hot-paths + resilience

Stacks on #690 (which stacks on #689). Merge after PRs 1 + 2.

PR 3 lands the state-driven foundation that the broader v2 TUI redesign will sit on. Five concerns, all additive: every PR 1 + PR 2 surface keeps working unchanged.

A. TUI v2 wiring: `/status` overlay renders the same data `wizard orchestration status --json` emits, sectioned for human reading. ManualVerificationRibbon mounts on OutroScreen so success-looking UI cannot appear while a verification is pending. ChoiceCheckpointBanner is a reusable primitive for surfacing typed Choice records with the full UX contract (why-asking, recommended, safe-default, reversibility, consequence-if-skipped).

B. MCP-server tool parity: every read-only orchestration CLI command now has a matching MCP tool. Both surfaces call into the same builders in `src/lib/orchestration/envelopes.ts`, so output is byte-for-byte identical (modulo `generatedAt`). Server stays read-only by design; mutators stay on the CLI.

C. Perf hot-paths: `withReadCache(fn)` amortises store reads across builders inside one command/tool invocation. `per-run-cache.ts` memoises repeated `gh pr view` / MCP-availability calls within a single run.

D. Bugs found and fixed:
- success-looking UI while blocked on a verification → ribbon
- choices asked again after a durable answer → addChoice de-dup (covered in PR 2; regression test added)
- skipped MCP apps not remembered → covered by anti-nag invariant (PR 2; surfaced via /status)

Background agents continuing after cancellation: out of scope; called out as a known limitation.

E. Resilience: token-expired-during-long-task. agent-runner's AUTH_ERROR branch now mirrors the K/R question to a durable Choice (kind=keep_or_revert_files) plus a manual_pr_test Verification. `wizard status --json` thereafter shows `nextAction.kind === 'await_user_choice'`.

F. Tests: 40+ new tests:
- envelope schema parity (CLI ↔ MCP tool)
- StatusOverlay rendering all sections
- ChoiceCheckpoint UX contract (every required field surfaced)
- OutroScreen verification ribbon regression
- per-run-cache (memoize / memoizeAsync / invalidate)
- auth-error resilience (Choice + Verification + LSP shape)
- perf-status-cold (internal-cold-start bound)

All 3919 unit tests pass; 100/100 BDD scenarios pass.

G. Docs: extended `docs/orchestration.md` with PR 3 sections (TUI integration model, envelopes layer, MCP tool parity table, perf measurements, resilience flow). New `docs/agent-consumability.md` covers CLI / MCP / NDJSON consumption with worked examples (Claude Code, Cursor, CI bots, watchdogs). README + CLAUDE.md updated.

Out of scope (future PRs):
- Full TUI screen-tree redesign / information-architecture refactor.
- Widening the Choice/Verification wiring beachhead beyond env-selection + event-plan-approval.
- Retiring legacy `WizardSession`.
- esbuild-bundled CLI for sub-200ms cold-start.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
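To illustrate the per-run memoisation idea from section C, here is a minimal sketch. The names `memoizeAsync` and `invalidate` echo the test list above, but their signatures, the string cache key, and the evict-on-failure behaviour are assumptions, not the contents of `per-run-cache.ts`.

```typescript
// Hypothetical per-run cache: one Map that lives for a single wizard run.
// Storing the in-flight promise means concurrent callers share one load.
const runCache = new Map<string, Promise<unknown>>();

function memoizeAsync<T>(key: string, load: () => Promise<T>): Promise<T> {
  const hit = runCache.get(key);
  if (hit !== undefined) return hit as Promise<T>;
  // Assumption: a failed load is evicted so a later call can retry.
  const pending = load().catch((err: unknown) => {
    runCache.delete(key);
    throw err;
  });
  runCache.set(key, pending);
  return pending;
}

function invalidate(key?: string): void {
  if (key === undefined) {
    runCache.clear();
  } else {
    runCache.delete(key);
  }
}

// Usage: the expensive lookup (standing in for something like `gh pr view`)
// runs once even though two builders ask for it.
let lookups = 0;
const fetchPrState = (): Promise<string> => {
  lookups += 1;
  return Promise.resolve('open');
};
const first = memoizeAsync('pr-state', fetchPrState);
const second = memoizeAsync('pr-state', fetchPrState);
```

Caching the promise rather than the resolved value is the design choice that keeps two overlapping callers from both triggering the underlying call.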
…dundancy + supervisor + live status refresh (stacks on #691)

Stacks on #691 → #690 → #689. Merge after PRs 1+2+3.

## Summary

- **Beachhead widening**: centralized `record*Choice` / `record*Verification` helpers in `src/lib/orchestration/wiring.ts` and wired them through every major user-choice and manual-verification surface in the wizard (MCP install, Slack, region select, OAuth browser login, project creation, dashboard setup, event-plan revision, logout). Existing TUI screens / agent prompts continue to drive the user-facing flow; the orchestration store mirror is ADDITIVE so outer agents inspecting `wizard status --json` see typed records. Mirror failures swallow + log so they NEVER break the user-facing path.
- **WizardSession boundary**: docblock at the top of `wizard-session.ts` now spells out the contract: `WizardSession` = transient TUI display state; `OrchestrationStore` = durable orchestration state; never duplicate fields between them. Audit table in `docs/orchestration.md` (PR 4 section) walks every field. PR 4 deletes zero fields by design: the redundant *concept* (Subagent / Task / Ownership double-bookkeeping) was already avoided in PR 1; PR 4 cements the contract for PR 5's screen-tree redesign.
- **Background-agent supervision**: new `Supervisor` class in `src/lib/orchestration/supervisor.ts`. Tracks subprocess PIDs that map to `Subagent` rows, writes `<runDir>/heartbeats/<pid>.txt` every 5s, SIGTERMs on SIGINT/SIGTERM (with 5s grace before SIGKILL), reaps stale heartbeats (>30s old + PID gone) by transitioning the rooted Task to `cancelled`. Startup recovery transitions orphaned-but-running Tasks to `failed: 'process gone'`. Eliminates the "stopped agents shown as running" drift.
- **Live `/status` refresh**: new `watchOrchestrationStore` (debounced 200ms, watches the parent dir to survive `atomicWriteJSON`'s rename) + `useOrchestrationStore` React hook. `StatusOverlayScreen` plumbs the hook in so the overlay re-renders when a sibling shell mutates the store via `wizard choice answer`, `wizard verification mark`, etc.

## Tests

+30 tests (3919 → 3949 vitest, 100/100 BDD):

- 20 wiring tests (each `record*` helper, dedup invariant, answerByPromptId, anti-nag re-record, verification mark-passed contract)
- 5 supervisor tests (track + heartbeat write, terminateAll + signal/marking, stale-heartbeat reap, recoverOrphanedSubagents, untrack)
- 5 watcher tests (write fires onChange, debounce coalesces a burst, dispose idempotency, no-fire-after-dispose, late-mount before file exists)

All test surfaces:

- `pnpm exec vitest run --pool=forks --maxWorkers=1` → 3949/3949
- `pnpm test:bdd` → 100/100
- `pnpm build` → green
- `pnpm lint` → green (1 pre-existing warning unchanged)
- `pnpm exec tsc --noEmit -p tsconfig.json` → clean

## Backward compatibility

- No public-contract changes. `wizard status --json`, `wizard choice list`, `wizard verification list`, and the MCP server's read-only tools all keep emitting the same envelope shapes; PR 4 just produces *more* records in them.
- AI-SDK migration unaffected: no fields removed from `WizardSession`.

## Known limitations

- TUI screen-tree redesign still PR 5.
- Cold-start bundling still a follow-up.
- Some less-trafficked prompt surfaces (the inner agent's `choose` tool, per-tool MCP auth confirmations) intentionally keep their existing transient-text path. The audit table in `docs/orchestration.md` documents what was wired and what was skipped (and why).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
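The stale-heartbeat rule in the supervision bullet (reap only when the heartbeat is more than 30s old and the PID is gone) can be sketched as a pure decision function. The type and function names below are hypothetical; only the 5s and 30s figures and the "old AND gone" condition come from the summary above.

```typescript
// Hypothetical sketch of the reap decision; not the Supervisor's actual code.
const HEARTBEAT_INTERVAL_MS = 5_000; // heartbeat file is rewritten every 5s
const STALE_AFTER_MS = 30_000; // a heartbeat older than this counts as stale

interface HeartbeatSample {
  pid: number;
  lastBeatMs: number; // assumed: mtime of <runDir>/heartbeats/<pid>.txt
  pidAlive: boolean; // assumed: result of a liveness probe on the PID
}

function shouldReap(sample: HeartbeatSample, nowMs: number): boolean {
  const ageMs = nowMs - sample.lastBeatMs;
  // Both conditions must hold: a slow-but-alive process is never reaped,
  // and a just-exited process gets the full stale window first.
  return ageMs > STALE_AFTER_MS && !sample.pidAlive;
}

const now = 100_000;
const missedBeats = Math.floor(STALE_AFTER_MS / HEARTBEAT_INTERVAL_MS);
console.log(`stale after ${missedBeats} missed beats`);
console.log(shouldReap({ pid: 4242, lastBeatMs: now - 31_000, pidAlive: false }, now)); // true
console.log(shouldReap({ pid: 4242, lastBeatMs: now - 31_000, pidAlive: true }, now)); // false
```

In Node, the liveness probe is commonly done with `process.kill(pid, 0)`, which sends no signal and throws if the process does not exist; keeping that probe outside the decision function makes the rule easy to unit-test.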
…rdown (stacks on #693)

Stacks on #693 → #691 → #690 → #689. Merge after PRs 1+2+3+4.

PR 5 turns the TUI from "screens that mostly work" into a serious operator interface with a coherent IA, shared glyph vocabulary, and render-cost discipline.

IA redesign:
- Three-zone layout (header / body / chrome).
- Header: JourneyStepper + identity + mode badge. Mode badge surfaces agent / ci / nested / mcp-server states; suppressed in plain interactive mode.
- Operator Overview screen (`/status`) reframed: title + mode badge + 1-line summary, then sectioned by Session / Primary work / Background / Pending choices / Pending verifications / MCP capabilities / Owned artifacts / Next action. Live-refresh on orchestration store mutations via PR 4's file-watcher hook.

Glyph palette (canonical vocabulary):

○ queued · › running · … waiting · ⏸ blocked · ✓ completed · ✗ failed · ⊘ cancelled · ⮕ superseded

Centralized in `src/ui/tui/utils/lifecycle-display.ts` so a future "swap one glyph" change is a one-line edit, not a hunt across the screen tree. Pinned by unit tests so silent drift trips a test.

Slash command coherence:
- New `/help` command lists every registered command grouped by "available anytime" vs "available before/after a setup run". When a run is active, the second group is renamed "paused while a setup run is active (Ctrl+C to cancel, then retry)" so the user knows exactly why a command can't fire.
- Multi-line command feedback (e.g. /help, /diagnostics) renders with hanging indent so it reads as one block.

Render-cost teardown:
- New `useWizardSelector(store, selector, isEqual?)` slice hook. Components subscribed to a slice no longer rerender for unrelated store ticks. `shallowArrayEqual` and `shallowObjectEqual` exported for the common case.
- Render-cost benchmark fixture pins the contract: 3 task transitions + 5 status bumps → tasks slice 3 renders, status slice 5 renders, whole-store subscriber 8+ renders. Slicing cuts each subscriber's render budget by ~60%.

Tests added (40 over the base 3949):
- lifecycle-display vocabulary (5)
- mode-badge env resolution (9)
- /help text generation (6)
- HeaderBar mode badge rendering (5)
- useWizardSelector primitives + render-cost ceiling (4 + 3)
- StatusOverlayScreen glyph palette + summary + mode badge (7)
- StatusOverlayScreen Operator Overview reframing (existing test updated to match new section names) (1)

Build, lint, vitest (3989/3989), BDD (100/100) all green.

Backward compatibility:
- All existing slash commands continue to work the same way; /help is additive.
- /status overlay's data shape is unchanged from PR 3; only the rendering reorganized.
- --agent, --ci, --json, manifest, plan, apply, verify, MCP server, v: 1 envelope, exit codes: all unchanged.
- Mode badge is suppressed in plain interactive mode, preserving the prior header look for the most common case.
- ProgressList still uses a blank gutter for `pending` rows rather than the canonical ○ glyph (deliberate UX trade-off; see comment in ProgressList.tsx).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
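The slice-subscription idea behind `useWizardSelector` can be sketched without React. Everything below is a hypothetical stand-in: `TinyStore` and `subscribeSlice` are illustration-only names, and only the `(store, selector, isEqual?)` shape, `shallowArrayEqual`, and the 3 / 5 / 8 counts from the benchmark description come from the message above.

```typescript
// Hypothetical minimal store; the wizard's real store is not shown in the PR text.
class TinyStore<S> {
  private state: S;
  private listeners = new Set<() => void>();
  constructor(initial: S) {
    this.state = initial;
  }
  getState(): S {
    return this.state;
  }
  setState(patch: Partial<S>): void {
    this.state = { ...this.state, ...patch };
    this.listeners.forEach((listener) => listener());
  }
  subscribe(listener: () => void): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }
}

function shallowArrayEqual<T>(a: readonly T[], b: readonly T[]): boolean {
  return a.length === b.length && a.every((value, i) => Object.is(value, b[i]));
}

// Notify only when the selected slice changes according to isEqual.
function subscribeSlice<S, T>(
  store: TinyStore<S>,
  selector: (state: S) => T,
  onChange: (slice: T) => void,
  isEqual: (a: T, b: T) => boolean = Object.is,
): () => void {
  let previous = selector(store.getState());
  return store.subscribe(() => {
    const next = selector(store.getState());
    if (!isEqual(previous, next)) {
      previous = next;
      onChange(next);
    }
  });
}

interface WizardState {
  tasks: readonly string[];
  status: number;
}
const store = new TinyStore<WizardState>({ tasks: [], status: 0 });
const renders = { tasks: 0, status: 0, whole: 0 };
subscribeSlice(store, (s) => s.tasks, () => { renders.tasks += 1; }, shallowArrayEqual);
subscribeSlice(store, (s) => s.status, () => { renders.status += 1; });
store.subscribe(() => { renders.whole += 1; });

for (const step of ['queued', 'running', 'completed']) store.setState({ tasks: [step] });
for (let i = 1; i <= 5; i++) store.setState({ status: i });
console.log(renders); // { tasks: 3, status: 5, whole: 8 }
```

A React hook with the same contract would typically wrap this comparison around a store subscription and trigger a rerender in place of `onChange`.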
…kpoints + MCP-app lifecycle

Stacks on PR 1 (#689). Adds three typed checkpoint surfaces on top of the v2 orchestration foundation:

- Choice: typed user-choice records with stable promptId for de-dup, requiresHuman automation gate, and full status transitions (pending → answered/expired/cancelled/superseded).
- Verification: manual out-of-band verification records with status transitions (pending → passed/failed/skipped, skipped/failed may recover to passed; passed/skipped/failed may supersede).
- McpAppCapability: durable lifecycle for every MCP-app capability with an anti-nag invariant: install_skipped → needs_user_choice REQUIRES a non-empty lastStateChangeReason.

New CLI commands:
- wizard choice list / show / answer (with --confirm-human gate)
- wizard verification list / show / mark

Wires last-stopping-point's pendingChoices / pendingMcpActions / pendingManualVerifications arrays to read real records (was [] in PR 1).

Two callsites instrumented as the PR 2 wiring beachhead:
- env-selection in src/commands/helpers.ts (Choice mirror + answer)
- event-plan-approval in src/lib/wizard-tools.ts (Verification mirror)

Adds 42 tests across choices/verifications/mcp-app-lifecycle/last-stopping-point/CLI. No TUI changes (deferred to PR 3); no MCP-server tool changes (deferred to PR 3).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
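The Verification status transitions listed above can be written down as a lookup table. This is a sketch under assumptions: the type and function names are hypothetical, and whether a `pending` record may be superseded directly is not stated in the message, so the table below does not allow it.

```typescript
// Hypothetical transition table built from the rules in the commit message:
// pending -> passed/failed/skipped; skipped/failed may recover to passed;
// passed/skipped/failed may be superseded.
type VerificationStatus = 'pending' | 'passed' | 'failed' | 'skipped' | 'superseded';

const ALLOWED_TRANSITIONS: Record<VerificationStatus, readonly VerificationStatus[]> = {
  pending: ['passed', 'failed', 'skipped'],
  passed: ['superseded'],
  failed: ['passed', 'superseded'],
  skipped: ['passed', 'superseded'],
  superseded: [], // terminal in this sketch
};

function canTransition(from: VerificationStatus, to: VerificationStatus): boolean {
  return ALLOWED_TRANSITIONS[from].some((status) => status === to);
}

console.log(canTransition('pending', 'passed')); // true
console.log(canTransition('skipped', 'passed')); // true
console.log(canTransition('passed', 'failed')); // false
```

Keeping the rules in one table means a mutator such as `wizard verification mark` can reject an illegal move with a single lookup instead of scattered conditionals.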
…kpoints + MCP-app lifecycle Stacks on PR 1 (#689). Adds three typed checkpoint surfaces on top of the v2 orchestration foundation: - Choice — typed user-choice records with stable promptId for de-dup, requiresHuman automation gate, and full status transitions (pending → answered/expired/cancelled/superseded). - Verification — manual out-of-band verification records with status transitions (pending → passed/failed/skipped, skipped/failed may recover to passed; passed/skipped/failed may supersede). - McpAppCapability — durable lifecycle for every MCP-app capability with an anti-nag invariant: install_skipped → needs_user_choice REQUIRES a non-empty lastStateChangeReason. New CLI commands: - wizard choice list / show / answer (with --confirm-human gate) - wizard verification list / show / mark Wires last-stopping-point's pendingChoices / pendingMcpActions / pendingManualVerifications arrays to read real records (was [] in PR 1). Two callsites instrumented as the PR 2 wiring beachhead: - env-selection in src/commands/helpers.ts (Choice mirror + answer) - event-plan-approval in src/lib/wizard-tools.ts (Verification mirror) Adds 42 tests across choices/verifications/mcp-app-lifecycle/last-stopping-point/CLI. No TUI changes (deferred to PR 3); no MCP-server tool changes (deferred to PR 3). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ty + perf hot-paths + resilience Stacks on #690 (which stacks on #689). Merge after PRs 1 + 2. PR 3 lands the state-driven foundation that the broader v2 TUI redesign will sit on. Five concerns, all additive — every PR 1 + PR 2 surface keeps working unchanged. A. TUI v2 wiring — `/status` overlay renders the same data `wizard orchestration status --json` emits, sectioned for human reading. ManualVerificationRibbon mounts on OutroScreen so success- looking UI cannot appear while a verification is pending. ChoiceCheckpointBanner is a reusable primitive for surfacing typed Choice records with the full UX contract (why-asking, recommended, safe-default, reversibility, consequence-if-skipped). B. MCP-server tool parity — every read-only orchestration CLI command now has a matching MCP tool. Both surfaces call into the same builders in `src/lib/orchestration/envelopes.ts`, so output is byte-for-byte identical (modulo `generatedAt`). Server stays read-only by design — mutators stay on the CLI. C. Perf hot-paths — `withReadCache(fn)` amortises store reads across builders inside one command/tool invocation. `per-run-cache.ts` memoises repeated `gh pr view` / MCP-availability calls within a single run. D. Bugs found and fixed — - success-looking UI while blocked on a verification → ribbon - choices asked again after a durable answer → addChoice de-dup (covered in PR 2; regression test added) - skipped MCP apps not remembered → covered by anti-nag invariant (PR 2; surfaced via /status) Background agents continuing after cancellation: out of scope — call out as known limitation. E. Resilience — token-expired-during-long-task. agent-runner's AUTH_ERROR branch now mirrors the K/R question to a durable Choice (kind=keep_or_revert_files) plus a manual_pr_test Verification. `wizard status --json` thereafter shows `nextAction.kind === 'await_user_choice'`. F. 
Tests — 40+ new tests: - envelope schema parity (CLI ↔ MCP tool) - StatusOverlay rendering all sections - ChoiceCheckpoint UX contract (every required field surfaced) - OutroScreen verification ribbon regression - per-run-cache (memoize / memoizeAsync / invalidate) - auth-error resilience (Choice + Verification + LSP shape) - perf-status-cold (internal-cold-start bound) All 3919 unit tests pass; 100/100 BDD scenarios pass. G. Docs — extended `docs/orchestration.md` with PR 3 sections (TUI integration model, envelopes layer, MCP tool parity table, perf measurements, resilience flow). New `docs/agent-consumability.md` covers CLI / MCP / NDJSON consumption with worked examples (Claude Code, Cursor, CI bots, watchdogs). README + CLAUDE.md updated. Out of scope (future PRs): - Full TUI screen-tree redesign / information-architecture refactor. - Widening the Choice/Verification wiring beachhead beyond env-selection + event-plan-approval. - Retiring legacy `WizardSession`. - esbuild-bundled CLI for sub-200ms cold-start. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…dundancy + supervisor + live status refresh (stacks on #691) Stacks on #691 → #690 → #689. Merge after PRs 1+2+3. ## Summary - **Beachhead widening**: centralized `record*Choice` / `record*Verification` helpers in `src/lib/orchestration/wiring.ts` and wired them through every major user-choice and manual-verification surface in the wizard (MCP install, Slack, region select, OAuth browser login, project creation, dashboard setup, event-plan revision, logout). Existing TUI screens / agent prompts continue to drive the user-facing flow; the orchestration store mirror is ADDITIVE so outer agents inspecting `wizard status --json` see typed records. Mirror failures swallow + log so they NEVER break the user-facing path. - **WizardSession boundary**: docblock at the top of `wizard-session.ts` now spells out the contract — `WizardSession` = transient TUI display state; `OrchestrationStore` = durable orchestration state; never duplicate fields between them. Audit table in `docs/orchestration.md` (PR 4 section) walks every field. PR 4 deletes zero fields by design — the redundant *concept* (Subagent / Task / Ownership double-bookkeeping) was already avoided in PR 1; PR 4 cements the contract for PR 5's screen-tree redesign. - **Background-agent supervision**: new `Supervisor` class in `src/lib/orchestration/supervisor.ts`. Tracks subprocess PIDs that map to `Subagent` rows, writes `<runDir>/heartbeats/<pid>.txt` every 5s, SIGTERMs on SIGINT/SIGTERM (with 5s grace before SIGKILL), reaps stale heartbeats (>30s old + PID gone) by transitioning the rooted Task to `cancelled`. Startup recovery transitions orphaned-but-running Tasks to `failed: 'process gone'`. Eliminates the "stopped agents shown as running" drift. - **Live `/status` refresh**: new `watchOrchestrationStore` (debounced 200ms, watches the parent dir to survive `atomicWriteJSON`'s rename) + `useOrchestrationStore` React hook. 
`StatusOverlayScreen` plumbs the hook in so the overlay re-renders when a sibling shell mutates the store via `wizard choice answer`, `wizard verification mark`, etc. ## Tests +30 tests (3919 → 3949 vitest, 100/100 BDD): - 20 wiring tests (each `record*` helper, dedup invariant, answerByPromptId, anti-nag re-record, verification mark-passed contract) - 5 supervisor tests (track + heartbeat write, terminateAll + signal/marking, stale-heartbeat reap, recoverOrphanedSubagents, untrack) - 5 watcher tests (write fires onChange, debounce coalesces a burst, dispose idempotency, no-fire-after-dispose, late-mount before file exists) All test surfaces: - `pnpm exec vitest run --pool=forks --maxWorkers=1` → 3949/3949 - `pnpm test:bdd` → 100/100 - `pnpm build` → green - `pnpm lint` → green (1 pre-existing warning unchanged) - `pnpm exec tsc --noEmit -p tsconfig.json` → clean ## Backward compatibility - No public-contract changes. `wizard status --json`, `wizard choice list`, `wizard verification list`, and the MCP server's read-only tools all keep emitting the same envelope shapes; PR 4 just produces *more* records in them. - AI-SDK migration unaffected — no fields removed from `WizardSession`. ## Known limitations - TUI screen-tree redesign still PR 5. - Cold-start bundling still a follow-up. - Some less-trafficked prompt surfaces (the inner agent's `choose` tool, per-tool MCP auth confirmations) intentionally keep their existing transient-text path. The audit table in `docs/orchestration.md` documents what was wired and what was skipped (and why). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rdown (stacks on #693) Stacks on #693 → #691 → #690 → #689. Merge after PRs 1+2+3+4. PR 5 turns the TUI from "screens that mostly work" into a serious operator interface with a coherent IA, shared glyph vocabulary, and render-cost discipline. IA redesign: - Three-zone layout (header / body / chrome). - Header: JourneyStepper + identity + mode badge. Mode badge surfaces agent / ci / nested / mcp-server states; suppressed in plain interactive mode. - Operator Overview screen (`/status`) reframed: title + mode badge + 1-line summary, then sectioned by Session / Primary work / Background / Pending choices / Pending verifications / MCP capabilities / Owned artifacts / Next action. Live-refresh on orchestration store mutations via PR 4's file-watcher hook. Glyph palette (canonical vocabulary): ○ queued · › running · … waiting · ⏸ blocked · ✓ completed ✗ failed · ⊘ cancelled · ⮕ superseded Centralized in `src/ui/tui/utils/lifecycle-display.ts` so a future "swap one glyph" change is a one-line edit, not a hunt across the screen tree. Pinned by unit tests so silent drift trips a test. Slash command coherence: - New `/help` command lists every registered command grouped by "available anytime" vs "available before/after a setup run". When a run is active, the second group is renamed "paused while a setup run is active (Ctrl+C to cancel, then retry)" so the user knows exactly why a command can't fire. - Multi-line command feedback (e.g. /help, /diagnostics) renders with hanging indent so it reads as one block. Render-cost teardown: - New `useWizardSelector(store, selector, isEqual?)` slice hook. Components subscribed to a slice no longer rerender for unrelated store ticks. `shallowArrayEqual` and `shallowObjectEqual` exported for the common case. - Render-cost benchmark fixture pins the contract: 3 task transitions + 5 status bumps → tasks slice 3 renders, status slice 5 renders, whole-store subscriber 8+ renders. Slicing cuts each subscriber's render budget by ~60%. 
Tests added (40 over the base 3949):

- lifecycle-display vocabulary (5)
- mode-badge env resolution (9)
- /help text generation (6)
- HeaderBar mode badge rendering (5)
- useWizardSelector primitives + render-cost ceiling (4 + 3)
- StatusOverlayScreen glyph palette + summary + mode badge (7)
- StatusOverlayScreen Operator Overview reframing (existing test updated to match new section names) (1)

Build, lint, vitest (3989/3989), BDD (100/100) all green.

Backward compatibility:

- All existing slash commands continue to work the same way; /help is additive.
- /status overlay's data shape is unchanged from PR 3; only the rendering is reorganized.
- --agent, --ci, --json, manifest, plan, apply, verify, MCP server, v: 1 envelope, exit codes: all unchanged.
- Mode badge is suppressed in plain interactive mode, preserving the prior header look for the most common case.
- ProgressList still uses a blank gutter for `pending` rows rather than the canonical ○ glyph (deliberate UX trade-off; see comment in ProgressList.tsx).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…andle shape

`globalThis.clearInterval` has a strict `string | number | Timeout | undefined` parameter; `opts.clearInterval` (declared `(handle: unknown) => void` so tests can override with a fake) is broader. Without an explicit type annotation, the union `opts.clearInterval ?? globalThis.clearInterval` collapses to the strict shape and rejects the `pollHandle: unknown` we pass at call sites.

Pin both `setIntervalFn` and `clearIntervalFn` to the options-interface signature so the test override remains usable AND the inferred type accepts `unknown` handles. No runtime change.
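A minimal sketch of the pinning described above, under stated assumptions: `PollerOptions` and `startPolling` are hypothetical names, and the thin wrapper lambdas (with a cast on the handle) are one way to satisfy strict function-type checking, not necessarily what the commit does.

```typescript
// Hypothetical options interface; the real one is not shown in this PR page.
interface PollerOptions {
  setInterval?: (fn: () => void, ms: number) => unknown;
  clearInterval?: (handle: unknown) => void;
}

function startPolling(
  tick: () => void,
  ms: number,
  opts: PollerOptions = {},
): () => void {
  // Explicit annotations pin both functions to the options-interface
  // signature, so the inferred type does not narrow to the global's
  // stricter parameter type.
  const setIntervalFn: (fn: () => void, ms: number) => unknown =
    opts.setInterval ?? ((fn, delay) => globalThis.setInterval(fn, delay));

  const clearIntervalFn: (handle: unknown) => void =
    opts.clearInterval ??
    // Assumption: cast the opaque handle back to whatever the platform's
    // clearInterval expects (the type differs between DOM and Node typings).
    ((handle) =>
      globalThis.clearInterval(
        handle as Parameters<typeof globalThis.clearInterval>[0],
      ));

  const pollHandle: unknown = setIntervalFn(tick, ms);
  return () => clearIntervalFn(pollHandle);
}

// Test-style usage with fakes: no real timer is ever created.
const cleared: unknown[] = [];
const stop = startPolling(() => {}, 1000, {
  setInterval: () => "fake-handle",
  clearInterval: (h) => cleared.push(h),
});
stop();
```

Because the fake `setInterval` returns a plain string, the call site only type-checks if the handle is treated as `unknown` end to end, which is the property the commit message is protecting.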
Persisted `orchestration_status` JSON (and every other user-facing message that uses `CLI_INVOCATION`) was emitting `amplitude-wizard --install-dir X` as the suggested resume command. That only works when the user has explicitly run `npm install -g @amplitude/wizard` — npx users see the hint and get "command not found". The previous detection logic (`/_npx/` in argv[1] OR `npm_command=exec`) caught most real-world npx invocations but missed common ones: `pnpm try:prod`, `node dist/bin.js`, any wrapper that strips npm env vars. The persisted `resumeCommand` from such a run was then wrong for the user reading it back later. Hardcoding to `npx @amplitude/wizard` is universally correct: it works when npx defers to a global install AND when it fetches from the registry. Removes a class of "command not found" support tickets.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Unhandled errors in `resolveCommonOpts` silently crash commands
  - Moved `resolveCommonOpts` inside the try/catch block in all six command handlers and used optional chaining (`opts?.jsonOutput`) in catch blocks so errors are properly reported with structured output and `ExitCode.GENERAL_ERROR`.
Or push these changes by commenting:
@cursor push 7f5c80d4c4
Preview (7f5c80d4c4)
diff --git a/src/commands/orchestration.ts b/src/commands/orchestration.ts
--- a/src/commands/orchestration.ts
+++ b/src/commands/orchestration.ts
@@ -129,12 +129,13 @@
}),
handler: (argv) => {
void (async () => {
- const opts = await resolveCommonOpts({
- installDir: argv['install-dir'] as string | undefined,
- json: argv.json as boolean | undefined,
- human: argv.human as boolean | undefined,
- });
+ let opts: CommonOpts | undefined;
try {
+ opts = await resolveCommonOpts({
+ installDir: argv['install-dir'] as string | undefined,
+ json: argv.json as boolean | undefined,
+ human: argv.human as boolean | undefined,
+ });
const { getOrchestrationStore } = await import(
'../lib/orchestration/store.js'
);
@@ -201,7 +202,7 @@
process.exit(ExitCode.SUCCESS);
} catch (e) {
const message = e instanceof Error ? e.message : String(e);
- if (opts.jsonOutput) emitJsonError(`tasks listing failed: ${message}`);
+ if (opts?.jsonOutput) emitJsonError(`tasks listing failed: ${message}`);
else getUI().log.error(`Tasks listing failed: ${message}`);
process.exit(ExitCode.GENERAL_ERROR);
}
@@ -229,13 +230,14 @@
}),
handler: (argv) => {
void (async () => {
- const opts = await resolveCommonOpts({
- installDir: argv['install-dir'] as string | undefined,
- json: argv.json as boolean | undefined,
- human: argv.human as boolean | undefined,
- });
+ let opts: CommonOpts | undefined;
const idRaw = argv.id as string;
try {
+ opts = await resolveCommonOpts({
+ installDir: argv['install-dir'] as string | undefined,
+ json: argv.json as boolean | undefined,
+ human: argv.human as boolean | undefined,
+ });
const { getOrchestrationStore } = await import(
'../lib/orchestration/store.js'
);
@@ -314,7 +316,7 @@
process.exit(ExitCode.SUCCESS);
} catch (e) {
const message = e instanceof Error ? e.message : String(e);
- if (opts.jsonOutput) emitJsonError(`task lookup failed: ${message}`);
+ if (opts?.jsonOutput) emitJsonError(`task lookup failed: ${message}`);
else getUI().log.error(`Task lookup failed: ${message}`);
process.exit(ExitCode.GENERAL_ERROR);
}
@@ -336,12 +338,13 @@
}),
handler: (argv) => {
void (async () => {
- const opts = await resolveCommonOpts({
- installDir: argv['install-dir'] as string | undefined,
- json: argv.json as boolean | undefined,
- human: argv.human as boolean | undefined,
- });
+ let opts: CommonOpts | undefined;
try {
+ opts = await resolveCommonOpts({
+ installDir: argv['install-dir'] as string | undefined,
+ json: argv.json as boolean | undefined,
+ human: argv.human as boolean | undefined,
+ });
const { getOrchestrationStore } = await import(
'../lib/orchestration/store.js'
);
@@ -397,7 +400,7 @@
process.exit(ExitCode.SUCCESS);
} catch (e) {
const message = e instanceof Error ? e.message : String(e);
- if (opts.jsonOutput)
+ if (opts?.jsonOutput)
emitJsonError(`sessions listing failed: ${message}`);
else getUI().log.error(`Sessions listing failed: ${message}`);
process.exit(ExitCode.GENERAL_ERROR);
@@ -426,13 +429,14 @@
}),
handler: (argv) => {
void (async () => {
- const opts = await resolveCommonOpts({
- installDir: argv['install-dir'] as string | undefined,
- json: argv.json as boolean | undefined,
- human: argv.human as boolean | undefined,
- });
+ let opts: CommonOpts | undefined;
const idRaw = argv.id as string;
try {
+ opts = await resolveCommonOpts({
+ installDir: argv['install-dir'] as string | undefined,
+ json: argv.json as boolean | undefined,
+ human: argv.human as boolean | undefined,
+ });
const { getOrchestrationStore } = await import(
'../lib/orchestration/store.js'
);
@@ -490,7 +494,8 @@
process.exit(ExitCode.SUCCESS);
} catch (e) {
const message = e instanceof Error ? e.message : String(e);
- if (opts.jsonOutput) emitJsonError(`session lookup failed: ${message}`);
+ if (opts?.jsonOutput)
+ emitJsonError(`session lookup failed: ${message}`);
else getUI().log.error(`Session lookup failed: ${message}`);
process.exit(ExitCode.GENERAL_ERROR);
}
@@ -525,14 +530,15 @@
}),
handler: (argv) => {
void (async () => {
- const opts = await resolveCommonOpts({
- installDir: argv['install-dir'] as string | undefined,
- json: argv.json as boolean | undefined,
- human: argv.human as boolean | undefined,
- });
+ let opts: CommonOpts | undefined;
const sessionIdRaw = argv['session-id'] as string;
const execute = Boolean(argv.execute);
try {
+ opts = await resolveCommonOpts({
+ installDir: argv['install-dir'] as string | undefined,
+ json: argv.json as boolean | undefined,
+ human: argv.human as boolean | undefined,
+ });
const { getOrchestrationStore } = await import(
'../lib/orchestration/store.js'
);
@@ -619,7 +625,7 @@
// CLI failure.
child.on('error', (err) => {
const message = err instanceof Error ? err.message : String(err);
- if (opts.jsonOutput)
+ if (opts?.jsonOutput)
emitJsonError(`Failed to spawn resume command: ${message}`);
else
getUI().log.error(`Failed to spawn resume command: ${message}`);
@@ -633,7 +639,7 @@
process.exit(ExitCode.SUCCESS);
} catch (e) {
const message = e instanceof Error ? e.message : String(e);
- if (opts.jsonOutput) emitJsonError(`resume failed: ${message}`);
+ if (opts?.jsonOutput) emitJsonError(`resume failed: ${message}`);
else getUI().log.error(`Resume failed: ${message}`);
process.exit(ExitCode.GENERAL_ERROR);
}
@@ -660,12 +666,13 @@
}),
(argv) => {
void (async () => {
- const opts = await resolveCommonOpts({
- installDir: argv['install-dir'],
- json: argv.json as boolean | undefined,
- human: argv.human as boolean | undefined,
- });
+ let opts: CommonOpts | undefined;
try {
+ opts = await resolveCommonOpts({
+ installDir: argv['install-dir'],
+ json: argv.json as boolean | undefined,
+ human: argv.human as boolean | undefined,
+ });
const { getOrchestrationStore } = await import(
'../lib/orchestration/store.js'
);
@@ -734,7 +741,7 @@
process.exit(ExitCode.SUCCESS);
} catch (e) {
const message = e instanceof Error ? e.message : String(e);
- if (opts.jsonOutput)
+ if (opts?.jsonOutput)
emitJsonError(`orchestration status failed: ${message}`);
else getUI().log.error(`Orchestration status failed: ${message}`);
process.exit(ExitCode.GENERAL_ERROR);
Reviewed by Cursor Bugbot for commit be4dbdd. Configure here.
else getUI().log.error(`Tasks listing failed: ${message}`);
process.exit(ExitCode.GENERAL_ERROR);
}
})();
Unhandled errors in resolveCommonOpts silently crash commands
Low Severity
In all six command handlers, resolveCommonOpts is called outside the try/catch block. If it rejects (e.g., process.cwd() throws because the working directory was deleted, or a dynamic import fails), the error propagates out of the void (async () => { … })() wrapper as an unhandled promise rejection. The process terminates with exit code 1 and no structured error output (emitJsonError or getUI().log.error) instead of the expected ExitCode.GENERAL_ERROR with a descriptive message. Moving resolveCommonOpts inside the try block (with a fallback for opts.jsonOutput) would make error reporting consistent with other failure paths.
…lers (#759)

resolveCommonOpts was awaited OUTSIDE the try block in each of: tasksCommand, taskCommand, sessionsCommand, sessionCommand, resumeCommand, and `orchestration status`. If it rejected (process.cwd() failed, dynamic import failed, or an unexpected throw from mode-config / install-dir resolution), the rejection propagated out of the `void (async () => {...})()` wrapper as an unhandled promise. Node terminated with exit 1 and no structured error output, instead of the expected ExitCode.GENERAL_ERROR + emitJsonError / log.error path that the rest of the handler honors.

Lift the call inside try and use optional chaining `opts?.jsonOutput` in catch blocks so a failure during resolveCommonOpts itself routes through the same JSON-error / human-error reporting as any other failure in the command body.

Bugbot finding on PR #689 (merged); fixing post-merge.
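The pattern in this fix can be reduced to a small, self-contained sketch. Everything here is illustrative: `runHandler`, `Reporter`, and the string outcomes stand in for the real handlers, `emitJsonError` / `getUI().log.error`, and `ExitCode`, whose actual definitions are not shown on this page.

```typescript
interface CommonOpts {
  jsonOutput: boolean;
}

// Hypothetical reporter standing in for emitJsonError / getUI().log.error.
interface Reporter {
  json(message: string): void;
  human(message: string): void;
}

type Outcome = "success" | "general_error";

async function runHandler(
  resolveOpts: () => Promise<CommonOpts>,
  body: (opts: CommonOpts) => Promise<void>,
  report: Reporter,
): Promise<Outcome> {
  // Declared outside the try so the catch block can still see it.
  let opts: CommonOpts | undefined;
  try {
    // Resolution happens INSIDE the try: a rejection here now reaches
    // the catch below instead of escaping as an unhandled rejection.
    opts = await resolveOpts();
    await body(opts);
    return "success";
  } catch (e) {
    const message = e instanceof Error ? e.message : String(e);
    // opts is still undefined when resolution itself failed, hence `?.`;
    // in that case the human-readable path is used as the fallback.
    if (opts?.jsonOutput) report.json(message);
    else report.human(message);
    return "general_error";
  }
}

// Usage: option resolution fails before any output mode is known.
const seen: string[] = [];
const outcome = runHandler(
  async () => {
    throw new Error("cwd was deleted");
  },
  async () => {},
  { json: (m) => seen.push(`json:${m}`), human: (m) => seen.push(`human:${m}`) },
);
```

One consequence worth noting: when resolution fails, the requested output mode is unknown, so the error is reported on the human path even if the caller passed `--json`.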



TL;DR
- `OrchestrationStore`), `Task` lifecycle state machine, and six new read-only inspection commands (`tasks`, `task`, `sessions`, `session`, `resume`, `orchestration status`), all behind Zod-validated `--json` envelopes.
- `src/lib/orchestration/__tests__/lifecycle.test.ts` (transition matrix; illegal transitions throw `IllegalTaskTransitionError` rather than corrupting state).

Problem
Wizard's `WizardSession` (`src/lib/wizard-session.ts`) is the de-facto orchestration state, but it's an in-memory snapshot held in a single process; there's no durable, machine-readable surface for "what's running, what stopped, what's the resume command." Outer agents that wrap the wizard scrape TUI text or grep `log.ndjson`. Status JSON exists for a few specific commands (`wizard status`, `wizard projects list`) but each shape is ad-hoc; there is no unified envelope.

Task lifecycle is implicit in the `tasks` array (booleans `done` / `error`). Last-stopping-point is partially captured by `session-checkpoint.ts` but only stores intro-phase state, not active task ownership of branches/PRs/worktrees.

Why this is "PR 1 of 3"
This PR introduces a single durable orchestration store that becomes the source of truth for sessions, tasks, subagents, ownership, last-stopping-point, and structured task results.
Foundation only. The TUI redesign, the user-choice / verification / MCP-app lifecycle plumbing, and the MCP-server tool surface are deferred to PRs 2 and 3:
- `WizardSession` stays live alongside.
- `PendingCheckpoint` with concrete schemas, routes existing user-choice / event-plan-confirm / MCP-app prompt sites through the store. The `pendingChoices` / `pendingMcpActions` / `pendingManualVerifications` arrays start carrying real content.

Architecture is documented in `docs/orchestration.md`.

State model
Task lifecycle
Implemented in `src/lib/orchestration/lifecycle.ts`. The `assertTransition(taskId, from, to)` helper is the trust boundary: illegal transitions throw `IllegalTaskTransitionError` rather than corrupt persisted state.

New commands
- `wizard tasks` (`--state`, `--session-id`)
- `wizard task <id>`
- `wizard sessions`
- `wizard session <id>`
- `wizard resume <session-id>` (`--execute`): the resume command
- `wizard orchestration status`: `LastStoppingPoint` snapshot

Every command supports `--json` (auto-enabled when stdout isn't a TTY) and validates its JSON payload against a Zod envelope schema before writing. A regression in the producer surfaces as a thrown ZodError on the producer side rather than silent corruption downstream.

Example human output

Example JSON envelope (`wizard orchestration status --json`)

{
  "v": 1,
  "type": "orchestration_status",
  "generatedAt": "2026-05-09T12:34:56.789Z",
  "installDir": "/Users/me/myapp",
  "storePath": "/Users/me/.amplitude/wizard/runs/3d8f2a1b9c4e/orchestration.json",
  "storeExists": true,
  "lastStoppingPoint": {
    "generatedAt": 1715258096789,
    "currentSessionId": "session_a3f9e7c2d1b48a09f5c6",
    "currentGoal": "set up Amplitude in nextjs",
    "currentBranch": "feat/amplitude-setup",
    "currentWorktree": "/Users/me/myapp",
    "activeTasks": [
      {
        "id": "task_b1c2d3e4f5a6b7c8d9e0",
        "sessionId": "session_a3f9e7c2d1b48a09f5c6",
        "label": "event plan confirmation",
        "state": "waiting_for_user",
        "ownership": [],
        "subagentKind": "instrumentation",
        "createdAt": 1715258090000,
        "updatedAt": 1715258091000,
        "startedAt": 1715258090500,
        "waitingFor": {
          "id": "cp_event_plan",
          "kind": "event_plan_confirm",
          "summary": "review the proposed events.json",
          "enteredAt": 1715258091000
        }
      }
    ],
    "stoppedTasks": [],
    "recentlyCompletedTasks": [],
    "relevantOwnership": [],
    "pendingChoices": [
      {
        "id": "cp_event_plan",
        "kind": "event_plan_confirm",
        "summary": "review the proposed events.json",
        "enteredAt": 1715258091000
      }
    ],
    "pendingMcpActions": [],
    "pendingManualVerifications": [],
    "nextAction": {
      "kind": "await_user_choice",
      "description": "A task is waiting for user input: review the proposed events.json.",
      "command": ["amplitude-wizard", "--install-dir", "/Users/me/myapp"]
    },
    "resumeCommand": "amplitude-wizard --install-dir /Users/me/myapp"
  }
}

Schemas
All persisted shapes have a runtime Zod validator in `src/lib/orchestration/schemas.ts`. The on-disk envelope carries an explicit `version: 1` literal; a version-mismatched file returns `kind: 'corrupt'` from `loadStore` so readers can distinguish "no store yet" from "found a store but couldn't parse it." See `docs/orchestration.md` for the full schema reference.

Exit codes
The new commands extend the existing `ExitCode` contract (see `docs/exit-codes.md`):
- `--state` value, malformed `task_<id>` / `session_<id>` prefix, unknown id

These commands are read-only and never trigger auth flows or network calls, so codes 3 / 4 / 10 / 11 / 12 / 13 / 20 are not reachable.
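To make the "no store yet" versus "found a store but couldn't parse it" distinction from the Schemas section concrete, here is a hedged sketch. Only `kind: 'corrupt'` and the `version: 1` literal come from the description; the `missing` and `ok` variants, the `parseStore` name, the `tasks` field check, and the hand-rolled validation (used instead of Zod to keep the block dependency-free) are assumptions.

```typescript
// Assumed result shape: only `kind: "corrupt"` is confirmed by the PR text.
type LoadResult =
  | { kind: "missing" }
  | { kind: "corrupt"; reason: string }
  | { kind: "ok"; store: { version: 1; tasks: unknown[] } };

// Takes the raw file contents (undefined when the file does not exist)
// so the sketch stays pure and needs no filesystem access.
function parseStore(raw: string | undefined): LoadResult {
  if (raw === undefined) return { kind: "missing" };

  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { kind: "corrupt", reason: "invalid JSON" };
  }

  if (typeof data !== "object" || data === null) {
    return { kind: "corrupt", reason: "not an object" };
  }

  const candidate = data as { version?: unknown; tasks?: unknown };
  // The version literal is checked explicitly: a file written by a
  // different schema version is reported as corrupt, not as missing.
  if (candidate.version !== 1) {
    return { kind: "corrupt", reason: "version mismatch" };
  }
  if (!Array.isArray(candidate.tasks)) {
    return { kind: "corrupt", reason: "schema mismatch" };
  }
  return { kind: "ok", store: { version: 1, tasks: candidate.tasks } };
}

const results = {
  absent: parseStore(undefined),
  truncated: parseStore("{"),
  futureVersion: parseStore('{"version":2,"tasks":[]}'),
  valid: parseStore('{"version":1,"tasks":[]}'),
};
```

A discriminated union like this lets a reader branch on `kind` exhaustively, so a command could treat a missing store as an empty result while surfacing a corrupt one as an error.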
Performance / cost
Each task transition triggers ≤ 1 atomic write of a small JSON file (temp-file + rename via `atomicWriteJSON`, mode `0o600`). A typical wizard run touches a few hundred transitions; the orchestration file therefore stays well under the I/O budget the wizard already spends on `runs/<hash>/log.ndjson` writes. PR 3 will add a debounced in-memory cache for high-frequency call sites if needed.

Backward compatibility
- `WizardSession` is untouched. Every existing call site continues to read from / write to the in-memory snapshot.
- `src/run.ts`). Mirror failures are logged and swallowed; they cannot block the wizard run.
- `v: 1`), and MCP server behavior are preserved.
- `--cache-dir`) was added so `.strict()` accepts the existing `AMPLITUDE_WIZARD_CACHE_DIR` env-var auto-mapping on every command (the env var was already read by `src/utils/storage-paths.ts:getCacheRoot`; this PR just unblocks `.strict()`).

Tests added (48 total, all passing)
- `src/lib/orchestration/__tests__/lifecycle.test.ts`: 10 tests; exhaustive transition matrix, identity rejection, terminal-state outbound rejection, illegal-transition error message contract.
- `src/lib/orchestration/__tests__/schemas.test.ts`: 14 tests; round-trip tests for `Task`, `Session`, `Subagent`, `Ownership`, `TaskResult`, `OrchestrationStoreFile`, `StatusEnvelope`. Negative cases for malformed ids, unknown lifecycle states, broken discriminated unions, wrong version literal.
- `src/lib/orchestration/__tests__/store.test.ts`: 11 tests; create/list/transition round-trips, illegal-transition rejection, terminal result stamping, idempotent ownership, atomic-write durability (a thrown write leaves the prior file intact and orphan-free), corrupt store handling for invalid JSON and schema mismatch.
- `src/lib/orchestration/__tests__/last-stopping-point.test.ts`: 5 tests; empty store snapshot, populated-store grouping, auth-blocked → fix_auth next-action, 24-hour window expiry, ownership aggregation.
- `src/commands/__tests__/orchestration.test.ts`: 8 CLI smoke tests spawning the real `bin.ts` against a tmp install dir. Validates JSON output of every new command against its Zod envelope schema. Checks exit codes for both happy-path and error cases (bad state filter, nonexistent ids).

Test plan
- `pnpm exec vitest run --pool=forks --maxWorkers=1 src/lib/orchestration/ src/commands/__tests__/orchestration.test.ts`: 48/48 passing
- `pnpm exec vitest run --pool=forks --maxWorkers=1`: full suite 3828/3828 passing
- `pnpm test:bdd`: 100 scenarios / 445 steps passing
- `pnpm lint`: clean (one pre-existing warning in `EventPlanFullScreen.test.tsx`, untouched)
- `pnpm build`: compiles + smoke test passes

Manual verification
- `node dist/bin.js orchestration status --json` in a fresh project: empty store, `nextAction.kind === "none"`.
- `orchestration status`: observe a populated store, `currentSessionId` set.
- `wizard tasks --state running`: filter takes effect.
- `wizard task <bogus-id>`: exit code 2.
- `wizard resume <session-id>`: prints (does not execute) a resume command. Pass `--execute` to actually invoke it.
- `~/.amplitude/wizard/runs/<hash>/orchestration.json`: file mode is `0o600`, JSON validates against `OrchestrationStoreFileSchema`.

Known limitations (deferred)
- `src/ui/tui/` is unchanged. (PR 3.)
- `amplitude-wizard mcp serve`) hasn't gained read-only tools that wrap the store. (PR 3.)
- `tasks.push(...)` sites in `src/ui/tui/store.ts` and `src/lib/wizard-session.ts` continue to use the legacy in-memory shape. The orchestration store is mirrored only from session start in `src/run.ts`. (PR 2 widens this; PR 3 retires the duplicate.)
- `pendingChoices` / `pendingMcpActions` / `pendingManualVerifications` are stub arrays in PR 1; the schema is stable but the producer sites land in PR 2.

Follow-up roadmap
- `PendingCheckpoint` schemas; route existing user-choice / event-plan-confirm / MCP-app prompt sites through the store; checkpoints + MCP-app lifecycle.

🤖 Generated with Claude Code
Note
Medium Risk
Adds a new file-backed orchestration store plus multiple new CLI commands and JSON contracts; while largely read-only, it introduces new persistence and exit-path behavior that could affect tooling and session startup in edge cases (permissions/corrupt state).
Overview
Introduces a new durable v2 orchestration store under `src/lib/orchestration/` (file-backed JSON, atomic writes, Zod-validated schemas) with an explicit `TaskLifecycle` state machine and `computeLastStoppingPoint` derivation for "what to do next"/resume hints.

Adds six new read-only inspection commands (`tasks`, `task <id>`, `sessions`, `session <id>`, `resume <session-id> [--execute]`, `orchestration status`) that emit Zod-validated JSON envelopes (auto-JSON when piped) and standardized exit codes; wires them into `bin.ts` and exports.

Mirrors wizard session start into the orchestration store from `src/run.ts` (best-effort, errors logged and swallowed), makes `CLI_INVOCATION` a constant `npx @amplitude/wizard` for stable resume commands, adds a hidden `--cache-dir` passthrough for `AMPLITUDE_WIZARD_CACHE_DIR`, and includes extensive new unit + CLI smoke tests plus new docs (`docs/orchestration.md`, `docs/exit-codes.md`).

Reviewed by Cursor Bugbot for commit be4dbdd. Bugbot is set up for automated code reviews on this repo. Configure here.