[Spec 786] Multi-architect lifecycle, persistence, and UX#822

Merged
waleedkadous merged 65 commits into main from builder/spir-786
May 22, 2026
@waleedkadous
Contributor

Summary

The multi-architect feature was underbaked after Specs 755/761/774. This PR closes the lifecycle, persistence, and UX gaps so a user can add, manage, evict, and recover sibling architects with the same fluency as builders.

Closes #786
Closes #764 (folded into Phase 4 — solo-architect tab label restored to 'Architect' at N=1)

Changes

Seven phases, each committed independently:

  1. Foundation utilities (validateArchitectName reserved 'main', removeArchitect helper, clearRuntime split)
  2. Identity preservation on shellper auto-restart (CODEV_ARCHITECT_NAME re-injection in tower-terminals.ts reconciliation paths)
  3. Graceful-stop persistence (intentional-stop flag, 6 exit handlers, launchInstance reconciliation, stop.ts now calling clearRuntime)
  4. remove-architect + dashboard UX + #764, "Mobile: solo-architect tab label regresses from 'Architect' to 'main' after #762" (CLI + REST DELETE + Tower handler + dashboard close button + confirmation modal + active-tab fallback + solo-architect label)
  5. Surface enumeration (v1 collapse removed, per-architect emission with architectName/pid/port/terminalId, loadState collection-aware, afx status shows all architects)
  6. VSCode multi-architect surface (expandable Architects tree, per-name terminal slots, parameterised codev.openArchitectTerminal, new codev.removeArchitect + right-click context menu)
  7. Documentation + verify scaffolding (agent-farm.md, arch.md, CHANGELOG, verify-scenarios.md with 12 manual round-trip scenarios)
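
Two of the smaller items in the list, the reserved 'main' name from Phase 1 and the solo-architect label from Phase 4, can be illustrated with a minimal sketch. Only the name `validateArchitectName`, the reserved value 'main', and the 'Architect' label at N=1 come from this PR. The return shape, the empty-name rule, and the `tabLabel` helper are assumptions made for illustration and may not match the merged code.

```typescript
// Hypothetical sketch: the real validateArchitectName may apply more rules
// and return a different shape.
const RESERVED_ARCHITECT_NAMES = new Set<string>(["main"]);

type NameCheck = { ok: true } | { ok: false; reason: string };

function validateArchitectName(name: string): NameCheck {
  if (name.trim() === "") {
    // Assumed rule, not stated in the PR.
    return { ok: false, reason: "name must not be empty" };
  }
  if (RESERVED_ARCHITECT_NAMES.has(name)) {
    // 'main' is reserved, so a sibling architect cannot take it.
    return { ok: false, reason: `'${name}' is reserved` };
  }
  return { ok: true };
}

// Hypothetical helper: with exactly one architect the tab reads 'Architect';
// with several, this sketch assumes each tab shows the architect's own name.
function tabLabel(architectName: string, architectCount: number): string {
  return architectCount === 1 ? "Architect" : architectName;
}
```

Under these assumptions, `tabLabel("main", 1)` yields 'Architect' while `tabLabel("main", 2)` yields 'main', which is the behaviour #764 asks to restore for the solo case.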

Testing

  • All unit tests passing: codev 3016 (+ 16 from Spec 786), dashboard 295 (+ 11), vscode 21 (new vitest setup)
  • Pre-existing scrollController flake noted in review; not Spec 786 related
  • Manual verify scenarios scripted in codev/projects/786-.../verify-scenarios.md (12 scenarios covering headline round-trip, persistence, crash recovery, permanent exit, naming, architect-to-architect, surface enumeration, dashboard UX, VSCode UX, stop-all)

CMAP iterations

  • Spec: 5 rounds to convergence (Codex's findings narrowed each round — initial repo-state diagnoses wrong; iter-5 COMMENT)
  • Plan: 2 rounds to convergence (Gemini APPROVE both rounds; Codex REQUEST_CHANGES iter-1 with 4 findings, all addressed; iter-2 all COMMENT or APPROVE)
  • Each implementation phase: 1-2 rounds; substantive findings addressed via iter-1 rebuttal+fix commits

Spec / Plan / Review

  • Spec: `codev/specs/786-multi-architect-feature-is-und.md`
  • Plan: `codev/plans/786-multi-architect-feature-is-und.md`
  • Review: `codev/reviews/786-multi-architect-feature-is-und.md`
  • Verify scenarios: `codev/projects/786-multi-architect-feature-is-und/verify-scenarios.md`

Breaking-ish change

`afx workspace stop` no longer wipes the architect registry (now uses `clearRuntime()`). Callers that wanted the full-wipe behaviour should switch to the dashboard's stop-all (or POST the workspace stop API) or call `clearState()` directly. See CHANGELOG `[Unreleased]` for the full list.
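
The difference between the two calls can be sketched as follows. This is an in-memory model only: the `WorkspaceState` shape and the contents of `runtime` are assumptions, and the real functions operate on persisted state rather than on plain maps.

```typescript
// Hypothetical in-memory model of the two clearing behaviours.
interface WorkspaceState {
  // Architect registry: survives a graceful stop.
  architects: Map<string, { name: string }>;
  // Ephemeral per-run data (assumed here to be name -> pid).
  runtime: Map<string, number>;
}

// Sketch of clearRuntime(): drop ephemeral data, keep the registry.
function clearRuntime(state: WorkspaceState): void {
  state.runtime.clear();
}

// Sketch of clearState(): full wipe, registry included.
function clearState(state: WorkspaceState): void {
  state.runtime.clear();
  state.architects.clear();
}
```

A caller that relied on the old full-wipe behaviour of `afx workspace stop` would, in this model, need the second function rather than the first.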

…erved 'main', removeArchitect helper, clearRuntime split
…tTerminalsForWorkspace reconnect path + fix stale comment
…n + loadState collection-aware + afx status enumeration
…t entries + shared TerminalEntry type + status-naming Phase 5 tests
…ree + per-name terminal slots + right-click remove
…ace + afx-open + afx-tower-stop CHANGELOG + afx status section
…oveArchitect + openArchitect handles stale terminalId
…llyStopping flag

Architect's integration CMAP found that killTerminalWithShellper only sends
SIGTERM; node-pty's 'exit' event fires later on its own tick. The finally
block in stopInstance was clearing the intentionallyStopping flag before the
cascaded exit handlers ran, so they read the flag as false and deleted the
persisted state.db.architect rows — defeating Phase 3's persistence story.

Fix: register a waitForTerminalExit(terminalId) promise BEFORE each kill,
then await Promise.all before letting finally clear the flag. 5s per-terminal
timeout safety so a stuck process can't block shutdown indefinitely. Applied
symmetrically to removeArchitect (race was harmless there due to idempotent
setArchitectByName, but keeping the paths symmetric prevents future divergence).

Behavioral test exercises real async timing via an EventEmitter mock that
emits 'exit' on the next tick — captures the flag at exit-time, asserts it's
still true. Without the fix this test would observe false, catching the
production failure mode that unit tests with synchronous mocks miss.
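
A minimal sketch of the pattern described in this commit, assuming each terminal is exposed as a Node `EventEmitter` that emits 'exit'. The commit's `waitForTerminalExit` takes a `terminalId`; this sketch takes the emitter directly to stay self-contained. `stopTerminals`, `isIntentionallyStopping`, and the `kill` callback are hypothetical stand-ins for `stopInstance` and `killTerminalWithShellper`.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical module-level flag read by exit handlers.
let intentionallyStopping = false;
function isIntentionallyStopping(): boolean {
  return intentionallyStopping;
}

// Resolves true when 'exit' fires, or false once timeoutMs elapses,
// so a stuck process cannot block shutdown indefinitely.
function waitForTerminalExit(term: EventEmitter, timeoutMs = 5000): Promise<boolean> {
  return new Promise<boolean>((resolve) => {
    const timer = setTimeout(() => {
      term.off("exit", onExit);
      resolve(false);
    }, timeoutMs);
    function onExit(): void {
      clearTimeout(timer);
      resolve(true);
    }
    term.once("exit", onExit);
  });
}

async function stopTerminals(
  terminals: EventEmitter[],
  kill: (term: EventEmitter) => void,
  timeoutMs = 5000,
): Promise<void> {
  intentionallyStopping = true;
  try {
    const exits: Promise<boolean>[] = [];
    for (const term of terminals) {
      // Register the waiter BEFORE the kill so the exit cannot be missed.
      exits.push(waitForTerminalExit(term, timeoutMs));
      kill(term);
    }
    // Hold the flag until every exit handler has run (or timed out).
    await Promise.all(exits);
  } finally {
    intentionallyStopping = false;
  }
}
```

A behavioural check in the spirit of the one described above would pass a `kill` that schedules `term.emit("exit")` with `setTimeout(..., 0)` and record `isIntentionallyStopping()` inside an 'exit' listener. With the `await` in place the recorded value is true. Without it, the `finally` block would run first.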
…FORE kill loop

Codex PR iter-2 caught a second-order race in the same family as the
stopInstance one. handleWorkspaceStopAll kills then clears the registry
synchronously, but the cascaded architect exit handlers run later and
recover the architect name by scanning currentEntry.architects — which was
already cleared. So the name lookup returns null, setArchitectByName(name,
null) never runs, and stale state.db.architect rows survive what's supposed
to be a full wipe.

Fix: iterate entry.architects.keys() and explicitly call
setArchitectByName(name, null) BEFORE the kill loop. Idempotent — even if
the exit handler somehow runs first, the second deletion is a no-op.

Regression test pins the ordering at the source level: brace-match
handleWorkspaceStopAll, assert the delete loop exists, assert it comes
before the kill loop. The existing 'no intentionallyStopping reference'
sentinel was insufficient — it only covered half the full-wipe property.
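
The ordering this commit pins can be sketched as below. `stopAllSketch` and its interfaces are hypothetical. The persistence and kill operations are injected as callbacks so the order is observable, and `setArchitectByName(name, null)` is taken from the commit message as the delete call.

```typescript
// Hypothetical shapes; only the architects map and setArchitectByName(name, null)
// are named in the commit message.
interface ArchitectEntry {
  terminalId: string;
}
interface WorkspaceEntry {
  architects: Map<string, ArchitectEntry>;
}

function stopAllSketch(
  entry: WorkspaceEntry,
  setArchitectByName: (name: string, value: null) => void,
  killTerminal: (terminalId: string) => void,
): void {
  // 1. Delete persisted rows while the names are still in the registry.
  entry.architects.forEach((_architect, name) => {
    setArchitectByName(name, null);
  });
  // 2. Kill the terminals; their exit handlers fire later, asynchronously.
  entry.architects.forEach((architect) => {
    killTerminal(architect.terminalId);
  });
  // 3. Clear the in-memory registry. A late exit handler can no longer
  //    recover a name, but the rows are already gone, so nothing leaks.
  entry.architects.clear();
}
```

Because the delete is assumed idempotent, an exit handler that does manage to run before step 3 would only repeat a deletion that has already happened.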
@waleedkadous waleedkadous merged commit d8e2307 into main May 22, 2026
6 checks passed