Summary
In v3.1.1, opening a workspace other than the one that registered sibling architects causes those siblings to be re-spawned in the new workspace. Sibling architects leak across workspaces.
Reported by a user immediately after installing v3.1.1: after creating workspace manazil, the bug-backlog and ob-refine architects from shannon appeared as running in manazil's terminal list (with NEW PIDs, meaning they were actually spawned, not just listed).
Repro
- Tower is started from /Users/mwk/Development/cluesmith/codev
- Shannon adds sibling architects: afx workspace add-architect --name ob-refine (in shannon)
- Shannon's state.db.architect row gets written, BUT the row lives in codev/.agent-farm/state.db (the singleton from getDb(), anchored to Tower's CWD)
- User opens manazil workspace:
  - launchInstance(manazil) runs
  - launchInstance iterates state.db.architect (the global codev-side table) and finds ob-refine
  - launchInstance re-spawns ob-refine as a manazil architect with a new PID
- Manazil's /api/state now reports architects: [main, ob-refine, bug-backlog]. All are running, all are real PTYs, and none of them legitimately belongs to manazil
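The repro steps above can be modelled in a few lines. This is only a sketch of the described behaviour, not the actual Tower code: add_architect and launch_instance_unscoped are hypothetical stand-ins for afx workspace add-architect and the launchInstance reconcile loop, and the port and cmd values are placeholders.

```python
import sqlite3

# Stand-in for codev/.agent-farm/state.db: one architect table shared by
# every workspace the Tower daemon serves (schema copied from the report).
state = sqlite3.connect(":memory:")
state.execute("""
    CREATE TABLE architect (
        id TEXT PRIMARY KEY,
        pid INTEGER NOT NULL,
        port INTEGER NOT NULL,
        cmd TEXT NOT NULL,
        started_at TEXT NOT NULL DEFAULT (datetime('now')),
        terminal_id TEXT
    )
""")

def add_architect(name, pid):
    # Hypothetical stand-in for `afx workspace add-architect`.
    # Note that nothing records which workspace registered the architect.
    state.execute(
        "INSERT OR REPLACE INTO architect (id, pid, port, cmd) VALUES (?, ?, 0, 'claude')",
        (name, pid),
    )

def launch_instance_unscoped(workspace, first_new_pid):
    # Hypothetical model of the v3.1.1 reconcile loop: every row in the
    # global table is re-spawned in whichever workspace is being opened.
    rows = state.execute("SELECT id FROM architect ORDER BY id").fetchall()
    return [(workspace, name, first_new_pid + i) for i, (name,) in enumerate(rows)]

# Registered while working in shannon (plus the default main architect).
add_architect("main", 100)
add_architect("ob-refine", 101)
add_architect("bug-backlog", 102)

# Opening manazil re-spawns all of them under new PIDs.
leaked = launch_instance_unscoped("manazil", 200)
```

Every tuple in leaked carries the manazil workspace and a fresh PID, which matches what /api/state showed in the report.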
Diagnosis (verified)
$ sqlite3 /Users/mwk/Development/cluesmith/codev/.agent-farm/state.db "SELECT id FROM architect;"
bug-backlog
main
ob-refine
$ sqlite3 /Users/mwk/Development/cluesmith/shannon/.agent-farm/state.db "SELECT id FROM architect;"
(empty — shannon's siblings live in codev/.agent-farm/state.db because state.db is anchored to Tower's CWD)
$ sqlite3 /Users/mwk/Development/cluesmith/codev/.agent-farm/state.db ".schema architect"
CREATE TABLE IF NOT EXISTS "architect" (
id TEXT PRIMARY KEY,
pid INTEGER NOT NULL,
port INTEGER NOT NULL,
cmd TEXT NOT NULL,
started_at TEXT NOT NULL DEFAULT (datetime('now')),
terminal_id TEXT
);
No workspace_path column. The table is global per Tower daemon, not per workspace.
Meanwhile terminal_sessions (in ~/.agent-farm/global.db) DOES have a workspace_path column:
CREATE TABLE terminal_sessions (
id TEXT PRIMARY KEY,
workspace_path TEXT NOT NULL,
type TEXT NOT NULL CHECK(type IN ('architect', 'builder', 'shell')),
role_id TEXT,
...
)
So the workspace-scoping data exists; it's just not joined into the architect reconcile path.
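Because the two tables live in different database files, one way to see which workspace each architect row belongs to is to attach global.db to the state.db connection and join. The sketch below uses in-memory databases, placeholder /ws/... paths, and only the four terminal_sessions columns visible in the schema above (the elided columns are omitted, and architect is trimmed to two columns for brevity).

```python
import sqlite3

conn = sqlite3.connect(":memory:")                        # stand-in for state.db
conn.execute("ATTACH DATABASE ':memory:' AS global_db")   # stand-in for ~/.agent-farm/global.db
conn.executescript("""
    CREATE TABLE architect (id TEXT PRIMARY KEY, pid INTEGER NOT NULL);
    CREATE TABLE global_db.terminal_sessions (
        id TEXT PRIMARY KEY,
        workspace_path TEXT NOT NULL,
        type TEXT NOT NULL CHECK(type IN ('architect', 'builder', 'shell')),
        role_id TEXT
    );
    INSERT INTO architect (id, pid) VALUES
        ('main', 1), ('ob-refine', 2), ('bug-backlog', 3);
    -- Placeholder session rows; the real contents of global.db are not shown in the report.
    INSERT INTO global_db.terminal_sessions (id, workspace_path, type, role_id) VALUES
        ('t1', '/ws/codev',   'architect', 'main'),
        ('t2', '/ws/shannon', 'architect', 'ob-refine'),
        ('t3', '/ws/shannon', 'architect', 'bug-backlog'),
        ('t4', '/ws/shannon', 'shell',     NULL);
""")

# Ownership report: one row per (architect, workspace) pairing.
# LEFT JOIN keeps architects that have no session row at all.
owners = conn.execute("""
    SELECT a.id, ts.workspace_path
    FROM architect AS a
    LEFT JOIN global_db.terminal_sessions AS ts
      ON ts.role_id = a.id AND ts.type = 'architect'
    ORDER BY a.id, ts.workspace_path
""").fetchall()
```

An architect that shows up with more than one workspace_path in this report would be a candidate leak; one with NULL has no session row to scope it by.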
Why this is new in v3.1.1
This issue was flagged by Codex at #786 plan-iter-3 Co1 ("workspace-scoping") and explicitly accepted-as-out-of-scope by the architect because Spec 786 defers cross-workspace concerns:
"Cross-workspace routing. Architects in workspace A cannot address architects in workspace B. Deferred previously; stays deferred."
But the bug ISN'T about routing — it's about persistence leaking architects across workspaces via the launchInstance reconcile loop that #786 added to deliver graceful-stop persistence. Pre-#786, launchInstance only created main; the global table was harmless because nothing iterated it. #786's new iterate-and-respawn loop turns the pre-existing global table into an active cross-workspace leak.
Fix shape
Two options:
Option A (proper, schema migration)
Add a workspace_path TEXT NOT NULL column to state.db.architect. Migrate existing rows by joining with terminal_sessions on architect.id = terminal_sessions.role_id (restricted to type = 'architect', as in Option B) to populate workspace_path. Update setArchitect, setArchitectByName, removeArchitect, loadState, and the launchInstance reconcile to scope by workspace. This is the most correct fix, but it is a schema migration with multiple touch points.
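A minimal sketch of what the Option A migration could look like, under some assumptions. SQLite only accepts ALTER TABLE ... ADD COLUMN with NOT NULL if a non-NULL default is supplied, so the sketch uses '' as an "unassigned" marker. The function name and the rule for ambiguous rows are hypothetical: an architect whose sessions span more than one workspace (for example after a leak), or that has no session at all, is left unassigned for manual review rather than guessed. The tables are trimmed to the columns the backfill needs.

```python
import sqlite3

def add_and_backfill_workspace_path(conn):
    """Expects state.db as the main database and global.db attached as global_db."""
    # NOT NULL on ADD COLUMN requires a non-NULL default in SQLite.
    conn.execute("ALTER TABLE architect ADD COLUMN workspace_path TEXT NOT NULL DEFAULT ''")
    with conn:  # commit the backfill atomically, roll back on error
        conn.execute("""
            UPDATE architect
            SET workspace_path = (
                SELECT MIN(ts.workspace_path)
                FROM global_db.terminal_sessions AS ts
                WHERE ts.role_id = architect.id AND ts.type = 'architect'
            )
            WHERE (
                SELECT COUNT(DISTINCT ts.workspace_path)
                FROM global_db.terminal_sessions AS ts
                WHERE ts.role_id = architect.id AND ts.type = 'architect'
            ) = 1
        """)
    # Rows still marked '' could not be assigned unambiguously.
    rows = conn.execute("SELECT id FROM architect WHERE workspace_path = '' ORDER BY id")
    return [r[0] for r in rows]

# Demo with placeholder data.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS global_db")
conn.executescript("""
    CREATE TABLE architect (id TEXT PRIMARY KEY, pid INTEGER NOT NULL);
    CREATE TABLE global_db.terminal_sessions (
        id TEXT PRIMARY KEY, workspace_path TEXT NOT NULL,
        type TEXT NOT NULL, role_id TEXT
    );
    INSERT INTO architect (id, pid) VALUES
        ('main', 1), ('ob-refine', 2), ('bug-backlog', 3), ('orphan', 4);
    INSERT INTO global_db.terminal_sessions VALUES
        ('t1', '/ws/codev',   'architect', 'main'),
        ('t2', '/ws/shannon', 'architect', 'ob-refine'),
        ('t3', '/ws/shannon', 'architect', 'bug-backlog'),
        ('t4', '/ws/manazil', 'architect', 'bug-backlog');
""")
unresolved = add_and_backfill_workspace_path(conn)
assigned = dict(conn.execute("SELECT id, workspace_path FROM architect WHERE workspace_path <> ''"))
```

One thing this sketch does not address: id remains the sole PRIMARY KEY, so two workspaces still could not each hold an architect with the same name. If that is a requirement, the table would likely need rebuilding with a composite key, since ADD COLUMN cannot change the primary key.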
Option B (quick, uses existing data)
Modify the launchInstance reconcile loop to only re-spawn architects that have a matching terminal_sessions row WITH workspace_path = <this workspace>. The terminal_sessions table already has the workspace scoping; join on terminal_sessions.role_id = architect.id AND terminal_sessions.type = 'architect' AND terminal_sessions.workspace_path = <this workspace>. No schema change; reconcile is correctly scoped.
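As a rough illustration of the Option B filter: the sketch below queries the two databases separately and keeps only architects that have an architect-type session row in the workspace being opened. architects_to_respawn is a hypothetical helper, not an existing function; the tables are trimmed to the columns used, and the paths are placeholders. It assumes main is still created by launchInstance on its own, as it was before #786.

```python
import sqlite3

def architects_to_respawn(state_db, global_db, workspace_path):
    # Architect role ids that have a session recorded for this workspace.
    owned = {
        role_id
        for (role_id,) in global_db.execute(
            "SELECT DISTINCT role_id FROM terminal_sessions "
            "WHERE type = 'architect' AND workspace_path = ? AND role_id IS NOT NULL",
            (workspace_path,),
        )
    }
    all_ids = [r[0] for r in state_db.execute("SELECT id FROM architect ORDER BY id")]
    # Only rows owned by this workspace survive the reconcile.
    return [arch_id for arch_id in all_ids if arch_id in owned]

# Demo with placeholder data.
state_db = sqlite3.connect(":memory:")    # stand-in for state.db
state_db.executescript("""
    CREATE TABLE architect (id TEXT PRIMARY KEY, pid INTEGER NOT NULL);
    INSERT INTO architect (id, pid) VALUES ('main', 1), ('ob-refine', 2), ('bug-backlog', 3);
""")
global_db = sqlite3.connect(":memory:")   # stand-in for ~/.agent-farm/global.db
global_db.executescript("""
    CREATE TABLE terminal_sessions (
        id TEXT PRIMARY KEY, workspace_path TEXT NOT NULL,
        type TEXT NOT NULL, role_id TEXT
    );
    INSERT INTO terminal_sessions VALUES
        ('t1', '/ws/codev',   'architect', 'main'),
        ('t2', '/ws/shannon', 'architect', 'ob-refine'),
        ('t3', '/ws/shannon', 'architect', 'bug-backlog');
""")

for_manazil = architects_to_respawn(state_db, global_db, "/ws/manazil")
for_shannon = architects_to_respawn(state_db, global_db, "/ws/shannon")
```

With this data, nothing is re-spawned for manazil, while shannon still gets its two siblings back. One caveat: if the PTYs leaked by v3.1.1 were themselves recorded in terminal_sessions under the new workspace, which the report does not state, those rows would need clearing before this filter could exclude them.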
Recommend Option B for v3.1.2 (urgent hotfix). Option A is the right long-term fix and can ship as part of a follow-up architectural cleanup.
Severity
Critical. v3.1.1 just shipped (~30 min ago). Anyone with multiple workspaces and a non-main sibling architect anywhere will see leaks the moment they open a different workspace. The leaked architects are real running processes consuming resources (~50MB+ RAM per claude session, plus their claude --dangerously-skip-permissions shell harnesses) AND they could interfere with the actual workspace's workflow if the user accidentally talks to them.
Workaround until fix lands
Don't open additional workspaces in Tower if you have non-main siblings registered anywhere. Or kill the leaked terminals via the dashboard sidebar X button (which kills the PTY but leaves the state.db row intact for the legitimate workspace).
Cleanup of leaked architects
Running afx workspace remove-architect <name> directly from a workspace that received leaked architects would also delete the state.db row, which corrupts the legitimate owner workspace's state. Per-PTY kill via the dashboard is the safer manual cleanup until the proper fix lands.
Related