Spawn Refactor & MCP Fixes #379
Conversation
Write ~/.agent-relay/active-daemon.json on daemon start so the MCP server can find the running socket even when process.cwd() differs. Also persist messages for recently-disconnected CLI users and update the CLI sender name to __cli_sender__ for dashboard visibility. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
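The marker-file mechanism described above can be sketched roughly as follows. The field names (`socketPath`, `projectRoot`, `pid`) and function names are assumptions inferred from the commit messages, not the PR's actual code:

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// Hypothetical marker shape, inferred from the commit message: enough
// information for an out-of-cwd process to locate the running daemon.
interface ActiveDaemonMarker {
  socketPath: string;
  projectRoot: string;
  pid: number;
}

const MARKER_PATH = path.join(os.homedir(), ".agent-relay", "active-daemon.json");

// Written on daemon start so processes whose cwd differs (e.g. the MCP
// server) can still find the socket.
function writeMarker(marker: ActiveDaemonMarker, markerPath = MARKER_PATH): void {
  fs.mkdirSync(path.dirname(markerPath), { recursive: true });
  fs.writeFileSync(markerPath, JSON.stringify(marker, null, 2));
}

// Read by the MCP server; returns null when no daemon has registered
// (missing file or unparseable contents).
function readMarker(markerPath = MARKER_PATH): ActiveDaemonMarker | null {
  try {
    return JSON.parse(fs.readFileSync(markerPath, "utf8")) as ActiveDaemonMarker;
  } catch {
    return null;
  }
}
```

A real implementation would also want to verify the recorded pid is still alive before trusting the marker, since a crashed daemon leaves a stale file behind.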
…ternal agent

Guard the active daemon marker's projectRoot derivation to only extract a project root when the socket is in a .agent-relay/ directory, avoiding an incorrect "/" for /tmp/agent-relay.sock. Remove the stale 'cli' entry from INTERNAL_AGENTS after the rename to __cli_sender__. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ntSpawner

Fixes relay-pty binary resolution when installed via npx by routing all spawn/release operations through the daemon's SpawnManager (which has proper access to relay-pty), rather than having the dashboard-server create its own AgentSpawner.

Changes:
- Protocol: Add SEND_INPUT, LIST_WORKERS message types and payloads
- SDK: Add sendWorkerInput(), listWorkers() methods; extend spawn() opts
- Daemon: Add handleSendInput/handleListWorkers to SpawnManager; add getSpawnManager() getter for co-located dashboard
- Wrapper: Rewrite spawn/release to use a daemon-first fallback chain; only fall back to the dashboard API on transport errors (not policy rejections)
- CLI: Pass daemon.getSpawnManager() to startDashboard()

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
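The key design decision in the wrapper's fallback chain is that only *transport* failures (daemon unreachable) fall through to the dashboard API, while policy rejections from a reachable daemon surface to the caller unchanged. A minimal sketch of that distinction, with hypothetical names (`TransportError`, `spawnWithFallback`) not taken from the PR's code:

```typescript
// Hypothetical error class: thrown when the daemon socket is unreachable,
// as opposed to the daemon answering with a policy rejection.
class TransportError extends Error {}

type SpawnFn = (name: string) => Promise<void>;

// Daemon-first fallback chain: a transport failure falls through to the
// dashboard API; any other error (e.g. a policy rejection) is rethrown
// so the caller sees the daemon's actual decision.
async function spawnWithFallback(
  daemonSpawn: SpawnFn,
  dashboardSpawn: SpawnFn | undefined,
  name: string,
): Promise<void> {
  try {
    await daemonSpawn(name);
    return;
  } catch (err) {
    if (!(err instanceof TransportError)) throw err; // policy rejection: do not retry
    if (!dashboardSpawn) throw err;                  // no further fallback available
  }
  await dashboardSpawn(name);
}
```

Retrying a policy rejection against the dashboard would effectively bypass the daemon's decision, which is why only transport errors trigger the fallback.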
- SDK: Accept an optional spawnerName in spawn() options, falling back to the client's agentName. Fixes the dashboard losing the original requester's spawnerName when routing through the SDK.
- Docs: Update protocol.md with the spawn/worker management message types, daemon.md with the SEND_INPUT/LIST_WORKERS handlers and getSpawnManager(), the SDK README with sendWorkerInput()/listWorkers() API docs, and the OpenAPI spec with the new request/response schemas.

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
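The spawnerName fallback amounts to one nullish-coalescing expression when building the spawn payload. A sketch under assumed names (`RelayClient`, `buildSpawnPayload` are illustrative, not the SDK's real API):

```typescript
interface SpawnOpts {
  cli: string;
  task?: string;
  spawnerName?: string; // original requester, when routed through an intermediary
}

class RelayClient {
  constructor(private agentName: string) {}

  // spawnerName falls back to the client's own agent name, so a dashboard
  // routing someone else's request through the SDK can preserve the
  // original requester instead of appearing as the spawner itself.
  buildSpawnPayload(name: string, opts: SpawnOpts) {
    return { name, ...opts, spawnerName: opts.spawnerName ?? this.agentName };
  }
}
```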
Records decisions and rationale for routing spawn/release through SDK/daemon instead of dashboard AgentSpawner (github-374). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
```ts
// Also clean up stale recently-disconnected user entries
for (const [name, timestamp] of this.recentlyDisconnectedUsers) {
  if (now - timestamp > Router.RECENT_USER_TTL_MS) {
    this.recentlyDisconnectedUsers.delete(name);
    routerLog.debug(`Cleaned up stale recently-disconnected user: ${name}`);
  }
}
```
🟡 Memory leak: recentlyDisconnectedUsers cleanup only runs during agent spawning
The recentlyDisconnectedUsers map tracks disconnected user connections for 5 minutes to ensure messages to them are persisted. However, cleanup of stale entries only occurs in cleanupStaleSpawning(), which is exclusively called from markSpawning() (packages/daemon/src/router.ts:216).
Root Cause
The cleanup logic at packages/daemon/src/router.ts:254-260 runs inside cleanupStaleSpawning(), but this method is only invoked when an agent is being spawned. If no agents are spawned for an extended period, stale entries accumulate indefinitely.
```ts
// router.ts:210-217
markSpawning(agentName: string): void {
  this.spawningAgents.set(agentName, Date.now());
  routerLog.info(`Agent marked as spawning: ${agentName}`, {
    currentSpawning: Array.from(this.spawningAgents.keys()),
  });
  // Clean up stale spawning entries
  this.cleanupStaleSpawning(); // ← Only called here
}
```
```ts
// router.ts:254-260
private cleanupStaleSpawning(): void {
  // ... spawning cleanup ...

  // Also clean up stale recently-disconnected user entries
  for (const [name, timestamp] of this.recentlyDisconnectedUsers) {
    if (now - timestamp > Router.RECENT_USER_TTL_MS) {
      this.recentlyDisconnectedUsers.delete(name);
      routerLog.debug(`Cleaned up stale recently-disconnected user: ${name}`);
    }
  }
}
```

Impact: In long-running daemon instances with infrequent agent spawning, the recentlyDisconnectedUsers map grows unbounded with stale entries. Each CLI command that connects as __cli_sender__ and disconnects adds an entry that won't be cleaned up until the next spawn operation.
Example scenario: A daemon running for hours with agents only sending/receiving messages (no spawning) will accumulate every disconnected user entry from CLI commands.
Prompt for agents
Add a periodic cleanup mechanism for recentlyDisconnectedUsers that runs independently of spawning operations. Options:
1. Create a separate periodic timer in the Router constructor that runs cleanupStaleSpawning() every 5 minutes
2. Add cleanup logic to other frequently-called methods like register() or unregister()
3. Extract the recentlyDisconnectedUsers cleanup into a separate method and call it from both cleanupStaleSpawning() and from a periodic interval
The recommended approach is option 3: create a cleanupStaleRecentUsers() method, call it from cleanupStaleSpawning(), and set up a periodic interval (every 5 minutes) in the Router constructor that calls both cleanup methods. Remember to clear the interval in a destructor/cleanup method if one exists.
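A minimal sketch of option 3, reusing the field names quoted in this review (recentlyDisconnectedUsers, RECENT_USER_TTL_MS); the real Router carries much more state than shown here:

```typescript
class Router {
  private static readonly RECENT_USER_TTL_MS = 5 * 60 * 1000;
  private recentlyDisconnectedUsers = new Map<string, number>();
  private cleanupTimer?: ReturnType<typeof setInterval>;

  constructor() {
    // Periodic sweep, independent of spawn activity, so entries expire
    // even when no agent is ever spawned.
    this.cleanupTimer = setInterval(
      () => this.cleanupStaleRecentUsers(),
      Router.RECENT_USER_TTL_MS,
    );
    // Don't keep the process alive just for this timer (Node.js).
    (this.cleanupTimer as any).unref?.();
  }

  markDisconnected(name: string, now = Date.now()): void {
    this.recentlyDisconnectedUsers.set(name, now);
  }

  // Extracted cleanup: callable from cleanupStaleSpawning() and from the
  // periodic timer above.
  cleanupStaleRecentUsers(now = Date.now()): void {
    for (const [name, timestamp] of this.recentlyDisconnectedUsers) {
      if (now - timestamp > Router.RECENT_USER_TTL_MS) {
        this.recentlyDisconnectedUsers.delete(name);
      }
    }
  }

  get recentUserCount(): number {
    return this.recentlyDisconnectedUsers.size;
  }

  // Clear the interval on shutdown so the daemon can exit cleanly.
  dispose(): void {
    if (this.cleanupTimer) clearInterval(this.cleanupTimer);
  }
}
```

Accepting `now` as a parameter keeps the cleanup deterministic under test without mocking the clock.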
```ts
// 3. Final fallback: onRelease callback
if (this.config.onRelease) {
  await this.config.onRelease(name);
  return;
}
```
🟡 Missing error handling when no release mechanism available in executeReleaseWithFallbacks
The executeReleaseWithFallbacks() method silently returns without error when no release mechanism is available (daemon not connected, no dashboard, no callback). This is inconsistent with the equivalent spawn method which throws an error.
Root Cause
At packages/wrapper/src/relay-pty-orchestrator.ts:1484-1488, if all release mechanisms fail or are unavailable, the function silently returns:
```ts
// 3. Final fallback: onRelease callback
if (this.config.onRelease) {
  await this.config.onRelease(name);
  return;
}
// ← Function ends here with an implicit undefined return
```

Compare this to executeSpawnWithFallbacks() at line 1432, which explicitly throws:
```ts
if (this.config.onSpawn) {
  this.log(`Spawning ${name} via onSpawn callback`);
  await this.config.onSpawn(name, cli, task);
  return;
}
throw new Error(`No spawn mechanism available (client=${this.client.state}, dashboardPort=${this.config.dashboardPort}, onSpawn=${!!this.config.onSpawn})`);
```

Impact: When an agent attempts to release another agent but:
- Daemon is not connected (`client.state !== 'READY'`)
- Dashboard API is not available (`config.dashboardPort` is undefined)
- No `onRelease` callback is configured
The release silently fails with no indication to the calling agent. The calling code at line 1448 catches and logs the error, but since no error is thrown, it appears the release succeeded when it actually did nothing.
Actual behavior: Release command is silently ignored
Expected behavior: Throw an error indicating no release mechanism is available, matching the spawn behavior
Suggested change:

```ts
// 3. Final fallback: onRelease callback
if (this.config.onRelease) {
  await this.config.onRelease(name);
  return;
}
throw new Error(`No release mechanism available (client=${this.client.state}, dashboardPort=${this.config.dashboardPort}, onRelease=${!!this.config.onRelease})`);
```
- Extract recentlyDisconnectedUsers cleanup into its own method and call it from unregister() so entries are cleaned on every disconnect, not only during spawn operations
- Throw an error when no release mechanism is available, matching the existing spawn fallback behavior

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>