Skip to content

Spawn Refactor & MCP Fixes#379

Merged
khaliqgant merged 9 commits into
mainfrom
sdk-spawn-refactor
Feb 6, 2026
Merged

Spawn Refactor & MCP Fixes#379
khaliqgant merged 9 commits into
mainfrom
sdk-spawn-refactor

Conversation

@khaliqgant
Copy link
Copy Markdown
Member

@khaliqgant khaliqgant commented Feb 5, 2026

khaliqgant and others added 5 commits February 5, 2026 11:42
Write ~/.agent-relay/active-daemon.json on daemon start so the MCP
server can find the running socket even when process.cwd() differs.
Also persists messages for recently-disconnected CLI users and updates
CLI sender name to __cli_sender__ for dashboard visibility.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ternal agent

Guard active daemon marker's projectRoot derivation to only extract
project root when socket is in .agent-relay/ directory, avoiding
incorrect "/" for /tmp/agent-relay.sock. Remove stale 'cli' from
INTERNAL_AGENTS after renaming to __cli_sender__.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ntSpawner

Fixes relay-pty binary resolution when installed via npx by routing all
spawn/release operations through the daemon's SpawnManager (which has
proper access to relay-pty), rather than having the dashboard-server
create its own AgentSpawner.

Changes:
- Protocol: Add SEND_INPUT, LIST_WORKERS message types and payloads
- SDK: Add sendWorkerInput(), listWorkers() methods; extend spawn() opts
- Daemon: Add handleSendInput/handleListWorkers to SpawnManager; add
  getSpawnManager() getter for co-located dashboard
- Wrapper: Rewrite spawn/release to use daemon-first fallback chain;
  only fall back to dashboard API on transport errors (not policy rejections)
- CLI: Pass daemon.getSpawnManager() to startDashboard()

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- SDK: Accept optional spawnerName in spawn() options, falling back to
  client's agentName. Fixes dashboard losing the original requester's
  spawnerName when routing through SDK.
- Docs: Update protocol.md with spawn/worker management message types,
  daemon.md with SEND_INPUT/LIST_WORKERS handlers and getSpawnManager(),
  SDK README with sendWorkerInput()/listWorkers() API docs, and OpenAPI
  spec with new request/response schemas.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Records decisions and rationale for routing spawn/release through
SDK/daemon instead of dashboard AgentSpawner (github-374).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 2 potential issues.

View 5 additional findings in Devin Review.

Open in Devin Review

Comment thread packages/daemon/src/router.ts Outdated
Comment on lines +254 to +260
// Also clean up stale recently-disconnected user entries
for (const [name, timestamp] of this.recentlyDisconnectedUsers) {
if (now - timestamp > Router.RECENT_USER_TTL_MS) {
this.recentlyDisconnectedUsers.delete(name);
routerLog.debug(`Cleaned up stale recently-disconnected user: ${name}`);
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Memory leak: recentlyDisconnectedUsers cleanup only runs during agent spawning

The recentlyDisconnectedUsers map tracks disconnected user connections for 5 minutes to ensure messages to them are persisted. However, cleanup of stale entries only occurs in cleanupStaleSpawning(), which is exclusively called from markSpawning() (packages/daemon/src/router.ts:216).

Root Cause

The cleanup logic at packages/daemon/src/router.ts:254-260 runs inside cleanupStaleSpawning(), but this method is only invoked when an agent is being spawned. If no agents are spawned for an extended period, stale entries accumulate indefinitely.

// router.ts:210-217
markSpawning(agentName: string): void {
  this.spawningAgents.set(agentName, Date.now());
  routerLog.info(`Agent marked as spawning: ${agentName}`, {
    currentSpawning: Array.from(this.spawningAgents.keys()),
  });
  // Clean up stale spawning entries
  this.cleanupStaleSpawning();  // ← Only called here
}

// router.ts:254-260
private cleanupStaleSpawning(): void {
  // ... spawning cleanup ...
  // Also clean up stale recently-disconnected user entries
  for (const [name, timestamp] of this.recentlyDisconnectedUsers) {
    if (now - timestamp > Router.RECENT_USER_TTL_MS) {
      this.recentlyDisconnectedUsers.delete(name);
      routerLog.debug(`Cleaned up stale recently-disconnected user: ${name}`);
    }
  }
}

Impact: In long-running daemon instances with infrequent agent spawning, the recentlyDisconnectedUsers map grows unbounded with stale entries. Each CLI command that connects as __cli_sender__ and disconnects adds an entry that won't be cleaned up until the next spawn operation.

Example scenario: A daemon running for hours with agents only sending/receiving messages (no spawning) will accumulate every disconnected user entry from CLI commands.

Prompt for agents
Add a periodic cleanup mechanism for recentlyDisconnectedUsers that runs independently of spawning operations. Options:

1. Create a separate periodic timer in the Router constructor that runs cleanupStaleSpawning() every 5 minutes
2. Add cleanup logic to other frequently-called methods like register() or unregister()
3. Extract the recentlyDisconnectedUsers cleanup into a separate method and call it from both cleanupStaleSpawning() and from a periodic interval

Recommended approach is option 3 - create a cleanupStaleRecentUsers() method and call it both from cleanupStaleSpawning() and set up a periodic interval (every 5 minutes) in the Router constructor that calls both cleanup methods. Remember to clear the interval in a destructor/cleanup method if one exists.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment on lines +1484 to 1488
// 3. Final fallback: onRelease callback
if (this.config.onRelease) {
await this.config.onRelease(name);
return;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Missing error handling when no release mechanism available in executeReleaseWithFallbacks

The executeReleaseWithFallbacks() method silently returns without error when no release mechanism is available (daemon not connected, no dashboard, no callback). This is inconsistent with the equivalent spawn method which throws an error.

Root Cause

At packages/wrapper/src/relay-pty-orchestrator.ts:1484-1488, if all release mechanisms fail or are unavailable, the function silently returns:

// 3. Final fallback: onRelease callback
if (this.config.onRelease) {
  await this.config.onRelease(name);
  return;
}
// ← Function ends here with implicit undefined return

Compare this to executeSpawnWithFallbacks() at line 1432 which explicitly throws:

if (this.config.onSpawn) {
  this.log(`Spawning ${name} via onSpawn callback`);
  await this.config.onSpawn(name, cli, task);
  return;
}

throw new Error(`No spawn mechanism available (client=${this.client.state}, dashboardPort=${this.config.dashboardPort}, onSpawn=${!!this.config.onSpawn})`);

Impact: When an agent attempts to release another agent but:

  • Daemon is not connected (client.state !== 'READY')
  • Dashboard API is not available (config.dashboardPort is undefined)
  • No onRelease callback is configured

The release silently fails with no indication to the calling agent. The calling code at line 1448 catches and logs the error, but since no error is thrown, it appears the release succeeded when it actually did nothing.

Actual behavior: Release command is silently ignored

Expected behavior: Throw an error indicating no release mechanism is available, matching the spawn behavior

Suggested change
// 3. Final fallback: onRelease callback
if (this.config.onRelease) {
await this.config.onRelease(name);
return;
}
// 3. Final fallback: onRelease callback
if (this.config.onRelease) {
await this.config.onRelease(name);
return;
}
throw new Error(`No release mechanism available (client=${this.client.state}, dashboardPort=${this.config.dashboardPort}, onRelease=${!!this.config.onRelease})`);
}
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

khaliqgant and others added 4 commits February 5, 2026 13:36
- Extract recentlyDisconnectedUsers cleanup into its own method and
  call it from unregister() so entries are cleaned on every disconnect,
  not only during spawn operations
- Throw error when no release mechanism is available, matching the
  existing spawn fallback behavior

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@khaliqgant khaliqgant merged commit 33317f2 into main Feb 6, 2026
32 checks passed
@khaliqgant khaliqgant deleted the sdk-spawn-refactor branch February 6, 2026 08:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant