
fix: MCP tools unavailable for agents spawned via agent_add #591

Merged
khaliqgant merged 21 commits into main from fix/mcp-agent-add-spawn
Mar 19, 2026
Conversation

@khaliqgant (Member) commented Mar 19, 2026

Summary

  • Claude: Removed --strict-mcp-config from the --mcp-config injection. The flag blocked .mcp.json loading, preventing MCP tools from being discovered. --mcp-config is now additive — it passes only the relaycast config, while Claude loads user MCP servers from .mcp.json independently.
  • All CLIs (claude, codex, gemini, etc.): Added broker-side agent pre-registration to the WS AgentSpawnRequested handler. The AgentSpawnRequestedPayload struct doesn't include a token field, so relaycast_ws_spawn_token() always returned None. Without a pre-registered token, the MCP server failed to authenticate at startup. Now the broker calls register_agent_token() before spawning (matching the SDK spawn_agent path that already worked).

Root Cause

Two independent issues combined to break MCP for agents spawned via agent_add:

  1. For Claude: --strict-mcp-config told Claude "only use this inline config, ignore .mcp.json". If the inline MCP server failed to start for any reason, no MCP tools were available at all.

  2. For all CLIs: The WS spawn path (used by agent_add) never pre-registered agents with the Relaycast API. The SDK spawn_agent path called http.register_agent_token() before spawning, giving the MCP server a valid token. The WS path skipped this, expecting the WS event to carry a token — but AgentSpawnRequestedPayload has no token field.
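The fixed token-resolution order can be sketched in a few lines. This is a minimal Python illustration with hypothetical names (`FakeHttp`, `resolve_spawn_token`); the actual broker code is Rust:

```python
# Illustrative sketch of the fixed WS spawn path (hypothetical names;
# the real broker is Rust). Key change: register the agent with the
# Relaycast API *before* spawning, instead of expecting the WS payload
# to carry a token it never contains.

class FakeHttp:
    def register_agent_token(self, agent_name):
        # Stands in for the broker's register_agent_token() HTTP call.
        return f"token-for-{agent_name}"

def resolve_spawn_token(payload, http):
    # AgentSpawnRequestedPayload has no token field, so this lookup
    # always came back empty on the WS path.
    token = payload.get("token")
    if token is None:
        # Fix: pre-register broker-side, matching the SDK spawn_agent path.
        token = http.register_agent_token(payload["agent_name"])
    return token

print(resolve_spawn_token({"agent_name": "codex-1"}, FakeHttp()))
# token-for-codex-1
```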

Test plan

  • All 220 lib + 8 e2e Rust tests pass (cargo test)
  • New integration test agent-spawns-agent.test.ts exercises the exact agent_add flow
  • Claude test: agent spawned via Relaycast API successfully uses MCP tools to send DM
  • Codex test: agent spawned via Relaycast API successfully uses MCP tools (1 relay_inbound)
  • Gemini test (requires gemini CLI on PATH)
  • Existing mcp-injection tests still pass (SDK spawnPty path unaffected)

🤖 Generated with Claude Code



khaliqgant and others added 19 commits March 18, 2026 13:22
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… types

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…colors

- Replace plain console.log progress in cli.ts with listr2 task list
- Per-step spinners show owner, retry, nudge, force-release, and review events
- chalk colors: cyan for timestamps, green/red/yellow for status, dim for metadata
- logRunSummary() and broker stderr use chalk for visual hierarchy

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…flows

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…steps

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…dering

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… path

- cli.ts: installOutputFilter() in runWithListr so YAML workflows also
  suppress [broker]/[workflow HH:MM] noise during listr rendering
- cli.ts: done.catch()/workflowDone.catch() guards for fast-failing steps
- listr-renderer.ts: workflowDone.catch() guard for instant run:failed
- listr-renderer.ts: add renderer.unmount() to JSDoc example

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Output filter: use regex .test() instead of startsWith/^ anchor so
  chalk-colored [broker] and [workflow HH:MM] lines are properly suppressed
- Resume mode: add event listener for step progress reporting
- GHA workflow: fix grep exit code with || true, use env var instead of
  raw ${{ }} interpolation (script injection), use npx tsx instead of
  non-existent 'run' subcommand, only validate/dry-run YAML files
- Workflow: fix incorrect CJS assumption (SDK is ESM), add final
  type-check gate after review step
- Add chalk and listr2 to root package.json (Build & Validate requires them)
- Dynamic import listr2 so SDK loads on Node 18 (styleText not available)
- Show steps skipped without prior start event in listr output
- Remove unused ListrType import
* fix: detect claude CLI with inline args for MCP injection

* fix: extract executable from cli string in gemini/droid mcp setup

When cli contains inline args (e.g. 'gemini --model foo'),
Command::new(cli) fails because it treats the entire string as
an executable path. Now extract just the binary via shlex::split
before passing to Command::new and manual_cmd.
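The extraction described above maps directly onto Python's stdlib shlex, which mirrors what the Rust fix does via the shlex crate; a minimal illustration:

```python
import shlex

# When the configured CLI string contains inline args, the whole string
# must not be handed to the process spawner as an executable path.
cli = "gemini --model foo"

# Extract just the binary, as the Rust fix does via shlex::split.
parts = shlex.split(cli)
executable, args = parts[0], parts[1:]

print(executable)  # gemini
print(args)        # ['--model', 'foo']
```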
* bump versions

* fix: refresh lockfile for relaycast sdk 1.0.0 bump

* fix: bump gemini relay extension to relaycast mcp 1.0.0
…colors

- Replace plain console.log progress in cli.ts with listr2 task list
- Per-step spinners show owner, retry, nudge, force-release, and review events
- chalk colors: cyan for timestamps, green/red/yellow for status, dim for metadata
- logRunSummary() and broker stderr use chalk for visual hierarchy

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two root causes prevented agents spawned via the Relaycast API
(agent_add MCP tool) from loading MCP tools:

1. Claude: --strict-mcp-config blocked .mcp.json loading. Removed it
   so --mcp-config is additive — only passes relaycast config while
   Claude loads user MCP servers from .mcp.json independently.

2. All CLIs: The WS AgentSpawnRequested handler had no agent
   pre-registration. The AgentSpawnRequestedPayload struct doesn't
   include a token field, so relaycast_ws_spawn_token() always
   returned None. Added broker-side register_agent_token() calls
   (matching the SDK spawn_agent path) to both WS spawn handlers.

Tests:
- New integration test (agent-spawns-agent.test.ts) exercises the
  exact agent_add flow for claude, codex, and gemini
- Updated unit tests and e2e tests for new --mcp-config behavior
- All 220 lib + 8 e2e Rust tests pass

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

khaliqgant and others added 2 commits March 19, 2026 15:28
Resolve package manifest/lockfile conflicts, fix workflow validation
flag order, and queue listr renderer tasks until lazy init completes.
devin-ai-integration bot (Contributor) left a comment

Devin Review found 1 new potential issue.

View 6 additional findings in Devin Review.


return;
}
}
if (/\[broker\]/.test(str) || /\[workflow\s+\d{2}:\d{2}\]/.test(str)) return;

🟡 Output filter regex cannot match chalk-colored [workflow HH:MM] lines due to interleaved ANSI escape codes

The installOutputFilter() function in both cli.ts:77 and listr-renderer.ts:18 uses the regex /\[workflow\s+\d{2}:\d{2}\]/ to suppress noisy [workflow HH:MM] timing lines while the listr2 renderer owns the terminal. However, runner.ts:994 now formats these lines using three separate chalk.dim.cyan() calls:

console.log(`${chalk.dim.cyan('[workflow')} ${chalk.dim.cyan(ts)}${chalk.dim.cyan(']')} ${msg}`);

Each chalk call wraps its text in ANSI open/close escape sequences (e.g., \x1b[2m\x1b[36m[workflow\x1b[39m\x1b[22m). The regex expects [workflow to be immediately followed by \s+ (whitespace), but the actual string has ANSI reset codes (\x1b[39m\x1b[22m) between [workflow and the space character. Since \x1b is not a whitespace character, the \s+ quantifier fails and the regex never matches. As a result, all workflow timing lines leak through the filter into the listr2 output, creating cluttered/broken progress display — directly undermining the PR's goal of polished CLI output.
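The mismatch is easy to reproduce. A minimal Python illustration, assuming escape sequences of the shape chalk emits (the real filter is TypeScript):

```python
import re

# Simulated output of chalk.dim.cyan('[workflow') + ' ' + chalk.dim.cyan(ts)
# + chalk.dim.cyan(']'): reset codes sit between '[workflow' and the space,
# so the \s+ in the filter regex never matches.
colored = (
    "\x1b[2m\x1b[36m[workflow\x1b[39m\x1b[22m "
    "\x1b[2m\x1b[36m00:05\x1b[39m\x1b[22m"
    "\x1b[2m\x1b[36m]\x1b[39m\x1b[22m step done"
)

pattern = re.compile(r"\[workflow\s+\d{2}:\d{2}\]")
print(bool(pattern.search(colored)))  # False: ANSI codes break the match

# Option B from the review: strip ANSI codes first, then test.
plain = re.sub(r"\x1b\[[0-9;]*m", "", colored)
print(bool(pattern.search(plain)))   # True
```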

Prompt for agents
Fix the output filter in both packages/sdk/src/workflows/cli.ts (line 77) and packages/sdk/src/workflows/listr-renderer.ts (line 18) to strip ANSI escape codes before testing the regex, OR change runner.ts line 994 to wrap the entire `[workflow HH:MM]` prefix in a single chalk call so the literal text remains contiguous.

Option A (preferred — fix runner.ts:994):
Change from:
  console.log(`${chalk.dim.cyan('[workflow')} ${chalk.dim.cyan(ts)}${chalk.dim.cyan(']')} ${msg}`);
To:
  console.log(`${chalk.dim.cyan(`[workflow ${ts}]`)} ${msg}`);

This keeps `[workflow 00:05]` as contiguous text inside a single chalk call, so the existing regex matches.

Option B (fix the filters in cli.ts:77 and listr-renderer.ts:18):
Strip ANSI codes from `str` before testing:
  const plain = str.replace(/\x1b\[[0-9;]*m/g, '');
  if (/\[broker\]/.test(plain) || /\[workflow\s+\d{2}:\d{2}\]/.test(plain)) return;


@khaliqgant khaliqgant merged commit 61a2ea4 into main Mar 19, 2026
1 check passed
@khaliqgant khaliqgant deleted the fix/mcp-agent-add-spawn branch March 19, 2026 15:13
khaliqgant added a commit that referenced this pull request Mar 20, 2026
PR #591 added a synchronous register_agent_token() HTTP call with a 15s
timeout in the WS event loop before spawning agents. This blocked the
event loop and delayed Codex agent spawns by up to 15s (on top of the
existing 25s boot marker timeout), causing apparent spawn failures.

Reduce the timeout to 3s so the spawn proceeds quickly. On timeout or
failure, the agent self-registers via its MCP server (pre-#591 behavior).

Also adds ~/.local/bin, ~/.opencode/bin, ~/.claude/local to the fallback
PATH in pty.rs so CLIs installed in user-local directories are found.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
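The bounded registration amounts to a best-effort call with a short timeout. A minimal Python sketch with hypothetical names (`preregister_with_timeout`, `flaky_register`); the real broker is async Rust:

```python
# Illustrative sketch of the bounded pre-registration (hypothetical
# names; the real broker is async Rust). Registration is best-effort:
# on timeout or error the spawn proceeds and the agent self-registers
# via its MCP server, the pre-#591 behavior.

REGISTRATION_TIMEOUT_S = 3  # was 15, which stalled the WS event loop

def preregister_with_timeout(register, agent_name,
                             timeout_s=REGISTRATION_TIMEOUT_S):
    try:
        return register(agent_name, timeout=timeout_s)
    except Exception:
        # Fall back to self-registration at agent startup.
        return None

def flaky_register(agent_name, timeout):
    raise TimeoutError("API unreachable")

print(preregister_with_timeout(flaky_register, "codex-1"))  # None
```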
khaliqgant added a commit that referenced this pull request Mar 20, 2026
* fix: reduce WS spawn pre-registration timeout from 15s to 3s

PR #591 added a synchronous register_agent_token() HTTP call with a 15s
timeout in the WS event loop before spawning agents. This blocked the
event loop and delayed Codex agent spawns by up to 15s (on top of the
existing 25s boot marker timeout), causing apparent spawn failures.

Reduce the timeout to 3s so the spawn proceeds quickly. On timeout or
failure, the agent self-registers via its MCP server (pre-#591 behavior).

Also adds ~/.local/bin, ~/.opencode/bin, ~/.claude/local to the fallback
PATH in pty.rs so CLIs installed in user-local directories are found.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: skip pre-registration for Claude agents (self-registers via MCP)

Claude bakes the API key into --mcp-config JSON and self-registers
reliably, so the blocking HTTP registration call is unnecessary.
Non-Claude CLIs still get a 3s registration attempt since they need
the token injected into their CLI args at spawn time.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address Devin review — CLI arg parsing and dedup-after-spawn

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: seed dedup before spawn with cleanup on failure

Issue 1: Keep dedup seeding before spawn (so WS echoes during spawn are
deduplicated) but remove the dedup entry if spawn fails, preventing
failed spawns from blocking retries for the 5-minute dedup window.
Adds DedupCache::remove() and remove_local_spawn_control_dedup().

Issue 2: Already fixed in prior commit (parse_cli_command before
normalize_cli_name for is_claude check).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: preserve dedup entries when spawn fails with already-exists

When a second spawn request for an already-running agent fails with
"already exists", we must not remove the dedup entry from the first
successful spawn. Doing so would allow WebSocket echoes through.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
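The seeding-and-cleanup rule from the last two commits can be sketched with a plain set standing in for DedupCache. Hypothetical names (`spawn_with_dedup`); the real change is Rust:

```python
# Sketch of the dedup-seeding logic (hypothetical names; the real code
# adds DedupCache::remove() in Rust). Seed before spawn so WS echoes
# arriving mid-spawn are deduplicated; clean up on failure so a failed
# spawn does not block retries for the dedup window -- unless the
# failure is "already exists", where the entry belongs to the first,
# successful spawn and must survive.

def spawn_with_dedup(dedup, key, spawn):
    dedup.add(key)  # seed first: WS echoes during spawn are dedup'd
    try:
        return spawn()
    except Exception as e:
        if "already exists" not in str(e):
            dedup.discard(key)  # allow retry after a genuine failure
        raise

def failing_spawn():
    raise RuntimeError("boom")

dedup = set()
try:
    spawn_with_dedup(dedup, "codex-1", failing_spawn)
except RuntimeError:
    pass
print("codex-1" in dedup)  # False: entry removed on genuine failure
```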

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
