Skip to content

fix(broker): pre-register Claude PTY workers so Relaycast MCP boots fast (#926)#927

Merged
willwashburn merged 3 commits into
mainfrom
claude/fix-issue-926-LMTJo
May 20, 2026
Merged

fix(broker): pre-register Claude PTY workers so Relaycast MCP boots fast (#926)#927
willwashburn merged 3 commits into
mainfrom
claude/fix-issue-926-LMTJo

Conversation

@willwashburn
Copy link
Copy Markdown
Member

Summary

Fixes #926. Claude PTY workers were receiving the "Relaycast MCP tools are available" system reminder but ToolSearch could not find any mcp__relaycast__* tools, forcing the worker to fall back to inline replies.

Root cause

The WS spawn path in crates/broker/src/runtime/relaycast_events.rs explicitly skipped pre-registration for Claude with the comment "Claude self-registers via its MCP server — skip blocking call". That assumption was wrong: when RELAY_AGENT_TOKEN is not provided in the relaycast MCP env, @relaycast/mcp runs a blocking bootstrap (network registration) before responding to the MCP initialize/tools-list handshake. Claude treats the relaycast server as pending until the handshake completes, then drops it when the deferred-tools snapshot lands empty — so no mcp__relaycast__* names ever become callable.

A working session (e.g. Codex) hits the other branch of the same function, which pre-registers and passes both RELAY_AGENT_TOKEN and RELAY_SKIP_BOOTSTRAP=1 into the MCP config. With those set, @relaycast/mcp skips the network round trip and exposes tools immediately.

Fix

Remove the Claude special-case in both WS spawn entries (primary and fallback paths) so all CLIs are pre-registered uniformly. The existing 3-second timeout still bounds spawn latency on registration failure; on failure the worker falls back to self-registration exactly as it does today for non-Claude CLIs. The relaycast MCP env wiring (snippets.rs::relaycast_server_config) already adds RELAY_SKIP_BOOTSTRAP=1 whenever RELAY_AGENT_TOKEN is present, so no changes are needed downstream.

Other spawn paths (runtime/api.rs, runtime/maintenance.rs, wrap.rs) already pre-register Claude — this brings the WS spawn path in line with them.

Test plan

  • cargo check -p agent-relay-broker passes
  • cargo test -p agent-relay-broker --lib — 625 passed, 0 failed
  • Verify on a real Pear/broker launch that a Claude PTY worker sees mcp__relaycast__* tools in ToolSearch after spawn

Generated by Claude Code

The WS spawn path skipped pre-registration for Claude on the assumption
that Claude would self-register via its MCP server. In practice
@relaycast/mcp performs blocking network bootstrap before responding to
the MCP initialize/tools-list handshake, so Claude drops the pending
relaycast server before any relaycast tool names land in deferred_tools.
The worker then receives the "Relaycast MCP tools are available" reminder
with no tools to call and has to fall back to inline replies (#926).

Pre-register all CLIs uniformly. With RELAY_AGENT_TOKEN +
RELAY_SKIP_BOOTSTRAP=1 in the relaycast MCP env, @relaycast/mcp skips
bootstrap and exposes tools immediately, matching the working Codex path.
@willwashburn willwashburn requested a review from khaliqgant as a code owner May 20, 2026 04:11
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 20, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR removes Claude-specific bypass logic and unifies token pre-registration: when a relaycast WS spawn token is absent, both WS and JSON spawn paths perform bounded agent-token pre-registration for all CLIs. Also removes an unused CLI-name parsing import.

Changes

Token Pre-registration Unification

Layer / File(s) Summary
Remove Claude-specific bypass and unify token pre-registration
crates/broker/src/runtime/mod.rs, crates/broker/src/runtime/relaycast_events.rs
Deleted unused normalize_cli_name / parse_cli_command import. Updated primary (WsEvent::AgentSpawnRequested) and JSON fallback spawn paths in BrokerRuntime::handle_relaycast_message to derive worker_relay_key from the relaycast WS token when present, otherwise perform a bounded agent token pre-registration for all CLIs (removed CLI-name-based Claude skip).

Sequence Diagram(s)

sequenceDiagram
  participant RelaycastWS
  participant BrokerRuntime
  participant AgentRegistry
  participant SpawnedAgent
  RelaycastWS->>BrokerRuntime: AgentSpawnRequested (may include relaycast WS spawn token)
  BrokerRuntime->>BrokerRuntime: derive worker_relay_key from WS token or nil
  alt no WS spawn token
    BrokerRuntime->>AgentRegistry: register_agent (bounded timeout) -> worker_relay_key
  end
  BrokerRuntime->>SpawnedAgent: spawn worker with worker_relay_key
  SpawnedAgent->>BrokerRuntime: spawn ack / state persisted
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • khaliqgant

Poem

🐰 In the burrow the tokens once split,
Now every CLI gets its bit.
No Claude shortcut, no lonely wait,
All pre-register, steady the gate.
Spawn hops on—unified and lit!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title directly matches the primary change: pre-registering Claude PTY workers for Relaycast MCP, aligning with issue #926.
Description check ✅ Passed The description includes a clear summary of the fix, root cause analysis, and test plan with completed checks matching the required template sections.
Linked Issues check ✅ Passed The code changes remove Claude-specific skip logic in WS spawn paths to ensure pre-registration, directly addressing all objectives in issue #926.
Out of Scope Changes check ✅ Passed All changes (removing unused import and updating relaycast spawn logic) are directly scoped to resolving the Claude PTY Relaycast MCP issue.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/fix-issue-926-LMTJo

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
crates/broker/src/runtime/relaycast_events.rs (1)

442-470: 💤 Low value

Consider extracting pre-registration logic into a shared helper.

This block duplicates the primary path (lines 241-277). Minor inconsistency: the fallback path omits the tracing::info on successful registration that the primary path includes.

Extracting a helper would ensure consistent behavior and simplify future changes:

async fn try_preregister_agent_token(
    workspace_http: &WorkspaceHttp,
    ws_value: &Value,
    name: &str,
    cli: &str,
) -> Option<String> {
    if let Some(token) = relaycast_ws_spawn_token(ws_value) {
        return Some(token);
    }
    const REG_TIMEOUT: Duration = Duration::from_secs(3);
    match tokio::time::timeout(REG_TIMEOUT, workspace_http.register_agent_token(name, Some(cli)))
        .await
    {
        Ok(Ok(token)) => {
            tracing::info!(worker = %name, "pre-registered agent via broker for WS spawn");
            Some(token)
        }
        Ok(Err(error)) => {
            tracing::warn!(worker = %name, error = %error, "WS spawn pre-registration failed; agent will self-register");
            None
        }
        Err(_) => {
            tracing::warn!(worker = %name, "WS spawn pre-registration timed out (3s); agent will self-register");
            None
        }
    }
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/broker/src/runtime/relaycast_events.rs` around lines 442 - 470,
Extract the duplicated pre-registration logic into a shared async helper (e.g.
try_preregister_agent_token) that takes &WorkspaceHttp, &Value, &str name, &str
cli and encapsulates the relaycast_ws_spawn_token check, the 3s
tokio::time::timeout wrapping workspace_http.register_agent_token(...), and the
match-arm handling; return Option<String>. Replace both duplicated blocks with a
call to this helper (use the same identifier worker_relay_key). Ensure the
helper logs tracing::info!(worker = %name, "pre-registered agent via broker for
WS spawn") on Ok(Ok(token)) and produce the same tracing::warn messages as in
both current paths for Ok(Err(error)) and timeout to keep behavior consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@crates/broker/src/runtime/relaycast_events.rs`:
- Around line 442-470: Extract the duplicated pre-registration logic into a
shared async helper (e.g. try_preregister_agent_token) that takes
&WorkspaceHttp, &Value, &str name, &str cli and encapsulates the
relaycast_ws_spawn_token check, the 3s tokio::time::timeout wrapping
workspace_http.register_agent_token(...), and the match-arm handling; return
Option<String>. Replace both duplicated blocks with a call to this helper (use
the same identifier worker_relay_key). Ensure the helper logs
tracing::info!(worker = %name, "pre-registered agent via broker for WS spawn")
on Ok(Ok(token)) and produce the same tracing::warn messages as in both current
paths for Ok(Err(error)) and timeout to keep behavior consistent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 8c87c7a8-e036-4506-9613-9397b847dd39

📥 Commits

Reviewing files that changed from the base of the PR and between 9c1849b and 70920d6.

📒 Files selected for processing (2)
  • crates/broker/src/runtime/mod.rs
  • crates/broker/src/runtime/relaycast_events.rs
💤 Files with no reviewable changes (1)
  • crates/broker/src/runtime/mod.rs

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.

Open in Devin Review

Rewrite the relaycast pre-register comment to explain why the token must
be injected at spawn time without referencing prior behavior or the
originating issue. The comment now reads as a current-state explanation
for any future reader.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/broker/src/runtime/relaycast_events.rs (1)

233-277: ⚠️ Potential issue | 🟠 Major

Add cleanup on spawn failure to prevent orphaned pre-registrations.

Pre-registration via register_agent_token at lines 250 and 452 succeeds, but spawn failures at lines 351–358 and 545–548 do not invoke forget_agent_registration. When spawn fails after registration succeeds, the cached token remains orphaned—especially problematic for "already exists" errors, which indicate another spawn attempt may collide with or invalidate the already-running worker's relay token.

Call workspace_http.forget_agent_registration(&name) in the spawn failure branches to match the cleanup pattern used elsewhere (e.g., lines 79, 90, 140, 390).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/broker/src/runtime/relaycast_events.rs` around lines 233 - 277, The
pre-registration code stores a token in worker_relay_key after calling
workspace_http.register_agent_token(&name, Some(cli.as_str())); if the
subsequent spawn fails the orphaned token must be cleaned up: in each
spawn-failure branch that currently returns/continues after a failed spawn (the
branches that handle the spawn errors following the pre-registration path), call
workspace_http.forget_agent_registration(&name) before returning/continuing so
you remove the cached pre-registration; ensure you only call
forget_agent_registration when worker_relay_key.is_some() (i.e., when
register_agent_token succeeded) and do this in the same spawn failure branches
referenced near the WS and non-WS spawn code paths to mirror existing cleanup
patterns.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@crates/broker/src/runtime/relaycast_events.rs`:
- Around line 233-277: The pre-registration code stores a token in
worker_relay_key after calling workspace_http.register_agent_token(&name,
Some(cli.as_str())); if the subsequent spawn fails the orphaned token must be
cleaned up: in each spawn-failure branch that currently returns/continues after
a failed spawn (the branches that handle the spawn errors following the
pre-registration path), call workspace_http.forget_agent_registration(&name)
before returning/continuing so you remove the cached pre-registration; ensure
you only call forget_agent_registration when worker_relay_key.is_some() (i.e.,
when register_agent_token succeeded) and do this in the same spawn failure
branches referenced near the WS and non-WS spawn code paths to mirror existing
cleanup patterns.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 193bc853-5acb-49bc-8c42-e48ce594180c

📥 Commits

Reviewing files that changed from the base of the PR and between 70920d6 and a30d941.

📒 Files selected for processing (1)
  • crates/broker/src/runtime/relaycast_events.rs

The delivery_retry_transient_blip_emits_failed_event_for_present_worker
test is timing-sensitive on macOS runners (writes to a killed
subprocess's stdin must surface as errors within a bounded retry
window). It passed in Rust Tests (agent-relay-broker) (macos-latest)
on the same SHA and passes locally (625/625), so this is a flake
unrelated to the pre-registration change.
@willwashburn willwashburn merged commit cf4cdc5 into main May 20, 2026
39 checks passed
@willwashburn willwashburn deleted the claude/fix-issue-926-LMTJo branch May 20, 2026 17:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Claude PTY spawn can advertise Relaycast MCP after tools fail to load

2 participants