fix(broker): pre-register Claude PTY workers so Relaycast MCP boots fast (#926)#927
Conversation
The WS spawn path skipped pre-registration for Claude on the assumption that Claude would self-register via its MCP server. In practice @relaycast/mcp performs blocking network bootstrap before responding to the MCP initialize/tools-list handshake, so Claude drops the pending relaycast server before any relaycast tool names land in deferred_tools. The worker then receives the "Relaycast MCP tools are available" reminder with no tools to call and has to fall back to inline replies (#926). Pre-register all CLIs uniformly. With RELAY_AGENT_TOKEN + RELAY_SKIP_BOOTSTRAP=1 in the relaycast MCP env, @relaycast/mcp skips bootstrap and exposes tools immediately, matching the working Codex path.
📝 WalkthroughWalkthroughThis PR removes Claude-specific bypass logic and unifies token pre-registration: when a relaycast WS spawn token is absent, both WS and JSON spawn paths perform bounded agent-token pre-registration for all CLIs. Also removes an unused CLI-name parsing import. ChangesToken Pre-registration Unification
Sequence Diagram(s)sequenceDiagram
participant RelaycastWS
participant BrokerRuntime
participant AgentRegistry
participant SpawnedAgent
RelaycastWS->>BrokerRuntime: AgentSpawnRequested (may include relaycast WS spawn token)
BrokerRuntime->>BrokerRuntime: derive worker_relay_key from WS token or nil
alt no WS spawn token
BrokerRuntime->>AgentRegistry: register_agent (bounded timeout) -> worker_relay_key
end
BrokerRuntime->>SpawnedAgent: spawn worker with worker_relay_key
SpawnedAgent->>BrokerRuntime: spawn ack / state persisted
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
crates/broker/src/runtime/relaycast_events.rs (1)
442-470: 💤 Low valueConsider extracting pre-registration logic into a shared helper.
This block duplicates the primary path (lines 241-277). Minor inconsistency: the fallback path omits the
tracing::infoon successful registration that the primary path includes.Extracting a helper would ensure consistent behavior and simplify future changes:
async fn try_preregister_agent_token( workspace_http: &WorkspaceHttp, ws_value: &Value, name: &str, cli: &str, ) -> Option<String> { if let Some(token) = relaycast_ws_spawn_token(ws_value) { return Some(token); } const REG_TIMEOUT: Duration = Duration::from_secs(3); match tokio::time::timeout(REG_TIMEOUT, workspace_http.register_agent_token(name, Some(cli))) .await { Ok(Ok(token)) => { tracing::info!(worker = %name, "pre-registered agent via broker for WS spawn"); Some(token) } Ok(Err(error)) => { tracing::warn!(worker = %name, error = %error, "WS spawn pre-registration failed; agent will self-register"); None } Err(_) => { tracing::warn!(worker = %name, "WS spawn pre-registration timed out (3s); agent will self-register"); None } } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/broker/src/runtime/relaycast_events.rs` around lines 442 - 470, Extract the duplicated pre-registration logic into a shared async helper (e.g. try_preregister_agent_token) that takes &WorkspaceHttp, &Value, &str name, &str cli and encapsulates the relaycast_ws_spawn_token check, the 3s tokio::time::timeout wrapping workspace_http.register_agent_token(...), and the match-arm handling; return Option<String>. Replace both duplicated blocks with a call to this helper (use the same identifier worker_relay_key). Ensure the helper logs tracing::info!(worker = %name, "pre-registered agent via broker for WS spawn") on Ok(Ok(token)) and produce the same tracing::warn messages as in both current paths for Ok(Err(error)) and timeout to keep behavior consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@crates/broker/src/runtime/relaycast_events.rs`:
- Around line 442-470: Extract the duplicated pre-registration logic into a
shared async helper (e.g. try_preregister_agent_token) that takes
&WorkspaceHttp, &Value, &str name, &str cli and encapsulates the
relaycast_ws_spawn_token check, the 3s tokio::time::timeout wrapping
workspace_http.register_agent_token(...), and the match-arm handling; return
Option<String>. Replace both duplicated blocks with a call to this helper (use
the same identifier worker_relay_key). Ensure the helper logs
tracing::info!(worker = %name, "pre-registered agent via broker for WS spawn")
on Ok(Ok(token)) and produce the same tracing::warn messages as in both current
paths for Ok(Err(error)) and timeout to keep behavior consistent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 8c87c7a8-e036-4506-9613-9397b847dd39
📒 Files selected for processing (2)
crates/broker/src/runtime/mod.rscrates/broker/src/runtime/relaycast_events.rs
💤 Files with no reviewable changes (1)
- crates/broker/src/runtime/mod.rs
Rewrite the relaycast pre-register comment to explain why the token must be injected at spawn time without referencing prior behavior or the originating issue. The comment now reads as a current-state explanation for any future reader.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/broker/src/runtime/relaycast_events.rs (1)
233-277:⚠️ Potential issue | 🟠 MajorAdd cleanup on spawn failure to prevent orphaned pre-registrations.
Pre-registration via
register_agent_tokenat lines 250 and 452 succeeds, but spawn failures at lines 351–358 and 545–548 do not invokeforget_agent_registration. When spawn fails after registration succeeds, the cached token remains orphaned—especially problematic for "already exists" errors, which indicate another spawn attempt may collide with or invalidate the already-running worker's relay token.Call
workspace_http.forget_agent_registration(&name)in the spawn failure branches to match the cleanup pattern used elsewhere (e.g., lines 79, 90, 140, 390).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/broker/src/runtime/relaycast_events.rs` around lines 233 - 277, The pre-registration code stores a token in worker_relay_key after calling workspace_http.register_agent_token(&name, Some(cli.as_str())); if the subsequent spawn fails the orphaned token must be cleaned up: in each spawn-failure branch that currently returns/continues after a failed spawn (the branches that handle the spawn errors following the pre-registration path), call workspace_http.forget_agent_registration(&name) before returning/continuing so you remove the cached pre-registration; ensure you only call forget_agent_registration when worker_relay_key.is_some() (i.e., when register_agent_token succeeded) and do this in the same spawn failure branches referenced near the WS and non-WS spawn code paths to mirror existing cleanup patterns.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@crates/broker/src/runtime/relaycast_events.rs`:
- Around line 233-277: The pre-registration code stores a token in
worker_relay_key after calling workspace_http.register_agent_token(&name,
Some(cli.as_str())); if the subsequent spawn fails the orphaned token must be
cleaned up: in each spawn-failure branch that currently returns/continues after
a failed spawn (the branches that handle the spawn errors following the
pre-registration path), call workspace_http.forget_agent_registration(&name)
before returning/continuing so you remove the cached pre-registration; ensure
you only call forget_agent_registration when worker_relay_key.is_some() (i.e.,
when register_agent_token succeeded) and do this in the same spawn failure
branches referenced near the WS and non-WS spawn code paths to mirror existing
cleanup patterns.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 193bc853-5acb-49bc-8c42-e48ce594180c
📒 Files selected for processing (1)
crates/broker/src/runtime/relaycast_events.rs
The delivery_retry_transient_blip_emits_failed_event_for_present_worker test is timing-sensitive on macOS runners (writes to a killed subprocess's stdin must surface as errors within a bounded retry window). It passed in Rust Tests (agent-relay-broker) (macos-latest) on the same SHA and passes locally (625/625), so this is a flake unrelated to the pre-registration change.
Summary
Fixes #926. Claude PTY workers were receiving the "Relaycast MCP tools are available" system reminder but
ToolSearchcould not find anymcp__relaycast__*tools, forcing the worker to fall back to inline replies.Root cause
The WS spawn path in
crates/broker/src/runtime/relaycast_events.rsexplicitly skipped pre-registration for Claude with the comment "Claude self-registers via its MCP server — skip blocking call". That assumption was wrong: whenRELAY_AGENT_TOKENis not provided in the relaycast MCP env,@relaycast/mcpruns a blocking bootstrap (network registration) before responding to the MCP initialize/tools-list handshake. Claude treats the relaycast server as pending until the handshake completes, then drops it when the deferred-tools snapshot lands empty — so nomcp__relaycast__*names ever become callable.A working session (e.g. Codex) hits the other branch of the same function, which pre-registers and passes both
RELAY_AGENT_TOKENandRELAY_SKIP_BOOTSTRAP=1into the MCP config. With those set,@relaycast/mcpskips the network round trip and exposes tools immediately.Fix
Remove the Claude special-case in both WS spawn entries (primary and fallback paths) so all CLIs are pre-registered uniformly. The existing 3-second timeout still bounds spawn latency on registration failure; on failure the worker falls back to self-registration exactly as it does today for non-Claude CLIs. The relaycast MCP env wiring (
snippets.rs::relaycast_server_config) already addsRELAY_SKIP_BOOTSTRAP=1wheneverRELAY_AGENT_TOKENis present, so no changes are needed downstream.Other spawn paths (
runtime/api.rs,runtime/maintenance.rs,wrap.rs) already pre-register Claude — this brings the WS spawn path in line with them.Test plan
cargo check -p agent-relay-brokerpassescargo test -p agent-relay-broker --lib— 625 passed, 0 failedmcp__relaycast__*tools inToolSearchafter spawnGenerated by Claude Code