Bug: shared RELAY_API_KEY across machines — broker exits at WebSocket subscription with "Unable to connect"
Summary
In v6.0.2, the documented "set the same `RELAY_API_KEY` on multiple machines so brokers join the same workspace" pattern (which #789 builds on as a precondition) fails with a misleading error. The HTTP registration step succeeds (the cloud accepts the key and registers the agent), but the WebSocket subscription step that follows exits with "Unable to connect. Is the computer able to access the url?", killing the broker.
This blocks every multi-machine use case, including the messaging premise that #789 assumes works.
Environment
- agent-relay: v6.0.2 (standalone binary install via `install.sh`)
- Mac: macOS arm64, agent_name `sc`, launchd-managed broker, ran `agent-relay cloud login` (token in `~/.agent-relay/cloud-auth.json`)
- Linux VPS: Ubuntu aarch64, agent_name `ubuntu`, systemd-user-managed broker, same `cloud-auth.json` copied via `scp` from the Mac

`agent-relay cloud whoami` returns identical user/org/workspace on both: `sr4001@gmail.com`, `Scot Campbell's Workspace`, `Default`
Reproduction
1. On the Mac, run `agent-relay up --no-dashboard`. The broker starts, creates a workspace, and prints `Workspace Key: rk_live_4165c2f458fc2976c0dd0ad092050afb`.
2. Verify the key is live from the VPS:
   `curl -H "Authorization: Bearer rk_live_4165c2f..." https://api.relaycast.dev/v1/channels`
   → 200 OK, channels listed (`general`, `engineering`)
3. On the VPS, set the env var and start:
   `RELAY_API_KEY=rk_live_4165c2f... agent-relay up --no-dashboard`
4. Observed: `Broker started.` is logged, then the broker exits within ~1s with status code 1 and stderr `Failed to start broker: Unable to connect. Is the computer able to access the url?`
5. Cross-check on the cloud during step 4 (via `api.relaycast.dev/v1/agents`): the `ubuntu` agent does appear in the list, status `offline`, with a `released_at` timestamp matching the broker exit.
   - That is, the agent-registration HTTP call succeeded; the failure is downstream.
Why I think this is the WS-subscription step
- VPS network is fine (HTTP 200 from `api.relaycast.dev/v1/channels`)
- Bearer auth works (the same key returns 200 from the VPS via curl)
- The broker logs `Broker started.` (after `connect_relay` completes) before the error appears
- The "Unable to connect" message is identical to what surfaces from `tokio_tungstenite` failures elsewhere in the binary's strings (`tokio_tungstenite::tls::encryption::rustls`)
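One way to confirm that the failure sits at the WebSocket layer, not HTTP, is to perform a raw RFC 6455 upgrade from the VPS and see whether the server refuses the TCP/TLS connection or answers and rejects the upgrade (e.g. 401/403). Below is a minimal stdlib Python sketch of such a probe. The WS endpoint is not known from this report, so `WS_HOST`/`WS_PATH` are hypothetical placeholders, and sending the workspace key as a Bearer header on the upgrade is also an assumption.

```python
import base64
import hashlib
import os
import socket
import ssl

# RFC 6455, section 1.3: fixed GUID the server appends to the client key.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

# HYPOTHETICAL placeholders: the real WS endpoint is not documented in this report.
WS_HOST = "api.relaycast.dev"
WS_PATH = "/REPLACE-WITH-WS-PATH"


def expected_accept(client_key: str) -> str:
    """Sec-WebSocket-Accept value a compliant server must return for client_key."""
    digest = hashlib.sha1((client_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_upgrade_request(host: str, path: str, client_key: str, token: str) -> bytes:
    """HTTP/1.1 upgrade request. Bearer auth on the upgrade is an ASSUMPTION."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {client_key}",
        "Sec-WebSocket-Version: 13",
        f"Authorization: Bearer {token}",
        "",  # these two empties produce the terminating blank line
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def classify_status_line(status_line: str) -> str:
    """Map the server's first response line to a coarse outcome."""
    parts = status_line.split(" ", 2)
    if len(parts) < 2 or not parts[1].isdigit():
        return "malformed response"
    code = int(parts[1])
    if code == 101:
        return "upgraded"
    return f"upgrade rejected with HTTP {code}"


def probe(token: str, host: str = WS_HOST, path: str = WS_PATH, timeout: float = 5.0) -> str:
    """Open TLS to host:443, send one upgrade request, and report which layer failed."""
    client_key = base64.b64encode(os.urandom(16)).decode("ascii")
    try:
        raw = socket.create_connection((host, 443), timeout=timeout)
    except OSError as exc:
        return f"TCP connect failed: {exc}"
    ctx = ssl.create_default_context()
    try:
        # wrap_socket takes ownership of raw and closes it on handshake failure.
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(build_upgrade_request(host, path, client_key, token))
            head = tls.recv(4096).decode("latin-1")
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        return f"TLS or socket I/O failed: {exc}"
    outcome = classify_status_line(head.split("\r\n", 1)[0])
    if outcome == "upgraded" and expected_accept(client_key) not in head:
        return "upgraded, but Sec-WebSocket-Accept mismatch"
    return outcome
```

After filling in the real WS path, calling `probe(os.environ["RELAY_API_KEY"])` on the VPS would return one of these outcomes. An "upgrade rejected with HTTP 401/403" result would support the auth-context hypothesis below. A TCP/TLS failure would point back at networking.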
What works (control)
Without RELAY_API_KEY, both brokers run fine (each auto-creates its own workspace). Each can talk to its own agents on its own host. The bug is only triggered when a broker is told to join a workspace it didn't create.
Hypothesis on root cause
My working hypothesis: the WebSocket subscription requires an auth context tied to the creator of the workspace. Perhaps the cloud token used at workspace creation gets bound to subsequent WS connections. A broker that joins via `RELAY_API_KEY` only has its own cloud token, which the WS endpoint won't accept for that workspace's channel.
If that is correct, the fix is either (a) accept any cloud-auth token from the same org for WS auth on any workspace owned by that org, or (b) provide a `workspace.export` / `workspace.invite` flow that issues a transferable, WS-eligible credential.
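Option (a) amounts to widening the WebSocket authorization check from "same credential that created the workspace" to "same owning org". The Python sketch below makes that difference concrete. `Credential`, `Workspace`, and both check functions are a hypothetical model of the suspected behavior, not relaycast's actual types or server code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    """HYPOTHETICAL: the credential a broker presents on the WS connection."""
    credential_id: str
    org_id: str


@dataclass(frozen=True)
class Workspace:
    """HYPOTHETICAL: the workspace record the WS endpoint checks against."""
    workspace_id: str
    owner_org_id: str
    creator_credential_id: str


def ws_auth_suspected(cred: Credential, ws: Workspace) -> bool:
    """Suspected v6.0.2 behavior: only the creating credential may subscribe."""
    return cred.credential_id == ws.creator_credential_id


def ws_auth_option_a(cred: Credential, ws: Workspace) -> bool:
    """Option (a): any credential from the workspace's owning org may subscribe."""
    return cred.org_id == ws.owner_org_id
```

Under this model, the Mac broker passes both checks. The VPS broker, which has a different credential but the same org, fails the suspected check and passes option (a). A credential from another org fails both.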
Why this matters for #789
#789 ("remote spawn") explicitly states: "multiple brokers sharing a workspace key can exchange messages, DMs, and channel posts in real-time." That assumption is currently false in v6.0.2 — sharing a workspace key crashes the broker. Spawning across machines presupposes the messaging fabric works first; this bug must be fixed (or worked around) before #789 becomes meaningful.
Asks
- Confirm whether multi-machine `RELAY_API_KEY` sharing is an intended, supported configuration in v6 or a regression from v5.
- If supported: surface the actual WS error (the current "Unable to connect" is misleading, since HTTP auth is fine).
- If not yet supported: document this clearly in the README and add a `workspace export` / `workspace join <key>` flow that produces the right credential bundle.
- Consider whether `RELAY_WORKSPACES_JSON` / `RELAY_DEFAULT_WORKSPACE` (visible in the binary's strings, undocumented) is the intended path here.
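For the "surface the actual WS error" ask, one common pattern is to tag each startup phase and keep the underlying error as the cause. That way the message names the step that failed, the URL it was talking to, and the original error. The Python sketch below only illustrates the pattern and is not the broker's code. `BrokerStartError`, `run_phase`, and the phase labels `http-register` / `ws-subscribe` are hypothetical names for the two steps described above.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class BrokerStartError(Exception):
    """HYPOTHETICAL: startup failure that records which phase failed and where."""

    def __init__(self, phase: str, url: str, cause: BaseException):
        self.phase = phase
        self.url = url
        super().__init__(f"{phase} failed for {url}: {type(cause).__name__}: {cause}")


def run_phase(phase: str, url: str, step: Callable[[], T]) -> T:
    """Run one startup step and re-raise any failure with phase/URL context.

    `raise ... from exc` keeps the original exception as __cause__, so the
    real WS error is still available to logs and tracebacks.
    """
    try:
        return step()
    except Exception as exc:
        raise BrokerStartError(phase, url, exc) from exc
```

With this shape, the VPS failure would read something like `ws-subscribe failed for <ws url>: <actual error>` instead of a generic "Unable to connect", which would make it obvious that `http-register` had already succeeded.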
Logs / evidence available
Happy to attach:
- `journalctl --user -u agent-relay` from the VPS during the failing window
- `~/Library/Logs/agent-relay.{out,err}.log` from the Mac (success case)
- `~/.agent-relay/identity-debug.txt` from both machines, showing distinct `agent_id` / `default_workspace` values that converge when `RELAY_API_KEY` is shared (proving HTTP-side resolution works)
- `api.relaycast.dev/v1/agents` listing from before/after the VPS attempt