fix(cli): stop cloud connect hangs and re-auth loops #743
The cloud connect flow appeared to hang because provider CLIs (claude v2.1.19, codex) enter alt-screen immediately on first run: users saw no feedback while the remote CLI was booting, Ctrl+C'd, and the torn-down alt buffer snapped back to pre-launch text. Real failures (missing binary, sandbox image drift) also produced identical silent output.
Adds a dim "Waiting for provider CLI to launch…" hint cleared on first byte, a zero-byte close diagnostic with an AGENT_RELAY_DEBUG_SSH=1 breadcrumb, and env-gated debug instrumentation for shell-request/open/write/first-byte/close events. formatShellInvocation JSDoc now documents why shell() is used instead of exec() (Daytona's sshd strips login-shell env on exec channels, losing the nvm PATH). Tests rewritten with typed fake SSH2 client covering handler-order regression, payload format, success pattern gating, clean-exit false-positive, and the zero-byte diagnostic path.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
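The hint and zero-byte diagnostic described in this commit can be pictured with a minimal sketch. It is not the code from this PR: `ByteSource`, `Output`, `attachLaunchFeedback`, and the wording of the diagnostic message are hypothetical stand-ins, and the real implementation works against an SSH2 channel rather than these simplified interfaces.

```typescript
// Hypothetical minimal shapes standing in for the SSH channel and the terminal.
interface ByteSource {
  onData(cb: (chunk: string) => void): void;
  onClose(cb: () => void): void;
}

interface Output {
  write(text: string): void;
}

const HINT = "Waiting for provider CLI to launch…";
const CLEAR_LINE = "\r\x1b[2K"; // carriage return + ANSI "erase entire line"

function attachLaunchFeedback(source: ByteSource, out: Output): void {
  let sawFirstByte = false;

  // Dim text (SGR 2) so the user sees something while the remote CLI boots.
  out.write(`\x1b[2m${HINT}\x1b[0m`);

  source.onData((chunk) => {
    if (!sawFirstByte) {
      sawFirstByte = true;
      out.write(CLEAR_LINE); // remove the hint before the CLI takes over the screen
    }
    out.write(chunk);
  });

  source.onClose(() => {
    if (!sawFirstByte) {
      // Zero-byte close: the channel ended without any output at all.
      // The message text below is an assumption, not the PR's actual wording.
      out.write(CLEAR_LINE);
      out.write(
        "Remote session closed before sending any output. " +
          "Re-run with AGENT_RELAY_DEBUG_SSH=1 for details.\n",
      );
    }
  });
}
```

Tracking a single `sawFirstByte` flag lets the same state drive both behaviors: it clears the hint exactly once and distinguishes a silent failure from a session that produced output before closing.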
Previously, ensureAuthenticated forced a fresh browser login on every cloud command whenever the stored apiUrl differed from the caller's apiUrl (typically defaultApiUrl()). This fired constantly for users whose stored host didn't match the current default, e.g. someone linked against origin.agentrelay.cloud but defaultApiUrl() returned agentrelay.com, or CLOUD_API_URL env was set during link and unset afterward.
Now stored auth is authoritative on its own host. Host mismatch alone no longer triggers re-login; only --force re-links to a different host. Refresh failures now fall back to loginWithBrowser(stored.apiUrl) so the recovery path stays on the user's actual host instead of redirecting to the default.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
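A rough sketch of the before/after login rule may help here. The helper names (`needsLoginBefore`, `needsLoginAfter`, `recoveryHost`) and the `StoredAuth` shape are hypothetical; the real `ensureAuthenticated` is not reproduced, and treating `--force` as a boolean that also applied under the old rule is an assumption.

```typescript
// Hypothetical shape of the persisted credentials.
interface StoredAuth {
  apiUrl: string;
  accessToken: string;
}

// Old rule: any mismatch between stored and requested host forced a browser login.
function needsLoginBefore(stored: StoredAuth | null, apiUrl: string, force: boolean): boolean {
  return force || stored === null || stored.apiUrl !== apiUrl;
}

// New rule as described above: stored auth is authoritative on its own host,
// so the requested host no longer matters once credentials exist.
function needsLoginAfter(stored: StoredAuth | null, _apiUrl: string, force: boolean): boolean {
  return force || stored === null;
}

// After a refresh failure, the recovery login stays on the stored host.
function recoveryHost(stored: StoredAuth): string {
  return stored.apiUrl;
}

// Illustrative values for the drift scenario; the URL schemes are assumed.
const linked: StoredAuth = { apiUrl: "https://origin.agentrelay.cloud", accessToken: "example-token" };
const currentDefault = "https://agentrelay.com";

const before = needsLoginBefore(linked, currentDefault, false); // true: re-login on every command
const after = needsLoginAfter(linked, currentDefault, false); // false: stored auth is reused
```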
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0de4383d1e
    if (!stored) {
      return loginWithBrowser(apiUrl);
Honor requested API host when cached auth exists
ensureAuthenticated now ignores the caller-supplied apiUrl whenever any stored auth is present, so commands that pass an explicit host can silently use credentials for a different origin. This breaks flows like postWorkspaceApi in src/cli/commands/on/start.ts, which retries a 401 by calling ensureAuthenticated(targetOrigin) and expects a token for that origin; with this change it can return stale auth for another host and keep failing 401. The same mismatch also prevents one-off --api-url usage from switching hosts unless users run a separate forced login first.
    // to force a fresh browser login — the user already linked, and the default
    // may have drifted (e.g. CLOUD_API_URL env set/unset between sessions).
    // Only `--force` re-links to a different host.
    if (!stored) {
🔴 ensureAuthenticated silently ignores apiUrl parameter, breaking --api-url across all callers
Removing the stored.apiUrl !== apiUrl guard means the apiUrl parameter is now silently ignored whenever stored auth exists. Multiple callers pass an explicit --api-url value (from user CLI flags) that is now discarded.
For example, relay cloud connect --api-url https://staging.example/cloud displays Cloud: staging.example in the UI (cloud.ts:255) but the actual authorizedApiFetch call goes to the stored auth's host (e.g. prod.example) because auth.apiUrl comes from the stored file. Similarly, relay cloud login --api-url staging bypasses the early-return guard at cloud.ts:143 (URLs differ) but then ensureAuthenticated returns the stored prod auth silently — no browser login, no warning.
Affected callers that pass explicit user-provided apiUrl
- cloud.ts:152: `cloud login --api-url` silently returns wrong auth
- cloud.ts:193: `cloud whoami --api-url` queries the wrong server
- cloud.ts:259: `cloud connect --api-url` creates sandbox on wrong server
- workflows.ts:197,302,324,355,394: workflow commands operate against wrong server
- start.ts:315: sends stored auth token to a different host
Prompt for agents
The removal of the `stored.apiUrl !== apiUrl` guard from ensureAuthenticated means the apiUrl parameter is silently ignored whenever stored auth exists. This breaks --api-url semantics across all callers (cloud login, cloud connect, cloud whoami, workflow commands, start.ts).
The intent of the PR is to stop forcing a browser re-login when the default API URL has drifted (e.g. CLOUD_API_URL env var set/unset between sessions). However, the same code path is used when the user explicitly passes --api-url, which should be honored.
Possible approaches:
1. Accept an additional option like `{ matchHost?: boolean }` in ensureAuthenticated. Callers that pass an explicit --api-url set matchHost: true, while callers that use defaultApiUrl() leave it false.
2. Re-add the URL check but have the login command handle the mismatch differently (e.g. warn and prompt for --force).
3. Keep the stored auth but return it only when the hosts match OR when no explicit override was provided. The function signature could accept a flag indicating whether apiUrl is user-specified or just a default.
The catch block at line 337 (loginWithBrowser(stored.apiUrl) instead of loginWithBrowser(apiUrl)) has the same issue: on refresh failure it re-authenticates to the stored host instead of the user-requested host.
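Approach 1 could look roughly like the sketch below. It is only an illustration of the suggested `matchHost` option, not code from the repository: `decideAuth`, `loginHostAfterRefreshFailure`, `EnsureOptions`, and `AuthDecision` are hypothetical names, and the host comparison reuses the plain string check from the removed guard.

```typescript
// Hypothetical shape of the persisted credentials.
interface StoredAuth {
  apiUrl: string;
  accessToken: string;
}

// matchHost is set by callers that received an explicit --api-url;
// callers passing defaultApiUrl() leave it unset.
interface EnsureOptions {
  force?: boolean;
  matchHost?: boolean;
}

type AuthDecision =
  | { action: "use-stored"; apiUrl: string }
  | { action: "login"; apiUrl: string };

function decideAuth(
  stored: StoredAuth | null,
  apiUrl: string,
  options: EnsureOptions = {},
): AuthDecision {
  if (stored === null || options.force) {
    return { action: "login", apiUrl };
  }
  // An explicit host must be honored: stored auth for another host is not reused.
  if (options.matchHost && stored.apiUrl !== apiUrl) {
    return { action: "login", apiUrl };
  }
  // Default drift alone never triggers a login.
  return { action: "use-stored", apiUrl: stored.apiUrl };
}

// Same idea for the refresh-failure path: recover on the requested host only
// when the caller asked for it explicitly, otherwise stay on the stored host.
function loginHostAfterRefreshFailure(
  stored: StoredAuth,
  apiUrl: string,
  options: EnsureOptions = {},
): string {
  return options.matchHost ? apiUrl : stored.apiUrl;
}
```

Under this sketch, a drifted default keeps the no-re-login behavior the PR wants, while an explicit `--api-url` for a different host still produces a login against that host instead of silently returning credentials for another origin.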
Summary
Two fixes to cloud connect and cloud command auth:
1. Cloud connect no longer looks hung. Provider CLIs (claude v2.1.19, codex) enter alt-screen immediately on first run: users saw zero feedback, Ctrl+C'd, and the torn-down alt buffer snapped back to pre-launch text. Adds a dim "Waiting for provider CLI to launch…" hint cleared on first byte, a zero-byte close diagnostic (with `AGENT_RELAY_DEBUG_SSH=1` breadcrumbs for real failures), and env-gated debug instrumentation for shell-request/open/write/first-byte/close events. `formatShellInvocation` JSDoc now documents why `shell()` is used instead of `exec()` (Daytona's sshd strips login-shell env on exec channels, losing the nvm PATH).
2. Cloud commands stop forcing a browser login on every run. `ensureAuthenticated` previously forced a fresh login whenever `stored.apiUrl !== apiUrl`. This fired constantly for any user whose stored host drifted from `defaultApiUrl()` (e.g. linked against `origin.agentrelay.cloud` but default is `agentrelay.com`, or `CLOUD_API_URL` env set/unset between sessions). Stored auth is now authoritative on its own host; only `--force` re-links to a different host. Refresh failures fall back to `loginWithBrowser(stored.apiUrl)` so recovery stays on the user's actual host.

Test plan
- `npx vitest run src/cli/lib/ssh-interactive.test.ts`: 10/10 passing (handler-order regression, payload format, success pattern gating, clean-exit false-positive, zero-byte diagnostic)
- `npx vitest run packages/cloud/src/auth.test.ts`: 10/10 passing including new regression tests (stored-host mismatch returns stored, refresh-near-expiry uses stored host, `--force` escape hatch)
- `agent-relay cloud connect anthropic` to verify end-to-end token persistence (not automated; expect test can't select a login method)
- `agent-relay cloud connect openai` to verify codex path shares the fix
- `cloud` subcommand twice in a row against a stored login with a different host: confirm no browser redirect on the second call

🤖 Generated with Claude Code