
fix(cli): stop cloud connect hangs and re-auth loops #743

Merged
khaliqgant merged 2 commits into main from fix-connect-anthropic
Apr 15, 2026
Conversation

@khaliqgant
Member

@khaliqgant khaliqgant commented Apr 15, 2026

Summary

Two fixes to cloud connect and cloud command auth:

  • Cloud connect no longer looks hung. Provider CLIs (claude v2.1.19, codex) enter alt-screen immediately on first run — users saw zero feedback, Ctrl+C'd, and the torn-down alt buffer snapped back to pre-launch text. Adds a dim "Waiting for provider CLI to launch…" hint cleared on first byte, a zero-byte close diagnostic (with AGENT_RELAY_DEBUG_SSH=1 breadcrumbs for real failures), and env-gated debug instrumentation for shell-request/open/write/first-byte/close events. formatShellInvocation JSDoc now documents why shell() is used instead of exec() (Daytona's sshd strips login-shell env on exec channels, losing the nvm PATH).

  • Cloud commands stop forcing a browser login on every run. ensureAuthenticated previously forced a fresh login whenever stored.apiUrl !== apiUrl. This fired constantly for any user whose stored host drifted from defaultApiUrl() (e.g. linked against origin.agentrelay.cloud but default is agentrelay.com, or CLOUD_API_URL env set/unset between sessions). Stored auth is now authoritative on its own host; only --force re-links to a different host. Refresh failures fall back to loginWithBrowser(stored.apiUrl) so recovery stays on the user's actual host.

Test plan

  • npx vitest run src/cli/lib/ssh-interactive.test.ts — 10/10 passing (handler-order regression, payload format, success pattern gating, clean-exit false-positive, zero-byte diagnostic)
  • npx vitest run packages/cloud/src/auth.test.ts — 10/10 passing including new regression tests (stored-host mismatch returns stored, refresh-near-expiry uses stored host, --force escape hatch)
  • Live expect test against Daytona sandbox: shell opens, bytes flow (6158 bytes at 243ms), Claude v2.1.19 welcome + theme picker + login method picker all render, UX hint clears on first byte
  • Manual agent-relay cloud connect anthropic to verify end-to-end token persistence (not automated — expect test can't select a login method)
  • Manual agent-relay cloud connect openai to verify codex path shares the fix
  • Manual: run any cloud subcommand twice in a row against a stored login with a different host — confirm no browser redirect on the second call

🤖 Generated with Claude Code



khaliqgant and others added 2 commits April 15, 2026 20:41
The cloud connect flow appeared to hang because provider CLIs (claude v2.1.19,
codex) enter alt-screen immediately on first run — users saw no feedback while
the remote CLI was booting, Ctrl+C'd, and the torn-down alt buffer snapped
back to pre-launch text. Real failures (missing binary, sandbox image drift)
also produced identical silent output.

Adds a dim "Waiting for provider CLI to launch…" hint cleared on first byte,
a zero-byte close diagnostic with an AGENT_RELAY_DEBUG_SSH=1 breadcrumb, and
env-gated debug instrumentation for shell-request/open/write/first-byte/close
events. formatShellInvocation JSDoc now documents why shell() is used instead
of exec() (Daytona's sshd strips login-shell env on exec channels, losing the
nvm PATH).
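
The hint and zero-byte diagnostic described above could be sketched roughly as follows. This is a minimal illustration, not the code in ssh-interactive.ts: `createLaunchFeedback`, `LaunchFeedback`, the diagnostic wording, and the ANSI handling are all assumptions. Only the hint text and the AGENT_RELAY_DEBUG_SSH=1 variable come from the commit message.

```typescript
// Hypothetical sketch: names and messages are illustrative, not the PR's code.
type Write = (text: string) => void;

interface LaunchFeedback {
  onData(chunk: Uint8Array): void;
  onClose(): void;
}

const DIM = "\x1b[2m";
const RESET = "\x1b[0m";
// Carriage return plus "erase entire line" so the hint leaves no residue.
const CLEAR_LINE = "\r\x1b[2K";

function createLaunchFeedback(write: Write, debugEnabled: boolean): LaunchFeedback {
  let bytesSeen = 0;

  // Show the hint right away, before the remote CLI has produced any output.
  write(`${DIM}Waiting for provider CLI to launch…${RESET}`);

  return {
    onData(chunk) {
      if (bytesSeen === 0) {
        // First byte: erase the hint so it does not collide with the remote UI.
        write(CLEAR_LINE);
      }
      bytesSeen += chunk.length;
      // Forwarding the chunk to the local terminal is omitted in this sketch.
    },
    onClose() {
      if (bytesSeen > 0) return;
      // The channel closed without a single byte: treat it as a likely real failure.
      write(CLEAR_LINE);
      write("Provider CLI closed without producing any output.\n");
      if (!debugEnabled) {
        write("Re-run with AGENT_RELAY_DEBUG_SSH=1 for details.\n");
      }
    },
  };
}
```

In this sketch the caller would wire `onData` and `onClose` to the shell channel's data and close events, and pass `debugEnabled` based on whether AGENT_RELAY_DEBUG_SSH=1 is set.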

Tests rewritten with typed fake SSH2 client covering handler-order regression,
payload format, success pattern gating, clean-exit false-positive, and the
zero-byte diagnostic path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Previously, ensureAuthenticated forced a fresh browser login on every cloud
command whenever the stored apiUrl differed from the caller's apiUrl
(typically defaultApiUrl()). This fired constantly for users whose stored
host didn't match the current default — e.g. someone linked against
origin.agentrelay.cloud but defaultApiUrl() returned agentrelay.com, or
CLOUD_API_URL env was set during link and unset afterward.

Now stored auth is authoritative on its own host. Host mismatch alone no
longer triggers re-login; only --force re-links to a different host. Refresh
failures now fall back to loginWithBrowser(stored.apiUrl) so the recovery
path stays on the user's actual host instead of redirecting to the default.
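
The new decision order can be illustrated with a small pure function. This is a sketch under stated assumptions, not the code in packages/cloud/src/auth.ts: `decideAuthAction`, `AuthAction`, the `expiresAtMs` field, and the 60-second refresh window are hypothetical; only the behavior (stored auth wins on its own host, --force re-links, refresh failure recovers on the stored host) comes from the commit message.

```typescript
// Hypothetical sketch: all names and the refresh window are assumptions.
interface StoredAuth {
  apiUrl: string;
  expiresAtMs: number;
}

type AuthAction =
  | { kind: "login"; apiUrl: string }
  | { kind: "use-stored" }
  | { kind: "refresh"; onFailureLoginAt: string };

// Assumed threshold; the real value is not stated in the PR.
const REFRESH_WINDOW_MS = 60_000;

function decideAuthAction(
  stored: StoredAuth | null,
  requestedApiUrl: string,
  nowMs: number,
  force = false,
): AuthAction {
  // No stored auth, or an explicit --force: link against the requested host.
  if (!stored || force) {
    return { kind: "login", apiUrl: requestedApiUrl };
  }
  // A host mismatch alone is deliberately not checked here.
  if (stored.expiresAtMs - nowMs > REFRESH_WINDOW_MS) {
    return { kind: "use-stored" };
  }
  // Near expiry: refresh, and if that fails, recover on the stored host.
  return { kind: "refresh", onFailureLoginAt: stored.apiUrl };
}
```

With this shape, a user linked against origin.agentrelay.cloud keeps using that login even when the requested URL resolves to agentrelay.com, unless `force` is set.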

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0de4383d1e


Comment on lines +322 to 323
if (!stored) {
return loginWithBrowser(apiUrl);

P1 Badge Honor requested API host when cached auth exists

ensureAuthenticated now ignores the caller-supplied apiUrl whenever any stored auth is present, so commands that pass an explicit host can silently use credentials for a different origin. This breaks flows like postWorkspaceApi in src/cli/commands/on/start.ts, which retries a 401 by calling ensureAuthenticated(targetOrigin) and expects a token for that origin; with this change it can return stale auth for another host and keep failing 401. The same mismatch also prevents one-off --api-url usage from switching hosts unless users run a separate forced login first.
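
One way a caller could defend against the mismatch described here is to check the origin of the returned auth before attaching its token. The sketch below is hypothetical: `tokenForOrigin`, `sameOrigin`, and the `accessToken` field are illustrative names, and nothing in the PR indicates that such a guard exists in start.ts.

```typescript
// Hypothetical guard: names and field layout are assumptions for illustration.
interface StoredAuth {
  apiUrl: string;
  accessToken: string;
}

// Compare scheme, host, and port only; paths such as /cloud are ignored.
// Note: new URL() throws on strings that are not absolute URLs.
function sameOrigin(a: string, b: string): boolean {
  return new URL(a).origin === new URL(b).origin;
}

// Return the token only when the auth was issued for the target origin,
// so a stored token is never sent to a different host.
function tokenForOrigin(auth: StoredAuth, targetOrigin: string): string | null {
  return sameOrigin(auth.apiUrl, targetOrigin) ? auth.accessToken : null;
}
```

A `null` result would tell the 401 retry path that the cached credentials belong to another host, rather than letting it retry with a token that cannot succeed.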


@khaliqgant khaliqgant merged commit 2ad9c7e into main Apr 15, 2026
44 checks passed
@khaliqgant khaliqgant deleted the fix-connect-anthropic branch April 15, 2026 18:58
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment


Devin Review found 1 potential issue.

View 5 additional findings in Devin Review.


// to force a fresh browser login — the user already linked, and the default
// may have drifted (e.g. CLOUD_API_URL env set/unset between sessions).
// Only `--force` re-links to a different host.
if (!stored) {

🔴 ensureAuthenticated silently ignores apiUrl parameter, breaking --api-url across all callers

Removing the stored.apiUrl !== apiUrl guard means the apiUrl parameter is now silently ignored whenever stored auth exists. Multiple callers pass an explicit --api-url value (from user CLI flags) that is now discarded.

For example, relay cloud connect --api-url https://staging.example/cloud displays Cloud: staging.example in the UI (cloud.ts:255) but the actual authorizedApiFetch call goes to the stored auth's host (e.g. prod.example) because auth.apiUrl comes from the stored file. Similarly, relay cloud login --api-url staging bypasses the early-return guard at cloud.ts:143 (URLs differ) but then ensureAuthenticated returns the stored prod auth silently — no browser login, no warning.

Affected callers that pass explicit user-provided apiUrl:
  • cloud.ts:152: cloud login --api-url silently returns wrong auth
  • cloud.ts:193: cloud whoami --api-url queries the wrong server
  • cloud.ts:259: cloud connect --api-url creates sandbox on wrong server
  • workflows.ts:197,302,324,355,394: workflow commands operate against wrong server
  • start.ts:315: sends stored auth token to a different host
Prompt for agents
The removal of the `stored.apiUrl !== apiUrl` guard from ensureAuthenticated means the apiUrl parameter is silently ignored whenever stored auth exists. This breaks --api-url semantics across all callers (cloud login, cloud connect, cloud whoami, workflow commands, start.ts).

The intent of the PR is to stop forcing a browser re-login when the default API URL has drifted (e.g. CLOUD_API_URL env var set/unset between sessions). However, the same code path is used when the user explicitly passes --api-url, which should be honored.

Possible approaches:
1. Accept an additional option like `{ matchHost?: boolean }` in ensureAuthenticated. Callers that pass an explicit --api-url set matchHost: true, while callers that use defaultApiUrl() leave it false.
2. Re-add the URL check but have the login command handle the mismatch differently (e.g. warn and prompt for --force).
3. Keep the stored auth but return it only when the hosts match OR when no explicit override was provided. The function signature could accept a flag indicating whether apiUrl is user-specified or just a default.

The catch block at line 337 (loginWithBrowser(stored.apiUrl) instead of loginWithBrowser(apiUrl)) has the same issue: on refresh failure it re-authenticates to the stored host instead of the user-requested host.
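
Approach 1 could look roughly like the sketch below. It is an illustration only: `pickStoredAuth` and `EnsureAuthOptions` are hypothetical names, the `accessToken` field is assumed, and the comparison mirrors the removed `stored.apiUrl !== apiUrl` guard rather than any code that exists in the repository.

```typescript
// Hypothetical sketch of the matchHost option; not code from the repository.
interface StoredAuth {
  apiUrl: string;
  accessToken: string;
}

interface EnsureAuthOptions {
  // Set to true by callers that received an explicit --api-url from the user.
  matchHost?: boolean;
}

// Returns the stored auth to reuse, or null when the caller should fall
// through to a browser login against the requested apiUrl.
function pickStoredAuth(
  stored: StoredAuth | null,
  apiUrl: string,
  options: EnsureAuthOptions = {},
): StoredAuth | null {
  if (!stored) return null;
  // Explicit host requested: only reuse credentials issued for that host.
  if (options.matchHost && stored.apiUrl !== apiUrl) return null;
  // Default host may have drifted: stored auth stays authoritative.
  return stored;
}
```

Callers that rely on defaultApiUrl() would leave `matchHost` unset and keep the no-re-login behavior, while callers that forward --api-url would opt in to the stricter check.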
