feat(workflows): cloud-connect fix workflows (claude hang + utils bundling) by khaliqgant · Pull Request #738 · AgentWorkforce/relay

khaliqgant · 2026-04-13T20:41:02Z

Summary

Adds two agent-relay workflows under workflows/cloud-connect/ that fix regressions in agent-relay cloud connect <provider> and install regression guards so they don't reship.

fix-cloud-connect-claude-hang.ts — agent-relay cloud connect anthropic prints "Starting interactive authentication..." and hangs. Root cause is in src/cli/lib/ssh-interactive.ts: (1) stream.on('data', ...) is registered AFTER stream.write, so bytes emitted synchronously on shell open are dropped; (2) ; exit $? wrapper races with the Ink TUI's final alternate-screen-buffer flush. The workflow extracts a pure formatShellInvocation helper, reorders the shell callback, replaces the payload with exec <cmd>, and writes vitest unit tests with a mock ssh2 Client that exercise the handler-order regression, the exec prefix, and synchronous early data capture.
fix-agent-relay-utils-bundling.ts — npx agent-relay cloud connect openai fails with ERR_MODULE_NOT_FOUND: Cannot find package '@agent-relay/utils', but agent-relay --version / --help pass because the existing post-publish verifier never exercises a code path that imports @agent-relay/utils. bundledDependencies lists @agent-relay/utils, but at npm pack time node_modules/@agent-relay/utils is a workspace symlink and npm pack doesn't follow symlinks out of the package root, so the tarball silently ships nothing under node_modules/@agent-relay/. The workflow adds scripts/prepack-materialize-workspaces.mjs to replace those symlinks with real directories before pack, scripts/verify-bundled-deps.mjs wired into prepack + prepublishOnly as a fail-fast guard, and a new Test 6 in scripts/post-publish-verify/verify-install.sh that require.resolves @agent-relay/utils from the installed package and dynamic-imports the cloud-connect code path.

Both workflows follow the 80-100 pattern (test-fix-rerun, verify gates, typecheck, regression suite) so they stay useful as permanent regression-detection workflows, not one-shot fixers. The companion server-side fix for the openai PATH-propagation hang landed in AgentWorkforce/cloud#151.

Also unignores /workflows/cloud-connect/ in .gitignore alongside the existing ci/, refactor/, and relayauth-integration/ exceptions.

Test plan

cd relay && agent-relay run workflows/cloud-connect/fix-cloud-connect-claude-hang.ts runs to completion with all steps green
cd relay && agent-relay run workflows/cloud-connect/fix-agent-relay-utils-bundling.ts runs to completion with all steps green
agent-relay cloud connect anthropic successfully opens the claude TUI instead of hanging
npx agent-relay cloud connect openai reaches codex login without ERR_MODULE_NOT_FOUND
Post-publish verification Test 6 fails cleanly if @agent-relay/utils is ever unresolvable from the installed package

🤖 Generated with Claude Code

Two agent-relay workflows that fix regressions in `agent-relay cloud connect <provider>` and add regression guards to prevent them shipping again. 1. fix-cloud-connect-claude-hang.ts — claude TUI hang - Refactors src/cli/lib/ssh-interactive.ts to (a) register stream.on('data') BEFORE stream.write so synchronous early output is not dropped, and (b) use `exec <cmd>` instead of `<cmd>; exit $?` so the shell is replaced with the CLI and there is no teardown race against the Ink alternate-screen-buffer final flush. - Extracts formatShellInvocation() as a pure helper and adds vitest unit tests with a mock ssh2 Client (handler-order regression, exec-prefix assertion, synchronous-early-data capture). 2. fix-agent-relay-utils-bundling.ts — @agent-relay/utils not shipped - Adds scripts/prepack-materialize-workspaces.mjs to replace node_modules/@agent-relay/* symlinks with real directories so bundledDependencies actually copies them into the tarball. - Adds scripts/verify-bundled-deps.mjs and wires it into prepack + prepublishOnly as a fail-fast guard. - Extends scripts/post-publish-verify/verify-install.sh with a new Test 6 that resolves @agent-relay/utils from the installed package and dynamic-imports the cloud-connect code path, so an ERR_MODULE_NOT_FOUND regression is caught on the next publish instead of in user reports. Both workflows follow the 80-100 pattern (test-fix-rerun, verify gates, typecheck, regression suite) so they stay useful beyond the initial landing. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

xkonjin

Nice work on the cloud connect bundling fix. A few observations:

Potential issue in prepack-materialize-workspaces.mjs:
The script uses a mix of sync and async realpath APIs. The comment mentions using realpathSync but the code shows await import('node:fs/promises')).realpath(dst). Consider standardizing on fs.realpathSync for consistency since the rest of the script uses sync APIs.

Race condition concern in ssh-interactive.ts:
The fix moves stream event handlers before the write call, which is good. However, the timer is now started after stdin.resume() but before the stream.write(). If the remote side responds very quickly (within microseconds), the timeout might not be active yet. Consider starting the timer before or immediately alongside stdin setup.

Test coverage:
The new ssh-interactive.test.ts covers the handler-order regression well. Consider adding a test case for the timeout timer initialization order to prevent future regressions.

vitest.config.ts changes:
The new workspaceAliases approach is cleaner than the hardcoded alias list. Make sure all packages listed in workspacePackages are actually built before this config is used, or you may get resolve errors during test runs if a package has not been compiled yet.

Overall solid fix for a tricky bundling issue.

xkonjin

Test review

xkonjin

Code Review: Fix cloud connect claude hang

Overall: This is a well-structured fix for a race condition in the SSH interactive session handling. The reordering of event handlers and the switch from ; exit $? to exec pattern eliminates the data-listener race (H1) and shell-exit race (H2).

What looks good

Handler ordering fix: Moving stream.on('data', ...) before stream.write() is the correct fix for the race where early TUI output could be dropped before the listener attached.
exec pattern: Using exec <command> instead of <command>; exit $? eliminates the shell-teardown race. The PTY now closes when the CLI exits naturally.
Exported testable helper: formatShellInvocation is pure, simple, and directly testable. Good separation of concerns.
Test coverage: The new unit tests verify the handler-order regression (H1) and the shell invocation format. Mocking ssh2 is appropriate here.

Suggestions

1. Comment the handler-order invariant
Consider adding a code comment explaining why stream.on('data') must precede stream.write():

// Attach data listener BEFORE writing to avoid dropping early shell output
// that may be emitted synchronously when the shell spawns.
stream.on('data', ...);
stream.write(formatShellInvocation(command));

2. Edge case: command with shell metacharacters
formatShellInvocation blindly wraps the command. If command contains shell metacharacters (e.g., claude; rm -rf /), the behavior changes with exec. Consider whether input validation is needed upstream.

3. Verify no other callers need updating
Search for other stream.write patterns with exit $? in the codebase — the same bug may exist elsewhere in SSH-related code.

Security

No new security issues. The change eliminates a shell wrapper, which is slightly safer.

Test coverage

Good regression tests. Consider adding:

Test for command with quotes/special characters
Test for the timeout path (timer firing before completion)

Verdict: Approve.

- Wrap command in `exec sh -c '…'` so zsh's exec builtin doesn't treat leading `VAR=…` prefix assignments as the command name. - Escape single quotes in the shell-wrapped command body. - Ignore pre-command shell MOTD when matching success/error patterns by resetting the output buffer and gating pattern matching until after the command is written. Some sandbox images print "Last logged in: …" which falsely matched the broad /logged\s*in/i pattern. - Drop the `exitCode === 0` fallback for authDetected — zsh stays alive after a failed `exec` in interactive mode, and a user closing the session with Ctrl+D exits 0 without authenticating. - Update tests to cover the new payload shape, env-var prefix handling, quote escaping, MOTD filtering, and post-write pattern matching. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The "ignores pre-command shell MOTD" test was failing because the fake emitted its early data AFTER stream.write, which meant the buffer reset had already enabled pattern matching and the /READY/ pattern matched. Replace it with a test that reproduces the actual cloud-connect regression this branch fixes: a clean shell exit (exit code 0) without any pattern match must not be reported as authDetected. This is the behavior that caused `cloud connect claude` / `cloud connect codex` to always print "Authentication Complete!" even when the remote exec failed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

devin-ai-integration

Devin Review found 1 new potential issue.

View 13 additional findings in Devin Review.

devin-ai-integration · 2026-04-14T08:08:31Z

+  const distSrc = join(realPath, 'dist');
+  if (existsSync(distSrc)) {
+    cpSync(distSrc, join(dst, 'dist'), { recursive: true });
+  }
+
+  const readmeSrc = join(realPath, 'README.md');
+  if (existsSync(readmeSrc)) {
+    cpSync(readmeSrc, join(dst, 'README.md'));
+  }


🔴 Materializer omits bin/ directory, losing @agent-relay/sdk broker binary

The prepack-materialize-workspaces.mjs script hardcodes copying only package.json, dist/, and README.md from each symlinked workspace package. However, @agent-relay/sdk declares "files": ["dist", "bin", "README.md", "package.json"] (packages/sdk/package.json), and its bin/ directory contains the platform-specific agent-relay-broker binary (populated by the build:rust script that runs before the materializer in the prepack chain). The getBrokerBinaryPath() resolver in packages/sdk/src/broker-path.ts:68 looks for the binary at resolve(currentModuleDir, '..', 'bin') — i.e. the SDK's own bin/ directory. Because the materializer doesn't copy bin/, the bundled node_modules/@agent-relay/sdk/bin/ will be absent in the published tarball, and the primary broker binary resolution path will silently fail. The root package.json files array does not include packages/sdk/bin either, so the binary has no other packaging route into the tarball.

Suggested change

const distSrc = join(realPath, 'dist');

if (existsSync(distSrc)) {

cpSync(distSrc, join(dst, 'dist'), { recursive: true });

}

const readmeSrc = join(realPath, 'README.md');

if (existsSync(readmeSrc)) {

cpSync(readmeSrc, join(dst, 'README.md'));

}

const distSrc = join(realPath, 'dist');

if (existsSync(distSrc)) {

cpSync(distSrc, join(dst, 'dist'), { recursive: true });

}

const binSrc = join(realPath, 'bin');

if (existsSync(binSrc)) {

cpSync(binSrc, join(dst, 'bin'), { recursive: true });

}

const readmeSrc = join(realPath, 'README.md');

if (existsSync(readmeSrc)) {

cpSync(readmeSrc, join(dst, 'README.md'));

}

Was this helpful? React with 👍 or 👎 to provide feedback.

xkonjin

Solid, well-scoped PR — the bundling fix addresses a real regression and the ssh-interactive refactor closes a race condition. Feedback below:

Bundling / packaging:

prepack-materialize-workspaces.mjs idempotency edge case: The script skips already-materialized dirs via .materialized marker, but if a developer runs npm install after materialization, npm may restore symlinks while the .materialized marker is still present. The next prepack would silently skip the stale real-dir. Consider checking lstatSync(dst).isSymbolicLink() even when the marker exists, or documenting that npm install after prepack requires manual git checkout node_modules/.
verify-bundled-deps.mjs symlink check: Good. But it only checks root-level symlinks. If node_modules/@agent-relay/utils/node_modules/.bin/ contains symlinks (common), they are irrelevant to bundledDependencies but worth keeping in mind. Not blocking.
Coverage thresholds: The vitest.config.ts excludes packages/cloud/src/workflows.ts, packages/cloud/src/api-client.ts, and packages/telemetry/** from coverage. Make sure these exclusions are intentional and temporary — if they're permanently dead code, consider deleting them instead.

SSH interactive / claude hang:

formatShellInvocation escaping bug: The implementation handles single quotes via replace(/'/g, "'\\''"), which is correct for POSIX sh -c. But it does not escape backslashes before single quotes. A command like echo 'it\'s working' would break the quoting. This is a minor edge case, but since the function is now exported and tested, consider documenting the limitation or handling \\' sequences.
authDetected logic change: The return expression now requires execResult?.authDetected === true and no longer falls back to exitCode === 0. This is the correct fix for the zsh exec failure case, but verify that all callers supply non-empty successPatterns — otherwise a successful auth with no pattern will now report authDetected: false. The tests show callers do supply patterns, but worth a quick audit.
Missing test for cleanup() on stream.error: The runInteractiveSession tests are great for the H1 race, but there's no test verifying that a stream error mid-session calls cleanup(), restores raw mode, and rejects. A future refactor could regress this. Consider adding one.

Approve with nits — the bundling and claude-hang fixes are both high-value and low-risk.

khaliqgant requested a review from willwashburn as a code owner April 13, 2026 20:41

devin-ai-integration Bot reviewed Apr 13, 2026

View reviewed changes

kjgbot added 4 commits April 13, 2026 22:56

fix: harden cloud-connect workflow verification

8acdc9a

fix(workflows): harden cloud-connect reruns

e5b6cb4

fix(cli): harden ssh interactive startup

e341a92

fix(publish): materialize bundled relay packages

7a24b98

This comment was marked as resolved.

Sign in to view

kjgbot added 2 commits April 13, 2026 23:43

fix(publish): strengthen bundled dep smoke test

0c0e7cb

fix(test): use source aliases in vitest

bbc8cee

xkonjin reviewed Apr 14, 2026

View reviewed changes

khaliqgant and others added 3 commits April 14, 2026 09:49

install instruction

73e8f9d

devin-ai-integration Bot reviewed Apr 14, 2026

View reviewed changes

xkonjin reviewed Apr 14, 2026

View reviewed changes

khaliqgant merged commit fb90632 into main Apr 14, 2026
52 checks passed

khaliqgant deleted the fix/cloud-connect-workflows branch April 14, 2026 08:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(workflows): cloud-connect fix workflows (claude hang + utils bundling)#738

feat(workflows): cloud-connect fix workflows (claude hang + utils bundling)#738
khaliqgant merged 10 commits into
mainfrom
fix/cloud-connect-workflows

khaliqgant commented Apr 13, 2026 •

edited by devin-ai-integration Bot

Loading

Uh oh!

devin-ai-integration Bot left a comment

Uh oh!

This comment was marked as resolved.

Uh oh!

xkonjin left a comment

Uh oh!

xkonjin left a comment

Uh oh!

xkonjin left a comment

Uh oh!

devin-ai-integration Bot left a comment

Uh oh!

devin-ai-integration Bot Apr 14, 2026

Uh oh!

xkonjin left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

khaliqgant commented Apr 13, 2026 • edited by devin-ai-integration Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

✅ Devin Review: No Issues Found

Uh oh!

This comment was marked as resolved.

Uh oh!

xkonjin left a comment

Choose a reason for hiding this comment

Uh oh!

xkonjin left a comment

Choose a reason for hiding this comment

Uh oh!

xkonjin left a comment

Choose a reason for hiding this comment

Code Review: Fix cloud connect claude hang

What looks good

Suggestions

Security

Test coverage

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

xkonjin left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

khaliqgant commented Apr 13, 2026 •

edited by devin-ai-integration Bot

Loading