fix(cli): add progress diagnostics and spawnSync to runScriptFile #731

Merged
khaliqgant merged 1 commit into main from fix/agent-relay-run-progress-diagnostics on Apr 13, 2026

@khaliqgant khaliqgant commented Apr 13, 2026

Summary

Users report agent-relay run workflow.ts hanging silently with no output beyond Running workflow script... — there is no signal whether the CLI is bootstrapping, pre-parsing, spawning tsx, or genuinely stuck. Hit this in a recent cloud debugging session where ctrl-c was the only way to recover.

Root causes

  1. Zero progress output between the startup line and the subprocess spawn. Any hang in ensureLocalSdkWorkflowRuntime, preParseWorkflowFile, or the runner dispatch loop is invisible.

  2. execFileSync + stdio: inherit quirks in Bun-compiled binaries. The standalone agent-relay CLI is Bun-compiled, and execFileSync with inherited stdio has been observed to silently drop child stdout in some edge cases. spawnSync is more reliable for this pattern.
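The spawnSync pattern that root cause 2 points to can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from setup.ts: the function name runFirstAvailable, its return shape, and the error message are hypothetical. It tries each runner in order with inherited stdio, treats ENOENT as "runner not installed, try the next one", and hands back the exit status and signal for the caller to inspect.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical result shape for this sketch only.
interface RunnerOutcome {
  runner: string;
  status: number | null; // null when the child was killed by a signal
  signal: string | null;
}

// Try each runner in order; skip runners that are not installed (ENOENT).
export function runFirstAvailable(runners: string[], args: string[]): RunnerOutcome {
  for (const runner of runners) {
    // stdio: "inherit" lets the child write directly to the parent's terminal.
    const result = spawnSync(runner, args, { stdio: "inherit" });

    const code = (result.error as NodeJS.ErrnoException | undefined)?.code;
    if (code === "ENOENT") {
      continue; // binary not found on PATH, fall through to the next runner
    }
    if (result.error) {
      throw result.error; // any other spawn failure is surfaced, not swallowed
    }
    return { runner, status: result.status, signal: result.signal };
  }
  throw new Error(`no runner found on PATH (tried: ${runners.join(", ")})`);
}
```

Unlike execFileSync, spawnSync does not throw on a non-zero exit, so the caller has to check the returned status explicitly.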

Changes

  • New diag() helper writes bracketed [agent-relay] progress lines to stderr. Never throws, falls back to stdout if stderr is closed.
  • diag() calls added at every potentially-slow step in runScriptFile: resolve, ensureLocalSdkWorkflowRuntime start/done, preParseWorkflowFile start/done, each runner attempt, ENOENT fallbacks, and the npx tsx fallback.
  • execFileSync replaced with spawnSync for the TypeScript and Python runner dispatch. spawnSync preserves stdio inheritance more reliably across Bun and Node. Explicitly checks .error and .status to surface non-zero exits instead of swallowing them.
  • setup.test.ts mocks updated where they referenced execFileSync (if needed).
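A helper with the behaviour described in the first bullet might look roughly like this. It is a sketch, not the implementation merged in this PR: the formatDiag name and the choice of fs.writeSync on file descriptors 2 and 1 are assumptions made for illustration.

```typescript
import { writeSync } from "node:fs";

// Build the bracketed progress line. (Hypothetical helper name.)
export function formatDiag(message: string): string {
  return `[agent-relay] ${message}\n`;
}

// Write a progress line to stderr; fall back to stdout; never throw.
export function diag(message: string): void {
  const line = formatDiag(message);
  try {
    writeSync(2, line); // fd 2 = stderr
  } catch {
    try {
      writeSync(1, line); // fd 1 = stdout, used only if stderr is unusable
    } catch {
      // Both streams are unavailable: drop the line rather than crash the CLI.
    }
  }
}
```

Wrapping a slow step then takes two calls, for example diag("preParseWorkflowFile start") before it and diag("preParseWorkflowFile done") after it, so the last line printed names the step that is stuck.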

Test plan

  • npx tsc --noEmit
  • npx vitest run src/cli/commands/setup.test.ts — all setup tests passing
  • Verified diag() appears at each required checkpoint via grep in verify step
  • After merge + release: run agent-relay run workflow.ts against a workflow in a fresh worktree and confirm the startup path emits [agent-relay] progress lines showing exactly where it is (pre-parse, runner dispatch, spawn, etc.)

Why this matters

Silent hangs are the worst kind of DX bug: they cost 10+ minutes per occurrence and the user has no information to decide between waiting, retrying, or filing a bug. This change guarantees the CLI is never silent for more than one slow operation. If it hangs, the last diag line tells you exactly which operation is stuck.

🤖 Generated with Claude Code


Users running agent-relay run workflow.ts can see the startup line
printed but nothing else when something in the dispatch path hangs.
The CLI provides no signal whether it is bootstrapping, pre-parsing,
spawning tsx, or genuinely stuck, so the user eventually hits ctrl-c and
retries.

Two root causes:

1. Zero progress output between the startup line and the subprocess
   spawn — any hang in ensureLocalSdkWorkflowRuntime, preParseWorkflowFile,
   or the runner loop is invisible.

2. execFileSync with stdio: inherit has known stdout-forwarding quirks
   in Bun-compiled standalone binaries. When the child exits quickly,
   output can silently drop.

Fix:
- Add diag() helper that writes bracketed progress to stderr.
- Emit diag() before and after every potentially slow step in runScriptFile:
  resolve, ensureLocalSdkWorkflowRuntime, preParseWorkflowFile, each runner
  attempt, ENOENT fallbacks, and the npx tsx fallback.
- Replace execFileSync with spawnSync for TypeScript and Python runner
  dispatch. spawnSync preserves stdio inheritance more reliably across
  Bun and Node. Explicitly check .error and .status to surface non-zero
  exits instead of swallowing them.

Existing setup.test.ts is updated where it references execFileSync mocks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@devin-ai-integration devin-ai-integration Bot left a comment


Devin Review found 1 potential issue.


Comment thread src/cli/commands/setup.ts
Comment on lines +359 to +362
if (spawnResult.status !== 0) {
  const err = new Error(`${runner} exited with code ${spawnResult.status}`);
  return augmentErrorWithRunId(err);
}

🟡 Signal-killed child processes produce misleading "exited with code null" error

When a child process is terminated by a signal (e.g., SIGTERM, SIGKILL), spawnSync returns {status: null, signal: 'SIGTERM', error: undefined}. The code at lines 359 and 397 checks spawnResult.status !== 0, which evaluates null !== 0 to true, and then creates new Error(`${runner} exited with code ${spawnResult.status}`), producing a confusing message like "tsx exited with code null". The spawnResult.signal field is never checked, so the actual cause (signal termination) is lost. This pattern is repeated in all three spawnSync call sites (TS runners, npx fallback at setup.ts:375, and Python runners).

Suggested change
- if (spawnResult.status !== 0) {
-   const err = new Error(`${runner} exited with code ${spawnResult.status}`);
-   return augmentErrorWithRunId(err);
- }
+ if (spawnResult.status !== 0 || spawnResult.signal) {
+   const detail = spawnResult.signal
+     ? `${runner} was killed by signal ${spawnResult.signal}`
+     : `${runner} exited with code ${spawnResult.status}`;
+   const err = new Error(detail);
+   return augmentErrorWithRunId(err);
+ }
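Because the same check appears at three call sites, one option is to move the classification into a small shared function. The sketch below is illustrative only: describeSpawnFailure and the SpawnOutcome interface are hypothetical names, with the interface mirroring just the error, status, and signal fields that spawnSync returns.

```typescript
// The subset of spawnSync's return value that matters here (assumed shape).
interface SpawnOutcome {
  error?: Error;
  status: number | null; // null when the child was terminated by a signal
  signal: string | null; // e.g. "SIGTERM"; null on a normal exit
}

// Return a human-readable failure message, or null if the child succeeded.
export function describeSpawnFailure(runner: string, result: SpawnOutcome): string | null {
  if (result.error) {
    return `${runner} failed to start: ${result.error.message}`;
  }
  if (result.signal) {
    return `${runner} was killed by signal ${result.signal}`;
  }
  if (result.status !== 0) {
    return `${runner} exited with code ${result.status}`;
  }
  return null;
}
```

Checking signal before status keeps the "exited with code null" message from ever being produced, since a signal-killed child always reports a null status.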
@khaliqgant khaliqgant merged commit b4fffd4 into main Apr 13, 2026
43 checks passed
@khaliqgant khaliqgant deleted the fix/agent-relay-run-progress-diagnostics branch April 13, 2026 10:53