fix(cli): add progress diagnostics and spawnSync to runScriptFile#731
Conversation
Users running agent-relay run workflow.ts can see the startup line printed but nothing else when something in the dispatch path hangs. The CLI provides no signal whether it is bootstrapping, pre-parsing, spawning tsx, or genuinely stuck, so the user eventually ctrl-c and retries. Two root causes: 1. Zero progress output between the startup line and the subprocess spawn — any hang in ensureLocalSdkWorkflowRuntime, preParseWorkflowFile, or the runner loop is invisible. 2. execFileSync with stdio: inherit has known stdout-forwarding quirks in Bun-compiled standalone binaries. When the child exits quickly, output can silently drop. Fix: - Add diag() helper that writes bracketed progress to stderr. - Emit diag() before and after every potentially slow step in runScriptFile: resolve, ensureLocalSdkWorkflowRuntime, preParseWorkflowFile, each runner attempt, ENOENT fallbacks, and the npx tsx fallback. - Replace execFileSync with spawnSync for TypeScript and Python runner dispatch. spawnSync preserves stdio inheritance more reliably across Bun and Node. Explicitly check .error and .status to surface non-zero exits instead of swallowing them. Existing setup.test.ts is updated where it references execFileSync mocks. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| if (spawnResult.status !== 0) { | ||
| const err = new Error(`${runner} exited with code ${spawnResult.status}`); | ||
| return augmentErrorWithRunId(err); | ||
| } |
There was a problem hiding this comment.
🟡 Signal-killed child processes produce misleading "exited with code null" error
When a child process is terminated by a signal (e.g., SIGTERM, SIGKILL), spawnSync returns {status: null, signal: 'SIGTERM', error: undefined}. The code at lines 359 and 397 checks spawnResult.status !== 0, which evaluates null !== 0 → true, then creates new Error(${runner} exited with code ${spawnResult.status}) — producing a confusing message like "tsx exited with code null". The spawnResult.signal field is never checked, so the actual cause (signal termination) is lost. This pattern is repeated in all three spawnSync call sites (TS runners, npx fallback at setup.ts:375, and Python runners).
| if (spawnResult.status !== 0) { | |
| const err = new Error(`${runner} exited with code ${spawnResult.status}`); | |
| return augmentErrorWithRunId(err); | |
| } | |
| if (spawnResult.status !== 0 || spawnResult.signal) { | |
| const detail = spawnResult.signal | |
| ? `${runner} was killed by signal ${spawnResult.signal}` | |
| : `${runner} exited with code ${spawnResult.status}`; | |
| const err = new Error(detail); | |
| return augmentErrorWithRunId(err); | |
| } |
Was this helpful? React with 👍 or 👎 to provide feedback.
Summary
Users report agent-relay run workflow.ts hanging silently with no output beyond Running workflow script... — there is no signal whether the CLI is bootstrapping, pre-parsing, spawning tsx, or genuinely stuck. Hit this in a recent cloud debugging session where ctrl-c was the only way to recover.
Root causes
Zero progress output between the startup line and the subprocess spawn. Any hang in ensureLocalSdkWorkflowRuntime, preParseWorkflowFile, or the runner dispatch loop is invisible.
execFileSync + stdio: inherit quirks in Bun-compiled binaries. The standalone agent-relay CLI is Bun-compiled, and execFileSync with inherited stdio has been observed to silently drop child stdout in some edge cases. spawnSync is more reliable for this pattern.
Changes
Test plan
Why this matters
Silent hangs are the worst kind of DX bug: they cost 10+ minutes per occurrence and the user has no information to decide between waiting, retrying, or filing a bug. This change guarantees the CLI is never silent for more than one slow operation. If it hangs, the last diag line tells you exactly which operation is stuck.
🤖 Generated with Claude Code