Skip to content

fix(run): exit non-zero on session.error so CI wrappers see failures#71

Merged
byapparov merged 1 commit into
mainfrom
70-exit-on-auth-failure
May 21, 2026
Merged

fix(run): exit non-zero on session.error so CI wrappers see failures#71
byapparov merged 1 commit into
mainfrom
70-exit-on-auth-failure

Conversation

@byapparov
Copy link
Copy Markdown
Contributor

Summary

  • aictrl run's session.error handler in run.ts logged the error and stashed it into a local string but never set process.exitCode — so when the model session subsequently emitted session.status: idle, the run loop broke into the success branch and the process exited 0.
  • Result: any CI workflow wrapping aictrl run shows step conclusion: success while the actual work (review, agent invocation, etc.) silently failed.
  • Fix: set process.exitCode = 1 when session.error fires for the primary session (sub-agent failures keep their existing behavior). Uses exitCode rather than process.exit() so session_complete still emits before termination.

Root cause

packages/cli/src/cli/cmd/run.ts:559-572 — the handler block only updated the local error accumulator string and called UI.error(). It did not propagate to the process exit code, so the success branch of loopDone.then(...) (line 703) ran on every error.

Trace for the auth-failure case observed in production:

  1. Provider returns 401/403.
  2. session/processor.ts:388 publishes session.error, sets status idle.
  3. run.ts:559 handler logs but doesn't touch exitCode.
  4. run.ts:614 session.status idle breaks the loop.
  5. run.ts:703 success branch — emits session_complete, returns.
  6. Process exits 0.

The neighboring error paths (promptResult.catch at line 768, loopDone.catch at line 721) already set exitCode = 1 for synchronous failures; the session.error event path was the only gap.

Reproduction (from aictrl-dev/aictrl#2058 investigation)

ZHIPU_API_KEY=invalid aictrl run --model zai-coding-plan/glm-5.1 -f some-skill.md -- "hello"
# stdout: "Error: Authentication Failed"
# echo $?      → 0   ← bug

After the fix: exit 1.

Test plan

  • New regression test (packages/cli/test/cli/run-session-error-exit.test.ts) — static lock asserting the handler block sets a non-zero exit code, mirroring the existing run-event-race.test.ts anti-pattern-lock style. Fails on origin/main (verified locally), passes on this branch.
  • Existing CLI command tests still pass (bun test test/cli/ — 69/69).
  • Existing tool/skill tests pass in isolation; full-suite flakes are pre-existing and unrelated (verified by stashing the change).
  • bun typecheck passes.

Acceptance criteria (from #70)

  • aictrl run with an invalid ZHIPU_API_KEY exits non-zero — flows through session.error for primary session → exitCode = 1
  • aictrl run with a network failure to the provider exits non-zero — same path (provider-side error → MessageV2.fromErrorSession.Event.Error)
  • Existing happy-path tests still pass — no change to the success branch
  • Regression test asserting non-zero exit on auth failure

Related

Closes #70

🤖 Generated with Claude Code

The `aictrl run` command's session.error event handler logged the error
and accumulated it into a local string but never set `process.exitCode`.
On auth/provider failures, the subsequent `session.status: idle` broke
the run loop into the success branch and the process exited with code 0
— while stdout printed "Authentication Failed" or similar.

This silently broke CI workflows wrapping `aictrl run` (observed on
aictrl-dev/aictrl's "AI Review" workflow on 2026-05-20): step
conclusion `success`, no actual work, badge stays green.

Fix: set `process.exitCode = 1` when the session.error event fires for
the primary session. Sub-agent failures keep their existing behavior of
flowing into the parent agent. Use `exitCode` (not `process.exit`) so
the event loop drains naturally — session_complete still emits before
the process terminates.

Regression test (test/cli/run-session-error-exit.test.ts) statically
asserts the handler block sets a non-zero exit code, mirroring the
existing run-event-race.test.ts style.

Closes #70

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@byapparov
Copy link
Copy Markdown
Contributor Author

Note on the failing `Aictrl AI Review` check: it's failing in the `Install Aictrl CLI` step (`npm install @aictrl/cli@latest` → `EUNSUPPORTEDPROTOCOL: Unsupported URL Type "workspace:": workspace:*`), not on this PR's code change. The `verify` check (typecheck + tests) passes.

The published `@aictrl/cli@0.3.3` manifest ships unresolved `workspace:*` deps that plain `npm install` can't handle. Same failure affects every PR opened on this repo since 0.3.3 published — filed as #72 with reproduction, root cause hypothesis, and acceptance criteria.

PR #71's change has been verified locally:

  • New regression test (`run-session-error-exit.test.ts`) fails on `origin/main`, passes on this branch.
  • `bun test test/cli/` — 69/69 pass.
  • `bun typecheck` — clean.

Suggest merging on `verify` ✓ alone, or waiting for #72 to land if you'd rather have AI Review running first.

@byapparov byapparov self-assigned this May 21, 2026
@byapparov byapparov merged commit 9d19eb5 into main May 21, 2026
1 of 2 checks passed
@byapparov byapparov deleted the 70-exit-on-auth-failure branch May 21, 2026 20:35
byapparov added a commit that referenced this pull request May 21, 2026
v0.3.4 release (#71 + #72 fixes) couldn't actually publish @aictrl/cli
because the publish workflow doesn't auto-stamp AICTRL_VERSION into
the main package.json `version` fields — only platform binaries get
the release version. On v0.3.4 the workflow tolerated "already-published"
errors for cli 0.3.3 / util 1.2.16 / sdk 0.1.2 and the smoke test
correctly caught it (ETARGET: No matching version found for
@aictrl/cli@0.3.4).

Bump cli and util so v0.3.5 actually publishes new versions carrying
the resolver + exit-code fixes. Skip sdk: its catalog: deps are only
in devDependencies, ignored by npm consumers, so 0.1.2 is functionally
installable.

Follow-up worth filing: publish.yml should stamp AICTRL_VERSION into
all package manifests, not just platform binaries, so future releases
don't need manual version bumps.

Refs #71 #72

Co-authored-by: Bulat Yapparov <by4pparov@yandex.ru>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix: aictrl run exits 0 on auth failure, masking broken CI workflows

1 participant