Skip to content

Fix: aictrl run exits 0 on auth failure, masking broken CI workflows #70

@byapparov

Description

@byapparov

Context

aictrl run returns exit code 0 even when the underlying model provider rejects authentication, causing CI workflows that wrap it to be marked green while doing nothing. This silently broke aictrl-dev/aictrl's "AI Review" workflow (.github/workflows/opencode.yml) for ~6 hours on 2026-05-20 — the workflow step showed conclusion: success on every PR while no review was actually being posted, hiding the breakage until someone went looking.

Reproduction

  1. Set ZHIPU_API_KEY to an invalid or expired value
  2. Run aictrl run --model zai-coding-plan/glm-5.1 -f <some-skill.md> -- "<prompt>"
  3. Observe stdout: Error: Authentication Failed
  4. Observe exit code: echo $?0

Expected: non-zero exit (1 for runtime error, or a structured code per #4 if the convention is being adopted)
Actual: 0 — CI step is marked successful

Evidence

GitHub Actions run https://github.com/aictrl-dev/aictrl/actions/runs/26184653129 (PR aictrl-dev/aictrl#2061, 2026-05-20):

2026-05-20T19:21:14.5Z  Starting review with GLM-5.1 for SHA c5ccfcf3...
2026-05-20T19:21:17.0Z  Error: Authentication Failed
2026-05-20T19:21:17.0Z  Post job cleanup.

Per-step conclusion: success. No review posted to the PR.

Same pattern affected 5+ PRs in aictrl-dev/aictrl between 14:55 and 21:00 UTC on 2026-05-20 — the workflow ran, "succeeded", and never posted a review. The provider-side cause (expired ZHIPUAI_API_KEY secret) is being fixed separately by rotating the key; this issue is about the CLI's failure to surface that failure to its caller.

Impact

  • Any CI workflow that wraps aictrl run is vulnerable to silent rot the moment an API key expires, rotates, or hits a quota wall.
  • Workflow status badges show green while the actual work isn't happening — a worse failure mode than a loud red, because nobody notices.
  • Operators have to dig into per-step stdout to discover the failure; the workflow-level signal is actively misleading.

Proposed Fix

aictrl run should exit non-zero when:

  • The model provider returns an auth/permission error (401, 403, "Authentication Failed", etc.)
  • The model session never starts because of a provider error
  • The model returns no content because of an upstream error

Specifically, the "Authentication Failed" branch in the provider-client error handler should propagate to process.exit(N) where N is non-zero. If #4 lands a structured exit-code convention first, align with it; otherwise 1 is a reasonable baseline that unblocks CI today.

Acceptance Criteria

  • aictrl run with an invalid ZHIPU_API_KEY exits non-zero
  • aictrl run with a network failure to the provider exits non-zero
  • Existing happy-path tests still pass (exit 0 when model returns content)
  • Regression test asserting non-zero exit on auth failure (mock the provider client)

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No fields configured for Bug.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions