
Handle GitHub billing limit errors in CI check steps without triggering retry loops #166

@Trecek

Problem

Related: #108 (CI status check and auto-remediation after PR creation)

When #108 lands, recipes will have ci_watch → resolve_ci → re_push → ci_watch retry loops for CI failures. This works for genuine test/lint failures, but a GitHub billing/spending limit error is fundamentally different:

  • CI failures are actionable — resolve-failures can diagnose and fix them.
  • Billing limit errors are not actionable by any skill — they require waiting or human intervention (increasing the spending limit).

A billing limit hit would cause gh run watch --exit-status to fail, either because the run never starts or because the run itself fails with a billing-related cancellation. That failure would route to resolve_ci, which would attempt to "fix" an unfixable problem, re-push, and re-watch, burning tokens on a loop that can never succeed. With retries: 2 on resolve_ci, that is 3 full cycles before on_exhausted finally stops it.

Desired Behavior

  1. Detection: The ci_watch step (or a wrapper around it) should distinguish billing/spending limit errors from genuine CI failures. Indicators include:

    • Workflow runs cancelled or not started due to spending limits
    • gh run list returning runs with conclusion: cancelled and billing-related annotations
    • GitHub API 402/403 responses referencing spending limits
  2. Stop and escalate to human: Billing limit errors should bypass the resolve_ci retry loop entirely. The orchestrator should:

    • Stop the pipeline (no retries, no fix attempts)
    • Surface a clear message explaining the billing limit was hit
    • Wait for human input on how to proceed (e.g. increase the spending limit, skip CI, or abort)

    This is distinct from escalate_stop (which terminates) — the orchestrator should pause and hold context so the human can resume the pipeline after resolving the billing issue.

  3. No retry loops: The error must never be routed to resolve_ci or any fix skill.
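The detection and routing rules above could be sketched roughly as follows. This is only an illustration, not the project's implementation: `CiOutcome`, `classify_ci_result`, `next_step`, the `pause_for_human` and `done` step names, and the marker phrases are all hypothetical. In particular, the exact wording GitHub uses in billing-related annotations should be verified against real runs before relying on substring matching.

```python
from enum import Enum
from typing import Iterable


class CiOutcome(Enum):
    PASSED = "passed"
    FIXABLE_FAILURE = "fixable_failure"
    BILLING_LIMIT = "billing_limit"


# ASSUMPTION: marker phrases are illustrative; check them against the
# annotation text GitHub actually returns for billing-blocked runs.
BILLING_MARKERS = ("spending limit", "payments have failed")


def classify_ci_result(conclusion: str, annotations: Iterable[str]) -> CiOutcome:
    """Separate billing-limit stops from genuine test/lint failures."""
    if conclusion == "success":
        return CiOutcome.PASSED
    text = " ".join(annotations).lower()
    if any(marker in text for marker in BILLING_MARKERS):
        return CiOutcome.BILLING_LIMIT
    return CiOutcome.FIXABLE_FAILURE


def next_step(outcome: CiOutcome) -> str:
    """Map an outcome to the next recipe step.

    'resolve_ci' comes from the issue text; 'pause_for_human' and 'done'
    are hypothetical step names used only for this sketch.
    """
    if outcome is CiOutcome.BILLING_LIMIT:
        return "pause_for_human"  # never routed to resolve_ci
    if outcome is CiOutcome.FIXABLE_FAILURE:
        return "resolve_ci"
    return "done"
```

Classifying before routing keeps the billing check in one place, so no retry counter is consumed on a condition that no fix skill can resolve.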

Context

We already handle Anthropic API quota limits (execution/quota.py, hooks/quota_check.py) with a sleep-and-resume strategy. GitHub billing limits are different — they don't reset on a short timer, and there's no API to check remaining Actions minutes proactively. This needs a detect-stop-and-ask approach rather than detect-and-wait.
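One way to picture detect-stop-and-ask is to persist the pipeline state when the billing limit is detected and to resume only on an explicit human decision. The sketch below is a guess at the shape, not the orchestrator's real API: `PausedPipeline`, `pause`, `resume`, the JSON file format, the decision strings, and the `after_ci` step name are all hypothetical. Only `ci_watch` and `escalate_stop` come from the issue text.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class PausedPipeline:
    recipe: str
    step: str      # step that was running when the limit was hit
    reason: str    # message surfaced to the human
    context: dict  # whatever the orchestrator needs in order to resume


def pause(state: PausedPipeline, path: Path) -> None:
    """Persist state so the pipeline can be resumed later, with no timer."""
    path.write_text(json.dumps(asdict(state)))


def resume(path: Path, decision: str) -> str:
    """Return the step to run next, based on the human's decision."""
    state = PausedPipeline(**json.loads(path.read_text()))
    if decision == "retry":    # human raised the spending limit
        return state.step
    if decision == "skip_ci":  # hypothetical name for the step following CI
        return "after_ci"
    if decision == "abort":
        return "escalate_stop"
    raise ValueError(f"unknown decision: {decision!r}")
```

Unlike the sleep-and-resume strategy used for API quota limits, nothing here wakes up on its own: the pipeline stays parked until a person picks one of the options.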

Acceptance Criteria

  • ci_watch step can distinguish billing limit errors from test/lint failures
  • Billing limit errors halt the pipeline and escalate to human intervention
  • Orchestrator preserves context so the human can resume after resolving the billing issue
  • Billing errors never route to resolve_ci or trigger retry loops
  • Integration with #108's ci_watch → resolve_ci → re_push loop

Pattern Mining Corroboration (2026-03-08)

Mining 736 run_cmd invocations across 47 sessions found 39 total CI-related queries using raw gh commands with jq parsing. The get_ci_status and wait_for_ci MCP tools proposed in #237 would expose billing cancellation as a distinct conclusion value, enabling early-exit rather than the current retry loop on unfixable conditions.

Cross-ref: #237 (MCP tool candidates)
