Skip to content

Drop --max-budget-usd flag for claude backend (Max-plan subscription) #390

@dcellison

Description

@dcellison

Summary

The --max-budget-usd flag passed to every claude --print subprocess (inner Claude session, stage-1 extraction, stage-2 episode generation) enforces a computed-cost ceiling that has no relation to actual billing for operators on a Max-plan OAuth subscription. Sessions get terminated at $10 of phantom cost despite no money being charged, penalizing subscription users for "spending" money they aren't spending.

agent_backend == "claude" is already a single source of truth: in Kai's deployment shape, anyone using the Claude Code CLI is on a Max-plan subscription. (Running the Claude Code CLI under API-key billing is an unusual setup that has no real reason to use Kai over the Anthropic SDK directly.) This issue proposes formalizing that semantic and dropping the budget-flag emission for the claude backend.

Current behavior

Three call sites emit --max-budget-usd to claude --print:

  1. The inner Claude session subprocess (driven by BUDGET_CEILING, default $10).
  2. Stage-1 memory extraction (driven by MEMORY_EXTRACTION_BUDGET_USD, default $0.01, production-tuned to $0.05+).
  3. Stage-2 episode generation (driven by MEMORY_EPISODE_BUDGET_USD, default $0.15).

For Max-plan OAuth users, the CLI tracks computed cost-per-token regardless of billing mode and exits with Reached maximum budget ($N) when crossed. The user sees a session terminate, runs /new to recover, but no actual money was charged — the ceiling enforced a phantom cost.

Proposed behavior

When agent_backend == "claude", omit --max-budget-usd from all three call sites. Runaway-loop protection remains in place via the existing per-call timeouts (extraction 90s, episode 120s, inner Claude claude_timeout_seconds). The dollar-amount env vars (BUDGET_CEILING, MEMORY_EXTRACTION_BUDGET_USD, MEMORY_EPISODE_BUDGET_USD) stay defined; the flag-emission gate is what changes.

For other backends (currently no production users; reserved for future API-key-billed setups), keep the existing flag-emission and ceiling behavior. The dollar-amount env vars continue to mean what they mean today on those backends.

Why omit the flag rather than pass a sentinel

Passing --max-budget-usd 999999.0 works but leaks a magic number into the log and the subprocess argv. Omitting the flag entirely is cleaner — the CLI's default behavior (no cap) is exactly what we want for subscription users. The claude --help output documents --max-budget-usd as optional, and absence means "no limit."

Acceptance criteria

  • For agent_backend == "claude", no --max-budget-usd flag is present in the subprocess argv for any of the three call sites.
  • For non-claude backends, the flag is emitted with the configured dollar value as today.
  • The dollar-amount env vars (BUDGET_CEILING, MEMORY_EXTRACTION_BUDGET_USD, MEMORY_EPISODE_BUDGET_USD) remain defined in Config and load_config() so operators who hand-edited their env files don't see a startup error. Wizard continues to prompt for them under non-claude backends; for the claude backend, the prompts can either be hidden (preferred) or kept with a note that the value is informational only.
  • Tests pin both behaviors: a agent_backend="claude" test asserts the flag is absent; a agent_backend="goose" (or other) test asserts the flag is present with the configured value.
  • Existing runaway-loop protection (per-call timeouts) is verified unchanged.

Why now

Operators on Max-plan have hit BUDGET_CEILING recently and seen sessions terminate for no operational cost. The recent error-recovery PR makes the failure visible (real error string + recovery directive) but doesn't address the underlying "the ceiling shouldn't fire at all for subscription users" question.

Out of scope

  • Auto-detection of billing mode at runtime (e.g., probing whether the CLI returns a billed cost). Brittle; explicit operator config via the existing agent_backend setting is the safer shape.
  • Changing the runaway-loop protection mechanism. Per-call timeouts already cover this; no need to redesign.
  • Adding a separate CLAUDE_BILLING_MODE env var. The agent_backend setting is sufficient.
  • Token-count-based ceilings as a substitute for dollar-amount ceilings. Could be a future evaluation if Anthropic exposes a --max-tokens flag, but unrelated to this fix.
  • Changing the default values of the existing dollar-amount env vars on non-claude backends.

Notes

The semantic decision being formalized here is "Kai's claude backend is Max-plan-only by design." That commits us to NOT supporting Kai with claude --print under API-key billing on the claude backend going forward. Anyone with that hypothetical setup would need to fork or open a follow-up issue. Worth recording the decision explicitly so it isn't surprising later.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingenhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions