Summary
The --max-budget-usd flag passed to every claude --print subprocess (inner Claude session, stage-1 extraction, stage-2 episode generation) enforces a computed-cost ceiling that has no relation to actual billing for operators on a Max-plan OAuth subscription. Sessions get terminated at $10 of phantom cost despite no money being charged, penalizing subscription users for "spending" money they aren't spending.
agent_backend == "claude" is already a single source of truth: in Kai's deployment shape, anyone using the Claude Code CLI is on a Max-plan subscription. (Running the Claude Code CLI under API-key billing is an unusual setup that has no real reason to use Kai over the Anthropic SDK directly.) This issue proposes formalizing that semantic and dropping the budget-flag emission for the claude backend.
Current behavior
Three call sites emit --max-budget-usd to claude --print:
- The inner Claude session subprocess (driven by
BUDGET_CEILING, default $10).
- Stage-1 memory extraction (driven by
MEMORY_EXTRACTION_BUDGET_USD, default $0.01, production-tuned to $0.05+).
- Stage-2 episode generation (driven by
MEMORY_EPISODE_BUDGET_USD, default $0.15).
For Max-plan OAuth users, the CLI tracks computed cost-per-token regardless of billing mode and exits with Reached maximum budget ($N) when crossed. The user sees a session terminate, runs /new to recover, but no actual money was charged — the ceiling enforced a phantom cost.
Proposed behavior
When agent_backend == "claude", omit --max-budget-usd from all three call sites. Runaway-loop protection remains in place via the existing per-call timeouts (extraction 90s, episode 120s, inner Claude claude_timeout_seconds). The dollar-amount env vars (BUDGET_CEILING, MEMORY_EXTRACTION_BUDGET_USD, MEMORY_EPISODE_BUDGET_USD) stay defined; the flag-emission gate is what changes.
For other backends (currently no production users; reserved for future API-key-billed setups), keep the existing flag-emission and ceiling behavior. The dollar-amount env vars continue to mean what they mean today on those backends.
Why omit the flag rather than pass a sentinel
Passing --max-budget-usd 999999.0 works but leaks a magic number into the log and the subprocess argv. Omitting the flag entirely is cleaner — the CLI's default behavior (no cap) is exactly what we want for subscription users. The claude --help output documents --max-budget-usd as optional, and absence means "no limit."
Acceptance criteria
- For
agent_backend == "claude", no --max-budget-usd flag is present in the subprocess argv for any of the three call sites.
- For non-claude backends, the flag is emitted with the configured dollar value as today.
- The dollar-amount env vars (
BUDGET_CEILING, MEMORY_EXTRACTION_BUDGET_USD, MEMORY_EPISODE_BUDGET_USD) remain defined in Config and load_config() so operators who hand-edited their env files don't see a startup error. Wizard continues to prompt for them under non-claude backends; for the claude backend, the prompts can either be hidden (preferred) or kept with a note that the value is informational only.
- Tests pin both behaviors: a
agent_backend="claude" test asserts the flag is absent; a agent_backend="goose" (or other) test asserts the flag is present with the configured value.
- Existing runaway-loop protection (per-call timeouts) is verified unchanged.
Why now
Operators on Max-plan have hit BUDGET_CEILING recently and seen sessions terminate for no operational cost. The recent error-recovery PR makes the failure visible (real error string + recovery directive) but doesn't address the underlying "the ceiling shouldn't fire at all for subscription users" question.
Out of scope
- Auto-detection of billing mode at runtime (e.g., probing whether the CLI returns a billed cost). Brittle; explicit operator config via the existing
agent_backend setting is the safer shape.
- Changing the runaway-loop protection mechanism. Per-call timeouts already cover this; no need to redesign.
- Adding a separate
CLAUDE_BILLING_MODE env var. The agent_backend setting is sufficient.
- Token-count-based ceilings as a substitute for dollar-amount ceilings. Could be a future evaluation if Anthropic exposes a
--max-tokens flag, but unrelated to this fix.
- Changing the default values of the existing dollar-amount env vars on non-claude backends.
Notes
The semantic decision being formalized here is "Kai's claude backend is Max-plan-only by design." That commits us to NOT supporting Kai with claude --print under API-key billing on the claude backend going forward. Anyone with that hypothetical setup would need to fork or open a follow-up issue. Worth recording the decision explicitly so it isn't surprising later.
Summary
The
--max-budget-usdflag passed to everyclaude --printsubprocess (inner Claude session, stage-1 extraction, stage-2 episode generation) enforces a computed-cost ceiling that has no relation to actual billing for operators on a Max-plan OAuth subscription. Sessions get terminated at $10 of phantom cost despite no money being charged, penalizing subscription users for "spending" money they aren't spending.agent_backend == "claude"is already a single source of truth: in Kai's deployment shape, anyone using the Claude Code CLI is on a Max-plan subscription. (Running the Claude Code CLI under API-key billing is an unusual setup that has no real reason to use Kai over the Anthropic SDK directly.) This issue proposes formalizing that semantic and dropping the budget-flag emission for the claude backend.Current behavior
Three call sites emit
--max-budget-usdtoclaude --print:BUDGET_CEILING, default$10).MEMORY_EXTRACTION_BUDGET_USD, default$0.01, production-tuned to$0.05+).MEMORY_EPISODE_BUDGET_USD, default$0.15).For Max-plan OAuth users, the CLI tracks computed cost-per-token regardless of billing mode and exits with
Reached maximum budget ($N)when crossed. The user sees a session terminate, runs/newto recover, but no actual money was charged — the ceiling enforced a phantom cost.Proposed behavior
When
agent_backend == "claude", omit--max-budget-usdfrom all three call sites. Runaway-loop protection remains in place via the existing per-call timeouts (extraction 90s, episode 120s, inner Claudeclaude_timeout_seconds). The dollar-amount env vars (BUDGET_CEILING,MEMORY_EXTRACTION_BUDGET_USD,MEMORY_EPISODE_BUDGET_USD) stay defined; the flag-emission gate is what changes.For other backends (currently no production users; reserved for future API-key-billed setups), keep the existing flag-emission and ceiling behavior. The dollar-amount env vars continue to mean what they mean today on those backends.
Why omit the flag rather than pass a sentinel
Passing
--max-budget-usd 999999.0works but leaks a magic number into the log and the subprocess argv. Omitting the flag entirely is cleaner — the CLI's default behavior (no cap) is exactly what we want for subscription users. Theclaude --helpoutput documents--max-budget-usdas optional, and absence means "no limit."Acceptance criteria
agent_backend == "claude", no--max-budget-usdflag is present in the subprocess argv for any of the three call sites.BUDGET_CEILING,MEMORY_EXTRACTION_BUDGET_USD,MEMORY_EPISODE_BUDGET_USD) remain defined inConfigandload_config()so operators who hand-edited their env files don't see a startup error. Wizard continues to prompt for them under non-claude backends; for the claude backend, the prompts can either be hidden (preferred) or kept with a note that the value is informational only.agent_backend="claude"test asserts the flag is absent; aagent_backend="goose"(or other) test asserts the flag is present with the configured value.Why now
Operators on Max-plan have hit
BUDGET_CEILINGrecently and seen sessions terminate for no operational cost. The recent error-recovery PR makes the failure visible (real error string + recovery directive) but doesn't address the underlying "the ceiling shouldn't fire at all for subscription users" question.Out of scope
agent_backendsetting is the safer shape.CLAUDE_BILLING_MODEenv var. Theagent_backendsetting is sufficient.--max-tokensflag, but unrelated to this fix.Notes
The semantic decision being formalized here is "Kai's claude backend is Max-plan-only by design." That commits us to NOT supporting Kai with
claude --printunder API-key billing on the claude backend going forward. Anyone with that hypothetical setup would need to fork or open a follow-up issue. Worth recording the decision explicitly so it isn't surprising later.