Drop --max-budget-usd flag for claude backend (Max-plan subscription)

## Summary

The `--max-budget-usd` flag passed to every `claude --print` subprocess (inner Claude session, stage-1 extraction, stage-2 episode generation) enforces a computed-cost ceiling that has no relation to actual billing for operators on a Max-plan OAuth subscription. Sessions get terminated at $10 of phantom cost despite no money being charged, penalizing subscription users for "spending" money they aren't spending.

`agent_backend == "claude"` is already a single source of truth: in Kai's deployment shape, anyone using the Claude Code CLI is on a Max-plan subscription. (Running the Claude Code CLI under API-key billing is an unusual setup that has no real reason to use Kai over the Anthropic SDK directly.) This issue proposes formalizing that semantic and dropping the budget-flag emission for the claude backend.

## Current behavior

Three call sites emit `--max-budget-usd` to `claude --print`:

1. The inner Claude session subprocess (driven by `BUDGET_CEILING`, default `$10`).
2. Stage-1 memory extraction (driven by `MEMORY_EXTRACTION_BUDGET_USD`, default `$0.01`, production-tuned to `$0.05`+).
3. Stage-2 episode generation (driven by `MEMORY_EPISODE_BUDGET_USD`, default `$0.15`).

For Max-plan OAuth users, the CLI tracks computed cost-per-token regardless of billing mode and exits with `Reached maximum budget ($N)` when crossed. The user sees a session terminate, runs `/new` to recover, but no actual money was charged — the ceiling enforced a phantom cost.

## Proposed behavior

When `agent_backend == "claude"`, omit `--max-budget-usd` from all three call sites. Runaway-loop protection remains in place via the existing per-call timeouts (extraction 90s, episode 120s, inner Claude `claude_timeout_seconds`). The dollar-amount env vars (`BUDGET_CEILING`, `MEMORY_EXTRACTION_BUDGET_USD`, `MEMORY_EPISODE_BUDGET_USD`) stay defined; the flag-emission gate is what changes.

For other backends (currently no production users; reserved for future API-key-billed setups), keep the existing flag-emission and ceiling behavior. The dollar-amount env vars continue to mean what they mean today on those backends.

## Why omit the flag rather than pass a sentinel

Passing `--max-budget-usd 999999.0` works but leaks a magic number into the log and the subprocess argv. Omitting the flag entirely is cleaner — the CLI's default behavior (no cap) is exactly what we want for subscription users. The `claude --help` output documents `--max-budget-usd` as optional, and absence means "no limit."

## Acceptance criteria

- For `agent_backend == "claude"`, no `--max-budget-usd` flag is present in the subprocess argv for any of the three call sites.
- For non-claude backends, the flag is emitted with the configured dollar value as today.
- The dollar-amount env vars (`BUDGET_CEILING`, `MEMORY_EXTRACTION_BUDGET_USD`, `MEMORY_EPISODE_BUDGET_USD`) remain defined in `Config` and `load_config()` so operators who hand-edited their env files don't see a startup error. Wizard continues to prompt for them under non-claude backends; for the claude backend, the prompts can either be hidden (preferred) or kept with a note that the value is informational only.
- Tests pin both behaviors: a `agent_backend="claude"` test asserts the flag is absent; a `agent_backend="goose"` (or other) test asserts the flag is present with the configured value.
- Existing runaway-loop protection (per-call timeouts) is verified unchanged.

## Why now

Operators on Max-plan have hit `BUDGET_CEILING` recently and seen sessions terminate for no operational cost. The recent error-recovery PR makes the failure visible (real error string + recovery directive) but doesn't address the underlying "the ceiling shouldn't fire at all for subscription users" question.

## Out of scope

- Auto-detection of billing mode at runtime (e.g., probing whether the CLI returns a billed cost). Brittle; explicit operator config via the existing `agent_backend` setting is the safer shape.
- Changing the runaway-loop protection mechanism. Per-call timeouts already cover this; no need to redesign.
- Adding a separate `CLAUDE_BILLING_MODE` env var. The `agent_backend` setting is sufficient.
- Token-count-based ceilings as a substitute for dollar-amount ceilings. Could be a future evaluation if Anthropic exposes a `--max-tokens` flag, but unrelated to this fix.
- Changing the default values of the existing dollar-amount env vars on non-claude backends.

## Notes

The semantic decision being formalized here is "Kai's claude backend is Max-plan-only by design." That commits us to NOT supporting Kai with `claude --print` under API-key billing on the claude backend going forward. Anyone with that hypothetical setup would need to fork or open a follow-up issue. Worth recording the decision explicitly so it isn't surprising later.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Drop --max-budget-usd flag for claude backend (Max-plan subscription) #390

Summary

Current behavior

Proposed behavior

Why omit the flag rather than pass a sentinel

Acceptance criteria

Why now

Out of scope

Notes

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Drop --max-budget-usd flag for claude backend (Max-plan subscription) #390

Description

Summary

Current behavior

Proposed behavior

Why omit the flag rather than pass a sentinel

Acceptance criteria

Why now

Out of scope

Notes

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions