burn limits: honor fidelity (mark forecasts low-confidence on partial usage) #105

@willwashburn

Context

PR #76 (the first cut of #41) shipped summarizeFidelity and hasMinimumFidelity in @relayburn/analyze but stopped short of wiring those helpers into the commands that consume the ledger. Quoting PR #76's deferred-work paragraph:

burn compare, burn waste, burn limits, burn plans behavior gating on fidelity class — the helpers (hasMinimumFidelity, summarizeFidelity) are now in place; wiring them into each command is the natural follow-up.

burn limits reads ledger turns via loadForecastFromLedger (packages/cli/src/commands/limits.ts:46) to project tokensSoFar against the active 5-hour window. Today it sums tokens regardless of whether each contributing turn actually has reliable token data: a Codex turn missing token_count lands in the ledger with usage.input === 0 and usage.output === 0, which silently deflates tokensSoFar and skews the forecast toward "lots of headroom left." Quoting #41:

burn limits / burn plans (#5, #39) … should permit partial usage data where enough exists for spend totals … should mark projections as low-confidence when the underlying fidelity is partial.
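The skew can be made concrete with a small arithmetic sketch. The Usage shape and the WINDOW_LIMIT budget below are hypothetical stand-ins for illustration only; they are not the real TurnRecord type or an actual plan limit. The point is that a turn ledgered as zeros shrinks tokensSoFar and so inflates apparent headroom.

```typescript
// Hypothetical minimal usage shape; the real ledger TurnRecord carries more fields.
interface Usage {
  input: number;
  output: number;
}

// Naive sum: every windowed turn contributes, whether or not its token
// counts were actually recorded.
function tokensSoFar(turns: Usage[]): number {
  return turns.reduce((acc, u) => acc + u.input + u.output, 0);
}

// Hypothetical window budget, purely for illustration.
const WINDOW_LIMIT = 100_000;

// Both turns reported their token counts.
const recorded: Usage[] = [
  { input: 20_000, output: 5_000 },
  { input: 30_000, output: 10_000 },
];

// Same real consumption, but the second turn came from a Codex session
// without token_count, so it was ledgered as zeros.
const withMissing: Usage[] = [
  { input: 20_000, output: 5_000 },
  { input: 0, output: 0 },
];

console.log(WINDOW_LIMIT - tokensSoFar(recorded)); // 35000
console.log(WINDOW_LIMIT - tokensSoFar(withMissing)); // 75000
```

The forecast still computes a number in both cases, which is why the proposal flags low confidence rather than refusing.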

Proposal

In packages/cli/src/commands/limits.ts (and the loadForecastFromLedger helper it calls):

  1. Permissive filter. limits may consume partial and aggregate-only data: token and cost totals still mean something even when per-turn detail is fuzzy. Do not default-exclude turns the way compare does; use the entire slice covered by the active 5-hour window.
  2. Track contributing fidelity. Run summarizeFidelity over the windowed slice and compute a confidence flag: high when every contributing turn has class === 'full' or 'usage-only' (with both hasInputTokens and hasOutputTokens true); low when any contributing turn is partial, aggregate-only, cost-only, or unknown.
  3. Surface confidence in the rendered output. When confidence === 'low', append a notice to the human-readable line: forecast: low-confidence (N of M contributing turns lack per-turn token data). The forecast number itself is unchanged — we are not refusing, we are flagging.
  4. JSON contract. Add a fidelity block to the --json payload: { confidence: 'high' | 'low', summary: FidelitySummary }. Reuse the same shape as summary --json so programmatic consumers don't have to learn a second schema.
  5. --watch mode. Recompute on each tick; the confidence flag may flip mid-window as new full-fidelity turns arrive.
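Steps 1 through 4 could look roughly like the sketch below. Everything here is hypothetical: FidelityClass, TurnFidelity, ClassCounts, assessFidelity, renderForecast, and toJsonPayload are illustrative names, and ClassCounts is only a stand-in for the real FidelitySummary from @relayburn/analyze, whose shape may differ. The sketch also applies the hasInputTokens/hasOutputTokens check to both 'full' and 'usage-only', which is one reading of step 2.

```typescript
// Hypothetical stand-ins; the real TurnRecord fidelity fields and the
// FidelitySummary shape in @relayburn/analyze may differ.
type FidelityClass =
  | 'full'
  | 'usage-only'
  | 'partial'
  | 'aggregate-only'
  | 'cost-only'
  | 'unknown';

interface TurnFidelity {
  class: FidelityClass;
  hasInputTokens: boolean;
  hasOutputTokens: boolean;
}

// Stand-in summary: number of contributing turns per fidelity class.
type ClassCounts = Record<FidelityClass, number>;

type Confidence = 'high' | 'low';

interface ForecastFidelity {
  confidence: Confidence;
  lacking: number; // contributing turns without per-turn token data
  total: number; // all contributing turns in the 5-hour window
  summary: ClassCounts;
}

// Step 2 rule (as read here): a turn supports a high-confidence forecast only
// when it is 'full' or 'usage-only' and reports both input and output tokens.
function hasPerTurnTokens(f: TurnFidelity): boolean {
  return (
    (f.class === 'full' || f.class === 'usage-only') &&
    f.hasInputTokens &&
    f.hasOutputTokens
  );
}

// Steps 1-2: keep every windowed turn (no exclusion), then classify each one.
// An empty window reports 'high' because no contributing turn lacks data;
// that edge case is an assumption, not something this issue specifies.
function assessFidelity(turns: TurnFidelity[]): ForecastFidelity {
  const summary: ClassCounts = {
    full: 0,
    'usage-only': 0,
    partial: 0,
    'aggregate-only': 0,
    'cost-only': 0,
    unknown: 0,
  };
  let lacking = 0;
  for (const f of turns) {
    summary[f.class] += 1;
    if (!hasPerTurnTokens(f)) lacking += 1;
  }
  return {
    confidence: lacking === 0 ? 'high' : 'low',
    lacking,
    total: turns.length,
    summary,
  };
}

// Step 3: the forecast text is left untouched; the notice is appended only
// when confidence is low. The two-space separator is an arbitrary choice.
function renderForecast(line: string, fid: ForecastFidelity): string {
  if (fid.confidence === 'high') return line;
  return `${line}  forecast: low-confidence (${fid.lacking} of ${fid.total} contributing turns lack per-turn token data)`;
}

// Step 4: the --json payload gains a `fidelity` block next to the forecast fields.
function toJsonPayload(forecast: Record<string, unknown>, fid: ForecastFidelity) {
  return {
    ...forecast,
    fidelity: { confidence: fid.confidence, summary: fid.summary },
  };
}
```

Because assessFidelity is a pure function of the windowed turns, --watch (step 5) would only need to call it again on each tick's slice; the flag flips back to high once the window no longer contains any turn lacking token data.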

Acceptance criteria

  • burn limits continues to render a forecast even when the windowed slice contains partial or aggregate-only turns (no refusal).
  • The rendered output shows a low-confidence notice when any contributing turn lacks per-turn token coverage; full-fidelity windows show no notice (suppressed in the all-full common case to avoid noise, matching summary's behavior).
  • --json emits a fidelity block with confidence and the underlying FidelitySummary.
  • --watch re-evaluates confidence on each tick.
  • New tests in packages/cli/src/commands/limits.test.ts cover: high-confidence (all full), low-confidence (one partial turn), and the JSON shape.

Out of scope

  • The OAuth usage endpoint side of limits (Anthropic-side window data). That is independent of TurnRecord.fidelity and stays unchanged.
  • Confidence intervals / probabilistic forecasts. The flag is binary high / low for the first cut.

Refs
