Skip to content

Honor fidelity in burn compare (#95)#135

Open
willwashburn wants to merge 1 commit intomainfrom
feat/compare-honor-fidelity-95
Open

Honor fidelity in burn compare (#95)#135
willwashburn wants to merge 1 commit intomainfrom
feat/compare-honor-fidelity-95

Conversation

@willwashburn
Copy link
Copy Markdown
Member

@willwashburn willwashburn commented Apr 26, 2026

Summary

  • Wire the summarizeFidelity / hasMinimumFidelity helpers from @relayburn/analyze (shipped in 0.14.0 by #41, #76) into burn compare so the aggregate stops silently averaging aggregate-only / cost-only / partial turns with full-fidelity peers from the same model — the "looks more confident than it should" failure mode Coverage and fidelity metadata: distinguish missing, zero, aggregate-only, and partial usage #41 called out.
  • Default minimum is usage-only. Records emitted before TurnRecord.fidelity existed (pre-Coverage and fidelity metadata: distinguish missing, zero, aggregate-only, and partial usage #41 ledgers) still pass for backward compat. New --fidelity <class> flag overrides the floor; --include-partial is shorthand for --fidelity partial and includes every turn.
  • TTY output gains an excluded N turns below <class> fidelity (… aggregate-only, … cost-only, … partial) coverage line whenever the gate dropped anything. Per-model totals render instead of $0.00 when a model survived the filter with zero turns.
  • JSON gains a top-level fidelity block ({ minimum, excluded, summary }) computed against the unfiltered slice so consumers can render their own coverage UI.

Implementation notes

  • runCompare now takes an optional CompareDeps (ingestAll, queryAll, loadPricing), parallel to runLimits. The dispatch site in cli.ts is unchanged — defaults pick up the production implementations. This is what enables the new in-process tests.
  • The FidelityClass ordering in analyze/src/fidelity.ts puts partial strictly above aggregate-only / cost-only, so a literal hasMinimumFidelity(t.fidelity, 'partial') predicate would still drop those two buckets. The issue spec explicitly wants --fidelity partial (and --include-partial) to mean "no filtering at all", so that level is implemented as a runtime bypass rather than an order-table change. Documented in the inline comments next to the predicate and computeExcluded.

Rebase consideration for PR #130

#130 (issue #88) is migrating burn compare to read from archive.sqlite. This PR targets the in-memory path on main. When #130 merges, the rebase needs to apply the same fidelity gating to the SQL-aggregate path — either filter the archive.sqlite query by attribution_fidelity (preferred), or hydrate enough rows to apply hasMinimumFidelity in TS the way this PR does. The exclusion summary, JSON fidelity block, and "excluded N turns" coverage note all need to ride along through whichever path the rebase keeps. The --fidelity / --include-partial flag parsing in runCompare is path-agnostic and should survive verbatim.

Test plan

  • pnpm run build passes
  • pnpm run test:ts passes (525 tests, was 514)
  • New packages/cli/src/commands/compare.test.ts covers:
    • default exclusion of aggregate-only / cost-only / partial
    • backward-compat passthrough of fidelity === undefined
    • --fidelity full strict mode
    • --fidelity partial / --include-partial "include everything"
    • --include-partial + conflicting --fidelity exits 2
    • unknown --fidelity value exits 2
    • JSON fidelity block shape (minimum, excluded, summary)
    • rendered "excluded N turns" coverage note + breakdown by reason
    • singular/plural wording
    • cells collapsing to zero turns post-filter render , not $0.00 / 0%

Refs


Open in Devin Review

Wire the `summarizeFidelity` / `hasMinimumFidelity` helpers from
`@relayburn/analyze` (shipped in 0.14.0 by #41) into `burn compare` so the
aggregate stops silently averaging `aggregate-only` / `cost-only` / `partial`
turns with full-fidelity peers from the same model — the "looks more
confident than it should" failure mode #41 called out.

- Default minimum is `usage-only`. Records emitted before TurnRecord.fidelity
  existed (pre-#41 ledgers) still pass for backward compat.
- `--fidelity <class>` overrides the floor; `--include-partial` is shorthand
  for `--fidelity partial` and includes every turn. Conflicting combinations
  exit 2 with a clear message.
- TTY output adds an `excluded N turns below <class> fidelity (… aggregate-
  only, … cost-only, … partial)` coverage line whenever the gate dropped
  anything. Per-model totals render `—` instead of `$0.00` when a model
  survived the filter with zero turns.
- JSON gains a top-level `fidelity` block (`{ minimum, excluded, summary }`)
  computed against the unfiltered slice so consumers can render their own
  coverage UI.
- `runCompare` now accepts an optional `CompareDeps` for ingest/query/pricing
  injection (parallels `runLimits`), enabling deterministic CLI tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.

Open in Devin Review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

burn compare: honor fidelity (default-exclude aggregate-only/cost-only, annotate, --fidelity flag)

1 participant