4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,10 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)

## [Unreleased]

### Changed

- **Coverage/fidelity command contract completed** (#41). Codex and OpenCode turns now carry normalized fidelity metadata, and downstream commands consume it: `burn compare` defaults to full/usage-only samples and reports fidelity exclusions, `burn summary` marks partial usage/cost fields instead of rendering unknowns as zero, `burn waste` refuses unsupported attribution with missing prerequisites, and plan/limit projections flag partial-fidelity confidence.

## [0.19.0] - 2026-04-26

### Added
4 changes: 2 additions & 2 deletions README.md
@@ -257,7 +257,7 @@ You can override per-call via `costForUsage(usage, model, pricing, { reasoningMo
```
burn summary [--since 7d] [--project <path>] [--session <id>] [--workflow <id>] [--agent <id>]
burn by-tool [--since 7d] [--project <path>] [--session <id>]
burn compare [--models a,b] [--since 7d] [--project <path>] [--session <id>] [--workflow <id>] [--agent <id>] [--min-sample <n>] [--json|--csv]
burn compare [--models a,b] [--since 7d] [--project <path>] [--session <id>] [--workflow <id>] [--agent <id>] [--min-sample <n>] [--include-partial] [--fidelity full,usage-only] [--json|--csv]
burn claude [--tag k=v ...] [-- <claude args>]
```

@@ -278,7 +278,7 @@ exploration 118 $0.013 — 52 $0.003 —

One-shot rate = `turns with edits and zero intra-turn retries / edit turns`. It's `—` for categories that don't produce edits (`exploration`, `brainstorming`, etc.). Missing-data cells render as `—`, never `$0.00` or `0%`.

This is observed data, not counterfactual: it tells you what happened when you actually used both models, not what *would have* happened if you'd picked differently. Cells with `turns < --min-sample` (default 5) are flagged as indicative; categories where only one model has data surface a coverage note beneath the table. The JSON cell shape exposes both `noData` (we never saw this combination) and `insufficientSample` (we have data but not much) so consumers can tell them apart cleanly.
This is observed data, not counterfactual: it tells you what happened when you actually used both models, not what *would have* happened if you'd picked differently. By default `burn compare` includes `full` and `usage-only` fidelity turns and excludes `partial`, `aggregate-only`, and `cost-only` turns; use `--include-partial` or `--fidelity <classes>` to opt in. Cells with `turns < --min-sample` (default 5) are flagged as indicative; categories where only one model has data surface a coverage note beneath the table. The JSON cell shape exposes both `noData` (we never saw this combination) and `insufficientSample` (we have data but not much) so consumers can tell them apart cleanly.

Standard filters apply: `--session <id>` limits to a single session, `--agent <id>` limits to a stamped agent ID, `--workflow <id>` to a stamped workflow ID, `--project <path>` to a project path or git-canonical projectKey.
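
A minimal sketch of how a JSON consumer might tell the two flags apart when rendering a cell. Only `noData`, `insufficientSample`, `turns`, and `costPerTurn` appear in this change; the `CellJson` interface, the `renderCost` helper, and the trailing `*` marker are hypothetical names chosen for illustration, not the shape the CLI actually emits.

```typescript
// Hypothetical subset of the JSON cell shape. The real cell likely carries
// more fields; this sketch only assumes the four listed here.
interface CellJson {
  noData: boolean; // this (model, category) combination was never observed
  insufficientSample: boolean; // observed, but fewer turns than --min-sample
  turns: number;
  costPerTurn: number | null; // null when cost could not be priced
}

// Render a cost cell: missing data becomes "—" (never "$0.00"), and thin
// samples keep their value but gain an assumed "*" marker.
function renderCost(cell: CellJson): string {
  if (cell.noData || cell.costPerTurn === null) return '—';
  const value = `$${cell.costPerTurn.toFixed(3)}`;
  return cell.insufficientSample ? `${value}*` : value;
}
```

Checking `costPerTurn === null` separately from `noData` keeps a partially covered cell from being printed as free.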

4 changes: 4 additions & 0 deletions packages/analyze/CHANGELOG.md
@@ -7,6 +7,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Changed

- **Fidelity-aware compare and plan calculations** (#41). `buildCompareTable` now defaults to `full` + `usage-only` turns, tracks included/excluded sample counts by fidelity class, skips cost/cache metrics when required coverage is missing, and exposes priced-turn totals so unknown cost is not rendered as free. `computePlanUsage` now reports costed/skipped/partial/unknown fidelity counts and marks projections `partialData` when spend relies on incomplete coverage.
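
As an illustration of why priced-turn totals matter, the sketch below averages cost over `pricedTurns` rather than `turns`. The `{ turns, pricedTurns, totalCost }` shape matches the `totals` entries in this change; the `ModelTotals` name, the `formatAverageCost` helper, and its output format are assumptions made for the example.

```typescript
// Mirrors the per-model totals shape from this change; the interface name
// itself is hypothetical.
interface ModelTotals {
  turns: number;
  pricedTurns: number;
  totalCost: number;
}

// Average cost over priced turns only, so turns without cost coverage do
// not drag the average toward zero. The output format is an assumption.
function formatAverageCost(t: ModelTotals): string {
  if (t.pricedTurns === 0) return '—'; // unknown cost, not free
  const avg = t.totalCost / t.pricedTurns;
  const note = t.pricedTurns < t.turns ? ` (${t.pricedTurns}/${t.turns} priced)` : '';
  return `$${avg.toFixed(4)}${note}`;
}
```

Dividing by `turns` instead would understate cost whenever some included turns lack input or output token coverage.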

## [0.18.0] - 2026-04-26

### Fixed
110 changes: 109 additions & 1 deletion packages/analyze/src/compare.test.ts
@@ -2,7 +2,8 @@ import { strict as assert } from 'node:assert';
import { describe, it } from 'node:test';

import type { EnrichedTurn } from '@relayburn/ledger';
import type { ActivityCategory } from '@relayburn/reader';
import { EMPTY_COVERAGE, makeFidelity } from '@relayburn/reader';
import type { ActivityCategory, Fidelity } from '@relayburn/reader';

import { buildCompareTable } from './compare.js';
import { loadBuiltinPricing } from './pricing.js';
@@ -35,6 +36,34 @@ function turn(
};
}

const FULL_FIDELITY: Fidelity = makeFidelity('per-turn', {
...EMPTY_COVERAGE,
hasInputTokens: true,
hasOutputTokens: true,
hasCacheReadTokens: true,
hasCacheCreateTokens: true,
hasToolCalls: true,
hasToolResultEvents: true,
hasSessionRelationships: true,
});

const USAGE_ONLY_FIDELITY: Fidelity = makeFidelity('per-turn', {
...EMPTY_COVERAGE,
hasInputTokens: true,
hasOutputTokens: true,
});

const PARTIAL_FIDELITY: Fidelity = makeFidelity('per-turn', {
...EMPTY_COVERAGE,
hasInputTokens: true,
});

const AGGREGATE_FIDELITY: Fidelity = makeFidelity('per-session-aggregate', {
...EMPTY_COVERAGE,
hasInputTokens: true,
hasOutputTokens: true,
});

describe('buildCompareTable', () => {
it('buckets turns by (model, activity) and reports per-cell metrics', async () => {
const pricing = await loadBuiltinPricing();
@@ -251,4 +280,83 @@ describe('buildCompareTable', () => {
assert.ok(cell.cacheHitRate !== null);
assert.ok(Math.abs(cell.cacheHitRate! - 3000 / 4000) < 1e-9);
});

it('excludes partial and aggregate fidelity by default and reports the sample loss', async () => {
const pricing = await loadBuiltinPricing();
const turns: EnrichedTurn[] = [
turn('claude-sonnet-4-6', 'coding', {
fidelity: FULL_FIDELITY,
hasEdits: true,
retries: 0,
}),
turn('claude-sonnet-4-6', 'coding', {
fidelity: USAGE_ONLY_FIDELITY,
hasEdits: true,
retries: 0,
}),
turn('claude-sonnet-4-6', 'coding', {
fidelity: PARTIAL_FIDELITY,
hasEdits: true,
retries: 0,
}),
turn('claude-sonnet-4-6', 'coding', {
fidelity: AGGREGATE_FIDELITY,
hasEdits: true,
retries: 0,
}),
];
const t = buildCompareTable(turns, { pricing, minSample: 1 });
assert.equal(t.sample.totalTurns, 4);
assert.equal(t.sample.includedTurns, 2);
assert.equal(t.sample.excludedTurns, 2);
assert.equal(t.sample.excludedByClass.partial, 1);
assert.equal(t.sample.excludedByClass['aggregate-only'], 1);
assert.equal(t.cells['claude-sonnet-4-6']!['coding']!.turns, 2);
});

it('includes partial turns when requested but keeps missing-cost fields null', async () => {
const pricing = await loadBuiltinPricing();
const turns: EnrichedTurn[] = [
turn('claude-sonnet-4-6', 'coding', {
fidelity: PARTIAL_FIDELITY,
hasEdits: true,
retries: 0,
usage: {
input: 1000,
output: 0,
reasoning: 0,
cacheRead: 0,
cacheCreate5m: 0,
cacheCreate1h: 0,
},
}),
];
const t = buildCompareTable(turns, {
pricing,
includePartial: true,
minSample: 1,
});
const cell = t.cells['claude-sonnet-4-6']!['coding']!;
assert.equal(t.sample.includedTurns, 1);
assert.equal(cell.turns, 1);
assert.equal(cell.pricedTurns, 0, 'missing output coverage prevents fake $0 cost');
assert.equal(cell.costPerTurn, null);
});

it('honors an explicit fidelity allow-list', async () => {
const pricing = await loadBuiltinPricing();
const turns: EnrichedTurn[] = [
turn('claude-sonnet-4-6', 'coding', { fidelity: FULL_FIDELITY }),
turn('claude-sonnet-4-6', 'coding', { fidelity: USAGE_ONLY_FIDELITY }),
];
const t = buildCompareTable(turns, {
pricing,
fidelity: ['full'],
minSample: 1,
});
assert.deepEqual(t.sample.allowedFidelity, ['full']);
assert.equal(t.sample.includedTurns, 1);
assert.equal(t.sample.excludedByClass['usage-only'], 1);
assert.equal(t.cells['claude-sonnet-4-6']!['coding']!.turns, 1);
});
});
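
The sample counters asserted in the tests above could feed a one-line exclusion note. This is a sketch only: the five class names and the `totalTurns` / `excludedTurns` / `excludedByClass` fields come from this change, while `SampleLike`, `describeExclusions`, and the message wording are hypothetical and may differ from what `burn compare` prints.

```typescript
type FidelityClass = 'full' | 'usage-only' | 'aggregate-only' | 'cost-only' | 'partial';

// Subset of the sample counters used here; the interface name is hypothetical.
interface SampleLike {
  totalTurns: number;
  excludedTurns: number;
  excludedByClass: Record<FidelityClass, number>;
}

// Build a short note listing only the classes that actually lost turns.
// Returns null when nothing was excluded, so callers can skip the line.
function describeExclusions(s: SampleLike): string | null {
  if (s.excludedTurns === 0) return null;
  const parts = (Object.entries(s.excludedByClass) as [FidelityClass, number][])
    .filter(([, count]) => count > 0)
    .map(([cls, count]) => `${cls}: ${count}`);
  return `excluded ${s.excludedTurns} of ${s.totalTurns} turns by fidelity (${parts.join(', ')})`;
}
```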
109 changes: 99 additions & 10 deletions packages/analyze/src/compare.ts
@@ -1,5 +1,5 @@
import type { EnrichedTurn } from '@relayburn/ledger';
import type { ActivityCategory } from '@relayburn/reader';
import type { ActivityCategory, Coverage, FidelityClass } from '@relayburn/reader';

import { costForTurn } from './cost.js';
import type { PricingTable } from './pricing.js';
@@ -36,17 +36,34 @@ export interface CompareTable {
models: string[];
categories: string[];
cells: Record<string, Record<string, CompareCell>>;
totals: Record<string, { turns: number; totalCost: number }>;
totals: Record<string, { turns: number; pricedTurns: number; totalCost: number }>;
minSample: number;
sample: CompareSample;
}

export interface CompareSample {
totalTurns: number;
includedTurns: number;
excludedTurns: number;
allowedFidelity: FidelityClass[];
includeUnknownFidelity: boolean;
unknownFidelityTurns: number;
excludedByClass: Record<FidelityClass, number>;
}

export interface CompareOptions {
pricing: PricingTable;
models?: string[];
minSample?: number;
fidelity?: FidelityClass[];
includePartial?: boolean;
}

export const DEFAULT_MIN_SAMPLE = 5;
export const DEFAULT_COMPARE_FIDELITY: ReadonlyArray<FidelityClass> = [
'full',
'usage-only',
];

interface Accum {
turns: number;
@@ -62,9 +79,12 @@ interface Accum {
export function buildCompareTable(turns: EnrichedTurn[], opts: CompareOptions): CompareTable {
const minSample = opts.minSample ?? DEFAULT_MIN_SAMPLE;
const modelFilter = opts.models && opts.models.length > 0 ? new Set(opts.models) : null;
const allowedFidelity = normalizeAllowedFidelity(opts);
const allowedSet = new Set<FidelityClass>(allowedFidelity);
const sample = emptySample(allowedFidelity);

const byModelCategory = new Map<string, Map<string, Accum>>();
const modelTotals = new Map<string, { turns: number; totalCost: number }>();
const modelTotals = new Map<string, { turns: number; pricedTurns: number; totalCost: number }>();
const modelSet = new Set<string>();
const categorySet = new Set<string>();

@@ -75,13 +95,16 @@ export function buildCompareTable(turns: EnrichedTurn[], opts: CompareOptions):
if (modelFilter) {
for (const m of modelFilter) {
modelSet.add(m);
modelTotals.set(m, { turns: 0, totalCost: 0 });
modelTotals.set(m, { turns: 0, pricedTurns: 0, totalCost: 0 });
}
}

for (const t of turns) {
const model = t.model || 'unknown';
if (modelFilter && !modelFilter.has(model)) continue;
sample.totalTurns++;
if (!isTurnIncludedByFidelity(t, allowedSet, sample)) continue;
sample.includedTurns++;
const cat = (t.activity as string | undefined) ?? 'unclassified';
modelSet.add(model);
categorySet.add(cat);
@@ -97,12 +120,13 @@ export function buildCompareTable(turns: EnrichedTurn[], opts: CompareOptions):
byCat.set(cat, acc);
}
acc.turns++;
const mt = modelTotals.get(model) ?? { turns: 0, totalCost: 0 };
const mt = modelTotals.get(model) ?? { turns: 0, pricedTurns: 0, totalCost: 0 };
mt.turns++;
const c = costForTurn(t, opts.pricing);
const c = hasCostCoverage(t) ? costForTurn(t, opts.pricing) : null;
if (c) {
acc.pricedTurns++;
acc.totalCost += c.total;
mt.pricedTurns++;
mt.totalCost += c.total;
}
modelTotals.set(model, mt);
@@ -112,10 +136,13 @@ export function buildCompareTable(turns: EnrichedTurn[], opts: CompareOptions):
acc.retriesSamples.push(r);
if (r === 0) acc.oneShotTurns++;
}
acc.cacheRead += t.usage.cacheRead;
acc.tokenDenominator +=
t.usage.input + t.usage.cacheRead + t.usage.cacheCreate5m + t.usage.cacheCreate1h;
if (hasCacheHitCoverage(t)) {
acc.cacheRead += t.usage.cacheRead;
acc.tokenDenominator +=
t.usage.input + t.usage.cacheRead + t.usage.cacheCreate5m + t.usage.cacheCreate1h;
}
}
sample.excludedTurns = sample.totalTurns - sample.includedTurns;

const models = [...modelSet].sort((a, b) => {
const ca = modelTotals.get(a)?.totalCost ?? 0;
@@ -145,7 +172,69 @@ export function buildCompareTable(turns: EnrichedTurn[], opts: CompareOptions):
const totals: CompareTable['totals'] = {};
for (const [m, v] of modelTotals) totals[m] = v;

return { models, categories, cells, totals, minSample };
return { models, categories, cells, totals, minSample, sample };
}

function normalizeAllowedFidelity(opts: CompareOptions): FidelityClass[] {
const seen = new Set<FidelityClass>();
const out: FidelityClass[] = [];
const requested =
opts.fidelity && opts.fidelity.length > 0
? opts.fidelity
: DEFAULT_COMPARE_FIDELITY;
for (const cls of requested) {
if (!seen.has(cls)) {
seen.add(cls);
out.push(cls);
}
}
if (opts.includePartial && !seen.has('partial')) out.push('partial');
return out;
}

function emptySample(allowedFidelity: FidelityClass[]): CompareSample {
return {
totalTurns: 0,
includedTurns: 0,
excludedTurns: 0,
allowedFidelity,
includeUnknownFidelity: true,
unknownFidelityTurns: 0,
excludedByClass: {
full: 0,
'usage-only': 0,
'aggregate-only': 0,
'cost-only': 0,
partial: 0,
},
};
}

function isTurnIncludedByFidelity(
turn: EnrichedTurn,
allowed: ReadonlySet<FidelityClass>,
sample: CompareSample,
): boolean {
const fidelity = turn.fidelity;
if (!fidelity) {
sample.unknownFidelityTurns++;
return true;
}
if (allowed.has(fidelity.class)) return true;
sample.excludedByClass[fidelity.class]++;
return false;
}

function hasCostCoverage(turn: EnrichedTurn): boolean {
const c = turn.fidelity?.coverage;
if (!c) return true;
return c.hasInputTokens && c.hasOutputTokens;
}

function hasCacheHitCoverage(turn: EnrichedTurn): boolean {
const c = turn.fidelity?.coverage;
if (!c) return true;
return c.hasInputTokens && c.hasCacheReadTokens && c.hasCacheCreateTokens;
}

function toCell(acc: Accum | undefined, minSample: number): CompareCell {
10 changes: 8 additions & 2 deletions packages/analyze/src/index.ts
@@ -2,8 +2,14 @@ export { flatten, loadBuiltinPricing, loadPricing } from './pricing.js';
export type { ModelCost, PricingTable, ReasoningMode } from './pricing.js';
export { costForTurn, costForUsage, sumCosts } from './cost.js';
export type { CostBreakdown, CostForUsageOptions } from './cost.js';
export { buildCompareTable, DEFAULT_MIN_SAMPLE } from './compare.js';
export type { CompareCategory, CompareCell, CompareOptions, CompareTable } from './compare.js';
export { buildCompareTable, DEFAULT_COMPARE_FIDELITY, DEFAULT_MIN_SAMPLE } from './compare.js';
export type {
CompareCategory,
CompareCell,
CompareOptions,
CompareSample,
CompareTable,
} from './compare.js';
export {
attributeWaste,
aggregateByFile,