Coverage and fidelity metadata: distinguish missing, zero, aggregate-only, and partial usage #41

@willwashburn

Description

Context

Burn's collector backlog now spans multiple fidelity levels:

  • full per-turn token data with tool calls
  • per-message or per-turn usage without tool-result detail
  • per-session aggregate usage (#29 Mux)
  • cost-only sources (#30 Crush)
  • readers with no cache fields
  • readers with reasoning but no cache creation
  • passive sources where subagent / tool-result chronology is lossy

Today most downstream commands implicitly treat absent fields as zero or just ignore the distinction.

That is becoming actively dangerous.

Agentsview's useful lesson here is simple: coverage must be explicit. There has to be a machine-readable difference between "0", "unknown", and "not representable at this source granularity".

Goal

Add first-class coverage / fidelity metadata so every command can answer two questions honestly:

  1. What data do we actually have?
  2. Is this command's output supported at this fidelity level?

Proposed schema additions

The exact naming can change, but the contract needs these concepts.

A. Granularity

Add an explicit granularity enum.

type UsageGranularity =
  | 'per-turn'
  | 'per-message'
  | 'per-session-aggregate'
  | 'cost-only';

This should live on the normalized record or session metadata so downstream consumers can gate behavior.
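One way this could look on the normalized record; the record shape and field names below are illustrative assumptions, not Burn's actual schema:

```typescript
// Sketch only: NormalizedUsageRecord and its fields are hypothetical names.
type UsageGranularity =
  | 'per-turn'
  | 'per-message'
  | 'per-session-aggregate'
  | 'cost-only';

interface NormalizedUsageRecord {
  source: string;                 // e.g. 'claude', 'mux', 'crush'
  granularity: UsageGranularity;
  inputTokens: number | null;     // null = unknown, never silently 0
  outputTokens: number | null;
  costUsd: number | null;
}

// Downstream commands gate on the declared granularity instead of
// guessing from which fields happen to be populated.
function supportsPerTurnMetrics(record: NormalizedUsageRecord): boolean {
  return record.granularity === 'per-turn' || record.granularity === 'per-message';
}
```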

B. Coverage flags

Add explicit booleans for what is known vs not known.

Suggested shape:

interface Coverage {
  hasInputTokens: boolean;
  hasOutputTokens: boolean;
  hasReasoningTokens: boolean;
  hasCacheReadTokens: boolean;
  hasCacheCreateTokens: boolean;
  hasToolCalls: boolean;
  hasToolResultEvents: boolean;
  hasSessionRelationships: boolean;
  hasRawContent: boolean;
}

Important: these flags are about availability, not value.

hasOutputTokens: false means "we do not know output tokens," not "0 output tokens".
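A minimal sketch of how a consumer could honor that distinction; the helper and record names are hypothetical:

```typescript
// Hypothetical helper illustrating availability vs. value.
interface Coverage {
  hasOutputTokens: boolean;
}

interface UsageRecord {
  coverage: Coverage;
  outputTokens: number;
}

// Returns null when the source never reported output tokens,
// so callers cannot confuse "unknown" with a genuine zero.
function outputTokensOrUnknown(r: UsageRecord): number | null {
  return r.coverage.hasOutputTokens ? r.outputTokens : null;
}
```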

C. Fidelity classification

For convenience, add a higher-level normalized classification.

type FidelityClass =
  | 'full'
  | 'usage-only'
  | 'aggregate-only'
  | 'cost-only'
  | 'partial';

This is a summary derived from granularity + coverage, not a replacement for them.
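One possible derivation, sketched under assumed rules (the exact thresholds for `full` vs `usage-only` are a judgment call this issue leaves open):

```typescript
// Sketch: deriving FidelityClass from granularity + coverage. The
// classification rules here are assumptions, not a settled contract.
type UsageGranularity =
  | 'per-turn'
  | 'per-message'
  | 'per-session-aggregate'
  | 'cost-only';

type FidelityClass = 'full' | 'usage-only' | 'aggregate-only' | 'cost-only' | 'partial';

interface Coverage {
  hasInputTokens: boolean;
  hasOutputTokens: boolean;
  hasToolCalls: boolean;
  hasToolResultEvents: boolean;
}

function classifyFidelity(g: UsageGranularity, c: Coverage): FidelityClass {
  if (g === 'cost-only') return 'cost-only';
  if (g === 'per-session-aggregate') return 'aggregate-only';
  if (c.hasInputTokens && c.hasOutputTokens && c.hasToolCalls && c.hasToolResultEvents) {
    return 'full';
  }
  if (c.hasInputTokens && c.hasOutputTokens) return 'usage-only';
  return 'partial';
}
```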

Command behavior contract

burn summary

  • Always allowed
  • Must surface excluded / partial rows clearly
  • Must distinguish unknown from numeric zero
  • JSON output must include coverage counts so programmatic callers know how much data was partial
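One possible shape for those coverage counts in the JSON output; the field names are illustrative only:

```typescript
// Sketch: coverage counts for `burn summary` JSON output. All names
// here are assumptions about the eventual shape.
interface SummaryCoverageCounts {
  totalSessions: number;
  fullFidelity: number;
  partial: number;
  excluded: number;
}

function summarizeCoverage(fidelityClasses: string[]): SummaryCoverageCounts {
  return {
    totalSessions: fidelityClasses.length,
    fullFidelity: fidelityClasses.filter((c) => c === 'full').length,
    partial: fidelityClasses.filter((c) => c === 'partial').length,
    excluded: fidelityClasses.filter((c) => c === 'cost-only' || c === 'aggregate-only').length,
  };
}
```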

burn compare (#38)

Default behavior should be conservative:

  • exclude aggregate-only and cost-only sources from category comparison unless explicitly overridden
  • exclude turns missing the fields required for a given metric
  • show sample size and excluded-turn counts
  • never print $0.00 or 0% for "no data"

Suggested flags:

burn compare --include-partial
burn compare --fidelity full,usage-only
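Parsing for a `--fidelity` flag like the one above could look like this; the flag semantics and default list are assumptions consistent with the conservative behavior described:

```typescript
// Sketch: hypothetical parsing of `--fidelity full,usage-only`.
type FidelityClass = 'full' | 'usage-only' | 'aggregate-only' | 'cost-only' | 'partial';

// Conservative default: only sources that support per-turn / per-message
// category comparison are included unless the user opts in to more.
const CONSERVATIVE_DEFAULT: FidelityClass[] = ['full', 'usage-only'];

function allowedFidelities(flagValue?: string): FidelityClass[] {
  if (!flagValue) return CONSERVATIVE_DEFAULT;
  return flagValue.split(',').map((s) => s.trim()) as FidelityClass[];
}
```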

burn waste (#3, #11)

  • should hard-fail or no-op with a clear message on aggregate-only / cost-only sources
  • should state exactly which prerequisites are missing: tool-result events, content lengths, or session relationships
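A sketch of the prerequisite check that would make those messages source-specific; the coverage field names mirror the proposed `Coverage` interface, while the helper itself is hypothetical:

```typescript
// Sketch: prerequisite check for `burn waste`.
interface Coverage {
  hasToolResultEvents: boolean;
  hasRawContent: boolean;
  hasSessionRelationships: boolean;
}

// Returns the list of missing prerequisites so the command can name
// them explicitly instead of printing a generic error.
function missingWastePrereqs(c: Coverage): string[] {
  const missing: string[] = [];
  if (!c.hasToolResultEvents) missing.push('tool-result events');
  if (!c.hasRawContent) missing.push('content lengths');
  if (!c.hasSessionRelationships) missing.push('session relationships');
  return missing;
}
```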

burn limits / burn plans (#5, #39)

  • should permit partial usage data where enough exists for spend totals
  • should mark projections as low-confidence when the underlying fidelity is partial
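One way to tag projections with that confidence signal; which fidelity classes count as "high" is an assumption here:

```typescript
// Sketch: confidence tagging for limits/plans projections.
type FidelityClass = 'full' | 'usage-only' | 'aggregate-only' | 'cost-only' | 'partial';

interface Projection {
  projectedSpendUsd: number;
  confidence: 'high' | 'low';
}

function projectSpend(totalUsd: number, fidelity: FidelityClass): Projection {
  // Assumption: only complete token-level data earns high confidence.
  const confidence = fidelity === 'full' || fidelity === 'usage-only' ? 'high' : 'low';
  return { projectedSpendUsd: totalUsd, confidence };
}
```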

Why this should not stay per-collector

Several collector issues already mention ad hoc fields like granularity. That is necessary but not sufficient.

If each collector invents its own partial-data semantics, every downstream command has to learn every collector.

That is backwards.

The reader layer should normalize fidelity once. Commands should consume the normalized contract.

Suggested implementation order

  1. Define the shared types in packages/reader/src/types.ts
  2. Populate them in existing high-value readers first:
    • Claude
    • Codex
    • OpenCode
  3. Update known low-fidelity collector issue templates / implementations to use the same fields
  4. Make summary, compare, and waste read and honor the metadata
  5. Add JSON-output tests proving unknown is not rendered as zero
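The kind of rendering invariant step 5 would test could be as small as this; the helper name is illustrative:

```typescript
// Sketch: the invariant the JSON-output tests should lock in.
// 'unknown' must never collapse to '0' on the way to the user.
function renderTokens(value: number | null): string {
  return value === null ? 'unknown' : String(value);
}
```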

Acceptance

  • Normalized records carry explicit granularity and coverage metadata.
  • A collector with no output-token data does not silently produce output = 0; it produces hasOutputTokens = false.
  • burn compare defaults to excluding unsupported fidelity classes and reports that exclusion clearly.
  • burn waste refuses unsupported sources with a clear, source-specific explanation.
  • Mux-style and Crush-style collectors can use the same shared fields instead of one-off notes in issue bodies.
  • JSON output across commands preserves the distinction between zero and unknown.

Depends on

  • none strictly, but should land before or alongside commands that aggregate across heterogeneous sources

Unblocks

  • #38 compare without misleading tables
  • #39 plans with honest confidence notes
  • archive work / derived analytics issue, which needs stable null semantics
  • future search / ranking commands that should know whether they are ranking complete or partial evidence

Priority

High. The collector backlog is already heterogeneous enough that this is now a correctness issue, not just a nice-to-have.
