
Per-tool-call cost attribution: use UserTurnRecord byteLen for proportional allocation in by-tool and waste #102

@willwashburn

Context

The original goal of #2 — and the reason PR #74 landed UserTurnRecord at all — is per-tool-call cost attribution. From the PR body:

This is the prerequisite the issue calls out: Anthropic only reports usage at message granularity, but the per-tool-call delta is recoverable as context(N+1) − cacheRead(N+1) ≈ output(N) + sum(user-turn block tokens). Per-tool-call cost attribution (e.g. burn waste) can build on this.
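
The identity quoted above can be sketched in a few lines. This is only an illustration: the `MessageUsage` shape and its field names are assumptions, not the ledger's actual types, and the bytes/4 token estimate is the approximation this issue lists as the current tokenizer stand-in.

```typescript
// Hypothetical shape: usage as reported at message granularity.
interface MessageUsage {
  contextTokens: number;   // total context the model saw for this message
  cacheReadTokens: number; // portion of that context served from cache
  outputTokens: number;    // tokens the model generated in this message
}

// Left side of the identity: context(N+1) - cacheRead(N+1),
// i.e. the new context that message N+1 had to ingest.
function ingestDelta(next: MessageUsage): number {
  return next.contextTokens - next.cacheReadTokens;
}

// Right side: output(N) + sum(user-turn block tokens),
// estimating tokens as bytes / 4.
function estimatedDelta(prev: MessageUsage, blockByteLens: number[]): number {
  const blockTokens = blockByteLens.reduce((sum, bytes) => sum + bytes / 4, 0);
  return prev.outputTokens + blockTokens;
}

// Made-up numbers: one large tool result and one small one.
const prev: MessageUsage = { contextTokens: 10000, cacheReadTokens: 9000, outputTokens: 200 };
const next: MessageUsage = { contextTokens: 23000, cacheReadTokens: 10000, outputTokens: 150 };

console.log(ingestDelta(next), estimatedDelta(prev, [51200, 800]));
```

The two sides only agree approximately, which is why the block sizes are better suited to splitting a known cost proportionally than to pricing each tool call directly.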

Today the consumers do not use this signal:

  • burn by-tool (packages/cli/src/commands/by-tool.ts:71-103) splits the prior turn's (input + cacheRead + cacheCreate) cost evenly across that turn's toolCalls. Line 96 reads: const share = ingestCost / prior.toolCalls.length; There's no per-result sizing, so a short Bash output is assumed to cost the same as a 50KB Read.
  • burn waste (packages/analyze/src/waste.ts:172-203) has both modes already: a haveAnySizes proportional path (line 172) and an else even-split fallback (line 187; see the footer note in #60, "Promote 'even-split' note to a prominent warning when it dominates"). Sizes today come from sizeByToolUseId, a map derived from content sidecars. When sidecars are absent or hash-only, the entire ledger falls back to even-split (currently 99.7% of sessions per #60's data).

UserTurnRecord.blocks[].byteLen is a stable, ledger-resident size signal that is independent of content sidecar availability — once #94 persists it, every tool_result has a usable size even when content capture is hash-only or off.
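
As a rough sketch of how that signal could feed a size lookup: the record shape below is assumed. Only blocks[].byteLen, precedingMessageId and followingMessageId are named in this issue; the toolUseId field on a block and both helper names are hypothetical.

```typescript
// Assumed shapes; toolUseId on a block is an assumption.
interface UserTurnBlock {
  toolUseId?: string; // taken to be absent on free-text blocks
  byteLen: number;
}

interface UserTurnRecord {
  precedingMessageId: string;
  followingMessageId: string;
  blocks: UserTurnBlock[];
}

// Collect byteLen per tool_result block, skipping free-text blocks.
function sizesFromUserTurns(turns: UserTurnRecord[]): Map<string, number> {
  const sizes = new Map<string, number>();
  for (const turn of turns) {
    for (const block of turn.blocks) {
      if (block.toolUseId !== undefined) {
        sizes.set(block.toolUseId, block.byteLen);
      }
    }
  }
  return sizes;
}

// Merge two size sources; the user-turn size wins where both exist,
// and sidecar-only entries are kept.
function mergeSizeSources(
  sidecar: Map<string, number>,
  userTurn: Map<string, number>,
): Map<string, number> {
  const merged = new Map<string, number>(sidecar);
  userTurn.forEach((byteLen, id) => merged.set(id, byteLen));
  return merged;
}
```

Keeping sidecar-only entries means sessions that already have sidecars lose nothing when user turns are missing.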

Proposal

Once #94 lands user-turn persistence and the parser issues (#81 / #86) extend coverage to Codex / OpenCode, switch both consumers to prefer UserTurnRecord.blocks as the sizing source:

  1. burn by-tool (packages/cli/src/commands/by-tool.ts):

    • Look up the user turn whose precedingMessageId === prior.messageId and followingMessageId === turn.messageId.
    • Build a Map<toolUseId, byteLen> from its blocks.
    • When a size map is available for the prior turn, allocate ingestCost proportionally to byteLen per toolCall.id instead of evenly. Cap the sized-tool subtotal at ingestCost so a misaccounted free-text block can't over-attribute.
    • Fall back to the existing even-split path for turns that have no matching user-turn record.
    • Add an attributionMethod: 'sized' | 'even-split' | 'unattributed' per row (or per session) to JSON output for transparency, mirroring the pattern burn waste already uses.
  2. burn waste (packages/analyze/src/waste.ts):

    • Add a second source for sizeByToolUseId: alongside the content-sidecar derivation, query the ledger's user turns and merge. Prefer UserTurnRecord.blocks when present, since it carries exact byte counts; the sidecar path estimates from stored content, which may be truncated or hash-only.
    • This should automatically shrink the even-split fraction reported in #60 ("Promote 'even-split' note to a prominent warning when it dominates") to a small minority once user turns are persisted across the full ledger.
  3. burn rebuild: ensure rebuild paths re-derive user turns from source session files so historical sessions can be backfilled without losing sized attribution. (See #94, "Ledger: persist UserTurnRecord and forward through ingest", for the persistence story; rebuild is the migration path.)
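
The allocation step for burn by-tool could look roughly like the sketch below. The function name, its parameters and the Allocation type are hypothetical, and treating "no tool calls" as 'unattributed' is an assumption. The sketch normalizes over the bytes of the sized tool calls only, which is one way to read the cap: the sized subtotal then equals ingestCost by construction, regardless of free-text blocks.

```typescript
type AttributionMethod = 'sized' | 'even-split' | 'unattributed';

interface Allocation {
  method: AttributionMethod;
  shares: Map<string, number>; // toolCall id -> attributed cost
}

// Hypothetical helper: split ingestCost across the prior turn's tool calls.
// `sizes` maps toolUseId -> byteLen and is undefined when no user turn matched.
function allocateIngestCost(
  toolCallIds: string[],
  ingestCost: number,
  sizes: Map<string, number> | undefined,
): Allocation {
  const shares = new Map<string, number>();
  if (toolCallIds.length === 0) {
    return { method: 'unattributed', shares };
  }

  // Unknown sizes count as 0 bytes.
  const byteLens = toolCallIds.map((id) => sizes?.get(id) ?? 0);
  const totalBytes = byteLens.reduce((a, b) => a + b, 0);

  if (totalBytes > 0) {
    // Proportional path: shares sum to ingestCost, never more.
    toolCallIds.forEach((id, i) => {
      shares.set(id, (ingestCost * byteLens[i]) / totalBytes);
    });
    return { method: 'sized', shares };
  }

  // Fallback: the existing even-split behaviour.
  const share = ingestCost / toolCallIds.length;
  for (const id of toolCallIds) shares.set(id, share);
  return { method: 'even-split', shares };
}
```

With a 200-byte Bash result and a 51,000-byte Read in the same turn, this gives the Read about 99.6% of ingestCost and the Bash call about 0.4%, where the even split gives each 50%.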

Acceptance criteria

  • burn by-tool attributes proportionally to UserTurnRecord.blocks[].byteLen when a matching user turn exists.
  • When no matching user turn exists, the existing even-split path still runs and is reported.
  • JSON output exposes attributionMethod per row/session.
  • burn waste haveAnySizes branch fires for sessions where user turns are persisted but content sidecar is not — no regression for sessions that currently have sidecars.
  • Tests using fixtures that cover: small/large/errored Bash, multi-tool turns where the largest block dominates cost, sessions with no user turns persisted (even-split fallback), mixed sessions.
  • Reconciliation invariant test: sum of attributed costs across a session's tool calls ≤ session grand total within float tolerance.
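
The reconciliation invariant in the last criterion might be checked with a helper like this one. The function name and the tolerance value are assumptions, not existing test utilities.

```typescript
// Hypothetical check: the sum of attributed tool-call costs must not exceed
// the session grand total, allowing for floating-point accumulation error.
function reconciles(
  attributedCosts: number[],
  sessionTotal: number,
  relTol: number = 1e-9,
): boolean {
  const sum = attributedCosts.reduce((a, b) => a + b, 0);
  const slack = relTol * Math.max(1, Math.abs(sessionTotal));
  return sum <= sessionTotal + slack;
}

// 0.1 + 0.2 is slightly above 0.3 in floating point; the tolerance absorbs it.
console.log(reconciles([0.1, 0.2], 0.3)); // true
console.log(reconciles([0.2, 0.2], 0.3)); // false
```

A relative tolerance floored at 1 keeps the check meaningful for both very small and very large session totals.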

Out of scope

  • Tokenizer accuracy upgrade beyond bytes/4 — separate follow-up.
  • Changes to the --json schema beyond adding attributionMethod.
  • Subagent / file / bash detector outputs in burn waste — sized inputs naturally improve those without further code changes.
  • Reattribution of synthetic providers (#31, "Reattribution layer: Synthetic provider detection").

Depends on

Refs
