Migrate burn compare to grouped SQL aggregates over archive #88

@willwashburn

Description

Context

burn compare today calls queryAll() and then runs buildCompareTable() over the resulting EnrichedTurn[] — a full ledger scan plus an in-memory grouped reduce by (model, activity) on every invocation (packages/cli/src/commands/compare.ts:8,86, packages/analyze/src/compare.ts). The compare table is exactly the kind of grouped aggregate SQL was designed for, and the archive that landed in PR #78 already has the right indexes (idx_turns_model, idx_turns_activity, idx_turns_workflow, idx_turns_project_key).
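For context, the in-memory path is roughly a grouped reduce of this shape (a sketch only — the type and field names below are illustrative, not the actual EnrichedTurn / CompareTable shapes from @relayburn/analyze):

```typescript
// Illustrative stand-ins for the real ledger/analyze types.
interface Turn { model: string; activity: string; tokens: number; }
interface Cell { model: string; activity: string; turns: number; tokens: number; }

// Group turns by (model, activity) and reduce counts/tokens in a Map —
// the kind of scan-and-reduce the grouped SQL query would replace.
function buildCompareCells(turns: Turn[]): Cell[] {
  const cells = new Map<string, Cell>();
  for (const t of turns) {
    const key = `${t.model}\u0000${t.activity}`; // composite group key
    const cell =
      cells.get(key) ?? { model: t.model, activity: t.activity, turns: 0, tokens: 0 };
    cell.turns += 1;
    cell.tokens += t.tokens;
    cells.set(key, cell);
  }
  return [...cells.values()];
}
```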

PR #78 explicitly defers this rewire:

Rewiring burn summary / compare / plans / @relayburn/mcp to read from the archive (each command is a self-contained migration that keeps the in-memory fallback intact).

Issue #40 calls this out specifically as well:

burn compare (#38) and burn plans (#39) can be implemented as SQL-style grouped queries rather than ad hoc in-memory scans.

Proposal

  1. Add a compareFromArchive(query, opts) helper to @relayburn/ledger (or @relayburn/analyze) that issues SELECT model, activity, COUNT(*) AS turns, SUM(...) AS tokens, ... FROM turns WHERE ... GROUP BY model, activity.
  2. Cost-per-turn, 1-shot rate, cache-hit rate, and median retries are either materialized as part of the query (where SQLite supports it) or computed by a tiny per-cell post-process over the already-grouped rows. Median retries needs either a stored quantile or a small per-(model, activity) follow-up query.
  3. --min-sample filtering happens in the SQL HAVING clause or as a post-filter on the cell list.
  4. Keep the queryAll() + buildCompareTable() path behind a fallback flag for parity validation.
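A sketch of the grouped query from steps 1–3. The column names here (ts, total_tokens, cost_usd, retries, cache_read_tokens, input_tokens) are assumptions, not the actual turns schema from PR #78; median retries would still need a per-(model, activity) follow-up query or a stored quantile, since stock SQLite has no median aggregate:

```sql
SELECT
  model,
  activity,
  COUNT(*)                         AS turns,
  SUM(total_tokens)                AS tokens,
  AVG(cost_usd)                    AS cost_per_turn,
  AVG(retries = 0)                 AS one_shot_rate,  -- SQLite comparisons yield 0/1, so AVG is a rate
  SUM(cache_read_tokens) * 1.0
    / NULLIF(SUM(input_tokens), 0) AS cache_hit_rate
FROM turns
WHERE ts >= :since                 -- plus the other flag filters (--workflow, --project, ...)
GROUP BY model, activity
HAVING COUNT(*) >= :min_sample;    -- --min-sample, per step 3
```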

Tests:

  • Parity test: same fixture, both code paths, same CompareTable (deep-equal via --json) and same text/CSV output.
  • Filter coverage: --workflow, --agent, --project, --session, --since, --models all work through SQL.
  • Empty / single-cell edge cases.
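The parity test could normalize both --json payloads with a key-sorted stringify before deep comparison, so object key order never produces a false mismatch. A sketch (canonicalize and assertParity are hypothetical test helpers, not existing @relayburn code):

```typescript
// Stable stringify: recursively sort object keys so two structurally equal
// CompareTable payloads serialize identically regardless of key order.
function canonicalize(value: unknown): string {
  return JSON.stringify(value, (_key, v) =>
    v && typeof v === "object" && !Array.isArray(v)
      ? Object.fromEntries(
          Object.entries(v as Record<string, unknown>).sort(([a], [b]) =>
            a.localeCompare(b),
          ),
        )
      : v,
  );
}

// Fail loudly when the archive-backed and in-memory paths disagree.
function assertParity(archiveTable: unknown, inMemoryTable: unknown): void {
  if (canonicalize(archiveTable) !== canonicalize(inMemoryTable)) {
    throw new Error("compare parity mismatch between archive and in-memory paths");
  }
}
```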

Acceptance criteria

  • burn compare issues a single grouped SQL query for the cell aggregates rather than streaming the full ledger.
  • Text, CSV, and --json outputs are byte-identical to the pre-migration implementation for the parity fixture.
  • All existing flags (--models, --since, --project, --session, --workflow, --agent, --min-sample) continue to work.
  • Fallback flag preserves the old behavior.
  • Performance: on a ledger with >=100k turns, archive-backed burn compare --since 30d is at least 5x faster than the current implementation.

Out of scope

  • Compare-specific schema additions to the archive.
  • Coverage / fidelity columns.
  • Rewiring summary, plans, or MCP tools.
