
Derived analytics archive foundation: archive.sqlite + burn archive (#40)#78

Merged: willwashburn merged 3 commits into main from feat/analytics-archive-40 on Apr 26, 2026

@willwashburn commented Apr 25, 2026

Refs #40.

Summary

Lands the foundation for the derived analytics archive: a rebuildable SQLite read model at ~/.relayburn/archive.sqlite, materialized from the canonical ledger.jsonl. This addresses the architectural gap between burn as an event collector and burn as a usable local analytics system: today, every non-trivial query scans the full ledger and re-folds all stamps in memory.

Scoped intentionally to the foundation: schema + build pipeline + CLI surface. Rewiring read commands onto SQL queries lands in follow-up PRs so each rewire stays small and reviewable.

What landed

  • @relayburn/ledger — new archive.ts module exporting buildArchive(), rebuildArchive(), getArchiveStatus(), openArchive(), archivePath(), ARCHIVE_VERSION. Schema:
    • sessions — one row per (source, session_id), derived from turns.
    • turns — one row per ingested TurnRecord, with stamps folded into materialized columns (workflow_id, agent_id, persona, tier) plus a JSON blob for arbitrary keys.
    • tool_calls — one row per ToolCall attached to a turn.
    • compactions — one row per ingested CompactionEvent.
    • tool_result_events — table reserved (created, not populated) for the future content-sidecar bridge (Design: content sidecar store with retention and opt-out #33) and execution-graph work.
    • archive_state — incremental cursor (ledger_offset_bytes), schema version, last-built timestamps.
  • Build is incremental: keyed off byte offset, idempotent, and detects ledger truncation (e.g. burn rebuild --reclassify rewrites the file) by falling back to a clean rebuild.
  • @relayburn/cli: burn archive build | rebuild | status, all with --json.
  • Backed by node:sqlite (Node 22 built-in) — no native build step, no extra runtime dep. The ExperimentalWarning is suppressed at the archive boundary so it doesn't pollute every CLI invocation.
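To make "stamps folded into materialized columns plus a JSON blob" concrete, here is a minimal sketch of folding the stamps for one turn into a single row. The flat string-map stamp shape, the camelCase keys, the last-stamp-wins merge policy, and the names `foldStamps` / `TurnRow` are all assumptions for illustration, not the @relayburn/ledger API.

```typescript
// Hypothetical shapes: the real TurnRecord / stamp types in @relayburn/ledger may differ.
type Enrichment = Record<string, string>;

interface TurnRow {
  turnId: string;
  workflowId: string | null;
  agentId: string | null;
  persona: string | null;
  tier: string | null;
  extraJson: string; // every non-materialized stamp key, serialized once at build time
}

// Keys promoted to dedicated columns (the PR names workflow_id, agent_id, persona, tier).
const MATERIALIZED: readonly string[] = ["workflowId", "agentId", "persona", "tier"];

// Fold every stamp that targets a turn into one row. Later stamps win on key
// conflicts; that policy is an assumption, the PR does not describe it.
function foldStamps(turnId: string, stamps: Enrichment[]): TurnRow {
  const merged: Enrichment = {};
  for (const stamp of stamps) Object.assign(merged, stamp);

  // Sort the leftover keys so the JSON blob is byte-identical across rebuilds.
  const extraEntries = Object.entries(merged)
    .filter(([key]) => !MATERIALIZED.includes(key))
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));

  return {
    turnId,
    workflowId: merged["workflowId"] ?? null,
    agentId: merged["agentId"] ?? null,
    persona: merged["persona"] ?? null,
    tier: merged["tier"] ?? null,
    extraJson: JSON.stringify(Object.fromEntries(extraEntries)),
  };
}
```

Because the fold happens once per build, a query such as "tokens by persona" becomes a plain GROUP BY on a column instead of a re-fold of every stamp.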

What's deferred (separate PRs)

  • Rewiring burn summary / compare / plans / @relayburn/mcp to read from the archive (each command is a self-contained migration that keeps the in-memory fallback intact).
  • Populating tool_result_events from the content sidecar (depends on the richer write path from #33, Design: content sidecar store with retention and opt-out).
  • A stamp-table approach for very large ledgers (current implementation re-collects every stamp on each build; fine at current scale, obvious next step when it isn't).
  • burn archive vacuum (one-liner; trivial to add when needed).
  • Coverage / fidelity columns from the (still-open) coverage issue — the schema is deliberately additive so they slot in without a rebuild requirement.

Acceptance against the issue body

  • ledger.jsonl remains canonical; deleting archive.sqlite and running burn archive rebuild recreates it. (covered by tests)
  • Materialized enrichment columns mean stamps are folded once at build time, not on every query.
  • Rebuild is deterministic: rebuilding twice from the same ledger yields the same row counts and primary keys. (covered by tests)
  • Archive coexists with no content sidecar / hash-only / full sidecar (no dependency on content tables — the bridge table is reserved but unpopulated).
  • burn summary executes against the archive (deferred to follow-up).
  • burn compare / burn plans as SQL-style grouped queries (deferred to follow-ups).
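One way to keep "same ledger, same row counts and primary keys" true for derived tables is to make each derivation a pure function of the ledger contents. The sketch below derives sessions rows (one per (source, session_id)) from turns and sorts by primary key. `TurnLite`, `SessionRow`, `deriveSessions`, and the first/last/count columns are hypothetical, and the timestamp comparison assumes uniformly formatted UTC ISO-8601 strings.

```typescript
// Hypothetical minimal turn shape; the real TurnRecord carries many more fields.
interface TurnLite {
  source: string;
  sessionId: string;
  ts: string; // assumed uniform UTC ISO-8601, so string order matches time order
}

interface SessionRow {
  source: string;
  sessionId: string;
  firstTs: string;
  lastTs: string;
  turnCount: number;
}

function cmp(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Derive one session row per (source, session_id). Sorting by primary key makes
// the output independent of the order in which turns appear in the ledger.
function deriveSessions(turns: TurnLite[]): SessionRow[] {
  const bySession = new Map<string, SessionRow>();
  for (const t of turns) {
    const key = JSON.stringify([t.source, t.sessionId]); // unambiguous composite key
    const row = bySession.get(key);
    if (row === undefined) {
      bySession.set(key, { source: t.source, sessionId: t.sessionId, firstTs: t.ts, lastTs: t.ts, turnCount: 1 });
    } else {
      if (t.ts < row.firstTs) row.firstTs = t.ts;
      if (t.ts > row.lastTs) row.lastTs = t.ts;
      row.turnCount += 1;
    }
  }
  return [...bySession.values()].sort(
    (a, b) => cmp(a.source, b.source) || cmp(a.sessionId, b.sessionId),
  );
}
```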

Broader plan

This PR is step 1 of ~5 to fully resolve #40:

  1. Foundation (this PR): schema, build pipeline, CLI.
  2. Rewire burn summary to read from the archive (with fallback flag).
  3. Rewire burn compare and burn plans.
  4. Wire @relayburn/mcp tools to the archive for low-latency self-query.
  5. Populate tool_result_events once the content-sidecar bridge (Design: content sidecar store with retention and opt-out #33) lands.

Test plan

  • pnpm install && pnpm run test:ts: 355 tests pass, including 11 new archive unit tests and 5 new CLI tests. All green.
  • Manual smoke: burn archive build / status / --json against an empty home directory, plus an end-to-end run that appends turns + a stamp and verifies the SQL contents.
  • Schema-version mismatch path: tampering with archive_state.archive_version triggers a clean rebuild on next open.
  • Truncation path: deleting or truncating the ledger triggers a rebuild from zero.
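The schema-version and truncation paths above both reduce to one decision made before each build: rebuild, apply the ledger tail incrementally, or do nothing. A minimal sketch of that decision follows. `planBuild`, `ArchiveState`, `BuildPlan`, and the placeholder `ARCHIVE_VERSION` value are hypothetical; only the version check, the byte-offset cursor, and truncation detection come from this PR.

```typescript
// Placeholder value; the real ARCHIVE_VERSION constant lives in @relayburn/ledger.
const ARCHIVE_VERSION = 1;

// Hypothetical view of the archive_state row.
interface ArchiveState {
  archiveVersion: number;
  ledgerOffsetBytes: number;
}

type BuildPlan =
  | { kind: "noop" }
  | { kind: "incremental"; fromOffset: number }
  | { kind: "rebuild"; reason: "missing" | "version-mismatch" | "truncated" };

function planBuild(state: ArchiveState | null, ledgerSizeBytes: number): BuildPlan {
  // No archive yet (e.g. archive.sqlite was deleted): build from zero.
  if (state === null) return { kind: "rebuild", reason: "missing" };
  // Schema changed since the archive was written: rows may not match, start over.
  if (state.archiveVersion !== ARCHIVE_VERSION) return { kind: "rebuild", reason: "version-mismatch" };
  // A ledger shorter than the cursor was rewritten or truncated, so byte
  // offsets no longer line up with previously ingested lines.
  if (ledgerSizeBytes < state.ledgerOffsetBytes) return { kind: "rebuild", reason: "truncated" };
  if (ledgerSizeBytes === state.ledgerOffsetBytes) return { kind: "noop" };
  return { kind: "incremental", fromOffset: state.ledgerOffsetBytes };
}
```

A size check alone would not notice a rewrite that leaves the file the same length or longer; this sketch makes no attempt to handle that case.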

🤖 Generated with Claude Code



…ive` (#40)

Lands the rebuildable SQLite read model alongside the canonical ledger.jsonl
so future commands no longer have to scan the whole ledger and refold all
stamps on every query.

This is the foundation PR — schema, build pipeline, and CLI surface only.
Rewiring `burn summary` / `compare` / `plans` and the MCP server onto SQL
queries lands in follow-ups so each rewire stays small and reviewable.

Lands:
- `@relayburn/ledger`: `buildArchive()`, `rebuildArchive()`, `getArchiveStatus()`,
  `openArchive()`, `archivePath()`, `ARCHIVE_VERSION`. Schema covers `sessions`,
  `turns`, `tool_calls`, `compactions`, plus a reserved `tool_result_events`
  table for the future #33 content-sidecar bridge. Stamps are folded into
  materialized columns (`workflow_id`, `agent_id`, `persona`, `tier`) plus a
  JSON blob. Build is incremental keyed off `archive_state.ledger_offset_bytes`;
  rebuild-from-zero is deterministic.
- `@relayburn/cli`: `burn archive build | rebuild | status` with `--json`.
- Backed by `node:sqlite` so no native build step. The experimental warning is
  suppressed at the archive boundary so it doesn't pollute every CLI invocation.
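For orientation, the schema above might be expressed as DDL along these lines. The table names and the columns called out in this PR (source, session_id, workflow_id, agent_id, persona, tier, ledger_offset_bytes, archive_version, last_built_at, last_rebuild_at) are real; every other column, key, and constraint is a hypothetical placeholder. With node:sqlite, a string like this can be applied via `new DatabaseSync(path).exec(sql)`.

```typescript
// Illustrative DDL only; not the schema shipped in @relayburn/ledger.
const ARCHIVE_SCHEMA_SQL = `
CREATE TABLE IF NOT EXISTS sessions (
  source      TEXT NOT NULL,
  session_id  TEXT NOT NULL,
  first_ts    TEXT,
  last_ts     TEXT,
  turn_count  INTEGER NOT NULL DEFAULT 0,
  PRIMARY KEY (source, session_id)
);
CREATE TABLE IF NOT EXISTS turns (
  turn_id         TEXT PRIMARY KEY,
  source          TEXT NOT NULL,
  session_id      TEXT NOT NULL,
  ts              TEXT,
  workflow_id     TEXT,  -- stamp keys folded into columns at build time
  agent_id        TEXT,
  persona         TEXT,
  tier            TEXT,
  enrichment_json TEXT NOT NULL DEFAULT '{}'  -- all other stamp keys
);
CREATE TABLE IF NOT EXISTS tool_calls (
  turn_id  TEXT NOT NULL REFERENCES turns(turn_id),
  seq      INTEGER NOT NULL,
  name     TEXT NOT NULL,
  PRIMARY KEY (turn_id, seq)
);
CREATE TABLE IF NOT EXISTS compactions (
  compaction_id TEXT PRIMARY KEY,
  source        TEXT NOT NULL,
  session_id    TEXT NOT NULL,
  ts            TEXT
);
-- Reserved for the #33 content-sidecar bridge: created but not populated yet.
CREATE TABLE IF NOT EXISTS tool_result_events (
  turn_id  TEXT NOT NULL,
  seq      INTEGER NOT NULL,
  PRIMARY KEY (turn_id, seq)
);
-- Single-row table holding the incremental cursor and build metadata.
CREATE TABLE IF NOT EXISTS archive_state (
  id                  INTEGER PRIMARY KEY CHECK (id = 1),
  archive_version     INTEGER NOT NULL,
  ledger_offset_bytes INTEGER NOT NULL DEFAULT 0,
  last_built_at       TEXT,
  last_rebuild_at     TEXT
);
`;
```

Using CREATE TABLE IF NOT EXISTS keeps schema setup idempotent, which fits an archive that is reopened on every build.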

Acceptance against the issue:
- ledger.jsonl remains canonical; deleting `archive.sqlite` and running
  `burn archive rebuild` recreates it.
- Materialized enrichment columns mean stamps are folded once at build time,
  not on every query.
- Rebuild is deterministic: same ledger -> same row counts and primary keys.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
devin-ai-integration[bot]

This comment was marked as resolved.

…der (Devin review on #78)

Three Devin findings on the foundation archive PR:

1. (BUG_0001) `buildArchiveLocked` was stamping `ledgerStat.size` as the
   ledger cursor outside the transaction, overwriting the safe newline
   boundary committed by `applyLedgerRange`. If the ledger had a partial
   trailing line (writer interrupted mid-write), the cursor would advance
   past the incomplete bytes; the next build would read from mid-line and
   silently skip the completed turn once it landed. Plumb `safeOffset`
   through `ApplyResult` so the caller writes the parser's actual newline
   boundary as the cursor; remove the redundant in-transaction UPDATE.

2. (BUG_0002) `last_rebuild_at` was schema'd but never written. Add an
   internal `{isRebuild}` option to `buildArchiveLocked` so `rebuildArchive`
   stamps both `last_built_at` and `last_rebuild_at`; `buildArchive` keeps
   updating only `last_built_at`. `burn archive status` now shows the
   "last rebuild" line after `burn archive rebuild`.

3. (BUG_0003) The `## [0.11.0] - 2026-04-25` header was deleted from
   `packages/ledger/CHANGELOG.md` so the published Plans entry was bleeding
   back into `[Unreleased]` (and we had two `### Added` subsections in a
   row). Restore the version header; the new archive entry stays under
   `[Unreleased]` where it belongs until release.
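Fix 2 boils down to a small rule for updating the build timestamps. The sketch below shows that rule as a pure function; `BuildTimestamps` and `stampBuild` are hypothetical names, and only the `isRebuild` option and the `last_built_at` / `last_rebuild_at` semantics come from the commit message.

```typescript
// Hypothetical in-memory view of the archive_state timestamps.
interface BuildTimestamps {
  lastBuiltAt: string | null;
  lastRebuildAt: string | null;
}

// Every build stamps last_built_at; only a rebuild (isRebuild: true)
// also stamps last_rebuild_at. A plain build carries the old value forward.
function stampBuild(
  prev: BuildTimestamps,
  now: Date,
  opts: { isRebuild?: boolean } = {},
): BuildTimestamps {
  const iso = now.toISOString();
  return {
    lastBuiltAt: iso,
    lastRebuildAt: opts.isRebuild ? iso : prev.lastRebuildAt,
  };
}
```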

Tests: three new cases in `archive.test.ts`:
- `rebuildArchive populates both last_built_at and last_rebuild_at`
- `buildArchive only updates lastBuiltAt, not lastRebuildAt`
- `partial trailing line: ledger cursor advances only past complete lines`
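The partial-trailing-line case (fix 1) hinges on returning the newline boundary, not the file size, as the next cursor. A minimal sketch, assuming the reader gets a byte chunk that starts at the stored offset: the commit names `applyLedgerRange`, `ApplyResult`, and `safeOffset`, but `parseCompleteLines` and `ParsedRange` here are hypothetical stand-ins.

```typescript
interface ParsedRange {
  lines: string[]; // complete, non-blank lines without the trailing "\n"
  safeOffset: number; // absolute byte offset just past the last "\n"
}

const NEWLINE = 0x0a;

// `chunk` holds ledger bytes starting at absolute offset `startOffset`.
function parseCompleteLines(chunk: Uint8Array, startOffset: number): ParsedRange {
  const decoder = new TextDecoder("utf-8");
  const lines: string[] = [];
  let lineStart = 0;
  for (let i = 0; i < chunk.length; i++) {
    if (chunk[i] === NEWLINE) {
      const line = decoder.decode(chunk.subarray(lineStart, i));
      if (line.trim() !== "") lines.push(line);
      lineStart = i + 1;
    }
  }
  // Bytes after the last newline belong to a line the writer has not finished;
  // the cursor stops at the boundary so the next build re-reads that line whole.
  return { lines, safeOffset: startOffset + lineStart };
}
```

Scanning bytes rather than decoded characters keeps offsets in bytes, and because 0x0a never occurs inside a multi-byte UTF-8 sequence, splitting on it never cuts a character in half.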

All 358 tests pass (`pnpm run test:ts`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e-40

# Conflicts:
#	packages/cli/src/cli.ts
#	packages/ledger/CHANGELOG.md
@willwashburn willwashburn merged commit 5a057a2 into main Apr 26, 2026
@willwashburn willwashburn deleted the feat/analytics-archive-40 branch April 26, 2026 01:13

Development

Successfully merging this pull request may close these issues.

Derived analytics archive: materialize the ledger into a local queryable store
