Derived analytics archive: materialize the ledger into a local queryable store #40

@willwashburn


Context

Burn's canonical data path today is the append-only JSONL ledger plus ad-hoc query-time folding. That is the right write path, but it is already the wrong read path for the backlog that now exists.

Agentsview's main architectural lesson is not its UI. It is that a local derived archive unlocks everything else: fast queries, richer joins, durable computed signals, and eventually search.

This issue is for that derived archive.

Non-goal

Do not replace ledger.jsonl as the source of truth.

The JSONL ledger should remain the canonical append-only event spine. The archive is a materialized read model built from it.

Why #4 is not enough

#4 covers incremental cursors, dedup, and canonical project keys. That improves ingest, but it does not solve the core read-path problem:

  • every non-trivial query still has to scan the ledger
  • stamps still have to be folded repeatedly
  • cross-turn and cross-session joins still happen in memory
  • future features will keep re-implementing the same grouping logic

This issue is the missing second half: a stable, local analytics store.

Proposed shape

Add a derived SQLite database at:

  • ~/.relayburn/archive.sqlite

Built incrementally from the ledger and rebuildable from scratch.

Canonical principle

  • ledger.jsonl is the authoritative event log
  • archive.sqlite is disposable / rebuildable
  • if the archive is missing or corrupted, burn archive rebuild recreates it from the ledger

Schema (MVP)

The exact table names can change, but the archive needs at least these logical entities.

1. sessions

One row per (source, session_id).

Suggested fields:

  • source
  • session_id
  • project
  • project_key
  • workflow_id
  • harness
  • started_at
  • ended_at
  • turn_count
  • message_count
  • model_set_json
  • relationship_type (root / continuation / fork / subagent once that exists)
  • parent_session_id
  • has_content
  • fidelity flags from the coverage issue
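As a sketch only, the sessions table could look like the following. Table and column names follow the suggested fields above but are not final; `fidelity_json` is a placeholder for however the coverage issue ends up encoding its flags.

```python
import sqlite3

# Hypothetical DDL for the sessions table; names are illustrative.
SESSIONS_DDL = """
CREATE TABLE IF NOT EXISTS sessions (
    source            TEXT NOT NULL,
    session_id        TEXT NOT NULL,
    project           TEXT,
    project_key       TEXT,
    workflow_id       TEXT,
    harness           TEXT,
    started_at        TEXT,
    ended_at          TEXT,
    turn_count        INTEGER NOT NULL DEFAULT 0,
    message_count     INTEGER NOT NULL DEFAULT 0,
    model_set_json    TEXT,
    relationship_type TEXT,   -- root / continuation / fork / subagent
    parent_session_id TEXT,
    has_content       INTEGER NOT NULL DEFAULT 0,
    fidelity_json     TEXT,   -- fidelity flags from the coverage issue
    PRIMARY KEY (source, session_id)
);
"""

con = sqlite3.connect(":memory:")  # real path: ~/.relayburn/archive.sqlite
con.executescript(SESSIONS_DDL)
```

The composite primary key makes re-materializing the same ledger events an idempotent upsert rather than a duplicate row.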

2. turns

One row per normalized turn.

Suggested fields:

  • source
  • session_id
  • message_id
  • turn_index
  • ts
  • model
  • activity
  • stop_reason
  • input_tokens
  • output_tokens
  • reasoning_tokens
  • cache_read_tokens
  • cache_create_5m_tokens
  • cache_create_1h_tokens
  • cost_input_usd
  • cost_output_usd
  • cost_reasoning_usd
  • cost_cache_read_usd
  • cost_cache_create_usd
  • cost_total_usd
  • materialized enrichment columns such as workflow_id, agent_id, persona, tier
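An abridged sketch of the turns table (several of the token and cost columns listed above are elided for brevity; all names are assumptions):

```python
import sqlite3

# Hypothetical, abridged DDL for the turns table.
TURNS_DDL = """
CREATE TABLE IF NOT EXISTS turns (
    source            TEXT NOT NULL,
    session_id        TEXT NOT NULL,
    message_id        TEXT NOT NULL,
    turn_index        INTEGER,
    ts                TEXT,
    model             TEXT,
    activity          TEXT,
    stop_reason       TEXT,
    input_tokens      INTEGER,
    output_tokens     INTEGER,
    reasoning_tokens  INTEGER,
    cache_read_tokens INTEGER,
    cost_total_usd    REAL,
    -- materialized enrichment columns
    workflow_id       TEXT,
    agent_id          TEXT,
    persona           TEXT,
    tier              TEXT,
    PRIMARY KEY (source, session_id, message_id)
);
CREATE INDEX IF NOT EXISTS turns_ts    ON turns (ts);
CREATE INDEX IF NOT EXISTS turns_model ON turns (model);
"""

con = sqlite3.connect(":memory:")
con.executescript(TURNS_DDL)
```

Indexing `ts` and `model` up front is what makes the grouped queries in later sections cheap.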

3. tool_calls

One row per tool call attached to a turn.

Suggested fields:

  • source
  • session_id
  • message_id
  • call_index
  • tool_use_id
  • tool_name
  • target
  • args_hash
  • optional normalized category
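One way `args_hash` could be computed, so that logically identical calls hash identically regardless of key order (the canonical-JSON choice here is an assumption, not a decision):

```python
import hashlib
import json

def args_hash(args: dict) -> str:
    """Stable hash of tool-call arguments: sha256 over canonical JSON
    (sorted keys, no insignificant whitespace)."""
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Hashing rather than storing raw args keeps the archive useful even when no content sidecar exists.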

4. tool_result_events

Do not block archive work on full content storage, but reserve the table now.

Suggested fields:

  • source
  • session_id
  • message_id
  • tool_use_id
  • call_index
  • event_index
  • status
  • content_length
  • content_hash
  • subagent_session_id
  • agent_id
  • event_source (tool_result, subagent_notification, etc.)
  • ts

This table is the bridge to the passive-reader execution-graph issue.

5. archive_state

Track incremental build progress.

Suggested fields:

  • ledger_offset_bytes
  • ledger_mtime_ms
  • archive_version
  • last_compacted_at
  • last_rebuild_at
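A single-row bookkeeping table is enough here. A minimal sketch, with an `id = 1` check constraint enforcing the single row (names as above, all provisional):

```python
import sqlite3

# Sketch of the archive_state table plus a loader that initializes it.
STATE_DDL = """
CREATE TABLE IF NOT EXISTS archive_state (
    id                  INTEGER PRIMARY KEY CHECK (id = 1),
    ledger_offset_bytes INTEGER NOT NULL DEFAULT 0,
    ledger_mtime_ms     INTEGER NOT NULL DEFAULT 0,
    archive_version     INTEGER NOT NULL,
    last_compacted_at   TEXT,
    last_rebuild_at     TEXT
);
"""

def load_state(con: sqlite3.Connection, version: int = 1) -> sqlite3.Row:
    """Create archive_state if needed and return the single state row."""
    con.executescript(STATE_DDL)
    con.execute(
        "INSERT OR IGNORE INTO archive_state (id, archive_version) VALUES (1, ?)",
        (version,),
    )
    con.row_factory = sqlite3.Row
    return con.execute("SELECT * FROM archive_state WHERE id = 1").fetchone()
```

`archive_version` is also what lets a future schema bump trigger an automatic rebuild instead of a migration.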

Query model

Commands should read from the archive by default once it exists.

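For example, burn summary becomes a grouped SQL query instead of a full ledger scan. A sketch against the assumed turns schema (names illustrative):

```python
import sqlite3

# Illustrative read path: per-model totals straight from the archive.
con = sqlite3.connect(":memory:")  # real path: ~/.relayburn/archive.sqlite
con.execute("CREATE TABLE turns (model TEXT, output_tokens INTEGER, cost_total_usd REAL)")
con.executemany(
    "INSERT INTO turns VALUES (?, ?, ?)",
    [("opus", 1200, 0.09), ("sonnet", 800, 0.01), ("opus", 400, 0.03)],
)
rows = con.execute(
    """
    SELECT model, SUM(output_tokens), ROUND(SUM(cost_total_usd), 2)
    FROM turns
    GROUP BY model
    ORDER BY 3 DESC
    """
).fetchall()
```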

Build / maintenance commands

Add a small archive command group.

burn archive build
burn archive rebuild
burn archive status
burn archive vacuum

Behavior:

  • build applies any ledger tail not yet materialized
  • rebuild drops and recreates the archive from the ledger
  • status reports schema version, row counts, last sync point, and file sizes
  • vacuum runs SQLite maintenance if needed
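A sketch of what `burn archive status` might gather; it assumes the archive_state table exists, and treats tables that have not been materialized yet as `None` rather than zero:

```python
import os
import sqlite3

def archive_status(con: sqlite3.Connection, db_path: str) -> dict:
    """Collect schema version, row counts, sync point, and file size."""
    counts = {}
    for t in ("sessions", "turns", "tool_calls", "tool_result_events"):
        exists = con.execute(
            "SELECT 1 FROM sqlite_master WHERE type='table' AND name=?", (t,)
        ).fetchone()
        counts[t] = (
            con.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] if exists else None
        )
    version, offset = con.execute(
        "SELECT archive_version, ledger_offset_bytes FROM archive_state"
    ).fetchone()
    return {
        "schema_version": version,
        "ledger_offset_bytes": offset,
        "row_counts": counts,
        "file_size_bytes": os.path.getsize(db_path) if os.path.exists(db_path) else 0,
    }
```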

Incremental update strategy

Prefer append-friendly materialization keyed off the ledger position.

High-level flow:

  1. Read archive state
  2. Seek the ledger from the last processed byte offset
  3. Parse only new lines
  4. Upsert session / turn / tool rows
  5. Advance archive state only after transaction success

The archive should be safe to rebuild from zero at any time.

Relationship to #33

#33 proposes a content sidecar. That is compatible with this issue.

Two acceptable architectures:

  1. Content stays in sidecar JSONL, and the archive stores only content hashes / lengths / foreign keys
  2. Content is optionally copied into archive tables when content.store=full

Either is fine. The important thing is that the archive does not require raw content to be useful.

Relationship to coverage / fidelity

The archive should not blur the distinctions between:

  • zero tokens
  • missing tokens
  • aggregate-only sources
  • cost-only sources
  • fully attributed per-turn sources

This issue should consume the fidelity metadata from the dedicated coverage issue rather than inventing its own ad hoc null semantics.

Acceptance

  • ledger.jsonl remains canonical; deleting archive.sqlite and running burn archive rebuild fully recreates it.
  • burn summary can execute against the archive without scanning the entire ledger.
  • burn compare (#38: model comparison by observed activity category) and burn plans (#39: plan-based monthly quota tracking, complementing #5 block forecasting) can be implemented as SQL-style grouped queries rather than ad hoc in-memory scans.
  • Materialized enrichment columns mean commands no longer have to fold all stamps on every query.
  • Archive rebuild is deterministic: rebuilding twice from the same ledger yields the same row counts and primary keys.
  • The archive can coexist with no content sidecar, hash-only sidecar, or full sidecar.
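The determinism criterion is directly checkable. A minimal sketch, where the two connections stand in for archives produced by two runs of `burn archive rebuild` over the same ledger:

```python
import sqlite3

def same_archive(db_a: sqlite3.Connection, db_b: sqlite3.Connection) -> bool:
    """Compare two rebuilds on turn primary keys (extend per table as needed)."""
    q = "SELECT source, session_id, message_id FROM turns ORDER BY 1, 2, 3"
    return db_a.execute(q).fetchall() == db_b.execute(q).fetchall()
```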

Depends on

  • #4 for stable ingest identity and canonical project keys
  • coverage/fidelity issue to define how missing / partial data is represented
  • execution-graph issue to define richer relationship and tool-result event rows

Unblocks

  • #26 MCP self-query without full-ledger scans
  • #38 compare as a real query, not a giant in-memory reducer
  • #39 monthly plan tracking with cheap rolling aggregates
  • future search / analytics without committing to a giant application surface

Priority

High. This is the architectural gap between burn as an event collector and burn as a usable local analytics system.
