Context
Burn's canonical data path today is the append-only JSONL ledger plus ad-hoc query-time folding. That is the right write path, but it is already the wrong read path for the backlog that now exists.
Examples:
packages/ledger/src/reader.ts streams the full ledger, collects all stamps, then folds them onto turns at query time.
burn compare (#38) wants grouped aggregates, coverage-aware filtering, and repeated joins over model / activity / project / workflow.
@relayburn/mcp (#26) wants low-latency self-query from inside a running agent.
burn limits / plans (#5, #39) want repeated spend lookups over rolling windows.
Agentsview's main architectural lesson is not its UI. It is that a local derived archive unlocks everything else: fast queries, richer joins, durable computed signals, and eventually search.
This issue is for that derived archive.
Non-goal
Do not replace ledger.jsonl as the source of truth.
The JSONL ledger should remain the canonical append-only event spine. The archive is a materialized read model built from it.
Why #4 is not enough
#4 covers incremental cursors, dedup, and canonical project keys. That improves ingest, but it does not solve the core read-path problem:
every non-trivial query still has to scan the ledger
stamps still have to be folded repeatedly
cross-turn and cross-session joins still happen in memory
future features will keep re-implementing the same grouping logic
This issue is the missing second half: a stable, local analytics store.
Proposed shape
Add a derived SQLite database at:
~/.relayburn/archive.sqlite
Built incrementally from the ledger and rebuildable from scratch.
Canonical principle
ledger.jsonl is the authoritative event log
archive.sqlite is disposable / rebuildable
if the archive is missing or corrupted, burn archive rebuild recreates it from the ledger
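The disposable-archive principle can be sketched end to end. This is a minimal illustration, not Burn's actual schema: it assumes one JSON object per ledger line, uses hypothetical event fields (source, sessionId), and a single toy sessions table; Python's stdlib sqlite3 stands in for whatever driver the TypeScript codebase would use.

```python
import json
import sqlite3
from pathlib import Path

def rebuild_archive(ledger_path: str, archive_path: str) -> sqlite3.Connection:
    """Recreate the archive from scratch by replaying the JSONL ledger."""
    Path(archive_path).unlink(missing_ok=True)  # the archive is disposable
    db = sqlite3.connect(archive_path)
    db.execute(
        """CREATE TABLE sessions (
               source      TEXT NOT NULL,
               session_id  TEXT NOT NULL,
               event_count INTEGER NOT NULL DEFAULT 0,
               PRIMARY KEY (source, session_id)
           )"""
    )
    with open(ledger_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines in the ledger
            event = json.loads(line)
            # Upsert: one row per (source, sessionId) pair.
            db.execute(
                """INSERT INTO sessions (source, session_id, event_count)
                   VALUES (?, ?, 1)
                   ON CONFLICT (source, session_id)
                   DO UPDATE SET event_count = event_count + 1""",
                (event["source"], event["sessionId"]),
            )
    db.commit()
    return db
```

Because the function starts by deleting the file, running it is exactly the recovery path: a missing or corrupted archive is never an error, just a cold cache.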
Schema (MVP)
The exact table names can change, but the archive needs at least these logical entities.
1. sessions
One row per (source, sessionId).
Suggested fields:
source
session_id
project
project_key
workflow_id
harness
started_at
ended_at
turn_count
message_count
model_set_json
relationship_type (root / continuation / fork / subagent once that exists)
parent_session_id
has_content
fidelity flags from the coverage issue
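The field list above could translate to DDL roughly as follows. Column names follow the list; the types, defaults, and index are assumptions, and the fidelity flags are omitted pending the coverage issue.

```python
import sqlite3

SESSIONS_DDL = """
CREATE TABLE IF NOT EXISTS sessions (
    source            TEXT NOT NULL,
    session_id        TEXT NOT NULL,
    project           TEXT,
    project_key       TEXT,
    workflow_id       TEXT,
    harness           TEXT,
    started_at        TEXT,      -- ISO-8601; NULL when unknown
    ended_at          TEXT,
    turn_count        INTEGER NOT NULL DEFAULT 0,
    message_count     INTEGER NOT NULL DEFAULT 0,
    model_set_json    TEXT,      -- JSON array of models seen in the session
    relationship_type TEXT,      -- root / continuation / fork / subagent
    parent_session_id TEXT,
    has_content       INTEGER NOT NULL DEFAULT 0,  -- boolean
    PRIMARY KEY (source, session_id)
);
CREATE INDEX IF NOT EXISTS idx_sessions_project_key ON sessions (project_key);
"""

db = sqlite3.connect(":memory:")
db.executescript(SESSIONS_DDL)
```

The composite primary key encodes the "one row per (source, sessionId)" rule directly, so ingest can upsert without a separate existence check.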
2. turns
One row per normalized turn.
Suggested fields:
source
session_id
message_id
turn_index
ts
model
activity
stop_reason
input_tokens
output_tokens
reasoning_tokens
cache_read_tokens
cache_create_5m_tokens
cache_create_1h_tokens
cost_input_usd
cost_output_usd
cost_reasoning_usd
cost_cache_read_usd
cost_cache_create_usd
cost_total_usd
materialized enrichment columns such as workflow_id, agent_id, persona, tier
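A sketch of the turns table with a subset of the columns above (the token/cost columns are trimmed for brevity; types and indexes are assumptions), plus the kind of grouped aggregate this table exists to serve:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE turns (
    source         TEXT NOT NULL,
    session_id     TEXT NOT NULL,
    message_id     TEXT NOT NULL,
    turn_index     INTEGER,
    ts             TEXT,
    model          TEXT,
    activity       TEXT,
    stop_reason    TEXT,
    input_tokens   INTEGER,
    output_tokens  INTEGER,
    cost_total_usd REAL,
    workflow_id    TEXT,  -- one of the materialized enrichment columns
    PRIMARY KEY (source, session_id, message_id)
);
CREATE INDEX idx_turns_ts ON turns (ts);
CREATE INDEX idx_turns_model_activity ON turns (model, activity);
""")

# A burn-compare-style aggregate becomes one indexed GROUP BY instead of
# an in-memory fold over the whole ledger:
rows = db.execute("""
    SELECT model, activity, COUNT(*) AS turns, SUM(cost_total_usd) AS usd
    FROM turns
    GROUP BY model, activity
    ORDER BY usd DESC
""").fetchall()
```

The point of materializing enrichment columns here, rather than joining at query time, is that every downstream consumer gets them for free in a single table scan or index lookup.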
3. tool_calls
One row per tool call attached to a turn.
Suggested fields:
source
session_id
message_id
call_index
tool_use_id
tool_name
target
args_hash
optional normalized category
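A sketch of tool_calls keyed by call position within a turn. The args_hash column lets repeated identical calls be detected without storing raw arguments; the hashing scheme below (canonical JSON, SHA-256) is one reasonable assumption, not a decided format.

```python
import hashlib
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE tool_calls (
        source      TEXT NOT NULL,
        session_id  TEXT NOT NULL,
        message_id  TEXT NOT NULL,
        call_index  INTEGER NOT NULL,
        tool_use_id TEXT,
        tool_name   TEXT NOT NULL,
        target      TEXT,
        args_hash   TEXT,
        category    TEXT,  -- optional normalized category
        PRIMARY KEY (source, session_id, message_id, call_index)
    )
""")

def args_hash(args: dict) -> str:
    """Stable hash of tool arguments; key order must not matter."""
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Sorting keys before hashing means two calls with the same arguments in different serialization order collapse to the same hash, which is what retry-loop detection (#11) needs.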
4. tool_result_events
Do not block archive work on full content storage, but reserve the table now.
Suggested fields:
source
session_id
message_id
tool_use_id
call_index
event_index
status
content_length
content_hash
subagent_session_id
agent_id
event_source (tool_result, subagent_notification, etc.)
ts
This table is the bridge to the passive-reader execution-graph issue.
5. archive_state
Track incremental build progress.
Suggested fields:
ledger_offset_bytes
ledger_mtime_ms
archive_version
last_compacted_at
last_rebuild_at
Query model
Commands should read from the archive by default once it exists.
Examples:
burn summary → aggregate over turns
burn compare (#38) → aggregate over turns grouped by activity, model, project, and workflow
burn limits / burn plans (#5, #39) → rolling aggregates over turns
@relayburn/mcp (#26) → low-latency lookups against indexed tables
burn waste (#3, #11) → joins between turns, tool_calls, and later tool_result_events
Build / maintenance commands
Add a small archive command group.
Behavior:
build applies any ledger tail not yet materialized
rebuild drops and recreates the archive from the ledger
status reports schema version, row counts, last sync point, and file sizes
vacuum runs SQLite maintenance if needed
Incremental update strategy
Prefer append-friendly materialization keyed off the ledger position.
High-level flow: read the ledger tail from the last recorded offset, materialize the new events, then advance the cursor stored in archive_state.
The archive should be safe to rebuild from zero at any time.
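The incremental flow can be sketched with the archive_state cursor: read only the ledger bytes past ledger_offset_bytes, materialize complete lines, and advance the cursor. Event handling is stubbed here (a real implementation would upsert into the tables above); the partial-line guard is an assumption about how a live writer appends.

```python
import sqlite3

def incremental_build(db: sqlite3.Connection, ledger_path: str) -> int:
    """Apply any ledger tail not yet materialized; return bytes consumed."""
    db.execute("""
        CREATE TABLE IF NOT EXISTS archive_state (
            id INTEGER PRIMARY KEY CHECK (id = 1),
            ledger_offset_bytes INTEGER NOT NULL DEFAULT 0
        )
    """)
    db.execute("INSERT OR IGNORE INTO archive_state (id) VALUES (1)")
    (offset,) = db.execute(
        "SELECT ledger_offset_bytes FROM archive_state WHERE id = 1"
    ).fetchone()

    with open(ledger_path, "rb") as f:
        f.seek(offset)
        tail = f.read()

    consumed = 0
    for line in tail.splitlines(keepends=True):
        if not line.endswith(b"\n"):
            break  # partial trailing line: wait for the writer to finish it
        # materialize_event(json.loads(line)) would upsert sessions/turns here
        consumed += len(line)

    db.execute(
        "UPDATE archive_state SET ledger_offset_bytes = ? WHERE id = 1",
        (offset + consumed,),
    )
    db.commit()
    return consumed
```

Because the cursor only ever advances past fully materialized lines, a crash mid-build at worst reprocesses nothing, and rebuild is just this function run against offset zero on a fresh file.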
Relationship to #33
#33 proposes a content sidecar. That is compatible with this issue.
Two acceptable architectures exist (one keyed off the content.store=full setting); either is fine. The important thing is that the archive does not require raw content to be useful.
Relationship to coverage / fidelity
The archive should not blur the coverage and fidelity distinctions between sources.
This issue should consume the fidelity metadata from the dedicated coverage issue rather than inventing its own ad hoc null semantics.
Acceptance
ledger.jsonl remains canonical; deleting archive.sqlite and running burn archive rebuild fully recreates it.
burn summary can execute against the archive without scanning the entire ledger.
burn compare (#38) and burn plans (#39) can be implemented as SQL-style grouped queries rather than ad hoc in-memory scans.
Depends on
#4 for stable ingest identity and canonical project keys
Unblocks
#26 MCP self-query without full-ledger scans
#38 compare as a real query, not a giant in-memory reducer
#39 monthly plan tracking with cheap rolling aggregates
Priority
High. This is the architectural gap between burn as an event collector and burn as a usable local analytics system.