Epic: rewrite burn in Rust (v2 — supersedes #222) #240

@willwashburn

Context

Two converging pressures push burn toward a Rust rewrite:

  1. Ingest at scale is too slow. A real user ledger at 1.5GB takes ~60s to do a full archive rebuild today. The hot loop (packages/ledger/src/archive.ts:1237-1266) is createReadStream → JSON.parse per line, single-threaded, and a full burn state rebuild all run streams the entire ledger twice (archive.ts:1237 and index-sidecar.ts:188). Node's parser ceiling is ~50-100 MB/s on JSON-heavy lines, so the wall-clock cost grows linearly with ledger size and no incremental fix closes the gap.
  2. Wash (AgentWorkforce/wash) is going Rust. The Rust SDK is the right embedding surface for it; a relayburn-sdk crate on crates.io is what wash adds to Cargo.toml, plus a thin napi-rs wrapper published as @relayburn/sdk on npm so existing TS/Node consumers keep working without spawning the CLI.
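As a rough check on the numbers in item 1, the parser-bound cost can be worked out from ledger size, pass count, and throughput. The sketch below is illustrative arithmetic only: the function name is made up, and it assumes 1GB = 1000MB and that parsing dominates the wall clock.

```rust
/// Estimated wall-clock seconds for a rebuild that is limited by parser speed.
/// `ledger_mb`: ledger size in MB; `passes`: how many times the ledger is
/// streamed end to end; `mb_per_s`: sustained parser throughput.
/// (Hypothetical helper, not part of burn.)
pub fn parser_bound_seconds(ledger_mb: f64, passes: u32, mb_per_s: f64) -> f64 {
    ledger_mb * f64::from(passes) / mb_per_s
}

/// Bounds for the case described above: a 1.5GB ledger streamed twice
/// at 50-100 MB/s. Returns (best case, worst case) in seconds.
pub fn rebuild_bounds() -> (f64, f64) {
    let ledger_mb = 1500.0; // 1.5GB, assuming 1GB = 1000MB
    let passes = 2; // one pass in archive.ts, one in index-sidecar.ts
    (
        parser_bound_seconds(ledger_mb, passes, 100.0),
        parser_bound_seconds(ledger_mb, passes, 50.0),
    )
}
```

Under these assumptions the estimate is 30s to 60s, which is consistent with the observed ~60s and shows why removing one of the two passes alone would not reach the 5-10s target.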

Rust is the source of truth; the npm package is a generated binding layer over it.

This issue supersedes #222 (which was filed before #226 extracted @relayburn/ingest, before #232 made @relayburn/sdk the source of truth, and before #236 lifted overhead/overheadTrim into the SDK). It also supersedes #218, which was a TS-native SDK proposal already closed once @relayburn/sdk shipped.

Goal

Port the eight-package TS workspace to a Rust Cargo workspace that:

  • Cuts burn state rebuild archive --full on a 1.5GB ledger from ~60s to ~5-10s (SQLite-bound, not parser-bound).
  • Ships a single static burn binary per (os, arch) triple.
  • Publishes relayburn-sdk to crates.io as the supported embedding surface for Rust consumers (wash).
  • Publishes @relayburn/sdk to npm as a napi-rs wrapper over relayburn-sdk for TS/Node consumers — first-class deliverable, day one. Replaces the legacy TS SDK in-place at major version 2.0.0.
  • Drops the burn cold-start from ~80ms (Node) to ~5ms.
  • Drops burn ingest --watch RSS from ~80MB to ~10MB. (Watch loop stays poll-based at port time; FSEvents/notify migration is Phase 2 to avoid coupling FS-event behavior changes to the port.)

Capability parity, not new features. No burn command added or removed during the port.

Naming constraint

The crate name burn is already taken on crates.io, so all Rust crate names are prefixed relayburn-*. The user-facing binary name stays burn via Cargo's [[bin]] name = "burn" rename — invocation is unchanged.

# crates/relayburn-cli/Cargo.toml
[package]
name = "relayburn-cli"
version = "1.0.0"

[[bin]]
name = "burn"
path = "src/main.rs"

What gets published vs how the workspace is organized

Distinguishing these matters: the workspace can have many internal crates without burdening external consumers with many packages.

Published artifacts

| Artifact | Count | Channel |
| --- | --- | --- |
| burn binary | 1 per (os, arch) | GitHub Releases (darwin-arm64, darwin-x64, linux-arm64, linux-x64, optionally win32-x64) |
| relayburn-sdk crate | 1 | crates.io: embedding API; consumed by wash |
| relayburn-cli crate | 1 | crates.io: cargo install relayburn-cli fallback for Rust-toolchain users; produces the burn bin |
| @relayburn/sdk (umbrella) | 1 | npm: selects the right native package via optionalDependencies |
| @relayburn/sdk-<os>-<arch> | 1 per (os, arch) | npm: each contains a prebuilt .node |
| relayburn (install shim) | 1 | npm: postinstall downloads the right burn binary |

2 published Rust crates, 1 binary per arch, plus the npm artifacts. That's it.

Internal Cargo workspace (not published)

crates/
  relayburn-reader      # parsers + classifier (was @relayburn/reader)
  relayburn-ledger      # JSONL append + content sidecar + lock + sqlite archive
  relayburn-analyze     # pricing, cost derivation, hotspots, overhead computation
  relayburn-ingest      # session discovery + parse-and-append + pending stamps + watch loop
  relayburn-sdk         # PUBLISHED — re-exports the public surface of the lower crates
  relayburn-cli         # PUBLISHED — binary crate; `[[bin]] name = "burn"` produces the executable
  relayburn-sdk-node    # napi-rs bindings crate (NOT published to crates.io; built in CI to produce .node artifacts)

packages/
  sdk-node              # npm `@relayburn/sdk`: prebuilt .node binaries + generated .d.ts + thin TS facade

Build order: relayburn-reader → -ledger → -analyze → -ingest → -sdk → -cli, with relayburn-sdk-node depending on relayburn-sdk. Mirrors the pnpm graph for the modules that survive.

MCP becomes a burn mcp-server subcommand, not a separate binary or crate. This collapses the standalone-vs-shell debate from #210 — both are the same binary either way. The MCP tool definitions live as a module inside relayburn-cli.

Rationale for option-2 (internal modular crates) over a single mega-crate: the TS workspace already gave evidence that the seams matter — @relayburn/ingest was extracted (#226) because watch-loop + pending-stamp coordination is independently load-bearing. Internal crates preserve that signal, give faster incremental builds, and enforce module boundaries via pub(crate). Cost is a handful of extra Cargo.toml files no external consumer ever sees.

Crate-by-crate translation

| TS today | Rust target | Notes |
| --- | --- | --- |
| zod schemas in reader/types.ts | serde + serde_json with #[derive(Deserialize)] | TurnRecord, ContentRecord, LedgerLine become tagged enums; v schema field stays. |
| claude.ts, codex.ts, opencode.ts + opencode-stream.ts | streaming serde_json::Deserializer::from_reader().into_iter() per line | Bulk of the rewrite. Use existing *.test.ts fixtures (8 in reader, 17 in analyze = 25 total) as Rust acceptance tests. |
| classifier.ts rule tables | phf static maps + match arms | Keep rule tables flat & data-driven so adding a harness stays a one-PR change. |
| ledger/file-adapter.ts + ledger/lock.ts + ledger/adapters/file-lock.ts:89-138 | Replicate the lock protocol exactly, do not switch to flock(2) | The TS implementation is exclusive-file-creation (open(lp, 'wx')), with a two-phase retry (50×20ms then 40×250ms), orphan recovery (unlink lockfiles older than 5s when held lock times out), and AsyncLocalStorage-based re-entrancy on lock name. flock(2) has different cross-process semantics on macOS/Windows and would silently change behavior. Lock names in use: archive, ledger-index, ledger, test-serialize. |
| sqlite archive (archive.ts:700-914) | rusqlite | TS already wraps the entire tail in one BEGIN/COMMIT with prepared statements per row. Real perf wins for the Rust port: (a) multi-row VALUES (...), (...) inserts to amortize statement overhead, (b) WAL + synchronous=NORMAL for the rebuild path, (c) prepared-statement reuse across tables instead of per-table .run() loops. |
| mcp/server.ts + mcp/tools/session-cost.ts | rmcp-backed module inside relayburn-cli exposed as burn mcp-server | MCP is genuinely a thin wrapper over SDK as of #232 (only sessionCost import). Resolves #210 by collapsing the standalone-server question: there's only one binary. |
| cli | clap v4 derive in relayburn-cli | Match existing flag surface byte-for-byte. [[bin]] name = "burn" keeps invocation unchanged. |
| harnesses/*.ts + harnesses/pending-stamp.ts | trait HarnessAdapter with async fn plan/before_spawn/after_exit | One file per harness, lazy-registered in a phf table. The createPendingStampAdapter factory shape (codex/opencode reuse) maps to a pending_stamp::adapter constructor in Rust. |
| watch-loop polling (packages/ingest/src/watch-loop.ts) | tokio::time::interval | Stays poll-based on day one. notify migration is Phase 2: it has real behavior differences (event coalescing, no missed-events guarantee on macOS) and shouldn't ride along on the port. |
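The lock row is the one most likely to drift in translation, so here is a minimal std-only sketch of the protocol as described: exclusive creation (the Rust analog of open(lp, 'wx') is create_new), a phased retry schedule, and orphan recovery by lockfile age once the retries are exhausted. All names here (Phase, LockGuard, acquire, ts_schedule) are hypothetical, the exact timeout behavior is inferred from the description rather than read from file-lock.ts, and the AsyncLocalStorage-style re-entrancy on lock name is deliberately left out.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, ErrorKind};
use std::path::{Path, PathBuf};
use std::thread::sleep;
use std::time::{Duration, SystemTime};

/// One retry phase: `attempts` tries, sleeping `delay` after each failure.
pub struct Phase {
    pub attempts: u32,
    pub delay: Duration,
}

/// The schedule described for the TS lock: 50×20ms, then 40×250ms.
pub fn ts_schedule() -> [Phase; 2] {
    [
        Phase { attempts: 50, delay: Duration::from_millis(20) },
        Phase { attempts: 40, delay: Duration::from_millis(250) },
    ]
}

/// Holds the lock; the lockfile is unlinked when the guard is dropped.
pub struct LockGuard {
    path: PathBuf,
}

impl Drop for LockGuard {
    fn drop(&mut self) {
        let _ = fs::remove_file(&self.path);
    }
}

/// Exclusive creation: succeeds only if the lockfile does not exist yet.
fn try_create(path: &Path) -> io::Result<Option<LockGuard>> {
    match OpenOptions::new().write(true).create_new(true).open(path) {
        Ok(_) => Ok(Some(LockGuard { path: path.to_path_buf() })),
        Err(e) if e.kind() == ErrorKind::AlreadyExists => Ok(None),
        Err(e) => Err(e),
    }
}

/// True when the lockfile's mtime is older than `max_age`.
fn is_orphan(path: &Path, max_age: Duration) -> bool {
    fs::metadata(path)
        .and_then(|m| m.modified())
        .ok()
        .and_then(|t| SystemTime::now().duration_since(t).ok())
        .map_or(false, |age| age > max_age)
}

/// Retry through every phase; on timeout, reclaim the lock only if it is stale.
pub fn acquire(path: &Path, schedule: &[Phase], orphan_age: Duration) -> io::Result<LockGuard> {
    for phase in schedule {
        for _ in 0..phase.attempts {
            if let Some(guard) = try_create(path)? {
                return Ok(guard);
            }
            sleep(phase.delay);
        }
    }
    if is_orphan(path, orphan_age) {
        let _ = fs::remove_file(path);
        if let Some(guard) = try_create(path)? {
            return Ok(guard);
        }
    }
    Err(io::Error::new(ErrorKind::TimedOut, "lock acquisition timed out"))
}
```

A caller matching the described timings would use acquire(&lock_path, &ts_schedule(), Duration::from_secs(5)). Passing the schedule in as data keeps the timings testable with short delays and makes any deviation from the TS values visible in review.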

SDK shape — Rust (relayburn-sdk on crates.io)

// crates/relayburn-sdk/src/lib.rs
pub use relayburn_reader::{TurnRecord, ContentRecord, ActivityCategory, Harness};
pub use relayburn_ledger::{Ledger, LedgerHandle};
pub use relayburn_analyze::{
    Summary, HotspotFinding, Pattern,
    SessionCostResult,
    OverheadResult, OverheadSection, OverheadFileSummary, OverheadPerFileEntry,
    OverheadAttributionDetail, OverheadSectionCost,
    OverheadTrimResult, OverheadTrimRecommendation,
    OverheadFileKind,   // "claude-md" | "agents-md"
    OverheadHarness,    // "claude-code" | "codex" | "opencode"
};

// Ledger handle (long-lived)
pub struct LedgerOpenOptions { pub home: Option<PathBuf> }
impl Ledger {
    pub fn open(opts: LedgerOpenOptions) -> Result<LedgerHandle>;
}
impl LedgerHandle {
    pub fn summary(&self, q: SummaryOptions) -> Result<Summary>;
    pub fn session_cost(&self, q: SessionCostOptions) -> Result<SessionCostResult>;
    pub fn overhead(&self, q: OverheadOptions) -> Result<OverheadResult>;
    pub fn overhead_trim(&self, q: OverheadTrimOptions) -> Result<OverheadTrimResult>;
    pub fn hotspots(&self, q: HotspotsOptions) -> Result<Vec<HotspotFinding>>;
}

// One-shot free functions (CLI-style; opens + closes a handle)
pub async fn ingest(opts: IngestOptions) -> Result<IngestReport>;
pub async fn summary(q: SummaryOptions) -> Result<Summary>;
pub async fn session_cost(q: SessionCostOptions) -> Result<SessionCostResult>;
pub async fn overhead(q: OverheadOptions) -> Result<OverheadResult>;
pub async fn overhead_trim(q: OverheadTrimOptions) -> Result<OverheadTrimResult>;
pub async fn hotspots(q: HotspotsOptions) -> Result<Vec<HotspotFinding>>;

Two design knobs:

  • Async boundary: ingest and the watch-loop are async (tokio); the LedgerHandle methods summary / session_cost / overhead / overhead_trim / hotspots are sync because they're CPU-bound queries against an open handle. Wash's MCP handlers wrap them in spawn_blocking.
  • Handle vs free fn: offer both, mirroring today's TS SDK. Free fn for one-shot use; LedgerHandle::summary for embedded paths (wash's MCP server keeps a long-lived handle).
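To make the handle-versus-free-function split concrete, here is a small dependency-free sketch in which the free function simply opens a handle, delegates, and drops it. The types are stand-ins with invented fields (the session filter, the turns count, and the in-memory data are all assumptions), and the free function is written synchronously to avoid pulling in tokio, whereas the proposal declares it async.

```rust
use std::path::PathBuf;

// Hypothetical stand-ins: the real option and result types carry more fields.
#[derive(Default)]
pub struct LedgerOpenOptions {
    pub home: Option<PathBuf>,
}

#[derive(Default)]
pub struct SummaryOptions {
    pub session: Option<String>, // assumed filter, for illustration only
}

#[derive(Debug, PartialEq)]
pub struct Summary {
    pub turns: u64,
}

/// Long-lived handle. Here it holds session ids in memory instead of a ledger.
pub struct LedgerHandle {
    sessions: Vec<String>,
}

pub struct Ledger;

impl Ledger {
    pub fn open(_opts: LedgerOpenOptions) -> Result<LedgerHandle, String> {
        // A real open would resolve `home` and open the ledger files.
        Ok(LedgerHandle {
            sessions: vec!["s1".into(), "s1".into(), "s2".into()],
        })
    }
}

impl LedgerHandle {
    /// Sync query against the already-open handle.
    pub fn summary(&self, q: SummaryOptions) -> Result<Summary, String> {
        let turns = self
            .sessions
            .iter()
            .filter(|s| q.session.as_ref().map_or(true, |want| want == *s))
            .count() as u64;
        Ok(Summary { turns })
    }
}

/// One-shot free function: open, query, drop the handle on return.
pub fn summary(q: SummaryOptions) -> Result<Summary, String> {
    let handle = Ledger::open(LedgerOpenOptions::default())?;
    handle.summary(q)
}
```

An embedded consumer would call Ledger::open once and reuse the handle across queries, while a CLI-style caller pays the open cost on every call in exchange for a simpler signature.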

Surface mirrors packages/sdk/index.d.ts: six verbs (ingest, summary, sessionCost, overhead, overheadTrim, hotspots) and the full overhead attribution type tree (OverheadAttributionDetail.sectionCosts: OverheadSectionCost[], OverheadPerFileEntry.attribution: OverheadAttributionDetail, etc.).

SDK shape — TypeScript (@relayburn/sdk on npm, day-1 deliverable)

The relayburn-sdk-node crate (built in CI, never published to crates.io) exposes the same surface via napi-rs. Generated .d.ts matches today's packages/sdk/index.d.ts byte-for-byte modulo bigint for u64 token counts.

// crates/relayburn-sdk-node/src/lib.rs
use napi_derive::napi;

#[napi]
pub struct Ledger { inner: relayburn_sdk::LedgerHandle }

#[napi]
impl Ledger {
    #[napi(factory)]
    pub fn open(opts: Option<LedgerOptions>) -> napi::Result<Ledger> { /* … */ }

    #[napi]
    pub fn summary(&self, q: SummaryQuery) -> napi::Result<Summary> { /* … */ }
}

#[napi]
pub async fn ingest(opts: IngestOptions) -> napi::Result<IngestReport> { /* … */ }

Binding rules:

  • Errors: Result<T, E> → throws on the JS side; E becomes a typed BurnError with code and cause.
  • Numbers: u64 token counts → bigint in TS; cost (f64 USD) → number. Document the bigint boundary in the README.
  • Async: Rust async fn → Promise<T> automatically via napi's tokio runtime integration.
  • Codegen: .d.ts is generated by napi-rs; never hand-edited. The TS package is index.js (loader) + index.d.ts (generated) + prebuilt .node binaries. Single direction: Rust → TS, never the reverse.
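The bigint rule follows from how a JS number stores integers: it is an IEEE-754 double, which represents every integer exactly only up to 2^53 - 1 (Number.MAX_SAFE_INTEGER). The sketch below shows the loss from the Rust side by round-tripping a u64 through f64; the helper names are illustrative and not part of any binding.

```rust
/// Largest integer a JS `number` can hold without ambiguity: 2^53 - 1.
pub const MAX_SAFE_INTEGER: u64 = (1 << 53) - 1;

/// True when a u64 token count could cross the FFI as a plain JS number
/// without losing precision.
pub fn fits_js_number(n: u64) -> bool {
    n <= MAX_SAFE_INTEGER
}

/// Round-trips a u64 through f64, which is what a `number` binding would do.
pub fn through_f64(n: u64) -> u64 {
    (n as f64) as u64
}
```

For example, through_f64((1 << 53) + 1) returns 1 << 53: the value silently changes by one. Counts that large are unlikely for a single turn, but a bigint mapping removes the question for aggregated totals.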

Conformance gate: a TS test imports the current TS @relayburn/sdk and the napi-rs @relayburn/sdk at the same version against the same fixture ledger and asserts deep-equal results.

Distribution

  • burn binary: GitHub Releases with prebuilt static binaries for darwin-{arm64,x64}, linux-{arm64,x64}, optionally windows-x64. cargo install relayburn-cli as a fallback for users with Rust toolchains.
  • relayburn-sdk: crates.io. Wash adds relayburn-sdk = "1" to Cargo.toml and statically links it.
  • relayburn-cli: crates.io. The cargo install path; produces the burn binary.
  • @relayburn/sdk (npm): prebuilt .node binaries via napi-rs's standard CI recipe — @relayburn/sdk-darwin-arm64, -darwin-x64, -linux-arm64-gnu, -linux-x64-gnu, optionally -win32-x64-msvc. The umbrella @relayburn/sdk package picks the right native package via npm's optionalDependencies selector. esbuild-bundles cleanly. No node-gyp, no compile-on-install. Major bump to 2.0.0 to signal the implementation flip.
  • relayburn (npm): stays as the user-facing npm install path, becomes a postinstall download shim for the burn binary.
  • Wash plugin install (/plugin install relaywash@…): wash binary ships the MCP server; plugin manifest points at the platform-specific binary or a 20-line Node shim that execves it.

npm deprecation plan

Stays published (different impl, same name):

  • @relayburn/sdk — name unchanged; implementation flips from TS-that-imports-the-others to napi-rs loader. 2.0.0 major signals the rewrite.
  • relayburn — stays as install path; flips to binary-download shim.

Deprecated via npm deprecate (don't unpublish — pinned consumers shouldn't break):

  • @relayburn/reader → "merged into @relayburn/sdk 2.0; internal Rust crate now"
  • @relayburn/ledger → same
  • @relayburn/analyze → same
  • @relayburn/ingest → same
  • @relayburn/mcp → "use burn mcp-server from the burn binary, or @relayburn/sdk for embedded MCP"
  • @relayburn/cli → "use the burn binary from the relayburn npm package or GitHub Releases"

Run npm deprecate only after @relayburn/sdk@2.0.0 and relayburn@2.0.0 are live, so deprecation messages can point at a real replacement.

Sequencing

  1. Rust workspace skeleton on a long-lived branch — the option-2 layout (internal relayburn-{reader,ledger,analyze,ingest} + published relayburn-sdk + relayburn-cli).
  2. Port relayburn-reader + relayburn-ledger + relayburn-analyze behind the existing JSON contract. The 25 TS test fixtures (8 reader + 17 analyze) become the conformance gate — every Rust crate must produce byte-identical output to the TS version on the fixture corpus.
  3. Port relayburn-ingest (watch-loop on tokio::time::interval for now; notify migration is Phase 2).
  4. Port relayburn-sdk mirroring today's TS surface (handle + free functions × 6 verbs).
  5. Stand up relayburn-sdk-node + @relayburn/sdk 2.0.0-pre with napi-rs as soon as relayburn-sdk is functional. Don't defer this — landing the napi-rs CI matrix early surfaces bindings issues while the surface is small. Platform packages (@relayburn/sdk-darwin-arm64, etc.) get published in the same workflow.
  6. Port relayburn-cli with clap v4 derive. CLI surface matches today byte-for-byte. MCP folds in as a burn mcp-server subcommand here (resolves #210).
  7. Cut a final TS 1.x lockstep release of all 8 packages immediately before cutover, so anyone who can't migrate has a stable pin.
  8. Lockstep 2.0 cutover (single day):
    • Publish relayburn-sdk@1.0.0 and relayburn-cli@1.0.0 to crates.io.
    • Publish @relayburn/sdk@2.0.0 (umbrella + platform packages) to npm.
    • Publish relayburn@2.0.0 as a postinstall download shim.
    • Publish burn binaries to GitHub Releases for all targets.
    • Run npm deprecate on the six retired packages with pointers to 2.0.
    • Wash bumps to relayburn-sdk = "1".
  9. Decommission TS packages in-tree. Only packages/sdk-node survives. Repo becomes a Cargo workspace with one TS subdirectory for the npm facade.
  10. Phase 2 (post-2.0): switch ingest watch loop from polling to notify; revisit pending-stamp protocol for FS-event coalescing.
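For the poll-based watch loop that step 3 keeps and step 10 later replaces, the core of each tick is a snapshot diff. The sketch below is a std-only stand-in: it uses thread::sleep where the port would use tokio::time::interval, and the choice of file length plus mtime as the change signal is an assumption, not a description of what watch-loop.ts tracks. All names are hypothetical.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::thread::sleep;
use std::time::{Duration, SystemTime};

/// What one poll remembers about a file.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Stamp {
    len: u64,
    modified: SystemTime,
}

pub type Snapshot = HashMap<PathBuf, Stamp>;

/// Records length and mtime for every regular file directly inside `dir`.
pub fn snapshot(dir: &Path) -> io::Result<Snapshot> {
    let mut out = Snapshot::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let meta = entry.metadata()?;
        if meta.is_file() {
            out.insert(
                entry.path(),
                Stamp { len: meta.len(), modified: meta.modified()? },
            );
        }
    }
    Ok(out)
}

/// Paths that are new in `next` or whose stamp differs from `prev`, sorted.
pub fn changed(prev: &Snapshot, next: &Snapshot) -> Vec<PathBuf> {
    let mut paths: Vec<PathBuf> = next
        .iter()
        .filter(|(path, stamp)| prev.get(*path) != Some(*stamp))
        .map(|(path, _)| path.clone())
        .collect();
    paths.sort();
    paths
}

/// Polls `dir` for `ticks` intervals, reporting changed files on each tick.
pub fn watch(
    dir: &Path,
    interval: Duration,
    ticks: usize,
    mut on_change: impl FnMut(&[PathBuf]),
) -> io::Result<()> {
    let mut prev = snapshot(dir)?;
    for _ in 0..ticks {
        sleep(interval);
        let next = snapshot(dir)?;
        let delta = changed(&prev, &next);
        if !delta.is_empty() {
            on_change(&delta);
        }
        prev = next;
    }
    Ok(())
}
```

Because every tick compares full snapshots, a burst of writes between two polls collapses into one change report per file. That implicit coalescing is one of the behaviors an event-driven notify backend would have to reproduce deliberately in Phase 2.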

Risk register

  • Lock-protocol porting risk is front-of-mind. The retry timing (50×20ms then 40×250ms), orphan recovery (5s mtime threshold), and AsyncLocalStorage re-entrancy on lock name (archive, ledger-index, ledger, test-serialize) are load-bearing: replicate the protocol behavior, not just the primitives. Don't reach for fs2::FileExt::lock_exclusive; flock semantics differ across macOS/Windows.
  • Parser surface is the most volatile code in the repo. claude.ts / codex.ts / opencode.ts change with every new harness/tool/skill shape. Run TS and Rust in parallel with a fixture-conformance gate during the port.
  • Activity classifier rule tables are the most likely site of subtle behavior drift. Add property-based tests on the fixture corpus that assert category assignments match TS exactly.
  • Overhead attribution arithmetic (per-file token-share USD math added in #236) is a fresh surface for divergence; add a dedicated overhead-conformance fixture pair.
  • SQLite schema compatibility must be preserved — archive.sqlite files in the wild need to keep working without a forced rebuild, or we ship a one-time migration in burn state rebuild archive --full.
  • napi-rs CI matrix: 4-5 platforms × build × test, plus npm publish per-platform. Use napi-rs's standard GitHub Actions templates as a starting point.
  • Lockstep semver: crates.io relayburn-sdk and npm @relayburn/sdk ship from a single release workflow. No independent npm patches between Rust releases — that path leads to drift.
  • TS ergonomics across the FFI: bigint for u64 token counts is the right call but will surprise consumers; document loudly. Date types pass as ISO strings; the TS facade converts where helpful.
  • Changelog/release machinery: workflow-driven today. Port to cargo-release + napi publish + the same [Unreleased] promotion logic applied to both crates/*/CHANGELOG.md and packages/sdk-node/CHANGELOG.md.

Acceptance

  • burn state rebuild archive --full on a 1.5GB ledger completes in ≤10s on M-series silicon (target 5-10s).
  • Every *.test.ts fixture in packages/reader/ (8 files) and packages/analyze/ (17 files) passes against the Rust crate that owns it (golden conformance gate).
  • Overhead attribution (per-file, per-section USD totals) on the fixture corpus matches TS within 1e-9 USD.
  • relayburn-sdk + relayburn-cli published on crates.io with documented public API; wash builds against relayburn-sdk = "1".
  • @relayburn/sdk@2.0.0 published on npm, esbuild-bundles cleanly, runs on Node ≥ 20.11 across darwin-{arm64,x64} and linux-{arm64,x64} without compile-on-install.
  • A TS consumer can import { ingest, summary } from '@relayburn/sdk', esbuild-bundle, and run with no Node-side native build step.
  • burn available as prebuilt static binary for the four primary (os, arch) targets via GitHub Releases.
  • cargo install relayburn-cli produces a working burn binary on a clean machine.
  • Crates.io and npm versions ship in lockstep from a single release workflow.
  • @relayburn/{reader,ledger,analyze,ingest,mcp,cli} are deprecated on npm with pointers to the 2.0 replacements; not unpublished.
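The 1e-9 USD criterion for overhead attribution could be checked with a comparison along these lines. This is a sketch under assumptions: it takes the per-file (or per-section) USD totals from each implementation as parallel slices in the same order, and the function and constant names are invented for illustration.

```rust
/// Acceptance tolerance for overhead attribution totals, in USD.
pub const USD_TOLERANCE: f64 = 1e-9;

/// Returns the indices where the TS and Rust totals differ by more than `tol`.
/// Both slices must list the same files or sections in the same order.
pub fn divergent(ts: &[f64], rust: &[f64], tol: f64) -> Vec<usize> {
    assert_eq!(ts.len(), rust.len(), "both runs must cover the same entries");
    ts.iter()
        .zip(rust)
        .enumerate()
        .filter(|&(_, (a, b))| (a - b).abs() > tol)
        .map(|(i, _)| i)
        .collect()
}
```

An absolute tolerance is used rather than exact equality because the two implementations may sum floating-point terms in a different order; returning indices instead of a bool lets a failing gate name the entry that drifted.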

Sub-issues

Filed and tracked individually. Dependency order shown in parens; siblings can run in parallel.


Supersedes #222 (filed 2026-05-01, before the SDK source-of-truth refactor in #232/#236 and the ingest extraction in #226). Supersedes #218 (already closed once @relayburn/sdk shipped).
