Epic: rewrite burn in Rust (and pivot #218 SDK to a Rust crate) #222

@willwashburn

Context

Two converging pressures push burn toward a Rust rewrite:

  1. Ingest at scale is too slow. A real user ledger at 1.5GB takes ~60s to do a full archive rebuild today. The hot loop (packages/ledger/src/archive.ts:1224-1248) is createReadStream → JSON.parse per line, single-threaded; a full burn state rebuild all re-streams the entire ledger three times (archive.ts, index-sidecar.ts:189, file-adapter.ts:481). Node's parser ceiling is ~50-100 MB/s on JSON-heavy lines, so the wall-clock cost grows linearly with ledger size and there's no incremental fix that closes the gap.
  2. Wash (AgentWorkforce/wash) is going Rust. Issue #218 (Publish relayburn/sdk programmatic surface for embedded use) was scoped as a Node-importable SDK so wash could esbuild-bundle it; with wash itself going Rust, that framing is obsolete. The right SDK is a burn-sdk crate on crates.io that wash adds to Cargo.toml, plus a thin napi-rs wrapper published as @relayburn/sdk on npm so existing TS/Node consumers keep working without spawning the CLI.
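
A back-of-envelope check of the numbers in item 1, as a minimal sketch. It treats 1.5GB as 1500 MB and assumes parser throughput is the only cost, ignoring SQLite writes and disk I/O. The function names are hypothetical and exist only for this illustration.

```rust
/// Seconds needed to stream `size_mb` megabytes `passes` times
/// at a sustained parser throughput of `mb_per_sec`.
pub fn stream_seconds(size_mb: f64, passes: u32, mb_per_sec: f64) -> f64 {
    size_mb * f64::from(passes) / mb_per_sec
}

/// Sustained throughput (MB/s) needed to finish `passes` passes
/// over `size_mb` megabytes within `budget_secs` seconds.
pub fn required_mb_per_sec(size_mb: f64, passes: u32, budget_secs: f64) -> f64 {
    size_mb * f64::from(passes) / budget_secs
}
```

Under these assumptions a single pass at 50-100 MB/s costs 15 to 30 seconds and three passes cost 45 to 90 seconds, the same order as the ~60s observed. Hitting the 10s end of the Goal target in one pass would need about 150 MB/s sustained, which is above the quoted Node ceiling.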

Rust is the source of truth; the npm package is a generated binding layer over it. This issue supersedes #218 (which proposed a TS-native SDK with better-sqlite3); #218 stays open as the Node-only option of record until this issue lands, then closes.

Goal

Port the six-package TS workspace to a Rust Cargo workspace that:

  • Cuts burn state rebuild archive --full on a 1.5GB ledger from ~60s to ~5-10s (SQLite-bound, not parser-bound).
  • Ships a single static burn binary per (os, arch) triple.
  • Publishes burn-sdk to crates.io as the supported embedding surface for Rust consumers (wash).
  • Publishes @relayburn/sdk to npm as a napi-rs wrapper over burn-sdk for TS/Node consumers: a first-class deliverable from day one. Replaces #218 (Publish relayburn/sdk programmatic surface for embedded use).
  • Drops burn-cli cold-start from ~80ms (Node) to ~5ms.
  • Drops burn ingest --watch RSS from ~80MB to ~10MB; switches the watch loop from stat-polling to notify (inotify/FSEvents).

Capability parity, not new features. No burn command added or removed during the port.

Workspace shape

crates/
  burn-reader     # parsers: claude.rs, codex.rs, opencode.rs, classifier.rs
  burn-ledger     # JSONL append + content sidecar + file lock + sqlite archive
  burn-analyze    # pricing (vendored models.dev), cost derivation, compare aggregator
  burn-mcp        # rmcp stdio server (or thin shell over `burn --json`, per #210)
  burn-cli        # `burn` binary (clap), harness adapters, watch loop
  burn-sdk        # public Rust API: Ledger, ingest(), summary(), hotspots()
  burn-sdk-node   # napi-rs bindings crate; depends on burn-sdk; emits .node + .d.ts

packages/
  sdk-node        # npm package `@relayburn/sdk`: prebuilt .node binaries + generated .d.ts + thin TS facade

Build order: reader → ledger → analyze → mcp → cli, with sdk re-exporting from the lower crates and sdk-node depending on sdk. packages/sdk-node is the only TS package that survives the port — it's mostly generated.

Crate-by-crate translation

| TS today | Rust target | Notes |
| --- | --- | --- |
| zod schemas in reader/types.ts | serde + serde_json with #[derive(Deserialize)] | TurnRecord, ContentRecord, LedgerLine become tagged enums; v schema field stays. |
| claude.ts (2255), codex.ts (1506), opencode.ts + opencode-stream.ts (1882) | streaming serde_json::Deserializer::from_reader().into_iter() per line | Bulk of the rewrite. Use existing *.test.ts fixtures as Rust acceptance tests. |
| classifier.ts rule tables | phf static maps + match arms | Keep rule tables flat & data-driven so adding a harness stays a one-PR change. |
| ledger/file-adapter.ts + lock.ts | fs2::FileExt::lock_exclusive + BufWriter | withLock('ledger', …) becomes a typed LedgerLock guard the type system enforces. |
| sqlite archive | rusqlite | Use bigger transaction batches (5-10k inserts/tx) instead of TS's per-row pattern; enable WAL + synchronous=NORMAL for the rebuild path. |
| mcp/server.ts | rmcp crate | Decision needed: standalone, or thin shell over burn --json per #210. Recommend folding #210 into this epic. |
| cli | clap v4 derive | Match existing flag surface byte-for-byte. |
| harnesses/*.ts | trait HarnessAdapter with async fn plan/before_spawn/after_exit | One file per harness, lazy-registered in a phf table. |
| watch-loop polling | notify + interval fallback | Real FS events instead of stat-polling: a quiet upgrade with lower idle CPU. |
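
To illustrate the "typed LedgerLock guard the type system enforces" row, here is a minimal sketch. It is an assumption-laden stand-in, not the planned implementation: it uses only the standard library, so an atomically created lock file replaces fs2::FileExt::lock_exclusive, and the ledger.lock file name and append_line helper are hypothetical.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, BufWriter, Write};
use std::path::{Path, PathBuf};

/// Proof that the ledger lock is held. Dropping the guard releases the lock.
pub struct LedgerLock {
    lock_path: PathBuf,
}

impl LedgerLock {
    /// Stand-in for an flock-style exclusive lock: `create_new` fails with
    /// `AlreadyExists` if another guard currently holds the lock file.
    pub fn acquire(ledger_dir: &Path) -> io::Result<LedgerLock> {
        let lock_path = ledger_dir.join("ledger.lock"); // hypothetical name
        OpenOptions::new().write(true).create_new(true).open(&lock_path)?;
        Ok(LedgerLock { lock_path })
    }
}

impl Drop for LedgerLock {
    fn drop(&mut self) {
        // Best-effort release; a real flock is released by the OS on close.
        let _ = fs::remove_file(&self.lock_path);
    }
}

/// Appending requires a `&LedgerLock`, so an unlocked append does not compile.
pub fn append_line(_lock: &LedgerLock, ledger: &Path, line: &str) -> io::Result<()> {
    let file = OpenOptions::new().create(true).append(true).open(ledger)?;
    let mut writer = BufWriter::new(file);
    writer.write_all(line.as_bytes())?;
    writer.write_all(b"\n")?;
    writer.flush()
}
```

The point is the signature of append_line: the guard is passed by reference, so callers cannot write without first acquiring the lock, and the lock is released when the guard goes out of scope. Unlike flock, a lock file can be left stale by a crash, which is one reason the real crate would likely keep the OS-level lock.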

SDK shape — Rust (burn-sdk on crates.io)

// crates/burn-sdk/src/lib.rs
pub use burn_reader::{TurnRecord, ContentRecord, ActivityCategory, Harness};
pub use burn_ledger::{Ledger, LedgerHandle};
pub use burn_analyze::{Summary, HotspotFinding, Pattern};

pub struct LedgerOptions { pub home: Option<PathBuf> }
impl Ledger { pub fn open(opts: LedgerOptions) -> Result<LedgerHandle> }

pub struct IngestOptions {
    pub session_id: String,
    pub harness: Option<Harness>,
    pub ledger_home: Option<PathBuf>,
}
pub async fn ingest(opts: IngestOptions) -> Result<IngestReport>;

#[derive(Default)]
pub struct SummaryQuery {
    pub session: Option<String>,
    pub project: Option<String>,
    pub since: Option<TimeRange>,
}
impl LedgerHandle {
    pub fn summary(&self, q: SummaryQuery) -> Result<Summary>;
    pub fn hotspots(&self, q: HotspotsQuery) -> Result<Vec<HotspotFinding>>;
}
// Free `summary(q)` / `hotspots(q)` for one-shot CLI-style use.
pub async fn summary(q: SummaryQuery) -> Result<Summary>;

Two design knobs:

  • Async boundary: ingest and watch-loop are async (tokio); LedgerHandle::summary / hotspots are sync, since they're CPU-bound queries against an open handle. Wash's MCP handlers wrap them in spawn_blocking. The free one-shot summary stays async, as declared above.
  • Handle vs free fn: offer both. Free fn for one-shot use; LedgerHandle::summary for embedded paths (wash's MCP server keeps a long-lived handle).
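
The handle-vs-free-fn knob can be sketched as follows. This is a simplified illustration under stated assumptions: the ledger is an in-memory Vec of a hypothetical Turn row, the Summary fields are invented for the example, and the Result wrappers and async free function from the shape above are dropped so the block runs with the standard library alone.

```rust
use std::path::PathBuf;

#[derive(Default)]
pub struct SummaryQuery {
    pub session: Option<String>,
    pub project: Option<String>,
}

/// Hypothetical summary fields, chosen only for this sketch.
#[derive(Debug, PartialEq)]
pub struct Summary {
    pub turns: usize,
    pub total_tokens: u64,
}

/// Hypothetical in-memory row standing in for a ledger turn.
pub struct Turn {
    pub session: String,
    pub project: String,
    pub tokens: u64,
}

pub struct LedgerHandle {
    turns: Vec<Turn>,
}

impl LedgerHandle {
    /// Stand-in for `Ledger::open`; a real handle would open files under `home`.
    pub fn open(_home: Option<PathBuf>, turns: Vec<Turn>) -> LedgerHandle {
        LedgerHandle { turns }
    }

    /// Sync query against the long-lived handle (the embedded path).
    pub fn summary(&self, q: &SummaryQuery) -> Summary {
        let mut out = Summary { turns: 0, total_tokens: 0 };
        for t in &self.turns {
            let session_ok = q.session.as_deref().map_or(true, |s| s == t.session.as_str());
            let project_ok = q.project.as_deref().map_or(true, |p| p == t.project.as_str());
            if session_ok && project_ok {
                out.turns += 1;
                out.total_tokens += t.tokens;
            }
        }
        out
    }
}

/// One-shot free function: open a handle, run the query, drop the handle.
pub fn summary(turns: Vec<Turn>, q: &SummaryQuery) -> Summary {
    LedgerHandle::open(None, turns).summary(q)
}
```

Because the free function is a thin wrapper over the handle method, both paths share one query implementation; the only difference is who owns the handle's lifetime.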

SDK shape — TypeScript (@relayburn/sdk on npm, day-1 deliverable)

burn-sdk-node exposes the same surface via napi-rs:

// crates/burn-sdk-node/src/lib.rs
use napi_derive::napi;

#[napi]
pub struct Ledger { inner: burn_sdk::LedgerHandle }

#[napi]
impl Ledger {
    #[napi(factory)]
    pub fn open(opts: Option<LedgerOptions>) -> napi::Result<Ledger> { /* … */ }

    #[napi]
    pub fn summary(&self, q: SummaryQuery) -> napi::Result<Summary> { /* … */ }
}

#[napi]
pub async fn ingest(opts: IngestOptions) -> napi::Result<IngestReport> { /* … */ }

Generated TS surface (auto-emitted .d.ts):

// @relayburn/sdk
export interface LedgerOptions { home?: string }
export interface IngestOptions { sessionId: string; harness?: Harness; ledgerHome?: string }
export interface SummaryQuery { session?: string; project?: string; since?: string }

export class Ledger {
  static open(opts?: LedgerOptions): Ledger;
  summary(q: SummaryQuery): Summary;
  hotspots(q: HotspotsQuery): HotspotFinding[];
}
export function ingest(opts: IngestOptions): Promise<IngestReport>;
export function summary(q: SummaryQuery): Promise<Summary>;

Binding rules:

  • Errors: Result<T, E> → throws on the JS side; E becomes a typed BurnError with code and cause.
  • Numbers: u64 token counts → bigint in TS; cost (f64 USD) → number. Document the bigint boundary in the README.
  • Async: Rust async fn → Promise<T> automatically via napi's tokio runtime integration.
  • Codegen: .d.ts is generated by napi-rs; never hand-edited. The TS package is index.js (loader) + index.d.ts (generated) + prebuilt .node binaries. Source-of-truth single direction: Rust → TS, never the reverse.
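
A sketch of what the Rust side of the Errors rule might look like before it crosses the binding. Everything here is an assumption for illustration: the field layout, the LEDGER_IO code string, and the read_ledger helper are hypothetical, and the napi conversion itself is left out so the block needs only the standard library.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Typed error carrying a stable `code` and an optional underlying `cause`.
#[derive(Debug)]
pub struct BurnError {
    pub code: &'static str,
    pub message: String,
    pub cause: Option<Box<dyn Error + Send + Sync + 'static>>,
}

impl fmt::Display for BurnError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "[{}] {}", self.code, self.message)
    }
}

impl Error for BurnError {
    /// Exposes the cause through the standard error-chain API.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.cause.as_ref().map(|e| &**e as &(dyn Error + 'static))
    }
}

impl From<io::Error> for BurnError {
    fn from(e: io::Error) -> Self {
        BurnError {
            code: "LEDGER_IO", // hypothetical code name
            message: e.to_string(),
            cause: Some(Box::new(e)),
        }
    }
}

/// Hypothetical helper: `?` converts the io::Error into a BurnError.
pub fn read_ledger(path: &str) -> Result<String, BurnError> {
    Ok(std::fs::read_to_string(path)?)
}
```

Keeping code as a fixed string rather than deriving it from the message gives the JS side something stable to branch on, while cause preserves the original error for logging.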

The TS facade in packages/sdk-node is small enough (loader + re-exports + a few JS-ergonomics helpers like converting ISO strings to Date) that it's effectively zero ongoing maintenance.

Distribution

  • burn binary: GitHub Releases with prebuilt static binaries for darwin-{arm64,x64}, linux-{arm64,x64}, optionally windows-x64. cargo install burn-cli as a fallback for users with Rust toolchains.
  • burn-sdk: crates.io. Wash adds burn-sdk = "1" to Cargo.toml and statically links it.
  • @relayburn/sdk (npm): prebuilt .node binaries via napi-rs's standard CI recipe — @relayburn/sdk-darwin-arm64, -darwin-x64, -linux-arm64-gnu, -linux-x64-gnu, optionally -win32-x64-msvc. The umbrella @relayburn/sdk package picks the right native package via npm's optionalDependencies selector. esbuild-bundles cleanly. No node-gyp, no compile-on-install.
  • relayburn (npm, legacy): keep on npm as a postinstall shim that downloads the right prebuilt burn binary, so npm i -g relayburn keeps working through the transition. Phase out post-1.0 if no one's using it.
  • Wash plugin install (/plugin install relaywash@…): wash binary ships the MCP server; plugin manifest points at the platform-specific binary or a 20-line Node shim that execves it. Same model tokscale uses.
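
The per-platform package names above imply a mapping from a host (os, arch) pair to a package. npm does that selection itself for @relayburn/sdk; the sketch below only mirrors the same table in Rust, for example as something a download shim or release check could share. The function names are hypothetical, and the sketch assumes glibc on Linux, as the -gnu suffixes suggest.

```rust
/// Maps Rust's `std::env::consts::{OS, ARCH}` values to the native npm
/// package names listed in the Distribution section.
pub fn native_package(os: &str, arch: &str) -> Option<&'static str> {
    match (os, arch) {
        ("macos", "aarch64") => Some("@relayburn/sdk-darwin-arm64"),
        ("macos", "x86_64") => Some("@relayburn/sdk-darwin-x64"),
        ("linux", "aarch64") => Some("@relayburn/sdk-linux-arm64-gnu"),
        ("linux", "x86_64") => Some("@relayburn/sdk-linux-x64-gnu"),
        ("windows", "x86_64") => Some("@relayburn/sdk-win32-x64-msvc"),
        // Anything else has no prebuilt artifact in this plan.
        _ => None,
    }
}

/// Convenience wrapper for the machine the code is running on.
pub fn native_package_for_host() -> Option<&'static str> {
    native_package(std::env::consts::OS, std::env::consts::ARCH)
}
```

Returning None for unlisted targets keeps the unsupported case explicit, so a caller can fall back to cargo install burn-cli or print a clear message.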

Sequencing

  1. Land the SDK refactor in TS first (the burn-cli → burn-sdk direction, even though the SDK is still TS). Establishes a clean port boundary and keeps users on a stable CLI through the migration.
  2. Port burn-reader + burn-ledger + burn-analyze behind the same JSON contract. Use existing *.test.ts fixtures as conformance tests — every Rust crate has to produce byte-identical output to the TS version on the fixture corpus.
  3. Stand up burn-sdk-node and @relayburn/sdk with napi-rs as soon as burn-sdk is functional. Don't defer this — landing the napi-rs CI matrix early surfaces bindings issues while the surface is small.
  4. Port burn-cli with clap. Cut over the published binary; npm relayburn package becomes a download shim.
  5. Port burn-mcp, and resolve #210 (mcp: refactor @relayburn/mcp as a thin wrapper over burn <verb> --json so the MCP surface tracks the CLI automatically), i.e. standalone server vs burn --json shell, in this epic.
  6. Publish burn-sdk 1.0 to crates.io and @relayburn/sdk 1.0 to npm in lockstep. Wash bumps to it. Closes #218 (Publish relayburn/sdk programmatic surface for embedded use).
  7. Decommission TS packages once nothing depends on them. Only packages/sdk-node survives.
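
The byte-identical requirement in step 2 could be checked with something like the sketch below. It is one possible shape for the conformance gate, not an existing tool: the Conformance type and check_conformance function are hypothetical, and loading the TS golden output and the Rust output is left to the caller.

```rust
/// Outcome of comparing Rust output against the TS golden output for a fixture.
#[derive(Debug, PartialEq)]
pub enum Conformance {
    Identical,
    /// 1-based line number of the first difference, with both versions of it.
    Differs { line: usize, expected: String, actual: String },
}

/// Passes only on byte-for-byte equality; otherwise reports the first
/// differing line so a failure is easy to locate in a large fixture.
pub fn check_conformance(expected: &[u8], actual: &[u8]) -> Conformance {
    if expected == actual {
        return Conformance::Identical;
    }
    let exp = String::from_utf8_lossy(expected);
    let act = String::from_utf8_lossy(actual);
    let mut exp_lines = exp.split('\n');
    let mut act_lines = act.split('\n');
    let mut line = 1;
    loop {
        match (exp_lines.next(), act_lines.next()) {
            // Same text on this line: keep scanning.
            (Some(e), Some(a)) if e == a => line += 1,
            // Different text, or one side ran out of lines first.
            (e, a) => {
                return Conformance::Differs {
                    line,
                    expected: e.unwrap_or("").to_string(),
                    actual: a.unwrap_or("").to_string(),
                }
            }
        }
    }
}
```

Comparing raw bytes first means a missing trailing newline or a key-ordering change fails the gate even when the parsed JSON would be equivalent, which is the stricter reading of "byte-identical".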

Risk register

  • Parser surface is the most volatile code in the repo. claude.ts / codex.ts / opencode.ts change with every new harness/tool/skill shape. Two options: freeze TS feature work during the port (slow), or run TS and Rust in parallel with a fixture-conformance gate (more work, safer). Recommend the latter.
  • Activity classifier rule tables are the most likely site of subtle behavior drift. Add a property-based test on the fixture corpus that asserts category assignments match TS exactly.
  • SQLite schema compatibility must be preserved — archive.sqlite files in the wild need to keep working without a forced rebuild, or we ship a one-time migration in burn state rebuild archive --full.
  • Lock semantics across processes: Node's flock and Rust's fs2::FileExt use the same OS primitive (flock(2) on Linux/mac), so cross-process locks between mid-migration TS and Rust tools work. Verify on Windows.
  • napi-rs CI matrix: 4-5 platforms × build × test, plus npm publish per-platform. Use napi-rs's standard GitHub Actions templates as a starting point — they handle the matrix and the optionalDependencies trick correctly.
  • Lockstep semver: crates.io burn-sdk and npm @relayburn/sdk ship with the same version, always, via a single release workflow. No independent npm patches between Rust releases — that path leads to drift.
  • TS ergonomics across the FFI: bigint for u64 token counts is the right call but will surprise consumers; document loudly. Date types are passed as ISO strings; the TS facade converts where helpful.
  • Changelog/release machinery: workflow-driven today. Port to cargo-release + napi publish + the same [Unreleased] promotion logic applied to both crates/*/CHANGELOG.md and packages/sdk-node/CHANGELOG.md.
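
On the bigint risk above: a short sketch of what the bigint mapping guards against. A JS number is an IEEE 754 double, so integers above 2^53 - 1 are not all representable; the same loss can be reproduced in Rust by casting through f64. The helper names are hypothetical.

```rust
/// Largest integer a JS `number` can represent exactly: 2^53 - 1.
pub const MAX_SAFE_INTEGER: u64 = (1u64 << 53) - 1;

/// Converts to a double only when the value is exactly representable,
/// mirroring the case where a plain JS `number` would be safe.
pub fn to_js_number(n: u64) -> Option<f64> {
    if n <= MAX_SAFE_INTEGER {
        Some(n as f64)
    } else {
        None
    }
}

/// What a consumer would get back if a u64 were forced through a double.
pub fn lossy_round_trip(n: u64) -> u64 {
    (n as f64) as u64
}
```

For example, 2^53 + 1 comes back from lossy_round_trip as 2^53, an off-by-one with no error raised. Real token counts may never get that large, but bigint removes the silent failure mode instead of relying on that.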

Acceptance

  • burn state rebuild archive --full on a 1.5GB ledger completes in ≤10s on M-series silicon (target: 5-10s).
  • Every *.test.ts fixture passes against the Rust crate that owns it (golden conformance gate).
  • burn-sdk published on crates.io with documented public API; wash builds against it.
  • @relayburn/sdk published on npm, esbuild-bundles cleanly, runs on Node ≥ 20.11 across darwin-{arm64,x64} and linux-{arm64,x64} without compile-on-install.
  • A TS consumer can import { ingest, summary } from '@relayburn/sdk', esbuild-bundle, and run with no Node-side native build step. (Closes #218, Publish relayburn/sdk programmatic surface for embedded use.)
  • burn available as prebuilt static binary for the four primary (os, arch) targets.
  • Crates.io and npm versions ship in lockstep from a single release workflow.

Sub-issues to file

  • Rust workspace skeleton on claude/rust-rewrite-exploration-RaBtK
  • Port burn-reader parsers with TS fixture conformance gate
  • Port burn-ledger (JSONL + lock + sqlite archive)
  • Port burn-analyze (pricing, cost derivation, compare aggregator)
  • Stand up burn-sdk-node with napi-rs (bindings crate + CI matrix + first prebuilt artifact)
  • Stand up @relayburn/sdk npm package wrapping burn-sdk-node (loader + .d.ts + optionalDependencies)
  • Port burn-cli (clap surface matching today's flags)
  • Port burn-mcp (resolve #210, mcp: refactor @relayburn/mcp as a thin wrapper over burn <verb> --json so the MCP surface tracks the CLI automatically, in this scope)
  • Publish burn-sdk 1.0 to crates.io + @relayburn/sdk 1.0 to npm in lockstep (closes #218, Publish relayburn/sdk programmatic surface for embedded use)
  • Release pipeline: prebuilt static binaries on GitHub Releases + napi-rs CI matrix + lockstep version workflow
  • npm relayburn package becomes a download-shim
  • Migration note in CHANGELOG + README
