Design: content sidecar store with retention and opt-out #33

@willwashburn

Context

Burn's original design deliberately didn't store prompt or response text, only token counts, hashes, and metadata. That was the right call for a narrow attribution tool, but several in-flight features (issues #2, #3, #6, #10, #11) get meaningfully stronger with content available.

Decision: flip the default to store full content, via a sidecar store separate from the usage ledger, with retention and an opt-out.

Design

Sidecar layout

Main ledger stays as-is — ~/.relayburn/ledger.jsonl is still the usage/metadata spine. Add a parallel content store at ~/.relayburn/content/<sessionId>.jsonl, keyed by (sessionId, messageId). One file per session keeps file counts manageable and lets retention work by age-of-file.

Why sidecar and not inline in the ledger: burn summary --since 7d reads the full ledger. If each line carries message bodies, summary reads MB of content just to compute aggregate numbers. Sidecar means main-path queries stay fast; content loads only when a query needs it.
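
As a rough illustration of the layout above, the sidecar path for a session could be derived as follows. The helper names `relayburnHome` and `contentPathFor` are hypothetical, and the sketch assumes `RELAYBURN_HOME` overrides the default `~/.relayburn` directory; the guard against path separators in `sessionId` is an added precaution, not something this design specifies.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: resolves the burn home directory.
// Assumption: RELAYBURN_HOME, when set, replaces the default ~/.relayburn.
export function relayburnHome(
  env: Record<string, string | undefined> = process.env,
): string {
  return env.RELAYBURN_HOME ?? path.join(os.homedir(), ".relayburn");
}

// Hypothetical helper: one JSONL file per session under content/.
export function contentPathFor(
  sessionId: string,
  home: string = relayburnHome(),
): string {
  // Illustrative guard: reject ids that could escape the content directory.
  const escapes =
    sessionId === "" ||
    sessionId === "." ||
    sessionId === ".." ||
    sessionId !== path.basename(sessionId);
  if (escapes) {
    throw new Error(`invalid sessionId: ${JSON.stringify(sessionId)}`);
  }
  return path.join(home, "content", `${sessionId}.jsonl`);
}
```

Because the ledger path and the content path never overlap, a command that only needs aggregates can open `ledger.jsonl` and ignore `content/` entirely.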

ContentRecord shape

// packages/reader/src/types.ts (added alongside TurnRecord)
interface ContentRecord {
  v: 1;
  sessionId: string;
  messageId: string;
  ts: string;
  role: 'user' | 'assistant' | 'tool_result';
  kind: 'text' | 'thinking' | 'tool_use' | 'tool_result';
  
  // Content by kind — at most one is populated:
  text?: string;              // role=user/assistant with text content, or thinking
  toolUse?: {
    id: string;
    name: string;
    input: Record<string, unknown>;   // raw input, not hashed
  };
  toolResult?: {
    toolUseId: string;
    content: unknown;                 // usually a string; structured for some tools (string | unknown collapses to unknown)
    isError?: boolean;
  };
}

TurnRecord stays untouched — no schema migration on existing ledger entries. Content is a sidecar, not a field.
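
To make the "at most one is populated" rule concrete, here is a small sketch of a well-formedness check with two example records. The `isWellFormed` function and the kind-to-field mapping are assumptions inferred from the comments in the shape above (in particular, that `tool_use` maps to `toolUse` and `tool_result` to `toolResult`); the ids, timestamp, and tool name are made-up illustrative values.

```typescript
// Local copy of the ContentRecord shape so the sketch is self-contained.
interface ContentRecord {
  v: 1;
  sessionId: string;
  messageId: string;
  ts: string;
  role: "user" | "assistant" | "tool_result";
  kind: "text" | "thinking" | "tool_use" | "tool_result";
  text?: string;
  toolUse?: { id: string; name: string; input: Record<string, unknown> };
  toolResult?: { toolUseId: string; content: unknown; isError?: boolean };
}

// Assumed mapping from kind to the single payload field it may populate.
const PAYLOAD_FOR_KIND = {
  text: "text",
  thinking: "text",
  tool_use: "toolUse",
  tool_result: "toolResult",
} as const;

// Hypothetical check: no payload field other than the one matching `kind` is set.
export function isWellFormed(r: ContentRecord): boolean {
  const allowed = PAYLOAD_FOR_KIND[r.kind];
  const fields = ["text", "toolUse", "toolResult"] as const;
  return fields.every((f) => f === allowed || r[f] === undefined);
}

// Illustrative records (all values are invented for the example).
export const userText: ContentRecord = {
  v: 1,
  sessionId: "sess-1",
  messageId: "msg-1",
  ts: "2025-01-01T00:00:00Z",
  role: "user",
  kind: "text",
  text: "Summarize the README.",
};

export const toolCall: ContentRecord = {
  v: 1,
  sessionId: "sess-1",
  messageId: "msg-2",
  ts: "2025-01-01T00:00:01Z",
  role: "assistant",
  kind: "tool_use",
  toolUse: { id: "tool-1", name: "Read", input: { path: "README.md" } },
};
```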

Retention

Default: 90-day rolling window for content; the main usage ledger is kept forever. Retention ages files in ~/.relayburn/content/ by file mtime and deletes any file where mtime < now - TTL.

Retention runs:

  • Opportunistically on each burn command invocation (cheap: just readdir + stat + rm)
  • Explicitly via burn content prune
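
The readdir + stat + rm pass described above might look roughly like this. It is a synchronous sketch under stated assumptions: the function name `pruneContentDirSync` is hypothetical, the directory is passed in explicitly, and only `.jsonl` files are considered. The proposed `pruneContent` API is async and may differ.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

export interface PruneResult {
  filesDeleted: number;
  bytesFreed: number;
}

// Hypothetical synchronous variant of the retention pass.
export function pruneContentDirSync(
  dir: string,
  olderThanMs: number,
  nowMs: number = Date.now(),
): PruneResult {
  const result: PruneResult = { filesDeleted: 0, bytesFreed: 0 };
  // The content directory is created lazily, so it may not exist yet.
  if (!fs.existsSync(dir)) return result;

  const cutoff = nowMs - olderThanMs;
  for (const name of fs.readdirSync(dir)) {
    if (!name.endsWith(".jsonl")) continue; // assumption: ignore unrelated files
    const file = path.join(dir, name);
    const st = fs.statSync(file);
    // Delete session files whose last modification is older than the cutoff.
    if (st.isFile() && st.mtimeMs < cutoff) {
      fs.rmSync(file);
      result.filesDeleted += 1;
      result.bytesFreed += st.size;
    }
  }
  return result;
}
```

Since the check uses mtime, a session file that is still being appended to keeps resetting its age; only sessions that have gone quiet for the full TTL are removed.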

Configurable via:

  • Env: RELAYBURN_CONTENT_TTL_DAYS=<n> (default 90; forever or -1 disables pruning)
  • Config file: ~/.relayburn/config.json: { "content": { "retentionDays": 90 } }
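
One way the two retention settings could be resolved into a single TTL is sketched below. The function name `resolveContentTtlMs` and the `BurnConfig` type are hypothetical, and several behaviors are assumptions not stated in this design: the env variable takes precedence over the config file, a negative `retentionDays` in the config also disables pruning, and an unparseable env value is an error.

```typescript
const DAY_MS = 24 * 3600 * 1000;
const DEFAULT_TTL_DAYS = 90;

// Hypothetical view of the relevant part of ~/.relayburn/config.json.
interface BurnConfig {
  content?: { retentionDays?: number };
}

// Returns the TTL in milliseconds, or null when pruning is disabled.
export function resolveContentTtlMs(
  env: Record<string, string | undefined>,
  config: BurnConfig = {},
): number | null {
  const raw = env.RELAYBURN_CONTENT_TTL_DAYS;
  if (raw !== undefined && raw.trim() !== "") {
    const value = raw.trim().toLowerCase();
    if (value === "forever" || value === "-1") return null;
    const days = Number(value);
    if (Number.isFinite(days) && days >= 0) return days * DAY_MS;
    // Assumption: fail loudly rather than silently falling back.
    throw new Error(`invalid RELAYBURN_CONTENT_TTL_DAYS: ${raw}`);
  }
  // Assumption: config is consulted only when the env variable is unset.
  const days = config.content?.retentionDays ?? DEFAULT_TTL_DAYS;
  return days < 0 ? null : days * DAY_MS;
}
```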

Opt-out modes

content.store: one of 'full' | 'hash-only' | 'off'. Default: full.

Configurable via:

  • Env: RELAYBURN_CONTENT_STORE=full|hash-only|off
  • Config file: ~/.relayburn/config.json: { "content": { "store": "full" } }
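
A minimal sketch of resolving the store mode from these two sources follows. `resolveContentStoreMode` is a hypothetical name, and the precedence (env over config, then the `full` default) and the choice to throw on an unknown value are assumptions.

```typescript
type ContentStoreMode = "full" | "hash-only" | "off";

const MODES: readonly string[] = ["full", "hash-only", "off"];

// Type guard narrowing an arbitrary value to one of the three modes.
function isContentStoreMode(value: unknown): value is ContentStoreMode {
  return typeof value === "string" && MODES.indexOf(value) !== -1;
}

// Hypothetical resolver: env first, then config, then the default 'full'.
export function resolveContentStoreMode(
  env: Record<string, string | undefined>,
  config: { content?: { store?: string } } = {},
): ContentStoreMode {
  const candidate = env.RELAYBURN_CONTENT_STORE ?? config.content?.store ?? "full";
  if (!isContentStoreMode(candidate)) {
    // Assumption: an unrecognized mode is rejected rather than ignored.
    throw new Error(`invalid content.store value: ${candidate}`);
  }
  return candidate;
}
```

Rejecting unknown values matters more here than for most settings: silently falling back to `full` after a typo in `off` would store content the user meant to exclude.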

Reader changes

Each reader (claude.ts, opencode.ts, codex.ts, future collectors) returns two streams when content store is enabled:

async function parseClaudeSession(
  path: string,
  options: ParseOptions = {},
): Promise<{ turns: TurnRecord[]; content: ContentRecord[] }> { ... }

When content store is hash-only or off, the content array is empty. Callers of the reader API get the same turns either way.
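
The gating rule can be expressed as a small wrapper applied to any reader's output. This is only a sketch: `applyStoreMode` and `ParseResult` are hypothetical names, and the record types are left generic because the full `TurnRecord` shape is not reproduced here.

```typescript
type ContentStoreMode = "full" | "hash-only" | "off";

// Generic stand-in for the reader return shape { turns, content }.
interface ParseResult<Turn, Content> {
  turns: Turn[];
  content: Content[];
}

// Hypothetical gate: turns pass through unchanged in every mode;
// content survives only when the mode is 'full'.
export function applyStoreMode<Turn, Content>(
  mode: ContentStoreMode,
  parsed: ParseResult<Turn, Content>,
): ParseResult<Turn, Content> {
  return mode === "full" ? parsed : { turns: parsed.turns, content: [] };
}
```

Keeping the return shape identical across modes means callers never need to branch on whether `content` exists.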

Ledger API additions

// packages/ledger/src/content.ts
export async function appendContent(records: ContentRecord[]): Promise<void>;
export async function readContent(
  selector: { sessionId: string; messageId?: string }
): Promise<ContentRecord[]>;
export async function pruneContent(options: { olderThanMs: number }): Promise<{ filesDeleted: number; bytesFreed: number }>;

No schema migration on the main ledger. Content directory is created lazily on first write.
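
For orientation, append and read could be implemented along these lines. This is a synchronous sketch with hypothetical names (`appendContentSync`, `readContentSync`) and an explicit `contentDir` parameter; the proposed API above is async and takes no directory argument. The record type is trimmed to the fields the sketch needs.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Trimmed stand-in for ContentRecord: only the key fields plus optional text.
interface ContentRecordLike {
  sessionId: string;
  messageId: string;
  text?: string;
}

// Appends records to one JSONL file per session, creating the directory lazily.
export function appendContentSync(contentDir: string, records: ContentRecordLike[]): void {
  if (records.length === 0) return; // nothing to write, so no directory is created
  const bySession = new Map<string, string[]>();
  for (const record of records) {
    const lines = bySession.get(record.sessionId) ?? [];
    lines.push(JSON.stringify(record));
    bySession.set(record.sessionId, lines);
  }
  fs.mkdirSync(contentDir, { recursive: true });
  bySession.forEach((lines, sessionId) => {
    fs.appendFileSync(path.join(contentDir, `${sessionId}.jsonl`), lines.join("\n") + "\n");
  });
}

// Reads a session's records, optionally narrowed to a single messageId.
export function readContentSync(
  contentDir: string,
  selector: { sessionId: string; messageId?: string },
): ContentRecordLike[] {
  const file = path.join(contentDir, `${selector.sessionId}.jsonl`);
  if (!fs.existsSync(file)) return [];
  return fs
    .readFileSync(file, "utf8")
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as ContentRecordLike)
    .filter((r) => selector.messageId === undefined || r.messageId === selector.messageId);
}
```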

Affected consumers (all get comments linking here)

README and landscape.md updates on merge

Current README says "Not recorded: Prompt content, Model responses." That flips. New wording (to be drafted): "Content is stored locally in a sidecar keyed by session. Default retention 90 days. Disable with content.store=hash-only to restore the hash-only behavior, or off to skip entirely."

landscape.md section on agentsview's content-storage divergence updates: burn now matches that design choice, for the reasons they do.

Acceptance

  • Reader emits { turns, content } when content.store=full; { turns, content: [] } otherwise.
  • packages/ledger/src/content.ts provides appendContent, readContent, pruneContent. Tests cover each mode.
  • burn summary is not measurably slower when the content sidecar exists, confirmed by checking that it reads only ledger.jsonl and never touches content/.
  • Retention: pruneContent({olderThanMs: 90 * 24 * 3600 * 1000}) deletes the expected files on a fixture with mixed ages.
  • Opt-out switch works end-to-end: RELAYBURN_CONTENT_STORE=off burn claude ... produces no content directory entries.
  • Documentation updated (README + landscape.md) before merge.

Priority

High. Foundational — several in-flight issues (#2, #3, #6, #10, #11) get meaningfully stronger once content is available. Order this before those when planning sprints.

Out of scope

  • SQLite-based content store (future option if FTS becomes a user need; JSONL sidecar ships first).
  • Encryption at rest (users who need this can put $RELAYBURN_HOME on an encrypted volume).
  • Content redaction / PII scrubbing at ingest (explicit hash-only mode is the answer here).
  • Remote sync of content (burn is local-first; if relay/workforce want team-visible content, that's their orchestration layer's call).
