Protect recoverable sidecars from retention prune #61

@willwashburn

Context

`burn content prune` (and the opportunistic prune that runs on every CLI invocation) deletes sidecar files whose mtime is older than `content.retentionDays` (default 90). Reasonable for content that's effectively unrecoverable — but:

  • Source session files (Claude's `~/.claude/projects/*`, Codex's `~/.codex/sessions/`, OpenCode's `~/.local/share/opencode/storage/`) are maintained by the upstream agents, not by burn, and many users keep them indefinitely.
  • `burn rebuild --content` (added in #58, "Add content capture for Codex and OpenCode parsers (#33 follow-up)") can now rebuild a sidecar from the source file at any time.
  • Therefore, deleting a sidecar when its source file still exists is silently lossy for an operation that's no longer needed — the data can always be rederived.

Concretely: after #58, if a user has run the default opportunistic prune for a year on a ledger they also keep indexed sources for, then runs `burn waste`, they'll see even-split attribution on a large chunk of sessions that could have been sized. The user-visible symptom — "attribution degraded: N sessions have no content sidecar" (#60) — is real but misleading, because it looks like a parser or ingestion gap when the real cause is that retention pruned recoverable data.

We nearly dodged this in practice because content capture for codex and opencode was broken until #58; in a future with correctly-capturing parsers, pruning eats the data the parsers worked hard to produce.

Proposed behavior

Prune refuses to delete a sidecar whose corresponding source session file still exists under one of the adapter roots. Specifically:

  1. Resolve source existence. Given a sidecar `.jsonl` in `contentDir()`, look up its session ID in the walked source sets:
    • Claude: `~/.claude/projects/*/.jsonl` exists
    • Codex: any file under `~/.codex/sessions/**` whose name ends with `-.jsonl` exists
    • OpenCode: `~/.local/share/opencode/storage/session/*/.json` exists
  2. If found, skip pruning that sidecar. The source is the authoritative recovery path.
  3. If not found (source is genuinely gone — user deleted their agent data, rotated a cache, etc.), apply the existing retention rule unchanged.

Implementation lives in `packages/ledger/src/content.ts` → `pruneContent`. It currently does only `readdir + stat + mtimeCompare + unlink`; add a "is there still a source file for this session" check, and short-circuit.

To keep `pruneContent` source-agnostic (the ledger package shouldn't hardcode Claude/Codex/OpenCode paths), invert the dependency:

```ts
export interface PruneOptions {
  olderThanMs: number;
  // Optional. When provided, prune skips sessions for which the callback
  // resolves to true: the source is recoverable and deleting the sidecar
  // would be silently lossy.
  isRecoverable?: (sessionId: string) => Promise<boolean>;
}
```
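A minimal sketch of how the prune loop might consult that callback, under stated assumptions: the function name `pruneContentSketch`, the explicit `dir` and `now` parameters, and the idea that a sidecar's base name is its session ID are all illustrative, not the actual `pruneContent` signature.

```typescript
import { readdir, stat, unlink } from "node:fs/promises";
import { join } from "node:path";

interface PruneOptions {
  olderThanMs: number;
  isRecoverable?: (sessionId: string) => Promise<boolean>;
}

interface PruneResult {
  filesDeleted: number;
  bytesFreed: number;
  skippedRecoverable: number;
}

// Hypothetical stand-in for pruneContent. Assumes each sidecar sits directly
// under `dir` and that its file name minus ".jsonl" is the session ID.
async function pruneContentSketch(
  dir: string,
  opts: PruneOptions,
  now: number = Date.now(),
): Promise<PruneResult> {
  const result: PruneResult = { filesDeleted: 0, bytesFreed: 0, skippedRecoverable: 0 };
  for (const name of await readdir(dir)) {
    if (!name.endsWith(".jsonl")) continue;
    const file = join(dir, name);
    const info = await stat(file);
    // Files younger than the retention window are never candidates.
    if (!info.isFile() || now - info.mtimeMs < opts.olderThanMs) continue;
    const sessionId = name.slice(0, -".jsonl".length);
    // Short-circuit: the source still exists, so keep the sidecar.
    if (opts.isRecoverable && (await opts.isRecoverable(sessionId))) {
      result.skippedRecoverable += 1;
      continue;
    }
    await unlink(file);
    result.filesDeleted += 1;
    result.bytesFreed += info.size;
  }
  return result;
}
```

Omitting `isRecoverable` reproduces the current retention-only behavior, which is one way the `--force` path could be wired.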

The CLI (`packages/cli/src/commands/content.ts` and the opportunistic-prune caller) passes an `isRecoverable` that consults a small in-memory index of source files, built once at CLI start by the same walker that `ingest.ts` already uses (`walkJsonl`, `walkOpencodeSessions`, plus the Claude project dir scan). Cost: one `readdir` pass per source root, ~100ms total on a ledger this size.

The ledger package stays free of adapter-specific knowledge. Ingest-style source discovery stays in the CLI.
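One possible shape for the CLI-side index, as a sketch only. `buildRecoverableCheck` is a hypothetical helper, and `sourcePaths` stands in for whatever the existing walkers return. The matching rule (file stem equals the session ID, or ends with `-` plus the session ID) is an assumption derived from the lookup rules listed above, not confirmed adapter behavior.

```typescript
import { basename, extname } from "node:path";

// Hypothetical helper: builds the in-memory index once, then answers lookups.
function buildRecoverableCheck(
  sourcePaths: Iterable<string>,
): (sessionId: string) => Promise<boolean> {
  // Index file stems, i.e. base names with the extension removed.
  const stems = new Set<string>();
  for (const p of sourcePaths) {
    stems.add(basename(p, extname(p)));
  }
  const stemList = [...stems];
  return async (sessionId: string) =>
    // Exact stem match (assumed Claude / OpenCode layout) ...
    stems.has(sessionId) ||
    // ... or a stem ending in "-<sessionId>" (assumed Codex layout).
    // Linear scan per lookup; a real index might key suffixes up front.
    stemList.some((stem) => stem.endsWith(`-${sessionId}`));
}
```

The returned function has the same type as `PruneOptions.isRecoverable`, so the ledger never sees a source path.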

Why a callback and not adapter paths in the ledger

  • Keeps `@relayburn/ledger` decoupled from adapter roots (no circular dep between ledger and reader).
  • Future collectors (Collector: Gemini CLI #13, Collector: Cortex Code (Snowflake) #36) plug in trivially: each adapter contributes to the source index before prune runs, no ledger API change needed.
  • Testable: tests can supply an in-memory `isRecoverable` without filesystem fixtures.

User-facing opt-out

Some users intentionally prune to reclaim disk space even when sources exist (sidecars can be 10–100× the source size for verbose tool outputs). Preserve that path explicitly:

  • `burn content prune --force` bypasses the recoverable check, matching today's behavior.
  • Default `burn content prune` applies the check.
  • Opportunistic prune on every CLI invocation always applies the check. If you want aggressive reclamation you must run `--force` explicitly.

Env-var toggle for automation: `RELAYBURN_PRUNE_FORCE=1`.
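The opt-out rules could be resolved in one place, roughly as sketched here. `shouldBypassRecoverableCheck` and `PruneMode` are hypothetical names. The sketch assumes the env var only affects the explicit command (the opportunistic path never bypasses) and that only the literal value `1` enables it; the issue does not pin down either point.

```typescript
// Hypothetical: how a caller might decide whether to pass `isRecoverable`.
type PruneMode = "explicit" | "opportunistic";

function shouldBypassRecoverableCheck(
  mode: PruneMode,
  forceFlag: boolean,
  env: Record<string, string | undefined>,
): boolean {
  // Opportunistic prune always applies the recoverable check.
  if (mode === "opportunistic") return false;
  // Explicit prune bypasses on --force or RELAYBURN_PRUNE_FORCE=1 (assumed semantics).
  return forceFlag || env.RELAYBURN_PRUNE_FORCE === "1";
}
```

A caller would pass `process.env` as the third argument and drop the `isRecoverable` callback when the result is `true`.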

Output changes

`pruneContent` today returns `{ filesDeleted, bytesFreed }`. Add:

```ts
interface PruneResult {
  filesDeleted: number;
  bytesFreed: number;
  skippedRecoverable: number; // new: sidecars kept because source still exists
}
```

`burn content prune` output:

```
pruned 42 content files (17.3 MB)
kept 1,823 recoverable sidecars whose source files still exist
(use 'burn content prune --force' to delete them anyway)
```
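A sketch of a formatter that would produce the output above from a `PruneResult`. `formatPruneSummary` is a hypothetical name. Treating 1 MB as 1,000,000 bytes and omitting the "kept" lines when nothing was skipped are assumptions, not stated requirements.

```typescript
interface PruneResult {
  filesDeleted: number;
  bytesFreed: number;
  skippedRecoverable: number;
}

// Hypothetical formatter for the `burn content prune` summary.
function formatPruneSummary(r: PruneResult): string {
  // Assumption: decimal megabytes, one fractional digit.
  const mb = (r.bytesFreed / 1e6).toFixed(1);
  const lines = [
    `pruned ${r.filesDeleted.toLocaleString("en-US")} content files (${mb} MB)`,
  ];
  // Only mention recoverable sidecars when some were actually kept.
  if (r.skippedRecoverable > 0) {
    lines.push(
      `kept ${r.skippedRecoverable.toLocaleString("en-US")} recoverable sidecars whose source files still exist`,
      "(use 'burn content prune --force' to delete them anyway)",
    );
  }
  return lines.join("\n");
}
```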

Acceptance

  • Default `burn content prune` on a fixture where sources exist leaves the sidecars alone and reports them as `skippedRecoverable`.
  • `--force` deletes per the retention rule regardless of source presence.
  • Opportunistic prune (on every CLI invocation) gains the same protection without a flag; only the explicit `--force` path skips it.
  • `ContentConfig` / `PruneOptions` API remains source-agnostic — the recoverability check is injected via callback.
  • Tests cover: source present (skipped), source absent (pruned), `--force` overrides both.
  • Output counts the skipped-recoverable files separately from deleted files.
  • README / `burn content` help reflect the new default.
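
The test matrix in the acceptance list can be expressed as a pure decision function, sketched here for illustration. `decidePrune`, `Candidate`, and `PruneDecision` are hypothetical names and not part of the ledger API. The sketch assumes `--force` still honors the retention window and only drops the source check.

```typescript
type PruneDecision = "keep" | "skip-recoverable" | "delete";

interface Candidate {
  ageMs: number;         // now minus the sidecar's mtime
  sourceExists: boolean; // result of the recoverability lookup
}

// Hypothetical: one sidecar, one decision.
function decidePrune(c: Candidate, olderThanMs: number, force: boolean): PruneDecision {
  // Inside the retention window: never touched, with or without --force.
  if (c.ageMs < olderThanMs) return "keep";
  // Past retention but recoverable: kept unless --force is set.
  if (!force && c.sourceExists) return "skip-recoverable";
  return "delete";
}
```

Each acceptance case maps to one call: source present yields `skip-recoverable`, source absent yields `delete`, and `force` yields `delete` in both cases.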

Related

Out of scope

  • Compressing sidecars in place to avoid pruning for disk-reclamation reasons. Different concern; could be a separate issue if disk pressure becomes real.
  • Time-boxed grace period ("delete sidecars whose source is >N years old even if it still exists"). Retention on the source side is upstream's call; burn shouldn't second-guess it.
  • Background indexing of source roots. The walker is fast enough to run synchronously at prune time.
