`burn content prune` (and the opportunistic prune that runs on every CLI invocation) deletes sidecar files whose mtime is older than `content.retentionDays` (default 90). Reasonable for content that's effectively unrecoverable — but:
Source session files (Claude's `~/.claude/projects/*`, Codex's `~/.codex/sessions/`, OpenCode's `~/.local/share/opencode/storage/`) are maintained by the upstream agents, not by burn, and many users keep them indefinitely.
So deleting a sidecar while its source file still exists loses data silently, and needlessly: the content could be rederived from the source, yet once the sidecar is gone, attribution for that session degrades.
Concretely: after #58, a user who has let the default opportunistic prune run for a year, while also keeping the source files the ledger indexes, will see even-split attribution from `burn waste` on a large chunk of sessions that could have been sized. The user-visible symptom, "attribution degraded: N sessions have no content sidecar" (#60), is real but misleading: it looks like a parser or ingestion gap, when the real cause is that retention pruned recoverable data.
In practice we've only dodged this because content capture for Codex and OpenCode was broken until #58; once the parsers capture content correctly, pruning eats exactly the data they worked hard to produce.
Proposed behavior
Prune refuses to delete a sidecar whose corresponding source session file still exists under one of the adapter roots. Specifically:
Resolve source existence. Given a sidecar `.jsonl` in `contentDir()`, look up its session ID in the walked source sets:
Claude: `~/.claude/projects/*/.jsonl` exists
Codex: any file under `~/.codex/sessions/**` whose name ends with `-.jsonl` exists
If found, skip pruning that sidecar. The source is the authoritative recovery path.
If not found (source is genuinely gone — user deleted their agent data, rotated a cache, etc.), apply the existing retention rule unchanged.
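A minimal sketch of the lookup rules above, assuming the placeholder missing from both filename patterns is the session ID (so a Claude source is named `<sessionId>.jsonl` and a Codex source ends with `-<sessionId>.jsonl`). `makeIsRecoverable` is a hypothetical name, and the two path lists stand in for whatever the walkers collected.

```typescript
import { basename } from "node:path";

// Hypothetical helper. ASSUMPTION: Claude sources are named `<sessionId>.jsonl`
// and Codex sources end with `-<sessionId>.jsonl`.
export function makeIsRecoverable(
  claudeFiles: string[],
  codexFiles: string[],
): (sessionId: string) => Promise<boolean> {
  // Claude: exact basename match, so a Set gives constant-time lookups.
  const claudeNames = new Set(claudeFiles.map((p) => basename(p)));
  // Codex: the filename prefix varies, so match on the suffix instead.
  const codexNames = codexFiles.map((p) => basename(p));

  return async (sessionId: string): Promise<boolean> => {
    if (claudeNames.has(`${sessionId}.jsonl`)) return true;
    const suffix = `-${sessionId}.jsonl`;
    return codexNames.some((name) => name.endsWith(suffix));
  };
}
```

Suffix matching is only as precise as the IDs: it is safe for UUID-style session IDs, but a very short ID could match the tail of an unrelated filename.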
Implementation lives in `packages/ledger/src/content.ts` → `pruneContent`. It currently does only `readdir + stat + mtimeCompare + unlink`; add an "is there still a source file for this session?" check before the unlink, and skip the file when it passes.
To keep `pruneContent` source-agnostic (the ledger package shouldn't hardcode Claude/Codex/OpenCode paths), invert the dependency:
```ts
export interface PruneOptions {
  olderThanMs: number;
  // Optional. When provided, prune skips sessions for which the callback
  // returns true: the source is recoverable, and deleting the sidecar
  // would be silently lossy.
  isRecoverable?: (sessionId: string) => Promise<boolean>;
}
```
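A sketch of how the existing `readdir + stat + mtimeCompare + unlink` loop could gain the new check. Several details are assumptions rather than the real `content.ts`: the directory is passed in (instead of coming from `contentDir()`), sidecars are named `<sessionId>.jsonl`, and `olderThanMs` is a maximum age measured back from now. The result carries the `skippedRecoverable` counter proposed under Output changes.

```typescript
import { readdir, stat, unlink } from "node:fs/promises";
import { join } from "node:path";

// Same shape as the PruneOptions proposed above.
export interface PruneOptions {
  olderThanMs: number;
  isRecoverable?: (sessionId: string) => Promise<boolean>;
}

// Same shape as the PruneResult proposed under Output changes.
export interface PruneResult {
  filesDeleted: number;
  bytesFreed: number;
  skippedRecoverable: number;
}

// Hypothetical signature: the real pruneContent resolves its own directory.
// ASSUMPTION: sidecars are `<sessionId>.jsonl` files directly inside `dir`.
export async function pruneContent(
  dir: string,
  opts: PruneOptions,
): Promise<PruneResult> {
  const result: PruneResult = { filesDeleted: 0, bytesFreed: 0, skippedRecoverable: 0 };
  const cutoff = Date.now() - opts.olderThanMs;

  for (const name of await readdir(dir)) {
    if (!name.endsWith(".jsonl")) continue;
    const file = join(dir, name);
    const info = await stat(file);
    // Unchanged retention rule: keep anything modified after the cutoff.
    if (!info.isFile() || info.mtimeMs >= cutoff) continue;

    // New check: if the source session file still exists, keep the sidecar.
    const sessionId = name.slice(0, -".jsonl".length);
    if (opts.isRecoverable && (await opts.isRecoverable(sessionId))) {
      result.skippedRecoverable++;
      continue;
    }

    await unlink(file);
    result.filesDeleted++;
    result.bytesFreed += info.size;
  }
  return result;
}
```

The callback is consulted only for sidecars already past retention, so recent files never trigger a source lookup. Omitting `isRecoverable` reproduces today's mtime-only behavior, which is presumably what a forced prune would fall back to.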
The CLI (`packages/cli/src/commands/content.ts` and the opportunistic-prune caller) passes an `isRecoverable` that consults a small in-memory index of source files, built once at CLI start by the same walkers that `ingest.ts` already uses (`walkJsonl`, `walkOpencodeSessions`, plus the Claude project dir scan). Cost: one `readdir` pass per source root, roughly 100 ms total on a ledger of this size.
The ledger package stays free of adapter-specific knowledge. Ingest-style source discovery stays in the CLI.
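The walker helpers named above belong to burn and their signatures aren't shown here, so this sketch of the build-once index uses a hypothetical `collectJsonl` stand-in. It walks each root once, recursively, and treats a missing root as empty (for example, an agent that was never installed). It only collects `.jsonl` files, which fits the Claude and Codex roots; the OpenCode storage layout isn't specified in this issue.

```typescript
import { readdir } from "node:fs/promises";
import { join } from "node:path";

// Hypothetical stand-in for burn's walker helpers (walkJsonl etc.).
// Recursively collects every `.jsonl` path under `root`; a root that
// doesn't exist yields an empty list instead of an error.
async function collectJsonl(root: string): Promise<string[]> {
  const entries = await readdir(root, { withFileTypes: true }).catch(
    (err: NodeJS.ErrnoException) => {
      if (err.code === "ENOENT") return [];
      throw err;
    },
  );
  const out: string[] = [];
  for (const entry of entries) {
    const full = join(root, entry.name);
    if (entry.isDirectory()) out.push(...(await collectJsonl(full)));
    else if (entry.isFile() && entry.name.endsWith(".jsonl")) out.push(full);
  }
  return out;
}

// Built once at CLI start: one recursive pass per source root, keyed by root
// so each adapter can apply its own filename rule afterwards.
export async function buildSourceIndex(roots: string[]): Promise<Map<string, string[]>> {
  const index = new Map<string, string[]>();
  for (const root of roots) index.set(root, await collectJsonl(root));
  return index;
}
```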
Why a callback and not adapter paths in the ledger
Keeps `@relayburn/ledger` decoupled from adapter roots (no circular dep between ledger and reader).
Testable: tests can supply an in-memory `isRecoverable` without filesystem fixtures.
User-facing opt-out
Some users intentionally prune to reclaim disk space even when sources exist (sidecars can be 10–100× the source size for verbose tool outputs). Preserve that path explicitly:
`burn content prune --force` deletes sidecars past retention even when their source files still exist.
Env-var toggle for automation: `RELAYBURN_PRUNE_FORCE=1`.
Output changes
`pruneContent` today returns `{ filesDeleted, bytesFreed }`. Add:
```ts
interface PruneResult {
  filesDeleted: number;
  bytesFreed: number;
  skippedRecoverable: number; // new: sidecars kept because their source still exists
}
```
`burn content prune` output:
```
pruned 42 content files (17.3 MB)
kept 1,823 recoverable sidecars whose source files still exist
(use 'burn content prune --force' to delete them anyway)
```
Acceptance
Output counts the skipped-recoverable files separately from deleted files.
README / `burn content` help reflect the new default.
Related
Blocks / partially motivates #60 (promote the "even-split" note to a prominent warning when it dominates): with prune-protected sidecars, degraded attribution becomes more clearly a "never had content" problem than a "lost content" problem, which changes what the warning should say.
Depends on the walker helpers already used by ingest (`walkJsonl`, `walkOpencodeSessions`). No new filesystem code is needed in `ledger`.
Out of scope
Compressing sidecars in place to avoid pruning for disk-reclamation reasons. Different concern; could be a separate issue if disk pressure becomes real.
Time-boxed grace period ("delete sidecars whose source is >N years old even if it still exists"). Retention on the source side is upstream's call; burn shouldn't second-guess it.
Background indexing of source roots. The walker is fast enough to run synchronously at prune time.