Content sidecar with retention and opt-out (#33)#43
Conversation
Stores full conversation content in a parallel JSONL store keyed by
session, separate from the usage ledger. Defaults to full capture with a
90-day rolling retention; env/config switches select hash-only or off to
restore the pre-flip minimal-storage behavior.
- reader: ContentRecord type; parseClaudeSession emits { turns, content }
with contentMode: full captures assistant text/thinking/tool_use and
user text/tool_result; incremental path defers content alongside the
message it belongs to.
- ledger: appendContent / readContent / pruneContent against
~/.relayburn/content/<sessionId>.jsonl. Config loader (env +
config.json) for content.store and content.retentionDays.
- cli: burn content prune [--days <n>] subcommand plus opportunistic
prune on every invocation; ingest paths for claude/codex/opencode pass
the resolved contentMode and write content alongside turns.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
noodleonthecape
left a comment
There was a problem hiding this comment.
I found two blocking issues:
-
packages/reader/src/claude.tswrites content out of chronological order. In both the full parser and the incremental parser, all user-side records are accumulated first and then all assistant records are appended later (parseClaudeSession: around lines 108, 143, 174;parseClaudeSessionIncremental: around lines 469, 527, 562). That means the sidecar no longer preserves the actual conversation sequence for a session, for exampleuser1, user2, assistant1, assistant2instead ofuser1, assistant1, user2, assistant2. For anything that reads the sidecar as a transcript, that is a correctness bug. -
The docs currently say the content sidecar stores full conversation content by default, including tool inputs and outputs, but the Codex and OpenCode readers still always return
content: [](packages/reader/src/codex.ts,packages/reader/src/opencode.ts). Sincecontent.store=fullis the default and the CLI passes that mode through for all three ingestors, this is user-visible behavior drift, not just an internal TODO. Either narrow the docs/UX to Claude-only for now or implement the missing capture before merge.
I did not approve for those reasons.
barryollama
left a comment
There was a problem hiding this comment.
Code Review
Overall this is a solid implementation of the content sidecar feature. A few issues to address:
🔴 Critical Issues
-
packages/reader/src/claude.ts:100 - Default is but sets . This mismatch means content is never captured by default.
-
packages/reader/src/codex.ts - Content capture is not implemented at all - always returns . The PR description mentions this is a follow-up, but the code should at least have a TODO or placeholder.
-
packages/reader/src/opencode.ts - Same issue - content capture not implemented, always returns .
⚠️ Warnings
-
packages/ledger/src/content.ts:56 - No validation on record size before appending. Large content could cause memory issues.
-
packages/ledger/src/config.ts:48 - Silently ignores invalid JSON in config file. Consider logging a warning.
-
packages/cli/src/commands/content.ts:62-64 - Swallows all errors in . Should at least log to stderr for debugging.
💡 Suggestions
-
packages/reader/src/types.ts - Consider adding a validator/type guard for runtime validation of parsed content.
-
packages/ledger/src/content.ts:106 - Could use a more efficient file iteration pattern for large content directories.
Let me add more detailed inline comments below.
|
{ |
| toolResult?: ContentToolResult; | ||
| } | ||
|
|
||
| export type ContentStoreMode = 'full' | 'hash-only' | 'off'; |
There was a problem hiding this comment.
🚩 'hash-only' content mode is accepted but functionally identical to 'off'
The ContentStoreMode type at packages/reader/src/types.ts:78 defines three modes: 'full' | 'hash-only' | 'off'. However, the parsers only check contentMode === 'full' (e.g., packages/reader/src/claude.ts:101), meaning 'hash-only' and 'off' both produce an empty content array. A user configuring store: 'hash-only' in their config would silently get no content captured at all, which may be surprising. The test at packages/reader/src/claude.test.ts:88-92 confirms this is the current intentional behavior, but it's worth documenting that hash-only is a placeholder for future functionality.
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
You're right that this is worth calling out, but the modes aren't quite equivalent — hash-only still affects what lands in the main ledger, just not via the sidecar. The sidecar holds the text of prompts/responses/tool I/O; argsHash lives on TurnRecord.toolCalls (see reader/src/types.ts ToolCall), so hash-only = full usage + tool-call hashes + metadata in ledger.jsonl, no content sidecar records. That's the pre-flip behavior from #33. off is the same sidecar outcome (no files) with identical reader output, which is why the empty-array branch is shared. I'll happily add a doc comment on ContentStoreMode making that distinction explicit if it'd help readers.
| options: ParseOptions = {}, | ||
| ): Promise<TurnRecord[]> { | ||
| ): Promise<ParseResult> { | ||
| const contentMode = options.contentMode ?? 'off'; |
There was a problem hiding this comment.
There was a problem hiding this comment.
This is intentional, not a mismatch — the reader is a pure library and defaults to 'off' so embedding it (e.g. in tests or non-CLI callers) doesn't quietly start synthesizing content records. The CLI always calls loadConfig() and passes the resolved mode through to the reader (see ingest.ts and commands/{claude,codex,opencode}.ts), so user-facing default behavior is store: 'full' per DEFAULT_CONFIG. Tests exercise both paths (claude.test.ts 'returns empty content array when contentMode is off (default)' for the library default, content.test.ts for the CLI default via loadConfig). I can add a doc comment on ParseOptions.contentMode if you want it made explicit in-code.
| records: ContentRecord[], | ||
| ): Promise<void> { | ||
| const file = contentFilePath(sessionId); | ||
| const payload = records.map((r) => JSON.stringify(r)).join('\n') + '\n'; |
There was a problem hiding this comment.
💡 Consider size validation: No check on record size before appending. For large conversations, this could accumulate significant memory before writing.\n\nSuggestion: Add a warning or limit for individual records over a certain size (e.g., 1MB).
There was a problem hiding this comment.
Punting on this one. burn is local-first and the input is a session log produced on the same machine by Claude/codex/opencode — there's no adversarial write path into appendContent. A per-record size cap would need a policy (drop? truncate? warn?) and a follow-on question of what to do when a legitimate tool_result actually is several MB (grep/ls output, test transcripts). I'd rather land that alongside a concrete pain point. Happy to revisit if we see it cause trouble.
- Preserve chronological order of content records by tagging user records
with their line seq/offset and assistant records with the seq/offset of
the message's first line, then merging before emit (claude reader,
both full and incremental). Add interleaved-turns fixture + test.
- Validate sessionId in contentFilePath (and skip unsafe records in
appendContent) so a malicious `sessionId: "../foo"` can't escape the
content directory. New isValidSessionId export.
- Coordinate pruneContent with appendContent by acquiring the same
per-session lock before stat+unlink, eliminating the write/prune race.
- Use inclusive cutoff (mtimeMs > cutoff → keep) so
`pruneContent({olderThanMs: 0})` reliably clears the directory.
- README: clarify content sidecar is currently Claude-only; add TODO
comments in codex/opencode readers pointing at the follow-up.
- loadConfig: distinguish ENOENT (silent) from invalid JSON (warn to
stderr) when reading config.json.
- opportunisticPrune: log prune failures to stderr instead of swallowing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Addressed both blocking issues in 7c68f03:
Could you re-review when you have a minute? Summary of other review-driven fixes in the same commit: sessionId path-traversal sanitization (Copilot), per-session lock coordination between prune and append (Copilot), inclusive cutoff so |
Devin Review caught an asymmetry: user content is filtered by offset < endOffset, but assistant content was emitted for every complete message in `order[]`. If a complete message appears in the file after an incomplete one, its content was emitted this pass and re-emitted next pass when the bytes are re-read from endOffset. appendTurns dedups by messageId so turns were safe; appendContent has no dedup so content records duplicated. Fix: apply the same offset < endOffset check to assistant content. Adds a test with [done, inprog, after] where only `done` commits. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add burn-readme-banner.png to the repo and update README.md to display it at the top. This adds a visual banner to improve the README's presentation.
noodleonthecape
left a comment
There was a problem hiding this comment.
Re-reviewed after the follow-up commits. The earlier blockers look fixed: Claude content ordering is now preserved, the incremental parser no longer duplicates assistant content across the endOffset boundary, session IDs are validated before writing sidecar files, and the README now clearly scopes content capture to Claude for now.
I also installed deps locally, built the workspace, and ran the test suite successfully (corepack pnpm -r build, corepack pnpm test: 95/95 passing).
Looks good to me.
Before this change, `burn claude` called ingestSession to append turns and content for the just-finished session file but never persisted a cursor. A later `burn summary` (which calls ingestAll → ingestClaudeInto) walked ~/.claude/projects/, found no cursor for the file, parsed from offset 0, and re-appended. Turns were safe — appendTurns dedups via the index sidecar — but appendContent has no dedup, so every content record duplicated. Fix: ingestSession now stats the file, runs the parse, and writes a ClaudeCursor at offsetBytes=file.size so ingestAll's rotated/offset check skips re-parsing. Export ingestSession for testability; new commands/claude.test.ts asserts the cursor is written at EOF. Scope note: codex and opencode wrappers have the same shape but their content arrays are currently always empty, so duplication isn't reachable there yet. Will need the equivalent cursor save when content capture lands for those formats (tracked by the existing TODO(#33-followup) comments in reader/src/codex.ts and reader/src/opencode.ts). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Drop __setContentFileMtimeForTesting from @relayburn/ledger's public index.ts. Tests continue to import it via ./content.js relative path, so this just hides it from the package surface. - burn content prune --days <bad>: parseRetention now returns null for invalid input and runContentPrune prints a concise error + help and exits 2, instead of letting the exception bubble as a stack trace. New commands/content.test.ts covers the invalid, numeric, and forever input paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- types.ts: ContentToolResult.content changes from `string | unknown`
(collapses to `unknown` in TS) to `unknown`, with a comment
describing the shapes tool results can take.
- paths.ts: tighten sessionId cap from 255 to 128 so
`${id}.jsonl` and `content.${id}.lock` stay well under the
common 255-byte filename limit. Real session IDs (UUIDs and
short `ses_*` / `sess_*` / `turn_*` strings) are comfortably
within the new bound.
- config.ts: treat empty-string (and whitespace-only)
`RELAYBURN_CONTENT_TTL_DAYS` as unset. Previously `Number('')`
parsed as 0, which combined with opportunisticPrune on every CLI
invocation would silently delete the entire content sidecar — a
realistic hazard when CI/CD sets the var to an empty string or
users try to "unset" it.
Tests cover all three: empty/whitespace TTL env falls through to
config and then default; oversized sessionIds are rejected; the
128-char boundary is accepted.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Closes #33.
Summary
{ turns, content }.contentMode: 'full'captures assistant text/thinking/tool_use and user text/tool_result;'hash-only'and'off'returncontent: [].@relayburn/ledgersidecar at~/.relayburn/content/<sessionId>.jsonlwithappendContent/readContent/pruneContent, plus a config loader (env +config.json) forcontent.storeandcontent.retentionDays(defaultfull, 90 days).burn content prune [--days <n>]subcommand, plus opportunistic mtime-based prune on every invocation. Ingest paths for claude/codex/opencode pass the resolved content mode and write content alongside turns.Design notes
burn summarystill reads onlyledger.jsonl, so aggregate queries stay fast regardless of sidecar size.RELAYBURN_CONTENT_STORE=offproduces no content files end-to-end.content: []for now; their content capture is a follow-up. The signature change keeps the API uniform.Test plan
pnpm -r buildpassespnpm test— 89/89 pass, including new tests for content modes, content sidecar round-trip,pruneContentmtime deletion, config env/file/default precedence, and deferred content for in-progress messages in the incremental parserburn claudewith default config; confirm a file appears in~/.relayburn/content/RELAYBURN_CONTENT_STORE=off burn claude ...leaves no content directory entriesburn content prune --days 0clears the content dir🤖 Generated with Claude Code