Skip to content

Content sidecar with retention and opt-out (#33)#43

Merged
willwashburn merged 9 commits intomainfrom
content-sidecar-33
Apr 22, 2026
Merged

Content sidecar with retention and opt-out (#33)#43
willwashburn merged 9 commits intomainfrom
content-sidecar-33

Conversation

@willwashburn
Copy link
Copy Markdown
Member

Closes #33.

Summary

  • Reader emits { turns, content }. contentMode: 'full' captures assistant text/thinking/tool_use and user text/tool_result; 'hash-only' and 'off' return content: [].
  • New @relayburn/ledger sidecar at ~/.relayburn/content/<sessionId>.jsonl with appendContent / readContent / pruneContent, plus a config loader (env + config.json) for content.store and content.retentionDays (default full, 90 days).
  • burn content prune [--days <n>] subcommand, plus opportunistic mtime-based prune on every invocation. Ingest paths for claude/codex/opencode pass the resolved content mode and write content alongside turns.

Design notes

  • Main ledger is untouched — burn summary still reads only ledger.jsonl, so aggregate queries stay fast regardless of sidecar size.
  • Content directory is created lazily on first write. RELAYBURN_CONTENT_STORE=off produces no content files end-to-end.
  • Opencode and codex readers return content: [] for now; their content capture is a follow-up. The signature change keeps the API uniform.

Test plan

  • pnpm -r build passes
  • pnpm test — 89/89 pass, including new tests for content modes, content sidecar round-trip, pruneContent mtime deletion, config env/file/default precedence, and deferred content for in-progress messages in the incremental parser
  • Manual smoke: run burn claude with default config; confirm a file appears in ~/.relayburn/content/
  • Manual smoke: RELAYBURN_CONTENT_STORE=off burn claude ... leaves no content directory entries
  • Manual smoke: burn content prune --days 0 clears the content dir

🤖 Generated with Claude Code

Stores full conversation content in a parallel JSONL store keyed by
session, separate from the usage ledger. Defaults to full capture with a
90-day rolling retention; env/config switches select hash-only or off to
restore the pre-flip minimal-storage behavior.

- reader: ContentRecord type; parseClaudeSession emits { turns, content }
  with contentMode: full captures assistant text/thinking/tool_use and
  user text/tool_result; incremental path defers content alongside the
  message it belongs to.
- ledger: appendContent / readContent / pruneContent against
  ~/.relayburn/content/<sessionId>.jsonl. Config loader (env +
  config.json) for content.store and content.retentionDays.
- cli: burn content prune [--days <n>] subcommand plus opportunistic
  prune on every invocation; ingest paths for claude/codex/opencode pass
  the resolved contentMode and write content alongside turns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

This comment was marked as resolved.

Copy link
Copy Markdown

@noodleonthecape noodleonthecape left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found two blocking issues:

  1. packages/reader/src/claude.ts writes content out of chronological order. In both the full parser and the incremental parser, all user-side records are accumulated first and then all assistant records are appended later (parseClaudeSession: around lines 108, 143, 174; parseClaudeSessionIncremental: around lines 469, 527, 562). That means the sidecar no longer preserves the actual conversation sequence for a session, for example user1, user2, assistant1, assistant2 instead of user1, assistant1, user2, assistant2. For anything that reads the sidecar as a transcript, that is a correctness bug.

  2. The docs currently say the content sidecar stores full conversation content by default, including tool inputs and outputs, but the Codex and OpenCode readers still always return content: [] (packages/reader/src/codex.ts, packages/reader/src/opencode.ts). Since content.store=full is the default and the CLI passes that mode through for all three ingestors, this is user-visible behavior drift, not just an internal TODO. Either narrow the docs/UX to Claude-only for now or implement the missing capture before merge.

I did not approve for those reasons.

Copy link
Copy Markdown

@barryollama barryollama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Overall this is a solid implementation of the content sidecar feature. A few issues to address:

🔴 Critical Issues

  1. packages/reader/src/claude.ts:100 - Default is but sets . This mismatch means content is never captured by default.

  2. packages/reader/src/codex.ts - Content capture is not implemented at all - always returns . The PR description mentions this is a follow-up, but the code should at least have a TODO or placeholder.

  3. packages/reader/src/opencode.ts - Same issue - content capture not implemented, always returns .

⚠️ Warnings

  1. packages/ledger/src/content.ts:56 - No validation on record size before appending. Large content could cause memory issues.

  2. packages/ledger/src/config.ts:48 - Silently ignores invalid JSON in config file. Consider logging a warning.

  3. packages/cli/src/commands/content.ts:62-64 - Swallows all errors in . Should at least log to stderr for debugging.

💡 Suggestions

  1. packages/reader/src/types.ts - Consider adding a validator/type guard for runtime validation of parsed content.

  2. packages/ledger/src/content.ts:106 - Could use a more efficient file iteration pattern for large content directories.

Let me add more detailed inline comments below.

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 6 additional findings.

Open in Devin Review

@barryollama
Copy link
Copy Markdown

{
"body": "## Code Review\n\nOverall this is a solid implementation of the content sidecar feature. A few issues to address:\n\n### 🔴 Critical Issues\n\n1. packages/reader/src/claude.ts:100 - Default contentMode is 'off' but DEFAULT_CONFIG sets store: 'full'. This mismatch means content is never captured by default.\n\n2. packages/reader/src/codex.ts - Content capture is not implemented at all - always returns content: []. The PR description mentions this is a follow-up, but the code should at least have a TODO or placeholder.\n\n3. packages/reader/src/opencode.ts - Same issue - content capture not implemented, always returns content: [].\n\n### ⚠️ Warnings\n\n1. packages/ledger/src/content.ts:56 - No validation on record size before appending. Large content could cause memory issues.\n\n2. packages/ledger/src/config.ts:48 - Silently ignores invalid JSON in config file. Consider logging a warning.\n\n3. packages/cli/src/commands/content.ts:62-64 - Swallows all errors in opportunisticPrune(). Should at least log to stderr for debugging.\n\n### 💡 Suggestions\n\n1. packages/reader/src/types.ts - Consider adding a ContentRecord validator/type guard for runtime validation of parsed content.\n\n2. packages/ledger/src/content.ts:106 - Could use a more efficient file iteration pattern for large content directories.",
"commit_id": "ed684e6afe72a5324d62b70247ac6d1845982744",
"event": "COMMENT"
}

toolResult?: ContentToolResult;
}

export type ContentStoreMode = 'full' | 'hash-only' | 'off';
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚩 'hash-only' content mode is accepted but functionally identical to 'off'

The ContentStoreMode type at packages/reader/src/types.ts:78 defines three modes: 'full' | 'hash-only' | 'off'. However, the parsers only check contentMode === 'full' (e.g., packages/reader/src/claude.ts:101), meaning 'hash-only' and 'off' both produce an empty content array. A user configuring store: 'hash-only' in their config would silently get no content captured at all, which may be surprising. The test at packages/reader/src/claude.test.ts:88-92 confirms this is the current intentional behavior, but it's worth documenting that hash-only is a placeholder for future functionality.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right that this is worth calling out, but the modes aren't quite equivalent — hash-only still affects what lands in the main ledger, just not via the sidecar. The sidecar holds the text of prompts/responses/tool I/O; argsHash lives on TurnRecord.toolCalls (see reader/src/types.ts ToolCall), so hash-only = full usage + tool-call hashes + metadata in ledger.jsonl, no content sidecar records. That's the pre-flip behavior from #33. off is the same sidecar outcome (no files) with identical reader output, which is why the empty-array branch is shared. I'll happily add a doc comment on ContentStoreMode making that distinction explicit if it'd help readers.

devin-ai-integration[bot]

This comment was marked as resolved.

options: ParseOptions = {},
): Promise<TurnRecord[]> {
): Promise<ParseResult> {
const contentMode = options.contentMode ?? 'off';
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Default mode mismatch: Default is 'off' here but DEFAULT_CONFIG sets store: 'full'. Content capture disabled by default.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intentional, not a mismatch — the reader is a pure library and defaults to 'off' so embedding it (e.g. in tests or non-CLI callers) doesn't quietly start synthesizing content records. The CLI always calls loadConfig() and passes the resolved mode through to the reader (see ingest.ts and commands/{claude,codex,opencode}.ts), so user-facing default behavior is store: 'full' per DEFAULT_CONFIG. Tests exercise both paths (claude.test.ts 'returns empty content array when contentMode is off (default)' for the library default, content.test.ts for the CLI default via loadConfig). I can add a doc comment on ParseOptions.contentMode if you want it made explicit in-code.

Comment thread packages/reader/src/codex.ts
Comment thread packages/reader/src/opencode.ts
records: ContentRecord[],
): Promise<void> {
const file = contentFilePath(sessionId);
const payload = records.map((r) => JSON.stringify(r)).join('\n') + '\n';
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Consider size validation: No check on record size before appending. For large conversations, this could accumulate significant memory before writing.\n\nSuggestion: Add a warning or limit for individual records over a certain size (e.g., 1MB).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Punting on this one. burn is local-first and the input is a session log produced on the same machine by Claude/codex/opencode — there's no adversarial write path into appendContent. A per-record size cap would need a policy (drop? truncate? warn?) and a follow-on question of what to do when a legitimate tool_result actually is several MB (grep/ls output, test transcripts). I'd rather land that alongside a concrete pain point. Happy to revisit if we see it cause trouble.

Comment thread packages/ledger/src/config.ts
Comment thread packages/cli/src/commands/content.ts Outdated
- Preserve chronological order of content records by tagging user records
  with their line seq/offset and assistant records with the seq/offset of
  the message's first line, then merging before emit (claude reader,
  both full and incremental). Add interleaved-turns fixture + test.
- Validate sessionId in contentFilePath (and skip unsafe records in
  appendContent) so a malicious `sessionId: "../foo"` can't escape the
  content directory. New isValidSessionId export.
- Coordinate pruneContent with appendContent by acquiring the same
  per-session lock before stat+unlink, eliminating the write/prune race.
- Use inclusive cutoff (mtimeMs > cutoff → keep) so
  `pruneContent({olderThanMs: 0})` reliably clears the directory.
- README: clarify content sidecar is currently Claude-only; add TODO
  comments in codex/opencode readers pointing at the follow-up.
- loadConfig: distinguish ENOENT (silent) from invalid JSON (warn to
  stderr) when reading config.json.
- opportunisticPrune: log prune failures to stderr instead of swallowing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
devin-ai-integration[bot]

This comment was marked as resolved.

@willwashburn
Copy link
Copy Markdown
Member Author

Addressed both blocking issues in 7c68f03:

  1. Chronological order: the reader now tags user content with its line offset and assistant content with the first-line offset of its message, then merges before emit. New interleaved-turns.jsonl fixture + claude.test.ts test asserts the sequence is user, assistant, user, assistant for a two-round conversation.
  2. Docs / behavior drift for codex/opencode: README now explicitly says the content sidecar is currently captured for claude-code only; codex/opencode still flow content.store through (so off still produces no files, correctness preserved) but have a TODO(#33-followup) comment at the return site. When capture lands, the CLI wiring is already a no-op.

Could you re-review when you have a minute? Summary of other review-driven fixes in the same commit: sessionId path-traversal sanitization (Copilot), per-session lock coordination between prune and append (Copilot), inclusive cutoff so --days 0 clears the dir (Copilot), config.json warn-on-bad-JSON (barryollama), and stderr log on opportunistic prune failures (barryollama).

willwashburn and others added 3 commits April 22, 2026 07:16
Devin Review caught an asymmetry: user content is filtered by
offset < endOffset, but assistant content was emitted for every
complete message in `order[]`. If a complete message appears in the
file after an incomplete one, its content was emitted this pass and
re-emitted next pass when the bytes are re-read from endOffset.
appendTurns dedups by messageId so turns were safe; appendContent
has no dedup so content records duplicated.

Fix: apply the same offset < endOffset check to assistant content.
Adds a test with [done, inprog, after] where only `done` commits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Swept up as an untracked file in 0b592cc; not part of issue #33.
Keep it on disk (gitignore covers image artifacts in follow-up).
Add burn-readme-banner.png to the repo and update README.md to display it at the top. This adds a visual banner to improve the README's presentation.
devin-ai-integration[bot]

This comment was marked as resolved.

Copy link
Copy Markdown

@noodleonthecape noodleonthecape left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed after the follow-up commits. The earlier blockers look fixed: Claude content ordering is now preserved, the incremental parser no longer duplicates assistant content across the endOffset boundary, session IDs are validated before writing sidecar files, and the README now clearly scopes content capture to Claude for now.

I also installed deps locally, built the workspace, and ran the test suite successfully (corepack pnpm -r build, corepack pnpm test: 95/95 passing).

Looks good to me.

This comment was marked as resolved.

Before this change, `burn claude` called ingestSession to append turns
and content for the just-finished session file but never persisted a
cursor. A later `burn summary` (which calls ingestAll → ingestClaudeInto)
walked ~/.claude/projects/, found no cursor for the file, parsed from
offset 0, and re-appended. Turns were safe — appendTurns dedups via the
index sidecar — but appendContent has no dedup, so every content record
duplicated.

Fix: ingestSession now stats the file, runs the parse, and writes a
ClaudeCursor at offsetBytes=file.size so ingestAll's rotated/offset
check skips re-parsing. Export ingestSession for testability; new
commands/claude.test.ts asserts the cursor is written at EOF.

Scope note: codex and opencode wrappers have the same shape but their
content arrays are currently always empty, so duplication isn't reachable
there yet. Will need the equivalent cursor save when content capture
lands for those formats (tracked by the existing TODO(#33-followup)
comments in reader/src/codex.ts and reader/src/opencode.ts).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Drop __setContentFileMtimeForTesting from @relayburn/ledger's public
  index.ts. Tests continue to import it via ./content.js relative path,
  so this just hides it from the package surface.
- burn content prune --days <bad>: parseRetention now returns null for
  invalid input and runContentPrune prints a concise error + help and
  exits 2, instead of letting the exception bubble as a stack trace.
  New commands/content.test.ts covers the invalid, numeric, and forever
  input paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
devin-ai-integration[bot]

This comment was marked as resolved.

- types.ts: ContentToolResult.content changes from `string | unknown`
  (collapses to `unknown` in TS) to `unknown`, with a comment
  describing the shapes tool results can take.
- paths.ts: tighten sessionId cap from 255 to 128 so
  `${id}.jsonl` and `content.${id}.lock` stay well under the
  common 255-byte filename limit. Real session IDs (UUIDs and
  short `ses_*` / `sess_*` / `turn_*` strings) are comfortably
  within the new bound.
- config.ts: treat empty-string (and whitespace-only)
  `RELAYBURN_CONTENT_TTL_DAYS` as unset. Previously `Number('')`
  parsed as 0, which combined with opportunisticPrune on every CLI
  invocation would silently delete the entire content sidecar — a
  realistic hazard when CI/CD sets the var to an empty string or
  users try to "unset" it.

Tests cover all three: empty/whitespace TTL env falls through to
config and then default; oversized sessionIds are rejected; the
128-char boundary is accepted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
devin-ai-integration[bot]

This comment was marked as resolved.

Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
@willwashburn willwashburn requested review from barryollama and removed request for barryollama April 22, 2026 17:30
@willwashburn willwashburn merged commit ea6edd8 into main Apr 22, 2026
@willwashburn willwashburn deleted the content-sidecar-33 branch April 22, 2026 17:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Design: content sidecar store with retention and opt-out

4 participants