relayburn-ledger: add list_content_session_ids + load_config (#279)#292
relayburn-ledger: add list_content_session_ids + load_config (#279)#292willwashburn merged 2 commits intomainfrom
Conversation
Adds the two read APIs that `relayburn-ingest` (#277, #278) needs from the SQLite-only ledger: - `Ledger::list_content_session_ids()` runs `SELECT DISTINCT session_id FROM content` against `content.sqlite` and returns the set, filtering malformed ids defensively (mirrors the TS sqlite-adapter). Powers the skip filter in `reingest_missing_content`. - `BurnConfig` + `load_config` mirror `packages/ledger/src/config.ts`: `content.store: full | hash-only | off` and `content.retentionDays: number | 'forever'`, sourced from `$RELAYBURN_HOME/config.json` with `RELAYBURN_CONTENT_STORE` / `RELAYBURN_CONTENT_TTL_DAYS` env overrides on top. Empty env strings fall back to default to avoid the `RELAYBURN_CONTENT_TTL_DAYS=` → zero-day-retention footgun. Config is co-located in `relayburn-ledger` to match the TS layout (`@relayburn/ledger` exports `loadConfig`); the crate already depends on `relayburn-reader` for `ContentStoreMode`, so no new dependency edge. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
📝 WalkthroughWalkthroughAdds a user-level JSON config loader with environment-variable overrides and permissive parsing to ChangesConfiguration System & Session ID Querying
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 91a74f5d5c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if f < 0.0 { | ||
| return Some(Retention::Forever); | ||
| } | ||
| return Some(Retention::Days(f as u64)); |
There was a problem hiding this comment.
Preserve fractional retention days from config/env
The Rust parser truncates fractional TTL values to an integer day (f as u64), which diverges from the TS source behavior in packages/ledger/src/config.ts (normalizeRetention keeps finite numbers as-is). If a user sets RELAYBURN_CONTENT_TTL_DAYS=0.5 or "retentionDays": 0.5, this code resolves to 0 days instead of 12 hours, causing content to be pruned much earlier than intended.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 44f64d3. Switched Retention::Days from u64 to f64 so finite fractional values survive end-to-end — RELAYBURN_CONTENT_TTL_DAYS=0.5 now resolves to Days(0.5) and as_millis() returns Some(43_200_000) (12 hours). New tests cover both the env and file paths.
| pub(crate) fn list_session_ids(conn: &Connection) -> Result<HashSet<String>> { | ||
| let mut stmt = conn.prepare("SELECT DISTINCT session_id FROM content")?; | ||
| let rows = stmt | ||
| .query_map([], |r| r.get::<_, String>(0))? |
There was a problem hiding this comment.
Skip malformed session_id rows instead of failing
list_session_ids is documented to filter malformed IDs, but r.get::<_, String>(0) makes the whole call fail if any session_id is stored as a non-text value, so the filter never runs. Because content is not a STRICT table (schema.rs), a single bad row can cause list_content_session_ids() to return an error and break callers that rely on this as a best-effort skip set.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 44f64d3. list_session_ids now iterates with query() and skips rows whose session_id fails to decode as String (via let Ok(...) = row.get::<_, String>(0) else { continue }). New test injects a BLOB-typed session_id row — TEXT affinity preserves BLOBs, and rusqlite returns InvalidType for them — to exercise the skip path; the call returns the three valid ids and ignores the corrupt one.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
crates/relayburn-ledger/src/tests.rs (1)
679-701: ⚡ Quick winAdd one malformed-row coverage check to match the defensive-filtering contract
This test verifies distinctness, but not the “ignore malformed IDs” behavior. Consider injecting one corrupted
session_idrow (via direct SQL in test) and asserting the method still returns only valid IDs.Possible test extension
fn list_content_session_ids_returns_distinct_set() { @@ l.append_content(&[ make_content("ses_a", "m1", "alpha"), make_content("ses_a", "m2", "beta"), make_content("ses_b", "m1", "gamma"), make_content("ses_c", "m1", "delta"), ]) .unwrap(); + + // Simulate corruption bypassing writer validation. + l.conns + .content + .execute("UPDATE content SET session_id = 123 WHERE message_id = 'm2'", []) + .unwrap(); @@ - assert_eq!(ids.len(), 3); + assert_eq!(ids.len(), 2); assert!(ids.contains("ses_a")); - assert!(ids.contains("ses_b")); assert!(ids.contains("ses_c")); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/relayburn-ledger/src/tests.rs` around lines 679 - 701, Extend the list_content_session_ids_returns_distinct_set test to also insert one malformed row directly into the underlying SQLite DB and assert the method filters it out: after setting up l (open_in) and appending valid rows with append_content, obtain a DB connection (via the same storage used by list_content_session_ids), execute a direct SQL INSERT that writes a corrupted session_id value (e.g., NULL or an invalid type/text) into the content table, then call l.list_content_session_ids() again and assert the returned set still has length 3 and contains only "ses_a", "ses_b", "ses_c"; this ensures list_content_session_ids() correctly ignores malformed rows.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/relayburn-ledger/src/content.rs`:
- Around line 138-143: The function list_session_ids currently collects
query_map results with .collect::<rusqlite::Result<Vec<_>>>()? which propagates
row decoding errors; change it to iterate the query_map iterator and skip rows
that return Err from row.get rather than returning Err to the caller (e.g.,
replace the collect call with a manual loop or use .filter_map(|res| res.ok())
to only keep successfully decoded Strings), then apply is_valid_session_id and
collect into the HashSet; reference the existing list_session_ids, conn.prepare,
the query_map call, row.get::<_, String>(0), and is_valid_session_id when making
this change.
---
Nitpick comments:
In `@crates/relayburn-ledger/src/tests.rs`:
- Around line 679-701: Extend the list_content_session_ids_returns_distinct_set
test to also insert one malformed row directly into the underlying SQLite DB and
assert the method filters it out: after setting up l (open_in) and appending
valid rows with append_content, obtain a DB connection (via the same storage
used by list_content_session_ids), execute a direct SQL INSERT that writes a
corrupted session_id value (e.g., NULL or an invalid type/text) into the content
table, then call l.list_content_session_ids() again and assert the returned set
still has length 3 and contains only "ses_a", "ses_b", "ses_c"; this ensures
list_content_session_ids() correctly ignores malformed rows.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: eb387229-6a55-49bb-91bb-b4a183275c0d
📒 Files selected for processing (4)
crates/relayburn-ledger/src/config.rscrates/relayburn-ledger/src/content.rscrates/relayburn-ledger/src/lib.rscrates/relayburn-ledger/src/tests.rs
- Preserve fractional retentionDays. Switch `Retention::Days` from `u64` to `f64` so `RELAYBURN_CONTENT_TTL_DAYS=0.5` round-trips as a 12-hour window instead of being silently truncated to 0 days. Mirrors TS `normalizeRetention`, which keeps any finite JS number as-is. - Skip malformed `session_id` rows in `list_session_ids` instead of aborting the whole call. The `content` table is non-STRICT, so a BLOB-typed value can land in a TEXT-affinity column and would otherwise poison the entire skip set. - Cover both behaviors with new tests: env + file fractional retention paths and a BLOB-injected row that exercises the row-decode skip. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
crates/relayburn-ledger/src/config.rs (1)
133-145: 💤 Low valueUnused
Serializederive.
RawConfigandRawContentare only deserialized from the config file. TheSerializederive is unused and could be removed, but this is a minor cleanup.♻️ Suggested cleanup
-#[derive(Debug, Default, Deserialize, Serialize)] +#[derive(Debug, Default, Deserialize)] struct RawConfig { #[serde(default)] content: Option<RawContent>, } -#[derive(Debug, Default, Deserialize, Serialize)] +#[derive(Debug, Default, Deserialize)] struct RawContent {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/relayburn-ledger/src/config.rs` around lines 133 - 145, The derives for Serialize on structs RawConfig and RawContent are unused; remove Serialize from the #[derive(...)] lists for both RawConfig and RawContent and, if present, delete any now-unused serde::Serialize import so the compiler no longer warns about an unused derive/import while keeping Deserialize and Debug/Default intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@crates/relayburn-ledger/src/config.rs`:
- Around line 133-145: The derives for Serialize on structs RawConfig and
RawContent are unused; remove Serialize from the #[derive(...)] lists for both
RawConfig and RawContent and, if present, delete any now-unused serde::Serialize
import so the compiler no longer warns about an unused derive/import while
keeping Deserialize and Debug/Default intact.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 51fa5415-5f30-4f32-b6b9-47ddace7ba01
📒 Files selected for processing (3)
crates/relayburn-ledger/src/config.rscrates/relayburn-ledger/src/content.rscrates/relayburn-ledger/src/tests.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- crates/relayburn-ledger/src/tests.rs
- crates/relayburn-ledger/src/content.rs
Summary
Ledger::list_content_session_ids()(returnsHashSet<String>of distinct ids incontent.sqlite, filters malformed ids) sorelayburn-ingest::reingest_missing_content([Rust port] relayburn-ingest: gap-warning state machine + reingest_missing_content #278) can skip already-covered sessions.BurnConfig+load_config/load_config_atmirroringpackages/ledger/src/config.tsbyte-for-byte:content.store: full|hash-only|off,content.retentionDays: number|'forever', sourced from$RELAYBURN_HOME/config.jsonwithRELAYBURN_CONTENT_STORE/RELAYBURN_CONTENT_TTL_DAYSenv overrides. Empty env string falls back to default to avoid theRELAYBURN_CONTENT_TTL_DAYS=→ zero-day-retention footgun.Config: kept inrelayburn-ledgerto mirror@relayburn/ledger(TS source of truth co-locatesloadConfigthere). No new dependency edge — the crate already depends onrelayburn-readerforContentStoreMode, andrelayburn-ingestalready depends onrelayburn-ledger. Splitting it into a newrelayburn-configcrate would create a third pole of dependencies for a ~150-line file with no separate consumers, so the layering judgement is to keep it co-located until we have a second non-ledger consumer.Test plan
cargo test -p relayburn-ledger(44 tests; covers both new APIs:list_content_session_ids_returns_distinct_setplus sevenconfig::tests::*cases for default / file-override / env-override / empty-env / malformed-JSON / non-object / unknown-store-fallback).cargo build --workspaceclean.cargo test --workspaceclean.pnpm run build && pnpm run testclean (873 TS tests pass).Closes #279.