relayburn-cli: burn compare presenter (#248 D3)#314
Conversation
Wire `burn compare <model_a,model_b[,...]>` as a presenter over relayburn_sdk::analyze::compare's building blocks (build_compare_table + the per-turn fidelity gate); the SDK keeps the heavy lifting so a future burn__compare MCP tool wraps the same call. The CLI flag set mirrors the TS path 1:1 — positional comma-separated model list, --include-partial / --fidelity / --since / --project / --session / --workflow / --agent / --provider / --min-sample / --csv / --no-archive / --json — and the wire shape (cells ordering, rounding rules, fidelity-summary key order) goes byte-for-byte with the TS-captured cli-golden snapshots. --workflow / --agent / --provider short-circuit with a typed "not yet wired" error since the Rust ledger query layer doesn't expose the enrichment / provider filter today; the rest of the surface is fully wired. The two compare entries in tests/fixtures/cli-golden/invocations.json flip to enabled: true so BURN_GOLDEN=1 cargo test --test golden enforces parity. Because the on-disk fixture is a TS-style JSONL ledger and the Rust SDK reads SQLite, the golden runner now materializes a sibling SQLite home from the JSONL once per test run and points the binary at it; the import is loose-typed on tool_result_event / relationship / stamp so a fixture extension surfaces a stderr breadcrumb instead of panicking the whole runner. Smoke test loses `compare` from its "stub exits 1" loop and gains an explicit "rejects missing models" assertion against the wired path.
📝 WalkthroughWalkthroughThis PR implements a fully wired ChangesCLI Compare Command Implementation
Sequence DiagramsequenceDiagram
participant CLI as User/CLI
participant Parser as Arg Parser
participant Validator as Validation
participant Ledger as SDK Ledger
participant Compare as SDK Compare
participant Renderer as Output Renderer
CLI->>Parser: burn compare --models m1,m2 --since 7d
Parser->>Validator: CompareArgs struct
Validator->>Validator: Parse models, fidelity, since date
Validator->>Ledger: Open ledger with options
Ledger-->>Validator: Ledger handle
Validator->>Ledger: Query turns (with filters)
Ledger-->>Validator: Raw turns
Validator->>Compare: build_compare_table(turns, options)
Compare-->>Validator: CompareTable
Validator->>Renderer: Select output format (JSON/CSV/TTY)
Renderer->>Renderer: Format and render data
Renderer-->>CLI: Write stdout (byte-equivalent)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 459aec276e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| } | ||
| Ok(Some(raw.to_string())) |
There was a problem hiding this comment.
Normalize ISO
--since input to UTC before filtering
Passing an ISO timestamp through unchanged here can produce incorrect compare slices whenever users provide offsets (for example 2026-05-06T00:00:00-07:00), because Ledger::query_turns applies since via lexicographic string comparison against stored ...Z timestamps. Without converting to a canonical UTC ISO string first (as the TS path does), older turns can be incorrectly included or newer turns excluded depending on offset formatting.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 822963c. Added normalize_iso_to_utc_z that parses RFC 3339 with optional Z / +HH:MM / -HH:MM offsets and re-emits the canonical UTC YYYY-MM-DDTHH:MM:SS.mmmZ form before threading into Ledger::query_turns. Offsets are subtracted to land on UTC, sub-millisecond fractional digits truncate to 3, and shorter fractions are right-padded to .mmm. Unit tests cover positive/negative offsets, fractional widening, lowercase-z, date-only, and reject-garbage cases. Avoided pulling in chrono — the existing days_to_ymd Hinnant routine got an inverse ymd_to_days so the helper stays self-contained.
| .map(|d| d.as_secs()) | ||
| .unwrap_or(0); | ||
| let when = now.saturating_sub(secs_back); | ||
| return Ok(Some(format_iso_z(when))); |
There was a problem hiding this comment.
Preserve fractional seconds in relative
--since output
The relative-range branch emits timestamps as YYYY-MM-DDTHH:MM:SSZ (no milliseconds), but ledger rows are stored with fractional seconds (...SS.mmmZ) and compared lexicographically. That makes same-second rows (e.g. ...12.500Z) sort as older than ...12Z, so --since 24h/7d can drop valid boundary turns. Emitting a fully canonical ISO string with milliseconds avoids this off-by-subsecond exclusion.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 822963c. The relative-range branch now feeds through format_iso_z_ms(when, 0), which always emits YYYY-MM-DDTHH:MM:SS.000Z. Verified the resulting cutoff lex-compares correctly against ledger rows with non-zero subsecond fragments via a dedicated unit test (normalize_since_relative_format_is_lex_compatible_with_ledger_rows), plus a shape assertion on 7d that pins the 24-char canonical form.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/fixtures/cli-golden/invocations.json (1)
51-60: ⚡ Quick winAdd a golden invocation for the new CSV presenter too.
This PR promises byte-for-byte parity for the compare outputs, but the golden gate only covers TTY and JSON right now. A
compare --csvsnapshot here would lock down the one format most sensitive to column ordering and rounding drift.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/fixtures/cli-golden/invocations.json` around lines 51 - 60, Add a new golden invocation entry mirroring the existing "compare" and "compare-json" cases but with the CSV presenter: create an object like the others with "name": "compare-csv", "args": ["compare", "claude-sonnet-4-6,claude-haiku-4-5", "--include-partial", "--csv"], "expectStatus": 0, and "enabled": true so the test suite captures the byte-for-byte CSV output; place it alongside the existing "compare" and "compare-json" entries in invocations.json.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/relayburn-cli/src/commands/compare.rs`:
- Around line 55-60: Replace direct eprintln!/stderr writes and early returns in
compare.rs (e.g., the positional models branch using args.models/as_deref and
messages like NEEDS_MODELS_MSG) with calls to the central error reporter
function report_error so errors follow the {"error": ...} envelope; for each
branch cited (lines around the models check, invalid fidelity, conflicting
--json/--csv, invalid --min-sample, and the other ranges noted) call
report_error with a descriptive message and return the appropriate exit code
instead of printing, preserving the original exit values and messages.
---
Nitpick comments:
In `@tests/fixtures/cli-golden/invocations.json`:
- Around line 51-60: Add a new golden invocation entry mirroring the existing
"compare" and "compare-json" cases but with the CSV presenter: create an object
like the others with "name": "compare-csv", "args": ["compare",
"claude-sonnet-4-6,claude-haiku-4-5", "--include-partial", "--csv"],
"expectStatus": 0, and "enabled": true so the test suite captures the
byte-for-byte CSV output; place it alongside the existing "compare" and
"compare-json" entries in invocations.json.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 36dee19c-7c18-4eb6-8e74-ba78fa40c837
📒 Files selected for processing (8)
CHANGELOG.mdcrates/relayburn-cli/src/cli.rscrates/relayburn-cli/src/commands/compare.rscrates/relayburn-cli/src/main.rscrates/relayburn-cli/tests/golden.rscrates/relayburn-cli/tests/smoke.rscrates/relayburn-sdk/src/lib.rstests/fixtures/cli-golden/invocations.json
- P1: normalize ISO `--since` inputs to UTC `...mmmZ` so offset-bearing
timestamps (e.g. `2026-05-06T00:00:00-07:00`) lex-compare correctly
against ledger rows. New `normalize_iso_to_utc_z` parses RFC 3339
with `Z`, `+HH:MM`, `-HH:MM`, optional fractional seconds, and
date-only forms; truncates sub-millisecond precision. `ymd_to_days`
added as the round-trip inverse of the existing `days_to_ymd`.
- P2: relative `--since` (e.g. `7d`, `24h`) now emits
`YYYY-MM-DDTHH:MM:SS.000Z` rather than `...SSZ`, so same-second
ledger rows with sub-second precision (e.g. `...12.500Z`) sort
consistently against the cutoff.
- CodeRabbit: every `eprintln!`-then-return-exit-code branch in
`run_inner` now returns `Err(anyhow!(...))`. The outer `run` already
routes errors through `report_error`, so `burn --json compare ...`
emits the documented `{"error": "..."}` envelope on stdout instead
of plain stderr. Exit code remains 2. Messages stripped of their
embedded `burn:` prefix since `report_error` adds one.
Tests: 11 new unit tests covering the offset, fractional-second,
date-only, lowercase-`z`, and reject-garbage cases plus a sanity check
that the canonical `.000Z` format lex-orders correctly against ledger
rows. `cargo test --workspace` and `BURN_GOLDEN=1 cargo test --test
golden -p relayburn-cli` both green.
There was a problem hiding this comment.
🧹 Nitpick comments (2)
crates/relayburn-cli/src/commands/compare.rs (2)
967-979: 💤 Low valueRemove unused
owned_emptyvariable.The variable
owned_emptyis declared but never referenced — the closurecell_forcallsempty_cell()directly inunwrap_or_else. The suppressionlet _ = &owned_empty;and accompanying comment are misleading since the variable is truly unused.♻️ Proposed fix
- let owned_empty = empty_cell(); let cell_for = |m: &str, cat: &str| -> CompareCell { table .cells .get(m) .and_then(|by| by.get(cat)) .cloned() .unwrap_or_else(empty_cell) }; - // Suppress the unused-variable warning on `owned_empty`; it's only - // referenced when we run a corner case where neither cells.get nor - // by_cat.get is hit, which the table builder doesn't produce today. - let _ = &owned_empty;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/relayburn-cli/src/commands/compare.rs` around lines 967 - 979, Remove the truly unused variable owned_empty and its suppression snippet; delete the line declaring let owned_empty = empty_cell(); the accompanying comment about suppressing warnings and the line let _ = &owned_empty; — leave the closure cell_for, its use of empty_cell() in unwrap_or_else, and the surrounding code intact so behavior is unchanged.
503-530: 💤 Low value
std::env::set_var/remove_varcan be unsound in multi-threaded contexts.These functions are documented as potentially unsafe when called from multi-threaded programs (particularly in Rust 1.80+ or with TSAN). Since this is a CLI tool with single-threaded startup and the comment at lines 130-131 notes the SDK doesn't currently read
RELAYBURN_ARCHIVE, this is low risk today.If the SDK gains threading or async ledger access in the future, consider using a process-scoped config struct or lazy_static/once_cell instead of environment manipulation.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/relayburn-cli/src/commands/compare.rs` around lines 503 - 530, ArchiveOverride currently calls std::env::set_var/remove_var in activate and Drop which is unsafe in multi-threaded contexts; replace the global env mutation with a process-scoped static override (e.g. a OnceLock/OnceCell or lazy_static boolean/Option) that ArchiveOverride::activate sets and Drop clears, and update any code that checks RELAYBURN_ARCHIVE to read that static instead of the environment; ensure the symbols ArchiveOverride::activate and the Drop impl are changed to manage this dedicated static so you avoid calling std::env::set_var/remove_var.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@crates/relayburn-cli/src/commands/compare.rs`:
- Around line 967-979: Remove the truly unused variable owned_empty and its
suppression snippet; delete the line declaring let owned_empty = empty_cell();
the accompanying comment about suppressing warnings and the line let _ =
&owned_empty; — leave the closure cell_for, its use of empty_cell() in
unwrap_or_else, and the surrounding code intact so behavior is unchanged.
- Around line 503-530: ArchiveOverride currently calls
std::env::set_var/remove_var in activate and Drop which is unsafe in
multi-threaded contexts; replace the global env mutation with a process-scoped
static override (e.g. a OnceLock/OnceCell or lazy_static boolean/Option) that
ArchiveOverride::activate sets and Drop clears, and update any code that checks
RELAYBURN_ARCHIVE to read that static instead of the environment; ensure the
symbols ArchiveOverride::activate and the Drop impl are changed to manage this
dedicated static so you avoid calling std::env::set_var/remove_var.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: d6ecb9b0-afce-4720-a29e-2a6b62c09890
📒 Files selected for processing (1)
crates/relayburn-cli/src/commands/compare.rs
…→SQLite bridge in favor of D1's
There was a problem hiding this comment.
🧹 Nitpick comments (1)
CHANGELOG.md (1)
9-9: 💤 Low valueMinor documentation gap: missing
--workflow,--agent, and--providerfrom flag list.The changelog entry states it matches "the TS CLI flag set" but omits
--workflow,--agent, and--providerfrom the parenthetical list, even though these flags are defined inCompareArgs(and short-circuit with "not yet wired" errors per the PR objectives). Consider adding them with a note about their stub status, or clarifying that only the fully-wired flags are listed.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@CHANGELOG.md` at line 9, The changelog entry for relayburn-cli omits flags that exist in the CompareArgs struct; update the sentence listing CLI flags to include `--workflow`, `--agent`, and `--provider` (and optionally mark them as "stubbed/not yet wired" or note they short-circuit with "not yet wired" errors) so the parenthetical accurately reflects CompareArgs and the implemented flag surface; locate the mention in CHANGELOG.md and add those three flags to the parenthetical list with a brief clarifying note.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@CHANGELOG.md`:
- Line 9: The changelog entry for relayburn-cli omits flags that exist in the
CompareArgs struct; update the sentence listing CLI flags to include
`--workflow`, `--agent`, and `--provider` (and optionally mark them as
"stubbed/not yet wired" or note they short-circuit with "not yet wired" errors)
so the parenthetical accurately reflects CompareArgs and the implemented flag
surface; locate the mention in CHANGELOG.md and add those three flags to the
parenthetical list with a brief clarifying note.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 6992b9b4-5860-42f0-9a4d-8e46e8040521
📒 Files selected for processing (6)
CHANGELOG.mdcrates/relayburn-cli/src/cli.rscrates/relayburn-cli/src/main.rscrates/relayburn-cli/tests/smoke.rscrates/relayburn-sdk/src/lib.rstests/fixtures/cli-golden/invocations.json
🚧 Files skipped from review as they are similar to previous changes (2)
- crates/relayburn-cli/src/main.rs
- crates/relayburn-cli/tests/smoke.rs
Folds D1+D2+D3 (PRs #315/#312/#314) into the D4 state branch. Resolutions: - CHANGELOG.md: union all four bullets, alphabetized. - crates/relayburn-cli/src/cli.rs: keep all five typed Command variants (Summary/Hotspots/Overhead/Compare/State); merge ValueEnum import; keep D4's state-related types (StateArgs et al.) plus D2/D3's OverheadArgs / CompareArgs / OverheadKind. - crates/relayburn-cli/src/main.rs: auto-merged — five typed dispatch arms. - crates/relayburn-cli/tests/smoke.rs: drop REAL_COMMANDS allow-list and early-continue; remove "state" from UNIMPLEMENTED_SUBCOMMANDS, leaving ["run", "ingest", "mcp-server"]. - tests/fixtures/cli-golden/ledger/.gitignore: superset of patterns. - Cargo.lock: regenerated from origin/main. - tests/fixtures/cli-golden/snapshots/state-status*.stdout.txt: keep D4's 2.0 SQLite layout (deliberate divergence) but refresh row counts to reflect the bootstrapped fixture (golden.rs on main now replays ledger.jsonl into burn.sqlite before running, so the fixture has 18 rows instead of 0).
Summary
Wire
burn compare <model_a,model_b[,...]>as a presenter overrelayburn_sdk::analyze::compare's building blocks (build_compare_table+ the per-turn fidelity gate); the SDK keeps the heavy lifting so a futureburn__compareMCP tool wraps the same call.The CLI flag set mirrors the TS path 1:1 — positional comma-separated model list,
--include-partial/--fidelity/--since/--project/--session/--workflow/--agent/--provider/--min-sample/--csv/--no-archive/--json— and the wire shape (cells ordering, rounding rules, fidelity-summary key order) goes byte-for-byte with the TS-captured cli-golden snapshots.--workflow/--agent/--providershort-circuit with a typed "not yet wired" error since the Rust ledger query layer doesn't expose the enrichment / provider filter today; the rest of the surface is fully wired.The two compare entries in
tests/fixtures/cli-golden/invocations.jsonflip toenabled: truesoBURN_GOLDEN=1 cargo test --test goldenenforces parity. Because the on-disk fixture is a TS-style JSONL ledger and the Rust SDK reads SQLite, the golden runner now materializes a sibling SQLite home from the JSONL once per test run and points the binary at it; the import is loose-typed ontool_result_event/relationship/stampso a fixture extension surfaces a stderr breadcrumb instead of panicking the whole runner. Smoke test losescomparefrom its "stub exits 1" loop and gains an explicit "rejects missing models" assertion against the wired path.Test plan
cargo test --workspaceBURN_GOLDEN=1 cargo test --test golden -p relayburn-cli—compare+compare-jsonpass byte-for-byteburn compare --helpmatches TS flag setburn compare claude-sonnet-4-6,claude-haiku-4-5 --include-partialproduces TS-equivalent output🤖 Generated with Claude Code