relayburn-sdk: fix compare() turn-iteration parity with TS @relayburn/sdk@1.x (α-followup)#358
willwashburn merged 1 commit into main
The Rust SDK's `compare()` was pre-filtering the turn list by `opts.models`
*before* computing `analyzedTurns` and the fidelity summary. The TS contract
in `packages/sdk/index.js::compare()` does the opposite: `analyzedTurns =
filteredTurns.length` is taken AFTER the fidelity gate but BEFORE the
model allow-list, which is honored inside `buildCompareTable` (which also
pre-seeds requested-but-absent models as all-empty columns).
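The pre-seeding behavior described above can be sketched roughly as follows. This is a minimal illustration, not the actual `buildCompareTable` / `build_compare_table` code: `Turn`, `Column`, and `build_columns` are hypothetical names, and treating an empty allow-list as "no filter" is an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for one ledger turn.
#[derive(Debug, Clone)]
pub struct Turn {
    pub model: String,
    pub tokens: u64,
}

/// Hypothetical per-model column of a compare table.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Column {
    pub turns: u64,
    pub tokens: u64,
    pub no_data: bool,
}

/// Build one column per model. Requested models are seeded up front, so a
/// requested model with no matching turns still yields an empty column.
/// Assumption: an empty `requested` slice means "no allow-list".
pub fn build_columns(turns: &[Turn], requested: &[&str]) -> BTreeMap<String, Column> {
    // Seed every requested model as an all-empty, no_data column.
    let mut columns: BTreeMap<String, Column> = requested
        .iter()
        .map(|m| (m.to_string(), Column { no_data: true, ..Column::default() }))
        .collect();

    for turn in turns {
        // The allow-list is applied here, at table-building time.
        if !requested.is_empty() && !requested.contains(&turn.model.as_str()) {
            continue;
        }
        let col = columns.entry(turn.model.clone()).or_default();
        col.turns += 1;
        col.tokens += turn.tokens;
        col.no_data = false;
    }
    columns
}
```

With this shape, requesting two models that never appear in `turns` still returns two columns, both with `no_data == true` and `turns == 0`.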
Net effect on the conformance fixture (`tests/fixtures/cli-golden`,
seven turns spanning sonnet-4-6 / haiku-4-5 / gpt-5-codex / sonnet-4-6):
calling `compare({ models: ['claude-sonnet-4-5', 'claude-opus-4-7'],
minFidelity: 'partial' })` — neither requested model is present in the
fixture — yielded `analyzedTurns: 0` and an all-zero fidelity summary on
the Rust side (every turn dropped at the early model filter), versus
`analyzedTurns: 7` plus a populated `byClass` / `byGranularity` /
`missingCoverage` block on TS. The conformance gate at
`packages/sdk-node/test/conformance.test.js` reduced this to a
`deepStrictEqual` failure on the only verb that hits this code path.
Fix: drop the early `requested_models` `retain` from `LedgerHandle::compare`.
Provider filtering and fidelity summarization now run on the full slice the
ledger query returned, matching the TS path; cell construction still
honors the model allow-list via `AnalyzeCompareOptions::models`. The
unused `compare_model_id` helper is removed.
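As a rough sketch of the corrected ordering (fidelity gate, then metadata count, then model allow-list at cell-building time), under simplified assumptions: `Fidelity`, `Turn`, `CompareCounts`, and `compare_counts` are hypothetical names invented for this illustration and do not mirror the real `LedgerHandle::compare` signature or the SDK's fidelity classes.

```rust
/// Hypothetical fidelity classes, ordered from least to most complete.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Fidelity {
    Minimal,
    Partial,
    Full,
}

/// Hypothetical, simplified stand-in for one ledger turn.
#[derive(Debug, Clone)]
pub struct Turn {
    pub model: String,
    pub fidelity: Fidelity,
}

/// Hypothetical summary of the two counts that diverged before the fix.
#[derive(Debug, PartialEq, Eq)]
pub struct CompareCounts {
    /// Turns that passed the fidelity gate (metadata describes this slice).
    pub analyzed_turns: usize,
    /// Turns that also match the model allow-list (these feed the cells).
    pub cell_turns: usize,
}

pub fn compare_counts(turns: &[Turn], min_fidelity: Fidelity, requested: &[&str]) -> CompareCounts {
    // 1. Fidelity gate.
    let gated: Vec<&Turn> = turns
        .iter()
        .filter(|t| t.fidelity >= min_fidelity)
        .collect();

    // 2. Metadata is taken AFTER the gate but BEFORE the model allow-list.
    let analyzed_turns = gated.len();

    // 3. The allow-list only narrows what reaches the cells.
    //    Assumption: an empty allow-list means "all models".
    let cell_turns = gated
        .iter()
        .filter(|t| requested.is_empty() || requested.contains(&t.model.as_str()))
        .count();

    CompareCounts { analyzed_turns, cell_turns }
}
```

The pre-fix behavior corresponds to running step 3 before step 2, which is why requesting only absent models collapsed `analyzed_turns` to zero.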
Test deltas:
- `compare_metadata_counts_requested_models_only` was asserting the buggy
behavior. Renamed to `compare_metadata_counts_all_matched_turns_pre_models_filter`
and updated to the TS-parity expectations: `analyzed_turns == 3` /
`summary.total == 3` for a 3-turn fixture even when the requested models
only match 2 of them.
- New `compare_reports_full_fidelity_summary_when_no_requested_model_appears`
regression covering the exact conformance scenario (request two models
that are absent from the ledger; metadata still describes the slice).
Refs #240 (rust-port epic). Follows #354/#356/#357 and unblocks #355
(α-followup conformance gate). Local conformance now 7/7 green
(summary, sessionCost, overhead, overheadTrim, hotspots, compare, ingest).
🧹 Nitpick comments (1)
crates/relayburn-sdk/src/query_verbs.rs (1)
2200-2212: ⚡ Quick win: Assert placeholder cells per requested model explicitly.
This loop only validates whatever cells happen to exist. If `build_compare_table` regresses and stops emitting flat cells for one requested model, the test can still pass as long as `r.models` contains the model name. Please assert at least one cell for each requested model before checking `no_data` / `turns == 0`.
Suggested test tightening
assert!(r.models.contains(&"claude-sonnet-4-5".to_string())); assert!(r.models.contains(&"claude-opus-4-7".to_string())); // `claude-sonnet-4-6` is in the ledger but not requested, so it // does NOT appear in the result rows even though it contributed // to `analyzed_turns`. assert!(!r.models.contains(&"claude-sonnet-4-6".to_string())); - // Every cell for the requested-but-absent models is no_data. - for cell in &r.cells { + let sonnet_cells: Vec<_> = r + .cells + .iter() + .filter(|cell| cell.model == "claude-sonnet-4-5") + .collect(); + let opus_cells: Vec<_> = r + .cells + .iter() + .filter(|cell| cell.model == "claude-opus-4-7") + .collect(); + assert!(!sonnet_cells.is_empty()); + assert!(!opus_cells.is_empty()); + // Every cell for the requested-but-absent models is no_data. + for cell in sonnet_cells.into_iter().chain(opus_cells) { assert!(cell.no_data, "expected no_data for cell {cell:?}"); assert_eq!(cell.turns, 0); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/relayburn-sdk/src/query_verbs.rs` around lines 2200 - 2212, The test currently only checks all cells in r.cells for no_data/turns==0 but doesn’t ensure there’s at least one cell per requested model; update the test around the build_compare_table/asserts to explicitly verify for each requested model (e.g., "claude-sonnet-4-5" and "claude-opus-4-7") that r.cells contains at least one cell whose model equals that requested model, then for those cells assert cell.no_data and cell.turns == 0; keep the existing negative assertion for "claude-sonnet-4-6" and the r.models contains checks but tighten the cell-level checks to target cells by their model value instead of iterating all r.cells blindly.
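One way the tightened check could be generalized is a small helper that loops over the requested models instead of naming each one. This is only a sketch: `Cell` here is a hypothetical, trimmed-down struct carrying just the three fields the review comment mentions, and `check_placeholder_cells` is an invented name, not something in `query_verbs.rs`.

```rust
/// Hypothetical, trimmed-down compare cell with only the fields under test.
#[derive(Debug)]
pub struct Cell {
    pub model: String,
    pub turns: u64,
    pub no_data: bool,
}

/// Verify that every requested model has at least one cell, and that each of
/// its cells is an empty placeholder (`no_data` set, zero turns).
pub fn check_placeholder_cells(cells: &[Cell], requested: &[&str]) -> Result<(), String> {
    for model in requested {
        let mut seen = 0usize;
        for cell in cells.iter().filter(|c| c.model == *model) {
            seen += 1;
            if !cell.no_data || cell.turns != 0 {
                return Err(format!("expected empty placeholder, got {cell:?}"));
            }
        }
        // Guards against the table silently dropping a requested model.
        if seen == 0 {
            return Err(format!("no cell emitted for requested model {model}"));
        }
    }
    Ok(())
}
```

In a test this would be used as `check_placeholder_cells(&cells, &requested).unwrap()`, so a missing or populated cell fails with a message naming the offending model.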
Summary
α-followup #3 for the burn 2.0 conformance gate (epic #240). After #354 (shape conform), #356 (BigInt coercion in umbrella facade), and #357 (ledger.jsonl bootstrap), the napi-rs conformance suite at `packages/sdk-node/test/conformance.test.js` was 6/7 green. `compare()` was the lone holdout, returning `analyzedTurns: 0` and an all-zero `fidelity.summary` against the cli-golden fixture while TS returned `analyzedTurns: 7` and a populated coverage block.
Root cause
`LedgerHandle::compare` in `crates/relayburn-sdk/src/query_verbs.rs` was pre-filtering the queried turn list to only those whose model appeared in `opts.models` before computing `analyzed_turns` and the fidelity summary. The TS contract (`packages/sdk/index.js::compare()`) is the opposite: the `models` allow-list is honored inside `buildCompareTable` (which also pre-seeds requested-but-absent models as all-empty columns), and `analyzedTurns = filteredTurns.length` is taken AFTER the fidelity gate but BEFORE the model filter. Net effect on the cli-golden fixture: when callers asked to compare `claude-sonnet-4-5` vs `claude-opus-4-7` (neither present in the seven-turn fixture), every Rust-side turn was dropped at the early filter, collapsing the metadata to zeros. The `deepStrictEqual` diff on conformance was therefore on every counter under `analyzedTurns` / `fidelity.summary.total` / `byClass` / `byGranularity` / `missingCoverage`.
Fix
- Drop the early `requested_models` `retain` from `LedgerHandle::compare`. Provider filtering and fidelity summarization now run on the full slice the ledger query returned, matching the TS sequence. Cell construction still honors the model allow-list via `AnalyzeCompareOptions::models`.
- Remove the unused `compare_model_id` helper.
- Reference `packages/sdk/index.js::compare()` in a code comment so the next porter does not reintroduce the same shortcut.
Test deltas
- `compare_metadata_counts_requested_models_only` was asserting the buggy behavior (`analyzed_turns == 2` in a 3-turn fixture). Renamed to `compare_metadata_counts_all_matched_turns_pre_models_filter` and updated to the TS-parity expectations: `analyzed_turns == 3` / `summary.total == 3` even when requested models only match 2 of them. The unrequested model is still excluded from `models` / `totals` / cells.
- New `compare_reports_full_fidelity_summary_when_no_requested_model_appears` regression covers the exact conformance scenario: requesting two models that are absent from the ledger still produces non-zero metadata describing the slice the comparison was drawn from.
Test plan
- `cargo build --workspace` clean.
- `cargo test --workspace`: 616 SDK tests + 2 SDK integration + 10 SDK-node binding tests pass, including the renamed and the new regression test.
- `packages/sdk-node/test/conformance.test.js` now 7/7 green: `summary`, `sessionCost`, `overhead`, `overheadTrim`, `hotspots`, `compare`, `ingest` (verified by rebuilding with `pnpm run build:napi:debug` and running with `RELAYBURN_SDK_NAPI_BUILT=1`).
- `pnpm -r run build && pnpm run test`: 873 TS tests across all 8 published packages pass.
Refs #354 #356 #357 #355 #240.