Skip to content

relayburn-sdk: fix compare() turn-iteration parity with TS @relayburn/sdk@1.x (α-followup)#358

Merged
willwashburn merged 1 commit intomainfrom
compare-parity
May 7, 2026
Merged

relayburn-sdk: fix compare() turn-iteration parity with TS @relayburn/sdk@1.x (α-followup)#358
willwashburn merged 1 commit intomainfrom
compare-parity

Conversation

@willwashburn
Copy link
Copy Markdown
Member

Summary

α-followup #3 for the burn 2.0 conformance gate (epic #240). After #354 (shape conform), #356 (BigInt coercion in umbrella facade), and #357 (ledger.jsonl bootstrap), the napi-rs conformance suite at packages/sdk-node/test/conformance.test.js was 6/7 green — compare() was the lone holdout, returning analyzedTurns: 0 and an all-zero fidelity.summary against the cli-golden fixture while TS returned analyzedTurns: 7 and a populated coverage block.

Root cause

LedgerHandle::compare in crates/relayburn-sdk/src/query_verbs.rs was pre-filtering the queried turn list to only those whose model appeared in opts.models before computing analyzed_turns and the fidelity summary. The TS contract (packages/sdk/index.js::compare()) is the opposite: the models allow-list is honored inside buildCompareTable (which also pre-seeds requested-but-absent models as all-empty columns), and analyzedTurns = filteredTurns.length is taken AFTER the fidelity gate but BEFORE the model filter. Net effect on the cli-golden fixture: when callers asked to compare claude-sonnet-4-5 vs claude-opus-4-7 (neither present in the seven-turn fixture), every Rust-side turn was dropped at the early filter, collapsing the metadata to zeros. The deepStrictEqual diff on conformance was therefore on every counter under analyzedTurns / fidelity.summary.total / byClass / byGranularity / missingCoverage.

Fix

  • Drop the early requested_models retain from LedgerHandle::compare. Provider filtering and fidelity summarization now run on the full slice the ledger query returned, matching the TS sequence. Cell construction still honors the model allow-list via AnalyzeCompareOptions::models.
  • Remove the now-unused compare_model_id helper.
  • Inline-comment the path with the source-of-truth pointer back to packages/sdk/index.js::compare() so the next porter does not reintroduce the same shortcut.

Test deltas

  • compare_metadata_counts_requested_models_only was asserting the buggy behavior (analyzed_turns == 2 in a 3-turn fixture). Renamed to compare_metadata_counts_all_matched_turns_pre_models_filter and updated to the TS-parity expectations: analyzed_turns == 3 / summary.total == 3 even when requested models only match 2 of them. The unrequested model is still excluded from models / totals / cells.
  • New compare_reports_full_fidelity_summary_when_no_requested_model_appears regression covers the exact conformance scenario: requesting two models that are absent from the ledger still produces non-zero metadata describing the slice the comparison was drawn from.

Test plan

  • cargo build --workspace clean.
  • cargo test --workspace — 616 SDK tests + 2 SDK integration + 10 SDK-node binding tests pass, including the renamed and the new regression test.
  • Local napi conformance suite at packages/sdk-node/test/conformance.test.js now 7/7 green: summary, sessionCost, overhead, overheadTrim, hotspots, compare, ingest (verified by rebuilding pnpm run build:napi:debug and running with RELAYBURN_SDK_NAPI_BUILT=1).
  • pnpm -r run build && pnpm run test — 873 TS tests across all 8 published packages pass.

Refs #354 #356 #357 #355 #240.

…/sdk@1.x

The Rust SDK's `compare()` was pre-filtering the turn list by `opts.models`
*before* computing `analyzedTurns` and the fidelity summary. The TS contract
in `packages/sdk/index.js::compare()` does the opposite: `analyzedTurns =
filteredTurns.length` is taken AFTER the fidelity gate but BEFORE the
model allow-list, which is honored inside `buildCompareTable` (which also
pre-seeds requested-but-absent models as all-empty columns).

Net effect on the conformance fixture (`tests/fixtures/cli-golden`,
seven turns spanning sonnet-4-6 / haiku-4-5 / gpt-5-codex / sonnet-4-6):
calling `compare({ models: ['claude-sonnet-4-5', 'claude-opus-4-7'],
minFidelity: 'partial' })` — neither requested model is present in the
fixture — yielded `analyzedTurns: 0` and an all-zero fidelity summary on
the Rust side (every turn dropped at the early model filter), versus
`analyzedTurns: 7` plus a populated `byClass` / `byGranularity` /
`missingCoverage` block on TS. The conformance gate at
`packages/sdk-node/test/conformance.test.js` reduced this to a
`deepStrictEqual` failure on the only verb that hits this code path.

Fix: drop the early `requested_models` `retain` from `LedgerHandle::compare`.
Provider filtering and fidelity summarization now run on the full slice the
ledger query returned, matching the TS path; cell construction still
honors the model allow-list via `AnalyzeCompareOptions::models`. The
unused `compare_model_id` helper is removed.

Test deltas:
 - `compare_metadata_counts_requested_models_only` was asserting the buggy
   behavior. Renamed to `compare_metadata_counts_all_matched_turns_pre_models_filter`
   and updated to the TS-parity expectations: `analyzed_turns == 3` /
   `summary.total == 3` for a 3-turn fixture even when the requested models
   only match 2 of them.
 - New `compare_reports_full_fidelity_summary_when_no_requested_model_appears`
   regression covering the exact conformance scenario (request two models
   that are absent from the ledger; metadata still describes the slice).

Refs #240 (rust-port epic). Follows #354/#356/#357 and unblocks #355
(α-followup conformance gate). Local conformance now 7/7 green
(summary, sessionCost, overhead, overheadTrim, hotspots, compare, ingest).
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 7, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

LedgerHandle::compare defers model filtering from turn computation to cell table construction. Analyzed turns and fidelity metadata are now computed from all turns passing provider and fidelity filters, regardless of model allow-list. The unused compare_model_id helper is removed. Test coverage added for metadata accuracy when models are requested but absent from the ledger.

Changes

Model Filtering Timing in Compare

Layer / File(s) Summary
Core Implementation
crates/relayburn-sdk/src/query_verbs.rs
Provider filtering and fidelity gate applied to turns; analyzed_turns and fidelity.summary computed before model allow-list. Unused compare_model_id helper removed.
Test Coverage
crates/relayburn-sdk/src/query_verbs.rs
Regression tests assert that analyzed_turns and fidelity.summary.total count all matched turns prior to model filtering, and metadata stays non-zero when requested models are absent with cells remaining empty.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • AgentWorkforce/burn#349: Both PRs refactor when "analyzed_turns" is computed—moving analyzed turn counting before model allow-list filtering in their respective code paths.
  • AgentWorkforce/burn#314: Main PR changes the SDK's compare filtering contract, which affects downstream consumer behavior relying on accurate metadata when models are filtered.

Poem

🐰 A rabbit hops through turns, all neat,
Counting them before the model-filter beat,
No premature snipping from the queue,
Metadata true, cells curated too! 🔍✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: fixing compare() parity between Rust and TS SDK implementations for turn-iteration behavior.
Description check ✅ Passed The description comprehensively explains the root cause, the fix, test changes, and verification steps, all directly related to the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch compare-parity

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
crates/relayburn-sdk/src/query_verbs.rs (1)

2200-2212: ⚡ Quick win

Assert placeholder cells per requested model explicitly.

This loop only validates whatever cells happen to exist. If build_compare_table regresses and stops emitting flat cells for one requested model, the test can still pass as long as r.models contains the model name. Please assert at least one cell for each requested model before checking no_data/turns == 0.

Suggested test tightening
         assert!(r.models.contains(&"claude-sonnet-4-5".to_string()));
         assert!(r.models.contains(&"claude-opus-4-7".to_string()));
         // `claude-sonnet-4-6` is in the ledger but not requested, so it
         // does NOT appear in the result rows even though it contributed
         // to `analyzed_turns`.
         assert!(!r.models.contains(&"claude-sonnet-4-6".to_string()));
-        // Every cell for the requested-but-absent models is no_data.
-        for cell in &r.cells {
+        let sonnet_cells: Vec<_> = r
+            .cells
+            .iter()
+            .filter(|cell| cell.model == "claude-sonnet-4-5")
+            .collect();
+        let opus_cells: Vec<_> = r
+            .cells
+            .iter()
+            .filter(|cell| cell.model == "claude-opus-4-7")
+            .collect();
+        assert!(!sonnet_cells.is_empty());
+        assert!(!opus_cells.is_empty());
+        // Every cell for the requested-but-absent models is no_data.
+        for cell in sonnet_cells.into_iter().chain(opus_cells) {
             assert!(cell.no_data, "expected no_data for cell {cell:?}");
             assert_eq!(cell.turns, 0);
         }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/relayburn-sdk/src/query_verbs.rs` around lines 2200 - 2212, The test
currently only checks all cells in r.cells for no_data/turns==0 but doesn’t
ensure there’s at least one cell per requested model; update the test around the
build_compare_table/asserts to explicitly verify for each requested model (e.g.,
"claude-sonnet-4-5" and "claude-opus-4-7") that r.cells contains at least one
cell whose model equals that requested model, then for those cells assert
cell.no_data and cell.turns == 0; keep the existing negative assertion for
"claude-sonnet-4-6" and the r.models contains checks but tighten the cell-level
checks to target cells by their model value instead of iterating all r.cells
blindly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@crates/relayburn-sdk/src/query_verbs.rs`:
- Around line 2200-2212: The test currently only checks all cells in r.cells for
no_data/turns==0 but doesn’t ensure there’s at least one cell per requested
model; update the test around the build_compare_table/asserts to explicitly
verify for each requested model (e.g., "claude-sonnet-4-5" and
"claude-opus-4-7") that r.cells contains at least one cell whose model equals
that requested model, then for those cells assert cell.no_data and cell.turns ==
0; keep the existing negative assertion for "claude-sonnet-4-6" and the r.models
contains checks but tighten the cell-level checks to target cells by their model
value instead of iterating all r.cells blindly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 15f4dc41-30f6-4ff9-bff5-c0abe3339ec3

📥 Commits

Reviewing files that changed from the base of the PR and between a0deb81 and 5209891.

📒 Files selected for processing (1)
  • crates/relayburn-sdk/src/query_verbs.rs

@willwashburn willwashburn merged commit c37f539 into main May 7, 2026
8 checks passed
@willwashburn willwashburn deleted the compare-parity branch May 7, 2026 12:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant