relayburn-sdk: fix compare() turn-iteration parity with TS @relayburn/sdk@1.x (α-followup) by willwashburn · Pull Request #358 · AgentWorkforce/burn

willwashburn · 2026-05-07T06:33:45Z

Summary

α-followup #3 for the burn 2.0 conformance gate (epic #240). After #354 (shape conform), #356 (BigInt coercion in umbrella facade), and #357 (ledger.jsonl bootstrap), the napi-rs conformance suite at packages/sdk-node/test/conformance.test.js was 6/7 green — compare() was the lone holdout, returning analyzedTurns: 0 and an all-zero fidelity.summary against the cli-golden fixture while TS returned analyzedTurns: 7 and a populated coverage block.

Root cause

LedgerHandle::compare in crates/relayburn-sdk/src/query_verbs.rs was pre-filtering the queried turn list to only those whose model appeared in opts.models before computing analyzed_turns and the fidelity summary. The TS contract (packages/sdk/index.js::compare()) is the opposite: the models allow-list is honored inside buildCompareTable (which also pre-seeds requested-but-absent models as all-empty columns), and analyzedTurns = filteredTurns.length is taken AFTER the fidelity gate but BEFORE the model filter. Net effect on the cli-golden fixture: when callers asked to compare claude-sonnet-4-5 vs claude-opus-4-7 (neither present in the seven-turn fixture), every Rust-side turn was dropped at the early filter, collapsing the metadata to zeros. The deepStrictEqual diff on conformance was therefore on every counter under analyzedTurns / fidelity.summary.total / byClass / byGranularity / missingCoverage.

Fix

Drop the early requested_models retain from LedgerHandle::compare. Provider filtering and fidelity summarization now run on the full slice the ledger query returned, matching the TS sequence. Cell construction still honors the model allow-list via AnalyzeCompareOptions::models.
Remove the now-unused compare_model_id helper.
Inline-comment the path with the source-of-truth pointer back to packages/sdk/index.js::compare() so the next porter does not reintroduce the same shortcut.

Test deltas

compare_metadata_counts_requested_models_only was asserting the buggy behavior (analyzed_turns == 2 in a 3-turn fixture). Renamed to compare_metadata_counts_all_matched_turns_pre_models_filter and updated to the TS-parity expectations: analyzed_turns == 3 / summary.total == 3 even when requested models only match 2 of them. The unrequested model is still excluded from models / totals / cells.
New compare_reports_full_fidelity_summary_when_no_requested_model_appears regression covers the exact conformance scenario: requesting two models that are absent from the ledger still produces non-zero metadata describing the slice the comparison was drawn from.

Test plan

cargo build --workspace clean.
cargo test --workspace — 616 SDK tests + 2 SDK integration + 10 SDK-node binding tests pass, including the renamed and the new regression test.
Local napi conformance suite at packages/sdk-node/test/conformance.test.js now 7/7 green: summary, sessionCost, overhead, overheadTrim, hotspots, compare, ingest (verified by rebuilding pnpm run build:napi:debug and running with RELAYBURN_SDK_NAPI_BUILT=1).
pnpm -r run build && pnpm run test — 873 TS tests across all 8 published packages pass.

Refs #354 #356 #357 #355 #240.

…/sdk@1.x The Rust SDK's `compare()` was pre-filtering the turn list by `opts.models` *before* computing `analyzedTurns` and the fidelity summary. The TS contract in `packages/sdk/index.js::compare()` does the opposite: `analyzedTurns = filteredTurns.length` is taken AFTER the fidelity gate but BEFORE the model allow-list, which is honored inside `buildCompareTable` (which also pre-seeds requested-but-absent models as all-empty columns). Net effect on the conformance fixture (`tests/fixtures/cli-golden`, seven turns spanning sonnet-4-6 / haiku-4-5 / gpt-5-codex / sonnet-4-6): calling `compare({ models: ['claude-sonnet-4-5', 'claude-opus-4-7'], minFidelity: 'partial' })` — neither requested model is present in the fixture — yielded `analyzedTurns: 0` and an all-zero fidelity summary on the Rust side (every turn dropped at the early model filter), versus `analyzedTurns: 7` plus a populated `byClass` / `byGranularity` / `missingCoverage` block on TS. The conformance gate at `packages/sdk-node/test/conformance.test.js` reduced this to a `deepStrictEqual` failure on the only verb that hits this code path. Fix: drop the early `requested_models` `retain` from `LedgerHandle::compare`. Provider filtering and fidelity summarization now run on the full slice the ledger query returned, matching the TS path; cell construction still honors the model allow-list via `AnalyzeCompareOptions::models`. The unused `compare_model_id` helper is removed. Test deltas: - `compare_metadata_counts_requested_models_only` was asserting the buggy behavior. Renamed to `compare_metadata_counts_all_matched_turns_pre_models_filter` and updated to the TS-parity expectations: `analyzed_turns == 3` / `summary.total == 3` for a 3-turn fixture even when the requested models only match 2 of them. - New `compare_reports_full_fidelity_summary_when_no_requested_model_appears` regression covering the exact conformance scenario (request two models that are absent from the ledger; metadata still describes the slice). Refs #240 (rust-port epic). Follows #354/#356/#357 and unblocks #355 (α-followup conformance gate). Local conformance now 7/7 green (summary, sessionCost, overhead, overheadTrim, hotspots, compare, ingest).

coderabbitai · 2026-05-07T06:34:00Z

📝 Walkthrough

Walkthrough

LedgerHandle::compare defers model filtering from turn computation to cell table construction. Analyzed turns and fidelity metadata are now computed from all turns passing provider and fidelity filters, regardless of model allow-list. The unused compare_model_id helper is removed. Test coverage added for metadata accuracy when models are requested but absent from the ledger.

Changes

Model Filtering Timing in Compare

Layer / File(s)	Summary
Core Implementation `crates/relayburn-sdk/src/query_verbs.rs`	Provider filtering and fidelity gate applied to turns; analyzed_turns and fidelity.summary computed before model allow-list. Unused `compare_model_id` helper removed.
Test Coverage `crates/relayburn-sdk/src/query_verbs.rs`	Regression tests assert that analyzed_turns and fidelity.summary.total count all matched turns prior to model filtering, and metadata stays non-zero when requested models are absent with cells remaining empty.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

AgentWorkforce/burn#349: Both PRs refactor when "analyzed_turns" is computed—moving analyzed turn counting before model allow-list filtering in their respective code paths.
AgentWorkforce/burn#314: Main PR changes the SDK's compare filtering contract, which affects downstream consumer behavior relying on accurate metadata when models are filtered.

Poem

🐰 A rabbit hops through turns, all neat,
Counting them before the model-filter beat,
No premature snipping from the queue,
Metadata true, cells curated too! 🔍✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main change: fixing compare() parity between Rust and TS SDK implementations for turn-iteration behavior.
Description check	✅ Passed	The description comprehensively explains the root cause, the fix, test changes, and verification steps, all directly related to the changeset.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch compare-parity

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (1)

crates/relayburn-sdk/src/query_verbs.rs (1)

2200-2212: ⚡ Quick win

Assert placeholder cells per requested model explicitly.

This loop only validates whatever cells happen to exist. If build_compare_table regresses and stops emitting flat cells for one requested model, the test can still pass as long as r.models contains the model name. Please assert at least one cell for each requested model before checking no_data/turns == 0.

Suggested test tightening

         assert!(r.models.contains(&"claude-sonnet-4-5".to_string()));
         assert!(r.models.contains(&"claude-opus-4-7".to_string()));
         // `claude-sonnet-4-6` is in the ledger but not requested, so it
         // does NOT appear in the result rows even though it contributed
         // to `analyzed_turns`.
         assert!(!r.models.contains(&"claude-sonnet-4-6".to_string()));
-        // Every cell for the requested-but-absent models is no_data.
-        for cell in &r.cells {
+        let sonnet_cells: Vec<_> = r
+            .cells
+            .iter()
+            .filter(|cell| cell.model == "claude-sonnet-4-5")
+            .collect();
+        let opus_cells: Vec<_> = r
+            .cells
+            .iter()
+            .filter(|cell| cell.model == "claude-opus-4-7")
+            .collect();
+        assert!(!sonnet_cells.is_empty());
+        assert!(!opus_cells.is_empty());
+        // Every cell for the requested-but-absent models is no_data.
+        for cell in sonnet_cells.into_iter().chain(opus_cells) {
             assert!(cell.no_data, "expected no_data for cell {cell:?}");
             assert_eq!(cell.turns, 0);
         }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/relayburn-sdk/src/query_verbs.rs` around lines 2200 - 2212, The test
currently only checks all cells in r.cells for no_data/turns==0 but doesn’t
ensure there’s at least one cell per requested model; update the test around the
build_compare_table/asserts to explicitly verify for each requested model (e.g.,
"claude-sonnet-4-5" and "claude-opus-4-7") that r.cells contains at least one
cell whose model equals that requested model, then for those cells assert
cell.no_data and cell.turns == 0; keep the existing negative assertion for
"claude-sonnet-4-6" and the r.models contains checks but tighten the cell-level
checks to target cells by their model value instead of iterating all r.cells
blindly.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@crates/relayburn-sdk/src/query_verbs.rs`:
- Around line 2200-2212: The test currently only checks all cells in r.cells for
no_data/turns==0 but doesn’t ensure there’s at least one cell per requested
model; update the test around the build_compare_table/asserts to explicitly
verify for each requested model (e.g., "claude-sonnet-4-5" and
"claude-opus-4-7") that r.cells contains at least one cell whose model equals
that requested model, then for those cells assert cell.no_data and cell.turns ==
0; keep the existing negative assertion for "claude-sonnet-4-6" and the r.models
contains checks but tighten the cell-level checks to target cells by their model
value instead of iterating all r.cells blindly.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 15f4dc41-30f6-4ff9-bff5-c0abe3339ec3

📥 Commits

Reviewing files that changed from the base of the PR and between a0deb81 and 5209891.

📒 Files selected for processing (1)

crates/relayburn-sdk/src/query_verbs.rs

coderabbitai Bot reviewed May 7, 2026

View reviewed changes

willwashburn merged commit c37f539 into main May 7, 2026
8 checks passed

willwashburn deleted the compare-parity branch May 7, 2026 12:36

willwashburn added a commit that referenced this pull request May 7, 2026

Merge main into napi-conformance to pick up #358 (compare() parity fix)

366e7ee

willwashburn mentioned this pull request May 7, 2026

Add Windows targets to napi sdk-node + CLI binary distribution #359

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

relayburn-sdk: fix compare() turn-iteration parity with TS @relayburn/sdk@1.x (α-followup)#358

relayburn-sdk: fix compare() turn-iteration parity with TS @relayburn/sdk@1.x (α-followup)#358
willwashburn merged 1 commit intomainfrom
compare-parity

willwashburn commented May 7, 2026

Uh oh!

coderabbitai Bot commented May 7, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

willwashburn commented May 7, 2026

Summary

Root cause

Fix

Test deltas

Test plan

Uh oh!

coderabbitai Bot commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented May 7, 2026 •

edited

Loading