reader: dedupe assistant rows by requestId into Inference unit (#434) by willwashburn · Pull Request #448 · AgentWorkforce/burn

willwashburn · 2026-05-25T17:25:43Z

Closes #434.

Summary

Collapses multi-row Claude assistant messages (text + tool_use + reasoning sharing one requestId) into a single Inference unit.

New Inference aggregate in crates/relayburn-sdk/src/reader/inference.rs: Inference, InferenceKind (Reasoning | Message | ToolUse | Mixed), ToolUseRef, RequestIdLookup, TurnKey, InferenceKeySource, build_inferences.
Reader pass in claude.rs captures request_id on WorkingRecord, threads a request_id_lookup through ParseResult / ParseIncrementalResult.
Schema v4 (chained on the v3 from #444, "hotspots: rank tools by raw output bytes alongside tokens (#436)"): new inferences derived table via a CREATE TABLE IF NOT EXISTS migration in migrate_burn_schema.
SDK verb: LedgerHandle::inferences(opts) + free function. Materialization happens in apply_parsed_extras via build_inferences(turns, request_id_lookup) and Ledger::append_inferences.
state status surfaces inference count via new BurnDbRowCounts.inferences field.

Bug found and fixed along the way

The existing Claude parser's usage-merge for multi-block messages updated usage_coverage from later carrier rows but NOT usage itself. If the carrier row arrived second or later, its tokens were silently dropped. Fixed: WorkingRecord.usage now overwrites from any row carrying the usage block. The existing multi_block_turn_keeps_usage_once test continues to pass because that fixture puts usage on row 1.
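
A minimal sketch of the corrected merge, assuming simplified stand-in types: the `Usage`, `Row`, and `WorkingRecord` shapes below, the `has_usage` flag standing in for `usage_coverage`, and the `merge_row` / `collapse` helpers are all hypothetical and are not the SDK's real definitions. It only illustrates that the carrier row's usage is adopted whatever position it arrives in.

```rust
/// Hypothetical, simplified token counts (not the SDK's real `Usage`).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct Usage {
    pub input: u64,
    pub output: u64,
}

/// One JSONL row of a multi-block assistant message.
/// Only the carrier row has a usage block.
pub struct Row {
    pub usage: Option<Usage>,
}

/// Hypothetical accumulator for all rows sharing one message id.
#[derive(Debug, Default)]
pub struct WorkingRecord {
    pub usage: Usage,
    /// Stand-in for `usage_coverage`: true once any row supplied usage.
    pub has_usage: bool,
}

impl WorkingRecord {
    /// Fold one row into the record. Any row carrying a usage block
    /// overwrites `usage`, regardless of arrival order.
    pub fn merge_row(&mut self, row: &Row) {
        if let Some(u) = row.usage {
            // The pre-fix behaviour updated only the coverage flag here.
            self.usage = u;
            self.has_usage = true;
        }
    }
}

/// Collapse all rows of one message into a single record.
pub fn collapse(rows: &[Row]) -> WorkingRecord {
    let mut rec = WorkingRecord::default();
    for row in rows {
        rec.merge_row(row);
    }
    rec
}
```

With the carrier row in second position, `collapse` still returns the carrier's token counts, which is the case the row-1 fixture could not exercise.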

Restraint on `turn_count` semantics

Kept Summary.turn_count as-is. Burn's reader already collapses to one TurnRecord per message.id, and Claude's requestId ↔ message.id is 1:1, so per-API-call counts are already correct. Switching turn_count to inference-keyed would be a no-op for Claude and a breaking change for Codex/opencode (no requestId). The new Inference unit is additive.

Downstream callers audited

analyze/hotspots.rs — per-tool attribution keys on tool_use_id (no change needed).
analyze/tool_output_bloat.rs — uses model_by_message_id, not row counts (no change).
Summary.turn_count — already correct per above.

Merge note

This PR bumps the schema from v3 to v4 (assuming #444 is in main, which it is). #435 (subagent pairing) also uses v4 in its branch, so whichever PR lands second must become v5 with a chained ALTER TABLE. The migration shape is idempotent, so the reconcile is mechanical. A doc comment in schema.rs calls out the renumber path.

Test plan

cargo build --workspace clean
cargo test --workspace — 813 passed (SDK lib alone: 693), 0 failed
BURN_GOLDEN=1 cargo test --test golden — 5/5
Golden updates: state-status.stdout.txt + state-status-json.stdout.txt — schema 3→4, new inferences: 0 row (cli-golden fixture is JSONL bootstrap only; populated ledgers will see real counts)
New fixture tests cover multi-row inference collapse, usage-merge fix, fallback paths

Out of scope

Switching turn_count semantics (see "Restraint" above).
#[non_exhaustive] on new types.
Codex/opencode requestId equivalent — they don't have one; documented.

Generated by Claude Code

coderabbitai · 2026-05-25T17:25:50Z

Warning

Review limit reached

@willwashburn, we couldn't start this review because you've used your available PR reviews for now.

Your plan includes 1 review of capacity. Refill in 14 minutes and 32 seconds.

Your organization has run out of usage credits. Purchase more in the billing tab.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 41e850a0-a4f2-4847-bfee-4910675d1ab6

📥 Commits

Reviewing files that changed from the base of the PR and between 892ac11 and 038d514.

📒 Files selected for processing (16)

CHANGELOG.md
crates/relayburn-cli/src/commands/state.rs
crates/relayburn-sdk/src/ingest/ingest.rs
crates/relayburn-sdk/src/ledger.rs
crates/relayburn-sdk/src/ledger/db.rs
crates/relayburn-sdk/src/ledger/reader.rs
crates/relayburn-sdk/src/ledger/schema.rs
crates/relayburn-sdk/src/ledger/tests.rs
crates/relayburn-sdk/src/ledger/writer.rs
crates/relayburn-sdk/src/lib.rs
crates/relayburn-sdk/src/query_verbs.rs
crates/relayburn-sdk/src/reader.rs
crates/relayburn-sdk/src/reader/claude.rs
crates/relayburn-sdk/src/reader/inference.rs
tests/fixtures/cli-golden/snapshots/state-status-json.stdout.txt
tests/fixtures/cli-golden/snapshots/state-status.stdout.txt


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c73976c12f


chatgpt-codex-connector · 2026-05-25T17:29:53Z

+    let has_tools = !tool_uses.is_empty();
+    let has_reasoning = turn.usage.reasoning > 0;
+    let has_text = !has_tools; // proxy: a turn with no tool_uses must have produced text
+    match (has_reasoning, has_tools, has_text) {
+        (true, false, false) => InferenceKind::Reasoning,


Classify reasoning-only inferences as Reasoning

The new kind classifier makes InferenceKind::Reasoning unreachable because has_text is hard-coded as !has_tools, so any inference without tool calls is always treated as having text. As a result, calls with reasoning tokens and no tools are labeled mixed instead of reasoning, which makes the new kind field inaccurate for downstream analytics that rely on reasoning-only detection.


Good catch — InferenceKind::Reasoning was indeed unreachable. Fixed in ff7bfaa by collapsing the classifier to a 2-tuple match on (has_reasoning, has_tools) and added a test that constructs a reasoning-only turn (reasoning > 0, no tool_uses, output == 0) and asserts it classifies as Reasoning. The trade-off — that "reasoning + text" still lumps under Reasoning since TurnRecord doesn't surface per-block content kinds — is documented inline at the call site.

Generated by Claude Code
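
For illustration, the 2-tuple classifier described in the reply might look like the sketch below. The `classify` function and its parameters are hypothetical. Only the reasoning-only arm and the documented lumping (reasoning + text under `Reasoning`, tools + text under `ToolUse`) come from this thread; the `(true, true) => Mixed` and `(false, false) => Message` arms are assumptions about how the remaining variants are used.

```rust
/// The four kinds named in the PR summary.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InferenceKind {
    Reasoning,
    Message,
    ToolUse,
    Mixed,
}

/// Hypothetical classifier over the two signals a turn exposes.
/// Text presence is deliberately not an input, because per-block
/// content kinds are not surfaced on the turn record.
pub fn classify(reasoning_tokens: u64, tool_use_count: usize) -> InferenceKind {
    let has_reasoning = reasoning_tokens > 0;
    let has_tools = tool_use_count > 0;
    match (has_reasoning, has_tools) {
        // Reasoning with no tools; also covers reasoning + text.
        (true, false) => InferenceKind::Reasoning,
        // Tools with no reasoning; also covers tools + text.
        (false, true) => InferenceKind::ToolUse,
        // Assumption: both signals present maps to Mixed.
        (true, true) => InferenceKind::Mixed,
        // Assumption: neither signal present is a plain text message.
        (false, false) => InferenceKind::Message,
    }
}
```

Because the match is over two booleans, all four arms are reachable by construction, which is the property the original 3-tuple lost when `has_text` was derived from `has_tools`.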

chatgpt-codex-connector · 2026-05-25T17:29:53Z

+    if let Some(source) = q.source {
+        clauses.push("source = ?");
+        bound.push(source.wire_str().to_string());
+    }


Apply project filter when querying inferences

InferencesOptions accepts project, and LedgerHandle::inferences builds a Query with that field, but query_inferences never applies q.project in SQL. This means callers requesting project-scoped results will receive cross-project inferences, which breaks expected filtering behavior for multi-project ledgers.


Fixed in ff7bfaa. The inferences table doesn't carry project / project_key columns directly — those live on turns — so the filter is applied via a subquery: session_id IN (SELECT DISTINCT session_id FROM turns WHERE project = ? OR project_key = ?). Mirrors the predicate shape query_turns already uses (matches against either column). Added a regression test that ingests two sessions with distinct projects and asserts the project-scoped query returns only the matching one.

Generated by Claude Code
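
A rough sketch of how the project filter could be appended to the clause list from the diff above. The `Query` struct, the `build_inferences_sql` name, and the `SELECT * FROM inferences` prefix are hypothetical; the subquery text is the one quoted in the reply. The project value is bound twice because the predicate matches either column.

```rust
/// Hypothetical, simplified query options (not the SDK's real `Query`).
pub struct Query {
    pub source: Option<String>,
    pub project: Option<String>,
}

/// Build the SQL text and the positional parameters for an inferences query.
pub fn build_inferences_sql(q: &Query) -> (String, Vec<String>) {
    let mut clauses: Vec<&str> = Vec::new();
    let mut bound: Vec<String> = Vec::new();

    if let Some(source) = &q.source {
        clauses.push("source = ?");
        bound.push(source.clone());
    }

    if let Some(project) = &q.project {
        // The inferences table has no project columns, so scope by the
        // sessions whose turns match either project column.
        clauses.push(
            "session_id IN (SELECT DISTINCT session_id FROM turns \
             WHERE project = ? OR project_key = ?)",
        );
        bound.push(project.clone()); // binds `project = ?`
        bound.push(project.clone()); // binds `project_key = ?`
    }

    let mut sql = String::from("SELECT * FROM inferences");
    if !clauses.is_empty() {
        sql.push_str(" WHERE ");
        sql.push_str(&clauses.join(" AND "));
    }
    (sql, bound)
}
```

Keeping the placeholders and the bound values in lockstep inside the same `if let` block makes it harder to add a filter field that is accepted but never applied.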

cubic-dev-ai

2 issues found across 16 files


…t filter (#448)

The classifier hard-coded `has_text = !has_tools`, which made the `(true, false, false)` arm (reasoning-only turns) unreachable; they were silently lumped into `Mixed`. Switch to a 2-tuple match on `(has_reasoning, has_tools)` and document the intentional coarseness (reasoning + text lumps with `Reasoning`, tools + text with `ToolUse`) so the trade-off is visible at the call site.

`query_inferences` accepted `Query::project` but never applied it to SQL, so project-scoped callers received cross-project rows. The `inferences` table doesn't carry project columns; filter via a subquery against `turns` (`session_id IN (... WHERE project = ? OR project_key = ?)`) which mirrors the predicate shape `query_turns` already uses.

Adds two tests: a reasoning-only turn classifies as `Reasoning`, and a two-project ledger returns only the requested project's inferences.

https://claude.ai/code/session_01QEpNZbWEYNwxzqQjTN5LCY

Introduces an `Inference` aggregate keyed by `(source, session_id, request_id)` so callers asking "how many API calls" stop conflating Claude's multi-content-block assistant rows. One Claude API call lands as multiple JSONL rows sharing a `requestId`; the existing `TurnRecord` collapses by `message.id` (1:1 with `requestId` today), but the inference key gives a durable per-API-call identity that survives future harness changes and exposes a stable provenance field (`request-id` / `message-id` / `row-synthetic`).

Schema bumps to v4: new `inferences` table keyed by the composite triple, populated by the ingest pipeline from the parser's new `request_id_lookup`. Chained migration on top of v2 (#437 stop_reason) and v3 (#436 output_bytes); `burn state rebuild` repopulates on legacy ledgers. (If #444 hasn't merged by integration time, this should renumber to v3; the version constant + migration step + tests sit together for an easy rebase.)

Also fixes a latent bug in the Claude parser: usage merging only updated `usage_coverage` on subsequent rows of the same `message_id`, not `usage` itself. If the carrier row wasn't the first row for that message id, its tokens were dropped. The merge now adopts the carrier's `usage` values regardless of arrival order.

SDK verb: `LedgerHandle::inferences(InferencesOptions) -> Vec<Inference>` + free-function `inferences()`. Codex / opencode (no upstream `requestId`) fall back to `message_id` via the `InferenceKeySource::MessageId` provenance.

Golden updates: `state-status.stdout.txt` and `state-status-json.stdout.txt` gain an `inferences: 0` row and the `schemaVersion` bumps 3 → 4. The fixture is bootstrap-only (no ingest), so the count stays 0 until the next `burn ingest` or `burn state rebuild`.

https://claude.ai/code/session_01QEpNZbWEYNwxzqQjTN5LCY
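
The key-with-provenance idea from this commit message can be sketched roughly as follows. Everything here is a simplified assumption: the `Row` shape, the `inference_key` and `count_inferences` helpers, the `row-{index}` synthetic key format, and a `wire_str` method on `InferenceKeySource` are hypothetical. Only the three provenance strings and the `(source, session_id, request_id)` triple come from the text above.

```rust
use std::collections::BTreeSet;

/// Provenance of the id used as the third element of the inference key.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InferenceKeySource {
    RequestId,
    MessageId,
    RowSynthetic,
}

impl InferenceKeySource {
    /// Kebab-case wire form, as listed in the commit message.
    pub fn wire_str(self) -> &'static str {
        match self {
            InferenceKeySource::RequestId => "request-id",
            InferenceKeySource::MessageId => "message-id",
            InferenceKeySource::RowSynthetic => "row-synthetic",
        }
    }
}

/// Hypothetical parsed assistant row.
pub struct Row {
    pub source: String,
    pub session_id: String,
    pub request_id: Option<String>,
    pub message_id: Option<String>,
}

/// Pick the best available id: requestId, then message id, then a
/// synthetic per-row id (format assumed here) so every row gets a key.
pub fn inference_key(row: &Row, row_index: usize) -> (String, InferenceKeySource) {
    if let Some(r) = &row.request_id {
        (r.clone(), InferenceKeySource::RequestId)
    } else if let Some(m) = &row.message_id {
        (m.clone(), InferenceKeySource::MessageId)
    } else {
        (format!("row-{row_index}"), InferenceKeySource::RowSynthetic)
    }
}

/// Count distinct inferences: rows sharing the composite triple collapse.
pub fn count_inferences(rows: &[Row]) -> usize {
    let mut seen: BTreeSet<(String, String, String)> = BTreeSet::new();
    for (i, row) in rows.iter().enumerate() {
        let (key, _provenance) = inference_key(row, i);
        seen.insert((row.source.clone(), row.session_id.clone(), key));
    }
    seen.len()
}
```

Two Claude rows that share a `requestId` count once, while a Codex row without one still gets a key through the message-id fallback.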

* sdk: per-turn span tree as derived analytical primitive (#430)

Project the per-turn hierarchy that flat row records can't express: Turn -> { UserPrompt, Inference -> ToolUse -> { ToolResult, Subagent } }. Pure projection over TurnRecord + tool_result_event rows + Claude subagent sidecars - no schema change, no caching, derive on every call.

- analyze/span_tree.rs: SpanKind / SpanStatus / AttrValue / SpanEvent / SpanNode / TurnSpanTree types with locked-in attribute schema (tokens.*, model, request_id, agent_id, tool_use_id, stop_reason) documented in the module preamble. Kebab-case wire form matches the existing repo convention.
- reader/claude/span_tree.rs: harness builder that consumes the #448 Inference aggregates (falls back to a synthetic single-inference for pre-v5 ledgers), pairs tool_result events by tool_use_id, and nests paired subagents under their Task ToolUse. Unpaired subagents surface as sibling Subagent spans under the Turn root with attributes["unattached"] = true.
- reader/codex/span_tree.rs: equivalent builder for Codex rollouts. Codex carries strictly less hierarchy (no requestId, no sidecar transcripts, no stop_reason), so the builder is documented as limited and produces Turn -> { UserPrompt, Inference -> ToolUse } without fabricating data.
- query_verbs.rs: LedgerHandle::turn_span_tree(session_id, turn_id) and session_span_trees(session_id) verbs + free-function forms; source dispatch picks the right per-harness builder.

Status mapping: tool_use.is_error -> "tool_error" on the ToolUse span, bubbles to parent Inference and root as "child_error"; stop_reason == Refusal -> "refusal" on root; stop_reason == MaxTokens -> "max_tokens" on root.

Tests: 29 new (12 type / 10 Claude builder / 3 Codex builder / 4 LedgerHandle integration), covering every acceptance case in the issue. cargo test --workspace passes 871 tests; BURN_GOLDEN=1 cargo test --test golden passes 5.

https://claude.ai/code/session_01QEpNZbWEYNwxzqQjTN5LCY

* fix(span-tree): scope subagents per turn, propagate ToolResult end_ms, drop unsound Eq

Address PR #451 review findings:

- session_span_trees passed the session-wide subagents slice into every turn, so the Claude builder duplicated each orphan sidecar into every turn tree. Pre-bucket subagents so each lands in exactly one turn: paired sidecars route by tool_use_id; orphans go to the latest turn whose start <= subagent_start (first turn if none precede). The orphan-semantics decision (sibling under the turn root, attributes unattached=true) stands.
- Claude and Codex builders widened tool_node.end_ms from a later ToolResult timestamp but left the parent Inference end_ms unchanged, so turns reported truncated durations once the root rolled up its inference children. Propagate the widened end up to the inference span before pushing the tool_use child.
- impl Eq for AttrValue violated reflexivity (AttrValue::Float(f64), NaN != NaN). Drop the impl. BTreeMap<String, AttrValue> only needs Ord on its keys, so no consumer required Eq.

Tests:

- bucket_subagents_per_turn unit test covers paired + three orphan placements (mid, early, late) with a no-duplication assertion.
- session_span_trees regression test pins the no-duplication contract end-to-end.
- Claude + Codex span-tree builder tests assert Inference and root end_ms widen to a trailing ToolResult timestamp.

https://claude.ai/code/session_01QEpNZbWEYNwxzqQjTN5LCY

---------

Co-authored-by: Claude <noreply@anthropic.com>
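
The orphan placement rule from the fix commit above (latest turn whose start <= subagent_start, first turn if none precede) could be sketched as below. The `orphan_turn_index` function is hypothetical, and it assumes turn start times are in milliseconds and already sorted ascending; the real bucketing code may be structured quite differently.

```rust
/// Pick the turn an orphan subagent belongs to.
///
/// `turn_starts_ms` is assumed sorted ascending. Returns `None` only
/// when there are no turns at all.
pub fn orphan_turn_index(turn_starts_ms: &[i64], subagent_start_ms: i64) -> Option<usize> {
    if turn_starts_ms.is_empty() {
        return None;
    }
    // Number of turns that started at or before the subagent.
    let preceding = turn_starts_ms.partition_point(|&start| start <= subagent_start_ms);
    // The last such turn, or index 0 when no turn precedes the subagent.
    Some(preceding.saturating_sub(1))
}
```

Because the function returns a single index, each orphan lands in exactly one turn, which is the no-duplication contract the regression tests pin.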

chatgpt-codex-connector Bot reviewed May 25, 2026


cubic-dev-ai Bot reviewed May 25, 2026


Comment thread crates/relayburn-sdk/src/reader/inference.rs Outdated

Comment thread crates/relayburn-sdk/src/reader/inference.rs Outdated

claude added 2 commits May 25, 2026 18:39

willwashburn force-pushed the claude/burn-434-request-id-dedup branch from ff7bfaa to 038d514 Compare May 25, 2026 18:42

willwashburn merged commit f87a701 into main May 25, 2026
11 checks passed

willwashburn deleted the claude/burn-434-request-id-dedup branch May 25, 2026 21:59

willwashburn mentioned this pull request May 25, 2026

sdk: per-turn span tree as analytical primitive (#430) #451

Merged

6 tasks
