Skip to content

reader: dedupe assistant rows by requestId into Inference unit (#434)#448

Merged
willwashburn merged 2 commits into
mainfrom
claude/burn-434-request-id-dedup
May 25, 2026
Merged

reader: dedupe assistant rows by requestId into Inference unit (#434)#448
willwashburn merged 2 commits into
mainfrom
claude/burn-434-request-id-dedup

Conversation

@willwashburn
Copy link
Copy Markdown
Member

Closes #434.

Summary

Collapses multi-row Claude assistant messages (text + tool_use + reasoning sharing one requestId) into a single Inference unit.

  • New Inference aggregate in crates/relayburn-sdk/src/reader/inference.rs: Inference, InferenceKind (Reasoning | Message | ToolUse | Mixed), ToolUseRef, RequestIdLookup, TurnKey, InferenceKeySource, build_inferences.
  • Reader pass in claude.rs captures request_id on WorkingRecord, threads a request_id_lookup through ParseResult / ParseIncrementalResult.
  • Schema v4 (chained on hotspots: rank tools by raw output bytes alongside tokens (#436) #444's v3): new inferences derived table via CREATE TABLE IF NOT EXISTS migration in migrate_burn_schema.
  • SDK verb: LedgerHandle::inferences(opts) + free function. Materialization happens in apply_parsed_extras via build_inferences(turns, request_id_lookup) and Ledger::append_inferences.
  • state status surfaces inference count via new BurnDbRowCounts.inferences field.

Bug found and fixed along the way

The existing Claude parser's usage-merge for multi-block messages updated usage_coverage from later carrier rows but NOT usage itself. If the carrier row arrived second or later, its tokens were silently dropped. Fixed: WorkingRecord.usage now overwrites from any row carrying the usage block. The existing multi_block_turn_keeps_usage_once test continues to pass because that fixture puts usage on row 1.

Restraint on turn_count semantics

Kept Summary.turn_count as-is. Burn's reader already collapses to one TurnRecord per message.id, and Claude's requestId ↔ message.id is 1:1, so per-API-call counts are already correct. Switching turn_count to inference-keyed would be a no-op for Claude and a breaking change for Codex/opencode (no requestId). The new Inference unit is additive.

Downstream callers audited

  • analyze/hotspots.rs — per-tool attribution keys on tool_use_id (no change needed).
  • analyze/tool_output_bloat.rs — uses model_by_message_id, not row counts (no change).
  • Summary.turn_count — already correct per above.

Merge note

This PR bumps schema v3 → v4 (assuming #444 is in main, which it is). #435 (subagent pairing) ALSO uses v4 in its branch. Whichever lands second needs to become v5 with a chained ALTER TABLE. Migration shape is idempotent — mechanical reconcile. Doc comment in schema.rs calls out the renumber path.

Test plan

  • cargo build --workspace clean
  • cargo test --workspace — 813 passed (SDK lib alone: 693), 0 failed
  • BURN_GOLDEN=1 cargo test --test golden — 5/5
  • Golden updates: state-status.stdout.txt + state-status-json.stdout.txt — schema 3→4, new inferences: 0 row (cli-golden fixture is JSONL bootstrap only; populated ledgers will see real counts)
  • New fixture tests cover multi-row inference collapse, usage-merge fix, fallback paths

Out of scope

  • Switching turn_count semantics (see "Restraint" above).
  • #[non_exhaustive] on new types.
  • Codex/opencode requestId equivalent — they don't have one; documented.

Generated by Claude Code

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 25, 2026

Warning

Review limit reached

@willwashburn, we couldn't start this review because you've used your available PR reviews for now.

Your plan includes 1 review of capacity. Refill in 14 minutes and 32 seconds.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more review capacity refills, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than trial, open-source, and free plans. In all cases, review capacity refills continuously over time.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 41e850a0-a4f2-4847-bfee-4910675d1ab6

📥 Commits

Reviewing files that changed from the base of the PR and between 892ac11 and 038d514.

📒 Files selected for processing (16)
  • CHANGELOG.md
  • crates/relayburn-cli/src/commands/state.rs
  • crates/relayburn-sdk/src/ingest/ingest.rs
  • crates/relayburn-sdk/src/ledger.rs
  • crates/relayburn-sdk/src/ledger/db.rs
  • crates/relayburn-sdk/src/ledger/reader.rs
  • crates/relayburn-sdk/src/ledger/schema.rs
  • crates/relayburn-sdk/src/ledger/tests.rs
  • crates/relayburn-sdk/src/ledger/writer.rs
  • crates/relayburn-sdk/src/lib.rs
  • crates/relayburn-sdk/src/query_verbs.rs
  • crates/relayburn-sdk/src/reader.rs
  • crates/relayburn-sdk/src/reader/claude.rs
  • crates/relayburn-sdk/src/reader/inference.rs
  • tests/fixtures/cli-golden/snapshots/state-status-json.stdout.txt
  • tests/fixtures/cli-golden/snapshots/state-status.stdout.txt
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/burn-434-request-id-dedup

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c73976c12f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +339 to +343
let has_tools = !tool_uses.is_empty();
let has_reasoning = turn.usage.reasoning > 0;
let has_text = !has_tools; // proxy: a turn with no tool_uses must have produced text
match (has_reasoning, has_tools, has_text) {
(true, false, false) => InferenceKind::Reasoning,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Classify reasoning-only inferences as Reasoning

The new kind classifier makes InferenceKind::Reasoning unreachable because has_text is hard-coded as !has_tools, so any inference without tool calls is always treated as having text. As a result, calls with reasoning tokens and no tools are labeled mixed instead of reasoning, which makes the new kind field inaccurate for downstream analytics that rely on reasoning-only detection.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — InferenceKind::Reasoning was indeed unreachable. Fixed in ff7bfaa by collapsing the classifier to a 2-tuple match on (has_reasoning, has_tools) and added a test that constructs a reasoning-only turn (reasoning > 0, no tool_uses, output == 0) and asserts it classifies as Reasoning. The trade-off — that "reasoning + text" still lumps under Reasoning since TurnRecord doesn't surface per-block content kinds — is documented inline at the call site.


Generated by Claude Code

Comment on lines +147 to +150
if let Some(source) = q.source {
clauses.push("source = ?");
bound.push(source.wire_str().to_string());
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Apply project filter when querying inferences

InferencesOptions accepts project, and LedgerHandle::inferences builds a Query with that field, but query_inferences never applies q.project in SQL. This means callers requesting project-scoped results will receive cross-project inferences, which breaks expected filtering behavior for multi-project ledgers.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in ff7bfaa. The inferences table doesn't carry project / project_key columns directly — those live on turns — so the filter is applied via a subquery: session_id IN (SELECT DISTINCT session_id FROM turns WHERE project = ? OR project_key = ?). Mirrors the predicate shape query_turns already uses (matches against either column). Added a regression test that ingests two sessions with distinct projects and asserts the project-scoped query returns only the matching one.


Generated by Claude Code

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 16 files

Reply with feedback, questions, or to request a fix.

Re-trigger cubic

Comment thread crates/relayburn-sdk/src/reader/inference.rs Outdated
Comment thread crates/relayburn-sdk/src/reader/inference.rs Outdated
willwashburn pushed a commit that referenced this pull request May 25, 2026
…t filter (#448)

The classifier hard-coded `has_text = !has_tools`, which made the
`(true, false, false)` arm — reasoning-only turns — unreachable; they
were silently lumped into `Mixed`. Switch to a 2-tuple match on
`(has_reasoning, has_tools)` and document the intentional coarseness
(reasoning + text lumps with `Reasoning`, tools + text with `ToolUse`)
so the trade-off is visible at the call site.

`query_inferences` accepted `Query::project` but never applied it to
SQL, so project-scoped callers received cross-project rows. The
`inferences` table doesn't carry project columns; filter via a subquery
against `turns` (`session_id IN (... WHERE project = ? OR project_key = ?)`)
which mirrors the predicate shape `query_turns` already uses.

Adds two tests: a reasoning-only turn classifies as `Reasoning`, and a
two-project ledger returns only the requested project's inferences.

https://claude.ai/code/session_01QEpNZbWEYNwxzqQjTN5LCY
claude added 2 commits May 25, 2026 18:39
Introduces an `Inference` aggregate keyed by `(source, session_id,
request_id)` so callers asking "how many API calls" stop conflating
Claude's multi-content-block assistant rows. One Claude API call lands
as multiple JSONL rows sharing a `requestId`; the existing
`TurnRecord` collapses by `message.id` (1:1 with `requestId` today),
but the inference key gives a durable per-API-call identity that
survives future harness changes and exposes a stable provenance field
(`request-id` / `message-id` / `row-synthetic`).

Schema bumps to v4: new `inferences` table keyed by the composite
triple, populated by the ingest pipeline from the parser's new
`request_id_lookup`. Chained migration on top of v2 (#437
stop_reason) and v3 (#436 output_bytes); `burn state rebuild` repopulates
on legacy ledgers. (If #444 hasn't merged by integration time, this
should renumber to v3 — the version constant + migration step + tests
sit together for an easy rebase.)

Also fixes a latent bug in the Claude parser: usage merging only
updated `usage_coverage` on subsequent rows of the same `message_id`,
not `usage` itself. If the carrier row wasn't the first row for that
message id, its tokens were dropped. The merge now adopts the
carrier's `usage` values regardless of arrival order.

SDK verb: `LedgerHandle::inferences(InferencesOptions) ->
Vec<Inference>` + free-function `inferences()`. Codex / opencode (no
upstream `requestId`) fall back to `message_id` via the
`InferenceKeySource::MessageId` provenance.

Golden updates: `state-status.stdout.txt` and `state-status-json.stdout.txt`
gain an `inferences: 0` row and the `schemaVersion` bumps 3 → 4. The
fixture is bootstrap-only (no ingest), so the count stays 0 until the
next `burn ingest` or `burn state rebuild`.

https://claude.ai/code/session_01QEpNZbWEYNwxzqQjTN5LCY
…t filter (#448)

The classifier hard-coded `has_text = !has_tools`, which made the
`(true, false, false)` arm — reasoning-only turns — unreachable; they
were silently lumped into `Mixed`. Switch to a 2-tuple match on
`(has_reasoning, has_tools)` and document the intentional coarseness
(reasoning + text lumps with `Reasoning`, tools + text with `ToolUse`)
so the trade-off is visible at the call site.

`query_inferences` accepted `Query::project` but never applied it to
SQL, so project-scoped callers received cross-project rows. The
`inferences` table doesn't carry project columns; filter via a subquery
against `turns` (`session_id IN (... WHERE project = ? OR project_key = ?)`)
which mirrors the predicate shape `query_turns` already uses.

Adds two tests: a reasoning-only turn classifies as `Reasoning`, and a
two-project ledger returns only the requested project's inferences.

https://claude.ai/code/session_01QEpNZbWEYNwxzqQjTN5LCY
@willwashburn willwashburn force-pushed the claude/burn-434-request-id-dedup branch from ff7bfaa to 038d514 Compare May 25, 2026 18:42
@willwashburn willwashburn merged commit f87a701 into main May 25, 2026
11 checks passed
@willwashburn willwashburn deleted the claude/burn-434-request-id-dedup branch May 25, 2026 21:59
willwashburn added a commit that referenced this pull request May 26, 2026
* sdk: per-turn span tree as derived analytical primitive (#430)

Project the per-turn hierarchy that flat row records can't express:
Turn -> { UserPrompt, Inference -> ToolUse -> { ToolResult, Subagent } }.
Pure projection over TurnRecord + tool_result_event rows + Claude
subagent sidecars - no schema change, no caching, derive on every call.

- analyze/span_tree.rs: SpanKind / SpanStatus / AttrValue / SpanEvent /
  SpanNode / TurnSpanTree types with locked-in attribute schema
  (tokens.*, model, request_id, agent_id, tool_use_id, stop_reason)
  documented in the module preamble. Kebab-case wire form matches the
  existing repo convention.
- reader/claude/span_tree.rs: harness builder that consumes the #448
  Inference aggregates (falls back to a synthetic single-inference for
  pre-v5 ledgers), pairs tool_result events by tool_use_id, and nests
  paired subagents under their Task ToolUse. Unpaired subagents
  surface as sibling Subagent spans under the Turn root with
  attributes["unattached"] = true.
- reader/codex/span_tree.rs: equivalent builder for Codex rollouts.
  Codex carries strictly less hierarchy (no requestId, no sidecar
  transcripts, no stop_reason), so the builder is documented as
  limited and produces Turn -> { UserPrompt, Inference -> ToolUse }
  without fabricating data.
- query_verbs.rs: LedgerHandle::turn_span_tree(session_id, turn_id) and
  session_span_trees(session_id) verbs + free-function forms; source
  dispatch picks the right per-harness builder.

Status mapping: tool_use.is_error -> "tool_error" on the ToolUse span,
bubbles to parent Inference and root as "child_error"; stop_reason ==
Refusal -> "refusal" on root; stop_reason == MaxTokens ->
"max_tokens" on root.

Tests: 29 new (12 type / 10 Claude builder / 3 Codex builder /
4 LedgerHandle integration), covering every acceptance case in the
issue. cargo test --workspace passes 871 tests; BURN_GOLDEN=1
cargo test --test golden passes 5.

https://claude.ai/code/session_01QEpNZbWEYNwxzqQjTN5LCY

* fix(span-tree): scope subagents per turn, propagate ToolResult end_ms, drop unsound Eq

Address PR #451 review findings:

- session_span_trees passed the session-wide subagents slice into every
  turn, so the Claude builder duplicated each orphan sidecar into every
  turn tree. Pre-bucket subagents so each lands in exactly one turn:
  paired sidecars route by tool_use_id; orphans go to the latest turn
  whose start <= subagent_start (first turn if none precede). The
  orphan-semantics decision (sibling under the turn root, attributes
  unattached=true) stands.

- Claude and Codex builders widened tool_node.end_ms from a later
  ToolResult timestamp but left the parent Inference end_ms unchanged,
  so turns reported truncated durations once the root rolled up its
  inference children. Propagate the widened end up to the inference
  span before pushing the tool_use child.

- impl Eq for AttrValue violated reflexivity (AttrValue::Float(f64),
  NaN != NaN). Drop the impl. BTreeMap<String, AttrValue> only needs
  Ord on its keys, so no consumer required Eq.

Tests:
- bucket_subagents_per_turn unit test covers paired + three orphan
  placements (mid, early, late) with a no-duplication assertion.
- session_span_trees regression test pins the no-duplication contract
  end-to-end.
- Claude + Codex span-tree builder tests assert Inference and root
  end_ms widen to a trailing ToolResult timestamp.

https://claude.ai/code/session_01QEpNZbWEYNwxzqQjTN5LCY

---------

Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

reader: dedupe assistant rows by requestId into a single "inference" unit

2 participants