reader: dedupe assistant rows by requestId into Inference unit (#434)#448
Conversation
|
Warning Review limit reached
Your plan includes 1 review of capacity. Refill in 14 minutes and 32 seconds. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more review capacity refills, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than trial, open-source, and free plans. In all cases, review capacity refills continuously over time. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (16)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c73976c12f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let has_tools = !tool_uses.is_empty(); | ||
| let has_reasoning = turn.usage.reasoning > 0; | ||
| let has_text = !has_tools; // proxy: a turn with no tool_uses must have produced text | ||
| match (has_reasoning, has_tools, has_text) { | ||
| (true, false, false) => InferenceKind::Reasoning, |
There was a problem hiding this comment.
Classify reasoning-only inferences as Reasoning
The new kind classifier makes InferenceKind::Reasoning unreachable because has_text is hard-coded as !has_tools, so any inference without tool calls is always treated as having text. As a result, calls with reasoning tokens and no tools are labeled mixed instead of reasoning, which makes the new kind field inaccurate for downstream analytics that rely on reasoning-only detection.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Good catch — InferenceKind::Reasoning was indeed unreachable. Fixed in ff7bfaa by collapsing the classifier to a 2-tuple match on (has_reasoning, has_tools) and added a test that constructs a reasoning-only turn (reasoning > 0, no tool_uses, output == 0) and asserts it classifies as Reasoning. The trade-off — that "reasoning + text" still lumps under Reasoning since TurnRecord doesn't surface per-block content kinds — is documented inline at the call site.
Generated by Claude Code
| if let Some(source) = q.source { | ||
| clauses.push("source = ?"); | ||
| bound.push(source.wire_str().to_string()); | ||
| } |
There was a problem hiding this comment.
Apply project filter when querying inferences
InferencesOptions accepts project, and LedgerHandle::inferences builds a Query with that field, but query_inferences never applies q.project in SQL. This means callers requesting project-scoped results will receive cross-project inferences, which breaks expected filtering behavior for multi-project ledgers.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in ff7bfaa. The inferences table doesn't carry project / project_key columns directly — those live on turns — so the filter is applied via a subquery: session_id IN (SELECT DISTINCT session_id FROM turns WHERE project = ? OR project_key = ?). Mirrors the predicate shape query_turns already uses (matches against either column). Added a regression test that ingests two sessions with distinct projects and asserts the project-scoped query returns only the matching one.
Generated by Claude Code
There was a problem hiding this comment.
2 issues found across 16 files
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
…t filter (#448) The classifier hard-coded `has_text = !has_tools`, which made the `(true, false, false)` arm — reasoning-only turns — unreachable; they were silently lumped into `Mixed`. Switch to a 2-tuple match on `(has_reasoning, has_tools)` and document the intentional coarseness (reasoning + text lumps with `Reasoning`, tools + text with `ToolUse`) so the trade-off is visible at the call site. `query_inferences` accepted `Query::project` but never applied it to SQL, so project-scoped callers received cross-project rows. The `inferences` table doesn't carry project columns; filter via a subquery against `turns` (`session_id IN (... WHERE project = ? OR project_key = ?)`) which mirrors the predicate shape `query_turns` already uses. Adds two tests: a reasoning-only turn classifies as `Reasoning`, and a two-project ledger returns only the requested project's inferences. https://claude.ai/code/session_01QEpNZbWEYNwxzqQjTN5LCY
Introduces an `Inference` aggregate keyed by `(source, session_id, request_id)` so callers asking "how many API calls" stop conflating Claude's multi-content-block assistant rows. One Claude API call lands as multiple JSONL rows sharing a `requestId`; the existing `TurnRecord` collapses by `message.id` (1:1 with `requestId` today), but the inference key gives a durable per-API-call identity that survives future harness changes and exposes a stable provenance field (`request-id` / `message-id` / `row-synthetic`). Schema bumps to v4: new `inferences` table keyed by the composite triple, populated by the ingest pipeline from the parser's new `request_id_lookup`. Chained migration on top of v2 (#437 stop_reason) and v3 (#436 output_bytes); `burn state rebuild` repopulates on legacy ledgers. (If #444 hasn't merged by integration time, this should renumber to v3 — the version constant + migration step + tests sit together for an easy rebase.) Also fixes a latent bug in the Claude parser: usage merging only updated `usage_coverage` on subsequent rows of the same `message_id`, not `usage` itself. If the carrier row wasn't the first row for that message id, its tokens were dropped. The merge now adopts the carrier's `usage` values regardless of arrival order. SDK verb: `LedgerHandle::inferences(InferencesOptions) -> Vec<Inference>` + free-function `inferences()`. Codex / opencode (no upstream `requestId`) fall back to `message_id` via the `InferenceKeySource::MessageId` provenance. Golden updates: `state-status.stdout.txt` and `state-status-json.stdout.txt` gain an `inferences: 0` row and the `schemaVersion` bumps 3 → 4. The fixture is bootstrap-only (no ingest), so the count stays 0 until the next `burn ingest` or `burn state rebuild`. https://claude.ai/code/session_01QEpNZbWEYNwxzqQjTN5LCY
…t filter (#448) The classifier hard-coded `has_text = !has_tools`, which made the `(true, false, false)` arm — reasoning-only turns — unreachable; they were silently lumped into `Mixed`. Switch to a 2-tuple match on `(has_reasoning, has_tools)` and document the intentional coarseness (reasoning + text lumps with `Reasoning`, tools + text with `ToolUse`) so the trade-off is visible at the call site. `query_inferences` accepted `Query::project` but never applied it to SQL, so project-scoped callers received cross-project rows. The `inferences` table doesn't carry project columns; filter via a subquery against `turns` (`session_id IN (... WHERE project = ? OR project_key = ?)`) which mirrors the predicate shape `query_turns` already uses. Adds two tests: a reasoning-only turn classifies as `Reasoning`, and a two-project ledger returns only the requested project's inferences. https://claude.ai/code/session_01QEpNZbWEYNwxzqQjTN5LCY
ff7bfaa to
038d514
Compare
* sdk: per-turn span tree as derived analytical primitive (#430) Project the per-turn hierarchy that flat row records can't express: Turn -> { UserPrompt, Inference -> ToolUse -> { ToolResult, Subagent } }. Pure projection over TurnRecord + tool_result_event rows + Claude subagent sidecars - no schema change, no caching, derive on every call. - analyze/span_tree.rs: SpanKind / SpanStatus / AttrValue / SpanEvent / SpanNode / TurnSpanTree types with locked-in attribute schema (tokens.*, model, request_id, agent_id, tool_use_id, stop_reason) documented in the module preamble. Kebab-case wire form matches the existing repo convention. - reader/claude/span_tree.rs: harness builder that consumes the #448 Inference aggregates (falls back to a synthetic single-inference for pre-v5 ledgers), pairs tool_result events by tool_use_id, and nests paired subagents under their Task ToolUse. Unpaired subagents surface as sibling Subagent spans under the Turn root with attributes["unattached"] = true. - reader/codex/span_tree.rs: equivalent builder for Codex rollouts. Codex carries strictly less hierarchy (no requestId, no sidecar transcripts, no stop_reason), so the builder is documented as limited and produces Turn -> { UserPrompt, Inference -> ToolUse } without fabricating data. - query_verbs.rs: LedgerHandle::turn_span_tree(session_id, turn_id) and session_span_trees(session_id) verbs + free-function forms; source dispatch picks the right per-harness builder. Status mapping: tool_use.is_error -> "tool_error" on the ToolUse span, bubbles to parent Inference and root as "child_error"; stop_reason == Refusal -> "refusal" on root; stop_reason == MaxTokens -> "max_tokens" on root. Tests: 29 new (12 type / 10 Claude builder / 3 Codex builder / 4 LedgerHandle integration), covering every acceptance case in the issue. cargo test --workspace passes 871 tests; BURN_GOLDEN=1 cargo test --test golden passes 5. https://claude.ai/code/session_01QEpNZbWEYNwxzqQjTN5LCY * fix(span-tree): scope subagents per turn, propagate ToolResult end_ms, drop unsound Eq Address PR #451 review findings: - session_span_trees passed the session-wide subagents slice into every turn, so the Claude builder duplicated each orphan sidecar into every turn tree. Pre-bucket subagents so each lands in exactly one turn: paired sidecars route by tool_use_id; orphans go to the latest turn whose start <= subagent_start (first turn if none precede). The orphan-semantics decision (sibling under the turn root, attributes unattached=true) stands. - Claude and Codex builders widened tool_node.end_ms from a later ToolResult timestamp but left the parent Inference end_ms unchanged, so turns reported truncated durations once the root rolled up its inference children. Propagate the widened end up to the inference span before pushing the tool_use child. - impl Eq for AttrValue violated reflexivity (AttrValue::Float(f64), NaN != NaN). Drop the impl. BTreeMap<String, AttrValue> only needs Ord on its keys, so no consumer required Eq. Tests: - bucket_subagents_per_turn unit test covers paired + three orphan placements (mid, early, late) with a no-duplication assertion. - session_span_trees regression test pins the no-duplication contract end-to-end. - Claude + Codex span-tree builder tests assert Inference and root end_ms widen to a trailing ToolResult timestamp. https://claude.ai/code/session_01QEpNZbWEYNwxzqQjTN5LCY --------- Co-authored-by: Claude <noreply@anthropic.com>
Closes #434.
Summary
Collapses multi-row Claude assistant messages (text + tool_use + reasoning sharing one
requestId) into a singleInferenceunit.Inferenceaggregate incrates/relayburn-sdk/src/reader/inference.rs:Inference,InferenceKind(Reasoning | Message | ToolUse | Mixed),ToolUseRef,RequestIdLookup,TurnKey,InferenceKeySource,build_inferences.claude.rscapturesrequest_idonWorkingRecord, threads arequest_id_lookupthroughParseResult/ParseIncrementalResult.inferencesderived table viaCREATE TABLE IF NOT EXISTSmigration inmigrate_burn_schema.LedgerHandle::inferences(opts)+ free function. Materialization happens inapply_parsed_extrasviabuild_inferences(turns, request_id_lookup)andLedger::append_inferences.state statussurfaces inference count via newBurnDbRowCounts.inferencesfield.Bug found and fixed along the way
The existing Claude parser's usage-merge for multi-block messages updated
usage_coveragefrom later carrier rows but NOTusageitself. If the carrier row arrived second or later, its tokens were silently dropped. Fixed:WorkingRecord.usagenow overwrites from any row carrying theusageblock. The existingmulti_block_turn_keeps_usage_oncetest continues to pass because that fixture putsusageon row 1.Restraint on
turn_countsemanticsKept
Summary.turn_countas-is. Burn's reader already collapses to oneTurnRecordpermessage.id, and Claude'srequestId ↔ message.idis 1:1, so per-API-call counts are already correct. Switchingturn_countto inference-keyed would be a no-op for Claude and a breaking change for Codex/opencode (norequestId). The newInferenceunit is additive.Downstream callers audited
analyze/hotspots.rs— per-tool attribution keys ontool_use_id(no change needed).analyze/tool_output_bloat.rs— usesmodel_by_message_id, not row counts (no change).Summary.turn_count— already correct per above.Merge note
This PR bumps schema v3 → v4 (assuming #444 is in main, which it is). #435 (subagent pairing) ALSO uses v4 in its branch. Whichever lands second needs to become v5 with a chained
ALTER TABLE. Migration shape is idempotent — mechanical reconcile. Doc comment inschema.rscalls out the renumber path.Test plan
cargo build --workspacecleancargo test --workspace— 813 passed (SDK lib alone: 693), 0 failedBURN_GOLDEN=1 cargo test --test golden— 5/5state-status.stdout.txt+state-status-json.stdout.txt— schema 3→4, newinferences: 0row (cli-golden fixture is JSONL bootstrap only; populated ledgers will see real counts)Out of scope
turn_countsemantics (see "Restraint" above).#[non_exhaustive]on new types.Generated by Claude Code