sdk: per-turn span tree as analytical primitive (#430) #451

Merged
willwashburn merged 3 commits into main from claude/burn-430-span-tree-foundation
May 26, 2026

@willwashburn
Member

Closes #430.

Summary

Introduces a per-turn span tree as a derived analytical primitive. Pure projection from TurnRecord + tool_result_event rows + (optional) subagent transcripts. No schema changes, no caching — always re-derived per call.

Foundation for inference-flow DAG (#431), context-delta attribution (#432), and several other downstream features.

New types (crates/relayburn-sdk/src/analyze/span_tree.rs)

pub enum SpanKind { Turn, Inference, ToolUse, Subagent, Skill, UserPrompt, ToolResult }
pub enum SpanStatus { Ok, Error { msg: String } }
pub struct SpanEvent { ts, name, attributes }
pub enum AttrValue { String, Int, Float, Bool }
pub struct SpanNode { kind, name, start_ms, end_ms, status, attributes, events, children }
pub struct TurnSpanTree { session_id, turn_id, turn_number, root: SpanNode }

Serde with kebab-case wire form, matching ActivityCategory / StopReason convention.
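
For orientation, here is a minimal std-only sketch of how these types could fit together. The field types, the builder helpers (`new`, `with_attr`, `with_child`), and this particular `sum_attr_int` signature are assumptions for illustration and may differ from the crate. Serde derives and the kebab-case wire form are left out so the block has no external dependencies, and `start_ms`, `end_ms`, `status`, and `events` are omitted for brevity.

```rust
use std::collections::BTreeMap;

/// Span kinds as listed in the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SpanKind {
    Turn,
    Inference,
    ToolUse,
    Subagent,
    Skill,
    UserPrompt,
    ToolResult,
}

/// Attribute values. Only `PartialEq` is derived because `Float` holds an `f64`.
#[derive(Debug, Clone, PartialEq)]
pub enum AttrValue {
    String(String),
    Int(i64),
    Float(f64),
    Bool(bool),
}

/// Trimmed-down node (assumption): timing, status, and events are omitted.
#[derive(Debug, Clone, PartialEq)]
pub struct SpanNode {
    pub kind: SpanKind,
    pub name: String,
    pub attributes: BTreeMap<String, AttrValue>,
    pub children: Vec<SpanNode>,
}

impl SpanNode {
    pub fn new(kind: SpanKind, name: &str) -> Self {
        SpanNode {
            kind,
            name: name.to_string(),
            attributes: BTreeMap::new(),
            children: Vec::new(),
        }
    }

    /// Builder-style helper (hypothetical) that sets one attribute.
    pub fn with_attr(mut self, key: &str, value: AttrValue) -> Self {
        self.attributes.insert(key.to_string(), value);
        self
    }

    /// Builder-style helper (hypothetical) that appends one child.
    pub fn with_child(mut self, child: SpanNode) -> Self {
        self.children.push(child);
        self
    }

    /// Depth-first sum of an integer attribute over this node and its descendants.
    /// Missing or non-integer values count as zero.
    pub fn sum_attr_int(&self, key: &str) -> i64 {
        let own = match self.attributes.get(key) {
            Some(AttrValue::Int(n)) => *n,
            _ => 0,
        };
        own + self.children.iter().map(|c| c.sum_attr_int(key)).sum::<i64>()
    }
}
```

With a Turn root holding two Inference children that carry `tokens.input`, `sum_attr_int("tokens.input")` on the root returns the sum of both, which is the kind of check the "scalar sums match TurnRecord" acceptance case describes.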

Attribute schema (locked in module doc)

tokens.input, tokens.output, tokens.cache_read, tokens.cache_write, tokens.reasoning, model, request_id, agent_id, tool_use_id, cwd, mode, stop_reason. Downstream consumers can rely on these.

Status mapping

  • tool_use.is_error == true → Error { msg: <tool error message> }
  • stop_reason == Refusal → Error { msg: "refusal" }
  • stop_reason == MaxTokens → Error { msg: "max_tokens" }
  • Otherwise → Ok

Errors propagate from leaf to root.
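
A minimal sketch of the mapping and the leaf-to-root propagation, under stated assumptions: the `Node` struct and both function names are hypothetical, `StopReason::EndTurn` is a placeholder variant, the `tool_error` and `child_error` strings follow the commit message on this PR, and the choice to keep a node's own error in preference to `child_error` is a guess rather than confirmed behavior.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum SpanStatus {
    Ok,
    Error { msg: String },
}

/// Hypothetical subset of stop reasons; only Refusal and MaxTokens come from the PR text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StopReason {
    EndTurn,
    MaxTokens,
    Refusal,
}

/// Hypothetical, trimmed-down span node carrying only status and children.
#[derive(Debug)]
pub struct Node {
    pub status: SpanStatus,
    pub children: Vec<Node>,
}

/// Root status derived from the turn's stop reason.
pub fn status_from_stop_reason(stop: Option<StopReason>) -> SpanStatus {
    match stop {
        Some(StopReason::Refusal) => SpanStatus::Error { msg: "refusal".to_string() },
        Some(StopReason::MaxTokens) => SpanStatus::Error { msg: "max_tokens".to_string() },
        _ => SpanStatus::Ok,
    }
}

/// Post-order walk: a node that is still Ok but has an errored descendant
/// becomes Error { msg: "child_error" }. Returns true if the node ends up errored.
pub fn propagate_errors(node: &mut Node) -> bool {
    let mut child_failed = false;
    for child in node.children.iter_mut() {
        if propagate_errors(child) {
            child_failed = true;
        }
    }
    if child_failed && node.status == SpanStatus::Ok {
        node.status = SpanStatus::Error { msg: "child_error".to_string() };
    }
    node.status != SpanStatus::Ok
}
```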

Claude builder (reader/claude/span_tree.rs)

Builds the full hierarchy: Turn (root) → UserPrompt + N Inferences; each Inference → ToolUses; each ToolUse → ToolResult + optional Subagent subtree.

Codex builder (reader/codex/span_tree.rs)

Limited but honest about it. Codex rollouts expose strictly less hierarchy than Claude:

  • No requestId → one Inference per turn keyed by message_id (matches InferenceKeySource::MessageId fallback)
  • No subagent sidecars → no Subagent spans
  • No stop_reason on assistant rows → root status defaults to Ok unless a child tool errors

Produces: Turn → {UserPrompt, Inference → ToolUse → ToolResult}. Same attribute keys, same status mapping. Documented in module preamble.

SDK verbs

impl LedgerHandle {
    pub fn turn_span_tree(&self, session_id: &str, turn_id: &str) -> Result<TurnSpanTree>;
    pub fn session_span_trees(&self, session_id: &str) -> Result<Vec<TurnSpanTree>>;
}

Plus free-function forms for embedders.

Test plan

29 new tests covering all 6 acceptance cases from the issue:

  • Single-inference turn → well-formed tree, scalar sums match TurnRecord within rounding
  • Multi-inference turn (multiple requestIds) → multiple Inference children
  • Tool_use with paired result → ToolResult nested under ToolUse
  • Tool_use with unpaired subagent → orphan handling per chosen semantics
  • MaxTokens turn → root status Error { msg: "max_tokens" }
  • Error turn (refusal or tool_use.is_error) → propagates to root

Breakdown:

  • 12 type tests in analyze/span_tree.rs
  • 10 Claude builder tests in reader/claude/span_tree.rs
  • 3 Codex builder tests in reader/codex/span_tree.rs
  • 4 LedgerHandle integration tests in query_verbs.rs

Workspace: cargo build --workspace clean (zero warnings). cargo test --workspace — 871 passed, 0 failed. BURN_GOLDEN=1 cargo test --test golden — 5/5.

Deferred to follow-ups

  • Node SDK TS facade (packages/sdk-node/src/index.d.ts mirror + napi bridge). Issue lists it as out of scope. Rust surface lands first.
  • MCP burn__turnSpanTree tool — explicitly out of scope per issue.
  • Content sidecar reads for UserPrompt text body and ToolResult attached file content. The spans are emitted as structural placeholders; future PRs can populate the content without changing tree shape.
  • Caching to burn.sqlite — issue's default position: "always derive; cache only if profiling demands it." No caching this round.

Out of scope

Files

New (3): analyze/span_tree.rs (583L), reader/claude/span_tree.rs (1036L), reader/codex/span_tree.rs (498L). 29 tests.
Modified (7): analyze.rs, lib.rs, query_verbs.rs, reader.rs, reader/claude.rs, reader/codex.rs, CHANGELOG.md.


Generated by Claude Code

Project the per-turn hierarchy that flat row records can't express:
Turn -> { UserPrompt, Inference -> ToolUse -> { ToolResult, Subagent } }.
Pure projection over TurnRecord + tool_result_event rows + Claude
subagent sidecars - no schema change, no caching, derive on every call.

- analyze/span_tree.rs: SpanKind / SpanStatus / AttrValue / SpanEvent /
  SpanNode / TurnSpanTree types with locked-in attribute schema
  (tokens.*, model, request_id, agent_id, tool_use_id, stop_reason)
  documented in the module preamble. Kebab-case wire form matches the
  existing repo convention.
- reader/claude/span_tree.rs: harness builder that consumes the #448
  Inference aggregates (falls back to a synthetic single-inference for
  pre-v5 ledgers), pairs tool_result events by tool_use_id, and nests
  paired subagents under their Task ToolUse. Unpaired subagents
  surface as sibling Subagent spans under the Turn root with
  attributes["unattached"] = true.
- reader/codex/span_tree.rs: equivalent builder for Codex rollouts.
  Codex carries strictly less hierarchy (no requestId, no sidecar
  transcripts, no stop_reason), so the builder is documented as
  limited and produces Turn -> { UserPrompt, Inference -> ToolUse }
  without fabricating data.
- query_verbs.rs: LedgerHandle::turn_span_tree(session_id, turn_id) and
  session_span_trees(session_id) verbs + free-function forms; source
  dispatch picks the right per-harness builder.

Status mapping: tool_use.is_error -> "tool_error" on the ToolUse span,
bubbles to parent Inference and root as "child_error"; stop_reason ==
Refusal -> "refusal" on root; stop_reason == MaxTokens ->
"max_tokens" on root.

Tests: 29 new (12 type / 10 Claude builder / 3 Codex builder /
4 LedgerHandle integration), covering every acceptance case in the
issue. cargo test --workspace passes 871 tests; BURN_GOLDEN=1
cargo test --test golden passes 5.

https://claude.ai/code/session_01QEpNZbWEYNwxzqQjTN5LCY
@coderabbitai

coderabbitai Bot commented May 25, 2026

Review Change Stack

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 8c453455-140f-4b62-8e01-4026e40790ed

📥 Commits

Reviewing files that changed from the base of the PR and between 9422e88 and 641083c.

📒 Files selected for processing (10)
  • CHANGELOG.md
  • crates/relayburn-sdk/src/analyze.rs
  • crates/relayburn-sdk/src/analyze/span_tree.rs
  • crates/relayburn-sdk/src/lib.rs
  • crates/relayburn-sdk/src/query_verbs.rs
  • crates/relayburn-sdk/src/reader.rs
  • crates/relayburn-sdk/src/reader/claude.rs
  • crates/relayburn-sdk/src/reader/claude/span_tree.rs
  • crates/relayburn-sdk/src/reader/codex.rs
  • crates/relayburn-sdk/src/reader/codex/span_tree.rs

📝 Walkthrough

Walkthrough

This PR introduces a per-turn span tree analytical primitive for the relayburn SDK. It projects existing ledger data (TurnRecord, tool-result events, subagent transcripts) into an OTel-style span hierarchy without schema changes or caching, enabling downstream analyses to work with tree structure instead of flat rows. Includes builders for Claude Code and Codex, ledger query verbs, and full SDK surface exposure.

Changes

Per-turn span tree analytical primitive

Layer / File(s) Summary
Span tree schema and core types
crates/relayburn-sdk/src/analyze/span_tree.rs, crates/relayburn-sdk/src/analyze.rs
Defines SpanKind, SpanStatus, AttrValue, SpanEvent, SpanNode with builder methods and DFS traversal, plus TurnSpanTree container with per-turn identifiers and sum_attr_int aggregation utility. Includes serde-round-trip tests for wire formats and traversal semantics.
Claude Code span tree builder
crates/relayburn-sdk/src/reader/claude/span_tree.rs, crates/relayburn-sdk/src/reader/claude.rs
Constructs span trees for Claude Code by grouping inferences per request_id, pairing ToolResultEventRecord by tool_use_id, nesting ToolUse with ToolResult and optional Subagent spans, synthesizing inferences when missing, propagating error state, and mapping stop reasons to root errors. Token attributes attached to Inference only.
Codex span tree builder
crates/relayburn-sdk/src/reader/codex/span_tree.rs, crates/relayburn-sdk/src/reader/codex.rs
Constructs span trees for Codex with a fixed UserPrompt child and inference children (synthesized if empty), nesting ToolUse spans with optional ToolResult children, propagating error state and stop-reason status, and widening end_ms based on tool-result timestamps.
Ledger query verbs and subagent integration
crates/relayburn-sdk/src/query_verbs.rs
Adds LedgerHandle::turn_span_tree(session_id, turn_id) and session_span_trees(session_id) methods that bulk-load per-session data, group by message_id, conditionally discover subagent sidecars from filesystem, bucket subagent transcripts per turn using tool-use pairing or orphan timestamp rule, and dispatch to harness-specific builders. Includes ISO timestamp parsing, subagent filesystem discovery, and comprehensive integration + unit tests.
Public API surface and documentation
crates/relayburn-sdk/src/lib.rs, crates/relayburn-sdk/src/reader.rs, CHANGELOG.md
Re-exports span tree types and builder functions at SDK root and intermediate boundaries; CHANGELOG documents projection semantics, orphan subagent behavior, and locked attribute keys for downstream consumers.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • AgentWorkforce/burn#445: Introduces stop_reason API/schema for TurnRecord, which this PR uses directly to map stop reasons (like Refusal/MaxTokens) into root span error state.

Poem

🐰 A tree grows in the span-light,
Where turns unfold in structured height—
Each tool and task finds its place,
Orphan subagents leave a trace.
Ledgers bloom with OTel grace! 🌳✨

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0a93c6e6cc

Comment thread crates/relayburn-sdk/src/query_verbs.rs Outdated
turn,
tool_result_events: &events_for_turn,
inferences: &infs_for_turn,
subagents: &subagents,

P1: Restrict subagent sidecars to the active turn

session_span_trees passes the full session-level subagents slice into every Claude turn build, but build_claude_span_tree treats paired_tool_use_id == None as an unattached root child. That means any orphan sidecar is duplicated into every turn in the session, inflating per-turn trees and any downstream counts/cost rollups derived from those nodes. Filter subagents per turn (or otherwise assign unattached sidecars once) before calling the builder.

Member Author

Fixed in a3086ef. session_span_trees now buckets the session-wide subagents slice per turn before calling build_claude_span_tree, so each sidecar lands in exactly one turn. Paired sidecars route by tool_use_id; orphans go to the latest turn whose start_ms <= subagent_start_ms (falling back to the first turn). The orphan-semantics choice (sibling under turn root with attributes["unattached"] = true) stands — only the duplication is fixed. New bucket_subagents_per_turn unit test + a session_span_trees regression test cover paired + orphan placement with an explicit no-duplication assertion.
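
The bucketing rule described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the commit: the `Subagent` struct, the `tool_use_to_turn` lookup map, and the return shape (one bucket of subagent indices per turn) are hypothetical, and `turn_starts` is assumed to be sorted ascending.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down view of a subagent sidecar.
pub struct Subagent {
    pub paired_tool_use_id: Option<String>,
    pub start_ms: Option<i64>,
}

/// Returns one bucket per turn, each holding indices into `subagents`.
/// Every subagent index appears in exactly one bucket.
/// Assumes `turn_starts` is sorted ascending.
pub fn bucket_subagents_per_turn(
    turn_starts: &[i64],
    tool_use_to_turn: &HashMap<String, usize>,
    subagents: &[Subagent],
) -> Vec<Vec<usize>> {
    let mut buckets: Vec<Vec<usize>> = vec![Vec::new(); turn_starts.len()];
    if turn_starts.is_empty() {
        return buckets;
    }
    for (idx, sa) in subagents.iter().enumerate() {
        // Paired sidecars route by tool_use_id when the id is known.
        let paired_turn = sa
            .paired_tool_use_id
            .as_ref()
            .and_then(|id| tool_use_to_turn.get(id).copied())
            .filter(|turn| *turn < turn_starts.len());
        // Orphans go to the latest turn that started at or before the
        // subagent, falling back to the first turn.
        let turn = paired_turn.unwrap_or_else(|| match sa.start_ms {
            Some(sa_ms) => turn_starts
                .iter()
                .rposition(|ts| *ts <= sa_ms)
                .unwrap_or(0),
            None => 0,
        });
        buckets[turn].push(idx);
    }
    buckets
}
```

Because each subagent index is pushed exactly once, the bucket lengths always sum to `subagents.len()`, which is the no-duplication property the regression test is said to assert.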


Generated by Claude Code

Comment on lines +359 to +361
if tool_node.end_ms < result_node.end_ms {
tool_node.end_ms = result_node.end_ms;
}

P2: Propagate tool-result end time to Claude inference span

This branch extends tool_node.end_ms from the paired ToolResult timestamp, but the parent Inference span (node.end_ms) is never updated to that later value. Because root end time is computed from inference end times, turns with tool results after the assistant row will report truncated durations. Update the inference end to max(node.end_ms, tool_node.end_ms) before pushing the child.

Member Author

Fixed in a3086ef. After widening tool_node.end_ms from the paired ToolResult and pushing the tool_use child, the Claude builder now propagates node.end_ms = node.end_ms.max(tool_node.end_ms) so the parent Inference (and the turn root, which rolls up inference ends) reflects the trailing ToolResult timestamp. New regression test tool_result_after_assistant_row_widens_inference_and_root_end_ms covers it.
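
As a rough illustration of the widening described here (the `Span` struct and the `attach_tool_use` and `root_end_ms` helpers are hypothetical stand-ins, not the builder's actual code):

```rust
/// Hypothetical, trimmed-down span with timing only.
#[derive(Debug)]
pub struct Span {
    pub start_ms: i64,
    pub end_ms: i64,
    pub children: Vec<Span>,
}

/// Nest an optional ToolResult under a ToolUse, widen the ToolUse end to the
/// result's end, then widen the parent Inference end to the ToolUse end.
pub fn attach_tool_use(inference: &mut Span, mut tool_use: Span, result: Option<Span>) {
    if let Some(result) = result {
        tool_use.end_ms = tool_use.end_ms.max(result.end_ms);
        tool_use.children.push(result);
    }
    inference.end_ms = inference.end_ms.max(tool_use.end_ms);
    inference.children.push(tool_use);
}

/// The turn root rolls up the latest end among its inference children.
pub fn root_end_ms(inferences: &[Span]) -> Option<i64> {
    inferences.iter().map(|s| s.end_ms).max()
}
```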


Generated by Claude Code

Comment on lines +232 to +234
if tool_node.end_ms < result_node.end_ms {
tool_node.end_ms = result_node.end_ms;
}

P2: Propagate tool-result end time to Codex inference span

The Codex builder has the same timing bug: tool_node.end_ms is widened to the ToolResult timestamp, but the parent Inference end is left unchanged. As a result, the inferred turn end can be earlier than its tool-result child, which skews duration-based analysis for Codex sessions. Mirror the child end back into node.end_ms while iterating tool uses.

Member Author

Fixed in a3086ef. The Codex builder now applies the same fix as Claude — node.end_ms = node.end_ms.max(tool_node.end_ms) after widening the tool_use end from the ToolResult and before pushing the child. New regression test codex_tool_result_after_assistant_row_widens_inference_and_root_end_ms covers Codex sessions.


Generated by Claude Code

@cubic-dev-ai cubic-dev-ai Bot left a comment

4 issues found across 10 files

Comment thread crates/relayburn-sdk/src/query_verbs.rs Outdated
Comment thread crates/relayburn-sdk/src/analyze/span_tree.rs Outdated
Comment thread crates/relayburn-sdk/src/reader/codex/span_tree.rs
Comment thread crates/relayburn-sdk/src/reader/claude/span_tree.rs
…, drop unsound Eq

Address PR #451 review findings:

- session_span_trees passed the session-wide subagents slice into every
  turn, so the Claude builder duplicated each orphan sidecar into every
  turn tree. Pre-bucket subagents so each lands in exactly one turn:
  paired sidecars route by tool_use_id; orphans go to the latest turn
  whose start <= subagent_start (first turn if none precede). The
  orphan-semantics decision (sibling under the turn root, attributes
  unattached=true) stands.

- Claude and Codex builders widened tool_node.end_ms from a later
  ToolResult timestamp but left the parent Inference end_ms unchanged,
  so turns reported truncated durations once the root rolled up its
  inference children. Propagate the widened end up to the inference
  span before pushing the tool_use child.

- impl Eq for AttrValue violated reflexivity (AttrValue::Float(f64),
  NaN != NaN). Drop the impl. BTreeMap<String, AttrValue> only needs
  Ord on its keys, so no consumer required Eq.
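
A small sketch of why `Eq` is unsound for a type that can hold an `f64`. The enum mirrors the variant names from the PR description with assumed payload types, and `is_reflexive` is a hypothetical helper added only for the demonstration.

```rust
/// Derives `PartialEq` only. An `Eq` impl would promise `v == v` for every
/// value, which fails for `Float(f64::NAN)`.
#[derive(Debug, Clone, PartialEq)]
pub enum AttrValue {
    String(String),
    Int(i64),
    Float(f64),
    Bool(bool),
}

/// Returns whether a value compares equal to itself.
pub fn is_reflexive(v: &AttrValue) -> bool {
    v == v
}
```

`is_reflexive(&AttrValue::Float(f64::NAN))` is false while the other variants compare equal to themselves, so only `PartialEq` can be implemented honestly.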

Tests:
- bucket_subagents_per_turn unit test covers paired + three orphan
  placements (mid, early, late) with a no-duplication assertion.
- session_span_trees regression test pins the no-duplication contract
  end-to-end.
- Claude + Codex span-tree builder tests assert Inference and root
  end_ms widen to a trailing ToolResult timestamp.

https://claude.ai/code/session_01QEpNZbWEYNwxzqQjTN5LCY

@cubic-dev-ai cubic-dev-ai Bot left a comment

2 issues found across 4 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="crates/relayburn-sdk/src/reader/codex/span_tree.rs">

<violation number="1" location="crates/relayburn-sdk/src/reader/codex/span_tree.rs:245">
P2: Inference end update inside loop leaks into later ToolUse defaults. Per-tool end_ms becomes order-dependent. Track inference max separately and apply after building children.</violation>
</file>

<file name="crates/relayburn-sdk/src/query_verbs.rs">

<violation number="1" location="crates/relayburn-sdk/src/query_verbs.rs:4363">
P2: Orphan-to-turn match uses row order, not max timestamp. Wrong turn can get the subagent when rows are out of time order. Pick the greatest `turn_start <= subagent_start` instead of reverse-find.</violation>
</file>

// turns whose tool_result timestamps trail the assistant row
// don't underreport duration (the turn root rolls up its
// inference children's end_ms).
if node.end_ms < tool_node.end_ms {

P2: Inference end update inside loop leaks into later ToolUse defaults. Per-tool end_ms becomes order-dependent. Track inference max separately and apply after building children.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At crates/relayburn-sdk/src/reader/codex/span_tree.rs, line 245:

<comment>Inference end update inside loop leaks into later ToolUse defaults. Per-tool end_ms becomes order-dependent. Track inference max separately and apply after building children.</comment>

<file context>
@@ -238,6 +238,13 @@ fn build_inference_node(
+        // turns whose tool_result timestamps trail the assistant row
+        // don't underreport duration (the turn root rolls up its
+        // inference children's end_ms).
+        if node.end_ms < tool_node.end_ms {
+            node.end_ms = tool_node.end_ms;
+        }
</file context>
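
To make the order-dependence concern concrete, here is a hedged sketch. It assumes, based only on the review comment, that each ToolUse defaults its `end_ms` to the inference end; the `ToolUse` struct and both function names are hypothetical.

```rust
/// Hypothetical tool use: only the paired result's end timestamp matters here.
pub struct ToolUse {
    pub result_end_ms: Option<i64>,
}

/// Leaky variant: the inference end is mutated inside the loop, so a later
/// tool use inherits an end that was widened by an earlier one.
pub fn tool_ends_leaky(inference_end: i64, tools: &[ToolUse]) -> (Vec<i64>, i64) {
    let mut node_end = inference_end;
    let mut ends = Vec::new();
    for tool in tools {
        let mut end = node_end; // default taken from the mutated inference end
        if let Some(result_end) = tool.result_end_ms {
            end = end.max(result_end);
        }
        node_end = node_end.max(end);
        ends.push(end);
    }
    (ends, node_end)
}

/// Tracked variant: defaults always use the original inference end, and the
/// running maximum is applied to the inference only after the loop.
pub fn tool_ends_tracked(inference_end: i64, tools: &[ToolUse]) -> (Vec<i64>, i64) {
    let mut max_end = inference_end;
    let mut ends = Vec::new();
    for tool in tools {
        let mut end = inference_end;
        if let Some(result_end) = tool.result_end_ms {
            end = end.max(result_end);
        }
        max_end = max_end.max(end);
        ends.push(end);
    }
    (ends, max_end)
}
```

Both variants agree on the final inference end. They differ only in the per-tool ends when a tool use without a result follows one with a late result.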

Comment on lines +4363 to +4369
Some(sa_ms) => turn_starts
.iter()
.enumerate()
.rev()
.find(|(_, ts)| **ts <= sa_ms)
.map(|(i, _)| i)
.unwrap_or(0),

P2: Orphan-to-turn match uses row order, not max timestamp. Wrong turn can get the subagent when rows are out of time order. Pick the greatest turn_start <= subagent_start instead of reverse-find.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At crates/relayburn-sdk/src/query_verbs.rs, line 4363:

<comment>Orphan-to-turn match uses row order, not max timestamp. Wrong turn can get the subagent when rows are out of time order. Pick the greatest `turn_start <= subagent_start` instead of reverse-find.</comment>

<file context>
@@ -4292,6 +4313,146 @@ impl LedgerHandle {
+            // when the sidecar carries no parseable timestamp.
+            let sa_start_ms = first_record_ts_ms(&sa.records);
+            assigned = Some(match sa_start_ms {
+                Some(sa_ms) => turn_starts
+                    .iter()
+                    .enumerate()
</file context>
Suggested change
- Some(sa_ms) => turn_starts
-     .iter()
-     .enumerate()
-     .rev()
-     .find(|(_, ts)| **ts <= sa_ms)
-     .map(|(i, _)| i)
-     .unwrap_or(0),
+ Some(sa_ms) => turn_starts
+     .iter()
+     .enumerate()
+     .filter(|(_, ts)| **ts <= sa_ms)
+     .max_by_key(|(_, ts)| **ts)
+     .map(|(i, _)| i)
+     .unwrap_or(0),
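
The difference between the two forms only shows up when turn rows are not in time order. A small standalone comparison, with hypothetical function names wrapping the two iterator chains from the suggestion:

```rust
/// Reverse-find: picks the last row (by position) whose start is <= sa_ms.
pub fn orphan_turn_by_row_order(turn_starts: &[i64], sa_ms: i64) -> usize {
    turn_starts
        .iter()
        .enumerate()
        .rev()
        .find(|(_, ts)| **ts <= sa_ms)
        .map(|(i, _)| i)
        .unwrap_or(0)
}

/// Max-by-key: picks the row with the greatest start that is still <= sa_ms,
/// regardless of row position.
pub fn orphan_turn_by_timestamp(turn_starts: &[i64], sa_ms: i64) -> usize {
    turn_starts
        .iter()
        .enumerate()
        .filter(|(_, ts)| **ts <= sa_ms)
        .max_by_key(|(_, ts)| **ts)
        .map(|(i, _)| i)
        .unwrap_or(0)
}
```

For starts `[100, 200, 150]` and a subagent starting at 250, the row-order form returns index 2 (start 150) while the timestamp form returns index 1 (start 200). On sorted input the two agree.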

@willwashburn willwashburn merged commit 131e31e into main May 26, 2026
1 of 2 checks passed
@willwashburn willwashburn deleted the claude/burn-430-span-tree-foundation branch May 26, 2026 11:12
willwashburn pushed a commit that referenced this pull request May 26, 2026
In-scope #452 fixes (A-G):

- `burn overhead deltas` now honors `--since` (A): thread the parent
  overhead args' `since` into `ContextDeltaOpts`, parse relative ranges
  (`24h`/`7d`/`4w`/`2m`) into `Duration`, and use that to scope the
  session-enumeration `query_turns` seed inside the SDK so the
  `None`-session path no longer walks every historical session.
- Make the per-rail and cross-session `ContextDelta` sort fully
  deterministic across HashMap iteration order (B): chain
  `owner_rail` then `session_id` as final tie-breakers.
- UTF-8-safe `short_turn_label` / `short_agent_label` (C): switch from
  byte slicing to `chars().take(8)` so multi-byte ids never panic.
- `format_signed_tokens` preserves the negative sign (D): emit `-`
  for negative deltas instead of dropping it.
- Sort compactions with `sort_by_cached_key` (E) so `parse_iso_ms`
  runs once per element rather than once per comparison.
- Dedup four copies of `parse_iso_ms` (F) into
  `crates/relayburn-sdk/src/util/time.rs`; the analyze/context_delta,
  query_verbs, reader/claude, and reader/codex copies now share one
  implementation.
- `read_jsonl_values` streams via `BufReader::lines()` (G) rather
  than reading the entire file into memory.
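
As an illustration of fix (C), a sketch of a character-based label helper. The name `short_label` is a hypothetical stand-in for `short_turn_label` / `short_agent_label`, whose real signatures are not shown in this PR.

```rust
/// Takes the first 8 characters (not bytes) of an id, so a multi-byte
/// character that straddles byte offset 8 cannot cause a slicing panic.
pub fn short_label(id: &str) -> String {
    id.chars().take(8).collect()
}

/// Byte-based variant for comparison: `str::get` returns None when offset 8
/// is not a character boundary, where `&id[..8]` would panic instead.
pub fn short_label_bytes(id: &str) -> Option<&str> {
    id.get(..8)
}
```

For an id such as `"abcdefgéh"`, byte offset 8 falls inside the two-byte `é`, so the byte-based variant yields `None` while the character-based one returns `"abcdefgé"`.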

Foundation fixes (carried in #452's diff; will also land on #451):

- Propagate `output_truncated` on `ToolResult` span nodes (H) in
  both the Claude and Codex builders so downstream consumers can
  flag truncated tool outputs.
- Propagate `ToolResult` error status to the parent `ToolUse` (I, J)
  in both builders — the runtime tool_result is the ground truth,
  not the assistant row's `is_error` hint.
- Don't drop subagents whose `paired_tool_use_id` doesn't match any
  ToolUse in the turn (K): drain leftover `paired_subagents` after
  the inference walk and surface them as `unattached` siblings under
  the turn root.
- Stop swallowing real ledger-read failures (M): replace the blanket
  `unwrap_or_default()` on `query_inferences` /
  `query_tool_result_events` with a `match` that tolerates only the
  pre-schema "no such table/column" class of error and propagates
  every other failure.
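
A hedged sketch of the pattern in (M). The `LedgerError` type, the string-matching predicate, and the `rows_or_empty` helper are all hypothetical; the real code may inspect a structured error rather than message text.

```rust
/// Hypothetical error type carrying only a message.
#[derive(Debug, PartialEq)]
pub struct LedgerError(pub String);

/// True only for the "table or column not created yet" class of error.
pub fn is_pre_schema_error(err: &LedgerError) -> bool {
    let msg = err.0.to_lowercase();
    msg.contains("no such table") || msg.contains("no such column")
}

/// Treat pre-schema errors as "no rows"; propagate every other failure
/// instead of collapsing it into an empty result.
pub fn rows_or_empty<T>(result: Result<Vec<T>, LedgerError>) -> Result<Vec<T>, LedgerError> {
    match result {
        Ok(rows) => Ok(rows),
        Err(e) if is_pre_schema_error(&e) => Ok(Vec::new()),
        Err(e) => Err(e),
    }
}
```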

Tests: `cargo test --workspace` (zero failures, +2 new tests in the
`query_verbs` mod for the since filter and helper). Zero warnings on
`cargo build --workspace`.
willwashburn added a commit that referenced this pull request May 26, 2026
* sdk/cli: per-inference context-delta attribution (#432)

New `burn overhead deltas` verb answers "what blew up my context
between inference N and inference N+1?" by walking each session's
TurnSpanTree timeline, pairing same-rail Inference spans, and
attributing the delta in `tokens.input + cache_read + cache_write`
to the intervening ToolResult / UserPrompt / SystemReminder leaves.

SDK surface: `LedgerHandle::context_delta(ContextDeltaOpts)` returns
`Vec<ContextDelta>` with per-step intervening breakdown, attributed
cost (charged at cache_read rate — what the *future* will pay for
the persisted prefix), and compaction events surfaced as their own
row rather than negative deltas. Main-rail deltas never see subagent
tool_results and vice versa.

Tool-result token estimates use `output_bytes / 4` as a first-cut
fallback; documented as approximate in the output.
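
The arithmetic behind the delta and the fallback estimate can be sketched as below. The struct and function names are hypothetical, and this ignores rails, attribution, and compaction handling; it only shows the `tokens.input + cache_read + cache_write` sum and the `output_bytes / 4` estimate named above.

```rust
/// Hypothetical per-inference token counts.
#[derive(Debug, Clone, Copy)]
pub struct InferenceTokens {
    pub input: i64,
    pub cache_read: i64,
    pub cache_write: i64,
}

/// Context size of one inference: input + cache_read + cache_write.
pub fn context_size(t: &InferenceTokens) -> i64 {
    t.input + t.cache_read + t.cache_write
}

/// Raw deltas between consecutive inferences on one rail. A negative value
/// here is what the verb reports as a compaction row instead.
pub fn context_deltas(inferences: &[InferenceTokens]) -> Vec<i64> {
    inferences
        .windows(2)
        .map(|pair| context_size(&pair[1]) - context_size(&pair[0]))
        .collect()
}

/// First-cut token estimate for a tool result: roughly four bytes per token.
pub fn estimate_tool_result_tokens(output_bytes: u64) -> u64 {
    output_bytes / 4
}
```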

CLI: `burn overhead deltas [--session ID] [--top N] [--min-delta TOK]
[--owner main|subagent|all] [--explain] [--json]`. Default top is 20,
default min_delta is 1000 tokens. Compaction rows ignore min_delta so
they always surface.

Tests: 9 unit tests covering the Bash blow-up driver path, compaction-
replaces-negative-delta, subagent isolation, owner filter, top cap,
min_delta filter, single-inference no-op, and JSON wire format. Two
golden snapshots (`overhead-deltas`, `overhead-deltas-json`) anchor
the CLI output against the fixture ledger.

* fix(context-delta): address review feedback on PR #452

In-scope #452 fixes (A-G):

- `burn overhead deltas` now honors `--since` (A): thread the parent
  overhead args' `since` into `ContextDeltaOpts`, parse relative ranges
  (`24h`/`7d`/`4w`/`2m`) into `Duration`, and use that to scope the
  session-enumeration `query_turns` seed inside the SDK so the
  `None`-session path no longer walks every historical session.
- Make the per-rail and cross-session `ContextDelta` sort fully
  deterministic across HashMap iteration order (B): chain
  `owner_rail` then `session_id` as final tie-breakers.
- UTF-8-safe `short_turn_label` / `short_agent_label` (C): switch from
  byte slicing to `chars().take(8)` so multi-byte ids never panic.
- `format_signed_tokens` preserves the negative sign (D): emit `-`
  for negative deltas instead of dropping it.
- Sort compactions with `sort_by_cached_key` (E) so `parse_iso_ms`
  runs once per element rather than once per comparison.
- Dedup four copies of `parse_iso_ms` (F) into
  `crates/relayburn-sdk/src/util/time.rs`; the analyze/context_delta,
  query_verbs, reader/claude, and reader/codex copies now share one
  implementation.
- `read_jsonl_values` streams via `BufReader::lines()` (G) rather
  than reading the entire file into memory.

Foundation fixes (carried in #452's diff; will also land on #451):

- Propagate `output_truncated` on `ToolResult` span nodes (H) in
  both the Claude and Codex builders so downstream consumers can
  flag truncated tool outputs.
- Propagate `ToolResult` error status to the parent `ToolUse` (I, J)
  in both builders — the runtime tool_result is the ground truth,
  not the assistant row's `is_error` hint.
- Don't drop subagents whose `paired_tool_use_id` doesn't match any
  ToolUse in the turn (K): drain leftover `paired_subagents` after
  the inference walk and surface them as `unattached` siblings under
  the turn root.
- Stop swallowing real ledger-read failures (M): replace the blanket
  `unwrap_or_default()` on `query_inferences` /
  `query_tool_result_events` with a `match` that tolerates only the
  pre-schema "no such table/column" class of error and propagates
  every other failure.

Tests: `cargo test --workspace` (zero failures, +2 new tests in the
`query_verbs` mod for the since filter and helper). Zero warnings on
`cargo build --workspace`.

---------

Co-authored-by: Claude <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

sdk: introduce per-turn span tree as analytical primitive

2 participants