overhead: per-inference context-delta attribution (#432) by willwashburn · Pull Request #452 · AgentWorkforce/burn

willwashburn · 2026-05-26T11:42:16Z

Closes #432.

Depends on #451 (span tree foundation, #430) being in main first. Branch is currently based on claude/burn-430-span-tree-foundation; will rebase mechanically onto main once #451 lands.

Summary

New burn overhead deltas --session <id> verb. For each inference in a session, computes the context_tokens delta from the prior same-owner inference, attributes the delta to intervening tool_results / user_prompts / system_reminders / compaction events. Answers "what blew up my context between inference 5 and inference 6".

Pure derivation from span trees + content sidecars. No DB writes, no schema changes.
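The pairing step can be sketched in a few lines. This is a minimal sketch, not the SDK's `deltas_for_session`. It assumes inferences are already flattened into timeline order and tagged with an owner-rail key. The `Inference` struct, the `RawDelta` alias, and `pair_deltas` are hypothetical names used only for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical flattened inference, already in timeline order.
#[derive(Debug, Clone)]
pub struct Inference {
    pub idx: usize,
    /// Owner rail key, e.g. "main" or "subagent:<agent_id>".
    pub owner: String,
    pub context_tokens: u64,
}

/// (inference_idx, prior_context_tokens, current_context_tokens, delta_tokens)
pub type RawDelta = (usize, u64, u64, i64);

/// Pairs each inference with the previous inference on the same owner
/// rail. The first inference on a rail has no prior, so it yields no row.
pub fn pair_deltas(timeline: &[Inference]) -> Vec<RawDelta> {
    let mut last_on_rail: HashMap<&str, u64> = HashMap::new();
    let mut out = Vec::new();
    for inf in timeline {
        // `insert` returns the previous value stored for this rail, if any.
        if let Some(prior) = last_on_rail.insert(inf.owner.as_str(), inf.context_tokens) {
            let delta = inf.context_tokens as i64 - prior as i64;
            out.push((inf.idx, prior, inf.context_tokens, delta));
        }
    }
    out
}
```

A per-rail map of the last seen `context_tokens` is enough because rails never share a prior. The first inference on any rail produces no row, which lines up with the single-inference graceful handling listed in the test plan.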

SDK

pub fn LedgerHandle::context_delta(&self, opts: ContextDeltaOpts) -> Result<Vec<ContextDelta>>;

ContextDelta carries session_id, turn_id, inference_idx, owner_rail (Main | Subagent(agent_id)), prior_context_tokens, current_context_tokens, delta_tokens, intervening: Vec<InterveningStep>, and attributed_cost_usd.

InterveningStep variants: ToolResult { tool_use_id, tool_name, approx_tokens, approx_bytes, truncated }, UserPrompt, SystemReminder (reserved — see approximations below), Compaction { tokens_freed }, Other.

CLI

burn overhead deltas
    [--session <id>]
    [--since <duration>]         # default 24h
    [--top N]                    # default 20
    [--min-delta TOKENS]         # default 1000
    [--owner main|subagent|all]  # default all
    [--json]
    [--explain]                  # expand to all intervening steps per delta

Human render: a table with columns Inference | Owner | Δtokens | Δcost | Driver, where Driver is the largest intervening step (or "N steps" with --explain). JSON: the full Vec<ContextDelta>.
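The following is a minimal sketch of how the Driver cell could be chosen. It assumes that a compaction step outranks everything else (the walkthrough mentions tests for compaction prioritization) and that otherwise the largest step by token count wins. `Step`, `step_tokens`, `driver_label`, and the label strings are hypothetical. The real renderer in commands/overhead.rs may format differently.

```rust
/// Hypothetical mirror of the InterveningStep variants named in the PR.
/// The PR does not describe payloads for UserPrompt / SystemReminder /
/// Other, so here they only carry an approximate token count.
#[derive(Debug, Clone, PartialEq)]
pub enum Step {
    ToolResult { tool_name: String, approx_tokens: u64, truncated: bool },
    UserPrompt { approx_tokens: u64 },
    SystemReminder { approx_tokens: u64 },
    Compaction { tokens_freed: u64 },
    Other { approx_tokens: u64 },
}

fn step_tokens(step: &Step) -> u64 {
    match step {
        Step::ToolResult { approx_tokens, .. }
        | Step::UserPrompt { approx_tokens }
        | Step::SystemReminder { approx_tokens }
        | Step::Other { approx_tokens } => *approx_tokens,
        Step::Compaction { tokens_freed } => *tokens_freed,
    }
}

/// Driver column: a compaction wins outright; otherwise the step with the
/// most tokens. With --explain the cell just counts the steps.
pub fn driver_label(steps: &[Step], explain: bool) -> String {
    if explain {
        return format!("{} steps", steps.len());
    }
    let driver = steps
        .iter()
        .find(|s| matches!(s, Step::Compaction { .. }))
        .or_else(|| steps.iter().max_by_key(|s| step_tokens(s)));
    match driver {
        None => "-".to_string(),
        Some(Step::ToolResult { tool_name, approx_tokens, truncated }) => {
            let mark = if *truncated { " (truncated)" } else { "" };
            format!("{tool_name} ~{approx_tokens} tok{mark}")
        }
        Some(Step::UserPrompt { approx_tokens }) => format!("user prompt ~{approx_tokens} tok"),
        Some(Step::SystemReminder { approx_tokens }) => {
            format!("system reminder ~{approx_tokens} tok")
        }
        Some(Step::Other { approx_tokens }) => format!("other ~{approx_tokens} tok"),
        Some(Step::Compaction { tokens_freed }) => format!("compaction freed {tokens_freed} tok"),
    }
}
```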

Key decisions

Cost rate: attributed_cost_usd = max(delta_tokens, 0) * curr.model.cache_read / 1,000,000 (cache_read is priced per million tokens). Rationale (documented inline): cache_read is what every future inference pays for the persisted prefix the delta added. cache_write is paid once; cache_read is the steady-state cost the user keeps paying. This matches the recommendation in the issue's open question #3. Cost is 0.0 for models the pricing table doesn't recognize, matching the rest of analyze/.
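Here is the cost rule as a function, with a worked example. A 12,000-token jump at an illustrative cache_read rate of $0.30 per million tokens is attributed 12,000 * 0.30 / 1,000,000 = $0.0036. `ModelPrice` and `attributed_cost_usd` are hypothetical stand-ins for the SDK's pricing lookup, and `None` models a model the pricing table doesn't recognize.

```rust
/// Hypothetical pricing entry; rates are USD per million tokens.
pub struct ModelPrice {
    pub cache_read: f64,
}

/// attributed_cost_usd = max(delta_tokens, 0) * cache_read / 1_000_000.
/// Shrinking deltas cost nothing; unknown models (None) cost 0.0.
pub fn attributed_cost_usd(delta_tokens: i64, price: Option<&ModelPrice>) -> f64 {
    match price {
        Some(p) => delta_tokens.max(0) as f64 * p.cache_read / 1_000_000.0,
        None => 0.0,
    }
}
```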

Subagent rail isolation: each Subagent span flips the active owner rail to OwnerRail::Subagent { agent_id } for its entire subtree during DFS. Leaves under a Task fanout never enter the main-rail timeline. Tested directly by subagent_isolation_main_rail_excludes_subagent_results.
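The following sketch shows the rail flip during DFS, using a cut-down, hypothetical `Span` tree (the real span tree has more node kinds and attributes). Every leaf is tagged with the rail of its nearest `Subagent` ancestor, or `Main` if there is none. As a result, filtering for `Main` never picks up a tool result from inside a subagent subtree.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum OwnerRail {
    Main,
    Subagent { agent_id: String },
}

/// Hypothetical, cut-down span tree.
pub enum Span {
    Turn { children: Vec<Span> },
    Subagent { agent_id: String, children: Vec<Span> },
    Inference { idx: usize },
    ToolResult { tool_use_id: String },
}

#[derive(Debug, Clone, PartialEq)]
pub enum Leaf {
    Inference(usize),
    ToolResult(String),
}

/// Depth-first walk that tags every leaf with the rail active above it.
pub fn timeline(root: &Span) -> Vec<(OwnerRail, Leaf)> {
    let mut out = Vec::new();
    walk(root, &OwnerRail::Main, &mut out);
    out
}

fn walk(span: &Span, rail: &OwnerRail, out: &mut Vec<(OwnerRail, Leaf)>) {
    match span {
        Span::Turn { children } => {
            for child in children {
                walk(child, rail, out);
            }
        }
        Span::Subagent { agent_id, children } => {
            // Flip the rail for this span's entire subtree.
            let sub = OwnerRail::Subagent { agent_id: agent_id.clone() };
            for child in children {
                walk(child, &sub, out);
            }
        }
        Span::Inference { idx } => out.push((rail.clone(), Leaf::Inference(*idx))),
        Span::ToolResult { tool_use_id } => {
            out.push((rail.clone(), Leaf::ToolResult(tool_use_id.clone())))
        }
    }
}
```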

Compaction handling: when delta < 0 and a CompactionEvent sits between prev and curr, surface as Compaction { tokens_freed } instead of a negative delta. Compaction rows bypass min_delta filtering so a small compaction never silently vanishes. Tested by compaction_replaces_negative_delta.
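The following is a minimal sketch of the compaction rule and the `--min-delta` bypass. It makes two assumptions. First, a compaction "sits between" two inferences when its timestamp falls in (prev, curr]. Second, `tokens_freed` is derived from the negative delta, since the PR does not say whether it comes from the delta or from the `CompactionEvent` itself. `Row`, `classify`, and `keep` are hypothetical.

```rust
#[derive(Debug, PartialEq)]
pub enum Row {
    Delta { idx: usize, delta_tokens: i64 },
    Compaction { idx: usize, tokens_freed: u64 },
}

/// A negative delta with a compaction timestamp in (prev_ms, curr_ms]
/// becomes a Compaction row; everything else stays a signed Delta.
pub fn classify(
    idx: usize,
    prior: u64,
    current: u64,
    prev_ms: i64,
    curr_ms: i64,
    compaction_ms: &[i64],
) -> Row {
    let delta = current as i64 - prior as i64;
    let compacted = compaction_ms.iter().any(|&t| t > prev_ms && t <= curr_ms);
    if delta < 0 && compacted {
        // Safe subtraction: delta < 0 implies prior > current.
        Row::Compaction { idx, tokens_freed: prior - current }
    } else {
        Row::Delta { idx, delta_tokens: delta }
    }
}

/// --min-delta filter; compaction rows always survive.
pub fn keep(row: &Row, min_delta: i64) -> bool {
    match row {
        Row::Compaction { .. } => true,
        Row::Delta { delta_tokens, .. } => *delta_tokens >= min_delta,
    }
}
```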

Known approximations (documented in source + output)

Tool-result token estimates use output_bytes / 4 — no real tokenizer pass yet
All <system-reminder> content is classified as ReminderSource::Other in this first cut; relaycast/harness classification is deferred to #425 (overhead: account for <system-reminder> injection cost (relaycast, harness, …)).
SystemReminder InterveningStep variant is reserved (#[allow(dead_code)]) until the span-tree builder synthesizes those leaves

Test plan

cargo test --workspace — 891 passed, 0 failed, zero warnings on build
BURN_GOLDEN=1 cargo test --test golden — passing including 2 new snapshots (overhead-deltas, overhead-deltas-json)
9 SDK unit tests in context_delta.rs covering: known >10k jump via Bash result, compaction replaces negative delta, subagent rail isolation, single-inference graceful handling, --min-delta filter, --top cap
4 CLI unit tests in overhead.rs covering renderer output shape

Out of scope

Real tokenizer (use bytes/4 fallback; documented)
Relaycast/harness source detection (#425: overhead: account for <system-reminder> injection cost (relaycast, harness, …))
MCP tool — separate follow-up
One-line "Largest context jump" entry in burn summary — separate follow-up
#[non_exhaustive]

Files

New (1): crates/relayburn-sdk/src/analyze/context_delta.rs
Modified: analyze.rs, lib.rs, query_verbs.rs, cli.rs (deltas args), commands/overhead.rs (runner + renderers) (5 source files), plus CHANGELOG.md, 2 new golden snapshots, and an invocations entry.

Generated by Claude Code

coderabbitai · 2026-05-26T11:42:27Z

Caution

Review failed

Failed to post review comments

📝 Walkthrough

This PR implements per-inference context-window delta attribution by adding a shared ISO-8601 timestamp parser, improving span-tree builders with error handling and truncation tracking, introducing a context-delta analysis algorithm that pairs consecutive inferences and attributes intervening steps (tool results, user prompts, compactions), exposing SDK query verbs, and delivering a new burn overhead deltas CLI command with human-readable and JSON output formats.

Changes

Per-inference context-delta attribution pipeline

Layer / File(s)	Summary
Shared ISO-8601 timestamp parser `crates/relayburn-sdk/src/util.rs`, `crates/relayburn-sdk/src/util/time.rs`	New `parse_iso_ms` utility converts ISO-8601 timestamps to Unix milliseconds, replacing duplicate local implementations across span-tree readers with a centralized, tested parser.
Span-tree builder improvements `crates/relayburn-sdk/src/reader/claude/span_tree.rs`, `crates/relayburn-sdk/src/reader/codex/span_tree.rs`, `crates/relayburn-sdk/src/query_verbs.rs`	Builders switch to shared timestamp parser, emit orphan subagents with `unattached` flag (Claude), propagate tool-result errors to parent ToolUse spans, record `output_truncated` attributes. Query verbs add schema-missing fail-soft logic and refactor JSONL streaming.
Context-delta attribution algorithm `crates/relayburn-sdk/src/analyze/context_delta.rs`	New module implements `deltas_for_session` that builds DFS timelines, pairs consecutive inferences per rail, collects intervening steps (ToolResult, UserPrompt, SystemReminder, Compaction), derives token deltas, handles negative deltas via compaction replacement/clamping, computes attributed USD cost, and filters/sorts/truncates output. Includes extensive unit tests.
SDK API surface `crates/relayburn-sdk/src/analyze.rs`, `crates/relayburn-sdk/src/lib.rs`	Declares `pub mod context_delta` and re-exports core types and functions. Root lib.rs adds `util` module and context-delta re-exports with type aliases for ergonomic imports.
Context-delta query verb `crates/relayburn-sdk/src/query_verbs.rs`	Adds `LedgerHandle::context_delta` method and free-function, selecting session scopes, loading span-trees and compactions, computing deltas, and cross-session sorting. Includes filtering and since-window tests.
CLI overhead deltas arguments and routing `crates/relayburn-cli/src/cli.rs`	Extends `OverheadAction` with `Deltas(OverheadDeltasArgs)` variant. Adds `OverheadDeltasArgs` struct with `--session`, `--top`, `--min-delta`, `--owner`, `--explain`, `--json` flags and `OverheadDeltasOwner` enum with SDK filter mapping.
Overhead deltas command and rendering `crates/relayburn-cli/src/commands/overhead.rs`	Implements `run_deltas` handler building `ContextDeltaOpts`, invoking SDK verb, and dispatching to output renderers. Provides `render_human_deltas` table with Inference/Owner/Δ/Cost/Driver columns, optional explain sections, and helper functions for label formatting, driver selection, and step stringification. Unit tests cover label trimming, compaction prioritization, and token formatting.
Documentation and integration tests `CHANGELOG.md`, `tests/fixtures/cli-golden/invocations.json`, `tests/fixtures/cli-golden/snapshots/overhead-deltas*.stdout.txt`	Documents new CLI flags and SDK API. Adds golden-fixture invocations for human and JSON output variants. Provides snapshot outputs showing expected table format and JSON structure with realistic token/cost values and approximation notes.
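The walkthrough's "schema-missing fail-soft logic", together with CodeRabbit's later request not to swallow real read errors with `unwrap_or_default()`, amounts to a small filter over the `Result`. The sketch below assumes a hypothetical `LedgerError` enum that can distinguish a missing table or column from other failures. The SDK's real error type is not shown in this PR.

```rust
/// Hypothetical ledger error; the real SDK error type is not shown in the PR.
#[derive(Debug, PartialEq)]
pub enum LedgerError {
    /// Table or column absent because the ledger predates the schema.
    SchemaMissing(String),
    Io(String),
}

pub fn is_schema_missing(err: &LedgerError) -> bool {
    matches!(err, LedgerError::SchemaMissing(_))
}

/// Fall back to an empty result only for pre-schema ledgers; every other
/// read error is propagated to the caller.
pub fn or_default_if_schema_missing<T: Default>(
    r: Result<T, LedgerError>,
) -> Result<T, LedgerError> {
    match r {
        Err(e) if is_schema_missing(&e) => Ok(T::default()),
        other => other,
    }
}
```

At a call site this would wrap each query, e.g. `or_default_if_schema_missing(self.inner.query_inferences(&session_q))?`, assuming the query methods return this error type.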

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

AgentWorkforce/burn#432: This PR implements the complete per-inference context-delta attribution feature, including the burn overhead deltas CLI, SDK algorithm, intervening-step attribution, and compaction handling specified in the issue.

Possibly related PRs

AgentWorkforce/burn#309: The main PR implements the previously stubbed burn overhead command, adding the OverheadAction::Deltas path and run_deltas handler that builds on the CLI scaffold.
AgentWorkforce/burn#312: Both PRs extend the shared burn overhead CLI wiring in crates/relayburn-cli/src/cli.rs and command module, with this PR introducing the new OverheadAction::Deltas variant.

Poem

🐰 With careful hop through inference trees,
We trace the tokens, find the keys—
Which tool result made context bloom?
Context deltas light the room,
Per-step accounting, clear and bright! ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title 'overhead: per-inference context-delta attribution (`#432`)' clearly and concisely summarizes the main change: adding per-inference context-delta attribution functionality to the overhead command.
Description check	✅ Passed	The PR description is comprehensive and directly related to the changeset, providing implementation details, architecture decisions, test coverage, and approximations used throughout the changes.
Linked Issues check	✅ Passed	The PR successfully implements all major objectives from issue `#432`: per-inference context delta computation via `LedgerHandle::context_delta()`, intervening step attribution, cost calculation using cache_read rate, compaction handling with min_delta bypass, subagent rail isolation, and both human and JSON CLI outputs with comprehensive test coverage.
Out of Scope Changes check	✅ Passed	The PR introduces comprehensive changes to support context-delta functionality including new SDK APIs, CLI commands, span-tree improvements for error tracking, and refactored shared timestamp parsing, all directly supporting the core feature. The refactoring of duplicate `parse_iso_ms` implementations into shared utilities is a necessary supporting change.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.


gemini-code-assist

Code Review

This pull request introduces per-inference context-window delta attribution and per-turn span trees as derived analytical primitives. It adds the burn overhead deltas CLI command (and corresponding SDK entry point LedgerHandle::context_delta) to attribute context-window growth to intervening steps like tool results, user prompts, system reminders, or compactions. It also implements per-turn span tree builders for Claude Code and Codex rollouts to project hierarchical turn structures. The reviewer feedback highlights several important improvement opportunities: preventing potential panics from byte-slicing UTF-8 strings in label helpers, fixing a formatting bug for negative tokens in format_signed_tokens, optimizing sorting performance using sort_by_cached_key, centralizing the duplicated parse_iso_ms utility, and improving memory efficiency when reading JSONL files by utilizing a BufReader.

gemini-code-assist · 2026-05-26T11:43:43Z

+fn short_turn_label(turn_id: &str) -> String {
+    // Turn ids on Claude are `msg-...` UUIDs; trim to a short prefix
+    // for the table. Keep the original for JSON output.
+    let trimmed = turn_id.trim_start_matches("msg_");
+    let trimmed = trimmed.trim_start_matches("msg-");
+    if trimmed.len() > 8 {
+        format!("T{}", &trimmed[..8])
+    } else {
+        format!("T{trimmed}")
+    }
+}


The short_turn_label function uses string slicing &trimmed[..8] which can panic if the 8th byte is not a character boundary (e.g., if the string contains multi-byte UTF-8 characters). Using .chars().take(8).collect() is safer and avoids potential panics.

fn short_turn_label(turn_id: &str) -> String {
    // Turn ids on Claude are `msg-...` UUIDs; trim to a short prefix
    // for the table. Keep the original for JSON output.
    let trimmed = turn_id.trim_start_matches("msg_").trim_start_matches("msg-");
    let short: String = trimmed.chars().take(8).collect();
    format!("T{short}")
}

Fixed in cf3c61e — short_turn_label now uses chars().take(8).collect() so multi-byte UTF-8 ids never panic on a mid-byte cut.

Generated by Claude Code
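The panic described in these label comments is easy to reproduce with `str::get`, which returns `None` where `&s[..n]` would panic. If the allocation from `collect()` ever matters, `char_indices` offers a borrowing alternative. `prefix_chars` is a hypothetical helper, not part of the PR. Like `chars().take(8)`, it counts Unicode scalar values, not user-visible grapheme clusters.

```rust
/// Borrow at most `n` leading chars of `s` without allocating. Byte slicing
/// (`&s[..n]`) panics when byte offset `n` falls inside a multi-byte char;
/// `char_indices` only yields offsets that are char boundaries.
pub fn prefix_chars(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        Some((byte_idx, _)) => &s[..byte_idx],
        None => s,
    }
}
```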

gemini-code-assist · 2026-05-26T11:43:43Z

+fn short_agent_label(agent_id: &str) -> String {
+    let trimmed = agent_id.trim_start_matches("agent-");
+    if trimmed.len() > 8 {
+        trimmed[..8].to_string()
+    } else {
+        trimmed.to_string()
+    }
+}


The short_agent_label function uses string slicing trimmed[..8] which can panic if the 8th byte is not a character boundary. Using .chars().take(8).collect() is safer and avoids potential panics.

fn short_agent_label(agent_id: &str) -> String {
    let trimmed = agent_id.trim_start_matches("agent-");
    trimmed.chars().take(8).collect()
}

Fixed in cf3c61e — short_agent_label now uses chars().take(8).collect() (same fix as short_turn_label).

Generated by Claude Code

gemini-code-assist · 2026-05-26T11:43:43Z

+fn format_signed_tokens(n: i64) -> String {
+    let sign = if n > 0 { "+" } else { "" };
+    format!("{sign}{}", format_tokens(n.unsigned_abs()))
+}


The format_signed_tokens function does not handle negative numbers correctly. If n is negative, sign is set to "" and n.unsigned_abs() is formatted, resulting in a positive string representation (e.g., -500 becomes "500"). It should format with a "-" sign for negative numbers.

fn format_signed_tokens(n: i64) -> String {
    let sign = if n > 0 { "+" } else if n < 0 { "-" } else { "" };
    format!("{sign}{}", format_tokens(n.unsigned_abs()))
}

Fixed in cf3c61e — format_signed_tokens now emits - for negative deltas instead of dropping the sign.

Generated by Claude Code
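Below is a sketch of the fixed formatter, with a test for the edge case the fix quietly relies on: `n.unsigned_abs()` is defined for `i64::MIN`, whereas `n.abs()` would overflow (and panic in debug builds). The real `format_tokens` is not shown in the PR, so the version here, which adds plain thousands separators, is a hypothetical stand-in.

```rust
/// Hypothetical stand-in for the CLI's format_tokens helper (its real
/// output format is not shown in the PR): thousands separators.
fn format_tokens(n: u64) -> String {
    let digits = n.to_string();
    let mut out = String::with_capacity(digits.len() + digits.len() / 3);
    for (i, ch) in digits.chars().enumerate() {
        // Insert a comma whenever the remaining digit count is a multiple of 3.
        if i > 0 && (digits.len() - i) % 3 == 0 {
            out.push(',');
        }
        out.push(ch);
    }
    out
}

/// `unsigned_abs` (not `abs`) so i64::MIN cannot overflow.
pub fn format_signed_tokens(n: i64) -> String {
    let sign = if n > 0 { "+" } else if n < 0 { "-" } else { "" };
    format!("{sign}{}", format_tokens(n.unsigned_abs()))
}
```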

gemini-code-assist · 2026-05-26T11:43:43Z

+    let mut compactions_sorted: Vec<&CompactionEvent> = compactions.iter().collect();
+    compactions_sorted.sort_by_key(|c| parse_iso_ms(&c.ts).unwrap_or(0));


The sort_by_key method calls the key-extraction closure (and therefore parse_iso_ms) on every comparison, i.e. $O(N \log N)$ times. Since parse_iso_ms does relatively expensive string parsing and calendar arithmetic, sort_by_cached_key is more efficient: it evaluates the key exactly once per element ($O(N)$ calls).

Suggested change

let mut compactions_sorted: Vec<&CompactionEvent> = compactions.iter().collect();

compactions_sorted.sort_by_key(|c| parse_iso_ms(&c.ts).unwrap_or(0));

let mut compactions_sorted: Vec<&CompactionEvent> = compactions.iter().collect();

compactions_sorted.sort_by_cached_key(|c| parse_iso_ms(&c.ts).unwrap_or(0));

Fixed in cf3c61e — switched to sort_by_cached_key so parse_iso_ms runs once per element instead of once per comparison.

Generated by Claude Code
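A small experiment makes the difference visible by counting key-closure calls. The string-to-integer parse stands in for `parse_iso_ms`, and `sort_and_count` is a hypothetical helper. The exact call count for `sort_by_key` depends on the standard library's sort implementation, so the test only checks that it exceeds the one-call-per-element count of `sort_by_cached_key`.

```rust
use std::cell::Cell;

/// Sorts timestamp-like strings by a parsed key and reports how many times
/// the key closure ran. The integer parse stands in for parse_iso_ms.
pub fn sort_and_count(mut items: Vec<&str>, cached: bool) -> (Vec<&str>, usize) {
    let calls = Cell::new(0usize);
    let key = |s: &&str| {
        calls.set(calls.get() + 1);
        s.parse::<i64>().unwrap_or(0)
    };
    if cached {
        items.sort_by_cached_key(key); // key computed once per element
    } else {
        items.sort_by_key(key); // key recomputed for each comparison
    }
    (items, calls.get())
}
```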

gemini-code-assist · 2026-05-26T11:43:43Z

+    }
+}
+
+/// ISO-8601 -> Unix-ms. Mirror of the parsers elsewhere in the SDK; kept


The parse_iso_ms function is duplicated 4 times across different modules in this PR alone (e.g., in context_delta.rs, query_verbs.rs, claude/span_tree.rs, and codex/span_tree.rs). Since all these modules are within the same relayburn-sdk crate, they should share a single internal utility function (e.g., in a shared crate::utils or crate::reader::utils module) to improve maintainability and reduce code duplication.

Fixed in cf3c61e — moved the parser to a single pub(crate) helper at crates/relayburn-sdk/src/util/time.rs. The four ex-copies (analyze/context_delta.rs, query_verbs.rs, reader/claude/span_tree.rs, reader/codex/span_tree.rs) now all use crate::util::time::parse_iso_ms.

Generated by Claude Code
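The sketch below shows one possible shape for such a shared helper. It is an assumption, not the code in util/time.rs. It accepts only UTC timestamps of the form `YYYY-MM-DDTHH:MM:SS[.fraction]Z`, truncates the fraction to milliseconds, and returns `Option<i64>`, which is consistent with the `.unwrap_or(0)` at the call site quoted earlier. Dates are converted with Howard Hinnant's public-domain `days_from_civil` algorithm. Offsets such as `+00:00` are rejected here; the real helper may accept them.

```rust
/// Days from 1970-01-01 to a proleptic-Gregorian date
/// (Howard Hinnant's days_from_civil algorithm).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y }; // years start in March
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400); // year of era, [0, 399]
    let mp = (m + 9) % 12; // March = 0, ..., February = 11
    let doy = (153 * mp + 2) / 5 + d - 1; // day of year, [0, 365]
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era, [0, 146096]
    era * 146_097 + doe - 719_468 // shift so 1970-01-01 is day 0
}

/// Parses `YYYY-MM-DDTHH:MM:SS[.fff...]Z` (UTC only) to Unix milliseconds.
/// Day-of-month is only range-checked to 1..=31.
pub fn parse_iso_ms(ts: &str) -> Option<i64> {
    let (date, time) = ts.strip_suffix('Z')?.split_once('T')?;
    let mut ymd = date.split('-').map(|p| p.parse::<i64>().ok());
    let (y, m, d) = (ymd.next()??, ymd.next()??, ymd.next()??);
    if ymd.next().is_some() || !(1..=12).contains(&m) || !(1..=31).contains(&d) {
        return None;
    }
    let (hms, frac) = time.split_once('.').unwrap_or((time, ""));
    let mut parts = hms.split(':').map(|p| p.parse::<i64>().ok());
    let (h, mi, s) = (parts.next()??, parts.next()??, parts.next()??);
    if parts.next().is_some()
        || !(0..=23).contains(&h)
        || !(0..=59).contains(&mi)
        || !(0..=60).contains(&s)
    {
        return None;
    }
    if !frac.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // First three fraction digits, right-padded: ".5" -> 500 ms.
    let ms = frac
        .bytes()
        .take(3)
        .enumerate()
        .fold(0, |acc, (i, b)| acc + i64::from(b - b'0') * 10_i64.pow(2 - i as u32));
    let secs = days_from_civil(y, m, d) * 86_400 + h * 3_600 + mi * 60 + s;
    Some(secs * 1_000 + ms)
}
```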

gemini-code-assist · 2026-05-26T11:43:43Z

+fn read_jsonl_values(path: &Path) -> Vec<serde_json::Value> {
+    let bytes = match std::fs::read(path) {
+        Ok(b) => b,
+        Err(_) => return Vec::new(),
+    };
+    let text = match std::str::from_utf8(&bytes) {
+        Ok(s) => s,
+        Err(_) => return Vec::new(),
+    };
+    text.lines()
+        .filter_map(|line| {
+            let t = line.trim();
+            if t.is_empty() {
+                None
+            } else {
+                serde_json::from_str::<serde_json::Value>(t).ok()
+            }
+        })
+        .collect()
+}


The read_jsonl_values function reads the entire file into memory at once using std::fs::read and then converts it to a string. For large JSONL files, this can lead to high memory usage or OOM. Reading the file line-by-line using BufReader is much more memory-efficient and robust.

fn read_jsonl_values(path: &Path) -> Vec<serde_json::Value> {
    use std::io::BufRead;
    let file = match std::fs::File::open(path) {
        Ok(f) => f,
        Err(_) => return Vec::new(),
    };
    let reader = std::io::BufReader::new(file);
    reader
        .lines()
        .filter_map(|line| {
            let l = line.ok()?;
            let t = l.trim();
            if t.is_empty() {
                None
            } else {
                serde_json::from_str::<serde_json::Value>(t).ok()
            }
        })
        .collect()
}

Fixed in cf3c61e — read_jsonl_values now opens the file and streams via BufReader::lines() instead of slurping the entire payload into memory.

Generated by Claude Code
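The suggested `lines().filter_map(...)` shape has one caveat. `Lines` keeps yielding `Some(Err(_))` when the underlying read fails, so an error that repeats on every call makes the iterator loop forever. For example, on Linux, reading a `File` that was opened on a directory fails this way. Clippy's `lines_filter_map_ok` lint flags the same hazard for `filter_map(Result::ok)`. The sketch below streams with `read_until` and stops on the first error. It is written against any `BufRead`, so it needs no `serde_json`, and `read_records` is a hypothetical name.

```rust
use std::io::BufRead;

/// Streams newline-delimited records, skipping blank lines, lines that are
/// not valid UTF-8, and lines `parse` rejects. Reading stops at EOF or at
/// the first I/O error, so a read that fails every time cannot loop forever.
pub fn read_records<R: BufRead, T>(mut reader: R, parse: impl Fn(&str) -> Option<T>) -> Vec<T> {
    let mut out = Vec::new();
    let mut buf = Vec::new();
    loop {
        buf.clear();
        match reader.read_until(b'\n', &mut buf) {
            Ok(0) | Err(_) => break,
            Ok(_) => {
                // Invalid UTF-8 skips only this line, not the rest of the file.
                if let Ok(line) = std::str::from_utf8(&buf) {
                    let line = line.trim();
                    if !line.is_empty() {
                        out.extend(parse(line));
                    }
                }
            }
        }
    }
    out
}
```

For the ledger reader, `parse` would be something like `|l| serde_json::from_str::<serde_json::Value>(l).ok()`, and `reader` a `BufReader<File>`.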

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 05f9aa3e6c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-26T11:46:47Z

+                // of `since` — that flag caps output rows, not the
+                // input scan).
+                let mut ids: BTreeSet<String> = BTreeSet::new();
+                let all = self.inner.query_turns(&Query::default())?;


Apply since filtering when collecting context deltas

ContextDeltaOpts includes a time window, but this path always enumerates every session with query_turns(&Query::default()) and never consults opts.since/effective_since, so old sessions can still dominate overhead deltas results even when callers expect a recent window. This makes the top-N output incorrect for bounded-time analysis and breaks the documented since semantics.

Useful? React with 👍 / 👎.

Fixed in cf3c61e — --since now flows from the parent overhead args through run_deltas into ContextDeltaOpts.since, and the SDK seeds the session-enumeration query_turns with a matching since-scoped Query. Sessions whose latest activity falls outside the window are skipped before their span trees get loaded. New regression test: query_verbs::tests::context_delta_since_filter_excludes_old_sessions.

Generated by Claude Code
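The following is a minimal sketch of the session-enumeration filter the fix describes. It assumes turns can be flattened to `(session_id, timestamp_ms)` pairs. Keeping any session with at least one turn at or after the cutoff is equivalent to keeping sessions whose latest activity falls inside the window. `sessions_in_window` is hypothetical; in the SDK the cutoff is applied through a since-scoped `Query` passed to `query_turns`.

```rust
use std::collections::BTreeSet;

/// Hypothetical flattened turn rows: (session_id, turn timestamp in Unix ms).
/// Keeps sessions with at least one turn at or after `now_ms - since_ms`,
/// i.e. sessions whose latest activity falls inside the window.
/// `since_ms = None` means no window.
pub fn sessions_in_window(
    turns: &[(String, i64)],
    now_ms: i64,
    since_ms: Option<i64>,
) -> BTreeSet<String> {
    let cutoff = since_ms.map(|window| now_ms - window);
    turns
        .iter()
        .filter(|(_, ts)| cutoff.map_or(true, |c| *ts >= c))
        .map(|(id, _)| id.clone())
        .collect()
}
```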

chatgpt-codex-connector · 2026-05-26T11:46:47Z

+    if let Some(bytes) = final_event.output_bytes {
+        node.set_attr("output_bytes", AttrValue::Int(bytes as i64));
+    }


Preserve tool-result truncation metadata on span nodes

This mapper copies output_bytes from ToolResultEventRecord but drops output_truncated, so downstream delta attribution never marks intervening tool results as truncated. In practice, overhead deltas --explain can present large tool outputs as fully representative even when ingest detected truncation, which degrades attribution accuracy and operator trust; propagate output_truncated onto the ToolResult span (and mirror the same fix in the Codex builder).

Useful? React with 👍 / 👎.

Fixed in cf3c61e — both the Claude and Codex build_tool_result_node now propagate output_truncated onto the ToolResult span as an AttrValue::Bool attribute. The context-delta InterveningStep::ToolResult.truncated and --explain render already consume it.

Generated by Claude Code
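Below is a sketch of the attribute round-trip, using hypothetical `SpanNode`, `AttrValue`, and `ToolResultEvent` types that only mirror the names in the snippet above. It also assumes `output_truncated` is optional on the record. It swaps the snippet's `bytes as i64` for `i64::try_from`, since an `as` cast silently wraps values above `i64::MAX` to negative numbers.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
pub enum AttrValue {
    Int(i64),
    Bool(bool),
}

/// Hypothetical span node with a string-keyed attribute map.
#[derive(Debug, Default)]
pub struct SpanNode {
    pub attrs: BTreeMap<String, AttrValue>,
}

impl SpanNode {
    pub fn set_attr(&mut self, key: &str, value: AttrValue) {
        self.attrs.insert(key.to_string(), value);
    }
}

/// Hypothetical subset of ToolResultEventRecord.
pub struct ToolResultEvent {
    pub output_bytes: Option<u64>,
    pub output_truncated: Option<bool>,
}

/// Copy both size and truncation metadata onto the ToolResult span.
pub fn apply_tool_result_attrs(node: &mut SpanNode, ev: &ToolResultEvent) {
    if let Some(bytes) = ev.output_bytes {
        // Saturate instead of wrapping for values above i64::MAX.
        node.set_attr("output_bytes", AttrValue::Int(i64::try_from(bytes).unwrap_or(i64::MAX)));
    }
    if let Some(truncated) = ev.output_truncated {
        node.set_attr("output_truncated", AttrValue::Bool(truncated));
    }
}

/// Consumer side: a missing attribute reads as "not truncated".
pub fn is_truncated(node: &SpanNode) -> bool {
    matches!(node.attrs.get("output_truncated"), Some(AttrValue::Bool(true)))
}
```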

coderabbitai

Actionable comments posted: 6

🧹 Nitpick comments (2)

CHANGELOG.md (1)
9-20: ⚡ Quick win

Shorten the new [Unreleased] bullets to impact-first release notes.

These entries are currently too implementation-heavy for the changelog style in this repo. Please condense to one short user-impact bullet per change and remove issue-link/backstory density.

As per coding guidelines: “Changelog entries should be concise and impact-first… Drop issue/PR links, internal review notes, implementation backstory, and ‘foundation for…’ phrasing…”.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@CHANGELOG.md` around lines 9 - 20, The changelog bullets are too
implementation-heavy; rewrite each entry as a single concise, impact-first
sentence and remove PR/issue links and implementation/backstory. For the `burn
overhead deltas` item, replace the paragraph with a short user-facing line like:
"Add context-delta attribution via LedgerHandle::context_delta to show what
changed between consecutive inferences." For the `relayburn-sdk` item, replace
the paragraph with a short user-facing line like: "Add TurnSpanTree projection
APIs (LedgerHandle::turn_span_tree and LedgerHandle::session_span_trees) to
expose per-turn span trees for tooling and analysis." Ensure you drop internal
details (e.g., TurnSpanTree structure, locked attribute list, compaction
behavior, 'orphan' wording) and keep only the user impact and the API symbols.
crates/relayburn-sdk/src/reader/codex/span_tree.rs (1)
276-279: 💤 Low value

Consider adding a wire_str() method to ToolResultStatus for consistency.

Line 75 uses reason.wire_str() for StopReason, but here ToolResultStatus relies on {:?} Debug formatting + lowercase. If the Debug representation changes (e.g., enum variant rename), the wire string silently changes. A dedicated wire_str() method would make the contract explicit.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/relayburn-sdk/src/reader/codex/span_tree.rs` around lines 276 - 279,
final_event.status is being serialized via format!("{:?}").to_ascii_lowercase(),
which couples the wire string to the Debug representation; add a wire_str()
method to the ToolResultStatus enum (similar to StopReason::wire_str()) that
returns the canonical lowercase wire string for each variant, update the code
that sets the node attribute to call final_event.status.wire_str() instead of
using Debug-based formatting, and ensure all other places that serialize
ToolResultStatus use the new wire_str() to make the contract explicit and
stable.
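The following is a minimal sketch of the suggested `wire_str()`, modelled on the `StopReason::wire_str()` the comment mentions. The two variants are hypothetical, since the comment does not list `ToolResultStatus`'s variants. The test pins each wire string to what the current Debug-plus-lowercase path produces, so switching the call site would not change output.

```rust
/// Hypothetical variants; the review comment does not list them.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToolResultStatus {
    Success,
    Error,
}

impl ToolResultStatus {
    /// Canonical wire string. Renaming a variant no longer changes the
    /// serialized value silently, unlike `format!("{:?}", ..)`.
    pub const fn wire_str(self) -> &'static str {
        match self {
            ToolResultStatus::Success => "success",
            ToolResultStatus::Error => "error",
        }
    }
}
```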

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/relayburn-cli/src/commands/overhead.rs`:
- Around line 32-33: The deltas branch currently drops the parent `since` and
constructs query options with `since: None`, so `burn overhead deltas --since
...` is ignored; update the deltas path to accept and propagate the parent
`since` into the deltas query options: modify the call/site around
OverheadAction::Deltas(deltas) / run_deltas(globals, deltas) so `run_deltas` (or
the code that builds the deltas query) receives the `args.since` value and uses
it when building the query options (replace `since: None` with the passed
`since`), and ensure any helper that constructs the query options (the code
around the current `since: None` usage) uses that propagated value.

In `@crates/relayburn-sdk/src/analyze/context_delta.rs`:
- Around line 379-386: The current sort closure used in out.sort_by compares
delta_tokens, turn_id, and inference_idx but still leaves ties dependent on
per_rail insertion order; update the comparator used where out.sort_by(...) is
defined to add additional deterministic tie-breakers by chaining .then_with(||
a.owner_rail.cmp(&b.owner_rail)) and finally .then_with(||
a.session_id.cmp(&b.session_id)) (or vice‑versa if session_id is preferred
first) so the ordering is fully deterministic across HashMap iteration; locate
the sort_by closure that references delta_tokens, turn_id, and inference_idx and
append these comparisons to it.

In `@crates/relayburn-sdk/src/query_verbs.rs`:
- Around line 4581-4593: When opts.session is None, the current session_ids
collection uses Query::default() which ignores opts.since; change it to
construct a Query that sets the since field from opts.since (or equivalent
time-scope) and pass that query to self.inner.query_turns to seed the BTreeSet
of session IDs, then continue to call deltas_for_session (or other full-tree
loaders) only for those session IDs; update the code paths that build the Query
and call query_turns (refer to opts.since, session_ids, self.inner.query_turns,
and deltas_for_session) so the initial session enumeration honors the --since
filter.
- Around line 4238-4240: Replace the unconditional unwrap_or_default() on the
query results so real ledger-read errors are not silently swallowed: for both
self.inner.query_inferences(&session_q) and
self.inner.query_tool_result_events(&session_q) match the Result and only fall
back to the pre-schema default when the error explicitly indicates a missing
table/column/schema; for any other Err return propagate the error (e.g., using
the ? operator or returning Err) so read failures surface. Reference the
query_inferences/query_tool_result_events calls and the
inferences/tool_result_events variables when making this change, or extract a
helper like is_schema_missing(err) to centralize the schema-missing check.

In `@crates/relayburn-sdk/src/reader/claude/span_tree.rs`:
- Around line 462-468: The subagent transcript (sa.records) is being ignored so
Subagent nodes are created as leaves; update the span-tree builder (the code
around the `node` return in build_claude_span_tree / the span-tree construction
for Subagent) to materialize child spans from `sa.records` rather than
discarding them: parse/convert each entry in `sa.records` into proper child span
nodes (including timestamps and status) and attach them to the parent Subagent
node so the ToolUse -> Subagent -> ... hierarchy is preserved; reuse the same
span-construction helpers used for top-level turns to create those child spans
so downstream consumers can still call `build_claude_span_tree` on materialized
child turns if needed.
- Around line 176-180: The loop currently only attaches nodes from
unpaired_subagents and drops any remaining entries in paired_subagents; instead,
after the existing for sa in unpaired_subagents loop, iterate over any leftover
items in paired_subagents and push build_subagent_node(sa, true) onto
root.children (e.g., via paired_subagents.drain(..) or into_iter() depending on
ownership) so unmatched subagents in paired_subagents are surfaced as unattached
root children; update the code that mutates root.children, paired_subagents, and
unpaired_subagents accordingly.
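For the `context_delta.rs` tie-breaker item above, the sketch below shows a comparator that gives a total order. Two runs over the same rows then sort identically, whatever order the per-rail HashMap yields them in. It assumes descending `delta_tokens` as the primary key and that `OwnerRail` can derive `Ord`. `Row`, `OwnerRail`, and `sort_rows` are hypothetical simplifications of `ContextDelta`.

```rust
/// Hypothetical rail type; deriving Ord makes it usable as a tie-breaker.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub enum OwnerRail {
    Main,
    Subagent { agent_id: String },
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Row {
    pub session_id: String,
    pub turn_id: String,
    pub inference_idx: usize,
    pub owner_rail: OwnerRail,
    pub delta_tokens: i64,
}

/// Largest delta first, then a total order over the identifying fields, so
/// the result no longer depends on HashMap iteration order.
pub fn sort_rows(rows: &mut [Row]) {
    rows.sort_by(|a, b| {
        b.delta_tokens
            .cmp(&a.delta_tokens)
            .then_with(|| a.session_id.cmp(&b.session_id))
            .then_with(|| a.turn_id.cmp(&b.turn_id))
            .then_with(|| a.inference_idx.cmp(&b.inference_idx))
            .then_with(|| a.owner_rail.cmp(&b.owner_rail))
    });
}
```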



ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: d8fbbd76-7ba9-4d50-970e-78c69c141584

📥 Commits

Reviewing files that changed from the base of the PR and between 131e31e and 05f9aa3.

📒 Files selected for processing (16)

CHANGELOG.md
crates/relayburn-cli/src/cli.rs
crates/relayburn-cli/src/commands/overhead.rs
crates/relayburn-sdk/src/analyze.rs
crates/relayburn-sdk/src/analyze/context_delta.rs
crates/relayburn-sdk/src/analyze/span_tree.rs
crates/relayburn-sdk/src/lib.rs
crates/relayburn-sdk/src/query_verbs.rs
crates/relayburn-sdk/src/reader.rs
crates/relayburn-sdk/src/reader/claude.rs
crates/relayburn-sdk/src/reader/claude/span_tree.rs
crates/relayburn-sdk/src/reader/codex.rs
crates/relayburn-sdk/src/reader/codex/span_tree.rs
tests/fixtures/cli-golden/invocations.json
tests/fixtures/cli-golden/snapshots/overhead-deltas-json.stdout.txt
tests/fixtures/cli-golden/snapshots/overhead-deltas.stdout.txt

coderabbitai · 2026-05-26T11:51:09Z

+    // We do NOT recursively build a span tree from `sa.records` here:
+    // the parser hands us raw `Value` rows, and re-running the parse
+    // pipeline against an in-memory sidecar is the ingest path's job,
+    // not the span-tree builder's. Downstream consumers can call
+    // `build_claude_span_tree` against the materialized child turn(s)
+    // once they're in the ledger and stitch the subtree client-side.
+    node


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

The subagent transcript contents are discarded here.

sa.records never become child spans, timestamps, or status on this node, so every Subagent is a leaf. That means the resulting tree cannot actually represent the advertised ToolUse -> Subagent -> ... hierarchy, and any delegated inference/tool activity disappears from the parent turn’s span tree.


Acknowledged — this is the documented limitation from PR #451's "Deferred to follow-ups" list ("Content sidecar reads for UserPrompt text body and ToolResult attached file content. The spans are emitted as structural placeholders; future PRs can populate the content without changing tree shape."). The current build_subagent_node deliberately treats Subagent as a leaf with the agent metadata attributes but no child spans derived from sa.records.

Materializing a full child span tree from sa.records is a non-trivial enhancement (requires re-running the parse pipeline against the in-memory sidecar transcript), so it belongs in its own follow-up issue rather than this fix-pass. I'll open a separate issue to track it.

Generated by Claude Code

@willwashburn Thanks for the context — the inline comment at that exact location makes the intent clear, and I should have caught it before raising this. The deferred-to-follow-up semantics are well-documented both in code and in the PR description.

Would you like me to open a follow-up GitHub issue to track "Materialize child span trees from sa.records in build_subagent_node"? I can include backlinks to this PR and the relevant comment so it's easy to pick up later.

✏️ Learnings added

Learnt from: willwashburn
Repo: AgentWorkforce/burn
PR: 452
File: crates/relayburn-sdk/src/reader/claude/span_tree.rs:494-500
Timestamp: 2026-05-26T12:54:28.449Z
Learning: In `crates/relayburn-sdk/src/reader/claude/span_tree.rs`, `build_subagent_node` intentionally treats `Subagent` spans as leaves — `sa.records` (the raw subagent transcript entries) are **not** materialized into child spans. This is a documented design decision deferred from PR `#451` ("Content sidecar reads" follow-up): re-running the parse pipeline against an in-memory sidecar is considered the ingest path's job, not the span-tree builder's. Downstream consumers are expected to call `build_claude_span_tree` on materialized child turns in the ledger and stitch the subtree client-side. Do not flag the absence of recursive child-span construction here as a bug.

cubic-dev-ai

4 issues found

New `burn overhead deltas` verb answers "what blew up my context between inference N and inference N+1?" by walking each session's TurnSpanTree timeline, pairing same-rail Inference spans, and attributing the delta in `tokens.input + cache_read + cache_write` to the intervening ToolResult / UserPrompt / SystemReminder leaves.

SDK surface: `LedgerHandle::context_delta(ContextDeltaOpts)` returns `Vec<ContextDelta>` with a per-step intervening breakdown, attributed cost (charged at the cache_read rate, i.e. what the *future* will pay for the persisted prefix), and compaction events surfaced as their own rows rather than as negative deltas. Main-rail deltas never see subagent tool_results, and vice versa. Tool-result token estimates use `output_bytes / 4` as a first-cut fallback; this is documented as approximate in the output.

CLI: `burn overhead deltas [--session ID] [--top N] [--min-delta TOK] [--owner main|subagent|all] [--explain] [--json]`. Default top is 20; default min_delta is 1000 tokens. Compaction rows ignore min_delta so they always surface.

Tests: 9 unit tests covering the Bash blow-up driver path, compaction-replaces-negative-delta, subagent isolation, owner filter, top cap, min_delta filter, single-inference no-op, and JSON wire format. Two golden snapshots (`overhead-deltas`, `overhead-deltas-json`) anchor the CLI output against the fixture ledger.
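
A minimal sketch of the per-rail pairing and attribution described above. It is not the SDK's code: `Event`, `Step`, `Delta`, `rail_deltas`, `driver`, and `select` are hypothetical, simplified stand-ins for `ContextDelta` / `InterveningStep`. It assumes the caller has already filtered the timeline to one owner rail, charges positive deltas at a caller-supplied cache_read price per million tokens, and picks the "Driver" as the largest intervening step. As a simplification, a compaction is only flagged on its row here, whereas the real output surfaces it as its own row instead of a negative delta. The largest-first ordering in `select` is also an assumption.

```rust
/// Hypothetical, simplified stand-ins for the SDK's span data.
#[derive(Debug, Clone, PartialEq)]
pub enum Step {
    ToolResult { tool_name: String, output_bytes: u64 },
    UserPrompt { approx_tokens: u64 },
    Compaction,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Inference { input: u64, cache_read: u64, cache_write: u64 },
    Intervening(Step),
}

#[derive(Debug)]
pub struct Delta {
    pub inference_idx: usize,
    pub delta_tokens: i64,
    pub intervening: Vec<Step>,
    pub attributed_cost_usd: f64,
    pub is_compaction: bool,
}

/// First-cut estimate: tool results cost roughly output_bytes / 4 tokens.
pub fn step_tokens(step: &Step) -> u64 {
    match step {
        Step::ToolResult { output_bytes, .. } => output_bytes / 4,
        Step::UserPrompt { approx_tokens } => *approx_tokens,
        Step::Compaction => 0,
    }
}

/// Pair consecutive inferences on ONE rail and attribute each delta
/// to the steps recorded between them.
pub fn rail_deltas(events: &[Event], cache_read_usd_per_mtok: f64) -> Vec<Delta> {
    let mut out = Vec::new();
    let mut prior: Option<u64> = None;
    let mut pending: Vec<Step> = Vec::new();
    let mut idx = 0;
    for ev in events {
        match ev {
            Event::Intervening(step) => pending.push(step.clone()),
            Event::Inference { input, cache_read, cache_write } => {
                // Context size = tokens.input + cache_read + cache_write.
                let curr = input + cache_read + cache_write;
                // Steps before the first inference have nothing to attribute to.
                let steps = std::mem::take(&mut pending);
                if let Some(prev) = prior {
                    let delta = curr as i64 - prev as i64;
                    out.push(Delta {
                        inference_idx: idx,
                        delta_tokens: delta,
                        is_compaction: steps.contains(&Step::Compaction),
                        // cache_read rate on growth only; shrinkage is charged 0.
                        attributed_cost_usd: delta.max(0) as f64 * cache_read_usd_per_mtok / 1e6,
                        intervening: steps,
                    });
                }
                prior = Some(curr);
                idx += 1;
            }
        }
    }
    out
}

/// "Driver" column: the intervening step with the most estimated tokens.
pub fn driver(delta: &Delta) -> Option<&Step> {
    delta.intervening.iter().max_by_key(|s| step_tokens(s))
}

/// min_delta filter (compaction rows always pass), largest first, then top N.
pub fn select(mut rows: Vec<Delta>, min_delta: i64, top: usize) -> Vec<Delta> {
    rows.retain(|d| d.is_compaction || d.delta_tokens >= min_delta);
    rows.sort_by(|a, b| {
        b.delta_tokens
            .cmp(&a.delta_tokens)
            .then(a.inference_idx.cmp(&b.inference_idx))
    });
    rows.truncate(top);
    rows
}
```

The cost term mirrors the PR's stated rationale: the delta becomes part of the cached prefix, so every later inference pays cache_read on it.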

In-scope #452 fixes (A-G):
- `burn overhead deltas` now honors `--since` (A): thread the parent overhead args' `since` into `ContextDeltaOpts`, parse relative ranges (`24h`/`7d`/`4w`/`2m`) into `Duration`, and use that to scope the session-enumeration `query_turns` seed inside the SDK so the `None`-session path no longer walks every historical session.
- Make the per-rail and cross-session `ContextDelta` sort fully deterministic across HashMap iteration order (B): chain `owner_rail` then `session_id` as final tie-breakers.
- UTF-8-safe `short_turn_label` / `short_agent_label` (C): switch from byte slicing to `chars().take(8)` so multi-byte ids never panic.
- `format_signed_tokens` preserves the negative sign (D): emit `-` for negative deltas instead of dropping it.
- Sort compactions with `sort_by_cached_key` (E) so `parse_iso_ms` runs once per element rather than once per comparison.
- Dedup four copies of `parse_iso_ms` (F) into `crates/relayburn-sdk/src/util/time.rs`; the analyze/context_delta, query_verbs, reader/claude, and reader/codex copies now share one implementation.
- `read_jsonl_values` streams via `BufReader::lines()` (G) rather than reading the entire file into memory.

Foundation fixes (carried in #452's diff; will also land on #451):
- Propagate `output_truncated` on `ToolResult` span nodes (H) in both the Claude and Codex builders so downstream consumers can flag truncated tool outputs.
- Propagate `ToolResult` error status to the parent `ToolUse` (I, J) in both builders — the runtime tool_result is the ground truth, not the assistant row's `is_error` hint.
- Don't drop subagents whose `paired_tool_use_id` doesn't match any ToolUse in the turn (K): drain leftover `paired_subagents` after the inference walk and surface them as `unattached` siblings under the turn root.
- Stop swallowing real ledger-read failures (M): replace the blanket `unwrap_or_default()` on `query_inferences` / `query_tool_result_events` with a `match` that tolerates only the pre-schema "no such table/column" class of error and propagates every other failure.

Tests: `cargo test --workspace` (zero failures, +2 new tests in the `query_verbs` mod for the since filter and helper). Zero warnings on `cargo build --workspace`.
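
Three of the fixes above (C, D, M) are small enough to sketch. This is an illustration under stated assumptions, not the SDK's code: the `short_label` name, the `+`/`k` output format of `format_signed_tokens`, and the `String` error type matched against SQLite-style "no such table" / "no such column" messages are all hypothetical choices made for the example.

```rust
/// (C) UTF-8-safe short label: take the first 8 chars instead of
/// byte-slicing, which panics when byte 8 is not a char boundary.
pub fn short_label(id: &str) -> String {
    id.chars().take(8).collect()
}

/// (D) Keep the sign on negative deltas. The "+"/"k" format is hypothetical.
pub fn format_signed_tokens(n: i64) -> String {
    let sign = if n < 0 { "-" } else if n > 0 { "+" } else { "" };
    let abs = n.unsigned_abs();
    if abs >= 1_000 {
        format!("{}{:.1}k", sign, abs as f64 / 1_000.0)
    } else {
        format!("{}{}", sign, abs)
    }
}

/// (M) Treat only missing-schema errors as "no data yet"; every other
/// ledger-read failure is propagated instead of silently defaulted.
pub fn tolerate_missing_schema<T: Default>(res: Result<T, String>) -> Result<T, String> {
    match res {
        Err(e) if e.contains("no such table") || e.contains("no such column") => Ok(T::default()),
        other => other,
    }
}
```

The narrow `match` in (M) is the key design point: a blanket `unwrap_or_default()` would render a corrupt ledger as an empty one.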

gemini-code-assist Bot reviewed May 26, 2026

chatgpt-codex-connector Bot reviewed May 26, 2026

coderabbitai Bot reviewed May 26, 2026

cubic-dev-ai Bot reviewed May 26, 2026

Comment thread crates/relayburn-sdk/src/reader/claude/span_tree.rs

Comment thread crates/relayburn-cli/src/commands/overhead.rs Outdated

Comment thread crates/relayburn-sdk/src/reader/codex/span_tree.rs

Comment thread crates/relayburn-sdk/src/reader/claude/span_tree.rs Outdated

claude added 2 commits May 26, 2026 12:51

willwashburn force-pushed the claude/burn-432-context-delta branch from 05f9aa3 to cf3c61e

willwashburn merged commit e7f1395 into main May 26, 2026
11 checks passed

willwashburn deleted the claude/burn-432-context-delta branch May 26, 2026 13:04

		let mut compactions_sorted: Vec<&CompactionEvent> = compactions.iter().collect();
		compactions_sorted.sort_by_key(|c| parse_iso_ms(&c.ts).unwrap_or(0));
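
The quoted snippet calls `parse_iso_ms` inside `sort_by_key`, which evaluates the key twice per comparison; fix (E) switches to `sort_by_cached_key`, which computes each key once. The sketch below is hypothetical: `parse_iso_ms` here is a minimal UTC-only stand-in (accepting `YYYY-MM-DDTHH:MM:SSZ` with optional `.mmm`), not the shared helper in `util/time.rs`, and `CompactionEvent` is reduced to two fields.

```rust
/// Hypothetical minimal parser: `YYYY-MM-DDTHH:MM:SSZ` or `...SS.mmmZ`, UTC only.
fn parse_iso_ms(ts: &str) -> Option<i64> {
    let b = ts.as_bytes();
    if b.len() < 20 || b[4] != b'-' || b[7] != b'-' || b[10] != b'T' || b[13] != b':' || b[16] != b':' {
        return None;
    }
    // `get` returns None instead of panicking on non-char-boundary ranges.
    let num = |r: std::ops::Range<usize>| -> Option<i64> { ts.get(r)?.parse().ok() };
    let (y, mo, d) = (num(0..4)?, num(5..7)?, num(8..10)?);
    let (h, mi, s) = (num(11..13)?, num(14..16)?, num(17..19)?);
    if !(1..=12).contains(&mo) || !(1..=31).contains(&d) {
        return None;
    }
    let ms = match ts.get(19..)? {
        "Z" => 0,
        rest if rest.len() == 5 && rest.starts_with('.') && rest.ends_with('Z') => num(20..23)?,
        _ => return None,
    };
    // Days since 1970-01-01 via Howard Hinnant's days_from_civil.
    let yy = if mo <= 2 { y - 1 } else { y };
    let era = yy.div_euclid(400);
    let yoe = yy - era * 400;
    let doy = (153 * ((mo + 9) % 12) + 2) / 5 + d - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    let days = era * 146_097 + doe - 719_468;
    Some(days * 86_400_000 + h * 3_600_000 + mi * 60_000 + s * 1_000 + ms)
}

#[derive(Debug)]
struct CompactionEvent {
    ts: String,
    tokens_freed: u64,
}

fn sort_compactions(compactions: &[CompactionEvent]) -> Vec<&CompactionEvent> {
    let mut sorted: Vec<&CompactionEvent> = compactions.iter().collect();
    // One parse per element rather than two per comparison.
    // Unparseable timestamps map to 0 and sort first, as with unwrap_or(0) above.
    sorted.sort_by_cached_key(|c| parse_iso_ms(&c.ts).unwrap_or(0));
    sorted
}
```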

willwashburn commented May 26, 2026

coderabbitai Bot commented May 26, 2026

Review failed

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Poem

gemini-code-assist Bot left a comment

Code Review

gemini-code-assist Bot May 26, 2026

gemini-code-assist Bot May 26, 2026

gemini-code-assist Bot May 26, 2026

gemini-code-assist Bot May 26, 2026

gemini-code-assist Bot May 26, 2026

gemini-code-assist Bot May 26, 2026

chatgpt-codex-connector Bot left a comment

💡 Codex Review

chatgpt-codex-connector Bot May 26, 2026

chatgpt-codex-connector Bot May 26, 2026

coderabbitai Bot left a comment

coderabbitai Bot May 26, 2026

coderabbitai Bot commented May 26, 2026

coderabbitai Bot May 26, 2026

cubic-dev-ai Bot left a comment