fix(tracing): reliability follow-ups from v0.8.4 review (#901, #902, #903)#905
Conversation
…, #903) Three follow-ups filed during the v0.8.4 release review of the trace-data-loss fix (#895): #901 — reconstructed spans no longer masquerade as failures. When `rehydrateFromFile` force-closes an in-flight generation span (worker restart / cache eviction), the span kept `status: "error"`, so the viewer painted it red and counted it in the session error total — a false alarm on a postmortem surface. Spans now also carry `interrupted: true`; the viewer renders them with an amber "⚠" affordance and an "Interrupted" detail label instead of red, and the session error count (`errSpans`) excludes them. `status: "error"` is kept so the boundary stays visible, so existing rehydrate tests are unaffected. #902 — `getOrCreateTrace` can no longer resurrect a Trace into a cleared cache. The async rehydrate added an await point where a concurrent `startEventStream` (e.g. `setWorkspace`) can abort the stream and `sessionTraces.clear()` while we're suspended. On resume the old code inserted the freshly built Trace into the just-cleared map — an orphan writer under a dead stream. We now capture the owning stream before the await and, if it was replaced, discard the Trace (`endTrace`) and defer to the live stream instead of inserting. #903 — long-lived traces no longer grow `ses_<id>.json` without bound. `snapshot()` rewrites the entire spans array on every event, so an unbounded session paid O(n) per write (O(n^2) overall) and grew the file forever. `capSpansForSerialization` bounds the on-disk projection to `MAX_SERIALIZED_SPANS` (default 5000, override via `ALTIMATE_TRACE_MAX_SPANS`) by keeping the head (prompt + first tools) and tail (recent activity) and eliding the middle with a single marker span. In-memory spans are untouched; summary counters are separate counters, so totals are unaffected. Tests: `test/altimate/tracing-followups.test.ts` — interrupted flag is set on rehydrate; viewer carries the amber/`!sp.interrupted` contract and the flag in embedded data; `capSpansForSerialization` head/tail/marker behavior, no-op under cap, and no-benefit guard; plus a scope-bounded source contract for the #902 guard ordering (await → guard → insert). `worker.ts` edits are wrapped in `altimate_change` markers (upstream-shared). tracing.ts / viewer.ts are altimate-owned. Closes #901 Closes #902 Closes #903
) Addresses findings from the GPT-5.4 + Kimi-K2.5 review panel: #902 (MAJOR) — key the orphan-writer guard on a monotonic `streamGeneration` counter bumped in startEventStream, not on AbortController object identity. The identity check was correct only while startEventStream always allocates a fresh controller; the counter makes ownership explicit and removes that hidden dependency. The check and the cache insert run in one synchronous turn (no await between), so the insert can't race a later startEventStream. #903 (MAJOR) — capSpansForSerialization now explicitly guarantees the structural root (session) span survives the cut even if it isn't in the head slice, instead of relying on "root happens to be index 0". Rehydrate and the viewer tree both require the root, and the elision marker is parented to it. Added table-driven tests: root-not-at-index-0, tiny caps (1–5) with marker-parent validity, the exact head+tail+1===length boundary, and elided-count accuracy. #901 (Kimi blockers) — interrupted spans now render amber (not red) on EVERY viewer surface, not just the preview/detail: waterfall row + icon class, tree-view meta ("interrupted" in orange), and log-view row (⚠ amber). errSpans already excluded them; this makes the visual treatment consistent so a reconstructed trace never looks like a failed one anywhere. Known minor (deferred, noted on PR): a trace containing ONLY interrupted spans no longer contributes to the summary error count (correct) but also shows no "incomplete" hint in the summary; a dedicated interrupted/incomplete banner is a small follow-up.
There was a problem hiding this comment.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
|
Looking for one thing? Review this PR in Change Stack to search files, summaries, diffs, and code without losing your place. 📝 WalkthroughWalkthroughThis PR implements three tracing-reliability follow-ups: introducing an ChangesTracing Reliability Multi-Concern
Sequence DiagramsequenceDiagram
participant Session as Trace Session
participant RehydrateAPI as rehydrateFromFile()
participant CappingAPI as capSpansForSerialization()
participant Viewer as Trace Viewer
participant Cache as sessionTraces Map
Session->>RehydrateAPI: await rehydrateFromFile(sessionID)
RehydrateAPI->>RehydrateAPI: close open generation spans
RehydrateAPI->>RehydrateAPI: set interrupted = true
RehydrateAPI-->>Session: trace reconstructed
Session->>CappingAPI: capSpansForSerialization(spans, 5000)
CappingAPI->>CappingAPI: keep head + tail
CappingAPI->>CappingAPI: insert elision marker
CappingAPI-->>Session: capped spans array
Session->>Viewer: renderTraceViewer(cappedTrace)
Viewer->>Viewer: find interrupted spans
Viewer->>Viewer: render as amber/warn (not red)
Viewer->>Viewer: exclude from errSpans count
Viewer-->>Session: viewer HTML (interrupted ≠ error)
Note over Cache: On setWorkspace / stream start
Cache->>Cache: clear() & streamGeneration++
Note over Session: In-flight getOrCreateTrace() checks generation
Session->>Session: if generationAtEntry !== streamGeneration?
Session->>Session: discard trace (don't resurrect)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/opencode/src/altimate/observability/tracing.ts`:
- Around line 382-389: The capSpansForSerialization function can bypass the
configured cap for tiny values; modify capSpansForSerialization so that when cap
< 3 (or any small cap that cannot reserve a marker slot) you enforce the cap by
returning spans.slice(0, cap) instead of returning the full spans array.
Concretely, inside capSpansForSerialization (before computing
headCount/tailCount or the marker logic) add a guard: if (spans.length > cap &&
cap < 3) return spans.slice(0, cap); otherwise keep the existing
headCount/tailCount/marker elision logic so the cap is honored in all cases.
In `@packages/opencode/src/altimate/observability/viewer.ts`:
- Around line 573-577: The markdown summary still counts spans with status ===
'error' even if interrupted; update buildMarkdownSummary() so it uses the same
filtered set as errSpans (or applies the same predicate sp.status === 'error' &&
!sp.interrupted) when computing mdErrCount and any error lists, e.g., derive
mdErrCount from errSpans (or by filtering the input spans with !sp.interrupted)
instead of counting all status === 'error' spans to ensure
reconstructed/interrupted spans are not treated as real errors.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: f2b50973-c956-42fe-9a42-0a5b9ede3a87
📒 Files selected for processing (4)
packages/opencode/src/altimate/observability/tracing.tspackages/opencode/src/altimate/observability/viewer.tspackages/opencode/src/cli/cmd/tui/worker.tspackages/opencode/test/altimate/tracing-followups.test.ts
| export function capSpansForSerialization(spans: TraceSpan[], cap: number = MAX_SERIALIZED_SPANS): TraceSpan[] { | ||
| if (cap <= 0 || spans.length <= cap) return spans | ||
| const headCount = Math.max(1, Math.floor(cap * 0.3)) | ||
| const tailCount = Math.max(1, cap - headCount - 1) // reserve one slot for the marker | ||
| // Only elide if the result is actually smaller than the input (+1 for the | ||
| // marker we'd add) — otherwise there's nothing to gain. | ||
| if (headCount + tailCount + 1 >= spans.length) return spans | ||
| let head = spans.slice(0, headCount) |
There was a problem hiding this comment.
capSpansForSerialization can silently bypass the configured max-cap for tiny values
When cap is very small (e.g. 1 or 2), this branch returns the original array, so serialization is no longer bounded by the configured max. That breaks the stated cap contract and can reintroduce unbounded snapshot growth for misconfigured environments.
💡 Proposed fix
export function capSpansForSerialization(spans: TraceSpan[], cap: number = MAX_SERIALIZED_SPANS): TraceSpan[] {
- if (cap <= 0 || spans.length <= cap) return spans
+ if (cap <= 0 || spans.length <= cap) return spans
+ const rootSpan = spans.find((s) => s.parentSpanId === null) ?? null
+ if (cap === 1) {
+ return rootSpan ? [rootSpan] : [spans[0]]
+ }
+ if (cap === 2) {
+ if (!rootSpan) return spans.slice(-2)
+ const lastNonRoot = [...spans].reverse().find((s) => s.spanId !== rootSpan.spanId)
+ return lastNonRoot ? [rootSpan, lastNonRoot] : [rootSpan]
+ }
const headCount = Math.max(1, Math.floor(cap * 0.3))
const tailCount = Math.max(1, cap - headCount - 1) // reserve one slot for the marker
// Only elide if the result is actually smaller than the input (+1 for the
// marker we'd add) — otherwise there's nothing to gain.
if (headCount + tailCount + 1 >= spans.length) return spans
let head = spans.slice(0, headCount)
const tail = spans.slice(spans.length - tailCount)
- const rootSpan = spans.find((s) => s.parentSpanId === null) ?? null
if (rootSpan && !head.some((s) => s.spanId === rootSpan.spanId) && !tail.some((s) => s.spanId === rootSpan.spanId)) {
head = [rootSpan, ...head.slice(0, headCount - 1)]
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| export function capSpansForSerialization(spans: TraceSpan[], cap: number = MAX_SERIALIZED_SPANS): TraceSpan[] { | |
| if (cap <= 0 || spans.length <= cap) return spans | |
| const headCount = Math.max(1, Math.floor(cap * 0.3)) | |
| const tailCount = Math.max(1, cap - headCount - 1) // reserve one slot for the marker | |
| // Only elide if the result is actually smaller than the input (+1 for the | |
| // marker we'd add) — otherwise there's nothing to gain. | |
| if (headCount + tailCount + 1 >= spans.length) return spans | |
| let head = spans.slice(0, headCount) | |
| export function capSpansForSerialization(spans: TraceSpan[], cap: number = MAX_SERIALIZED_SPANS): TraceSpan[] { | |
| if (cap <= 0 || spans.length <= cap) return spans | |
| const rootSpan = spans.find((s) => s.parentSpanId === null) ?? null | |
| if (cap === 1) { | |
| return rootSpan ? [rootSpan] : [spans[0]] | |
| } | |
| if (cap === 2) { | |
| if (!rootSpan) return spans.slice(-2) | |
| const lastNonRoot = [...spans].reverse().find((s) => s.spanId !== rootSpan.spanId) | |
| return lastNonRoot ? [rootSpan, lastNonRoot] : [rootSpan] | |
| } | |
| const headCount = Math.max(1, Math.floor(cap * 0.3)) | |
| const tailCount = Math.max(1, cap - headCount - 1) // reserve one slot for the marker | |
| // Only elide if the result is actually smaller than the input (+1 for the | |
| // marker we'd add) — otherwise there's nothing to gain. | |
| if (headCount + tailCount + 1 >= spans.length) return spans | |
| let head = spans.slice(0, headCount) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/opencode/src/altimate/observability/tracing.ts` around lines 382 -
389, The capSpansForSerialization function can bypass the configured cap for
tiny values; modify capSpansForSerialization so that when cap < 3 (or any small
cap that cannot reserve a marker slot) you enforce the cap by returning
spans.slice(0, cap) instead of returning the full spans array. Concretely,
inside capSpansForSerialization (before computing headCount/tailCount or the
marker logic) add a guard: if (spans.length > cap && cap < 3) return
spans.slice(0, cap); otherwise keep the existing headCount/tailCount/marker
elision logic so the cap is honored in all cases.
| // Reconstructed (interrupted) spans keep status:'error' for boundary | ||
| // visibility, but they reflect a recorder restart — exclude them from the | ||
| // session error count so a clean session isn't reported as failed. | ||
| var errSpans = nonSession.filter(function(sp) { return sp.status === 'error' && !sp.interrupted; }); | ||
|
|
There was a problem hiding this comment.
Keep interrupted-span exclusion consistent in copied markdown summary
errSpans correctly excludes interrupted, but buildMarkdownSummary() still increments mdErrCount for all status === 'error'. This makes exported summaries report reconstructed spans as real errors.
💡 Proposed fix
- nonSession.forEach(function(sp) {
- if (sp.kind !== 'tool') { if (sp.status === 'error') mdErrCount++; return; }
+ nonSession.forEach(function(sp) {
+ if (sp.kind !== 'tool') { if (sp.status === 'error' && !sp.interrupted) mdErrCount++; return; }
var nm = (sp.name || '').toLowerCase();
var inp = (sp.input && typeof sp.input === 'object') ? sp.input : {};
var fp = inp.file_path || inp.filePath || inp.path || null;
if (nm.indexOf('write') >= 0 || nm.indexOf('edit') >= 0) { if (fp) mdChanged[fp] = nm.indexOf('write') >= 0 ? 'new' : 'edited'; }
else if (nm.indexOf('read') >= 0) { mdReadCount++; }
@@
- if (sp.status === 'error') mdErrCount++;
+ if (sp.status === 'error' && !sp.interrupted) mdErrCount++;
});🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/opencode/src/altimate/observability/viewer.ts` around lines 573 -
577, The markdown summary still counts spans with status === 'error' even if
interrupted; update buildMarkdownSummary() so it uses the same filtered set as
errSpans (or applies the same predicate sp.status === 'error' &&
!sp.interrupted) when computing mdErrCount and any error lists, e.g., derive
mdErrCount from errSpans (or by filtering the input spans with !sp.interrupted)
instead of counting all status === 'error' spans to ensure
reconstructed/interrupted spans are not treated as real errors.
There was a problem hiding this comment.
3 issues found across 4 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/opencode/src/altimate/observability/tracing.ts">
<violation number="1" location="packages/opencode/src/altimate/observability/tracing.ts:385">
P2: The capping math can return more spans than `cap` (e.g. cap=1/2), so the configured serialization limit is not actually enforced.</violation>
<violation number="2" location="packages/opencode/src/altimate/observability/tracing.ts:390">
P2: Capped serialization can emit orphan tail spans whose missing parents make them disappear from the viewer tree traversal.</violation>
</file>
<file name="packages/opencode/src/altimate/observability/viewer.ts">
<violation number="1" location="packages/opencode/src/altimate/observability/viewer.ts:576">
P2: The `errSpans` filter correctly excludes interrupted spans from the session error count, but `buildMarkdownSummary()` (used for the exported/copied markdown summary) reportedly still counts all `status === 'error'` spans without the `!sp.interrupted` guard. This means the exported markdown will report reconstructed spans as real errors, inconsistent with the in-viewer display.</violation>
</file>
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
| export function capSpansForSerialization(spans: TraceSpan[], cap: number = MAX_SERIALIZED_SPANS): TraceSpan[] { | ||
| if (cap <= 0 || spans.length <= cap) return spans | ||
| const headCount = Math.max(1, Math.floor(cap * 0.3)) | ||
| const tailCount = Math.max(1, cap - headCount - 1) // reserve one slot for the marker |
There was a problem hiding this comment.
P2: The capping math can return more spans than cap (e.g. cap=1/2), so the configured serialization limit is not actually enforced.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/altimate/observability/tracing.ts, line 385:
<comment>The capping math can return more spans than `cap` (e.g. cap=1/2), so the configured serialization limit is not actually enforced.</comment>
<file context>
@@ -349,6 +357,62 @@ function formatDurationShort(ms: number): string {
+export function capSpansForSerialization(spans: TraceSpan[], cap: number = MAX_SERIALIZED_SPANS): TraceSpan[] {
+ if (cap <= 0 || spans.length <= cap) return spans
+ const headCount = Math.max(1, Math.floor(cap * 0.3))
+ const tailCount = Math.max(1, cap - headCount - 1) // reserve one slot for the marker
+ // Only elide if the result is actually smaller than the input (+1 for the
+ // marker we'd add) — otherwise there's nothing to gain.
</file context>
| // marker we'd add) — otherwise there's nothing to gain. | ||
| if (headCount + tailCount + 1 >= spans.length) return spans | ||
| let head = spans.slice(0, headCount) | ||
| const tail = spans.slice(spans.length - tailCount) |
There was a problem hiding this comment.
P2: Capped serialization can emit orphan tail spans whose missing parents make them disappear from the viewer tree traversal.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/altimate/observability/tracing.ts, line 390:
<comment>Capped serialization can emit orphan tail spans whose missing parents make them disappear from the viewer tree traversal.</comment>
<file context>
@@ -349,6 +357,62 @@ function formatDurationShort(ms: number): string {
+ // marker we'd add) — otherwise there's nothing to gain.
+ if (headCount + tailCount + 1 >= spans.length) return spans
+ let head = spans.slice(0, headCount)
+ const tail = spans.slice(spans.length - tailCount)
+ // Guarantee the structural root (session) span survives the cut even if it
+ // isn't in the head slice — rehydrate and the viewer's tree both require it,
</file context>
| // Reconstructed (interrupted) spans keep status:'error' for boundary | ||
| // visibility, but they reflect a recorder restart — exclude them from the | ||
| // session error count so a clean session isn't reported as failed. | ||
| var errSpans = nonSession.filter(function(sp) { return sp.status === 'error' && !sp.interrupted; }); |
There was a problem hiding this comment.
P2: The errSpans filter correctly excludes interrupted spans from the session error count, but buildMarkdownSummary() (used for the exported/copied markdown summary) reportedly still counts all status === 'error' spans without the !sp.interrupted guard. This means the exported markdown will report reconstructed spans as real errors, inconsistent with the in-viewer display.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/altimate/observability/viewer.ts, line 576:
<comment>The `errSpans` filter correctly excludes interrupted spans from the session error count, but `buildMarkdownSummary()` (used for the exported/copied markdown summary) reportedly still counts all `status === 'error'` spans without the `!sp.interrupted` guard. This means the exported markdown will report reconstructed spans as real errors, inconsistent with the in-viewer display.</comment>
<file context>
@@ -565,7 +570,10 @@ function showDetail(span) {
+ // Reconstructed (interrupted) spans keep status:'error' for boundary
+ // visibility, but they reflect a recorder restart — exclude them from the
+ // session error count so a clean session isn't reported as failed.
+ var errSpans = nonSession.filter(function(sp) { return sp.status === 'error' && !sp.interrupted; });
// Categorize files: changed (edit/write) vs read
</file context>
What does this PR do?
Three reliability follow-ups filed during the v0.8.4 release review of the trace-data-loss fix (#895), all in the observability subsystem.
#901 — reconstructed spans no longer masquerade as failures. When
rehydrateFromFileforce-closes an in-flight generation span (worker restart / cache eviction), the span keptstatus: "error", so the viewer painted it red and counted it in the session error total — a false alarm on a postmortem surface. Spans now also carryinterrupted: true. The viewer renders them amber ("⚠ / Interrupted") instead of red on every surface (preview, detail, waterfall row, tree view, log view) and the session error count (errSpans) excludes them.status: "error"is kept so the boundary stays visible.#902 —
getOrCreateTracecan no longer resurrect a Trace into a cleared cache. The async rehydrate added an await where a concurrentstartEventStream(e.g.setWorkspace) can abort the stream andsessionTraces.clear()while we're suspended. On resume the old code inserted the freshly built Trace into the just-cleared map — an orphan writer under a dead stream. We now capture a monotonicstreamGenerationbefore the await and, if it changed, discard the Trace (endTrace) and defer to the live stream. Keyed on a counter (notAbortControlleridentity) per review feedback.#903 — long-lived traces no longer grow
ses_<id>.jsonwithout bound.snapshot()rewrites the entire spans array on every event (O(n²) over a session).capSpansForSerializationbounds the on-disk projection toMAX_SERIALIZED_SPANS(default 5000, overrideALTIMATE_TRACE_MAX_SPANS) via head + tail retention + one elision marker. In-memory spans are untouched; summary totals come from separate counters, so they're unaffected. The cut explicitly preserves the structural root span.A two-model review panel (GPT-5.4 + Kimi-K2.5) reviewed the first commit; the second commit applies their MAJOR findings (stream-generation counter, explicit root preservation, amber treatment on all viewer surfaces) plus table-driven edge-case tests.
Known minor (deferred): a trace with only interrupted spans correctly contributes nothing to the error count but shows no "incomplete" hint in the summary; a dedicated incomplete banner is a small follow-up.
Type of change
Issue for this PR
Closes #901
Closes #902
Closes #903
How did you verify your code works?
test/altimate/tracing-followups.test.ts(15 tests): interrupted flag set on rehydrate; viewer carries amber/!sp.interruptedcontract + flag in embedded data;capSpansForSerializationhead/tail/marker, no-op under cap, no-benefit guard, root-not-at-index-0, tiny caps 1–5 with marker-parent validity, exact boundary, elided-count accuracy; scope-bounded source contract for the fix(tracing/worker): getOrCreateTrace can resurrect a Trace into a cleared sessionTraces map on workspace switch #902 guard ordering (await → guard → insert) and thestreamGenerationbump.tsgotypecheck clean. Marker guard (analyze.ts --markers --base main --strict) clean —worker.tsedits wrapped inaltimate_changemarkers.Checklist
ALTIMATE_TRACE_MAX_SPANSdocumented inline)Summary by cubic
Marks reconstructed spans as interrupted (warning, not failure), guards against orphan trace writers across stream restarts, and caps on-disk spans to keep session files bounded. This removes false error counts and improves stability and performance for long sessions.
rehydrateFromFilesetsinterrupted: trueon force-closed generation spans; viewer shows amber "⚠ Interrupted" on all surfaces and excludes them from the session error count (keepsstatus: "error"for boundary visibility).getOrCreateTracekeys ownership to a monotonicstreamGeneration; if the stream changed during rehydrate, it discards the trace instead of inserting into a clearedsessionTracesmap (guard placed after await, before insert).capSpansForSerializationto bound serialized spans toMAX_SERIALIZED_SPANS(default 5000; overrideALTIMATE_TRACE_MAX_SPANS) using head+tail retention with an elision marker; always preserves the root span. In-memory spans and summary totals are unaffected.Written for commit 3dc245b. Summary will update on new commits.
Summary by CodeRabbit
New Features
Bug Fixes