feat: session Approach / Map / Timeline (5/5 from #306, stacked on #367) by anshss · Pull Request #371 · Metabuilder-Labs/tokenjam

anshss · 2026-07-02T14:36:51Z

Carved from #306 as PR 5 of 5 — the marquee feature: drill into how an agent worked.

Stacked on #367 (session lifecycle). This branch is built on PR#1, so its diff-vs-main includes #367's commits. Review the own delta (50 files, +13.3k, from pr1/session-lifecycle..HEAD) and merge after #367.

What's here

Session detail view — drill into one session from the Status page: model mix, context-growth, per-subagent cost breakdown.
Timeline — deterministic transcript play-by-play (recursive subagent logs, no LLM), from core/transcript.py.
Map — graphical board of what the agent did per ask: phase / tools / sub-agents / context / cost swimlanes + a codebase-territory treemap (core/workmap.py, core/sessionmap.py, core/phases.py).
Approach — deterministic method spine (act/delegate/verify/dead_end), recursive, with cross-terminal child splice (core/method_spine.py).
Method persistence — snapshots the reconstructed Story to session_story at close + backfill so a killed agent's method survives transcript pruning (core/method_capture.py); re-couples the capture_session_method call into fix: Claude Code session status + lifecycle (1/5 from #306) #367's close handler.
Cross-session run grouping — tokenjam.run_id/parent_session_id, /api/v1/runs, setup_harness MCP tool (core/runlink.py, api/routes/runs.py).
Per-subagent cost — sub_agent_id on spans + subagent right-sizing analyzer.
On-demand LLM-distilled Map titles via local claude CLI (core/distill.py, no API key).

Migrations

13 run_id + parent_session_id · 14 sub_agent_id on spans · 15 session_story table. (12 intentionally skipped — reserved by #370.)

Cross-PR seams for merge (resolvable)

Shares SessionRecord.cache_write_tokens (model field, default 0 here) and core/backfill.py with fix: Claude Code cost & alert correctness (4/5 from #306) #370 — trivial conflicts when both land. Per-session cache-write totals read 0 until fix: Claude Code cost & alert correctness (4/5 from #306) #370's migration 12 provides the column.
The #18 session_token_cost_rollup primitive lands in fix: Claude Code cost & alert correctness (4/5 from #306) #370; this PR's detail cost uses stored values until then.

Tests

1658 passed, 0 failed, 1 skipped; ruff check clean; index.html module node --check OK; lens string-grep regression suite green.

Not browser-dogfooded

The new tab CSS/markup (Approach / Map / Board / Timeline) was inserted via conflict resolution and validated by node --check + string-grep regression tests, not a running browser. Worth a visual pass before merge.

🤖 Generated with Claude Code

Clicking a Status tile navigated to the global `#/traces` firehose, dropping the clicked session's identity and landing the user on an unfiltered, all-agents trace list. The header promised "details" but zoomed out. Add a real session-scoped destination instead. - GET /api/v1/sessions/{session_id}: per-session rollup (cost/tokens/tools/ alerts/drift) plus the session's own traces, so the UI can drill into the existing waterfall. Read-only; require_api_key; 404 as JSONResponse with response_model=None. Parameterised db.conn SQL only. - UI SessionDetailView + router case + click fix (index.html:680 now routes to `#/sessions/<id>`, archived rows clickable). Plan-tier-honest cost framing: "Implied API value" for subscription, "Local model — no API cost" for local, real cost for API; no invented spend, no fabricated agent tree (the live Claude Code telemetry is flat — documented in the route). - Stop tracking .tj/config.toml: every `tj` run from the repo cwd rewrites it and rotates the committed ingest_secret. The `.tj/` ignore rule already exists; this just drops the stale tracked copy. Tests: +5 integration tests (rollup/tools/traces, unknown→404, subscription plan_tier→pricing_mode, drift baseline, api-key auth). ruff + mypy clean, 644 pass. Layer 1 of the run-autopsy arc; subagent-tree capture at ingest (Layer 2) is the next step. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…view Layer 2 of the run-autopsy arc. The session detail view now shows how a session split across models and how its context grew over the run — both derived from existing gen_ai.llm.call spans, so it works on every session (live + backfilled) with no schema change. - GET /api/v1/sessions/{session_id} gains turn_count, model_mix (per-model calls/tokens/cost rollup) and context_series (time-ordered input-token series, downsampled to <=120 points, first+last preserved). Parameterised db.conn SQL; span name from the GenAIAttributes semconv constant. - UI "Models & context" section: model-mix table + a dependency-free CSS bar chart of input (context) tokens per LLM call. Descriptive only — no model- routing-quality claims; subscription caveat reused on cost. Tests: +3 integration tests (model_mix aggregation/order, turn_count, context_series ordering + downsample cap). ruff + mypy clean, 647 pass. Deferred (recorded in .plans): parentUuid within-session branch tree; the cross-session spawn graph (needs a harness-emitted spawn marker — no clean parent->child link exists in on-disk Claude Code data). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…unit Layer 3 of the run-autopsy arc. A fan-out harness (e.g. the meta-repo governor) stamps tokenjam.run_id (and optional tokenjam.parent_session_id) as OTel resource attributes on each worker session it spawns; tj groups those sessions into one Run. Linkage is DECLARED by the spawner — Claude Code OTLP carries no native parent<->child edge, so it is never reverse-engineered. - semconv: TjAttributes.RUN_ID / PARENT_SESSION_ID. - SessionRecord gains run_id + parent_session_id; migration 7 adds the columns (ADD COLUMN IF NOT EXISTS — fresh-DB and upgrade safe). - Ingest captures the markers from resource attributes on both paths (otel/otlp_parsing.py for the spans/OTLP path, api/routes/logs.py for the Claude Code logs path) and self-heals null-on-update (never overwrites). - API: run_id/parent_session_id on GET /api/v1/sessions/{id}; new GET /api/v1/runs/{run_id} (totals + member sessions + parent-edge tree) and a GET /api/v1/runs index. 404 as JSONResponse + response_model=None. - UI: RunDetailView (#/runs/<id>) — run rollup + sessions indented by spawn parent; "Run" link on the session detail. Plan-tier-honest cost framing (mixed -> implied API value, never a hard spend claim). Tests: +7 integration tests (logs+spans ingest capture, session-detail exposure, run grouping/aggregation/tree, unknown->404). ruff + mypy clean, 654 pass. Harness side is a documented contract (set the resource attrs at spawn), applied separately in the user's governor. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

L3 added tokenjam.run_id / tokenjam.parent_session_id resource-attribute capture to the HTTP/OTLP (otlp_parsing.py) and Claude Code logs (logs.py) paths but missed convert_otel_span — so a Python SDK app using the in-process exporter would never join a Run. Extract the markers there too, mirroring the existing service.namespace / service.instance.id extraction, so run grouping works across all three ingest paths (CC logs, HTTP/OTLP incl. the TS SDK, and the in-process Python SDK). +2 unit tests for convert_otel_span resource extraction (also the first direct coverage of that block). ruff + mypy clean, 656 pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

A "Story" section in the session view that explains, step by step, what a Claude Code session was trying to do and how it went — surfacing the agent's own narration threaded with its literal tool calls and ok/error outcomes. No LLM, no generation: read live from the on-disk CC JSONL transcript, nothing stored in the DB (capture posture unchanged). - core/transcript.py: build_session_story() locates ~/.claude/projects/*/<session_id>.jsonl (session_id == transcript filename, verified 100% across cli + sdk-cli), parses task / steps (narration + per-tool {name, label, status} + is_error/is_retry flags) / outcome. Caps + truncation; NEVER returns full tool inputs/outputs — only a short arg label + ok/error (privacy + bounded payload). - GET /api/v1/sessions/{id}/story (require_api_key, response_model=None); {available: false} at HTTP 200 for SDK/no-transcript sessions. Projects root overridable via app.state / TJ_CLAUDE_PROJECTS_ROOT for tests. - UI Story section: Task callout, step list (expandable narration, tool chips, error tint, ↻ retry marker, omitted markers), Outcome callout. Dependency-free. CC-only by design (SDK sessions have no CC JSONL → graceful unavailable state). +15 tests (unit parser + API available/unavailable). ruff + mypy clean, 671 pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…Alerts / Traces) The session card stacked six sections vertically — too much scroll. Split the lower content into tabs while pinning the header + Overview/Cost summary cards at top. Story is the default tab; Tools folds under "Models & context", Behavioral drift under "Alerts". Pure layout change (htm/Preact + CSS), no backend or data-flow change; data still loads as before, only display is gated by the active tab. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Each Agent/Task step in a session's Story now expands into that subagent's own story (task -> steps -> outcome), recursively, so a session's full log includes the work of everything it spawned. Subagent transcripts are read from ~/.claude/projects/<proj>/<session_id>/subagents/agent-<agentId>.jsonl, linked by the agentId carried in the parent step's tool_result (exact match, no heuristic). - core/transcript.py: include_subagents (default true) resolves each Agent/Task step's child agentId from its tool_result, loads agent-<id>.jsonl from the root session's subagents/ dir, and nests its story under the step (step.subagent), recursively. Guards: MAX_SUBAGENT_DEPTH, a shared step-budget across the tree, and an agentId cycle-set; depth/budget caps are surfaced, never silent-dropped. Privacy unchanged at every depth (narration + short tool label + ok/error only). - GET /sessions/{id}/story: Agent steps carry a recursive `subagent`; ?subagents=false returns the flat single-session story. - UI: collapsed "> subagent: <name> - N steps" disclosure under each Agent step; expands to the subagent's task/steps/outcome indented, recursively (its own Agent steps expand too). +9 tests (parent->child->grandchild nesting, agentId resolution, depth/budget caps, cycle guard, ?subagents=false). ruff + mypy clean, 680 pass. Real-data: this session nests 12/12 spawned subagents. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The session activity section was a raw wall of full-card steps (a 356-step session rendered ~32k px tall). Make it scannable and address the naming/order feedback. - Rename the user-facing tab + section "Story" -> "Timeline" (the /story endpoint path is unchanged/internal). - Newest-first display: renderStepsNewestFirst() reverses the step list (top level AND nested subagents) while keeping the #n labels (so Metabuilder-Labs#1 is still the first action). The Task callout stays pinned at top and Outcome at bottom; only the steps reverse. - Compact rows: each step is now a one-line row (#n - time - tool chips ok/error - clamped first line of narration); click to expand the full narration + detail. The repetitive per-row model label is dropped and shown only when the model changes (prevModel). Error tint, retry markers, and the recursive subagent disclosure are preserved. UI-only (tokenjam/ui/index.html); ruff clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The session-detail Timeline (StorySection) fetched /sessions/:id/story once on mount with no polling, unlike every other view. A live session's Timeline froze at page-load and only refreshed on remount (tab switch / re-navigate). Add setInterval(load, 10000) matching the house idiom, preserving the last-good Timeline on transient poll failures. Apply the same polling fix to TraceDetailView, which had the identical once-only fetch (selection is by span_id, so re-fetching spans preserves it). Also remove the pinned top-level Task and Outcome callouts: in a long-running session the first prompt goes stale as new prompts are sent, and Outcome was just the last assistant message (already the top step) mislabeled as a final outcome on a live session. The steps list already shows everything in descending time order. Per-subagent Task/Outcome callouts stay (scoped, not stale). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Each Timeline step's text was hard-trimmed to 400 chars server-side, with the full narration discarded. The UI's expand-a-step feature could therefore only ever reveal 400 chars ending in "…" — "show more" was lying, since the rest was never sent. Raise MAX_STEP_TEXT_CHARS to 100K so it acts as a safety guard against a pathological single blob rather than a preview trim; the UI already shows only the first line collapsed and the full text when expanded. Real assistant responses now render complete. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Two status-page UX fixes: - Scroll memory: returning from a session/run detail page (e.g. clicking an archived session, then Back) landed at the top of the Status page instead of where you were. The window scrolls (.main#app has no own overflow) and the async list load defeats native scroll restoration. Add a useScrollMemory hook that saves window scroll per view and restores it once the list has rendered. - "restore session" button on every archived (closed/stale) and idle session. Clicking copies `claude --resume <session-id>` to the clipboard so the session can be picked back up in the terminal. Copy uses execCommand inside the click gesture (reliable even when the async Clipboard API hangs on a permission prompt) with a best-effort navigator.clipboard upgrade, and stopPropagation so it doesn't trigger the row/tile navigation. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The "restore session" button belongs on the session detail page (where you're looking at one archived/idle session), not on the Status list. Move it: render it under the session title in SessionDetailView when the session is not active (closed / stale / idle), and remove it from the Status page archived table and idle tiles. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…g row Move the restore button onto the title row, right-aligned next to the session heading, instead of stacked under the Session id line. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add a copy glyph to the restore-session button and a styled on-hover tooltip that tells you to paste & run the command in your terminal, showing the exact `claude --resume <id>` line. Replaces the plain native title tooltip. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Claude Code records each Task-tool subagent's turns in <session>/subagents/agent-<id>.jsonl, tagged with the parent's sessionId. Backfill folded those spans under the parent session but discarded the subagent identity, so a session's cost could not be broken down per subagent -- yet a single research run can spawn 100+ subagents that drive most of the spend (verified on a real session: 66% of $642 across ~147 subagents, previously invisible inside one parent total). - NormalizedSpan.sub_agent_id + spans.sub_agent_id column (migration 11) - backfill sets it from a record's top-level agentId when isSidechain is true; None on the main thread - both span write paths (db.insert_span, backfill) + _row_to_span carry it - make_llm_span factory gains a sub_agent_id arg Enables GROUP BY sub_agent_id for per-subagent breakdown / right-sizing. Known limitation (separate latent bug, not addressed here): backfill upserts the session row once per file with replace semantics, so sessions.total_cost_usd reflects only the last-processed subagent file. Span-derived cost (get_session_cost / get_cost_summary) stays correct. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

New optimize analyzer that breaks a window's cost down per subagent (sub_agent_id) and flags structural right-sizing candidates: - over_powered: premium (Opus-tier) model, little output, few tool calls - over_provisioned: large context (input + cache) but little output Honesty discipline (CLAUDE.md Rule 14): candidate flags only, never a quality claim; the caveat is surfaced verbatim and the recoverable estimate is left None (we report the spend concentrated in flagged subagents, not a guaranteed saving). Registered as "subagent" (auto-discovered; appended to ANALYZER_ORDER), so it flows through get_optimize_report (MCP) and /api/v1/optimize via a dict round-trip constructor, with a pricing-mode-aware CLI renderer. Verified on a real session: 147 subagents = 66% of a $642 window, of which 25 are flagged ($109) -- Opus subagents fed 600K-1M cache tokens that produced <1K output (over_provisioned). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Backfill is idempotent by span_id, so a plain re-run skips spans already in the DB and never populates sub_agent_id on history ingested before that column existed. --reingest UPDATEs the existing spans in place (sub_agent_id refreshed) instead of skipping them -- no new rows, no duplicates -- so accumulated history becomes attributable per subagent. Surfaced via the new BackfillResult.spans_retagged counter and the CLI summary. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Surfaces the sub_agent_id data in the UI. The session-detail API (GET /api/v1/sessions/{id}) now returns a `subagents` block: per-subagent cost/token rollup carrying the same over_powered / over_provisioned right-sizing flags the optimize analyzer uses (heuristic imported from there so there's a single source of truth). The dashboard SPA gains a "Subagents (N)" tab that renders the breakdown table with flagged rows highlighted and the candidate-only caveat. Verified in a headless browser against a real backfilled session: 147 subagents, 56 flagged, the tab renders with no console errors. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Adds a work-map view to the session detail: a nested tree of the main thread + its subagents, each node annotated with a deterministic activity rollup (files touched, web sources, searches, shell commands, subagents spawned, errors/retries) joined to its cost / tokens / right-sizing flags. It composes two things tj already computes — the pure transcript Story (structure + the agent's own labels, core/transcript) and the span-derived per-subagent breakdown (cost) — into one render-ready tree via a new pure core transform (core/workmap.build_work_map). New route GET /api/v1/sessions/{id}/workmap. The dashboard gains a "Map" tab, placed before Timeline and made the default, so opening a session lands on the bird's-eye graph; Timeline is the drill-down. No LLM, no interpretation: every field is a count, a label the agent produced, or a cost from real spans. Caps the Story applied (depth/budget/ cycle) surface as node markers, and subagents with recorded cost that never made it into the bounded tree are reported as an `unmapped` tail — never silently dropped. Descriptive only: tj reports what happened, the human judges the approach. Also fixes a stray NUL byte in ui/index.html (line 1318) that broke `node --check` and made `file` mis-detect the SPA as binary. Tests: pure-transform unit tests (rollup/dedup/join/cap/unmapped), /workmap route integration tests, and UI static-grep guards (tab present + default + descriptive caveat + no-NUL). ruff + mypy clean; 847 pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw

The per-node figure on the right of each Map node is now tokens spent — plan-agnostic, so it reads honestly on subscription as well as API. The estimated dollar cost moves to a hover title. The "unmapped subagents" footer switches to tokens too (build_work_map now returns unmapped_tokens). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw

The session "task" (shown on the work-map root and in the /story payload) came from the first user message verbatim — which in Claude Code is the human's words buried under injected <system-reminder> blocks (CLAUDE.md, environment, date) and, for slash-command starts, <command-*> / <local-command-*> tag wrappers. _first_user_prompt now strips those: it returns the actual ask, surfaces a "/cmd args" label when only a slash command remains, and falls through to the next user message when one is pure wrapper. Clean prompts pass through unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw

Map node detail listed full absolute paths (e.g. /Users/.../aquanode/.claude/context/product.md), which are hard to read in the dim mono list. A shortPath() UI helper now renders each as "…/dir/file.ext" with the full path on hover; URLs and non-path strings are left untouched. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw

A Claude Code session isn't one task — it's a sequence of human asks fired into the same terminal until the context window fills. Modeling a session as a single "main task" (with the first prompt as its label) was wrong: the first prompt is just the first of many, often the least representative. New model — the session is a list of asks, newest first: - core/transcript.build_session_asks: segments the transcript at each genuine user message (_is_user_ask, reusing the harness-wrapper stripper), and for each segment builds the steps + nested subagents that ask triggered — reusing the same machinery as build_session_story and sharing one step budget + cycle guard across the session. - core/workmap.build_work_map: now folds the ask-segmented story + per-subagent cost breakdown into a list of ask nodes (newest first), each with its activity rollup, bucketed token total, and the subagent subtree it spawned. - GET /sessions/{id}/workmap returns {asks: [...]}; per-ask tokens/cost are bucketed from LLM-call spans by start_time window. - UI: the Map tab renders asks (WorkMapAsk), newest first, each expandable to its outcome, files, and subagent tree. Subagents and cost are attributed to the ask that spawned them. The single-task session is just the N=1 case. Descriptive only — tj reports the asks; the human judges them. 859 tests pass; ruff + mypy clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw

Claude Code injects background-task completion notices as user-role messages (<task-notification>…</task-notification>) when an async Task/Agent finishes. The ask segmentation was reading each one as a separate human ask, inflating the count (a real session showed 98 "asks", most of them notifications). _strip_harness_wrapper now strips <task-notification> blocks alongside <system-reminder> and the command wrappers, so a notification-only turn isn't a prompt and doesn't start a new ask — the work that follows folds into the preceding ask's segment. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw

User asks now stand out visually on both views (brand-blue accent), so a session reads as a conversation rather than an undifferentiated stream. - Timeline: the story marks the first step after each genuine human ask with its prompt (_build_steps include_asks, on for the main thread only). The renderer groups steps by ask and shows each prompt as a distinct "You" block (brand accent), newest ask first, its work beneath. - Map: each ask row gets a brand left-border to match. No new data — reuses the harness-wrapper stripper to find genuine asks, so system-reminder / task-notification / command turns aren't marked as prompts. 864 tests pass; ruff + mypy clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw

… label The boxed prompt block with a "You" badge was too heavy. Drop the box, background, and label — the prompt now renders as a distinct brand-colored line, the slight differentiation that was asked for. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw

…terminal Within each ask the steps were ordered newest-first, so a prompt was followed by the LAST step of the exchange instead of its first response — the reverse of what you see in the terminal. The Timeline now renders fully chronological: each user prompt, then its responses in order, top to bottom. The Map stays newest-first as the at-a-glance summary. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw

The chronological flip moved the latest activity to the bottom. Restore newest-ask-first (the familiar overview), but keep each ask reading top-down in order — prompt, then its responses — so input/output still pair the way the terminal shows them. Like an inbox: newest thread on top, each thread read top-to-bottom. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw

Turns the per-ask Map into a readable account of the session. Each ask is now headlined by WHAT THE AGENT DID (its own outcome narration), with your prompt demoted to a dim context line; a deterministic status icon (● did work / ⚠ flagged / ✗ error / · chat) colors the row and its left border; no-work conversational asks recede to a compact dim line; and the Map reads chronologically (oldest first) so it tells the session's story, distinct from the Timeline's newest-first live tail. Token spend stays the visible metric (cost moved to a hover tooltip). Pure UI change over the existing ask-segmented /workmap data — no backend, no LLM. Crisper LLM-distilled headlines are a planned opt-in follow-up; today's headline is the agent's own last narration, occasionally raw. Tests: updated the two ask-Map regression pins the storyline evolves (subtitle wording + status-driven left border) and added a storyline guard (outcome headline, askStatus, chronological order). node --check clean; 865 pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw

…e headlines, collapsed chat Turns a long session's Map into a 5-second read instead of a wall: - Each ask is headlined by the FIRST clean sentence of its outcome (markdown stripped), hard one line ≤90 chars — no more raw agent narration cut mid-thought. - A summary band tops the Map (total tokens · asks · subagents · flagged, plus the biggest fan-outs) for the at-a-glance read. - Runs of chat-only asks collapse into a "⋯ N quick exchanges" divider (click to expand), so real work stands out from conversation. - Per-ask sub-counts are clamped to the session total on both the row and the summary, so an impossible >total figure can't render (defensive against upstream non-determinism in ask.subagent_count). Built by a delegated worker agent and gated with a Playwright DOM check (not eyeballing): zero console errors; summary band + 3 chat dividers that expand; zero wrapping headlines (white-space:nowrap, ≤90 chars); row and summary sub-counts agree (106 == 106 ≤ 113 total). node --check clean; 870 tests pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw

Revamps the session-detail "Map" tab from the nested text tree into the synchronized-swimlane board, driven by GET /sessions/{id}/sessionmap. MapBoardSection renders, over one shared time axis: - phase bands, a per-event tool strip (colored by category; errors in --error, retries marked), a context-growth area chart and a cost-burn bar chart (inline SVG, viewBox 0 0 100 30 so all lanes stay pixel-aligned), a shared crosshair + exemplar tooltip, an x-axis, and a category legend. - a time ⇄ step toggle that re-spaces every lane from one pair of x accessors (wall-clock offset vs even ordinal). - a codebase-territory treemap (③) aggregated from the read/edit events: files grouped by directory, shaded by touch intensity (edited→--success, read→--brand), with a first-touch order badge and ✎ edited marker. All colors are theme vars (offline-safe; test_ui_offline still green). Falls back to the existing WorkMapSection when /sessionmap has no data, so nothing is lost. The subagent lane / board recursion is intentionally deferred (the /sessionmap events don't carry sub_agent_id yet) — noted in a code comment. Guarded by static-grep regression tests in test_lens_ui_regression.py (MapBoardSection + Map-tab wiring + the four lane labels + territory + time/step toggle); module passes node --check. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The Approach tab shipped only the method spine; the approved mock also has a delegation-tree rail, a header stats card, rich delegation cards, and a source legend. This adds them — a small /approach enrichment so the UI reads (not computes) the data, then the UI rework. Backend: - build_method_spine: a delegate move now carries `delegations` (one entry per subagent child: name, agent_id, depth, task, capped, nested spine) instead of a flat `children` list. - /approach joins each delegation to per-subagent cost/tokens/status/flags via _session_subagents (the same breakdown /workmap uses), and returns `agents` (preorder rail summary: main_session + in_session_subagents with status + capture_completeness), `counts` (moves/delegations/dead_ends/ verifies), and `meta` (session cost/tokens). Snapshot read-through kept. UI: - two-column ap-grid: a left ApproachRail (every agent with a status dot, "ended · method kept" badge, provenance line, depth indent + the ephemeral-capture caption) + the method panel. - header card with mandate + outcome + a right-side stats column from counts/meta. - ApproachDelegation: pink cards (↳ name · depth · tokens · $cost · status) expanding into a "how the subagent solved its piece" sub-header + child mandate + the child spine (recursive) on a depth rail; capped → not expanded. - a bottom source-of-each-line legend. Offline-safe (theme vars + emoji only). Tests updated: method-spine delegations shape, /approach agents+counts, and UI static-grep asserts for the rail/header/cards/legend. node --check clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…lass The delegation-tree rail's main-agent status dot used a bare `main` modifier class (`<div class="ap-dot main">`), which also matched the app's global `.main` layout rule (flex:1; margin-left:200px; padding:24px 32px). That inflated the main dot to ~64x48 and floated it out of the rail — the big blue blob. Subagent dots used `sub`/`term` (no global collision), so only the main node broke. Namespaced the dot modifiers to `is-main`/`is-sub`/`is-term` so they can't collide with global classes. Main dot now renders 9x9 in place. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Fixes four visual bugs in the Map board (MapBoardSection) confirmed against the approved mock via screenshot + geometry audit: 1. PHASE band labels were full prompt sentences overlapping into mush. Measure plot width, ellipsize each band label (mb-seg-lab), and only print a label when its band is >= ~60px wide; add title= for hover. Dense zero-width phase slivers now render unlabeled. 2. TOOLS event labels (mb-evlab) collided and overflowed the lane. Walk events left->right and keep a label only when >= ~70px past the last kept one (sparse, non-overlapping subset); clamp each label's x so its max-width box stays inside the lane; add title= for hover. 3. x-axis last tick (x=100%) was clipped past the right edge by its centering transform. Right-align the final tick (and left-align x=0) so ticks stay inside the lane. 4. Territory treemap showed giant empty boxes with absolute-path headers. Drop align-items:stretch and the per-file/per-files vertical flex-grow so dir boxes size to their content and every dir's file rows render. Headers now show only the last 1-2 path segments (ellipsized, full path on hover) instead of long absolute paths. Geometry audit is clean (no mb-* overflow); all 152 UI regression + offline tests pass; extracted inline module passes node --check. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Enrich GET /sessions/{id}/sessionmap with a top-level `subagents` array — one entry per in-session subagent with a usable span window: name (from the story, else agent-<id[:8]>), absolute ts range, the window mapped onto main-thread event ordinals, and summed tokens/cost. The ordinal mapping brackets the gap (and clamps to the live edge) when a subagent ran while the main thread emitted no events, so step mode still positions it; returns null only when there are no time-anchored events. `subagents: []` when none. Render the lane in MapBoardSection between the tools and context lanes: each subagent is a positioned bar on the shared axis (time → ts offset; step → ordinal), packed onto rows (stacking overlaps, capped), clamped inside the plot, labelled name + tokens·cost (ellipsized, themed --chart-5). The lane re-spaces with the time⇄step toggle and is omitted when empty. Tested visually against the approved mock in both modes (geometry audit clean); updated the /sessionmap integration test (seeds a subagent span) and added a UI regression static-grep for the lane. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The Map board placed event/phase lanes by transcript step timestamps but the context/cost series + meta.started_at by span timestamps — two clocks on one axis. On a backfilled/resumed session whose span started_at postdates the transcript, every event's wall-clock offset went negative, the UI clamped it to 0, and the whole tool/phase lane collapsed to x=0 while the series spread out. Compute one basis in get_session_map: t0 = min(earliest event ts, earliest span start_time), tEnd = max of the same, duration = tEnd - t0 (floored to 1). Recompute the series t_s and meta.started_at/duration_s against this unified t0 so the event lanes, series, and subagent bars all share one origin and span. _coerce_utc reconciles tz-aware transcript ts with naive DuckDB span datetimes. Normal sessions (transcript ts ≈ span ts) are unchanged; the UI already keyed every lane off meta, so no UI change was needed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The Approach rail showed only in-session subagents (Task sidechains). M2b surfaces a session's cross-terminal CHILDREN too — separate SessionRecords a harness spawned in another terminal, linked only by a declared parent_session_id — honestly marked by what we can recover. /approach now finds child sessions (new runs._child_sessions) and, for each: - adds a rail agent {provenance: cross_terminal_child, depth 1, status, cost_usd, tokens, capture_completeness}. capture_completeness is "full" when the child's own method is recoverable (its live transcript or M1 snapshot via build_session_story/load_session_method → build_method_spine) else "session_level" — we have the cost/identity but not the how. - when the method IS available, splices the child's own spine into a new cross_terminal list so it nests like an in-session delegation; header counts roll the spliced moves/dead-ends/verifies in (+1 delegation per child). cross_terminal is [] and no extra rail agents when a session launched nothing (the common case), so existing payloads are unchanged. UI: ApproachRail renders cross_terminal_child amber (is-term) with a "cross-terminal child · run-linked" sub-line and a "session-level" / "ended · method kept" badge keyed off completeness — visually distinct from the pink in-session subagents. A new ApproachCrossTerminal component renders each spliced child under a "cross-terminal children" divider with the same recursive ApproachMove styling. Tests: integration coverage for the full/session-level split + the spliced spine + the empty common case; static-grep guards for the UI handling. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Verified end-to-end that the Timeline (StorySection) already renders recursive subagent logs: a StoryStep renders a SubagentBlock per spawned subagent, SubagentBlock re-renders the child's steps via renderTimelineSteps, and renderTimelineSteps renders a StoryStep each — a closed cycle, so nested delegations nest arbitrarily deep, each level independently expandable (caret ▾/▸), matching Approach/Map. Caps are honest (depth/size/cycle each map to an explicit "… omitted …"/"… already shown …" note, never a silent drop). Confirmed visually against seeded session 3024587f… (13 subagents): expanding step Metabuilder-Labs#2 surfaces its nested "subagent: Investigate session map + PR 306" block; expanding that renders its Task + 38 own steps. No overflow. No UI change needed (affordance already consistent). Adds two static-grep regression tests asserting the recursion cycle + honest cap markers stay wired, so a future edit can't silently flatten the Timeline. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

capture_session_method was wired only into session-close, so backfilled historical sessions never got a session_story snapshot — the very sessions most likely to have their Claude Code transcript pruned later lost their 'how'. Wire capture into ingest_claude_code, snapshotting each newly-ingested session (source="backfill"). Best-effort: capture swallows its own errors and never raises, so it cannot change backfill's result or break ingest. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The Map board's sub-agents lane built its `subagents` array purely from spans grouped by `sub_agent_id`, which is set only by backfill from `isSidechain`+`agentId`. Subagents whose spans aren't tagged were dropped, so the lane silently under-counted delegation (~2 bars) versus the transcript-derived Approach rail (~15). Source the lane's subagent SET from the same transcript subagent subtree the rail uses (`_transcript_subagent_index` over the asks payload), unioned with any span-only `sub_agent_id` so recorded usage is never dropped. Span timing still refines each bar's window when available; otherwise it falls back to the subagent's transcript timing (its own first/last step ts, then its spawn ts). Tokens/cost come from `_session_subagents` and are null when the subagent has no `sub_agent_id` spans. A subagent with neither spans nor resolvable ts is still counted with a null window (UI hides window-less bars in step/time but the lane still renders). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

PR Metabuilder-Labs#306's deferred "active-time (idle-segmented) durations." On a long or resumed session whose wall-clock is mostly idle between turns (e.g. ~32h spanning ~2.8h of real work), raw wall-clock crammed all the work into thin clusters at the right edge while huge idle gaps ate the axis, making time mode near-useless. Backend: _ActiveAxis builds a piecewise-linear real-time -> active-time map over the UNION of event + span timestamps, passing active periods through 1:1 but collapsing every gap > IDLE_GAP_THRESHOLD_S (300s) down to a fixed COLLAPSED_GAP_S (30s). Every event, context/cost series point, and subagent window gets its active_s position; meta exposes active_duration_s and a gaps array ({start_ts, end_ts, duration_s, at_active_frac}) for the break markers. With no idle gaps the active axis equals the wall-clock axis (no behavior change); t_s/duration_s are kept untouched alongside the new fields. UI: time mode positions every lane (events, series, subagent bars) by active_s / active_duration_s, so the work spreads out and idle no longer dominates; x-axis ticks reflect active time. Each collapsed gap draws a faint dashed break marker with a thinned, clamped "⋯ idle Nh/Nm" label (theme vars, offline-safe). Step mode is unchanged. Tests: integration asserts a synthetic session's idle gap is detected + collapsed (active axis skips the idle minutes) and a gapless session is unchanged; a UI regression statically asserts the active-axis wiring + break markers render. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The TOOLS lane printed the tool's full arg label (a long path-ish string) under each sampled tick, hard-clipped to a 90px box. On idle-gap-collapsed (time-mode) and dense (step-mode) sessions the kept labels rendered into unreadable mush — adjacent boxes overlapped and each showed a mid-segment fragment. - New evLabelShort(): a glanceable tail of the arg — the first token (verb) for a command/prompt, or the basename (last /-segment) for a bare path — hard-capped at 12 chars. Full value stays in title= for hover. (shortPath is kept for the file lists + tooltip.) - Collision-proof spacing: cap the label box (MB_EVLAB_MAX 90→72px, CSS max-width too) and bump MB_EVLAB_GAP 70→80 so the min center-to-center spacing exceeds the box width — two kept, center-anchored labels can never touch. - Enforce the gap on each label's *clamped* (rendered) center, not the raw x: the edge-clamp shifts a label inward, so comparing raw x let an edge-clamped label collide with its neighbor (seen in step mode). Validated visually on session 3024587f in both time and step mode: labels short + spaced, DOM audit shows zero .mb-evlab overlap/overflow. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Make the Map board legible as data, not just shapes: - CONTEXT/COST lanes gain a y-axis gutter (max value at top, 0/$0 at the baseline) plus a mid gridline and a "peak <value>" annotation, read off the already-returned context_series/cost_series via fmtTokens/fmtCost. - Sub-agent bars render "name · tokens · $cost" as one ellipsized run instead of a separate cost span that truncated to a cryptic "1…" stub; null metrics are omitted (transcript-only subagents show just the name). - x-axis row stretches to fill its height (was a zero-height centered ticks box) so the tick labels no longer spill past the board's overflow:hidden bottom; the last tick + legend are fully visible. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…yle interval selector The TOOLS lane rendered each of a session's hundreds of tool calls as a 3px floating tick, so dense bursts smeared into an unreadable smudge and sparse stretches looked empty — the lane conveyed nothing. Time mode now bins events into active-time buckets and renders a stacked-by-category density histogram (Read/Search/Edit/Bash/Task/Web, error stacked on top), so busy bursts are tall bars and idle stretches are honest gaps. A trading-chart-style interval selector (Auto · 1m · 5m · 15m · 1h; Auto targets ~60-100 bars) sets the bin width, and the cost lane re-bins to the SAME bucket edges so a cost spike sits directly under the tool burst that caused it. Step mode is unchanged (per-event ticks remain the event-level view). Frontend-only, no backend. Co-Authored-By: Claude <noreply@anthropic.com>

…-call context, collision-proof labels, de-noised treemap (Metabuilder-Labs#56) The Map board dumped five raw telemetry lanes and deferred all synthesis to hover; on a real long session (74h, 395 tools, ~40 subagents) it was undecodable. Per meta-repo ticket Metabuilder-Labs#56: - Insights strip: deterministic callouts (costliest active stretch, friction errors/retries, top sub-agent by cost, idle share, edit footprint) surface the board's answers by DEFAULT instead of hover-gating them. - CONTEXT lane now plots each call's OWN context occupancy (input+cache per span) instead of a cumulative sum — growth, compaction and resets are visible; the old monotone climb duplicated the total-tokens chip. - Sub-agents lane: labels are px-gated (MB_SUBLAB_MIN_PX) and suppressed on overlapped past-cap bars — bars may overlap under extreme density, text never does; lane grows to 8 packed rows instead of smashing into 3. - Legend now covers every encoding: Other, retry (dashed red = retried step, distinct from solid-red error), and the phase-band tinting. - Territory treemap: temp/scratch reads (TemporaryItems, /tmp, /var/folders) collapse into one muted card and no longer destroy the common-prefix root, so dir labels are workspace-relative; cards/files weight edits over reads; file names keep a readable floor instead of truncating to one letter. Validated: full suite 1718 passed; seeded long-session fixture screenshot- checked in both time and step modes (no label collisions, no console errors). Co-Authored-By: Claude <noreply@anthropic.com>

…-attempt-after-failure (Metabuilder-Labs#58) Founder dogfooding on a real session showed the manual bin ladder produces no relatable change (and 1h bins on a ~40m session collapse the board into one full-width slab), and the dashed retry outline had saturated into noise. - Interval ladder purged: bin width is always auto-resolved for the span and self-described in the cost peak label ('peak $X per 30s' / '/call' in step). - is_retry now requires the previous same-signature step to have FAILED — consecutive successful edits of the same file are normal work, not retries (a real session showed 27 'retries' with ~1 genuine one). Makes every consumer (Map marks, friction chip, method_spine dead-ends, workmap counts) more accurate. New test pins repeat-after-success != retry. - Step mode is the default read (sequence without burst/idle distortion); time stays one click away for cost localization. - Direction note: docs/internal/specs/map-board-direction.md pins the board's job, the purge principles, and the question-driven-zoom north star. Full suite: 1720 passed. Co-Authored-By: Claude <noreply@anthropic.com>

Carve-fallout fixes discovered by the test suite after cherry-picking PR#5: - transcript.py: port session_transcript_path/session_transcript_mtime helpers (runlink.py + sessions.py + status rescue depend on them; their originating commit is outside PR#5's range). - models.py/db.py: add SessionRecord.cache_write_tokens as a model-level default (0) so the session-detail rollup totals work without PR#4's migration 12. - status.py: wire the transcript-mtime live-status rescue (_live_status) so a live CC session with fresh transcript shows active, matching the ebef638 test. - tests: drop out-of-scope Metabuilder-Labs#18 trace-keyed-cost rollup tests and the PR#4 subagent-totals reconcile test (both depend on non-PR5 infrastructure); add missing CaptureConfig/GenAIAttributes imports + reingest mock kwarg; de-dup work-map/user-prompt regression tests.

anilmurty · 2026-07-02T15:06:43Z

Big PR, but the highest-risk surfaces are both clean, which is what matters. SQL injection across the 1,714-line sessions.py + runs.py: PASS — every execute() is $N-bound; the only two {where} interpolations are static literal clause fragments. Offline UI across the +2,472 index.html delta: PASS — zero external URLs. The new subagent analyzer self-registers, and it's honest by construction (estimated_recoverable_usd=None, heuristic confidence, caveat printed verbatim). Pure query-free core modules (transcript/workmap/sessionmap/method_spine) make the bulk very reviewable, and their tests are fixture/factory-based. Two follow-ups, neither blocking: list_runs has an N+1 over runs (bounded by LIMIT, fine for now), and you flagged the new tabs were only node --check'd — let's do a quick visual pass before this merges. Also confirm the session_token_cost_rollup trace-disjunction from #370 behaves once wired here. This is the tip of the stack — merges last, after #367.

…dogfooding round Approach tab: - scrub UTF-8-as-Latin-1 mojibake ("Â\xa0") from /approach and /story payloads once at data-load (mandates rendered "Â how is..." on real transcripts) - render **bold**/*italic*/`code` in labels, quotes and the outcome block as vnodes instead of literal asterisks (escape-first, no links/headings) - when a move's quote merely re-states its 80-char-truncated label (48/78 moves on a real session), un-truncate the headline from the quote's lead sentence instead of printing the same sentence twice - clamp the ✓ outcome block to 3 lines with a click-to-toggle "show all" (outcomes arrive truncated mid-word server-side) - fold runs of >4 consecutive chat-only moves into first + "· N conversational steps" + last (click to expand) so the method doesn't drown in Q&A narration - tag review/verify/audit-mandated delegations with a ✅ verify chip + green accent and recompute the header's verifies stat client-side (the structural backend classifies them as plain delegates -> verifies:0) - a delegate move with cards no longer prints the subagent name three times (label + "Agent <name>" evidence + card) — just the ⑂ marker and the card - rail nodes badge only the exceptions (live/cross-terminal/capped); the default "ended · method kept / in-session subagent" pair moves to title= Map tab: - close the context lane's area fill at the last sample's x — it faded to the right edge as a decaying wedge that read as data - tool-tick labels: keep the distinctive TAIL of over-long filenames (date-prefixed specs printed the same "2026-07-02-…" fragment for every tick) and suppress a label that repeats the last shown one - phase titles: strip leading conversational pleasantries ("Got it — my mistake." -> "My mistake.") and merge adjacent same-normalized-title phases (the Metabuilder-Labs#57 confetti pattern) - in time mode, split any phase band spanning an idle break into segments with a visible gap (one band bridged an 18h idle gulf as continuous work); label only the widest segment - middle-truncate subagent bar labels ("first8…last6") and lower the label min-width gate to 40px — tail-ellipsis made parallel bars undecodable Timeline tab: - prefix each ask with a grey mono "user: " marker so asks carry a speaker label the way steps carry #n + time All observations from a real 22.7M-token / $24 session; regression tests updated + 13 new pattern tests in test_lens_ui_regression.py. Co-Authored-By: Claude <noreply@anthropic.com>

…uilder-Labs#368/Metabuilder-Labs#369/Metabuilder-Labs#370; append migrations 13-15 after 12) # Conflicts: # tests/integration/test_db.py # tests/unit/test_backfill.py # tokenjam/core/backfill.py # tokenjam/core/db.py # tokenjam/core/models.py # tokenjam/otel/semconv.py

anilmurty

Approving per the earlier review. Resolved 6 conflict files against #367/#368/#369/#370: migrations reordered 12→13→14→15 (append-only, no collision); upsert_session INSERT combined to 18 columns (service_namespace/service_instance_id + cache_write_tokens + run_id/parent_session_id, all aligned with ON CONFLICT + params); backfill.py reconciled the two rewrites — kept the #15 bulk executemany for new spans (with sub_agent_id added to _SPAN_INSERT_SQL) AND #371's reingest re-tag loop for existing spans; semconv/models unions; test_backfill.py reconstructed from #371's tests + #370's subagent-totals test. Full suite green (1728 passed), ruff clean.

anshss and others added 30 commits July 2, 2026 18:59

fix(ui): place restore-session button top-right on the session headin…

ba665b2

…g row Move the restore button onto the title row, right-aligned next to the session heading, instead of stacked under the Session id line. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

anshss and others added 18 commits July 2, 2026 19:39

chore(pr5-carve): drop uv.lock (not tracked on main)

1d145fc

anshss requested a review from anilmurty as a code owner July 2, 2026 14:36

anshss mentioned this pull request Jul 2, 2026

feat: Claude Code session-status dashboard — lifecycle fix, session detail/timeline/map, run grouping, quickstart #306

Closed

anilmurty mentioned this pull request Jul 2, 2026

fix: Claude Code cost & alert correctness (4/5 from #306) #370

Merged

anshss and others added 2 commits July 2, 2026 20:40

Merge branch 'main' into pr5/approach-map-timeline

9cf39ed

anilmurty mentioned this pull request Jul 2, 2026

fix: Claude Code session status + lifecycle (1/5 from #306) #367

Merged

anilmurty approved these changes Jul 2, 2026

View reviewed changes

anilmurty merged commit 5cffc74 into Metabuilder-Labs:main Jul 2, 2026
4 checks passed

This was referenced Jul 2, 2026

feat(onboard): zero-token statusline for Claude Code/Codex; reserve MCP for SDK (#59) #374

Merged

Bump version to 0.5.3 #399

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: session Approach / Map / Timeline (5/5 from #306, stacked on #367)#371

feat: session Approach / Map / Timeline (5/5 from #306, stacked on #367)#371
anilmurty merged 70 commits into
Metabuilder-Labs:mainfrom
anshss:pr5/approach-map-timeline

anshss commented Jul 2, 2026

Uh oh!

anilmurty commented Jul 2, 2026

Uh oh!

anilmurty left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

anshss commented Jul 2, 2026

What's here

Migrations

Cross-PR seams for merge (resolvable)

Tests

Not browser-dogfooded

Uh oh!

anilmurty commented Jul 2, 2026

Uh oh!

anilmurty left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants