feat: session Approach / Map / Timeline (5/5 from #306, stacked on #367)#371
Conversation
Clicking a Status tile navigated to the global `#/traces` firehose, dropping
the clicked session's identity and landing the user on an unfiltered,
all-agents trace list. The header promised "details" but zoomed out. Add a
real session-scoped destination instead.
- GET /api/v1/sessions/{session_id}: per-session rollup (cost/tokens/tools/
alerts/drift) plus the session's own traces, so the UI can drill into the
existing waterfall. Read-only; require_api_key; 404 as JSONResponse with
response_model=None. Parameterised db.conn SQL only.
- UI SessionDetailView + router case + click fix (index.html:680 now routes to
`#/sessions/<id>`, archived rows clickable). Plan-tier-honest cost framing:
"Implied API value" for subscription, "Local model — no API cost" for local,
real cost for API; no invented spend, no fabricated agent tree (the live
Claude Code telemetry is flat — documented in the route).
- Stop tracking .tj/config.toml: every `tj` run from the repo cwd rewrites it
and rotates the committed ingest_secret. The `.tj/` ignore rule already
exists; this just drops the stale tracked copy.
Tests: +5 integration tests (rollup/tools/traces, unknown→404, subscription
plan_tier→pricing_mode, drift baseline, api-key auth). ruff + mypy clean,
644 pass.
Layer 1 of the run-autopsy arc; subagent-tree capture at ingest (Layer 2) is
the next step.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…view
Layer 2 of the run-autopsy arc. The session detail view now shows how a
session split across models and how its context grew over the run — both
derived from existing gen_ai.llm.call spans, so it works on every session
(live + backfilled) with no schema change.
- GET /api/v1/sessions/{session_id} gains turn_count, model_mix (per-model
calls/tokens/cost rollup) and context_series (time-ordered input-token
series, downsampled to <=120 points, first+last preserved). Parameterised
db.conn SQL; span name from the GenAIAttributes semconv constant.
- UI "Models & context" section: model-mix table + a dependency-free CSS bar
chart of input (context) tokens per LLM call. Descriptive only — no model-
routing-quality claims; subscription caveat reused on cost.
Tests: +3 integration tests (model_mix aggregation/order, turn_count,
context_series ordering + downsample cap). ruff + mypy clean, 647 pass.
Deferred (recorded in .plans): parentUuid within-session branch tree; the
cross-session spawn graph (needs a harness-emitted spawn marker — no clean
parent->child link exists in on-disk Claude Code data).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…unit
Layer 3 of the run-autopsy arc. A fan-out harness (e.g. the meta-repo governor)
stamps tokenjam.run_id (and optional tokenjam.parent_session_id) as OTel resource
attributes on each worker session it spawns; tj groups those sessions into one
Run. Linkage is DECLARED by the spawner — Claude Code OTLP carries no native
parent<->child edge, so it is never reverse-engineered.
- semconv: TjAttributes.RUN_ID / PARENT_SESSION_ID.
- SessionRecord gains run_id + parent_session_id; migration 7 adds the columns
(ADD COLUMN IF NOT EXISTS — fresh-DB and upgrade safe).
- Ingest captures the markers from resource attributes on both paths
(otel/otlp_parsing.py for the spans/OTLP path, api/routes/logs.py for the
Claude Code logs path) and self-heals null-on-update (never overwrites).
- API: run_id/parent_session_id on GET /api/v1/sessions/{id}; new
GET /api/v1/runs/{run_id} (totals + member sessions + parent-edge tree) and a
GET /api/v1/runs index. 404 as JSONResponse + response_model=None.
- UI: RunDetailView (#/runs/<id>) — run rollup + sessions indented by spawn
parent; "Run" link on the session detail. Plan-tier-honest cost framing
(mixed -> implied API value, never a hard spend claim).
Tests: +7 integration tests (logs+spans ingest capture, session-detail
exposure, run grouping/aggregation/tree, unknown->404). ruff + mypy clean,
654 pass.
Harness side is a documented contract (set the resource attrs at spawn),
applied separately in the user's governor.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
L3 added tokenjam.run_id / tokenjam.parent_session_id resource-attribute capture to the HTTP/OTLP (otlp_parsing.py) and Claude Code logs (logs.py) paths but missed convert_otel_span — so a Python SDK app using the in-process exporter would never join a Run. Extract the markers there too, mirroring the existing service.namespace / service.instance.id extraction, so run grouping works across all three ingest paths (CC logs, HTTP/OTLP incl. the TS SDK, and the in-process Python SDK). +2 unit tests for convert_otel_span resource extraction (also the first direct coverage of that block). ruff + mypy clean, 656 pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A "Story" section in the session view that explains, step by step, what a Claude
Code session was trying to do and how it went — surfacing the agent's own
narration threaded with its literal tool calls and ok/error outcomes. No LLM, no
generation: read live from the on-disk CC JSONL transcript, nothing stored in the
DB (capture posture unchanged).
- core/transcript.py: build_session_story() locates
~/.claude/projects/*/<session_id>.jsonl (session_id == transcript filename,
verified 100% across cli + sdk-cli), parses task / steps (narration + per-tool
{name, label, status} + is_error/is_retry flags) / outcome. Caps + truncation;
NEVER returns full tool inputs/outputs — only a short arg label + ok/error
(privacy + bounded payload).
- GET /api/v1/sessions/{id}/story (require_api_key, response_model=None);
{available: false} at HTTP 200 for SDK/no-transcript sessions. Projects root
overridable via app.state / TJ_CLAUDE_PROJECTS_ROOT for tests.
- UI Story section: Task callout, step list (expandable narration, tool chips,
error tint, ↻ retry marker, omitted markers), Outcome callout. Dependency-free.
CC-only by design (SDK sessions have no CC JSONL → graceful unavailable state).
+15 tests (unit parser + API available/unavailable). ruff + mypy clean, 671 pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…Alerts / Traces) The session card stacked six sections vertically — too much scroll. Split the lower content into tabs while pinning the header + Overview/Cost summary cards at top. Story is the default tab; Tools folds under "Models & context", Behavioral drift under "Alerts". Pure layout change (htm/Preact + CSS), no backend or data-flow change; data still loads as before, only display is gated by the active tab. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Each Agent/Task step in a session's Story now expands into that subagent's own
story (task -> steps -> outcome), recursively, so a session's full log includes
the work of everything it spawned. Subagent transcripts are read from
~/.claude/projects/<proj>/<session_id>/subagents/agent-<agentId>.jsonl, linked by
the agentId carried in the parent step's tool_result (exact match, no heuristic).
- core/transcript.py: include_subagents (default true) resolves each Agent/Task
step's child agentId from its tool_result, loads agent-<id>.jsonl from the root
session's subagents/ dir, and nests its story under the step (step.subagent),
recursively. Guards: MAX_SUBAGENT_DEPTH, a shared step-budget across the tree,
and an agentId cycle-set; depth/budget caps are surfaced, never silent-dropped.
Privacy unchanged at every depth (narration + short tool label + ok/error only).
- GET /sessions/{id}/story: Agent steps carry a recursive `subagent`; ?subagents=false
returns the flat single-session story.
- UI: collapsed "> subagent: <name> - N steps" disclosure under each Agent step;
expands to the subagent's task/steps/outcome indented, recursively (its own Agent
steps expand too).
+9 tests (parent->child->grandchild nesting, agentId resolution, depth/budget caps,
cycle guard, ?subagents=false). ruff + mypy clean, 680 pass. Real-data: this session
nests 12/12 spawned subagents.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The session activity section was a raw wall of full-card steps (a 356-step session rendered ~32k px tall). Make it scannable and address the naming/order feedback. - Rename the user-facing tab + section "Story" -> "Timeline" (the /story endpoint path is unchanged/internal). - Newest-first display: renderStepsNewestFirst() reverses the step list (top level AND nested subagents) while keeping the #n labels (so Metabuilder-Labs#1 is still the first action). The Task callout stays pinned at top and Outcome at bottom; only the steps reverse. - Compact rows: each step is now a one-line row (#n - time - tool chips ok/error - clamped first line of narration); click to expand the full narration + detail. The repetitive per-row model label is dropped and shown only when the model changes (prevModel). Error tint, retry markers, and the recursive subagent disclosure are preserved. UI-only (tokenjam/ui/index.html); ruff clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The session-detail Timeline (StorySection) fetched /sessions/:id/story once on mount with no polling, unlike every other view. A live session's Timeline froze at page-load and only refreshed on remount (tab switch / re-navigate). Add setInterval(load, 10000) matching the house idiom, preserving the last-good Timeline on transient poll failures. Apply the same polling fix to TraceDetailView, which had the identical once-only fetch (selection is by span_id, so re-fetching spans preserves it). Also remove the pinned top-level Task and Outcome callouts: in a long-running session the first prompt goes stale as new prompts are sent, and Outcome was just the last assistant message (already the top step) mislabeled as a final outcome on a live session. The steps list already shows everything in descending time order. Per-subagent Task/Outcome callouts stay (scoped, not stale). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Each Timeline step's text was hard-trimmed to 400 chars server-side, with the full narration discarded. The UI's expand-a-step feature could therefore only ever reveal 400 chars ending in "…" — "show more" was lying, since the rest was never sent. Raise MAX_STEP_TEXT_CHARS to 100K so it acts as a safety guard against a pathological single blob rather than a preview trim; the UI already shows only the first line collapsed and the full text when expanded. Real assistant responses now render complete. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two status-page UX fixes: - Scroll memory: returning from a session/run detail page (e.g. clicking an archived session, then Back) landed at the top of the Status page instead of where you were. The window scrolls (.main#app has no own overflow) and the async list load defeats native scroll restoration. Add a useScrollMemory hook that saves window scroll per view and restores it once the list has rendered. - "restore session" button on every archived (closed/stale) and idle session. Clicking copies `claude --resume <session-id>` to the clipboard so the session can be picked back up in the terminal. Copy uses execCommand inside the click gesture (reliable even when the async Clipboard API hangs on a permission prompt) with a best-effort navigator.clipboard upgrade, and stopPropagation so it doesn't trigger the row/tile navigation. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The "restore session" button belongs on the session detail page (where you're looking at one archived/idle session), not on the Status list. Move it: render it under the session title in SessionDetailView when the session is not active (closed / stale / idle), and remove it from the Status page archived table and idle tiles. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…g row Move the restore button onto the title row, right-aligned next to the session heading, instead of stacked under the Session id line. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a copy glyph to the restore-session button and a styled on-hover tooltip that tells you to paste & run the command in your terminal, showing the exact `claude --resume <id>` line. Replaces the plain native title tooltip. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude Code records each Task-tool subagent's turns in <session>/subagents/agent-<id>.jsonl, tagged with the parent's sessionId. Backfill folded those spans under the parent session but discarded the subagent identity, so a session's cost could not be broken down per subagent -- yet a single research run can spawn 100+ subagents that drive most of the spend (verified on a real session: 66% of $642 across ~147 subagents, previously invisible inside one parent total). - NormalizedSpan.sub_agent_id + spans.sub_agent_id column (migration 11) - backfill sets it from a record's top-level agentId when isSidechain is true; None on the main thread - both span write paths (db.insert_span, backfill) + _row_to_span carry it - make_llm_span factory gains a sub_agent_id arg Enables GROUP BY sub_agent_id for per-subagent breakdown / right-sizing. Known limitation (separate latent bug, not addressed here): backfill upserts the session row once per file with replace semantics, so sessions.total_cost_usd reflects only the last-processed subagent file. Span-derived cost (get_session_cost / get_cost_summary) stays correct. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
New optimize analyzer that breaks a window's cost down per subagent (sub_agent_id) and flags structural right-sizing candidates: - over_powered: premium (Opus-tier) model, little output, few tool calls - over_provisioned: large context (input + cache) but little output Honesty discipline (CLAUDE.md Rule 14): candidate flags only, never a quality claim; the caveat is surfaced verbatim and the recoverable estimate is left None (we report the spend concentrated in flagged subagents, not a guaranteed saving). Registered as "subagent" (auto-discovered; appended to ANALYZER_ORDER), so it flows through get_optimize_report (MCP) and /api/v1/optimize via a dict round-trip constructor, with a pricing-mode-aware CLI renderer. Verified on a real session: 147 subagents = 66% of a $642 window, of which 25 are flagged ($109) -- Opus subagents fed 600K-1M cache tokens that produced <1K output (over_provisioned). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Backfill is idempotent by span_id, so a plain re-run skips spans already in the DB and never populates sub_agent_id on history ingested before that column existed. --reingest UPDATEs the existing spans in place (sub_agent_id refreshed) instead of skipping them -- no new rows, no duplicates -- so accumulated history becomes attributable per subagent. Surfaced via the new BackfillResult.spans_retagged counter and the CLI summary. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Surfaces the sub_agent_id data in the UI. The session-detail API
(GET /api/v1/sessions/{id}) now returns a `subagents` block: per-subagent
cost/token rollup carrying the same over_powered / over_provisioned
right-sizing flags the optimize analyzer uses (heuristic imported from there
so there's a single source of truth). The dashboard SPA gains a
"Subagents (N)" tab that renders the breakdown table with flagged rows
highlighted and the candidate-only caveat.
Verified in a headless browser against a real backfilled session: 147
subagents, 56 flagged, the tab renders with no console errors.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds a work-map view to the session detail: a nested tree of the main
thread + its subagents, each node annotated with a deterministic activity
rollup (files touched, web sources, searches, shell commands, subagents
spawned, errors/retries) joined to its cost / tokens / right-sizing flags.
It composes two things tj already computes — the pure transcript Story
(structure + the agent's own labels, core/transcript) and the span-derived
per-subagent breakdown (cost) — into one render-ready tree via a new pure
core transform (core/workmap.build_work_map). New route
GET /api/v1/sessions/{id}/workmap. The dashboard gains a "Map" tab, placed
before Timeline and made the default, so opening a session lands on the
bird's-eye graph; Timeline is the drill-down.
No LLM, no interpretation: every field is a count, a label the agent
produced, or a cost from real spans. Caps the Story applied (depth/budget/
cycle) surface as node markers, and subagents with recorded cost that never
made it into the bounded tree are reported as an `unmapped` tail — never
silently dropped. Descriptive only: tj reports what happened, the human
judges the approach.
Also fixes a stray NUL byte in ui/index.html (line 1318) that broke
`node --check` and made `file` mis-detect the SPA as binary.
Tests: pure-transform unit tests (rollup/dedup/join/cap/unmapped),
/workmap route integration tests, and UI static-grep guards (tab present +
default + descriptive caveat + no-NUL). ruff + mypy clean; 847 pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw
The per-node figure on the right of each Map node is now tokens spent — plan-agnostic, so it reads honestly on subscription as well as API. The estimated dollar cost moves to a hover title. The "unmapped subagents" footer switches to tokens too (build_work_map now returns unmapped_tokens). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw
The session "task" (shown on the work-map root and in the /story payload) came from the first user message verbatim — which in Claude Code is the human's words buried under injected <system-reminder> blocks (CLAUDE.md, environment, date) and, for slash-command starts, <command-*> / <local-command-*> tag wrappers. _first_user_prompt now strips those: it returns the actual ask, surfaces a "/cmd args" label when only a slash command remains, and falls through to the next user message when one is pure wrapper. Clean prompts pass through unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw
Map node detail listed full absolute paths (e.g. /Users/.../aquanode/.claude/context/product.md), which are hard to read in the dim mono list. A shortPath() UI helper now renders each as "…/dir/file.ext" with the full path on hover; URLs and non-path strings are left untouched. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw
A Claude Code session isn't one task — it's a sequence of human asks fired
into the same terminal until the context window fills. Modeling a session as
a single "main task" (with the first prompt as its label) was wrong: the
first prompt is just the first of many, often the least representative.
New model — the session is a list of asks, newest first:
- core/transcript.build_session_asks: segments the transcript at each genuine
user message (_is_user_ask, reusing the harness-wrapper stripper), and for
each segment builds the steps + nested subagents that ask triggered —
reusing the same machinery as build_session_story and sharing one step
budget + cycle guard across the session.
- core/workmap.build_work_map: now folds the ask-segmented story + per-subagent
cost breakdown into a list of ask nodes (newest first), each with its
activity rollup, bucketed token total, and the subagent subtree it spawned.
- GET /sessions/{id}/workmap returns {asks: [...]}; per-ask tokens/cost are
bucketed from LLM-call spans by start_time window.
- UI: the Map tab renders asks (WorkMapAsk), newest first, each expandable to
its outcome, files, and subagent tree.
Subagents and cost are attributed to the ask that spawned them. The
single-task session is just the N=1 case. Descriptive only — tj reports the
asks; the human judges them.
859 tests pass; ruff + mypy clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw
Claude Code injects background-task completion notices as user-role messages (<task-notification>…</task-notification>) when an async Task/Agent finishes. The ask segmentation was reading each one as a separate human ask, inflating the count (a real session showed 98 "asks", most of them notifications). _strip_harness_wrapper now strips <task-notification> blocks alongside <system-reminder> and the command wrappers, so a notification-only turn isn't a prompt and doesn't start a new ask — the work that follows folds into the preceding ask's segment. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw
User asks now stand out visually on both views (brand-blue accent), so a session reads as a conversation rather than an undifferentiated stream. - Timeline: the story marks the first step after each genuine human ask with its prompt (_build_steps include_asks, on for the main thread only). The renderer groups steps by ask and shows each prompt as a distinct "You" block (brand accent), newest ask first, its work beneath. - Map: each ask row gets a brand left-border to match. No new data — reuses the harness-wrapper stripper to find genuine asks, so system-reminder / task-notification / command turns aren't marked as prompts. 864 tests pass; ruff + mypy clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw
… label The boxed prompt block with a "You" badge was too heavy. Drop the box, background, and label — the prompt now renders as a distinct brand-colored line, the slight differentiation that was asked for. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw
…terminal Within each ask the steps were ordered newest-first, so a prompt was followed by the LAST step of the exchange instead of its first response — the reverse of what you see in the terminal. The Timeline now renders fully chronological: each user prompt, then its responses in order, top to bottom. The Map stays newest-first as the at-a-glance summary. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw
The chronological flip moved the latest activity to the bottom. Restore newest-ask-first (the familiar overview), but keep each ask reading top-down in order — prompt, then its responses — so input/output still pair the way the terminal shows them. Like an inbox: newest thread on top, each thread read top-to-bottom. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw
Turns the per-ask Map into a readable account of the session. Each ask is now headlined by WHAT THE AGENT DID (its own outcome narration), with your prompt demoted to a dim context line; a deterministic status icon (● did work / ⚠ flagged / ✗ error / · chat) colors the row and its left border; no-work conversational asks recede to a compact dim line; and the Map reads chronologically (oldest first) so it tells the session's story, distinct from the Timeline's newest-first live tail. Token spend stays the visible metric (cost moved to a hover tooltip). Pure UI change over the existing ask-segmented /workmap data — no backend, no LLM. Crisper LLM-distilled headlines are a planned opt-in follow-up; today's headline is the agent's own last narration, occasionally raw. Tests: updated the two ask-Map regression pins the storyline evolves (subtitle wording + status-driven left border) and added a storyline guard (outcome headline, askStatus, chronological order). node --check clean; 865 pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw
…e headlines, collapsed chat Turns a long session's Map into a 5-second read instead of a wall: - Each ask is headlined by the FIRST clean sentence of its outcome (markdown stripped), hard one line ≤90 chars — no more raw agent narration cut mid-thought. - A summary band tops the Map (total tokens · asks · subagents · flagged, plus the biggest fan-outs) for the at-a-glance read. - Runs of chat-only asks collapse into a "⋯ N quick exchanges" divider (click to expand), so real work stands out from conversation. - Per-ask sub-counts are clamped to the session total on both the row and the summary, so an impossible >total figure can't render (defensive against upstream non-determinism in ask.subagent_count). Built by a delegated worker agent and gated with a Playwright DOM check (not eyeballing): zero console errors; summary band + 3 chat dividers that expand; zero wrapping headlines (white-space:nowrap, ≤90 chars); row and summary sub-counts agree (106 == 106 ≤ 113 total). node --check clean; 870 tests pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw
Revamps the session-detail "Map" tab from the nested text tree into the
synchronized-swimlane board, driven by GET /sessions/{id}/sessionmap.
MapBoardSection renders, over one shared time axis:
- phase bands, a per-event tool strip (colored by category; errors in
--error, retries marked), a context-growth area chart and a cost-burn bar
chart (inline SVG, viewBox 0 0 100 30 so all lanes stay pixel-aligned),
a shared crosshair + exemplar tooltip, an x-axis, and a category legend.
- a time ⇄ step toggle that re-spaces every lane from one pair of x
accessors (wall-clock offset vs even ordinal).
- a codebase-territory treemap (③) aggregated from the read/edit events:
files grouped by directory, shaded by touch intensity (edited→--success,
read→--brand), with a first-touch order badge and ✎ edited marker.
All colors are theme vars (offline-safe; test_ui_offline still green). Falls
back to the existing WorkMapSection when /sessionmap has no data, so nothing
is lost. The subagent lane / board recursion is intentionally deferred (the
/sessionmap events don't carry sub_agent_id yet) — noted in a code comment.
Guarded by static-grep regression tests in test_lens_ui_regression.py
(MapBoardSection + Map-tab wiring + the four lane labels + territory +
time/step toggle); module passes node --check.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The Approach tab shipped only the method spine; the approved mock also has a delegation-tree rail, a header stats card, rich delegation cards, and a source legend. This adds them — a small /approach enrichment so the UI reads (not computes) the data, then the UI rework. Backend: - build_method_spine: a delegate move now carries `delegations` (one entry per subagent child: name, agent_id, depth, task, capped, nested spine) instead of a flat `children` list. - /approach joins each delegation to per-subagent cost/tokens/status/flags via _session_subagents (the same breakdown /workmap uses), and returns `agents` (preorder rail summary: main_session + in_session_subagents with status + capture_completeness), `counts` (moves/delegations/dead_ends/ verifies), and `meta` (session cost/tokens). Snapshot read-through kept. UI: - two-column ap-grid: a left ApproachRail (every agent with a status dot, "ended · method kept" badge, provenance line, depth indent + the ephemeral-capture caption) + the method panel. - header card with mandate + outcome + a right-side stats column from counts/meta. - ApproachDelegation: pink cards (↳ name · depth · tokens · $cost · status) expanding into a "how the subagent solved its piece" sub-header + child mandate + the child spine (recursive) on a depth rail; capped → not expanded. - a bottom source-of-each-line legend. Offline-safe (theme vars + emoji only). Tests updated: method-spine delegations shape, /approach agents+counts, and UI static-grep asserts for the rail/header/cards/legend. node --check clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…lass The delegation-tree rail's main-agent status dot used a bare `main` modifier class (`<div class="ap-dot main">`), which also matched the app's global `.main` layout rule (flex:1; margin-left:200px; padding:24px 32px). That inflated the main dot to ~64x48 and floated it out of the rail — the big blue blob. Subagent dots used `sub`/`term` (no global collision), so only the main node broke. Namespaced the dot modifiers to `is-main`/`is-sub`/`is-term` so they can't collide with global classes. Main dot now renders 9x9 in place. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fixes four visual bugs in the Map board (MapBoardSection) confirmed against the approved mock via screenshot + geometry audit: 1. PHASE band labels were full prompt sentences overlapping into mush. Measure plot width, ellipsize each band label (mb-seg-lab), and only print a label when its band is >= ~60px wide; add title= for hover. Dense zero-width phase slivers now render unlabeled. 2. TOOLS event labels (mb-evlab) collided and overflowed the lane. Walk events left->right and keep a label only when >= ~70px past the last kept one (sparse, non-overlapping subset); clamp each label's x so its max-width box stays inside the lane; add title= for hover. 3. x-axis last tick (x=100%) was clipped past the right edge by its centering transform. Right-align the final tick (and left-align x=0) so ticks stay inside the lane. 4. Territory treemap showed giant empty boxes with absolute-path headers. Drop align-items:stretch and the per-file/per-files vertical flex-grow so dir boxes size to their content and every dir's file rows render. Headers now show only the last 1-2 path segments (ellipsized, full path on hover) instead of long absolute paths. Geometry audit is clean (no mb-* overflow); all 152 UI regression + offline tests pass; extracted inline module passes node --check. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Enrich GET /sessions/{id}/sessionmap with a top-level `subagents` array —
one entry per in-session subagent with a usable span window: name (from
the story, else agent-<id[:8]>), absolute ts range, the window mapped onto
main-thread event ordinals, and summed tokens/cost. The ordinal mapping
brackets the gap (and clamps to the live edge) when a subagent ran while
the main thread emitted no events, so step mode still positions it; returns
null only when there are no time-anchored events. `subagents: []` when none.
Render the lane in MapBoardSection between the tools and context lanes:
each subagent is a positioned bar on the shared axis (time → ts offset;
step → ordinal), packed onto rows (stacking overlaps, capped), clamped
inside the plot, labelled name + tokens·cost (ellipsized, themed --chart-5).
The lane re-spaces with the time⇄step toggle and is omitted when empty.
Tested visually against the approved mock in both modes (geometry audit
clean); updated the /sessionmap integration test (seeds a subagent span)
and added a UI regression static-grep for the lane.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The Map board placed event/phase lanes by transcript step timestamps but the context/cost series + meta.started_at by span timestamps — two clocks on one axis. On a backfilled/resumed session whose span started_at postdates the transcript, every event's wall-clock offset went negative, the UI clamped it to 0, and the whole tool/phase lane collapsed to x=0 while the series spread out. Compute one basis in get_session_map: t0 = min(earliest event ts, earliest span start_time), tEnd = max of the same, duration = tEnd - t0 (floored to 1). Recompute the series t_s and meta.started_at/duration_s against this unified t0 so the event lanes, series, and subagent bars all share one origin and span. _coerce_utc reconciles tz-aware transcript ts with naive DuckDB span datetimes. Normal sessions (transcript ts ≈ span ts) are unchanged; the UI already keyed every lane off meta, so no UI change was needed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The Approach rail showed only in-session subagents (Task sidechains). M2b
surfaces a session's cross-terminal CHILDREN too — separate SessionRecords a
harness spawned in another terminal, linked only by a declared
parent_session_id — honestly marked by what we can recover.
/approach now finds child sessions (new runs._child_sessions) and, for each:
- adds a rail agent {provenance: cross_terminal_child, depth 1, status,
cost_usd, tokens, capture_completeness}. capture_completeness is "full" when
the child's own method is recoverable (its live transcript or M1 snapshot via
build_session_story/load_session_method → build_method_spine) else
"session_level" — we have the cost/identity but not the how.
- when the method IS available, splices the child's own spine into a new
cross_terminal list so it nests like an in-session delegation; header counts
roll the spliced moves/dead-ends/verifies in (+1 delegation per child).
cross_terminal is [] and no extra rail agents when a session launched nothing
(the common case), so existing payloads are unchanged.
UI: ApproachRail renders cross_terminal_child amber (is-term) with a
"cross-terminal child · run-linked" sub-line and a "session-level" /
"ended · method kept" badge keyed off completeness — visually distinct from the
pink in-session subagents. A new ApproachCrossTerminal component renders each
spliced child under a "cross-terminal children" divider with the same recursive
ApproachMove styling.
Tests: integration coverage for the full/session-level split + the spliced
spine + the empty common case; static-grep guards for the UI handling.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Verified end-to-end that the Timeline (StorySection) already renders recursive subagent logs: a StoryStep renders a SubagentBlock per spawned subagent, SubagentBlock re-renders the child's steps via renderTimelineSteps, and renderTimelineSteps renders a StoryStep each — a closed cycle, so nested delegations nest arbitrarily deep, each level independently expandable (caret ▾/▸), matching Approach/Map. Caps are honest (depth/size/cycle each map to an explicit "… omitted …"/"… already shown …" note, never a silent drop). Confirmed visually against seeded session 3024587f… (13 subagents): expanding step Metabuilder-Labs#2 surfaces its nested "subagent: Investigate session map + PR 306" block; expanding that renders its Task + 38 own steps. No overflow. No UI change needed (affordance already consistent). Adds two static-grep regression tests asserting the recursion cycle + honest cap markers stay wired, so a future edit can't silently flatten the Timeline. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
capture_session_method was wired only into session-close, so backfilled historical sessions never got a session_story snapshot — the very sessions most likely to have their Claude Code transcript pruned later lost their 'how'. Wire capture into ingest_claude_code, snapshotting each newly-ingested session (source="backfill"). Best-effort: capture swallows its own errors and never raises, so it cannot change backfill's result or break ingest. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The Map board's sub-agents lane built its `subagents` array purely from spans grouped by `sub_agent_id`, which is set only by backfill from `isSidechain`+`agentId`. Subagents whose spans aren't tagged were dropped, so the lane silently under-counted delegation (~2 bars) versus the transcript-derived Approach rail (~15). Source the lane's subagent SET from the same transcript subagent subtree the rail uses (`_transcript_subagent_index` over the asks payload), unioned with any span-only `sub_agent_id` so recorded usage is never dropped. Span timing still refines each bar's window when available; otherwise it falls back to the subagent's transcript timing (its own first/last step ts, then its spawn ts). Tokens/cost come from `_session_subagents` and are null when the subagent has no `sub_agent_id` spans. A subagent with neither spans nor resolvable ts is still counted with a null window (UI hides window-less bars in step/time but the lane still renders). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
PR Metabuilder-Labs#306's deferred "active-time (idle-segmented) durations." On a long or resumed session whose wall-clock is mostly idle between turns (e.g. ~32h spanning ~2.8h of real work), raw wall-clock crammed all the work into thin clusters at the right edge while huge idle gaps ate the axis, making time mode near-useless. Backend: _ActiveAxis builds a piecewise-linear real-time -> active-time map over the UNION of event + span timestamps, passing active periods through 1:1 but collapsing every gap > IDLE_GAP_THRESHOLD_S (300s) down to a fixed COLLAPSED_GAP_S (30s). Every event, context/cost series point, and subagent window gets its active_s position; meta exposes active_duration_s and a gaps array ({start_ts, end_ts, duration_s, at_active_frac}) for the break markers. With no idle gaps the active axis equals the wall-clock axis (no behavior change); t_s/duration_s are kept untouched alongside the new fields. UI: time mode positions every lane (events, series, subagent bars) by active_s / active_duration_s, so the work spreads out and idle no longer dominates; x-axis ticks reflect active time. Each collapsed gap draws a faint dashed break marker with a thinned, clamped "⋯ idle Nh/Nm" label (theme vars, offline-safe). Step mode is unchanged. Tests: integration asserts a synthetic session's idle gap is detected + collapsed (active axis skips the idle minutes) and a gapless session is unchanged; a UI regression statically asserts the active-axis wiring + break markers render. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The TOOLS lane printed the tool's full arg label (a long path-ish string) under each sampled tick, hard-clipped to a 90px box. On idle-gap-collapsed (time-mode) and dense (step-mode) sessions the kept labels rendered into unreadable mush — adjacent boxes overlapped and each showed a mid-segment fragment. - New evLabelShort(): a glanceable tail of the arg — the first token (verb) for a command/prompt, or the basename (last /-segment) for a bare path — hard-capped at 12 chars. Full value stays in title= for hover. (shortPath is kept for the file lists + tooltip.) - Collision-proof spacing: cap the label box (MB_EVLAB_MAX 90→72px, CSS max-width too) and bump MB_EVLAB_GAP 70→80 so the min center-to-center spacing exceeds the box width — two kept, center-anchored labels can never touch. - Enforce the gap on each label's *clamped* (rendered) center, not the raw x: the edge-clamp shifts a label inward, so comparing raw x let an edge-clamped label collide with its neighbor (seen in step mode). Validated visually on session 3024587f in both time and step mode: labels short + spaced, DOM audit shows zero .mb-evlab overlap/overflow. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Make the Map board legible as data, not just shapes: - CONTEXT/COST lanes gain a y-axis gutter (max value at top, 0/$0 at the baseline) plus a mid gridline and a "peak <value>" annotation, read off the already-returned context_series/cost_series via fmtTokens/fmtCost. - Sub-agent bars render "name · tokens · $cost" as one ellipsized run instead of a separate cost span that truncated to a cryptic "1…" stub; null metrics are omitted (transcript-only subagents show just the name). - x-axis row stretches to fill its height (was a zero-height centered ticks box) so the tick labels no longer spill past the board's overflow:hidden bottom; the last tick + legend are fully visible. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…yle interval selector The TOOLS lane rendered each of a session's hundreds of tool calls as a 3px floating tick, so dense bursts smeared into an unreadable smudge and sparse stretches looked empty — the lane conveyed nothing. Time mode now bins events into active-time buckets and renders a stacked-by-category density histogram (Read/Search/Edit/Bash/Task/Web, error stacked on top), so busy bursts are tall bars and idle stretches are honest gaps. A trading-chart-style interval selector (Auto · 1m · 5m · 15m · 1h; Auto targets ~60-100 bars) sets the bin width, and the cost lane re-bins to the SAME bucket edges so a cost spike sits directly under the tool burst that caused it. Step mode is unchanged (per-event ticks remain the event-level view). Frontend-only, no backend. Co-Authored-By: Claude <noreply@anthropic.com>
…-call context, collision-proof labels, de-noised treemap (Metabuilder-Labs#56) The Map board dumped five raw telemetry lanes and deferred all synthesis to hover; on a real long session (74h, 395 tools, ~40 subagents) it was undecodable. Per meta-repo ticket Metabuilder-Labs#56: - Insights strip: deterministic callouts (costliest active stretch, friction errors/retries, top sub-agent by cost, idle share, edit footprint) surface the board's answers by DEFAULT instead of hover-gating them. - CONTEXT lane now plots each call's OWN context occupancy (input+cache per span) instead of a cumulative sum — growth, compaction and resets are visible; the old monotone climb duplicated the total-tokens chip. - Sub-agents lane: labels are px-gated (MB_SUBLAB_MIN_PX) and suppressed on overlapped past-cap bars — bars may overlap under extreme density, text never does; lane grows to 8 packed rows instead of smashing into 3. - Legend now covers every encoding: Other, retry (dashed red = retried step, distinct from solid-red error), and the phase-band tinting. - Territory treemap: temp/scratch reads (TemporaryItems, /tmp, /var/folders) collapse into one muted card and no longer destroy the common-prefix root, so dir labels are workspace-relative; cards/files weight edits over reads; file names keep a readable floor instead of truncating to one letter. Validated: full suite 1718 passed; seeded long-session fixture screenshot- checked in both time and step modes (no label collisions, no console errors). Co-Authored-By: Claude <noreply@anthropic.com>
…-attempt-after-failure (Metabuilder-Labs#58) Founder dogfooding on a real session showed the manual bin ladder produces no relatable change (and 1h bins on a ~40m session collapse the board into one full-width slab), and the dashed retry outline had saturated into noise. - Interval ladder purged: bin width is always auto-resolved for the span and self-described in the cost peak label ('peak $X per 30s' / '/call' in step). - is_retry now requires the previous same-signature step to have FAILED — consecutive successful edits of the same file are normal work, not retries (a real session showed 27 'retries' with ~1 genuine one). Makes every consumer (Map marks, friction chip, method_spine dead-ends, workmap counts) more accurate. New test pins repeat-after-success != retry. - Step mode is the default read (sequence without burst/idle distortion); time stays one click away for cost localization. - Direction note: docs/internal/specs/map-board-direction.md pins the board's job, the purge principles, and the question-driven-zoom north star. Full suite: 1720 passed. Co-Authored-By: Claude <noreply@anthropic.com>
Carve-fallout fixes discovered by the test suite after cherry-picking PR#5: - transcript.py: port session_transcript_path/session_transcript_mtime helpers (runlink.py + sessions.py + status rescue depend on them; their originating commit is outside PR#5's range). - models.py/db.py: add SessionRecord.cache_write_tokens as a model-level default (0) so the session-detail rollup totals work without PR#4's migration 12. - status.py: wire the transcript-mtime live-status rescue (_live_status) so a live CC session with fresh transcript shows active, matching the ebef638 test. - tests: drop out-of-scope Metabuilder-Labs#18 trace-keyed-cost rollup tests and the PR#4 subagent-totals reconcile test (both depend on non-PR5 infrastructure); add missing CaptureConfig/GenAIAttributes imports + reingest mock kwarg; de-dup work-map/user-prompt regression tests.
|
Big PR, but the highest-risk surfaces are both clean, which is what matters. SQL injection across the 1,714-line |
…dogfooding round
Approach tab:
- scrub UTF-8-as-Latin-1 mojibake ("Â\xa0") from /approach and /story payloads
once at data-load (mandates rendered "Â how is..." on real transcripts)
- render **bold**/*italic*/`code` in labels, quotes and the outcome block as
vnodes instead of literal asterisks (escape-first, no links/headings)
- when a move's quote merely re-states its 80-char-truncated label (48/78 moves
on a real session), un-truncate the headline from the quote's lead sentence
instead of printing the same sentence twice
- clamp the ✓ outcome block to 3 lines with a click-to-toggle "show all"
(outcomes arrive truncated mid-word server-side)
- fold runs of >4 consecutive chat-only moves into first + "· N conversational
steps" + last (click to expand) so the method doesn't drown in Q&A narration
- tag review/verify/audit-mandated delegations with a ✅ verify chip + green
accent and recompute the header's verifies stat client-side (the structural
backend classifies them as plain delegates -> verifies:0)
- a delegate move with cards no longer prints the subagent name three times
(label + "Agent <name>" evidence + card) — just the ⑂ marker and the card
- rail nodes badge only the exceptions (live/cross-terminal/capped); the
default "ended · method kept / in-session subagent" pair moves to title=
Map tab:
- close the context lane's area fill at the last sample's x — it faded to the
right edge as a decaying wedge that read as data
- tool-tick labels: keep the distinctive TAIL of over-long filenames
(date-prefixed specs printed the same "2026-07-02-…" fragment for every
tick) and suppress a label that repeats the last shown one
- phase titles: strip leading conversational pleasantries ("Got it — my
mistake." -> "My mistake.") and merge adjacent same-normalized-title phases
(the Metabuilder-Labs#57 confetti pattern)
- in time mode, split any phase band spanning an idle break into segments with
a visible gap (one band bridged an 18h idle gulf as continuous work); label
only the widest segment
- middle-truncate subagent bar labels ("first8…last6") and lower the label
min-width gate to 40px — tail-ellipsis made parallel bars undecodable
Timeline tab:
- prefix each ask with a grey mono "user: " marker so asks carry a speaker
label the way steps carry #n + time
All observations from a real 22.7M-token / $24 session; regression tests
updated + 13 new pattern tests in test_lens_ui_regression.py.
Co-Authored-By: Claude <noreply@anthropic.com>
…uilder-Labs#368/Metabuilder-Labs#369/Metabuilder-Labs#370; append migrations 13-15 after 12) # Conflicts: # tests/integration/test_db.py # tests/unit/test_backfill.py # tokenjam/core/backfill.py # tokenjam/core/db.py # tokenjam/core/models.py # tokenjam/otel/semconv.py
anilmurty
left a comment
There was a problem hiding this comment.
Approving per the earlier review. Resolved 6 conflict files against #367/#368/#369/#370: migrations reordered 12→13→14→15 (append-only, no collision); upsert_session INSERT combined to 18 columns (service_namespace/service_instance_id + cache_write_tokens + run_id/parent_session_id, all aligned with ON CONFLICT + params); backfill.py reconciled the two rewrites — kept the #15 bulk executemany for new spans (with sub_agent_id added to _SPAN_INSERT_SQL) AND #371's reingest re-tag loop for existing spans; semconv/models unions; test_backfill.py reconstructed from #371's tests + #370's subagent-totals test. Full suite green (1728 passed), ruff clean.
Carved from #306 as PR 5 of 5 — the marquee feature: drill into how an agent worked.
What's here
core/transcript.py.core/workmap.py,core/sessionmap.py,core/phases.py).act/delegate/verify/dead_end), recursive, with cross-terminal child splice (core/method_spine.py).session_storyat close + backfill so a killed agent's method survives transcript pruning (core/method_capture.py); re-couples thecapture_session_methodcall into fix: Claude Code session status + lifecycle (1/5 from #306) #367's close handler.tokenjam.run_id/parent_session_id,/api/v1/runs,setup_harnessMCP tool (core/runlink.py,api/routes/runs.py).sub_agent_idon spans + subagent right-sizing analyzer.claudeCLI (core/distill.py, no API key).Migrations
13run_id + parent_session_id ·14sub_agent_id on spans ·15session_story table. (12 intentionally skipped — reserved by #370.)Cross-PR seams for merge (resolvable)
SessionRecord.cache_write_tokens(model field, default 0 here) andcore/backfill.pywith fix: Claude Code cost & alert correctness (4/5 from #306) #370 — trivial conflicts when both land. Per-session cache-write totals read 0 until fix: Claude Code cost & alert correctness (4/5 from #306) #370's migration 12 provides the column.#18session_token_cost_rollupprimitive lands in fix: Claude Code cost & alert correctness (4/5 from #306) #370; this PR's detail cost uses stored values until then.Tests
1658 passed, 0 failed, 1 skipped;
ruff checkclean;index.htmlmodulenode --checkOK; lens string-grep regression suite green.Not browser-dogfooded
The new tab CSS/markup (Approach / Map / Board / Timeline) was inserted via conflict resolution and validated by
node --check+ string-grep regression tests, not a running browser. Worth a visual pass before merge.🤖 Generated with Claude Code