feat: session Approach / Map / Timeline (5/5 from #306, stacked on #367) #371

Merged
anilmurty merged 70 commits into Metabuilder-Labs:main from anshss:pr5/approach-map-timeline
Jul 2, 2026

Conversation

anshss commented Jul 2, 2026

Contributor

Carved from #306 as PR 5 of 5 — the marquee feature: drill into how an agent worked.

Stacked on #367 (session lifecycle). This branch is built on PR 1 (#367), so its diff against main includes #367's commits. Review only this PR's own delta (50 files, +13.3k, from pr1/session-lifecycle..HEAD) and merge after #367.

What's here

  • Session detail view — drill into one session from the Status page: model mix, context-growth, per-subagent cost breakdown.
  • Timeline — deterministic transcript play-by-play (recursive subagent logs, no LLM), from core/transcript.py.
  • Map — graphical board of what the agent did per ask: phase / tools / sub-agents / context / cost swimlanes + a codebase-territory treemap (core/workmap.py, core/sessionmap.py, core/phases.py).
  • Approach — deterministic method spine (act/delegate/verify/dead_end), recursive, with cross-terminal child splice (core/method_spine.py).
  • Method persistence — snapshots the reconstructed Story to session_story at close + backfill so a killed agent's method survives transcript pruning (core/method_capture.py); re-couples the capture_session_method call into #367's close handler.
  • Cross-session run grouping: tokenjam.run_id/parent_session_id, /api/v1/runs, setup_harness MCP tool (core/runlink.py, api/routes/runs.py).
  • Per-subagent cost: sub_agent_id on spans + subagent right-sizing analyzer.
  • On-demand LLM-distilled Map titles via local claude CLI (core/distill.py, no API key).

Migrations

13 run_id + parent_session_id · 14 sub_agent_id on spans · 15 session_story table. (12 intentionally skipped — reserved by #370.)

Cross-PR seams for merge (resolvable)

Tests

1658 passed, 0 failed, 1 skipped; ruff check clean; index.html module node --check OK; lens string-grep regression suite green.

Not browser-dogfooded

The new tab CSS/markup (Approach / Map / Board / Timeline) was inserted via conflict resolution and validated by node --check + string-grep regression tests, not a running browser. Worth a visual pass before merge.

🤖 Generated with Claude Code

anshss and others added 30 commits July 2, 2026 18:59
Clicking a Status tile navigated to the global `#/traces` firehose, dropping
the clicked session's identity and landing the user on an unfiltered,
all-agents trace list. The header promised "details" but zoomed out. Add a
real session-scoped destination instead.

- GET /api/v1/sessions/{session_id}: per-session rollup (cost/tokens/tools/
  alerts/drift) plus the session's own traces, so the UI can drill into the
  existing waterfall. Read-only; require_api_key; 404 as JSONResponse with
  response_model=None. Parameterised db.conn SQL only.
- UI SessionDetailView + router case + click fix (index.html:680 now routes to
  `#/sessions/<id>`, archived rows clickable). Plan-tier-honest cost framing:
  "Implied API value" for subscription, "Local model — no API cost" for local,
  real cost for API; no invented spend, no fabricated agent tree (the live
  Claude Code telemetry is flat — documented in the route).
- Stop tracking .tj/config.toml: every `tj` run from the repo cwd rewrites it
  and rotates the committed ingest_secret. The `.tj/` ignore rule already
  exists; this just drops the stale tracked copy.

Tests: +5 integration tests (rollup/tools/traces, unknown→404, subscription
plan_tier→pricing_mode, drift baseline, api-key auth). ruff + mypy clean,
644 pass.

Layer 1 of the run-autopsy arc; subagent-tree capture at ingest (Layer 2) is
the next step.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…view

Layer 2 of the run-autopsy arc. The session detail view now shows how a
session split across models and how its context grew over the run — both
derived from existing gen_ai.llm.call spans, so it works on every session
(live + backfilled) with no schema change.

- GET /api/v1/sessions/{session_id} gains turn_count, model_mix (per-model
  calls/tokens/cost rollup) and context_series (time-ordered input-token
  series, downsampled to <=120 points, first+last preserved). Parameterised
  db.conn SQL; span name from the GenAIAttributes semconv constant.
- UI "Models & context" section: model-mix table + a dependency-free CSS bar
  chart of input (context) tokens per LLM call. Descriptive only — no model-
  routing-quality claims; subscription caveat reused on cost.
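
The context_series downsampling described above (at most 120 points, first and last preserved) could look roughly like the sketch below. The function name `downsample` and the even-index strategy are assumptions for illustration; the commit does not say how the route picks the intermediate points.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def downsample(points: Sequence[T], max_points: int = 120) -> list[T]:
    """Return at most max_points items, always keeping the first and the last.

    Hypothetical helper: evenly spaced indices are one simple strategy, not
    necessarily the one the route uses.
    """
    n = len(points)
    if n <= max_points:
        return list(points)
    if max_points < 2:
        raise ValueError("max_points must be >= 2 to keep both endpoints")
    # Integer math maps i=0 to index 0 and i=max_points-1 to index n-1 exactly,
    # and the indices are strictly increasing because n > max_points.
    indices = [i * (n - 1) // (max_points - 1) for i in range(max_points)]
    return [points[i] for i in indices]
```

Picking by index rather than by time keeps the series ordered without assuming evenly spaced LLM calls.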

Tests: +3 integration tests (model_mix aggregation/order, turn_count,
context_series ordering + downsample cap). ruff + mypy clean, 647 pass.

Deferred (recorded in .plans): parentUuid within-session branch tree; the
cross-session spawn graph (needs a harness-emitted spawn marker — no clean
parent->child link exists in on-disk Claude Code data).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…unit

Layer 3 of the run-autopsy arc. A fan-out harness (e.g. the meta-repo governor)
stamps tokenjam.run_id (and optional tokenjam.parent_session_id) as OTel resource
attributes on each worker session it spawns; tj groups those sessions into one
Run. Linkage is DECLARED by the spawner — Claude Code OTLP carries no native
parent<->child edge, so it is never reverse-engineered.

- semconv: TjAttributes.RUN_ID / PARENT_SESSION_ID.
- SessionRecord gains run_id + parent_session_id; migration 7 adds the columns
  (ADD COLUMN IF NOT EXISTS — fresh-DB and upgrade safe).
- Ingest captures the markers from resource attributes on both paths
  (otel/otlp_parsing.py for the spans/OTLP path, api/routes/logs.py for the
  Claude Code logs path) and self-heals null-on-update (never overwrites).
- API: run_id/parent_session_id on GET /api/v1/sessions/{id}; new
  GET /api/v1/runs/{run_id} (totals + member sessions + parent-edge tree) and a
  GET /api/v1/runs index. 404 as JSONResponse + response_model=None.
- UI: RunDetailView (#/runs/<id>) — run rollup + sessions indented by spawn
  parent; "Run" link on the session detail. Plan-tier-honest cost framing
  (mixed -> implied API value, never a hard spend claim).
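
As a rough illustration of the run rollup above (totals + member sessions + parent-edge tree), the sketch below builds a tree from declared parent_session_id edges only. The function name `build_run_tree`, the dict shape, and the `cost_usd` field are assumptions, not the route's actual payload.

```python
from collections import defaultdict


def build_run_tree(sessions):
    """sessions: list of dicts with session_id, parent_session_id, cost_usd.

    Hypothetical shape. Edges come only from the declared parent_session_id;
    nothing is inferred.
    """
    by_id = {s["session_id"]: s for s in sessions}
    children = defaultdict(list)
    roots = []
    for s in sessions:
        parent = s.get("parent_session_id")
        if parent is not None and parent in by_id and parent != s["session_id"]:
            children[parent].append(s["session_id"])
        else:
            # No declared parent, or the parent is not a member of this run.
            roots.append(s["session_id"])

    visited = set()

    def node(sid):
        visited.add(sid)
        kids = [node(c) for c in children[sid] if c not in visited]
        return {"session_id": sid, "children": kids}

    tree = [node(r) for r in roots]
    # Sessions caught in a parent cycle are unreachable from any root;
    # surface them as extra roots instead of dropping them.
    for s in sessions:
        if s["session_id"] not in visited:
            tree.append(node(s["session_id"]))

    total = sum(s.get("cost_usd", 0.0) for s in sessions)
    return {"total_cost_usd": total, "tree": tree}
```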

Tests: +7 integration tests (logs+spans ingest capture, session-detail
exposure, run grouping/aggregation/tree, unknown->404). ruff + mypy clean,
654 pass.

Harness side is a documented contract (set the resource attrs at spawn),
applied separately in the user's governor.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
L3 added tokenjam.run_id / tokenjam.parent_session_id resource-attribute capture
to the HTTP/OTLP (otlp_parsing.py) and Claude Code logs (logs.py) paths but missed
convert_otel_span — so a Python SDK app using the in-process exporter would never
join a Run. Extract the markers there too, mirroring the existing service.namespace
/ service.instance.id extraction, so run grouping works across all three ingest
paths (CC logs, HTTP/OTLP incl. the TS SDK, and the in-process Python SDK).

+2 unit tests for convert_otel_span resource extraction (also the first direct
coverage of that block). ruff + mypy clean, 656 pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A "Story" section in the session view that explains, step by step, what a Claude
Code session was trying to do and how it went — surfacing the agent's own
narration threaded with its literal tool calls and ok/error outcomes. No LLM, no
generation: read live from the on-disk CC JSONL transcript, nothing stored in the
DB (capture posture unchanged).

- core/transcript.py: build_session_story() locates
  ~/.claude/projects/*/<session_id>.jsonl (session_id == transcript filename,
  verified 100% across cli + sdk-cli), parses task / steps (narration + per-tool
  {name, label, status} + is_error/is_retry flags) / outcome. Caps + truncation;
  NEVER returns full tool inputs/outputs — only a short arg label + ok/error
  (privacy + bounded payload).
- GET /api/v1/sessions/{id}/story (require_api_key, response_model=None);
  {available: false} at HTTP 200 for SDK/no-transcript sessions. Projects root
  overridable via app.state / TJ_CLAUDE_PROJECTS_ROOT for tests.
- UI Story section: Task callout, step list (expandable narration, tool chips,
  error tint, ↻ retry marker, omitted markers), Outcome callout. Dependency-free.

CC-only by design (SDK sessions have no CC JSONL → graceful unavailable state).
+15 tests (unit parser + API available/unavailable). ruff + mypy clean, 671 pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…Alerts / Traces)

The session card stacked six sections vertically — too much scroll. Split the
lower content into tabs while pinning the header + Overview/Cost summary cards at
top. Story is the default tab; Tools folds under "Models & context", Behavioral
drift under "Alerts". Pure layout change (htm/Preact + CSS), no backend or
data-flow change; data still loads as before, only display is gated by the active tab.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Each Agent/Task step in a session's Story now expands into that subagent's own
story (task -> steps -> outcome), recursively, so a session's full log includes
the work of everything it spawned. Subagent transcripts are read from
~/.claude/projects/<proj>/<session_id>/subagents/agent-<agentId>.jsonl, linked by
the agentId carried in the parent step's tool_result (exact match, no heuristic).

- core/transcript.py: include_subagents (default true) resolves each Agent/Task
  step's child agentId from its tool_result, loads agent-<id>.jsonl from the root
  session's subagents/ dir, and nests its story under the step (step.subagent),
  recursively. Guards: MAX_SUBAGENT_DEPTH, a shared step-budget across the tree,
  and an agentId cycle-set; depth/budget caps are surfaced, never silent-dropped.
  Privacy unchanged at every depth (narration + short tool label + ok/error only).
- GET /sessions/{id}/story: Agent steps carry a recursive `subagent`; ?subagents=false
  returns the flat single-session story.
- UI: collapsed "> subagent: <name> - N steps" disclosure under each Agent step;
  expands to the subagent's task/steps/outcome indented, recursively (its own Agent
  steps expand too).
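
The three guards named above (depth cap, shared step budget, agentId cycle-set) can be sketched as follows. This is a simplified stand-in for core/transcript.py, not its code: the in-memory `transcripts` dict, the step shape, the `capped` marker, and the cap values are all assumptions.

```python
MAX_SUBAGENT_DEPTH = 3  # hypothetical value; the commit does not state the real cap


def nest_story(agent_id, transcripts, depth=0, budget=None, seen=None):
    """Build a nested story from {agent_id: [{"tool": str, "child": id or None}]}.

    budget is a shared mutable counter so the whole tree draws from one pool;
    seen is the shared set of agent ids already expanded.
    """
    budget = budget if budget is not None else {"steps": 500}
    seen = seen if seen is not None else set()
    seen.add(agent_id)
    story = {"agent_id": agent_id, "steps": [], "capped": None}
    for raw in transcripts.get(agent_id, []):
        if budget["steps"] <= 0:
            story["capped"] = "budget"  # surfaced, never silently dropped
            break
        budget["steps"] -= 1
        step = {"tool": raw["tool"]}
        child = raw.get("child")
        if child is not None:
            if child in seen:
                step["subagent"] = {"agent_id": child, "steps": [], "capped": "cycle"}
            elif depth + 1 > MAX_SUBAGENT_DEPTH:
                step["subagent"] = {"agent_id": child, "steps": [], "capped": "depth"}
            else:
                step["subagent"] = nest_story(child, transcripts, depth + 1, budget, seen)
        story["steps"].append(step)
    return story
```

Every cap leaves a marker on the node it stopped at, which is what lets a UI show that something was omitted rather than pretending the tree is complete.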

+9 tests (parent->child->grandchild nesting, agentId resolution, depth/budget caps,
cycle guard, ?subagents=false). ruff + mypy clean, 680 pass. Real-data: this session
nests 12/12 spawned subagents.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The session activity section was a raw wall of full-card steps (a 356-step session
rendered ~32k px tall). Make it scannable and address the naming/order feedback.

- Rename the user-facing tab + section "Story" -> "Timeline" (the /story endpoint
  path is unchanged/internal).
- Newest-first display: renderStepsNewestFirst() reverses the step list (top level
  AND nested subagents) while keeping the #n labels (so #1 is still the first action).
  The Task callout stays pinned at top and Outcome at bottom; only the steps reverse.
- Compact rows: each step is now a one-line row (#n - time - tool chips ok/error -
  clamped first line of narration); click to expand the full narration + detail. The
  repetitive per-row model label is dropped and shown only when the model changes
  (prevModel). Error tint, retry markers, and the recursive subagent disclosure are
  preserved.

UI-only (tokenjam/ui/index.html); ruff clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The session-detail Timeline (StorySection) fetched /sessions/:id/story once
on mount with no polling, unlike every other view. A live session's Timeline
froze at page-load and only refreshed on remount (tab switch / re-navigate).
Add setInterval(load, 10000) matching the house idiom, preserving the
last-good Timeline on transient poll failures. Apply the same polling fix to
TraceDetailView, which had the identical once-only fetch (selection is by
span_id, so re-fetching spans preserves it).

Also remove the pinned top-level Task and Outcome callouts: in a long-running
session the first prompt goes stale as new prompts are sent, and Outcome was
just the last assistant message (already the top step) mislabeled as a final
outcome on a live session. The steps list already shows everything in
descending time order. Per-subagent Task/Outcome callouts stay (scoped, not
stale).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Each Timeline step's text was hard-trimmed to 400 chars server-side, with the
full narration discarded. The UI's expand-a-step feature could therefore only
ever reveal 400 chars ending in "…" — "show more" was lying, since the rest was
never sent. Raise MAX_STEP_TEXT_CHARS to 100K so it acts as a safety guard
against a pathological single blob rather than a preview trim; the UI already
shows only the first line collapsed and the full text when expanded. Real
assistant responses now render complete.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two status-page UX fixes:

- Scroll memory: returning from a session/run detail page (e.g. clicking an
  archived session, then Back) landed at the top of the Status page instead of
  where you were. The window scrolls (.main#app has no own overflow) and the
  async list load defeats native scroll restoration. Add a useScrollMemory hook
  that saves window scroll per view and restores it once the list has rendered.

- "restore session" button on every archived (closed/stale) and idle session.
  Clicking copies `claude --resume <session-id>` to the clipboard so the
  session can be picked back up in the terminal. Copy uses execCommand inside
  the click gesture (reliable even when the async Clipboard API hangs on a
  permission prompt) with a best-effort navigator.clipboard upgrade, and
  stopPropagation so it doesn't trigger the row/tile navigation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The "restore session" button belongs on the session detail page (where you're
looking at one archived/idle session), not on the Status list. Move it: render
it under the session title in SessionDetailView when the session is not active
(closed / stale / idle), and remove it from the Status page archived table and
idle tiles.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…g row

Move the restore button onto the title row, right-aligned next to the session
heading, instead of stacked under the Session id line.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a copy glyph to the restore-session button and a styled on-hover tooltip
that tells you to paste & run the command in your terminal, showing the exact
`claude --resume <id>` line. Replaces the plain native title tooltip.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude Code records each Task-tool subagent's turns in
<session>/subagents/agent-<id>.jsonl, tagged with the parent's sessionId.
Backfill folded those spans under the parent session but discarded the
subagent identity, so a session's cost could not be broken down per
subagent -- yet a single research run can spawn 100+ subagents that drive
most of the spend (verified on a real session: 66% of $642 across ~147
subagents, previously invisible inside one parent total).

- NormalizedSpan.sub_agent_id + spans.sub_agent_id column (migration 11)
- backfill sets it from a record's top-level agentId when isSidechain is
  true; None on the main thread
- both span write paths (db.insert_span, backfill) + _row_to_span carry it
- make_llm_span factory gains a sub_agent_id arg

Enables GROUP BY sub_agent_id for per-subagent breakdown / right-sizing.
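
A minimal sketch of the per-subagent breakdown this column enables, using an in-memory sqlite3 database as a stand-in. The schema, the `cost_usd` and `session_id` column names, and the 'main' label for the main thread are assumptions; the project's real database and span schema are not shown in this commit.

```python
import sqlite3

# Hypothetical minimal schema; the real spans table has many more columns.
SCHEMA = """
CREATE TABLE spans (
    span_id      TEXT PRIMARY KEY,
    session_id   TEXT NOT NULL,
    sub_agent_id TEXT,          -- NULL on the main thread
    cost_usd     REAL NOT NULL
)
"""


def cost_by_subagent(conn, session_id):
    """Return (agent, calls, cost) rows for one session, most expensive first."""
    return conn.execute(
        """
        SELECT COALESCE(sub_agent_id, 'main') AS agent,
               COUNT(*)                       AS calls,
               SUM(cost_usd)                  AS cost
        FROM spans
        WHERE session_id = ?
        GROUP BY COALESCE(sub_agent_id, 'main')
        ORDER BY cost DESC
        """,
        (session_id,),
    ).fetchall()
```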

Known limitation (separate latent bug, not addressed here): backfill
upserts the session row once per file with replace semantics, so
sessions.total_cost_usd reflects only the last-processed subagent file.
Span-derived cost (get_session_cost / get_cost_summary) stays correct.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
New optimize analyzer that breaks a window's cost down per subagent
(sub_agent_id) and flags structural right-sizing candidates:
  - over_powered:     premium (Opus-tier) model, little output, few tool calls
  - over_provisioned: large context (input + cache) but little output

Honesty discipline (CLAUDE.md Rule 14): candidate flags only, never a
quality claim; the caveat is surfaced verbatim and the recoverable estimate
is left None (we report the spend concentrated in flagged subagents, not a
guaranteed saving).
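
The two candidate flags could be expressed along these lines. Every threshold, the `SubagentUsage` type, and the "opus" substring check are placeholders chosen for illustration; the commit names the criteria but not the numbers.

```python
from dataclasses import dataclass

# All thresholds below are hypothetical; the analyzer's real values are not stated.
PREMIUM_MODEL_MARKERS = ("opus",)
LOW_OUTPUT_TOKENS = 1_000
FEW_TOOL_CALLS = 3
LARGE_CONTEXT_TOKENS = 500_000


@dataclass
class SubagentUsage:
    sub_agent_id: str
    model: str
    input_tokens: int
    cache_tokens: int
    output_tokens: int
    tool_calls: int
    cost_usd: float


def right_sizing_flags(u: SubagentUsage) -> list[str]:
    """Return candidate flags only; this says nothing about output quality."""
    flags = []
    low_output = u.output_tokens < LOW_OUTPUT_TOKENS
    premium = any(marker in u.model.lower() for marker in PREMIUM_MODEL_MARKERS)
    if premium and low_output and u.tool_calls <= FEW_TOOL_CALLS:
        flags.append("over_powered")
    if u.input_tokens + u.cache_tokens >= LARGE_CONTEXT_TOKENS and low_output:
        flags.append("over_provisioned")
    return flags


def flagged_spend(usages: list[SubagentUsage]) -> float:
    """Spend concentrated in flagged subagents; not a savings estimate."""
    return sum(u.cost_usd for u in usages if right_sizing_flags(u))
```

Reporting `flagged_spend` rather than a recoverable amount mirrors the commit's stance: the number shows where spend sits, not what would be saved.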

Registered as "subagent" (auto-discovered; appended to ANALYZER_ORDER), so
it flows through get_optimize_report (MCP) and /api/v1/optimize via a dict
round-trip constructor, with a pricing-mode-aware CLI renderer.

Verified on a real session: 147 subagents = 66% of a $642 window, of which
25 are flagged ($109) -- Opus subagents fed 600K-1M cache tokens that
produced <1K output (over_provisioned).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Backfill is idempotent by span_id, so a plain re-run skips spans already in
the DB and never populates sub_agent_id on history ingested before that
column existed. --reingest UPDATEs the existing spans in place (sub_agent_id
refreshed) instead of skipping them -- no new rows, no duplicates -- so
accumulated history becomes attributable per subagent. Surfaced via the new
BackfillResult.spans_retagged counter and the CLI summary.
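
The skip-versus-retag behaviour can be sketched with sqlite3 as a stand-in. Only `spans_retagged` is a name from the commit; the `ingest` function, the two-column schema, and the other counters are assumptions.

```python
import sqlite3

# Hypothetical two-column schema, just enough to show the idempotency rule.
SCHEMA = "CREATE TABLE spans (span_id TEXT PRIMARY KEY, sub_agent_id TEXT)"


def ingest(conn, spans, reingest=False):
    """Insert new spans; for existing span_ids either skip or update in place."""
    result = {"spans_inserted": 0, "spans_retagged": 0, "spans_skipped": 0}
    for span in spans:
        exists = conn.execute(
            "SELECT 1 FROM spans WHERE span_id = ?", (span["span_id"],)
        ).fetchone()
        if exists is None:
            conn.execute(
                "INSERT INTO spans (span_id, sub_agent_id) VALUES (?, ?)",
                (span["span_id"], span["sub_agent_id"]),
            )
            result["spans_inserted"] += 1
        elif reingest:
            # Refresh the tag on the existing row: no new rows, no duplicates.
            conn.execute(
                "UPDATE spans SET sub_agent_id = ? WHERE span_id = ?",
                (span["sub_agent_id"], span["span_id"]),
            )
            result["spans_retagged"] += 1
        else:
            result["spans_skipped"] += 1
    return result
```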

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Surfaces the sub_agent_id data in the UI. The session-detail API
(GET /api/v1/sessions/{id}) now returns a `subagents` block: per-subagent
cost/token rollup carrying the same over_powered / over_provisioned
right-sizing flags the optimize analyzer uses (heuristic imported from there
so there's a single source of truth). The dashboard SPA gains a
"Subagents (N)" tab that renders the breakdown table with flagged rows
highlighted and the candidate-only caveat.

Verified in a headless browser against a real backfilled session: 147
subagents, 56 flagged, the tab renders with no console errors.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds a work-map view to the session detail: a nested tree of the main
thread + its subagents, each node annotated with a deterministic activity
rollup (files touched, web sources, searches, shell commands, subagents
spawned, errors/retries) joined to its cost / tokens / right-sizing flags.

It composes two things tj already computes — the pure transcript Story
(structure + the agent's own labels, core/transcript) and the span-derived
per-subagent breakdown (cost) — into one render-ready tree via a new pure
core transform (core/workmap.build_work_map). New route
GET /api/v1/sessions/{id}/workmap. The dashboard gains a "Map" tab, placed
before Timeline and made the default, so opening a session lands on the
bird's-eye graph; Timeline is the drill-down.

No LLM, no interpretation: every field is a count, a label the agent
produced, or a cost from real spans. Caps the Story applied (depth/budget/
cycle) surface as node markers, and subagents with recorded cost that never
made it into the bounded tree are reported as an `unmapped` tail — never
silently dropped. Descriptive only: tj reports what happened, the human
judges the approach.

Also fixes a stray NUL byte in ui/index.html (line 1318) that broke
`node --check` and made `file` mis-detect the SPA as binary.

Tests: pure-transform unit tests (rollup/dedup/join/cap/unmapped),
/workmap route integration tests, and UI static-grep guards (tab present +
default + descriptive caveat + no-NUL). ruff + mypy clean; 847 pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw
The per-node figure on the right of each Map node is now tokens spent —
plan-agnostic, so it reads honestly on subscription as well as API. The
estimated dollar cost moves to a hover title. The "unmapped subagents"
footer switches to tokens too (build_work_map now returns unmapped_tokens).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw
The session "task" (shown on the work-map root and in the /story payload)
came from the first user message verbatim — which in Claude Code is the
human's words buried under injected <system-reminder> blocks (CLAUDE.md,
environment, date) and, for slash-command starts, <command-*> /
<local-command-*> tag wrappers.

_first_user_prompt now strips those: it returns the actual ask, surfaces a
"/cmd args" label when only a slash command remains, and falls through to
the next user message when one is pure wrapper. Clean prompts pass through
unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw
Map node detail listed full absolute paths (e.g.
/Users/.../aquanode/.claude/context/product.md), which are hard to read in
the dim mono list. A shortPath() UI helper now renders each as
"…/dir/file.ext" with the full path on hover; URLs and non-path strings are
left untouched.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw
A Claude Code session isn't one task — it's a sequence of human asks fired
into the same terminal until the context window fills. Modeling a session as
a single "main task" (with the first prompt as its label) was wrong: the
first prompt is just the first of many, often the least representative.

New model — the session is a list of asks, newest first:
- core/transcript.build_session_asks: segments the transcript at each genuine
  user message (_is_user_ask, reusing the harness-wrapper stripper), and for
  each segment builds the steps + nested subagents that ask triggered —
  reusing the same machinery as build_session_story and sharing one step
  budget + cycle guard across the session.
- core/workmap.build_work_map: now folds the ask-segmented story + per-subagent
  cost breakdown into a list of ask nodes (newest first), each with its
  activity rollup, bucketed token total, and the subagent subtree it spawned.
- GET /sessions/{id}/workmap returns {asks: [...]}; per-ask tokens/cost are
  bucketed from LLM-call spans by start_time window.
- UI: the Map tab renders asks (WorkMapAsk), newest first, each expandable to
  its outcome, files, and subagent tree.
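
Bucketing LLM-call spans into asks by start_time window can be sketched with a binary search over the ask start times. The function name, the tuple shape, and the separate `unattributed` total for calls that precede the first ask are assumptions for illustration.

```python
from bisect import bisect_right


def bucket_tokens_by_ask(ask_starts, llm_calls):
    """Attribute each call's tokens to the ask whose window contains it.

    ask_starts: ascending ask start timestamps; ask i owns the half-open
    window [ask_starts[i], ask_starts[i + 1]).
    llm_calls:  iterable of (start_time, tokens) pairs.
    """
    totals = [0] * len(ask_starts)
    unattributed = 0
    for start_time, tokens in llm_calls:
        i = bisect_right(ask_starts, start_time) - 1
        if i < 0:
            unattributed += tokens  # call started before the first ask
        else:
            totals[i] += tokens
    return totals, unattributed
```

A call that starts exactly at an ask boundary is counted toward the newer ask.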

Subagents and cost are attributed to the ask that spawned them. The
single-task session is just the N=1 case. Descriptive only — tj reports the
asks; the human judges them.

859 tests pass; ruff + mypy clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw
Claude Code injects background-task completion notices as user-role messages
(<task-notification>…</task-notification>) when an async Task/Agent finishes.
The ask segmentation was reading each one as a separate human ask, inflating
the count (a real session showed 98 "asks", most of them notifications).

_strip_harness_wrapper now strips <task-notification> blocks alongside
<system-reminder> and the command wrappers, so a notification-only turn isn't
a prompt and doesn't start a new ask — the work that follows folds into the
preceding ask's segment.
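
A simplified sketch of the wrapper stripping, assuming the wrappers are plain paired tags with no attributes. The regex and both function names are illustrative, and the "/cmd args" labelling for slash commands is left out: here a command-only turn simply strips to an empty string.

```python
import re

# Matches <system-reminder>, <task-notification>, <command-*> and
# <local-command-*> blocks; \1 requires the closing tag to match the opener.
_WRAPPER_RE = re.compile(
    r"<(system-reminder|task-notification|(?:local-)?command-[\w-]+)>.*?</\1>",
    re.DOTALL,
)


def strip_harness_wrapper(text: str) -> str:
    """Remove injected wrapper blocks and return what the human actually typed."""
    return _WRAPPER_RE.sub("", text).strip()


def is_user_ask(text: str) -> bool:
    """A turn starts a new ask only if something remains after stripping."""
    return bool(strip_harness_wrapper(text))
```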

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw
User asks now stand out visually on both views (brand-blue accent), so a
session reads as a conversation rather than an undifferentiated stream.

- Timeline: the story marks the first step after each genuine human ask with
  its prompt (_build_steps include_asks, on for the main thread only). The
  renderer groups steps by ask and shows each prompt as a distinct "You"
  block (brand accent), newest ask first, its work beneath.
- Map: each ask row gets a brand left-border to match.

No new data — reuses the harness-wrapper stripper to find genuine asks, so
system-reminder / task-notification / command turns aren't marked as prompts.

864 tests pass; ruff + mypy clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw
… label

The boxed prompt block with a "You" badge was too heavy. Drop the box,
background, and label — the prompt now renders as a distinct brand-colored
line, the slight differentiation that was asked for.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw
…terminal

Within each ask the steps were ordered newest-first, so a prompt was followed
by the LAST step of the exchange instead of its first response — the reverse of
what you see in the terminal. The Timeline now renders fully chronological:
each user prompt, then its responses in order, top to bottom. The Map stays
newest-first as the at-a-glance summary.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw
The chronological flip moved the latest activity to the bottom. Restore
newest-ask-first (the familiar overview), but keep each ask reading top-down
in order — prompt, then its responses — so input/output still pair the way the
terminal shows them. Like an inbox: newest thread on top, each thread read
top-to-bottom.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw
Turns the per-ask Map into a readable account of the session. Each ask is
now headlined by WHAT THE AGENT DID (its own outcome narration), with your
prompt demoted to a dim context line; a deterministic status icon (● did
work / ⚠ flagged / ✗ error / · chat) colors the row and its left border;
no-work conversational asks recede to a compact dim line; and the Map reads
chronologically (oldest first) so it tells the session's story, distinct
from the Timeline's newest-first live tail. Token spend stays the visible
metric (cost moved to a hover tooltip).

Pure UI change over the existing ask-segmented /workmap data — no backend,
no LLM. Crisper LLM-distilled headlines are a planned opt-in follow-up;
today's headline is the agent's own last narration, occasionally raw.

Tests: updated the two ask-Map regression pins the storyline evolves
(subtitle wording + status-driven left border) and added a storyline guard
(outcome headline, askStatus, chronological order). node --check clean;
865 pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw
…e headlines, collapsed chat

Turns a long session's Map into a 5-second read instead of a wall:
- Each ask is headlined by the FIRST clean sentence of its outcome (markdown
  stripped), hard one line ≤90 chars — no more raw agent narration cut
  mid-thought.
- A summary band tops the Map (total tokens · asks · subagents · flagged, plus
  the biggest fan-outs) for the at-a-glance read.
- Runs of chat-only asks collapse into a "⋯ N quick exchanges" divider (click
  to expand), so real work stands out from conversation.
- Per-ask sub-counts are clamped to the session total on both the row and the
  summary, so an impossible >total figure can't render (defensive against
  upstream non-determinism in ask.subagent_count).
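
The chat-run collapsing and the defensive clamp are UI logic in index.html; the Python sketch below only illustrates the idea. The `status` field, the row shapes, and the choice to collapse even a run of one are assumptions.

```python
from itertools import groupby


def collapse_chat_runs(asks):
    """Fold each consecutive run of chat-only asks into one divider row."""
    rows = []
    for is_chat, run in groupby(asks, key=lambda a: a["status"] == "chat"):
        run = list(run)
        if is_chat:
            rows.append({
                "kind": "divider",
                "label": f"⋯ {len(run)} quick exchanges",
                "asks": run,  # kept so the divider can expand on click
            })
        else:
            rows.extend({"kind": "ask", "ask": a} for a in run)
    return rows


def clamp_subagent_count(ask_count, session_total):
    """Never render a per-ask count larger than the session total."""
    return max(0, min(ask_count, session_total))
```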

Built by a delegated worker agent and gated with a Playwright DOM check (not
eyeballing): zero console errors; summary band + 3 chat dividers that expand;
zero wrapping headlines (white-space:nowrap, ≤90 chars); row and summary
sub-counts agree (106 == 106 ≤ 113 total). node --check clean; 870 tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017qQKa76DCYkF68NfeNXtPw
anshss and others added 18 commits July 2, 2026 19:39
Revamps the session-detail "Map" tab from the nested text tree into the
synchronized-swimlane board, driven by GET /sessions/{id}/sessionmap.

MapBoardSection renders, over one shared time axis:
- phase bands, a per-event tool strip (colored by category; errors in
  --error, retries marked), a context-growth area chart and a cost-burn bar
  chart (inline SVG, viewBox 0 0 100 30 so all lanes stay pixel-aligned),
  a shared crosshair + exemplar tooltip, an x-axis, and a category legend.
- a time ⇄ step toggle that re-spaces every lane from one pair of x
  accessors (wall-clock offset vs even ordinal).
- a codebase-territory treemap (③) aggregated from the read/edit events:
  files grouped by directory, shaded by touch intensity (edited→--success,
  read→--brand), with a first-touch order badge and ✎ edited marker.

All colors are theme vars (offline-safe; test_ui_offline still green). Falls
back to the existing WorkMapSection when /sessionmap has no data, so nothing
is lost. The subagent lane / board recursion is intentionally deferred (the
/sessionmap events don't carry sub_agent_id yet) — noted in a code comment.

Guarded by static-grep regression tests in test_lens_ui_regression.py
(MapBoardSection + Map-tab wiring + the four lane labels + territory +
time/step toggle); module passes node --check.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The Approach tab shipped only the method spine; the approved mock also has
a delegation-tree rail, a header stats card, rich delegation cards, and a
source legend. This adds them — a small /approach enrichment so the UI
reads (not computes) the data, then the UI rework.

Backend:
- build_method_spine: a delegate move now carries `delegations` (one entry
  per subagent child: name, agent_id, depth, task, capped, nested spine)
  instead of a flat `children` list.
- /approach joins each delegation to per-subagent cost/tokens/status/flags
  via _session_subagents (the same breakdown /workmap uses), and returns
  `agents` (preorder rail summary: main_session + in_session_subagents with
  status + capture_completeness), `counts` (moves/delegations/dead_ends/
  verifies), and `meta` (session cost/tokens). Snapshot read-through kept.

UI:
- two-column ap-grid: a left ApproachRail (every agent with a status dot,
  "ended · method kept" badge, provenance line, depth indent + the
  ephemeral-capture caption) + the method panel.
- header card with mandate + outcome + a right-side stats column from
  counts/meta.
- ApproachDelegation: pink cards (↳ name · depth · tokens · $cost · status)
  expanding into a "how the subagent solved its piece" sub-header + child
  mandate + the child spine (recursive) on a depth rail; capped → not
  expanded.
- a bottom source-of-each-line legend.

Offline-safe (theme vars + emoji only). Tests updated: method-spine
delegations shape, /approach agents+counts, and UI static-grep asserts for
the rail/header/cards/legend. node --check clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…lass

The delegation-tree rail's main-agent status dot used a bare `main` modifier
class (`<div class="ap-dot main">`), which also matched the app's global
`.main` layout rule (flex:1; margin-left:200px; padding:24px 32px). That
inflated the main dot to ~64x48 and floated it out of the rail — the big
blue blob. Subagent dots used `sub`/`term` (no global collision), so only
the main node broke.

Namespaced the dot modifiers to `is-main`/`is-sub`/`is-term` so they can't
collide with global classes. Main dot now renders 9x9 in place.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fixes four visual bugs in the Map board (MapBoardSection) confirmed
against the approved mock via screenshot + geometry audit:

1. PHASE band labels were full prompt sentences overlapping into mush.
   Measure plot width, ellipsize each band label (mb-seg-lab), and only
   print a label when its band is >= ~60px wide; add title= for hover.
   Dense zero-width phase slivers now render unlabeled.

2. TOOLS event labels (mb-evlab) collided and overflowed the lane.
   Walk events left->right and keep a label only when >= ~70px past the
   last kept one (sparse, non-overlapping subset); clamp each label's x
   so its max-width box stays inside the lane; add title= for hover.

3. x-axis last tick (x=100%) was clipped past the right edge by its
   centering transform. Right-align the final tick (and left-align x=0)
   so ticks stay inside the lane.

4. Territory treemap showed giant empty boxes with absolute-path
   headers. Drop align-items:stretch and the per-file/per-files vertical
   flex-grow so dir boxes size to their content and every dir's file
   rows render. Headers now show only the last 1-2 path segments
   (ellipsized, full path on hover) instead of long absolute paths.

Geometry audit is clean (no mb-* overflow); all 152 UI regression +
offline tests pass; extracted inline module passes node --check.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Enrich GET /sessions/{id}/sessionmap with a top-level `subagents` array —
one entry per in-session subagent with a usable span window: name (from
the story, else agent-<id[:8]>), absolute ts range, the window mapped onto
main-thread event ordinals, and summed tokens/cost. The ordinal mapping
brackets the gap (and clamps to the live edge) when a subagent ran while
the main thread emitted no events, so step mode still positions it; returns
null only when there are no time-anchored events. `subagents: []` when none.

Render the lane in MapBoardSection between the tools and context lanes:
each subagent is a positioned bar on the shared axis (time → ts offset;
step → ordinal), packed onto rows (stacking overlaps, capped), clamped
inside the plot, labelled name + tokens·cost (ellipsized, themed --chart-5).
The lane re-spaces with the time⇄step toggle and is omitted when empty.
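
The row packing described above can be pictured as a greedy first-fit pass over the bars. The sketch below is an illustration in Python only (the lane itself is rendered in the frontend); `pack_rows`, the `max_rows` default, and the choice to reuse the earliest-freeing row once the cap is hit are assumptions, not the PR's code.

```python
def pack_rows(bars, max_rows=3):
    """Assign each (start, end) bar a row index, first-fit by start position.

    Hypothetical helper: once max_rows rows exist and none is free, the bar
    is stacked onto the row that frees earliest, so overlap is possible
    past the cap but the row count never grows beyond it.
    """
    order = sorted(range(len(bars)), key=lambda i: bars[i][0])
    row_ends = []              # end position of the last bar placed on each row
    rows = [0] * len(bars)     # row index per bar, aligned with the input order
    for i in order:
        start, end = bars[i]
        free = [r for r, row_end in enumerate(row_ends) if row_end <= start]
        if free:
            r = free[0]
        elif len(row_ends) < max_rows:
            row_ends.append(start)
            r = len(row_ends) - 1
        else:
            r = min(range(len(row_ends)), key=lambda k: row_ends[k])
        row_ends[r] = max(row_ends[r], end)
        rows[i] = r
    return rows
```

Because the same function works on any numeric axis, it could be fed ts offsets in time mode and ordinals in step mode without changes.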

Tested visually against the approved mock in both modes (geometry audit
clean); updated the /sessionmap integration test (seeds a subagent span)
and added a UI regression static-grep for the lane.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The Map board placed event/phase lanes by transcript step timestamps but
the context/cost series + meta.started_at by span timestamps — two clocks
on one axis. On a backfilled/resumed session whose span started_at
postdates the transcript, every event's wall-clock offset went negative,
the UI clamped it to 0, and the whole tool/phase lane collapsed to x=0
while the series spread out.

Compute one basis in get_session_map: t0 = min(earliest event ts,
earliest span start_time), tEnd = max of the same, duration = tEnd - t0
(floored to 1). Recompute the series t_s and meta.started_at/duration_s
against this unified t0 so the event lanes, series, and subagent bars all
share one origin and span. _coerce_utc reconciles tz-aware transcript ts
with naive DuckDB span datetimes. Normal sessions (transcript ts ≈ span
ts) are unchanged; the UI already keyed every lane off meta, so no UI
change was needed.
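
A minimal sketch of the unified basis, for illustration only. The function names are hypothetical stand-ins for `get_session_map` internals, and treating naive span datetimes as UTC is an assumption about what `_coerce_utc` does.

```python
from datetime import datetime, timezone


def coerce_utc(ts):
    """Return a tz-aware UTC datetime (assumption: naive values are already UTC)."""
    if ts.tzinfo is None:
        return ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc)


def unified_basis(event_ts, span_starts):
    """Compute one (t0, t_end, duration_s) over both clocks.

    t0 is the earliest of all event and span timestamps, t_end the latest,
    and the duration is floored to 1 second so later divisions are safe.
    Returns None when there are no timestamps at all.
    """
    stamps = [coerce_utc(t) for t in list(event_ts) + list(span_starts)]
    if not stamps:
        return None
    t0, t_end = min(stamps), max(stamps)
    duration_s = max((t_end - t0).total_seconds(), 1.0)
    return t0, t_end, duration_s
```

With a single origin, an event offset is `(coerce_utc(ts) - t0).total_seconds()`, which cannot go negative for any timestamp that took part in the minimum.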

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The Approach rail showed only in-session subagents (Task sidechains). M2b
surfaces a session's cross-terminal CHILDREN too — separate SessionRecords a
harness spawned in another terminal, linked only by a declared
parent_session_id — honestly marked by what we can recover.

/approach now finds child sessions (new runs._child_sessions) and, for each:
- adds a rail agent {provenance: cross_terminal_child, depth 1, status,
  cost_usd, tokens, capture_completeness}. capture_completeness is "full" when
  the child's own method is recoverable (its live transcript or M1 snapshot via
  build_session_story/load_session_method → build_method_spine) else
  "session_level" — we have the cost/identity but not the how.
- when the method IS available, splices the child's own spine into a new
  cross_terminal list so it nests like an in-session delegation; header counts
  roll the spliced moves/dead-ends/verifies in (+1 delegation per child).

cross_terminal is [] and no extra rail agents when a session launched nothing
(the common case), so existing payloads are unchanged.

UI: ApproachRail renders cross_terminal_child amber (is-term) with a
"cross-terminal child · run-linked" sub-line and a "session-level" /
"ended · method kept" badge keyed off completeness — visually distinct from the
pink in-session subagents. A new ApproachCrossTerminal component renders each
spliced child under a "cross-terminal children" divider with the same recursive
ApproachMove styling.

Tests: integration coverage for the full/session-level split + the spliced
spine + the empty common case; static-grep guards for the UI handling.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Verified end-to-end that the Timeline (StorySection) already renders
recursive subagent logs: a StoryStep renders a SubagentBlock per spawned
subagent, SubagentBlock re-renders the child's steps via renderTimelineSteps,
and renderTimelineSteps renders a StoryStep each — a closed cycle, so nested
delegations nest arbitrarily deep, each level independently expandable
(caret ▾/▸), matching Approach/Map. Caps are honest (depth/size/cycle each
map to an explicit "… omitted …"/"… already shown …" note, never a silent
drop).

Confirmed visually against seeded session 3024587f… (13 subagents): expanding
step #2 surfaces its nested "subagent: Investigate session map + PR 306"

block; expanding that renders its Task + 38 own steps. No overflow.

No UI change needed (affordance already consistent). Adds two static-grep
regression tests asserting the recursion cycle + honest cap markers stay
wired, so a future edit can't silently flatten the Timeline.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
capture_session_method was wired only into session-close, so backfilled
historical sessions never got a session_story snapshot — the very sessions
most likely to have their Claude Code transcript pruned later lost their
'how'. Wire capture into ingest_claude_code, snapshotting each
newly-ingested session (source="backfill"). Best-effort: capture swallows
its own errors and never raises, so it cannot change backfill's result or
break ingest.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The Map board's sub-agents lane built its `subagents` array purely from
spans grouped by `sub_agent_id`, which is set only by backfill from
`isSidechain`+`agentId`. Subagents whose spans aren't tagged were dropped,
so the lane silently under-counted delegation (~2 bars) versus the
transcript-derived Approach rail (~15).

Source the lane's subagent SET from the same transcript subagent subtree
the rail uses (`_transcript_subagent_index` over the asks payload), unioned
with any span-only `sub_agent_id` so recorded usage is never dropped. Span
timing still refines each bar's window when available; otherwise it falls
back to the subagent's transcript timing (its own first/last step ts, then
its spawn ts). Tokens/cost come from `_session_subagents` and are null when
the subagent has no `sub_agent_id` spans. A subagent with neither spans nor
resolvable ts is still counted with a null window (UI hides window-less
bars in step/time but the lane still renders).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
PR #306's deferred "active-time (idle-segmented) durations." On a long or
resumed session whose wall-clock is mostly idle between turns (e.g. ~32h
spanning ~2.8h of real work), raw wall-clock crammed all the work into thin
clusters at the right edge while huge idle gaps ate the axis, making time
mode near-useless.

Backend: _ActiveAxis builds a piecewise-linear real-time -> active-time map
over the UNION of event + span timestamps, passing active periods through 1:1
but collapsing every gap > IDLE_GAP_THRESHOLD_S (300s) down to a fixed
COLLAPSED_GAP_S (30s). Every event, context/cost series point, and subagent
window gets its active_s position; meta exposes active_duration_s and a gaps
array ({start_ts, end_ts, duration_s, at_active_frac}) for the break markers.
With no idle gaps the active axis equals the wall-clock axis (no behavior
change); t_s/duration_s are kept untouched alongside the new fields.
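
The piecewise-linear map can be sketched as below. This is an illustration under stated assumptions, not the `_ActiveAxis` implementation: timestamps are plain epoch seconds, the function names are hypothetical, and the gap records carry fewer fields than the `meta.gaps` entries described above. Only the two constants come from the commit message.

```python
from bisect import bisect_right

IDLE_GAP_THRESHOLD_S = 300.0   # gaps longer than this count as idle
COLLAPSED_GAP_S = 30.0         # every idle gap is drawn at this fixed width


def build_active_axis(timestamps):
    """Return (knots, gaps) for the union of event and span timestamps.

    knots is a list of (real_s, active_s) pairs; active periods advance 1:1,
    idle gaps advance by COLLAPSED_GAP_S only.
    """
    ts = sorted(set(timestamps))
    knots, gaps, active = [], [], 0.0
    for i, t in enumerate(ts):
        if i > 0:
            delta = t - ts[i - 1]
            if delta > IDLE_GAP_THRESHOLD_S:
                gaps.append({"start": ts[i - 1], "end": t, "duration_s": delta})
                active += COLLAPSED_GAP_S
            else:
                active += delta
        knots.append((t, active))
    return knots, gaps


def to_active(knots, t):
    """Map a real timestamp to active seconds by linear interpolation (clamped)."""
    if not knots:
        return 0.0
    if t <= knots[0][0]:
        return knots[0][1]
    if t >= knots[-1][0]:
        return knots[-1][1]
    i = bisect_right([k[0] for k in knots], t)
    (r0, a0), (r1, a1) = knots[i - 1], knots[i]
    return a0 + (a1 - a0) * (t - r0) / (r1 - r0)
```

In this sketch a session with no gap above the threshold produces knots whose active offsets equal the real offsets from the first timestamp, which matches the "no behavior change" case.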

UI: time mode positions every lane (events, series, subagent bars) by active_s
/ active_duration_s, so the work spreads out and idle no longer dominates;
x-axis ticks reflect active time. Each collapsed gap draws a faint dashed break
marker with a thinned, clamped "⋯ idle Nh/Nm" label (theme vars, offline-safe).
Step mode is unchanged.

Tests: integration asserts a synthetic session's idle gap is detected +
collapsed (active axis skips the idle minutes) and a gapless session is
unchanged; a UI regression statically asserts the active-axis wiring + break
markers render.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The TOOLS lane printed the tool's full arg label (a long path-ish
string) under each sampled tick, hard-clipped to a 90px box. On
idle-gap-collapsed (time-mode) and dense (step-mode) sessions the kept
labels rendered into unreadable mush — adjacent boxes overlapped and
each showed a mid-segment fragment.

- New evLabelShort(): a glanceable tail of the arg — the first token
  (verb) for a command/prompt, or the basename (last /-segment) for a
  bare path — hard-capped at 12 chars. Full value stays in title= for
  hover. (shortPath is kept for the file lists + tooltip.)
- Collision-proof spacing: cap the label box (MB_EVLAB_MAX 90→72px,
  CSS max-width too) and bump MB_EVLAB_GAP 70→80 so the min
  center-to-center spacing exceeds the box width — two kept,
  center-anchored labels can never touch.
- Enforce the gap on each label's *clamped* (rendered) center, not the
  raw x: the edge-clamp shifts a label inward, so comparing raw x let
  an edge-clamped label collide with its neighbor (seen in step mode).
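
The spacing rule in the last two bullets can be sketched as follows. This is a Python illustration of frontend logic; the function name and signature are assumptions, while the two constants use the values quoted above.

```python
MB_EVLAB_MAX = 72   # px, width of the label box
MB_EVLAB_GAP = 80   # px, minimum center-to-center spacing between kept labels


def keep_labels(xs, lane_width, box=MB_EVLAB_MAX, gap=MB_EVLAB_GAP):
    """Pick a sparse, non-overlapping subset of labels.

    xs holds the raw event x positions in ascending order. Each center is
    first clamped so the box stays inside the lane, and the gap is then
    checked against the clamped center, not the raw x.
    Returns a list of (event_index, rendered_center).
    """
    half = box / 2
    kept, last = [], None
    for i, x in enumerate(xs):
        center = min(max(x, half), max(lane_width - half, half))
        if last is None or center - last >= gap:
            kept.append((i, center))
            last = center
    return kept
```

Since `gap` exceeds `box`, two kept labels are always at least 8px apart edge to edge, even when one of them was pushed inward by the clamp.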

Validated visually on session 3024587f in both time and step mode:
labels short + spaced, DOM audit shows zero .mb-evlab overlap/overflow.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Make the Map board legible as data, not just shapes:
- CONTEXT/COST lanes gain a y-axis gutter (max value at top, 0/$0 at the
  baseline) plus a mid gridline and a "peak <value>" annotation, read off
  the already-returned context_series/cost_series via fmtTokens/fmtCost.
- Sub-agent bars render "name · tokens · $cost" as one ellipsized run
  instead of a separate cost span that truncated to a cryptic "1…" stub;
  null metrics are omitted (transcript-only subagents show just the name).
- x-axis row stretches to fill its height (was a zero-height centered ticks
  box) so the tick labels no longer spill past the board's overflow:hidden
  bottom; the last tick + legend are fully visible.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…yle interval selector

The TOOLS lane rendered each of a session's hundreds of tool calls as a
3px floating tick, so dense bursts smeared into an unreadable smudge and
sparse stretches looked empty — the lane conveyed nothing.

Time mode now bins events into active-time buckets and renders a
stacked-by-category density histogram (Read/Search/Edit/Bash/Task/Web,
error stacked on top), so busy bursts are tall bars and idle stretches
are honest gaps. A trading-chart-style interval selector (Auto · 1m ·
5m · 15m · 1h; Auto targets ~60-100 bars) sets the bin width, and the
cost lane re-bins to the SAME bucket edges so a cost spike sits directly
under the tool burst that caused it. Step mode is unchanged (per-event
ticks remain the event-level view). Frontend-only, no backend.
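
The shared-bucket idea can be sketched like this. It is an illustration in Python of logic that lives in the frontend; the function names and the `(active_s, value)` tuple shape are assumptions.

```python
from collections import Counter, defaultdict


def bin_events(events, bin_s):
    """Count tool events per active-time bucket, split by category.

    events is an iterable of (active_s, category) pairs; the result maps
    bucket index -> Counter of categories, ready to draw as stacked bars.
    """
    bins = defaultdict(Counter)
    for active_s, category in events:
        bins[int(active_s // bin_s)][category] += 1
    return dict(bins)


def bin_costs(cost_points, bin_s):
    """Sum cost per bucket using the same edges as bin_events."""
    totals = defaultdict(float)
    for active_s, cost_usd in cost_points:
        totals[int(active_s // bin_s)] += cost_usd
    return dict(totals)
```

Because both functions derive the bucket index from the same `bin_s`, bucket `k` in the cost lane covers exactly the same stretch of active time as bucket `k` in the tools lane.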

Co-Authored-By: Claude <noreply@anthropic.com>
…-call context, collision-proof labels, de-noised treemap (#56)

The Map board dumped five raw telemetry lanes and deferred all synthesis to
hover; on a real long session (74h, 395 tools, ~40 subagents) it was
undecodable. Per meta-repo ticket #56:

- Insights strip: deterministic callouts (costliest active stretch, friction
  errors/retries, top sub-agent by cost, idle share, edit footprint) surface
  the board's answers by DEFAULT instead of hover-gating them.
- CONTEXT lane now plots each call's OWN context occupancy (input+cache per
  span) instead of a cumulative sum — growth, compaction and resets are
  visible; the old monotone climb duplicated the total-tokens chip.
- Sub-agents lane: labels are px-gated (MB_SUBLAB_MIN_PX) and suppressed on
  overlapped past-cap bars — bars may overlap under extreme density, text
  never does; lane grows to 8 packed rows instead of smashing into 3.
- Legend now covers every encoding: Other, retry (dashed red = retried step,
  distinct from solid-red error), and the phase-band tinting.
- Territory treemap: temp/scratch reads (TemporaryItems, /tmp, /var/folders)
  collapse into one muted card and no longer destroy the common-prefix root,
  so dir labels are workspace-relative; cards/files weight edits over reads;
  file names keep a readable floor instead of truncating to one letter.

Validated: full suite 1718 passed; seeded long-session fixture screenshot-
checked in both time and step modes (no label collisions, no console errors).

Co-Authored-By: Claude <noreply@anthropic.com>
…-attempt-after-failure (#58)

Founder dogfooding on a real session showed the manual bin ladder produces no
relatable change (and 1h bins on a ~40m session collapse the board into one
full-width slab), and the dashed retry outline had saturated into noise.

- Interval ladder purged: bin width is always auto-resolved for the span and
  self-described in the cost peak label ('peak $X per 30s' / '/call' in step).
- is_retry now requires the previous same-signature step to have FAILED —
  consecutive successful edits of the same file are normal work, not retries
  (a real session showed 27 'retries' with ~1 genuine one). Makes every
  consumer (Map marks, friction chip, method_spine dead-ends, workmap counts)
  more accurate. New test pins repeat-after-success != retry.
- Step mode is the default read (sequence without burst/idle distortion);
  time stays one click away for cost localization.
- Direction note: docs/internal/specs/map-board-direction.md pins the board's
  job, the purge principles, and the question-driven-zoom north star.
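
The tightened retry rule can be sketched as below. This is a minimal illustration, not the project's code: `mark_retries` and the `(signature, ok)` tuple shape are hypothetical, and it assumes "previous same-signature step" means the most recent earlier step with that signature, adjacent or not.

```python
def mark_retries(steps):
    """Flag a step as a retry only if the last step with the same signature failed.

    steps is a list of (signature, ok) pairs in execution order.
    Returns a list of booleans aligned with the input.
    """
    last_ok = {}     # signature -> outcome of its most recent step
    flags = []
    for signature, ok in steps:
        flags.append(signature in last_ok and not last_ok[signature])
        last_ok[signature] = ok
    return flags


steps = [
    ("Edit:a.py", True),
    ("Edit:a.py", True),      # repeat after success: normal work, not a retry
    ("Bash:pytest", False),
    ("Bash:pytest", True),    # repeat after failure: a retry
    ("Bash:pytest", True),
]
print(mark_retries(steps))    # [False, False, False, True, False]
```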

Full suite: 1720 passed.

Co-Authored-By: Claude <noreply@anthropic.com>
Carve-fallout fixes discovered by the test suite after cherry-picking PR#5:
- transcript.py: port session_transcript_path/session_transcript_mtime helpers
  (runlink.py + sessions.py + status rescue depend on them; their originating
  commit is outside PR#5's range).
- models.py/db.py: add SessionRecord.cache_write_tokens as a model-level default
  (0) so the session-detail rollup totals work without PR#4's migration 12.
- status.py: wire the transcript-mtime live-status rescue (_live_status) so a
  live CC session with fresh transcript shows active, matching the ebef638 test.
- tests: drop out-of-scope #18 trace-keyed-cost rollup tests and the PR#4
  subagent-totals reconcile test (both depend on non-PR5 infrastructure); add
  missing CaptureConfig/GenAIAttributes imports + reingest mock kwarg; de-dup
  work-map/user-prompt regression tests.
@anilmurty

Contributor

Big PR, but the highest-risk surfaces are both clean, which is what matters. SQL injection across the 1,714-line sessions.py + runs.py: PASS — every execute() is $N-bound; the only two {where} interpolations are static literal clause fragments. Offline UI across the +2,472 index.html delta: PASS — zero external URLs. The new subagent analyzer self-registers, and it's honest by construction (estimated_recoverable_usd=None, heuristic confidence, caveat printed verbatim). Pure query-free core modules (transcript/workmap/sessionmap/method_spine) make the bulk very reviewable, and their tests are fixture/factory-based. Two follow-ups, neither blocking: list_runs has an N+1 over runs (bounded by LIMIT, fine for now), and you flagged the new tabs were only node --check'd — let's do a quick visual pass before this merges. Also confirm the session_token_cost_rollup trace-disjunction from #370 behaves once wired here. This is the tip of the stack — merges last, after #367.

anshss and others added 2 commits July 2, 2026 20:40
…dogfooding round

Approach tab:
- scrub UTF-8-as-Latin-1 mojibake ("Â\xa0") from /approach and /story payloads
  once at data-load (mandates rendered "Â how is..." on real transcripts)
- render **bold**/*italic*/`code` in labels, quotes and the outcome block as
  vnodes instead of literal asterisks (escape-first, no links/headings)
- when a move's quote merely re-states its 80-char-truncated label (48/78 moves
  on a real session), un-truncate the headline from the quote's lead sentence
  instead of printing the same sentence twice
- clamp the ✓ outcome block to 3 lines with a click-to-toggle "show all"
  (outcomes arrive truncated mid-word server-side)
- fold runs of >4 consecutive chat-only moves into first + "· N conversational
  steps" + last (click to expand) so the method doesn't drown in Q&A narration
- tag review/verify/audit-mandated delegations with a ✅ verify chip + green
  accent and recompute the header's verifies stat client-side (the structural
  backend classifies them as plain delegates -> verifies:0)
- a delegate move with cards no longer prints the subagent name three times
  (label + "Agent <name>" evidence + card) — just the ⑂ marker and the card
- rail nodes badge only the exceptions (live/cross-terminal/capped); the
  default "ended · method kept / in-session subagent" pair moves to title=
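
The chat-run folding bullet above can be sketched as follows. This is a Python illustration of client-side behavior; the `chat_only` key, the `folded` marker, and the function name are assumptions, and the UI may count the N in its fold line differently from the hidden-step count kept here.

```python
from itertools import groupby


def fold_chat_runs(moves, max_run=4):
    """Collapse runs of more than max_run consecutive chat-only moves.

    Each long run becomes: first move, a fold marker holding the hidden
    middle moves, last move. Shorter runs and non-chat moves pass through.
    """
    out = []
    for chat_only, group in groupby(moves, key=lambda m: m["chat_only"]):
        run = list(group)
        if chat_only and len(run) > max_run:
            out.extend([run[0], {"folded": run[1:-1]}, run[-1]])
        else:
            out.extend(run)
    return out
```

Keeping the hidden moves inside the marker means a click-to-expand only needs to splice `folded` back in place, with no second fetch.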

Map tab:
- close the context lane's area fill at the last sample's x — it faded to the
  right edge as a decaying wedge that read as data
- tool-tick labels: keep the distinctive TAIL of over-long filenames
  (date-prefixed specs printed the same "2026-07-02-…" fragment for every
  tick) and suppress a label that repeats the last shown one
- phase titles: strip leading conversational pleasantries ("Got it — my
  mistake." -> "My mistake.") and merge adjacent same-normalized-title phases
  (the #57 confetti pattern)
- in time mode, split any phase band spanning an idle break into segments with
  a visible gap (one band bridged an 18h idle gulf as continuous work); label
  only the widest segment
- middle-truncate subagent bar labels ("first8…last6") and lower the label
  min-width gate to 40px — tail-ellipsis made parallel bars undecodable
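
The middle truncation for subagent bar labels can be sketched as below. It is a Python illustration of a frontend helper; the function name is hypothetical, and only the 8/6 split is taken from the "first8…last6" wording above.

```python
def middle_truncate(label, head=8, tail=6):
    """Keep the first `head` and last `tail` characters around an ellipsis.

    Labels that would not get shorter are returned unchanged.
    """
    if len(label) <= head + tail + 1:
        return label
    return f"{label[:head]}…{label[-tail:]}"


print(middle_truncate("Investigate session map + PR 306"))   # Investig…PR 306
print(middle_truncate("agent-1234"))                         # agent-1234
```

Keeping the tail is what separates parallel bars whose names share a long common prefix.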

Timeline tab:
- prefix each ask with a grey mono "user: " marker so asks carry a speaker
  label the way steps carry #n + time

All observations from a real 22.7M-token / $24 session; regression tests
updated + 13 new pattern tests in test_lens_ui_regression.py.

Co-Authored-By: Claude <noreply@anthropic.com>
…uilder-Labs#368/Metabuilder-Labs#369/Metabuilder-Labs#370; append migrations 13-15 after 12)

# Conflicts:
#	tests/integration/test_db.py
#	tests/unit/test_backfill.py
#	tokenjam/core/backfill.py
#	tokenjam/core/db.py
#	tokenjam/core/models.py
#	tokenjam/otel/semconv.py

@anilmurty anilmurty left a comment

Contributor

Approving per the earlier review. Resolved 6 conflict files against #367/#368/#369/#370: migrations reordered 12→13→14→15 (append-only, no collision); upsert_session INSERT combined to 18 columns (service_namespace/service_instance_id + cache_write_tokens + run_id/parent_session_id, all aligned with ON CONFLICT + params); backfill.py reconciled the two rewrites — kept the #15 bulk executemany for new spans (with sub_agent_id added to _SPAN_INSERT_SQL) AND #371's reingest re-tag loop for existing spans; semconv/models unions; test_backfill.py reconstructed from #371's tests + #370's subagent-totals test. Full suite green (1728 passed), ruff clean.

@anilmurty anilmurty merged commit 5cffc74 into Metabuilder-Labs:main Jul 2, 2026
4 checks passed