Skip to content

feat: Claude Code session-status dashboard — lifecycle fix, session detail/timeline/map, run grouping, quickstart#306

Closed
anshss wants to merge 108 commits into
Metabuilder-Labs:mainfrom
anshss:fix/claude-code-session-status
Closed

feat: Claude Code session-status dashboard — lifecycle fix, session detail/timeline/map, run grouping, quickstart#306
anshss wants to merge 108 commits into
Metabuilder-Labs:mainfrom
anshss:fix/claude-code-session-status

Conversation

@anshss

@anshss anshss commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Turns the Status page from a flat agent list into a real session-status dashboard for Claude Code — fixing the root bug where live sessions showed as "completed", then building the drill-in surfaces (detail / timeline / map), cross-session run grouping, and a zero-install first run. Rebased onto and merged with current main (v0.5.2).

Summary

  • Root fix: Claude Code/Codex export OTLP logs (not traces); each user_prompt becomes a zero-duration invoke_agent span that ingest mistook for a session completion — so every live session force-completed on its first prompt. Completion is now gated on real duration (end_time > start_time); non-end spans re-activate mistakenly-completed sessions.
  • Session lifecycle: active / idle / stale / closed tiers (SessionRecord.status_at), close-signal endpoint + tj session-end, archive view, per-terminal tiles, display labels (service.instance.id + manual overrides), and a claude shell wrapper for per-terminal naming.
  • Project grouping: dashboard tiles roll up by OTel service.namespace, with a server-side [agents.<id>].project fallback so already-running sessions group without a restart.
  • Session drill-in: a per-session detail view with model-mix + context-growth, a deterministic Timeline (transcript play-by-play incl. recursive subagent logs, no LLM), and a graphical Map of what the agent did per ask.
  • Cross-session Run grouping: a fan-out harness stamps tokenjam.run_id / parent_session_id; tj groups spawned workers into one Run (/api/v1/runs), plus a setup_harness MCP tool + integration doc.
  • Sub-agent cost attribution: sub_agent_id on spans, per-subagent cost breakdown, and a subagent right-sizing analyzer.
  • Zero-install first run + Claude-Code wedge: tj quickstart (npx tj quickstart / uvx), tj context cost diagnostic, tj tokenmaxx quota card, retroactive Opus quota audit, branded home screen on bare tj.
  • Cost/pricing correctness: capture cache-creation tokens separately from reads, current Anthropic/Haiku-4.5 rates, framing block on /api/v1/sessions/{id}, bulk-insert backfill (perf).

Merge with main (v0.5.2)

This branch diverged at v0.5.0 and now merges +107 mainline commits (proxy/policy, Analytics, pricing overrides, Dashboard UI, request-capture #209, resume-dedup #294). Notable reconciliations:

Tests / Verification

  • Full suite green: 1397 passed (pytest tests/unit tests/synthetic tests/agents tests/integration).
  • ruff check + mypy clean across touched files.
  • UI guarded by static-grep regression tests (test_lens_ui_regression.py) + offline test (test_ui_offline.py); module JS validated with node --check.
  • Diff scope: 88 files, +15,552 / −648 vs main.

What's NOT in this PR

  • Live OTel autopsy of subagent trees — CC's live OTLP export is a flat 2-level tree; deep subagent reconstruction is deterministic from the on-disk JSONL only (Timeline path), not the live span path.
  • LLM-generated intent summaries — the Timeline/Map are deterministic (no generation) by design.
  • Active-time (idle-segmented) durations — tiles still show wall-clock; idle-gap segmentation is deferred.
  • Drift on logs-path sessions — needs an idle-sweep to fire session-end hooks; deferred.

🤖 Generated with Claude Code


Session map — capturing how an agent solved a problem (added in this revision)

Builds on the Map/Timeline/Approach surfaces above to answer "how did the agent attempt this" (the method) — not just "what did it do" — and to preserve that method across ephemeral agents (subagents, background agents, child terminals) that are killed when their work finishes. Design + capture-mechanics findings: docs/internal/specs/session-map-method-capture.md.

Backend

  • Method persistence (session_story, migration 14): the reconstructed Story is snapshotted to the DB at session-close and at backfill, with a read-through fallback in /story + /workmap, so a killed agent's method survives Claude Code pruning the on-disk transcript (the cost spans alone don't carry the "how").
  • Provenance + completeness: every work-map node carries provenance (in-session subagent vs cross-terminal child) + capture_completeness (full / capped / session-level), so the UI never claims more than the data supports.
  • GET /sessions/{id}/approachcore/method_spine.py folds the Story into deterministic intent-moves (act / delegate / verify / dead_end), recursive, source-tagged (agent_words / structural; richer labels left to the opt-in distill layer). Enriched with per-delegation cost/status, a rail agents summary, and counts.
  • GET /sessions/{id}/sessionmap — phase/tool-event/context/cost series + subagent windows for the board, on a single time axis unified over event + span clocks (so resumed/backfilled sessions don't collapse in time mode).
  • M2b — cross-terminal splice: run-linked child sessions appear in the Approach tree with honest provenance, their own spine spliced in when their method is recoverable.

UI (tokenjam/ui/index.html, offline-safe)

  • Approach tab — delegation-tree rail (every agent, incl. killed ones, with "method kept" badges) + a method spine with source tags, dead-ends, recursive delegation expansion, and a header stats card.
  • Map board — synchronized swimlanes (phase / tools / sub-agents / context / cost) + a codebase-territory treemap (touch intensity, first-touch order, edited markers), with a time⇄step toggle.
  • Recursion in all three tabs — Approach, Map (delegations + sub-agents lane), and Timeline (nested subagent logs, guarded by regression tests).

Validation: full suite green (1477 passed); every UI change passed a screenshot-vs-mock + geometry-audit gate. Known follow-ups tracked separately (idle-gap time-axis segmentation; sub-agents-lane window fidelity).


Dashboard "Split zones" — coding sessions vs SDK services (added in this revision)

The Status screen observed two very different kinds of agent through one flat "agent" table. TokenJam watches both coding agents (interactive Claude Code / Codex sessions — you care how they worked → Map / Approach / Timeline) and SDK agents (programmatic services — you care about aggregate cost / throughput / error-rate over time, Prometheus-style). One table + a user Cards|List toggle served neither mental model. This revision replaces it with two content-defined zones, and cleans up the archive.

Backend (api/routes/status.py, core/db.py)

  • Kind classification: every agent + archived session is tagged kind: "coding" | "sdk" via is_interactive_coding_agent (the alert engine's existing claude-code / codex prefix classifier). Deliberately not session_id presence — ingest mints a session_id for SDK spans too, so that signal is unreliable.
  • SDK-services block (sdk_services[]): non-interactive agents seen within a 7-day window, each carrying a per-minute cost / calls / err_pct sparkline series (new sdk_service_series db helper — buckets spans by minute, zero-filled to a fixed grid, reusing the existing date_trunc(epoch()) idiom) plus req_per_min, err_rate, and a last-seen-keyed lifecycle: live (≤5m) / went_quiet (≤30m — was steady, just stopped → possible outage) / long_dormant. Best-effort: degrades to [] rather than failing /status. No migration.
  • Archive de-noised: 0-signal "zombie" sessions (0 tokens and 0 tool calls and 0 cost — terminals that opened and did nothing) are filtered out, checked post-rollup so a fan-out session's trace-keyed spend (Polish web UI for demo-readiness #18) still counts as signal; scans a wider candidate window to still surface up to ARCHIVE_LIMIT real sessions.

UI (tokenjam/ui/index.html, offline-safe)

  • Coding sessions zone: active agents as cards (→ #/sessions/<id>), with a collapsible Archived (closed / stale) list keyed on ended-at (collapsed by default so a large archive doesn't bury active work); method stays openable.
  • SDK services zone (SdkServicesPanel): live services as a Prometheus-style panel — cost/min + err% sparklines (reusing the existing Sparkline), req/min and error rate — over a collapsible Inactive (no telemetry) list keyed on last-seen: amber went quiet (possible outage) vs muted long dormant. SDK spend renders as plain dollars (real pay-per-token, not the coding subscription framing).
  • Removed the Cards|List view toggle entirely (zones are content-defined now). Fixed the List rows being unclickable and unified the row's secondary actions.

Validation: full suite green (1691 passed); the split-zones UI was screenshot-validated against an approved mock (both zones populated with seeded SDK data, since a coding-only DB leaves the SDK zone empty by design); UI regression tests updated for the retired toggle + new zones. Design/decision notes in project memory (dashboard-split-zones-direction).

anshss and others added 30 commits May 27, 2026 10:15
…sions

Claude Code / Codex export telemetry as OTLP logs. routes/logs.py maps each
user_prompt event to a zero-duration invoke_agent span (end_time == start_time)
that marks the START of a turn. ingest.py treated any invoke_agent span with a
truthy end_time as a session completion, so every live session was force-
completed on its first prompt and the drift/alert session-end hooks fired on
every turn.

Result on the dashboard: an actively-running session displayed as "completed"
with 0 duration and 0 tokens (the Status tile surfaced whichever empty marker
session was newest), while real cost only showed in the daily roll-up.

Fix:
- ingest._is_session_end(): only treat an invoke_agent span as a completion
  when it has a real duration (end_time > start_time). The SDK @watch() path
  still completes correctly; the logs-path turn markers no longer do.
- Ongoing-activity spans re-activate a session left "completed" (self-heals
  in-flight sessions). Idle sessions are surfaced as "stale" at read time via
  SessionRecord.effective_status (existing SESSION_STALE_THRESHOLD).
- status route: pick the active session by most-recent activity
  (COALESCE(ended_at, started_at) DESC) and report effective_status, so a live
  session wins over a freshly-spawned empty marker.

Tests: factory make_invoke_agent_span(); 4 lifecycle regression tests +
1 DuckDB-backed status-route test proving the live session reads as active.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The Status dashboard only showed per-repo agent tiles (claude-code,
claude-code-harness, claude-code-godmode...), because tj names each agent
after the git repo (cmd_onboard._derive_project_name). A user working across
one org's repos saw their work scattered across many tiles with no project
identity.

Add OTel service.namespace as a "project" grouping dimension:

- onboard: derive the git org (_derive_org_name) and write
  service.namespace alongside service.name into the project's
  .claude/settings.json. New --project flag overrides it (e.g. org
  "aquanodeio" -> "aquanode").
- ingest: capture service.namespace from resource attrs on every path —
  logs (Claude Code / Codex), live OTLP spans, and the in-process SDK —
  and persist it on the SessionRecord (late-resolved like plan_tier).
- db: migration 5 adds sessions.service_namespace (nullable); upsert +
  _row_to_session round-trip it.
- status API: each agent now reports its namespace.
- dashboard: tiles group under a project header showing agent count and
  rolled-up cost; agents with no namespace fall under "Ungrouped".

Net effect: every Aquanodeio/* repo rolls up under one "aquanode" project
tile, no per-repo config needed.

Tests: factory service_namespace support; ingest capture + late-resolve;
logs-path resource extraction; status-route end-to-end; org-name parsing
(https/ssh); migration count. Dashboard grouping verified in-browser.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two dashboard-accuracy fixes:

1. Duration: the Status tile picked the "latest completed" session by
   started_at, so a 40s fragment that started 3 min after the real session
   hid a 4.5-hour session that stayed active afterwards. Order
   get_completed_sessions by last activity (COALESCE(ended_at, started_at)
   DESC) instead. Fixes the tile across the status route, MCP, CLI status,
   and metrics — all of which take the limit=1 "current session".

2. Onboard now prompts for a project name (OTel service.namespace) with the
   git repo name as the default, instead of silently deriving the git org.
   A meta-repo (git repo "harness" holding all of "aquanode") can be named
   "aquanode" at onboard time. --project still skips the prompt for
   non-interactive use. Removes the now-unused _derive_org_name.

Tests: ordering regression (long session vs later-started fragment);
make_session gains started_at/ended_at overrides; onboard --project and
interactive project-name prompt; existing onboard CLI tests pass --project.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
service.namespace rides in OTEL_RESOURCE_ATTRIBUTES, which an agent reads at
startup — so an already-running Claude Code session can never start sending it
without a restart. Add a server-side fallback: [agents.<id>].project maps an
agent to a dashboard project, applied by tj regardless of what the wire carries.

- config: AgentConfig.project (round-trips via _dc_to_dict / loader).
- ingest: session.service_namespace falls back to the agent's configured
  project when the span carries none (so running sessions self-heal on their
  next span); an explicit wire namespace still wins.
- status route: namespace falls back to the configured project at query time,
  so even already-completed sessions with NULL namespace group immediately —
  no backfill, no restart.
- onboard: writes the chosen project into [agents.<id>].project too, so both
  the wire (future sessions) and server-side (running sessions) paths are set.

Verified end-to-end: a session sending only service.name=claude-code-harness
(no namespace) reports namespace=aquanode and its true 4h+ duration after just
a daemon restart.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…inals)

Several Claude Code terminals in one repo share a service.name, so they all
report under one agent_id. The Status view collapsed an agent to a single
representative session, hiding concurrent terminals.

Now the status route emits one tile per recently-active session (last span
within SESSION_STALE_THRESHOLD), falling back to the latest completed session
when none are active. Each tile carries its own session_id, tokens, tool
calls, duration, last-seen, per-session cost, and per-session alert count
(when an agent has multiple). The dashboard renders a session-id line per tile
and counts "N sessions" per project group (cost deduped by agent).

Tests: concurrent-sessions-under-one-agent yields one tile each; updated the
live-vs-marker test (both now show as their own tile).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Concurrent terminals show as opaque session-id tiles. Give them names two
ways, resolved in priority order on the status route:
1. [session_labels] config override (full id or prefix match) — names an
   already-running terminal immediately, no relaunch.
2. OTel service.instance.id — durable per-terminal name set at launch via
   OTEL_RESOURCE_ATTRIBUTES; captured like service.namespace.
3. else None (UI falls back to the short session id).

- config: TjConfig.session_labels dict ([session_labels] table).
- service.instance.id captured across logs / OTLP / SDK paths onto
  SessionRecord.service_instance_id (migration 6); late-resolved.
- status route: _session_label() resolves the name; each tile gets "label".
- dashboard: tile header shows the label when present (agent id + short
  session id move to detail rows), else the agent id as before.

Tests: instance.id capture; label priority (config > instance.id > none);
session_labels round-trip; migration count 5->6.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…rd wrapper

Add a `claude` shell wrapper (installed by `tj onboard --claude-code`) so each
terminal tags its session with a distinct `service.instance.id`, rendering as
separate dashboard tiles — without hand-editing shell rc and preserving the
project's `service.name` / `service.namespace`.

- New `tj otel-resource-attrs` command (cmd_otel.py): prints the project's OTel
  resource attrs on a single bare line (`service.name=claude-code-<repo>` plus
  `service.namespace=<project>` when the agent has a project configured).
  Registered in main.py and added to `no_db_commands`.
- `tj onboard --claude-code` installs an idempotent `claude()` function into
  `~/.zshrc` (always) and `~/.bashrc` (when present), behind begin/end markers
  so re-onboards replace in place. The wrapper consumes `--as <name>`, derives
  the instance id (--as value, else tty basename, else `unknown`), exports
  `OTEL_RESOURCE_ATTRIBUTES`, and runs `command claude` to avoid recursion.
  Portable across zsh and bash.
- Tests for both the new command (with/without configured project) and the
  wrapper install (presence, idempotency, bashrc-only-when-present).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…rchive

Claude Code emits no "session closed" event, so tj can only see
time-since-last-span. Model the lifecycle explicitly and let the `claude`
wrapper report closes:

- models: effective_status gains idle/closed tiers (active <5m, idle <4h,
  stale beyond); status_at(idle_threshold) keeps the property pure while the
  route honours configurable [sessions] idle_minutes (default 240).
- config: TjConfig.session_idle_minutes round-trips via the [sessions] table.
- db: close_sessions_by_instance / close_session_by_id (idempotent, only
  flips status='active' rows, bumps ended_at).
- api: POST /api/v1/sessions/close (ingest-Bearer-protected) marks a
  terminal's active sessions closed; returns {"closed": n}.
- cli: `tj session-end --instance/--session` POSTs best-effort over HTTP
  (no DB), exits 0 silently when the daemon is down.
- wrapper: claude() reports session-end on return and via INT/TERM/HUP trap,
  preserving claude's exit status. Idempotent close makes double-fire safe.
- onboard: stop writing OTEL_RESOURCE_ATTRIBUTES to project settings.json
  (the wrapper owns it per-terminal; settings env would override it and
  collapse tiles) and delete any pre-existing one to migrate older setups.
- status route: only active+idle sessions become tiles (no completed/closed
  fallback), capped at 6 per agent with a surfaced overflow count; new
  `archived` list returns closed+stale sessions (cap 50).
- ui: idle (amber) + closed styles, an Archived sessions table, and a
  "+N more" affordance for capped project groups.

Gates: ruff clean, mypy clean, 639 passed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Clicking a Status tile navigated to the global `#/traces` firehose, dropping
the clicked session's identity and landing the user on an unfiltered,
all-agents trace list. The header promised "details" but zoomed out. Add a
real session-scoped destination instead.

- GET /api/v1/sessions/{session_id}: per-session rollup (cost/tokens/tools/
  alerts/drift) plus the session's own traces, so the UI can drill into the
  existing waterfall. Read-only; require_api_key; 404 as JSONResponse with
  response_model=None. Parameterised db.conn SQL only.
- UI SessionDetailView + router case + click fix (index.html:680 now routes to
  `#/sessions/<id>`, archived rows clickable). Plan-tier-honest cost framing:
  "Implied API value" for subscription, "Local model — no API cost" for local,
  real cost for API; no invented spend, no fabricated agent tree (the live
  Claude Code telemetry is flat — documented in the route).
- Stop tracking .tj/config.toml: every `tj` run from the repo cwd rewrites it
  and rotates the committed ingest_secret. The `.tj/` ignore rule already
  exists; this just drops the stale tracked copy.

Tests: +5 integration tests (rollup/tools/traces, unknown→404, subscription
plan_tier→pricing_mode, drift baseline, api-key auth). ruff + mypy clean,
644 pass.

Layer 1 of the run-autopsy arc; subagent-tree capture at ingest (Layer 2) is
the next step.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…view

Layer 2 of the run-autopsy arc. The session detail view now shows how a
session split across models and how its context grew over the run — both
derived from existing gen_ai.llm.call spans, so it works on every session
(live + backfilled) with no schema change.

- GET /api/v1/sessions/{session_id} gains turn_count, model_mix (per-model
  calls/tokens/cost rollup) and context_series (time-ordered input-token
  series, downsampled to <=120 points, first+last preserved). Parameterised
  db.conn SQL; span name from the GenAIAttributes semconv constant.
- UI "Models & context" section: model-mix table + a dependency-free CSS bar
  chart of input (context) tokens per LLM call. Descriptive only — no model-
  routing-quality claims; subscription caveat reused on cost.

Tests: +3 integration tests (model_mix aggregation/order, turn_count,
context_series ordering + downsample cap). ruff + mypy clean, 647 pass.

Deferred (recorded in .plans): parentUuid within-session branch tree; the
cross-session spawn graph (needs a harness-emitted spawn marker — no clean
parent->child link exists in on-disk Claude Code data).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…unit

Layer 3 of the run-autopsy arc. A fan-out harness (e.g. the meta-repo governor)
stamps tokenjam.run_id (and optional tokenjam.parent_session_id) as OTel resource
attributes on each worker session it spawns; tj groups those sessions into one
Run. Linkage is DECLARED by the spawner — Claude Code OTLP carries no native
parent<->child edge, so it is never reverse-engineered.

- semconv: TjAttributes.RUN_ID / PARENT_SESSION_ID.
- SessionRecord gains run_id + parent_session_id; migration 7 adds the columns
  (ADD COLUMN IF NOT EXISTS — fresh-DB and upgrade safe).
- Ingest captures the markers from resource attributes on both paths
  (otel/otlp_parsing.py for the spans/OTLP path, api/routes/logs.py for the
  Claude Code logs path) and self-heals null-on-update (never overwrites).
- API: run_id/parent_session_id on GET /api/v1/sessions/{id}; new
  GET /api/v1/runs/{run_id} (totals + member sessions + parent-edge tree) and a
  GET /api/v1/runs index. 404 as JSONResponse + response_model=None.
- UI: RunDetailView (#/runs/<id>) — run rollup + sessions indented by spawn
  parent; "Run" link on the session detail. Plan-tier-honest cost framing
  (mixed -> implied API value, never a hard spend claim).

Tests: +7 integration tests (logs+spans ingest capture, session-detail
exposure, run grouping/aggregation/tree, unknown->404). ruff + mypy clean,
654 pass.

Harness side is a documented contract (set the resource attrs at spawn),
applied separately in the user's governor.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
L3 added tokenjam.run_id / tokenjam.parent_session_id resource-attribute capture
to the HTTP/OTLP (otlp_parsing.py) and Claude Code logs (logs.py) paths but missed
convert_otel_span — so a Python SDK app using the in-process exporter would never
join a Run. Extract the markers there too, mirroring the existing service.namespace
/ service.instance.id extraction, so run grouping works across all three ingest
paths (CC logs, HTTP/OTLP incl. the TS SDK, and the in-process Python SDK).

+2 unit tests for convert_otel_span resource extraction (also the first direct
coverage of that block). ruff + mypy clean, 656 pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A "Story" section in the session view that explains, step by step, what a Claude
Code session was trying to do and how it went — surfacing the agent's own
narration threaded with its literal tool calls and ok/error outcomes. No LLM, no
generation: read live from the on-disk CC JSONL transcript, nothing stored in the
DB (capture posture unchanged).

- core/transcript.py: build_session_story() locates
  ~/.claude/projects/*/<session_id>.jsonl (session_id == transcript filename,
  verified 100% across cli + sdk-cli), parses task / steps (narration + per-tool
  {name, label, status} + is_error/is_retry flags) / outcome. Caps + truncation;
  NEVER returns full tool inputs/outputs — only a short arg label + ok/error
  (privacy + bounded payload).
- GET /api/v1/sessions/{id}/story (require_api_key, response_model=None);
  {available: false} at HTTP 200 for SDK/no-transcript sessions. Projects root
  overridable via app.state / TJ_CLAUDE_PROJECTS_ROOT for tests.
- UI Story section: Task callout, step list (expandable narration, tool chips,
  error tint, ↻ retry marker, omitted markers), Outcome callout. Dependency-free.

CC-only by design (SDK sessions have no CC JSONL → graceful unavailable state).
+15 tests (unit parser + API available/unavailable). ruff + mypy clean, 671 pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…Alerts / Traces)

The session card stacked six sections vertically — too much scroll. Split the
lower content into tabs while pinning the header + Overview/Cost summary cards at
top. Story is the default tab; Tools folds under "Models & context", Behavioral
drift under "Alerts". Pure layout change (htm/Preact + CSS), no backend or
data-flow change; data still loads as before, only display is gated by the active tab.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
StatusView's empty-state guard only checked data.agents (active + idle), so once
all sessions aged past the 4h idle window into `archived`, the dashboard showed
the cold-start "No agents found / pip install" screen despite having 50 archived
sessions to display. Guard now also requires `archived` to be empty before
showing the cold-start state, so the (clickable) archived-sessions table renders
whenever there is any history.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Each Agent/Task step in a session's Story now expands into that subagent's own
story (task -> steps -> outcome), recursively, so a session's full log includes
the work of everything it spawned. Subagent transcripts are read from
~/.claude/projects/<proj>/<session_id>/subagents/agent-<agentId>.jsonl, linked by
the agentId carried in the parent step's tool_result (exact match, no heuristic).

- core/transcript.py: include_subagents (default true) resolves each Agent/Task
  step's child agentId from its tool_result, loads agent-<id>.jsonl from the root
  session's subagents/ dir, and nests its story under the step (step.subagent),
  recursively. Guards: MAX_SUBAGENT_DEPTH, a shared step-budget across the tree,
  and an agentId cycle-set; depth/budget caps are surfaced, never silent-dropped.
  Privacy unchanged at every depth (narration + short tool label + ok/error only).
- GET /sessions/{id}/story: Agent steps carry a recursive `subagent`; ?subagents=false
  returns the flat single-session story.
- UI: collapsed "> subagent: <name> - N steps" disclosure under each Agent step;
  expands to the subagent's task/steps/outcome indented, recursively (its own Agent
  steps expand too).

+9 tests (parent->child->grandchild nesting, agentId resolution, depth/budget caps,
cycle guard, ?subagents=false). ruff + mypy clean, 680 pass. Real-data: this session
nests 12/12 spawned subagents.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The session activity section was a raw wall of full-card steps (a 356-step session
rendered ~32k px tall). Make it scannable and address the naming/order feedback.

- Rename the user-facing tab + section "Story" -> "Timeline" (the /story endpoint
  path is unchanged/internal).
- Newest-first display: renderStepsNewestFirst() reverses the step list (top level
  AND nested subagents) while keeping the #n labels (so Metabuilder-Labs#1 is still the first action).
  The Task callout stays pinned at top and Outcome at bottom; only the steps reverse.
- Compact rows: each step is now a one-line row (#n - time - tool chips ok/error -
  clamped first line of narration); click to expand the full narration + detail. The
  repetitive per-row model label is dropped and shown only when the model changes
  (prevModel). Error tint, retry markers, and the recursive subagent disclosure are
  preserved.

UI-only (tokenjam/ui/index.html); ruff clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The session-detail Timeline (StorySection) fetched /sessions/:id/story once
on mount with no polling, unlike every other view. A live session's Timeline
froze at page-load and only refreshed on remount (tab switch / re-navigate).
Add setInterval(load, 10000) matching the house idiom, preserving the
last-good Timeline on transient poll failures. Apply the same polling fix to
TraceDetailView, which had the identical once-only fetch (selection is by
span_id, so re-fetching spans preserves it).

Also remove the pinned top-level Task and Outcome callouts: in a long-running
session the first prompt goes stale as new prompts are sent, and Outcome was
just the last assistant message (already the top step) mislabeled as a final
outcome on a live session. The steps list already shows everything in
descending time order. Per-subagent Task/Outcome callouts stay (scoped, not
stale).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Each Timeline step's text was hard-trimmed to 400 chars server-side, with the
full narration discarded. The UI's expand-a-step feature could therefore only
ever reveal 400 chars ending in "…" — "show more" was lying, since the rest was
never sent. Raise MAX_STEP_TEXT_CHARS to 100K so it acts as a safety guard
against a pathological single blob rather than a preview trim; the UI already
shows only the first line collapsed and the full text when expanded. Real
assistant responses now render complete.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ended_at is a session's last-activity time, shown as "Last seen" in the UI.
close_session(s) advanced it to the close moment (CASE ... ended_at < now),
so a session closed days after its last span reported "Last seen 2h ago" and a
multi-day duration, while its actual last telemetry/transcript message was days
earlier. Closing is an admin action, not telemetry.

- Both close methods now use ended_at = COALESCE(ended_at, now): stamp only
  when NULL, never advance a real last-span time.
- Migration 8 repairs already-closed rows: recompute ended_at from each closed
  session's max span time, lowering only (idempotent, leaves consistent rows).

Tests: close preserves ended_at / stamps when NULL (integration); backfill
lowers bumped rows, leaves consistent rows, ignores active sessions (unit).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two status-page UX fixes:

- Scroll memory: returning from a session/run detail page (e.g. clicking an
  archived session, then Back) landed at the top of the Status page instead of
  where you were. The window scrolls (.main#app has no own overflow) and the
  async list load defeats native scroll restoration. Add a useScrollMemory hook
  that saves window scroll per view and restores it once the list has rendered.

- "restore session" button on every archived (closed/stale) and idle session.
  Clicking copies `claude --resume <session-id>` to the clipboard so the
  session can be picked back up in the terminal. Copy uses execCommand inside
  the click gesture (reliable even when the async Clipboard API hangs on a
  permission prompt) with a best-effort navigator.clipboard upgrade, and
  stopPropagation so it doesn't trigger the row/tile navigation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The "restore session" button belongs on the session detail page (where you're
looking at one archived/idle session), not on the Status list. Move it: render
it under the session title in SessionDetailView when the session is not active
(closed / stale / idle), and remove it from the Status page archived table and
idle tiles.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…g row

Move the restore button onto the title row, right-aligned next to the session
heading, instead of stacked under the Session id line.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a copy glyph to the restore-session button and a styled on-hover tooltip
that tells you to paste & run the command in your terminal, showing the exact
`claude --resume <id>` line. Replaces the plain native title tooltip.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude Code (and the OTel/SDK paths) emit cache_creation_tokens, but every
ingest parser dropped them — only cache_read landed in the single cache_tokens
field. The dashboard's "Cache tokens" therefore understated total cache usage
by the cache-write amount, and cache-write tokens were never priced.

Add a first-class cache_creation_tokens field through the stack while keeping
cache_tokens = cache reads (analyzers price it at the cache-read rate):

- models: cache_creation_tokens on NormalizedSpan + SessionRecord
- db: migration 9 adds the column to spans + sessions; wired through
  insert/upsert/row-mappers
- ingest: aggregate cache_creation_tokens per session
- cost: CostEngine now prices cache writes at the cache-write rate
- ingest paths: logs.py (Claude Code), otlp_parsing.py, provider.py; backfill
  realigned to the same convention (was the lone outlier)
- api/dashboard: expose cache_creation_tokens (sessions/runs/traces);
  "Cache tokens" now renders reads + writes

Tests: +2 regression tests; updated tests asserting the old behavior.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Reconcile the branch's session-detail / dashboard / transcript / run-grouping
work with 98 commits from main. Notable conflict resolutions:

- Cache read/write split: main shipped the same fix as `cache_write_tokens`
  while this branch used `cache_creation_tokens`. Converged on main's
  `cache_write_tokens` everywhere (models, db, ingest, cost, logs, otlp,
  provider, backfill, api routes, ui). Kept the branch's session-level
  aggregation + dashboard display (main only tracked it at span level).
- Migration number collision: both branches added migration 5+. Kept main's
  released migration 5 (cache_write on spans), renumbered the branch's to
  6-9, and added migration 10 for session-level cache_write (with a defensive
  spans IF NOT EXISTS to repair DBs that consumed v5 as the namespace column).
- backfill: kept the branch's read/write split (more correct than main, which
  still summed both into cache_tokens), under the cache_write_tokens name.
- dashboard: took main's offline-vendored preact/uplot imports + optimize
  route + params plumbing; kept the branch's StatusView (project grouping,
  archived sessions, click-to-detail), SessionDetailView, and run routes.
  "Cache tokens" now shows reads + writes.
- tests: unioned test_cli additions from both sides; kept the branch's
  dynamic migration-count assertions in test_db.

Verified: 836 tests pass, ruff + mypy clean, dashboard loads headless with no
console errors and renders the correct cache-token total.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude Code records each Task-tool subagent's turns in
<session>/subagents/agent-<id>.jsonl, tagged with the parent's sessionId.
Backfill folded those spans under the parent session but discarded the
subagent identity, so a session's cost could not be broken down per
subagent -- yet a single research run can spawn 100+ subagents that drive
most of the spend (verified on a real session: 66% of $642 across ~147
subagents, previously invisible inside one parent total).

- NormalizedSpan.sub_agent_id + spans.sub_agent_id column (migration 11)
- backfill sets it from a record's top-level agentId when isSidechain is
  true; None on the main thread
- both span write paths (db.insert_span, backfill) + _row_to_span carry it
- make_llm_span factory gains a sub_agent_id arg

Enables GROUP BY sub_agent_id for per-subagent breakdown / right-sizing.

Known limitation (separate latent bug, not addressed here): backfill
upserts the session row once per file with replace semantics, so
sessions.total_cost_usd reflects only the last-processed subagent file.
Span-derived cost (get_session_cost / get_cost_summary) stays correct.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A Claude Code session is split across files sharing one session_id (main
thread + subagents/agent-*.jsonl). Backfill upserts the session row once
per file, and upsert_session uses replace semantics, so the row ended up
holding only the last-processed subagent file's totals instead of the sum
-- sessions.total_cost_usd and token counts were wrong for any multi-agent
session.

After ingesting, reconcile each touched session row's token + cost
aggregates to SUM over its spans (the source of truth), via the new
DuckDBBackend.recompute_session_totals_from_spans. Scoped to the sessions
backfill actually touched, so live-ingested rows are untouched. Idempotent:
re-running backfill also repairs rows written by an earlier (pre-fix) run.

Span-derived cost (get_session_cost / get_cost_summary) was already
correct; this fixes the stored sessions.* aggregates.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
New optimize analyzer that breaks a window's cost down per subagent
(sub_agent_id) and flags structural right-sizing candidates:
  - over_powered:     premium (Opus-tier) model, little output, few tool calls
  - over_provisioned: large context (input + cache) but little output

Honesty discipline (CLAUDE.md Rule 14): candidate flags only, never a
quality claim; the caveat is surfaced verbatim and the recoverable estimate
is left None (we report the spend concentrated in flagged subagents, not a
guaranteed saving).

Registered as "subagent" (auto-discovered; appended to ANALYZER_ORDER), so
it flows through get_optimize_report (MCP) and /api/v1/optimize via a dict
round-trip constructor, with a pricing-mode-aware CLI renderer.

Verified on a real session: 147 subagents = 66% of a $642 window, of which
25 are flagged ($109) -- Opus subagents fed 600K-1M cache tokens that
produced <1K output (over_provisioned).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Backfill is idempotent by span_id, so a plain re-run skips spans already in
the DB and never populates sub_agent_id on history ingested before that
column existed. --reingest UPDATEs the existing spans in place (sub_agent_id
refreshed) instead of skipping them -- no new rows, no duplicates -- so
accumulated history becomes attributable per subagent. Surfaced via the new
BackfillResult.spans_retagged counter and the CLI summary.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
anshss and others added 12 commits July 1, 2026 15:58
… lifecycle

Adds the backend for the dashboard 'Split zones' redesign (PR Metabuilder-Labs#306):

- Tag every /status agent + archived session with kind 'coding' | 'sdk'
  via is_interactive_coding_agent (the alert engine's classifier) — NOT
  session_id presence, which is unreliable (ingest mints session_ids for
  SDK spans too).
- New db helper sdk_service_series(): buckets spans by minute over the
  last N minutes per agent, zero-filled to a fixed grid, returning
  cost/calls/err_pct series + window totals + all-history last_seen.
- New sdk_services[] block in /status: non-interactive agents seen within
  a 7d window, each with sparkline series and a last-seen-keyed lifecycle
  — live (<=5m) / went_quiet (<=30m, possible outage) / long_dormant.
  Best-effort: degrades to [] rather than failing the route.

No migration (all derivable from existing spans columns). 8 new tests.

Co-Authored-By: Claude <noreply@anthropic.com>
Rebuild the Status screen into two content-defined zones (PR Metabuilder-Labs#306),
replacing the Cards|List user toggle:

- Coding sessions zone: active agents (kind=coding) as cards → Map/
  Approach/Timeline, with a collapsible Archived (closed/stale) list
  keyed on ended-at (collapsed by default so a large archive doesn't
  bury the active work + SDK zone).
- SDK services zone (new SdkServicesPanel): live services as a
  Prometheus-style panel with cost/min + err% sparklines (reusing the
  existing Sparkline), req/min and error rate; below it a collapsible
  Inactive list keyed on last-seen — amber 'went quiet' (possible
  outage) vs muted 'long dormant'.

Consumes the kind + sdk_services payload from the status route. SDK
spend renders as plain dollars (real pay-per-token, not the coding
subscription framing). UI regression tests updated for the retired
toggle + new zones.

Co-Authored-By: Claude <noreply@anthropic.com>
A terminal that opened and did nothing (0 tokens, 0 tool calls, 0 cost)
carries no method or cost worth reviewing and only clutters the archive.
Drop these post-rollup (so a fan-out session's trace-keyed spend still
counts as signal even when its stored aggregate reads 0 — Metabuilder-Labs#18), scanning
a wider candidate window so up to ARCHIVE_LIMIT real sessions still
surface. Existing routing/namespace tests that used incidental 0/0/0
sessions get a minimal tool_call_count=1 to clear the filter.

Co-Authored-By: Claude <noreply@anthropic.com>
Rename a session from the dashboard by right-clicking its card (for
sessions launched without --as "name", which show a ttys… fallback).

- Migration 15: session_labels table (session_id PRIMARY KEY).
- db helpers set/delete/get_session_labels (parameterized, idempotent).
- POST /api/v1/sessions/{id}/label (api-key auth): {label} sets it,
  empty/whitespace clears it; stripped + truncated to 120.
- /status overlays the DB label in _session_label with precedence
  config-exact -> config-prefix -> DB rename -> instance-id -> short id,
  so a UI rename beats the ttys… fallback but config stays authoritative.
- UI: right-click a card title -> inline input (Enter saves, Esc cancels,
  blank resets); binds to session_id so it sticks across resumes. No
  window.prompt; on-brand inline edit with a hover ✎ affordance.

Persisting to the DB (not the config TOML) keeps renames a runtime
dashboard action that survives restarts without mangling the user's
config file. New tests + UI-regression grep guard.

Co-Authored-By: Claude <noreply@anthropic.com>
…pencil

The rename onContextMenu was scoped to just the title text, so
right-clicking anywhere else on the card fell through to the browser's
native context menu (confusing — looked like rename didn't exist). Move
onContextMenu to the whole card so right-click anywhere enters edit and
the native menu never wins, and make the ✎ pencil a left-clickable
primary affordance (brightens to brand on hover). Tooltip updated.

Co-Authored-By: Claude <noreply@anthropic.com>
…yle interval selector

The TOOLS lane rendered each of a session's hundreds of tool calls as a
3px floating tick, so dense bursts smeared into an unreadable smudge and
sparse stretches looked empty — the lane conveyed nothing.

Time mode now bins events into active-time buckets and renders a
stacked-by-category density histogram (Read/Search/Edit/Bash/Task/Web,
error stacked on top), so busy bursts are tall bars and idle stretches
are honest gaps. A trading-chart-style interval selector (Auto · 1m ·
5m · 15m · 1h; Auto targets ~60-100 bars) sets the bin width, and the
cost lane re-bins to the SAME bucket edges so a cost spike sits directly
under the tool burst that caused it. Step mode is unchanged (per-event
ticks remain the event-level view). Frontend-only, no backend.

Co-Authored-By: Claude <noreply@anthropic.com>
…-call context, collision-proof labels, de-noised treemap (Metabuilder-Labs#56)

The Map board dumped five raw telemetry lanes and deferred all synthesis to
hover; on a real long session (74h, 395 tools, ~40 subagents) it was
undecodable. Per meta-repo ticket Metabuilder-Labs#56:

- Insights strip: deterministic callouts (costliest active stretch, friction
  errors/retries, top sub-agent by cost, idle share, edit footprint) surface
  the board's answers by DEFAULT instead of hover-gating them.
- CONTEXT lane now plots each call's OWN context occupancy (input+cache per
  span) instead of a cumulative sum — growth, compaction and resets are
  visible; the old monotone climb duplicated the total-tokens chip.
- Sub-agents lane: labels are px-gated (MB_SUBLAB_MIN_PX) and suppressed on
  overlapped past-cap bars — bars may overlap under extreme density, text
  never does; lane grows to 8 packed rows instead of smashing into 3.
- Legend now covers every encoding: Other, retry (dashed red = retried step,
  distinct from solid-red error), and the phase-band tinting.
- Territory treemap: temp/scratch reads (TemporaryItems, /tmp, /var/folders)
  collapse into one muted card and no longer destroy the common-prefix root,
  so dir labels are workspace-relative; cards/files weight edits over reads;
  file names keep a readable floor instead of truncating to one letter.

Validated: full suite 1718 passed; seeded long-session fixture screenshot-
checked in both time and step modes (no label collisions, no console errors).

Co-Authored-By: Claude <noreply@anthropic.com>
…ion (Metabuilder-Labs#56 lesson)

Co-Authored-By: Claude <noreply@anthropic.com>
…-attempt-after-failure (Metabuilder-Labs#58)

Founder dogfooding on a real session showed the manual bin ladder produces no
relatable change (and 1h bins on a ~40m session collapse the board into one
full-width slab), and the dashed retry outline had saturated into noise.

- Interval ladder purged: bin width is always auto-resolved for the span and
  self-described in the cost peak label ('peak $X per 30s' / '/call' in step).
- is_retry now requires the previous same-signature step to have FAILED —
  consecutive successful edits of the same file are normal work, not retries
  (a real session showed 27 'retries' with ~1 genuine one). Makes every
  consumer (Map marks, friction chip, method_spine dead-ends, workmap counts)
  more accurate. New test pins repeat-after-success != retry.
- Step mode is the default read (sequence without burst/idle distortion);
  time stays one click away for cost localization.
- Direction note: docs/internal/specs/map-board-direction.md pins the board's
  job, the purge principles, and the question-driven-zoom north star.

Full suite: 1720 passed.

Co-Authored-By: Claude <noreply@anthropic.com>
map-board-direction, session-map-method-capture and harness-integration are
workspace design notes, not user docs — clutter for OSS consumers. Relocated
to the private meta-workspace.

Co-Authored-By: Claude <noreply@anthropic.com>
…s squatted on npm

npm already has an unrelated 'tj' package (a pub/sub lib at 2.1.0), so the
wrapper could never publish under that name and docs/installation.md linked to
someone else's package. The npm PACKAGE is now 'tokenjam' (free; the TS SDK is
scoped as @tokenjam/sdk so no collision) while the installed BIN stays 'tj'.
Zero-install invocation is 'npx tokenjam'; all docs/comments updated.

Co-Authored-By: Claude <noreply@anthropic.com>
The wrapper's header (and every doc) promised bare npx runs the zero-install
quota report, but main() was pure passthrough — so bare npx hit the branded
home screen, which assumes an installed CLI and dead-ends an npx user
('You're set up', suggesting commands they don't have). Local bare 'tj' keeps
the home screen (Metabuilder-Labs#240); the npx surface routes to quickstart as documented.

Verified: bare wrapper run spawns 'tj quickstart' via uvx; explicit args pass
through untouched; no-runner fallback prints guidance and exits 1.

Co-Authored-By: Claude <noreply@anthropic.com>
@anshss

anshss commented Jul 2, 2026

Copy link
Copy Markdown
Contributor Author

Superseded — this branch has been split into 5 focused, individually test-green PRs carved from these commits (original messages preserved), each rebased onto current main:

Suggested merge order: #367/#368/#370 (independent) → #369 (after #368) → #371 (after #367). Closing in favor of the series.

@anshss anshss closed this Jul 2, 2026
anshss added a commit to anshss/tokenjam that referenced this pull request Jul 2, 2026
…dogfooding round

Approach tab:
- scrub UTF-8-as-Latin-1 mojibake ("Â\xa0") from /approach and /story payloads
  once at data-load (mandates rendered "Â how is..." on real transcripts)
- render **bold**/*italic*/`code` in labels, quotes and the outcome block as
  vnodes instead of literal asterisks (escape-first, no links/headings)
- when a move's quote merely re-states its 80-char-truncated label (48/78 moves
  on a real session), un-truncate the headline from the quote's lead sentence
  instead of printing the same sentence twice
- clamp the ✓ outcome block to 3 lines with a click-to-toggle "show all"
  (outcomes arrive truncated mid-word server-side)
- fold runs of >4 consecutive chat-only moves into first + "· N conversational
  steps" + last (click to expand) so the method doesn't drown in Q&A narration
- tag review/verify/audit-mandated delegations with a ✅ verify chip + green
  accent and recompute the header's verifies stat client-side (the structural
  backend classifies them as plain delegates -> verifies:0)
- a delegate move with cards no longer prints the subagent name three times
  (label + "Agent <name>" evidence + card) — just the ⑂ marker and the card
- rail nodes badge only the exceptions (live/cross-terminal/capped); the
  default "ended · method kept / in-session subagent" pair moves to title=

Map tab:
- close the context lane's area fill at the last sample's x — it faded to the
  right edge as a decaying wedge that read as data
- tool-tick labels: keep the distinctive TAIL of over-long filenames
  (date-prefixed specs printed the same "2026-07-02-…" fragment for every
  tick) and suppress a label that repeats the last shown one
- phase titles: strip leading conversational pleasantries ("Got it — my
  mistake." -> "My mistake.") and merge adjacent same-normalized-title phases
  (the Metabuilder-Labs#57 confetti pattern)
- in time mode, split any phase band spanning an idle break into segments with
  a visible gap (one band bridged an 18h idle gulf as continuous work); label
  only the widest segment
- middle-truncate subagent bar labels ("first8…last6") and lower the label
  min-width gate to 40px — tail-ellipsis made parallel bars undecodable

Timeline tab:
- prefix each ask with a grey mono "user: " marker so asks carry a speaker
  label the way steps carry #n + time

All observations from a real 22.7M-token / $24 session; regression tests
updated + 13 new pattern tests in test_lens_ui_regression.py.

Co-Authored-By: Claude <noreply@anthropic.com>
anilmurty pushed a commit that referenced this pull request Jul 2, 2026
Root fix: Claude Code / Codex export OTLP *logs*, so each user prompt
becomes a zero-duration invoke_agent span (end_time == start_time) that
ingest mistook for a session *completion* — every live session
force-completed on its first prompt (dashboard showed active work as
"completed", 0 duration, and tripped drift/alert false-positives).
Completion is now gated on real duration (end_time > start_time), and a
non-end span re-activates a mistakenly-completed session.

On top of the fix, the Status page gains an honest session model:
- Lifecycle tiers active/idle/stale/closed (SessionRecord.status_at),
  with a transcript-mtime rescue for live-but-span-quiet sessions.
- Explicit close: POST /api/v1/sessions/close + `tj session-end`
  (preserves ended_at / "Last seen"; only stamps status).
- Project grouping via OTel service.namespace + a server-side
  agent->project fallback (groups running sessions without a restart).
- Per-terminal display labels via service.instance.id + manual
  overrides (right-click rename) and `tj otel-resource-attrs`.
- Split-zones Status view: coding sessions (cards + archived) vs
  SDK services (per-minute cost/err sparklines), 0-signal zombies
  filtered from the archive.

Migrations 8-11: service_namespace, service_instance_id, ended_at
repair, session_labels.

Carved from #306 (fix/claude-code-session-status) as the first of five
focused PRs.

Co-Authored-By: Claude <noreply@anthropic.com>
anilmurty added a commit that referenced this pull request Jul 2, 2026
fix: Claude Code session status + lifecycle (1/5 from #306)
anilmurty added a commit that referenced this pull request Jul 2, 2026
feat: tj context cost diagnostics (3/5 from #306)
anilmurty added a commit that referenced this pull request Jul 2, 2026
feat: zero-install / npm package (2/5 from #306, stacked on #368)
anilmurty added a commit that referenced this pull request Jul 2, 2026
fix: Claude Code cost & alert correctness (4/5 from #306)
anilmurty pushed a commit that referenced this pull request Jul 2, 2026
PR #306's deferred "active-time (idle-segmented) durations." On a long or
resumed session whose wall-clock is mostly idle between turns (e.g. ~32h
spanning ~2.8h of real work), raw wall-clock crammed all the work into thin
clusters at the right edge while huge idle gaps ate the axis, making time
mode near-useless.

Backend: _ActiveAxis builds a piecewise-linear real-time -> active-time map
over the UNION of event + span timestamps, passing active periods through 1:1
but collapsing every gap > IDLE_GAP_THRESHOLD_S (300s) down to a fixed
COLLAPSED_GAP_S (30s). Every event, context/cost series point, and subagent
window gets its active_s position; meta exposes active_duration_s and a gaps
array ({start_ts, end_ts, duration_s, at_active_frac}) for the break markers.
With no idle gaps the active axis equals the wall-clock axis (no behavior
change); t_s/duration_s are kept untouched alongside the new fields.

UI: time mode positions every lane (events, series, subagent bars) by active_s
/ active_duration_s, so the work spreads out and idle no longer dominates;
x-axis ticks reflect active time. Each collapsed gap draws a faint dashed break
marker with a thinned, clamped "⋯ idle Nh/Nm" label (theme vars, offline-safe).
Step mode is unchanged.

Tests: integration asserts a synthetic session's idle gap is detected +
collapsed (active axis skips the idle minutes) and a gapless session is
unchanged; a UI regression statically asserts the active-axis wiring + break
markers render.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
anilmurty added a commit that referenced this pull request Jul 2, 2026
feat: session Approach / Map / Timeline (5/5 from #306, stacked on #367)
anilmurty pushed a commit to anshss/tokenjam that referenced this pull request Jul 2, 2026
…o stop silent ingest drops

`run_migrations` keyed purely on the migration version INTEGER: if a version was
recorded-applied under an older or renumbered definition (this repo renumbered
migrations during the main merge — PR Metabuilder-Labs#306), the current SQL for that version
never re-runs and its `ADD COLUMN` silently never lands. `ADD COLUMN IF NOT
EXISTS` only helps while the version is *unrecorded*. A live `telemetry.duckdb`
hit exactly this: migration 7 recorded applied but `spans.request_params` /
`request_tools` absent — so every ingest of a token-bearing record failed with a
DuckDB Binder Error and was dropped, surfacing as a blank/stale Status page with
active sessions showing no data.

Make schema application self-healing rather than trusting the recorded version
set alone:

- `EXPECTED_ADDITIVE_COLUMNS` single-sources the additive columns the code
  depends on; `ensure_expected_columns` re-issues `ADD COLUMN IF NOT EXISTS` for
  the whole set regardless of recorded versions (idempotent no-op when healthy),
  and `run_migrations` calls it on every open so a mismatched DB self-heals and
  logs a WARN naming the columns it reconciled.
- `tj doctor` gains a "Schema integrity" check that WARNs when the live schema is
  missing an expected column and points at the fix; `tj doctor --repair` runs the
  reconcile.

Regression tests simulate the recorded-version/missing-column state (drop
migration 7's columns while leaving version 7 recorded) and assert the DB
self-heals on next open, that a real span carrying `request_params` ingests
instead of being dropped on a Binder Error, and that the doctor check + repair
work through the real CLI.

Closes Metabuilder-Labs#55

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants