fix: Claude Code session status + lifecycle (1/5 from #306)#367

Merged
anilmurty merged 2 commits into
Metabuilder-Labs:mainfrom
anshss:pr1/session-lifecycle
Jul 2, 2026

@anshss anshss commented Jul 2, 2026

Contributor

Carved from #306 as PR 1 of 5 — a focused split of the session-status dashboard branch. This one delivers the root bug fix + the honest session lifecycle model it enables.

Root fix

Claude Code / Codex export OTLP logs, not traces — so each user_prompt turn produces a zero-duration invoke_agent span (end_time == start_time). ingest mistook that for a session completion, so every live session force-completed on its first prompt (dashboard showed active work as "completed" with 0 duration, and it tripped drift/alert false-positives). Completion is now gated on real duration (end_time > start_time); a non-end span re-activates a mistakenly-completed session.
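A minimal sketch of the gating rule described above, not the actual ingest code. The `Span` and `Session` classes and the `apply_span` helper are hypothetical names introduced for illustration; only the `end_time > start_time` check and the re-activation behaviour come from the description.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Span:
    # Hypothetical span shape; only name/start_time/end_time matter here.
    name: str
    start_time: datetime
    end_time: datetime


@dataclass
class Session:
    # Hypothetical session shape with a single status field.
    status: str = "active"


def is_end_span(span: Span) -> bool:
    # A zero-duration invoke_agent span (end_time == start_time) is a
    # log-derived per-prompt marker, not evidence that the session finished.
    return span.name == "invoke_agent" and span.end_time > span.start_time


def apply_span(session: Session, span: Span) -> None:
    if is_end_span(span):
        session.status = "completed"
    elif session.status == "completed":
        # A non-end span re-activates a mistakenly-completed session.
        session.status = "active"
```

Under this sketch, the first `user_prompt` turn of a logs-only session leaves the status at `active` instead of flipping it to `completed`.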

Lifecycle model (built on the fix)

  • Tiers active / idle / stale / closed via SessionRecord.status_at() (5 min stale, 4h idle default), with a transcript-mtime rescue for live-but-span-quiet sessions.
  • Explicit close: POST /api/v1/sessions/close + tj session-end (preserves ended_at / "Last seen"; only stamps status).
  • Project grouping — OTel service.namespace + a server-side agent→project fallback (groups already-running sessions without a restart).
  • Display labels: service.instance.id + manual overrides (right-click rename) + tj otel-resource-attrs.
  • Split-zones Status — coding sessions (cards + archived) vs SDK services (per-minute cost/err sparklines); 0-signal "zombie" sessions filtered from the archive.
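One way the tier classification could look, as a rough sketch only. `SessionSketch` and its fields are hypothetical, and the ordering of the thresholds is an assumption: it reads "5 min stale, 4h idle" literally (stale after 5 minutes of quiet, idle after 4 hours), which may not match how `SessionRecord.status_at()` actually orders the tiers.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Defaults taken from the description; their ordering is an assumption.
STALE_AFTER = timedelta(minutes=5)
IDLE_AFTER = timedelta(hours=4)


@dataclass
class SessionSketch:
    # Hypothetical stand-in for SessionRecord.
    last_seen: datetime
    closed: bool = False
    transcript_mtime: Optional[datetime] = None

    def status_at(self, now: datetime) -> str:
        # An explicit close wins over any time-based tier.
        if self.closed:
            return "closed"
        # Transcript-mtime rescue: a recently written transcript counts as
        # activity even when no spans have arrived.
        latest = self.last_seen
        if self.transcript_mtime is not None and self.transcript_mtime > latest:
            latest = self.transcript_mtime
        quiet = now - latest
        if quiet < STALE_AFTER:
            return "active"
        if quiet < IDLE_AFTER:
            return "stale"
        return "idle"
```

Passing `now` explicitly keeps the classification a pure function of stored timestamps, so the dashboard can compute a tier at render time without writing anything back.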

Migrations

8 service_namespace · 9 service_instance_id · 10 ended_at repair · 11 session_labels (renumbered contiguous from main's current max of 7).
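For illustration, an idempotent column-adding migration could be sketched as below. This is not the project's migration runner: it uses the standard-library `sqlite3` module purely as a stand-in, and the `sessions` table name, the `TEXT` column types, and the `schema_version` table are all assumptions. Only the version numbers and the names `service_namespace` / `service_instance_id` come from the list above.

```python
import sqlite3

# Hypothetical registry: version -> (table, column, type).
MIGRATIONS = {
    8: ("sessions", "service_namespace", "TEXT"),
    9: ("sessions", "service_instance_id", "TEXT"),
}


def has_column(conn: sqlite3.Connection, table: str, column: str) -> bool:
    # PRAGMA table_info returns one row per column; index 1 is the name.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


def migrate(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_version")}
    for version in sorted(MIGRATIONS):
        table, column, col_type = MIGRATIONS[version]
        # Guard the ALTER so that re-running the migration is a no-op.
        if not has_column(conn, table, column):
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {col_type}")
        if version not in applied:
            conn.execute(
                "INSERT INTO schema_version (version) VALUES (?)", (version,)
            )
    conn.commit()
```

Calling `migrate(conn)` twice on the same database leaves the schema and the recorded versions unchanged after the first run.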

Notes for the series

  • Decoupled from later PRs: the capture_session_method snapshot on close, the session_token_cost_rollup (#18) used by the zombie filter, and the core.transcript mtime-rescue are all deferred. PR1's close endpoint just closes, and the zombie filter uses stored token/cost. #18 lands in the cost PR; method-capture in the Approach/Map/Timeline PR.
  • uv.lock (introduced on the branch, untracked on main) is intentionally not included here.

Tests

Full suite green: 1400 passed, 1 skipped (pytest tests/unit tests/synthetic tests/agents tests/integration). ruff check clean; index.html module validated with node --check.

🤖 Generated with Claude Code

Root fix: Claude Code / Codex export OTLP *logs*, so each user prompt
becomes a zero-duration invoke_agent span (end_time == start_time) that
ingest mistook for a session *completion* — every live session
force-completed on its first prompt (dashboard showed active work as
"completed", 0 duration, and tripped drift/alert false-positives).
Completion is now gated on real duration (end_time > start_time), and a
non-end span re-activates a mistakenly-completed session.

On top of the fix, the Status page gains an honest session model:
- Lifecycle tiers active/idle/stale/closed (SessionRecord.status_at),
  with a transcript-mtime rescue for live-but-span-quiet sessions.
- Explicit close: POST /api/v1/sessions/close + `tj session-end`
  (preserves ended_at / "Last seen"; only stamps status).
- Project grouping via OTel service.namespace + a server-side
  agent->project fallback (groups running sessions without a restart).
- Per-terminal display labels via service.instance.id + manual
  overrides (right-click rename) and `tj otel-resource-attrs`.
- Split-zones Status view: coding sessions (cards + archived) vs
  SDK services (per-minute cost/err sparklines), 0-signal zombies
  filtered from the archive.

Migrations 8-11: service_namespace, service_instance_id, ended_at
repair, session_labels.

Carved from Metabuilder-Labs#306 (fix/claude-code-session-status) as the first of five
focused PRs.

Co-Authored-By: Claude <noreply@anthropic.com>
@anilmurty

Contributor

Reviewed the full lifecycle fix — no blockers from me. Root diagnosis is right: the logs path emits zero-duration invoke_agent markers every turn, and gating completion on end_time > start_time plus re-activating on later activity is the correct fix. Ran the touched suites locally, green. Rules clean — SQL is $N-bound, the new /sessions/close write endpoint is in PROTECTED_PATHS, migrations 8–11 idempotent. Three tiny non-blockers: status_with_transcript_mtime ships unused (fine as scaffolding), _serialise writes a [sessions] block on every config save, and the shell wrapper double-fires session-end on interrupt (idempotent, harmless). This is the base of the stack, so it merges first.
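The "idempotent, harmless" double-fire can be pictured with a small sketch. `SessionRow` and `close_session` are hypothetical names, not the handler behind `/api/v1/sessions/close`; the sketch only assumes what the PR description states, namely that closing stamps the status and leaves `ended_at` alone.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class SessionRow:
    # Hypothetical row shape for illustration.
    session_id: str
    status: str
    ended_at: Optional[datetime]


def close_session(row: SessionRow) -> bool:
    """Stamp status only; return True if this call changed anything."""
    if row.status == "closed":
        # A second session-end (e.g. fired again on interrupt) is a no-op.
        return False
    row.status = "closed"
    # ended_at is deliberately untouched so "Last seen" stays accurate.
    return True
```

Because the second call finds the row already closed and changes nothing, a shell wrapper that fires `session-end` twice produces the same final state as one that fires it once.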

@anilmurty anilmurty left a comment

Contributor

Approving per the review above — root lifecycle fix is sound, rules clean, suite green. Merging as the base of the stack (#371 follows).

@anilmurty anilmurty merged commit 65752a6 into Metabuilder-Labs:main Jul 2, 2026
4 checks passed
anilmurty added a commit to anshss/tokenjam that referenced this pull request Jul 2, 2026
anilmurty added a commit to anshss/tokenjam that referenced this pull request Jul 2, 2026
…uilder-Labs#368/Metabuilder-Labs#369; append migration 12 after 8-11)

# Conflicts:
#	tests/integration/test_db.py
#	tokenjam/core/alerts.py
#	tokenjam/core/db.py
anilmurty added a commit to anshss/tokenjam that referenced this pull request Jul 2, 2026
…uilder-Labs#368/Metabuilder-Labs#369/Metabuilder-Labs#370; append migrations 13-15 after 12)

# Conflicts:
#	tests/integration/test_db.py
#	tests/unit/test_backfill.py
#	tokenjam/core/backfill.py
#	tokenjam/core/db.py
#	tokenjam/core/models.py
#	tokenjam/otel/semconv.py
anilmurty added a commit that referenced this pull request Jul 2, 2026
feat: session Approach / Map / Timeline (5/5 from #306, stacked on #367)