fix: Claude Code session status + lifecycle (1/5 from #306) #367
Merged
Root fix: Claude Code / Codex export OTLP *logs*, so each user prompt becomes a zero-duration invoke_agent span (end_time == start_time) that ingest mistook for a session *completion*, so every live session force-completed on its first prompt (dashboard showed active work as "completed", 0 duration, and tripped drift/alert false-positives). Completion is now gated on real duration (end_time > start_time), and a non-end span re-activates a mistakenly-completed session.

On top of the fix, the Status page gains an honest session model:

- Lifecycle tiers active/idle/stale/closed (SessionRecord.status_at), with a transcript-mtime rescue for live-but-span-quiet sessions.
- Explicit close: POST /api/v1/sessions/close + `tj session-end` (preserves ended_at / "Last seen"; only stamps status).
- Project grouping via OTel service.namespace + a server-side agent->project fallback (groups running sessions without a restart).
- Per-terminal display labels via service.instance.id + manual overrides (right-click rename) and `tj otel-resource-attrs`.
- Split-zones Status view: coding sessions (cards + archived) vs SDK services (per-minute cost/err sparklines), 0-signal zombies filtered from the archive.

Migrations 8-11: service_namespace, service_instance_id, ended_at repair, session_labels.

Carved from Metabuilder-Labs#306 (fix/claude-code-session-status) as the first of five focused PRs.

Co-Authored-By: Claude <noreply@anthropic.com>
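A minimal sketch of the completion gate described in the commit message above. The `Span` and `Session` types and the `apply_span` helper are hypothetical stand-ins for illustration, not tokenjam's actual models or ingest code; treating an `invoke_agent` span with real duration as the "end" span is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical stand-in for an ingested span."""
    name: str
    start_time: float
    end_time: float


@dataclass
class Session:
    """Hypothetical stand-in for a stored session row."""
    status: str = "active"
    last_seen: float = 0.0


def is_completion(span: Span) -> bool:
    # Log-derived spans have end_time == start_time, so they must never
    # count as a completion; only a span with real duration does.
    return span.name == "invoke_agent" and span.end_time > span.start_time


def apply_span(session: Session, span: Span) -> Session:
    # Track the most recent activity regardless of span kind.
    session.last_seen = max(session.last_seen, span.end_time)
    if is_completion(span):
        session.status = "completed"
    elif session.status == "completed":
        # A non-end span arriving after a completion means the session
        # is still live, so re-activate it.
        session.status = "active"
    return session
```

Under these assumptions, a zero-duration prompt span leaves a session active, a span with real duration completes it, and any later non-end span flips it back to active.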
This was referenced Jul 2, 2026
Contributor
Reviewed the full lifecycle fix — no blockers from me. Root diagnosis is right: the logs path emits zero-duration |
anilmurty added a commit to anshss/tokenjam that referenced this pull request on Jul 2, 2026:
…uilder-Labs#368)
# Conflicts:
#   tokenjam/cli/main.py

anilmurty added a commit to anshss/tokenjam that referenced this pull request on Jul 2, 2026:
…uilder-Labs#368/Metabuilder-Labs#369; append migration 12 after 8-11)
# Conflicts:
#   tests/integration/test_db.py
#   tokenjam/core/alerts.py
#   tokenjam/core/db.py

anilmurty added a commit to anshss/tokenjam that referenced this pull request on Jul 2, 2026:
…uilder-Labs#368/Metabuilder-Labs#369/Metabuilder-Labs#370; append migrations 13-15 after 12)
# Conflicts:
#   tests/integration/test_db.py
#   tests/unit/test_backfill.py
#   tokenjam/core/backfill.py
#   tokenjam/core/db.py
#   tokenjam/core/models.py
#   tokenjam/otel/semconv.py
Carved from #306 as PR 1 of 5 — a focused split of the session-status dashboard branch. This one delivers the root bug fix + the honest session lifecycle model it enables.
Root fix

Claude Code / Codex export OTLP logs, not traces, so each `user_prompt` turn produces a zero-duration `invoke_agent` span (`end_time == start_time`). `ingest` mistook that for a session completion, so every live session force-completed on its first prompt (the dashboard showed active work as "completed" with 0 duration, and it tripped drift/alert false positives). Completion is now gated on real duration (`end_time > start_time`); a non-end span re-activates a mistakenly-completed session.

Lifecycle model (built on the fix)

- Lifecycle tiers (active/idle/stale/closed) via `SessionRecord.status_at()` (5 min stale, 4h idle default), with a transcript-mtime rescue for live-but-span-quiet sessions.
- Explicit close: `POST /api/v1/sessions/close` + `tj session-end` (preserves `ended_at` / "Last seen"; only stamps status).
- Project grouping via `service.namespace` + a server-side agent→project fallback (groups already-running sessions without a restart).
- Per-terminal display labels via `service.instance.id` + manual overrides (right-click rename) + `tj otel-resource-attrs`.

Migrations

8 service_namespace · 9 service_instance_id · 10 ended_at repair · 11 session_labels (renumbered contiguously from main's current max of 7).

Notes for the series

- The `capture_session_method` snapshot on close, the `session_token_cost_rollup` (#18) used by the zombie filter, and the `core.transcript` mtime-rescue are all deferred: PR1's close endpoint just closes, and the zombie filter uses stored token/cost. #18 lands in the cost PR; method-capture lands in the Approach/Map/Timeline PR.
- `uv.lock` (introduced on the branch, untracked on `main`) is intentionally not included here.

Tests

Full suite green: 1400 passed, 1 skipped (`pytest tests/unit tests/synthetic tests/agents tests/integration`). `ruff check` clean; `index.html` module validated with `node --check`.

🤖 Generated with Claude Code