observability: per-turn telemetry for harness sessions #26

Merged
OmGuptaIND merged 1 commit into main from OmGuptaIND/loading-stops-debug
May 4, 2026

@OmGuptaIND
Contributor

Why

A "loading then stops" report on a codex turn exposed two diagnostic gaps:

  • The 107 s gap between the last shell tool call and the model interruption looked identical in logs to a real stall — there was no signal that reasoning deltas were flowing.
  • A subsequent turn ran python3 -m http.server over real index.html / styles.css files, but no apply_patch operation showed up anywhere in the log. File writes were invisible.

Both gaps are generic to every harness we run, not codex-specific.

What

Adds packages/agent-core/src/harness/turn-telemetry.ts — a small provider-agnostic helper used by both CodexHarnessSession and HarnessSession (claude-code). It owns:

  • Per-turn counters derived from SessionEvent types: thinking → reasoningChars, text → messageChars, tool_call → toolCalls, artifact (artifactType: 'file') → fileChanges.
  • Idle watchdog (5 s tick, 30 s threshold, debounced): warns with running counters when the event stream goes silent — distinguishes "model is reasoning" from "CLI/network is stuck".
  • turn started / turn completed log lines, with the summary attributing wall-clock time to what the model actually did.
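
The counter mapping in the list above can be sketched as a small reducer. This is a minimal illustration, not the code in turn-telemetry.ts: the event types, `artifactType: 'file'`, and the counter names come from this description, while the payload fields (`text`, `name`, `path`) and the helpers `emptyCounters` and `applyEvent` are assumptions made for the example.

```typescript
// Hypothetical event shapes. Only the `type` values, `artifactType: 'file'`,
// and the counter names come from the PR text; the payload fields are assumed.
type SessionEvent =
  | { type: 'thinking'; text: string }
  | { type: 'text'; text: string }
  | { type: 'tool_call'; name: string }
  | { type: 'artifact'; artifactType: string; path: string };

interface TurnCounters {
  reasoningChars: number;
  messageChars: number;
  toolCalls: number;
  fileChanges: number;
}

function emptyCounters(): TurnCounters {
  return { reasoningChars: 0, messageChars: 0, toolCalls: 0, fileChanges: 0 };
}

// Fold one event into the running counters without mutating the input.
function applyEvent(c: TurnCounters, e: SessionEvent): TurnCounters {
  switch (e.type) {
    case 'thinking':
      return { ...c, reasoningChars: c.reasoningChars + e.text.length };
    case 'text':
      return { ...c, messageChars: c.messageChars + e.text.length };
    case 'tool_call':
      return { ...c, toolCalls: c.toolCalls + 1 };
    case 'artifact':
      // Only file artifacts count as file changes.
      return e.artifactType === 'file'
        ? { ...c, fileChanges: c.fileChanges + 1 }
        : c;
    default:
      return c;
  }
}

// Made-up events for illustration; not taken from a real session log.
const sampleEvents: SessionEvent[] = [
  { type: 'thinking', text: 'plan the page layout' },
  { type: 'tool_call', name: 'shell' },
  { type: 'artifact', artifactType: 'file', path: 'index.html' },
  { type: 'artifact', artifactType: 'image', path: 'logo.png' },
  { type: 'text', text: 'Done.' },
];
const sampleCounters = sampleEvents.reduce(applyEvent, emptyCounters());
```

Because the reducer only looks at the generic event stream, the same function would serve any harness that emits these event types, which is the provider-agnostic property the helper is meant to have.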

Codex keeps its protocol-specific lifecycle logs (codex reasoning started/completed with per-item char counts, codex file_change with paths/kinds) inline since they need the raw item/* shapes the generic SessionEvent stream doesn't expose.

What this changes about debugging

Replaying the original incident, the turn completed log now answers "what did this turn do?" at a glance:

  • Turn 1 (interrupted): reasoningChars: ~5k, messageChars: 1261, fileChanges: 0, toolCalls: 5 — planned, didn't build.
  • Turn 2 (the one that worked): … fileChanges: 4, toolCalls: 7 — built and served the site.
  • Turn 3 (the failed follow-up): messageChars: 542, fileChanges: 0, toolCalls: 0 — talked, did nothing.

Previously, all three turns looked alike in the logs.
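
One way to read these summaries mechanically is a small classifier over the counters. The sketch below is hypothetical and not part of this PR: `TurnSummary`, `TurnVerdict`, and `describeTurn` are illustrative names, and the rule order (file changes first, then tool calls, then text) is an assumption about what matters most when triaging a turn. Turn 2 is left out because its counters are partly elided above.

```typescript
// Hypothetical summary shape mirroring the counters named in this PR.
interface TurnSummary {
  reasoningChars: number;
  messageChars: number;
  toolCalls: number;
  fileChanges: number;
}

type TurnVerdict =
  | 'changed files'
  | 'used tools, no file changes'
  | 'talked, did nothing'
  | 'no output';

// Classify a turn by the strongest kind of activity it shows.
function describeTurn(s: TurnSummary): TurnVerdict {
  if (s.fileChanges > 0) return 'changed files';
  if (s.toolCalls > 0) return 'used tools, no file changes';
  if (s.messageChars > 0 || s.reasoningChars > 0) return 'talked, did nothing';
  return 'no output';
}

// Turn 1: reasoningChars is reported only as "~5k", so 5000 is a stand-in.
const turn1Verdict = describeTurn({
  reasoningChars: 5000,
  messageChars: 1261,
  toolCalls: 5,
  fileChanges: 0,
});

// Turn 3: reasoningChars is not listed above; 0 is assumed for the example.
const turn3Verdict = describeTurn({
  reasoningChars: 0,
  messageChars: 542,
  toolCalls: 0,
  fileChanges: 0,
});
```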

Test plan

  • tsc --noEmit passes (packages/agent-core)
  • biome check passes on changed files
  • Send a multi-turn message in codex harness; confirm turn started / codex reasoning started / codex file_change / turn completed appear with sensible counters
  • Send a message in claude-code harness; confirm turn started / turn completed appear
  • Run a long-thinking turn and confirm turn idle warns ~30 s in with non-zero reasoningChars
  • Cancel a turn mid-stream and confirm only one turn completed line lands (no duplicate from the finally-block safety net)
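
The last two checks (idle warning and single completion line) can also be exercised without a live harness if the timing logic takes the clock as an input. The sketch below is an assumption-laden illustration, not the PR's implementation: `IdleWatchdog` and `TurnCompletion` are hypothetical names, "debounced" is interpreted here as "warn once per silent stretch", and the threshold comparison uses strictly more than 30 s to match the commit message.

```typescript
// Hypothetical watchdog with an injected clock, so no real timers are needed.
class IdleWatchdog {
  private readonly thresholdMs: number;
  private lastEventAt: number;
  private warned = false;

  constructor(thresholdMs: number, startedAt: number) {
    this.thresholdMs = thresholdMs;
    this.lastEventAt = startedAt;
  }

  // Any event resets the idle clock and re-arms the warning.
  onEvent(now: number): void {
    this.lastEventAt = now;
    this.warned = false;
  }

  // Called on every tick; returns the idle time in ms when a warning
  // should be logged, or null otherwise.
  tick(now: number): number | null {
    const idleMs = now - this.lastEventAt;
    if (idleMs <= this.thresholdMs || this.warned) return null;
    this.warned = true; // assumed meaning of "debounced": one warning per stretch
    return idleMs;
  }
}

// Guard that lets the normal path and a finally-block both call complete().
class TurnCompletion {
  private done = false;
  readonly lines: string[] = [];

  complete(reason: string): boolean {
    if (this.done) return false;
    this.done = true;
    this.lines.push(`turn completed (${reason})`); // illustrative log text
    return true;
  }
}

// Simulate 5 s ticks over 60 s of silence after a turn starting at t = 0.
const watchdog = new IdleWatchdog(30000, 0);
const warnings: number[] = [];
for (let t = 5000; t <= 60000; t += 5000) {
  const idle = watchdog.tick(t);
  if (idle !== null) warnings.push(idle);
}

// An event arrives, so the next tick sees only 4 s of idle time.
watchdog.onEvent(61000);
const afterEvent = watchdog.tick(65000);

// Simulate a cancel path followed by the finally-block safety net.
const completion = new TurnCompletion();
const firstCall = completion.complete('cancelled');
const secondCall = completion.complete('finally');
```

In this simulation the watchdog fires once, on the first tick past the threshold, and stays quiet until an event re-arms it; the second `complete` call is a no-op, so only one completion line is recorded.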

🤖 Generated with Claude Code

A "loading then stops" report exposed two gaps in harness logs: long
silent stretches were indistinguishable from real stalls (no signal
during reasoning), and `apply_patch` file writes never appeared in the
log at all. Both made post-mortems on slow/hung turns guesswork.

Adds a TurnTelemetry helper used by both CodexHarnessSession and
HarnessSession (claude-code). Counters (reasoningChars, messageChars,
toolCalls, fileChanges) are derived from the SessionEvent stream so the
same instance fits any harness without leaking provider details. An
idle watchdog warns at >30s of silence with the running counters,
distinguishing "model is reasoning" (counters keep growing) from
"CLI/network is stuck" (counters frozen). A `turn completed` line
attributes wall-clock time to what the model actually did, making
"talked but did nothing" runs visible at a glance.

Codex keeps its protocol-specific lifecycle logs (`codex reasoning
started/completed` with item char counts, `codex file_change` with
paths/kinds) inline since they need raw `item/*` shapes that the
generic SessionEvent stream doesn't expose.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
OmGuptaIND merged commit e80a107 into main on May 4, 2026
OmGuptaIND added a commit that referenced this pull request May 5, 2026
### Fixes
- docx rendering and fileupload flow
- issues

### Other
- Fix universal upload progress (#31)
- Polish markdown table typography
- Structure file artifacts (#29)
- fix(routines): add missing .conv-back styles so back buttons render inline (#27)
- fix(webhooks): unify session-options factory across desktop/telegram/slack (#28)
- observability: add per-turn telemetry to harness sessions (#26)