observability: per-turn telemetry for harness sessions #26
Merged
A "loading then stops" report exposed two gaps in harness logs: long silent stretches were indistinguishable from real stalls (no signal during reasoning), and `apply_patch` file writes never appeared in the log at all. Both made post-mortems on slow/hung turns guesswork.

Adds a TurnTelemetry helper used by both CodexHarnessSession and HarnessSession (claude-code). Counters (reasoningChars, messageChars, toolCalls, fileChanges) are derived from the SessionEvent stream so the same instance fits any harness without leaking provider details.

An idle watchdog warns at >30s of silence with the running counters, distinguishing "model is reasoning" (counters keep growing) from "CLI/network is stuck" (counters frozen). A `turn completed` line attributes wall-clock time to what the model actually did, making "talked but did nothing" runs visible at a glance.

Codex keeps its protocol-specific lifecycle logs (`codex reasoning started/completed` with item char counts, `codex file_change` with paths/kinds) inline since they need raw `item/*` shapes that the generic SessionEvent stream doesn't expose.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
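The counter derivation and idle check described in this commit message can be sketched roughly as follows. This is not the code in `turn-telemetry.ts`: the `SessionEvent` shapes, the `TurnTelemetrySketch` class, its method names, and the injected clock are all assumptions made for illustration. The sketch also assumes "silence" is measured from the last emitted log line rather than the last event, which is why counters can keep growing while the watchdog fires.

```typescript
// Hypothetical event shapes; the real SessionEvent type may carry other fields.
type SessionEvent =
  | { type: "thinking"; text: string }
  | { type: "text"; text: string }
  | { type: "tool_call"; name: string }
  | { type: "artifact"; artifactType: string };

interface TurnCounters {
  reasoningChars: number;
  messageChars: number;
  toolCalls: number;
  fileChanges: number;
}

interface IdleWarning {
  silentMs: number;
  counters: TurnCounters;
  // true: counters grew since the last warning ("model is reasoning").
  // false: counters are frozen ("CLI/network is stuck").
  countersMoved: boolean;
}

const zeroCounters = (): TurnCounters => ({
  reasoningChars: 0,
  messageChars: 0,
  toolCalls: 0,
  fileChanges: 0,
});

class TurnTelemetrySketch {
  private counters: TurnCounters = zeroCounters();
  private lastWarned: TurnCounters = zeroCounters();
  private lastLogAt: number;
  private now: () => number;
  private idleMs: number;

  // The clock is injectable so the idle logic can be exercised without real timers.
  constructor(now: () => number = Date.now, idleMs: number = 30000) {
    this.now = now;
    this.idleMs = idleMs;
    this.lastLogAt = now();
  }

  // Fold one generic event into the counters; no provider-specific fields are read.
  record(event: SessionEvent): void {
    switch (event.type) {
      case "thinking":
        this.counters.reasoningChars += event.text.length;
        break;
      case "text":
        this.counters.messageChars += event.text.length;
        break;
      case "tool_call":
        this.counters.toolCalls += 1;
        break;
      case "artifact":
        if (event.artifactType === "file") this.counters.fileChanges += 1;
        break;
    }
  }

  // Call periodically. Returns a warning once more than idleMs has passed
  // since the last log line, otherwise null.
  checkIdle(): IdleWarning | null {
    const silentMs = this.now() - this.lastLogAt;
    if (silentMs <= this.idleMs) return null;
    const counters = { ...this.counters };
    const keys = Object.keys(counters) as (keyof TurnCounters)[];
    const countersMoved = keys.some((k) => counters[k] !== this.lastWarned[k]);
    this.lastWarned = counters;
    this.lastLogAt = this.now();
    return { silentMs, counters, countersMoved };
  }

  snapshot(): TurnCounters {
    return { ...this.counters };
  }
}
```

In this sketch a harness would call `record` for every event it forwards and `checkIdle` from a periodic timer, logging the returned warning if there is one. Two consecutive warnings with `countersMoved: false` would correspond to the "counters frozen" case.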
OmGuptaIND added a commit that referenced this pull request on May 5, 2026
### Fixes
- docx rendering and fileupload flow - issues

### Other
- Fix universal upload progress (#31)
- Polish markdown table typography
- Structure file artifacts (#29)
- fix(routines): add missing .conv-back styles so back buttons render inline (#27)
- fix(webhooks): unify session-options factory across desktop/telegram/slack (#28)
- observability: add per-turn telemetry to harness sessions (#26)
Why
A "loading then stops" report on a codex turn exposed two diagnostic gaps:
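- Long silent stretches were indistinguishable from real stalls: the logs carried no signal while the model was reasoning.
- The turn served `python3 -m http.server` over real `index.html`/`styles.css` files, but no `apply_patch` operation showed up anywhere in the log. File writes were invisible.

Both gaps are generic to every harness we run, not codex-specific.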
What
Adds `packages/agent-core/src/harness/turn-telemetry.ts`, a small provider-agnostic helper used by both `CodexHarnessSession` and `HarnessSession` (claude-code). It owns:
- Counters derived from `SessionEvent` types: `thinking → reasoningChars`, `text → messageChars`, `tool_call → toolCalls`, `artifact (artifactType: 'file') → fileChanges`.
- An idle watchdog that warns after more than 30 s of silence and includes the running counters.
- `turn started` / `turn completed` log lines, with the summary attributing wall-clock time to what the model actually did.

Codex keeps its protocol-specific lifecycle logs (`codex reasoning started/completed` with per-item char counts, `codex file_change` with paths/kinds) inline since they need the raw `item/*` shapes the generic SessionEvent stream doesn't expose.

What this changes about debugging
Replaying the original incident, the `turn completed` log now answers "what did this turn do?" at a glance:
- `reasoningChars: ~5k, messageChars: 1261, fileChanges: 0, toolCalls: 5`: planned, didn't build.
- `… fileChanges: 4, toolCalls: 7`: built and served the site.
- `messageChars: 542, fileChanges: 0, toolCalls: 0`: talked, did nothing.

Previously all three looked similar in logs.
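The three readings above can be expressed as a small classifier over the summary counters. This is an illustrative sketch only: `classifyTurn`, the `TurnShape` labels, and the precedence of the checks are assumptions, not part of this PR, and fields the examples above elide are filled with placeholder zeros.

```typescript
interface TurnSummary {
  reasoningChars: number;
  messageChars: number;
  toolCalls: number;
  fileChanges: number;
}

// Hypothetical labels mirroring the three cases described in the PR text.
type TurnShape = "built" | "planned-only" | "talked-only" | "empty";

function classifyTurn(s: TurnSummary): TurnShape {
  // Any file change means the turn actually produced something on disk.
  if (s.fileChanges > 0) return "built";
  // Tool calls without file changes: the model explored or planned but wrote nothing.
  if (s.toolCalls > 0) return "planned-only";
  // Only reasoning or message text: the model talked but took no action.
  if (s.messageChars > 0 || s.reasoningChars > 0) return "talked-only";
  return "empty";
}

// Counter values taken from the examples above; "~5k" is approximated as 5000.
const plannedOnly = classifyTurn({
  reasoningChars: 5000,
  messageChars: 1261,
  fileChanges: 0,
  toolCalls: 5,
});

// reasoningChars/messageChars are elided in the PR; zeros here are placeholders.
const built = classifyTurn({
  reasoningChars: 0,
  messageChars: 0,
  fileChanges: 4,
  toolCalls: 7,
});

// reasoningChars is not listed for this case; zero is a placeholder.
const talkedOnly = classifyTurn({
  reasoningChars: 0,
  messageChars: 542,
  fileChanges: 0,
  toolCalls: 0,
});

console.log(plannedOnly, built, talkedOnly);
```

A classifier like this is optional. The point of the `turn completed` line is that a human can make the same call from the raw counters.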
Test plan
- `tsc --noEmit` passes (`packages/agent-core`)
- `biome check` passes on changed files
- `turn started` / `codex reasoning started` / `codex file_change` / `turn completed` appear with sensible counters
- `turn started` / `turn completed` appear
- `turn idle` warns ~30 s in with non-zero `reasoningChars`
- `turn completed` line lands (no duplicate from the finally-block safety net)

🤖 Generated with Claude Code