feat: per-generation token telemetry#336
Conversation
📝 WalkthroughWalkthroughThe pull request updates token accounting and telemetry tracking in the AI agent system. Changes include modifying cache token calculations to include write tokens, adding per-step generation telemetry with token breakdown tracking, and introducing comprehensive test coverage for token accounting validation and telemetry event emission. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip CodeRabbit can use oxc to improve the quality of JavaScript and TypeScript code reviews.Add a configuration file to your project to customize how CodeRabbit runs oxc. |
- Emit `generation` telemetry event on every LLM step-finish with model_id, provider_id, agent, finish_reason, cost, duration_ms, and token breakdown - Token fields are flat to comply with Azure App Insights custom measurements schema: `tokens_input`, `tokens_output`, and optionally `tokens_reasoning`, `tokens_cache_read`, `tokens_cache_write` - Optional token fields are only included when the provider actually returns them — reasoning only for reasoning models, cache fields only when active - Remove unused `TokensPayload` type and special-case serializer handler - Step duration tracked from `start-step` to `finish-step` events - Update telemetry.md with accurate generation event field description - Update existing tests for flat token field shape Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
5dd52cf to
34d6047
Compare
There was a problem hiding this comment.
Pull request overview
This PR adds per-generation telemetry emission to the session processor and fixes token accounting issues related to multi-step generations and prompt caching (including context window usage when cache writes are involved).
Changes:
- Emit
generationtelemetry events on everyfinish-stepwith per-step token buckets, cost, and duration. - Fix assistant message token tracking to accumulate output/reasoning across multi-step generations.
- Update ACP context window
usedcalculation to includecache.writetokens; add a new Bun test suite covering caching semantics and telemetry payload expectations.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| packages/opencode/test/altimate/token-telemetry.test.ts | Adds a new test suite validating prompt-caching token semantics, generation telemetry payload shape, and context window calculation expectations. |
| packages/opencode/src/session/processor.ts | Accumulates output/reasoning tokens across steps and emits generation telemetry events per step with token buckets + duration. |
| packages/opencode/src/acp/agent.ts | Fixes context window usage (used) to include cache.write tokens. |
| packages/drivers/src/sqlserver.ts | Removes a now-unneeded TypeScript suppression comment on the dynamic mssql import. |
Comments suppressed due to low confidence (1)
packages/opencode/src/session/processor.ts:297
- The new behavior here (emitting a
generationtelemetry event perfinish-step, and accumulatingassistantMessage.tokens.output/reasoningacross steps) isn't exercised by tests that execute thisfinish-stephandler. The added tests callTelemetry.trackdirectly, which validates the event shape but not thatprocessor.tsactually emits it or that the accumulation logic works end-to-end. Consider extending the existingpackages/opencode/test/session/processor.test.ts(which already mirrors processor telemetry paths) to cover this newgenerationevent emission and token accumulation semantics.
type: "step-start",
})
break
case "finish-step":
const usage = Session.getUsage({
model: input.model,
usage: value.usage,
metadata: value.providerMetadata,
})
input.assistantMessage.finish = value.finishReason
input.assistantMessage.cost += usage.cost
input.assistantMessage.tokens = usage.tokens
// altimate_change start — emit per-generation telemetry with token breakdown
// Optional fields are only included when the provider actually returns them.
Telemetry.track({
type: "generation",
timestamp: Date.now(),
session_id: input.sessionID,
message_id: input.assistantMessage.id,
model_id: input.model.id,
provider_id: input.model.providerID,
agent: input.assistantMessage.agent,
finish_reason: value.finishReason ?? "unknown",
cost: usage.cost,
duration_ms: Date.now() - stepStartTime,
tokens_input: usage.tokens.input,
tokens_output: usage.tokens.output,
...(value.usage.reasoningTokens !== undefined && { tokens_reasoning: usage.tokens.reasoning }),
...(value.usage.cachedInputTokens !== undefined && { tokens_cache_read: usage.tokens.cache.read }),
...(usage.tokens.cache.write > 0 && { tokens_cache_write: usage.tokens.cache.write }),
})
// altimate_change end
await Session.updatePart({
id: PartID.ascending(),
reason: value.finishReason,
snapshot: await Snapshot.track(),
messageID: input.assistantMessage.id,
sessionID: input.assistantMessage.sessionID,
type: "step-finish",
tokens: usage.tokens,
cost: usage.cost,
})
await Session.updateMessage(input.assistantMessage)
if (snapshot) {
const patch = await Snapshot.patch(snapshot)
if (patch.files.length) {
await Session.updatePart({
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/opencode/src/session/processor.ts`:
- Around line 258-264: The accumulation of token counts is incorrect because
input.assistantMessage.tokens is spread from usage.tokens so tokens.total stays
the per-step value; update the assignment in the block that modifies
input.assistantMessage.tokens (where usage.tokens is accessed) to compute total
from the accumulated fields (e.g., total = input.assistantMessage.tokens.output
+ input.assistantMessage.tokens.reasoning + any other token categories you
maintain) instead of copying usage.tokens.total, so after each step
assistantMessage.tokens.total reflects the new cumulative output and reasoning
sums.
In `@packages/opencode/test/altimate/token-telemetry.test.ts`:
- Around line 216-388: Tests currently assert behavior by calling
Telemetry.track(), Session.getUsage(), and inlining arithmetic instead of
exercising the real production flows; update the tests to drive the actual code
paths: create a SessionProcessor via SessionProcessor.create and call
SessionProcessor.process (stub LLM.stream to emit the desired steps) so that
Telemetry.track is invoked by the real processor rather than directly, assert
that assistantMessage.tokens is mutated across multi-step messages (verifying
accumulation from the processor/ACP update path), and for context-window
behavior call the ACP usage-update function in acp/agent (or run the processor
path that applies that update) to verify cache.write is included in the computed
used/context window rather than reimplementing the math inline; keep
Telemetry.track spying to capture emitted events from the processor and use the
real Session.getUsage outputs as produced by the processor to validate cost and
token totals.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 0094cc30-a95a-4183-b492-dc2ebb7e95bd
📒 Files selected for processing (3)
packages/opencode/src/acp/agent.tspackages/opencode/src/session/processor.tspackages/opencode/test/altimate/token-telemetry.test.ts
Summary
generationtelemetry events on every LLM step inprocessor.ts— capturinginput,output,reasoning,cache_read,cache_write,cost, andduration_msper step to Azure App Insights. Thegenerationevent type existed in the type union but was never emitted before this PR — zero per-generation data was reaching telemetry.assistantMessage.tokenswas overwritten each step, losing all prior output tokens. Now correctly accumulatesoutputandreasoningwhile keeping last-stepinput/cachevalues.usedinacp/agent.tsto includecache.writetokens. Due toapplyCaching(), the user's question is taggedcache_control:ephemeraland reported ascache_creation_input_tokens— making it invisible to the priorused = input + cache.readformula. This is also an upstream opencode bug.test/altimate/token-telemetry.test.tscovering Anthropic prompt caching semantics, generation event payload, multi-step accumulation, and the context window fix.Verified end-to-end
Ran 2 live sessions with the built binary and queried App Insights —
generationevents confirmed landing with correct 5-bucket token breakdown:Test plan
bun test packages/opencode/test/altimate/token-telemetry.test.tspassesgenerationevents appear in App Insights with non-zerotokens_cache_writetokens_cache_readon step 2 matchestokens_cache_writefrom step 1🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
New Features
Tests