
feat: per-generation token telemetry#336

Merged
anandgupta42 merged 1 commit into main from worktree-tokens-telemetry on Mar 20, 2026

Conversation

@suryaiyer95 (Contributor) commented Mar 20, 2026

Summary

  • Fire generation telemetry events on every LLM step in processor.ts — capturing input, output, reasoning, cache_read, cache_write, cost, and duration_ms per step to Azure App Insights. The generation event type existed in the type union but was never emitted before this PR — zero per-generation data was reaching telemetry.
  • Fix output token accumulation across multi-step messages: previously assistantMessage.tokens was overwritten each step, losing all prior output tokens. Now correctly accumulates output and reasoning while keeping last-step input/cache values.
  • Fix context window used in acp/agent.ts to include cache.write tokens. Due to applyCaching(), the user's question is tagged cache_control:ephemeral and reported as cache_creation_input_tokens — making it invisible to the prior used = input + cache.read formula. This is also an upstream opencode bug.
  • An 11-test suite in test/altimate/token-telemetry.test.ts covers Anthropic prompt-caching semantics, the generation event payload, multi-step accumulation, and the context-window fix.
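The context-window fix in the third bullet amounts to a one-line formula change. A minimal sketch, assuming an illustrative TokenUsage shape (the real types live in the opencode codebase):

```typescript
// Illustrative per-step usage shape; field names are assumptions, not the real API.
type TokenUsage = {
  input: number
  output: number
  reasoning: number
  cache: { read: number; write: number }
}

// Before the fix: tokens tagged cache_control:ephemeral are reported by Anthropic
// as cache_creation_input_tokens, so they vanish from the "used" sum.
function usedContextOld(t: TokenUsage): number {
  return t.input + t.cache.read
}

// After the fix: cache writes still occupy the context window, so count them.
function usedContextFixed(t: TokenUsage): number {
  return t.input + t.cache.read + t.cache.write
}

// Step 1 from the verification table below: input=2, cache_write=34,600.
const step1: TokenUsage = { input: 2, output: 6, reasoning: 0, cache: { read: 0, write: 34600 } }
console.log(usedContextOld(step1)) // 2 (the cached prompt is invisible)
console.log(usedContextFixed(step1)) // 34602
```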

Verified end-to-end

Ran 2 live sessions with the built binary and queried App Insights — generation events confirmed landing with correct 5-bucket token breakdown:

| step | finish_reason | input | output | cache_read | cache_write |
| --- | --- | --- | --- | --- | --- |
| 1 | stop | 2 | 6 | 0 | 34,600 |
| 1 | tool-calls | 2 | 84 | 0 | 34,601 |
| 2 | stop | 1 | 471 | 34,601 | 273 |

Test plan

  • bun test packages/opencode/test/altimate/token-telemetry.test.ts passes
  • Run a session and verify generation events appear in App Insights with non-zero tokens_cache_write
  • Multi-turn session: confirm tokens_cache_read on step 2 matches tokens_cache_write from step 1
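The second and third test-plan items exercise the multi-step accumulation fix described above. A hedged sketch of that accumulation rule, with the Tokens shape assumed for illustration:

```typescript
// Illustrative sketch of the accumulation fix: output/reasoning tokens sum
// across steps while input and cache buckets keep the last step's values.
type Tokens = {
  input: number
  output: number
  reasoning: number
  cache: { read: number; write: number }
}

function accumulate(prev: Tokens | undefined, step: Tokens): Tokens {
  return {
    ...step, // last step wins for input and cache buckets
    output: (prev?.output ?? 0) + step.output,
    reasoning: (prev?.reasoning ?? 0) + step.reasoning,
  }
}

// Steps 1 (tool-calls) and 2 from the verified session above.
const s1: Tokens = { input: 2, output: 84, reasoning: 0, cache: { read: 0, write: 34601 } }
const s2: Tokens = { input: 1, output: 471, reasoning: 0, cache: { read: 34601, write: 273 } }
const total = accumulate(accumulate(undefined, s1), s2)
console.log(total.output) // 555, no longer overwritten by the last step
console.log(total.cache.read) // 34601, step 2's read matches step 1's write
```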

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Corrected token usage accounting to include cache write tokens in usage calculations.
  • New Features

    • Added per-step generation telemetry tracking with detailed token breakdown, including cache metrics and step duration.
  • Tests

    • Added comprehensive test coverage for token accounting, prompt caching behavior, tiered pricing, and generation event telemetry.

Copilot AI review requested due to automatic review settings March 20, 2026 20:01

@claude claude bot left a comment


Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review.

@coderabbitai

coderabbitai bot commented Mar 20, 2026

📝 Walkthrough

Walkthrough

The pull request updates token accounting and telemetry tracking in the AI agent system. Changes include modifying cache token calculations to include write tokens, adding per-step generation telemetry with token breakdown tracking, and introducing comprehensive test coverage for token accounting validation and telemetry event emission.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Token Accounting Updates**<br>packages/opencode/src/acp/agent.ts, packages/opencode/src/session/processor.ts | Modified the `used` token calculation in agent context accounting to include cache write tokens. Enhanced the processor with per-step telemetry tracking: captures step start time, computes duration, and emits generation events with token breakdown (input, output, reasoning, cache_read, cache_write) and cost metrics. |
| **Token Telemetry Tests**<br>packages/opencode/test/altimate/token-telemetry.test.ts | New comprehensive test suite validating token accounting behavior with mocked Anthropic and non-Anthropic models, prompt-caching scenarios, cache read/write pricing, tiered pricing thresholds, NaN/Infinity guards, telemetry event payload structure, multi-step message token accumulation, and context-window usage calculations including cache write tokens. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Cache writes now counted with care,
Token flow tracked everywhere,
Step by step, telemetry bright,
Generation events shining light,
Tests ensure math is right! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description check | ✅ Passed | The description covers all required sections: the summary explains the changes and rationale, the test plan details verification steps, and checklist items are addressed with testing confirmations. |
| Title check | ✅ Passed | The PR title "feat: per-generation token telemetry" accurately reflects the main feature addition of emitting generation telemetry events on every LLM step. |




- Emit `generation` telemetry event on every LLM step-finish with model_id,
  provider_id, agent, finish_reason, cost, duration_ms, and token breakdown
- Token fields are flat to comply with Azure App Insights custom measurements
  schema: `tokens_input`, `tokens_output`, and optionally `tokens_reasoning`,
  `tokens_cache_read`, `tokens_cache_write`
- Optional token fields are only included when the provider actually returns
  them — reasoning only for reasoning models, cache fields only when active
- Remove unused `TokensPayload` type and special-case serializer handler
- Step duration tracked from `start-step` to `finish-step` events
- Update telemetry.md with accurate generation event field description
- Update existing tests for flat token field shape

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
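A flat payload satisfying the schema constraint described in the commit message might look like the following; all field values here are illustrative, not taken from a real session:

```typescript
// Azure App Insights custom measurements must be flat numeric fields,
// hence tokens_* keys rather than a nested tokens object. Values below
// are made up for illustration.
const event = {
  type: "generation",
  timestamp: Date.now(),
  model_id: "example-model",
  provider_id: "example-provider",
  finish_reason: "stop",
  cost: 0.0123,
  duration_ms: 1840,
  tokens_input: 2,
  tokens_output: 6,
  // Optional fields appear only when the provider returns them; this
  // hypothetical step wrote to the cache but did no reasoning or cache reads:
  tokens_cache_write: 34600,
}
console.log(Object.keys(event).filter((k) => k.startsWith("tokens_")))
```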
@suryaiyer95 suryaiyer95 force-pushed the worktree-tokens-telemetry branch from 5dd52cf to 34d6047 Compare March 20, 2026 20:07
Copilot AI left a comment

Pull request overview

This PR adds per-generation telemetry emission to the session processor and fixes token accounting issues related to multi-step generations and prompt caching (including context window usage when cache writes are involved).

Changes:

  • Emit generation telemetry events on every finish-step with per-step token buckets, cost, and duration.
  • Fix assistant message token tracking to accumulate output/reasoning across multi-step generations.
  • Update ACP context window used calculation to include cache.write tokens; add a new Bun test suite covering caching semantics and telemetry payload expectations.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| packages/opencode/test/altimate/token-telemetry.test.ts | Adds a new test suite validating prompt-caching token semantics, generation telemetry payload shape, and context-window calculation expectations. |
| packages/opencode/src/session/processor.ts | Accumulates output/reasoning tokens across steps and emits generation telemetry events per step with token buckets and duration. |
| packages/opencode/src/acp/agent.ts | Fixes context window usage (`used`) to include cache.write tokens. |
| packages/drivers/src/sqlserver.ts | Removes a now-unneeded TypeScript suppression comment on the dynamic mssql import. |
Comments suppressed due to low confidence (1)

packages/opencode/src/session/processor.ts:297

  • The new behavior here (emitting a generation telemetry event per finish-step, and accumulating assistantMessage.tokens.output/reasoning across steps) isn't exercised by tests that execute this finish-step handler. The added tests call Telemetry.track directly, which validates the event shape but not that processor.ts actually emits it or that the accumulation logic works end-to-end. Consider extending the existing packages/opencode/test/session/processor.test.ts (which already mirrors processor telemetry paths) to cover this new generation event emission and token accumulation semantics.
```ts
                    type: "step-start",
                  })
                  break

                case "finish-step":
                  const usage = Session.getUsage({
                    model: input.model,
                    usage: value.usage,
                    metadata: value.providerMetadata,
                  })
                  input.assistantMessage.finish = value.finishReason
                  input.assistantMessage.cost += usage.cost
                  input.assistantMessage.tokens = usage.tokens
                  // altimate_change start — emit per-generation telemetry with token breakdown
                  // Optional fields are only included when the provider actually returns them.
                  Telemetry.track({
                    type: "generation",
                    timestamp: Date.now(),
                    session_id: input.sessionID,
                    message_id: input.assistantMessage.id,
                    model_id: input.model.id,
                    provider_id: input.model.providerID,
                    agent: input.assistantMessage.agent,
                    finish_reason: value.finishReason ?? "unknown",
                    cost: usage.cost,
                    duration_ms: Date.now() - stepStartTime,
                    tokens_input: usage.tokens.input,
                    tokens_output: usage.tokens.output,
                    ...(value.usage.reasoningTokens !== undefined && { tokens_reasoning: usage.tokens.reasoning }),
                    ...(value.usage.cachedInputTokens !== undefined && { tokens_cache_read: usage.tokens.cache.read }),
                    ...(usage.tokens.cache.write > 0 && { tokens_cache_write: usage.tokens.cache.write }),
                  })
                  // altimate_change end
                  await Session.updatePart({
                    id: PartID.ascending(),
                    reason: value.finishReason,
                    snapshot: await Snapshot.track(),
                    messageID: input.assistantMessage.id,
                    sessionID: input.assistantMessage.sessionID,
                    type: "step-finish",
                    tokens: usage.tokens,
                    cost: usage.cost,
                  })
                  await Session.updateMessage(input.assistantMessage)
                  if (snapshot) {
                    const patch = await Snapshot.patch(snapshot)
                    if (patch.files.length) {
                      await Session.updatePart({
```
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/opencode/src/session/processor.ts`:
- Around line 258-264: The accumulation of token counts is incorrect because
input.assistantMessage.tokens is spread from usage.tokens so tokens.total stays
the per-step value; update the assignment in the block that modifies
input.assistantMessage.tokens (where usage.tokens is accessed) to compute total
from the accumulated fields (e.g., total = input.assistantMessage.tokens.output
+ input.assistantMessage.tokens.reasoning + any other token categories you
maintain) instead of copying usage.tokens.total, so after each step
assistantMessage.tokens.total reflects the new cumulative output and reasoning
sums.
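The recomputation this comment asks for could be sketched as follows, with the Tokens shape assumed for illustration (the real type and merge site live in processor.ts):

```typescript
// Illustrative Tokens shape; names are assumptions for this sketch.
type Tokens = {
  input: number
  output: number
  reasoning: number
  total: number
  cache: { read: number; write: number }
}

// Accumulate output/reasoning across steps, keep last-step input/cache,
// and recompute total from the accumulated buckets rather than copying
// the per-step usage.tokens.total.
function mergeStepTokens(prev: Tokens, step: Tokens): Tokens {
  const output = prev.output + step.output
  const reasoning = prev.reasoning + step.reasoning
  return {
    ...step,
    output,
    reasoning,
    total: step.input + output + reasoning + step.cache.read + step.cache.write,
  }
}
```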

In `@packages/opencode/test/altimate/token-telemetry.test.ts`:
- Around line 216-388: Tests currently assert behavior by calling
Telemetry.track(), Session.getUsage(), and inlining arithmetic instead of
exercising the real production flows; update the tests to drive the actual code
paths: create a SessionProcessor via SessionProcessor.create and call
SessionProcessor.process (stub LLM.stream to emit the desired steps) so that
Telemetry.track is invoked by the real processor rather than directly, assert
that assistantMessage.tokens is mutated across multi-step messages (verifying
accumulation from the processor/ACP update path), and for context-window
behavior call the ACP usage-update function in acp/agent (or run the processor
path that applies that update) to verify cache.write is included in the computed
used/context window rather than reimplementing the math inline; keep
Telemetry.track spying to capture emitted events from the processor and use the
real Session.getUsage outputs as produced by the processor to validate cost and
token totals.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 0094cc30-a95a-4183-b492-dc2ebb7e95bd

📥 Commits

Reviewing files that changed from the base of the PR and between df24e73 and 5dd52cf.

📒 Files selected for processing (3)
  • packages/opencode/src/acp/agent.ts
  • packages/opencode/src/session/processor.ts
  • packages/opencode/test/altimate/token-telemetry.test.ts

@suryaiyer95 suryaiyer95 changed the title feat: per-generation token telemetry + fix token tracking bugs feat: per-generation token telemetry Mar 20, 2026
@anandgupta42 anandgupta42 merged commit bd56988 into main Mar 20, 2026
13 checks passed

3 participants