fix(zen): stop double-counting reasoning_tokens in oa-compat usage #24441
Open
tiffanychum wants to merge 1 commit into anomalyco:dev from
Conversation
…nomalyco#24268)

The OpenAI chat-completions usage spec says `completion_tokens` already includes `completion_tokens_details.reasoning_tokens`. Zen's downstream `calculateCost` bills `outputCost + reasoningCost` separately, so when the oa-compat normalizer reported `outputTokens = completion_tokens` and `reasoningTokens = reasoning_tokens`, reasoning was billed twice.

Mirror the OpenAI Responses helper (openai.ts) and subtract reasoning from completion before returning. Clamp at 0 because some providers (e.g. Moonshot Kimi K2.6) report `reasoning_tokens > completion_tokens`.

Adds unit tests for the reporter's exact payloads:
- Kimi K2.6 "Hi": prompt 22 / completion 77 / reasoning 78 -> output 0
- Real session: prompt N / completion 1226 / reasoning 790 -> output 436
- No-reasoning case: outputTokens unchanged
- Parity with `openaiHelper.normalizeUsage` for the same logical usage
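To make the billing interaction in the commit message concrete, here is a minimal sketch of the failure mode; `billedUnits` and the usage shape are illustrative stand-ins for Zen's actual `calculateCost`, not its real signature:

```ts
// Illustrative stand-in for the downstream billing path: calculateCost is
// described as charging outputCost + reasoningCost separately at the same
// per-token output rate, so the billed token count reduces to the sum.
type NormalizedUsage = { outputTokens: number; reasoningTokens: number };

function billedUnits(u: NormalizedUsage): number {
  return u.outputTokens + u.reasoningTokens;
}

// Kimi K2.6 "Hi" case from the report: upstream charged completion_tokens = 77.
const before = billedUnits({ outputTokens: 77, reasoningTokens: 78 }); // 155: reasoning billed twice
const after = billedUnits({ outputTokens: 0, reasoningTokens: 77 }); // 77: matches upstream
console.log(before, after);
```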
Contributor
Thanks for updating your PR! It now meets our contributing guidelines. 👍
Issue for this PR
Closes #24268
Re-opens #24367 (auto-closed by `needs:compliance` for missing template sections); also addresses follow-up review feedback from the reporter.

Type of change
What does this PR do?
Stops Zen from double-counting reasoning tokens for OpenAI-compatible providers (Moonshot, Kimi, etc.) so the user-facing usage panel and cost calculation match what the upstream API actually billed.
Per the OpenAI chat-completions usage spec, `completion_tokens` already includes `completion_tokens_details.reasoning_tokens`. Zen's `oaCompatHelper.normalizeUsage` (in `packages/console/app/src/routes/zen/util/provider/openai-compatible.ts`) was reporting `outputTokens = completion_tokens` and `reasoningTokens = reasoning_tokens` without subtracting, so when downstream `calculateCost` bills `outputCost + reasoningCost` separately at the same `cost.output` rate, reasoning was billed twice.

The reporter's evidence:
- `prompt_tokens=22, completion_tokens=77, completion_tokens_details.reasoning_tokens=78` for a single "Hi" (shown in full below)
- `output: 436, reasoning: 790` for a real session (correctly excludes reasoning from output)
- `output: 1226, reasoning: 790, total: 2016` — i.e. `completion_tokens (1226) + reasoning_tokens (790)`, double-counting
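On the wire, that first item is a chat-completions `usage` block like the following (field names per the OpenAI spec; values taken from the report):

```ts
// Kimi K2.6 "Hi" usage payload as reported. Note reasoning_tokens (78)
// exceeds completion_tokens (77): the provider quirk the clamp handles.
const kimiHiUsage = {
  prompt_tokens: 22,
  completion_tokens: 77, // per the spec, already includes reasoning
  completion_tokens_details: {
    reasoning_tokens: 78,
  },
};
```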
The fix mirrors `openaiHelper.normalizeUsage` from the OpenAI Responses helper (openai.ts) and subtracts `reasoning_tokens` from `completion_tokens` before returning. It then enforces the invariant `outputTokens + reasoningTokens === completion_tokens`, which is what the upstream API actually charges against. To make that hold even under the OA-compat provider quirk where `reasoning_tokens > completion_tokens` (e.g. Moonshot Kimi K2.6 returning `reasoning=78, completion=77`), the PR clamps `reasoningTokens` down to `completion_tokens`. So for the reporter's "Hi" case:

- Old code: `outputTokens=77, reasoningTokens=78` → bills 155 (double-counts; also exceeds the 77 the upstream API charged)
- Floor output only: `outputTokens=0, reasoningTokens=78` → bills 78 (still 1 unit over what upstream charged)
- This PR: `outputTokens=0, reasoningTokens=77` → bills 77 (matches upstream exactly)

Clamping reasoning (not just flooring output at 0) is the refinement raised by the reporter @ceshine in review of the previous attempt at this fix in #24367, and it keeps the invariant the upstream API charges against. Once reasoning is clamped, the `Math.max(0, …)` on output is no longer needed, since `reasoning <= completion` guarantees `output >= 0`.

I deliberately did not also touch `openaiHelper.normalizeUsage` (which already subtracts but does not clamp), since the OpenAI Responses API doesn't exhibit the `reasoning > completion` quirk and changing it is out of scope.
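A minimal sketch of the resulting normalization logic, reconstructed from the description above rather than copied from the diff; the type names and field plumbing are assumptions, while the arithmetic (clamp reasoning to completion, then subtract) is what the PR states:

```ts
// Illustrative reconstruction of the fixed oaCompatHelper.normalizeUsage.
interface OaCompatUsage {
  prompt_tokens: number;
  completion_tokens: number;
  completion_tokens_details?: { reasoning_tokens?: number };
}

interface NormalizedUsage {
  inputTokens: number;
  outputTokens: number;
  reasoningTokens?: number;
}

function normalizeUsage(usage: OaCompatUsage): NormalizedUsage {
  const completion = usage.completion_tokens;
  const rawReasoning = usage.completion_tokens_details?.reasoning_tokens;

  if (rawReasoning === undefined) {
    // No-reasoning case: leave outputTokens untouched.
    return { inputTokens: usage.prompt_tokens, outputTokens: completion };
  }

  // Clamp first, so reasoning never exceeds what the upstream API charged
  // (Kimi K2.6 quirk: reasoning_tokens=78 > completion_tokens=77).
  const reasoningTokens = Math.min(rawReasoning, completion);

  // With reasoning <= completion, output >= 0 holds without Math.max(0, …),
  // and outputTokens + reasoningTokens === completion_tokens by construction.
  return {
    inputTokens: usage.prompt_tokens,
    outputTokens: completion - reasoningTokens,
    reasoningTokens,
  };
}
```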
How did you verify your code works?
Added 4 unit tests in `packages/console/app/test/zen-usage.test.ts` that lock in the wire-level invariant; all 4 fail against the old code and pass after this fix (see the sketch after this list):

- `completion=1226, reasoning=790` → `outputTokens=436, reasoningTokens=790`, sum = 1226
- `completion=77, reasoning=78` → `outputTokens=0, reasoningTokens=77`, sum = 77 (clamped, matches upstream)
- No-reasoning case: `outputTokens=77` left untouched, `reasoningTokens` undefined
- Parity with `openaiHelper.normalizeUsage` for the same logical usage
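For reference, a sketch of the shape of the clamping test in `bun:test` style; the import path and the helper's exact export are assumptions based on the file paths named above, not the literal test file:

```ts
import { describe, expect, test } from "bun:test";
// Assumed import; the helper is described as living in
// packages/console/app/src/routes/zen/util/provider/openai-compatible.ts.
import { oaCompatHelper } from "../src/routes/zen/util/provider/openai-compatible";

describe("oaCompatHelper.normalizeUsage", () => {
  test("clamps reasoning to completion (Kimi K2.6 'Hi' payload)", () => {
    const result = oaCompatHelper.normalizeUsage({
      prompt_tokens: 22,
      completion_tokens: 77,
      completion_tokens_details: { reasoning_tokens: 78 },
    });
    // Output floors at 0 and reasoning is clamped to what upstream charged.
    expect(result.outputTokens).toBe(0);
    expect(result.reasoningTokens).toBe(77);
  });
});
```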
Then ran the full console/app suite and full repo typecheck:

- `bun test` in `packages/console/app/` — 7/7 pass
- `bun turbo typecheck` from repo root — all 13 tasks successful, no new lint errors on the modified file

Screenshots / recordings
N/A — backend billing-math change, no UI surface.
Checklist