🤖 fix: accurate cost estimation for multi-step tool usage #831
Merged
Conversation
When using mux-gateway (e.g., mux-gateway:openai/gpt-5.1), the OpenAI provider detection failed because the model string doesn't start with 'openai:'. This caused cached tokens to be double-counted in cost calculations, resulting in ~2.5x overestimation. Fix: normalize gateway model strings before provider detection so 'mux-gateway:openai/gpt-5.1' correctly triggers OpenAI-specific handling (subtract cachedInputTokens from inputTokens).
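A minimal sketch of the normalization idea; the helper name and the exact handling of the `mux-gateway:` prefix are assumptions, not the repo's actual code:

```typescript
// Hypothetical helper: rewrite a gateway model string into the plain
// "provider:model" form before provider detection runs.
// "mux-gateway:openai/gpt-5.1" -> "openai:gpt-5.1"
function normalizeGatewayModel(model: string): string {
  const GATEWAY_PREFIX = "mux-gateway:";
  if (!model.startsWith(GATEWAY_PREFIX)) return model;
  const rest = model.slice(GATEWAY_PREFIX.length); // "openai/gpt-5.1"
  const slash = rest.indexOf("/");
  return slash === -1 ? rest : `${rest.slice(0, slash)}:${rest.slice(slash + 1)}`;
}

normalizeGatewayModel("mux-gateway:openai/gpt-5.1").startsWith("openai:"); // true
```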
Google/Gemini, like OpenAI, reports inputTokens INCLUSIVE of cachedInputTokens. Extend the subtraction logic to also handle Google models to avoid double-counting cached tokens.
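A hedged sketch of the resulting subtraction; the provider names and the usage shape are simplified assumptions:

```typescript
// OpenAI and Google report inputTokens inclusive of cachedInputTokens, so the
// cached portion is subtracted before pricing uncached input. Providers that
// already report them separately (e.g. Anthropic) are left untouched.
interface NormalizedUsage {
  inputTokens: number;
  cachedInputTokens: number;
}

function uncachedInputTokens(provider: string, usage: NormalizedUsage): number {
  const reportsInclusive = provider === "openai" || provider === "google";
  return reportsInclusive
    ? Math.max(0, usage.inputTokens - usage.cachedInputTokens)
    : usage.inputTokens;
}
```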
Member (Author) commented:
@codex review
Codex Review: Didn't find any major issues. Keep them coming!
github-merge-queue bot pushed a commit that referenced this pull request on Dec 2, 2025.
## Problem

The application was severely underestimating costs for conversations involving tool calls.

### Root Cause

The Vercel AI SDK provides two usage metrics:

- `streamResult.usage` → token usage from **the last step only**
- `streamResult.totalUsage` → sum of token usage across **all steps**

The application was using `usage` instead of `totalUsage`. For a conversation with 10 tool calls, only ~1/10th of actual consumption was reported. A $5 conversation would display as $0.50.
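A minimal sketch of the two metrics, assuming the AI SDK version referenced here; the model choice and prompt are placeholders, and the tool/multi-step configuration is elided:

```typescript
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";

const streamResult = streamText({
  model: openai("gpt-5.1"), // placeholder model
  messages: [{ role: "user", content: "Summarize this repository" }],
  // tools and multi-step settings elided; tool calls are what create extra steps
});

// Drain the stream so the usage promises can resolve.
for await (const _part of streamResult.textStream) {
  /* no-op */
}

// Last step only: the right value for "how full is the context window now".
const lastStepUsage = await streamResult.usage;

// Summed over every step: the right value for cost.
const cumulativeUsage = await streamResult.totalUsage;

console.log({ lastStepUsage, cumulativeUsage });
```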
### The Complicating Factor

Two UI elements use token data with different semantic requirements:

| Display | Needs | Why |
|---------|-------|-----|
| **Cost** | Sum of all steps | If the model read the context 10 times, you paid for 10 reads |
| **Context window** | Last step only | Shows "how full is the conversation now" for the next request |

Simply switching to `totalUsage` would fix costs but break the context display (showing 500% utilization after many tool calls).

### Cache Creation Tokens
Anthropic's cache creation tokens (`cacheCreationInputTokens`) are:

- Only in provider-specific metadata, not normalized usage
- Need to be summed across all steps
- Not automatically aggregated by the AI SDK

Even with `totalUsage`, cache creation costs were lost unless manually aggregated from each step's provider metadata.
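A sketch of the manual aggregation, assuming a `streamResult` from `streamText` with an Anthropic model; the `providerMetadata.anthropic.cacheCreationInputTokens` key follows @ai-sdk/anthropic's metadata shape but should be treated as an assumption here:

```typescript
// Cache-creation tokens only appear in Anthropic's per-step provider
// metadata, so they are summed manually across all steps.
const steps = await streamResult.steps;

let cacheCreationInputTokens = 0;
for (const step of steps) {
  const anthropic = step.providerMetadata?.anthropic as
    | { cacheCreationInputTokens?: number }
    | undefined;
  cacheCreationInputTokens += anthropic?.cacheCreationInputTokens ?? 0;
}
```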
## Solution

Track both values with different semantic purposes:

**For cost calculation:**

- `usage` / `cumulativeUsage` → total across all steps
- `providerMetadata` / `cumulativeProviderMetadata` → aggregated cache creation tokens

**For context window display:**

- `contextUsage` / `lastContextUsage` → last step only
- `contextProviderMetadata` → last step only
### Key Changes

1. **Backend** (`streamManager.ts`): Use `totalUsage` for cost, track `lastStepUsage` for context, aggregate provider metadata across steps
2. **Types**: Extended `StreamEndEvent`, `MuxMetadata`, `UsageDeltaEvent` with dual fields (sketched below)
3. **Frontend**: `StreamingMessageAggregator` tracks both cumulative and per-step usage
4. **Store**: `WorkspaceUsageState` provides `usageHistory` (cost) and `lastContextUsage` (context window)
5. **UI**: Components use the appropriate field for their purpose
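A hypothetical illustration of the dual fields carried on stream-end events; the real `StreamEndEvent` / `MuxMetadata` types in the repo will differ:

```typescript
// Invented names for illustration; only the fields named in this PR
// description are taken from the source.
interface UsageTotals {
  inputTokens: number;
  outputTokens: number;
  cachedInputTokens?: number;
}

interface StreamEndEventSketch {
  // Cost: cumulative across all steps (fed by totalUsage plus aggregated
  // per-step provider metadata such as cache creation tokens).
  usage: UsageTotals;
  providerMetadata?: Record<string, unknown>;

  // Context window: the last step only.
  contextUsage: UsageTotals;
  contextProviderMetadata?: Record<string, unknown>;
}
```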
### Also Fixed

- **OpenAI cached token double-counting**: Gateway models (`mux-gateway:openai/gpt-5.1`) weren't recognized as OpenAI, causing cached tokens to be counted in both "Cache Read" and "Input". Now normalizes gateway model strings before provider detection.
- **Google/Gemini cached token double-counting**: Google, like OpenAI, reports `inputTokens` inclusive of `cachedInputTokens`. Extended the subtraction logic to handle Google models.

---

_Generated with `mux`_