
Conversation

@ethanndickson (Member) commented on Dec 2, 2025

## Problem

The application was severely underestimating costs for conversations involving tool calls.

### Root Cause

The Vercel AI SDK provides two usage metrics:

- `streamResult.usage`: token usage from **the last step only**
- `streamResult.totalUsage`: the sum of token usage across **all steps**

The application was using `usage` instead of `totalUsage`. For a conversation with 10 tool calls, only ~1/10th of the actual consumption was reported. A $5 conversation would display as $0.50.
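For illustration, a minimal sketch of the distinction, assuming the AI SDK v5 `streamText` API (the model id and prompt are placeholders):

```ts
import { streamText, stepCountIs } from "ai";
import { anthropic } from "@ai-sdk/anthropic";

const result = streamText({
  model: anthropic("claude-sonnet-4-5"),
  prompt: "Read the repo and summarize it.",
  // tools omitted for brevity; stopWhen allows a multi-step tool loop
  stopWhen: stepCountIs(10),
});

// Both promises resolve once the stream has finished.
const lastStep = await result.usage;      // last step only
const allSteps = await result.totalUsage; // summed across every step

// With 10 tool-call steps, allSteps.inputTokens can be ~10x lastStep.inputTokens.
console.log(lastStep.inputTokens, allSteps.inputTokens);
```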

### The Complicating Factor

Two UI elements use token data with different semantic requirements:

| Display | Needs | Why |
|---------|-------|-----|
| **Cost** | Sum of all steps | If the model read the context 10 times, you paid for 10 reads |
| **Context window** | Last step only | Shows "how full is the conversation now" for the next request |

Simply switching to `totalUsage` would fix costs but break the context display (showing 500% utilization after many tool calls).
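Sketched as code (function names are illustrative, not from this PR), the two consumers want different inputs:

```ts
import type { LanguageModelUsage } from "ai";

// Cost: every step's tokens were billed, so use the cumulative totals.
function estimateCostUsd(
  total: LanguageModelUsage,
  inputPricePerToken: number,
  outputPricePerToken: number,
): number {
  return (
    (total.inputTokens ?? 0) * inputPricePerToken +
    (total.outputTokens ?? 0) * outputPricePerToken
  );
}

// Context meter: only the last step reflects what the *next* request
// will carry; cumulative totals would overflow the gauge.
function contextUtilization(
  lastStep: LanguageModelUsage,
  contextWindow: number,
): number {
  return (lastStep.totalTokens ?? 0) / contextWindow;
}
```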

### Cache Creation Tokens

Anthropic's cache creation tokens (`cacheCreationInputTokens`):

- appear only in provider-specific metadata, not in the normalized usage
- need to be summed across all steps
- are not automatically aggregated by the AI SDK

Even with `totalUsage`, cache creation costs were lost unless manually aggregated from each step's provider metadata, as sketched below.
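A sketch of that manual aggregation, assuming the v5 `onStepFinish` callback and Anthropic's `anthropic.cacheCreationInputTokens` metadata key (the cast is because provider metadata is loosely typed JSON):

```ts
import { streamText } from "ai";
import { anthropic } from "@ai-sdk/anthropic";

let cacheCreationInputTokens = 0;

const result = streamText({
  model: anthropic("claude-sonnet-4-5"),
  prompt: "...",
  onStepFinish: ({ providerMetadata }) => {
    // Provider metadata is per-step and never folded into totalUsage,
    // so cache-creation tokens have to be summed by hand.
    cacheCreationInputTokens += Number(
      (providerMetadata as any)?.anthropic?.cacheCreationInputTokens ?? 0,
    );
  },
});

await result.consumeStream(); // drain the stream so every step runs
```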

## Solution

Track both values, each with its own semantic purpose:

**For cost calculation:**

- `usage` / `cumulativeUsage`: total across all steps
- `providerMetadata` / `cumulativeProviderMetadata`: aggregated cache creation tokens

**For context window display:**

- `contextUsage` / `lastContextUsage`: last step only
- `contextProviderMetadata`: last step only
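An illustrative shape for the dual fields (simplified; the real definitions live in this PR's type changes):

```ts
import type { LanguageModelUsage, ProviderMetadata } from "ai";

// Simplified sketch of StreamEndEvent after this PR.
interface StreamEndEvent {
  // For cost: cumulative across all steps.
  usage: LanguageModelUsage;
  providerMetadata?: ProviderMetadata;
  // For the context window: last step only.
  contextUsage?: LanguageModelUsage;
  contextProviderMetadata?: ProviderMetadata;
}
```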

### Key Changes

1. **Backend** (`streamManager.ts`): use `totalUsage` for cost, track `lastStepUsage` for context, and aggregate provider metadata across steps
2. **Types**: extended `StreamEndEvent`, `MuxMetadata`, and `UsageDeltaEvent` with dual fields
3. **Frontend**: `StreamingMessageAggregator` tracks both cumulative and per-step usage
4. **Store**: `WorkspaceUsageState` provides `usageHistory` (for cost) and `lastContextUsage` (for the context window)
5. **UI**: components use the field appropriate to their purpose

### Also Fixed

- **OpenAI cached token double-counting**: gateway models (`mux-gateway:openai/gpt-5.1`) weren't recognized as OpenAI, so cached tokens were counted in both "Cache Read" and "Input". Gateway model strings are now normalized before provider detection (see the sketch after this list).

- **Google/Gemini cached token double-counting**: Google, like OpenAI, reports `inputTokens` inclusive of `cachedInputTokens`. The subtraction logic was extended to handle Google models.
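Both fixes boil down to a few lines; a sketch with hypothetical helper names:

```ts
// Strip the gateway prefix so "mux-gateway:openai/gpt-5.1" is detected
// as an OpenAI model ("openai:gpt-5.1").
function normalizeGatewayModel(model: string): string {
  const prefix = "mux-gateway:";
  return model.startsWith(prefix)
    ? model.slice(prefix.length).replace("/", ":")
    : model;
}

// OpenAI and Google report inputTokens INCLUSIVE of cachedInputTokens;
// Anthropic reports them separately. Subtract to avoid counting cached
// tokens in both the "Cache Read" and "Input" buckets.
function billableInputTokens(
  model: string,
  inputTokens: number,
  cachedInputTokens: number,
): number {
  const provider = normalizeGatewayModel(model).split(":")[0];
  return provider === "openai" || provider === "google"
    ? inputTokens - cachedInputTokens
    : inputTokens;
}
```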


---

_Generated with `mux`_

When using mux-gateway (e.g., mux-gateway:openai/gpt-5.1), the OpenAI
provider detection failed because the model string doesn't start with
'openai:'. This caused cached tokens to be double-counted in cost
calculations, resulting in ~2.5x overestimation.

Fix: normalize gateway model strings before provider detection so
'mux-gateway:openai/gpt-5.1' correctly triggers OpenAI-specific
handling (subtract cachedInputTokens from inputTokens).
@ethanndickson changed the title from "🤖 fix: normalize gateway models for OpenAI cost calculation" to "🤖 fix: accurate cost estimation for multi-step tool usage" on Dec 2, 2025
Google/Gemini, like OpenAI, reports inputTokens INCLUSIVE of
cachedInputTokens. Extend the subtraction logic to also handle
Google models to avoid double-counting cached tokens.
@ethanndickson (Member, Author) commented:

@codex review

@chatgpt-codex-connector commented:

Codex Review: Didn't find any major issues. Keep them coming!


@ethanndickson added this pull request to the merge queue on Dec 2, 2025
github-merge-queue bot pushed a commit that referenced this pull request Dec 2, 2025
Merged via the queue into main with commit b3be437 Dec 2, 2025
13 checks passed
@ethanndickson deleted the fix-cost-estimation-tool-usage branch on December 2, 2025 at 03:46