Patch release. Investigated as a Gemini bug, shipped as the cross-provider normalization the toolkit was missing.
The bug
A chat to gemini-3.1-pro-preview-customtools with MaxTokens=1024 returned OutputTokens=36, StopReason=MAX_TOKENS. Confusing, because 36 is nowhere near 1024: the model had silently burned ~988 tokens on thinking and the toolkit dropped that count. Worse, the bug wasn't Gemini-specific: every provider's wire output_tokens had a different meaning, so Usage.OutputTokens was silently inconsistent across providers.
Fix (ADR-0016)
Two new fields on llms.Usage:
    ReasoningTokens      int // separately-metered thinking; ADDITIVE to OutputTokens
    BillableOutputTokens int // what the provider charges as "output"

OutputTokens is redefined as visible-only across every adapter. Budget math holds: a length truncation fires when Input + Output + Reasoning approaches the cap. Cost math: read BillableOutputTokens (one field, one semantic, no per-provider branching).
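A minimal sketch of how a caller might use the two new fields for the budget and cost math described above. The `Usage` type here is a local stand-in for `llms.Usage`; the `InputTokens` field name, the helper names, and the price are assumptions for illustration, not part of the toolkit.

```go
package main

import "fmt"

// Usage is a local stand-in for llms.Usage. Only the fields discussed in
// these notes are included; InputTokens is an assumed field name.
type Usage struct {
	InputTokens          int
	OutputTokens         int // visible-only
	ReasoningTokens      int // additive to OutputTokens
	BillableOutputTokens int // what the provider charges as "output"
}

// BudgetUsed sums the three counts named in the budget rule above.
// (Hypothetical helper.)
func BudgetUsed(u Usage) int {
	return u.InputTokens + u.OutputTokens + u.ReasoningTokens
}

// OutputCostUSD prices output from the single billable field, so the
// caller needs no per-provider branching. pricePerMTok is a hypothetical
// rate in USD per million output tokens.
func OutputCostUSD(u Usage, pricePerMTok float64) float64 {
	return float64(u.BillableOutputTokens) * pricePerMTok / 1_000_000
}

func main() {
	u := Usage{InputTokens: 12, OutputTokens: 102, ReasoningTokens: 918, BillableOutputTokens: 1020}
	fmt.Println("budget used:", BudgetUsed(u))
	fmt.Printf("output cost: $%.5f\n", OutputCostUSD(u, 10.0))
}
```

The point of the design is that neither helper needs to know which provider produced the `Usage` value.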
| Provider | OutputTokens | ReasoningTokens | BillableOutputTokens |
|---|---|---|---|
| Gemini | CandidatesTokenCount | ThoughtsTokenCount | Candidates + Thoughts |
| OpenAI Responses | wire − OutputTokensDetails.ReasoningTokens | OutputTokensDetails.ReasoningTokens | wire.OutputTokens |
| OpenAI Chat Completions (vLLM) | same subtraction via CompletionTokensDetails | same | wire.CompletionTokens |
| Anthropic | wire.OutputTokens (includes thinking when on; not separable) | 0 (SDK has no field) | mirrors OutputTokens |
| Ollama | eval_count | 0 | mirrors OutputTokens |
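The mapping in the table can be sketched as three small conversion functions. The wire structs below are simplified, hypothetical stand-ins for the provider SDK types (only the field names from the table are used), and the function names are assumptions; the real adapters may be structured differently.

```go
package main

import "fmt"

// Usage is a local stand-in for llms.Usage with only the output-side fields.
type Usage struct {
	OutputTokens         int
	ReasoningTokens      int
	BillableOutputTokens int
}

// geminiWire is a hypothetical stand-in for Gemini's usage metadata.
type geminiWire struct {
	CandidatesTokenCount int
	ThoughtsTokenCount   int
}

// openAIWire is a hypothetical stand-in for OpenAI usage. ReasoningTokens
// stands in for OutputTokensDetails.ReasoningTokens.
type openAIWire struct {
	OutputTokens    int // wire value, includes reasoning
	ReasoningTokens int
}

// fromGemini: visible and thinking arrive separately; billable is their sum.
func fromGemini(w geminiWire) Usage {
	return Usage{
		OutputTokens:         w.CandidatesTokenCount,
		ReasoningTokens:      w.ThoughtsTokenCount,
		BillableOutputTokens: w.CandidatesTokenCount + w.ThoughtsTokenCount,
	}
}

// fromOpenAI: the wire total includes reasoning, so visible is the difference.
func fromOpenAI(w openAIWire) Usage {
	return Usage{
		OutputTokens:         w.OutputTokens - w.ReasoningTokens,
		ReasoningTokens:      w.ReasoningTokens,
		BillableOutputTokens: w.OutputTokens,
	}
}

// fromSingleCount covers the rows where only one count is available
// (Anthropic, Ollama): no reasoning split, billable mirrors output.
func fromSingleCount(wireOutput int) Usage {
	return Usage{OutputTokens: wireOutput, BillableOutputTokens: wireOutput}
}

func main() {
	fmt.Printf("%+v\n", fromGemini(geminiWire{CandidatesTokenCount: 36, ThoughtsTokenCount: 988}))
	fmt.Printf("%+v\n", fromOpenAI(openAIWire{OutputTokens: 500, ReasoningTokens: 380}))
	fmt.Printf("%+v\n", fromSingleCount(200))
}
```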
Breaking semantic for OpenAI o-series
Pre-v0.7.1, Usage.OutputTokens was the OpenAI wire output_tokens, which for o-series silently included reasoning. Now it's visible-only. Migration: code reading OutputTokens as the billing-relevant number should read BillableOutputTokens instead (carries the exact same value the old field did). Non-reasoning OpenAI calls are unaffected — reasoning_tokens=0 makes the subtraction a no-op.
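A sketch of what the migration might look like at a call site. `billedOutput` is a hypothetical function name, `Usage` is a local stand-in for `llms.Usage`, and the token counts are illustrative numbers, not measurements.

```go
package main

import "fmt"

// Usage is a local stand-in for llms.Usage with only the output-side fields.
type Usage struct {
	OutputTokens         int
	ReasoningTokens      int
	BillableOutputTokens int
}

// billedOutput is a hypothetical call site. Before the change it would have
// returned u.OutputTokens; after migration it reads the billable field.
func billedOutput(u Usage) int {
	return u.BillableOutputTokens
}

func main() {
	// Illustrative o-series call: wire output_tokens=500, 380 of them reasoning.
	oSeries := Usage{OutputTokens: 120, ReasoningTokens: 380, BillableOutputTokens: 500}
	// Illustrative non-reasoning call: reasoning_tokens=0, so nothing changes.
	plain := Usage{OutputTokens: 200, ReasoningTokens: 0, BillableOutputTokens: 200}

	fmt.Println("o-series billed:", billedOutput(oSeries), "visible:", oSeries.OutputTokens)
	fmt.Println("plain billed:", billedOutput(plain), "visible:", plain.OutputTokens)
}
```

Code that displays or length-checks the visible answer can keep reading `OutputTokens`; only billing-oriented readers need the switch.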
Pre-1.0 is the right time to make this consistent; carrying the inconsistency to v1.0 would have been worse.
Demo bonus (examples/chat)
The bubble footer surfaces the breakdown so a max_tokens stop on a small visible output is self-explanatory:
gemini-3.1-pro-preview-customtools · max tokens · 12 in / 102 out / 918 reasoning · 1020 billable · 10875 ms
Non-reasoning models keep the terse <in> in / <out> out form (billable would just duplicate out).
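One way the footer rule could be written, as a sketch only: the function name, its parameters, and the choice of `ReasoningTokens > 0` as the switch between the two forms are assumptions, since the notes do not show the demo's actual code.

```go
package main

import "fmt"

// Usage is a local stand-in for llms.Usage; InputTokens is an assumed name.
type Usage struct {
	InputTokens          int
	OutputTokens         int
	ReasoningTokens      int
	BillableOutputTokens int
}

// footer builds the bubble footer. The reasoning and billable parts are
// appended only when reasoning tokens were reported; otherwise the terse
// "<in> in / <out> out" form is kept. (Hypothetical helper.)
func footer(model, stop string, u Usage, elapsedMs int64) string {
	s := fmt.Sprintf("%s · %s · %d in / %d out", model, stop, u.InputTokens, u.OutputTokens)
	if u.ReasoningTokens > 0 {
		s += fmt.Sprintf(" / %d reasoning · %d billable", u.ReasoningTokens, u.BillableOutputTokens)
	}
	return fmt.Sprintf("%s · %d ms", s, elapsedMs)
}

func main() {
	u := Usage{InputTokens: 12, OutputTokens: 102, ReasoningTokens: 918, BillableOutputTokens: 1020}
	fmt.Println(footer("gemini-3.1-pro-preview-customtools", "max tokens", u, 10875))
}
```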
Compatibility
Additive new fields. Existing fields keep their names. The only semantic shift is OpenAI o-series OutputTokens. No MAESTRO_DIVERGENCES.md row — Maestro never surfaced either field; nothing to diverge from.
Pre-1.0; v0.x minor versions may break.
🤖 Generated with Claude Code