Release v0.7.1 — reasoning + billable output normalization (ADR-0016) · SnapdragonPartners/maestro-llms

Patch release. Investigated as a Gemini bug, shipped as the cross-provider normalization the toolkit was missing.

The bug

A chat to gemini-3.1-pro-preview-customtools with MaxTokens=1024 returned OutputTokens=36 and StopReason=MAX_TOKENS. That is confusing because 36 is nowhere near 1024: the model had silently spent ~988 tokens on thinking, and the toolkit dropped them from the usage report. Worse, the bug wasn't Gemini-specific: every provider's wire output_tokens had a different meaning, so Usage.OutputTokens was silently inconsistent across providers.

Fix (ADR-0016)

Two new fields on llms.Usage:

ReasoningTokens      int  // separately-metered thinking; ADDITIVE to OutputTokens
BillableOutputTokens int  // what the provider charges as "output"

OutputTokens is redefined as visible-only across every adapter. Budget math holds: a length truncation fires when Input + Output + Reasoning approaches the cap. Cost math: read BillableOutputTokens (one field, one semantic, no per-provider branching).
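The budget and cost arithmetic can be sketched as below. This is a minimal illustration, not the toolkit's code: the `Usage` type is a local stand-in for `llms.Usage`, `InputTokens` is an assumed field name, the helper functions and the price are hypothetical, and the token counts are taken from the demo footer in these notes.

```go
package main

import "fmt"

// Usage is a local stand-in for llms.Usage so the sketch compiles on its own.
// InputTokens is an assumed field name; the other three are named in the notes.
type Usage struct {
	InputTokens          int
	OutputTokens         int // visible output only
	ReasoningTokens      int // separately metered thinking, additive to OutputTokens
	BillableOutputTokens int // what the provider charges as "output"
}

// budgetUsed (hypothetical helper) sums the three counts that the notes
// compare against the cap.
func budgetUsed(u Usage) int {
	return u.InputTokens + u.OutputTokens + u.ReasoningTokens
}

// outputCostMicroUSD (hypothetical helper) prices output from a single field,
// with no per-provider branching. The per-token price is made up.
func outputCostMicroUSD(u Usage, microUSDPerToken int) int {
	return u.BillableOutputTokens * microUSDPerToken
}

func main() {
	u := Usage{InputTokens: 12, OutputTokens: 102, ReasoningTokens: 918, BillableOutputTokens: 1020}
	fmt.Println(budgetUsed(u))             // 12 + 102 + 918
	fmt.Println(outputCostMicroUSD(u, 10)) // 1020 tokens at 10 micro-USD each
}
```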

| Provider | OutputTokens | ReasoningTokens | BillableOutputTokens |
| --- | --- | --- | --- |
| Gemini | `CandidatesTokenCount` | `ThoughtsTokenCount` | `Candidates + Thoughts` |
| OpenAI Responses | `wire − OutputTokensDetails.ReasoningTokens` | `OutputTokensDetails.ReasoningTokens` | `wire.OutputTokens` |
| OpenAI Chat Completions (vLLM) | same subtraction via `CompletionTokensDetails` | same | `wire.CompletionTokens` |
| Anthropic | `wire.OutputTokens` (includes thinking when on; not separable) | 0 (SDK has no field) | mirrors OutputTokens |
| Ollama | `eval_count` | 0 | mirrors OutputTokens |
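The per-provider mapping in the table can be sketched as three small constructors. This is an illustration only: the function names are hypothetical, plain integers stand in for the provider SDK structs, and the real adapters may be organized differently. The Gemini figures reuse the bug report above, treating the approximate 988 as exact.

```go
package main

import "fmt"

// Usage is a local stand-in for the relevant llms.Usage fields.
type Usage struct {
	OutputTokens         int
	ReasoningTokens      int
	BillableOutputTokens int
}

// fromGemini: the wire reports visible and thinking tokens separately,
// so the billable figure is their sum.
func fromGemini(candidates, thoughts int) Usage {
	return Usage{
		OutputTokens:         candidates,
		ReasoningTokens:      thoughts,
		BillableOutputTokens: candidates + thoughts,
	}
}

// fromOpenAI: the wire output count already includes reasoning,
// so the visible figure is obtained by subtraction.
func fromOpenAI(wireOutput, reasoning int) Usage {
	return Usage{
		OutputTokens:         wireOutput - reasoning,
		ReasoningTokens:      reasoning,
		BillableOutputTokens: wireOutput,
	}
}

// fromMirrored: providers with no separable reasoning count
// (Anthropic and Ollama in the table) mirror the wire value.
func fromMirrored(wireOutput int) Usage {
	return Usage{OutputTokens: wireOutput, BillableOutputTokens: wireOutput}
}

func main() {
	fmt.Println(fromGemini(36, 988))   // visible 36, reasoning 988, billable 1024
	fmt.Println(fromOpenAI(1000, 800)) // visible 200, reasoning 800, billable 1000
	fmt.Println(fromMirrored(150))     // visible 150, reasoning 0, billable 150
}
```

In the two separable cases the result satisfies OutputTokens + ReasoningTokens = BillableOutputTokens, which is what lets callers read one billing field regardless of provider.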

Breaking semantic for OpenAI o-series

Before v0.7.1, Usage.OutputTokens was the OpenAI wire output_tokens, which for o-series silently included reasoning. Now it's visible-only. Migration: code that reads OutputTokens as the billing-relevant number should read BillableOutputTokens instead, which carries exactly the value the old field did. Non-reasoning OpenAI calls are unaffected: reasoning_tokens=0 makes the subtraction a no-op.
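A caller-side migration might look like the sketch below. The `Usage` type is a local stand-in for `llms.Usage`, `billedOutputTokens` is a hypothetical helper, and the token counts are invented for illustration.

```go
package main

import "fmt"

// Usage is a local stand-in for the relevant llms.Usage fields.
type Usage struct {
	OutputTokens         int // visible-only as of v0.7.1
	ReasoningTokens      int
	BillableOutputTokens int
}

// billedOutputTokens (hypothetical helper) is the post-migration read for
// billing code that previously used OutputTokens.
func billedOutputTokens(u Usage) int {
	return u.BillableOutputTokens
}

func main() {
	// Illustrative o-series call: the wire reported 1000 output tokens,
	// 800 of which were reasoning.
	oSeries := Usage{OutputTokens: 200, ReasoningTokens: 800, BillableOutputTokens: 1000}
	// Illustrative non-reasoning call: both fields agree.
	plain := Usage{OutputTokens: 150, BillableOutputTokens: 150}

	fmt.Println(oSeries.OutputTokens, billedOutputTokens(oSeries))
	fmt.Println(plain.OutputTokens, billedOutputTokens(plain))
}
```

Billing code that still reads OutputTokens after the upgrade would undercount on the first call and be unchanged on the second.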

Pre-1.0 is the right time to make this consistent; carrying the inconsistency to v1.0 would have been worse.

Demo bonus (`examples/chat`)

The bubble footer surfaces the breakdown so a max_tokens stop on a small visible output is self-explanatory:

gemini-3.1-pro-preview-customtools · max tokens · 12 in / 102 out / 918 reasoning · 1020 billable · 10875 ms

Non-reasoning models keep the terse <in> in / <out> out form (billable would just duplicate out).
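The token segment of that footer could be produced along these lines. This is a sketch, not the code in `examples/chat`: `tokenSummary` is a hypothetical name, `InputTokens` is an assumed field, and switching on `ReasoningTokens > 0` is an assumption about how the demo decides between the two forms.

```go
package main

import "fmt"

// Usage is a local stand-in for llms.Usage; InputTokens is an assumed field name.
type Usage struct {
	InputTokens          int
	OutputTokens         int
	ReasoningTokens      int
	BillableOutputTokens int
}

// tokenSummary (hypothetical helper) renders the token part of the footer.
// It uses the long form only when reasoning tokens were reported.
func tokenSummary(u Usage) string {
	if u.ReasoningTokens > 0 {
		return fmt.Sprintf("%d in / %d out / %d reasoning · %d billable",
			u.InputTokens, u.OutputTokens, u.ReasoningTokens, u.BillableOutputTokens)
	}
	// Terse form: billable would only repeat the output count.
	return fmt.Sprintf("%d in / %d out", u.InputTokens, u.OutputTokens)
}

func main() {
	fmt.Println(tokenSummary(Usage{InputTokens: 12, OutputTokens: 102, ReasoningTokens: 918, BillableOutputTokens: 1020}))
	fmt.Println(tokenSummary(Usage{InputTokens: 12, OutputTokens: 102, BillableOutputTokens: 102}))
}
```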

Compatibility

Additive new fields. Existing fields keep their names. The only semantic shift is OpenAI o-series OutputTokens. No MAESTRO_DIVERGENCES.md row — Maestro never surfaced either field; nothing to diverge from.

Pre-1.0; v0.x minor versions may break.

🤖 Generated with Claude Code
