Problem
OpenRouter returns the authoritative per-request USD cost inline in every chat completion response (usage.cost, plus usage.cost_details with upstream_inference_cost and cache_discount) — see https://openrouter.ai/docs/use-cases/usage-accounting. TanStack AI's @tanstack/ai-openrouter adapter currently discards this data, so apps that rely on OpenRouter for prod traffic have no first-class way to observe spend from the adapter's event stream.
Two things make this tricky to solve at the call site:
- The @openrouter/sdk Zod parser strips the fields. The SDK's response schema doesn't declare cost / cost_details, so by the time a chunk reaches the adapter, the fields are gone. A separate /generation?id=… lookup is possible but costs an extra HTTP round-trip per request.
- Cost can't be reconstructed locally from tokens × price. OpenRouter routes the same model id to different upstream providers (primary, fallback, BYOK), each with different prices, and applies cache discounts and BYOK upstream costs the SDK can't see. A static price table would silently drift and produce wrong numbers.
Proposal
Capture cost out-of-band, from the same response, via the SDK's public HTTPClient.addHook('response', …) API:
- The hook calls Response.clone() (tees the body via ReadableStream.tee()) and parses the clone to pull usage.cost / usage.cost_details before the Zod parser sees them.
- The SDK's stream consumer reads the other branch untouched — no extra HTTP request, no added end-of-stream latency. Cost arrives in the trailing SSE chunk, which the adapter was already waiting on for final token usage.
- Captured values are attached to RUN_FINISHED under usage.cost (USD) and usage.costDetails.{upstreamInferenceCost, cacheDiscount}.
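As a rough illustration of the capture step, the sketch below reads a clone of the response and scans its SSE body for the last chunk that carries usage.cost. The names extractUsageFromSse, captureCost, WireUsage, and CloneableResponse are hypothetical, and the sketch assumes each SSE event is a single `data:` line containing JSON. How the function would be registered through HTTPClient.addHook('response', …) is left out, since the exact callback signature isn't shown here.

```typescript
// Wire shape as described in this proposal (snake_case, as sent by OpenRouter).
interface WireUsage {
  cost?: number
  cost_details?: {
    upstream_inference_cost?: number
    cache_discount?: number
  }
}

// Minimal structural type; the Fetch API `Response` satisfies it.
interface CloneableResponse {
  clone(): { text(): Promise<string> }
}

// Hypothetical helper: return the usage object from the last `data:` payload
// that carries a numeric usage.cost, or undefined if none does.
function extractUsageFromSse(body: string): WireUsage | undefined {
  let found: WireUsage | undefined
  for (const line of body.split(/\r?\n/)) {
    if (!line.startsWith('data:')) continue
    const payload = line.slice('data:'.length).trim()
    if (payload === '' || payload === '[DONE]') continue
    try {
      const parsed = JSON.parse(payload) as { usage?: WireUsage }
      if (typeof parsed.usage?.cost === 'number') found = parsed.usage
    } catch {
      // Skip payloads that are not valid JSON.
    }
  }
  return found
}

// Hypothetical hook body: only the clone is read, so the original body
// stays unread for the SDK's own stream consumer.
async function captureCost(res: CloneableResponse): Promise<WireUsage | undefined> {
  const text = await res.clone().text()
  return extractUsageFromSse(text)
}
```

Because `text()` resolves only once the whole body has arrived, a real hook would presumably start captureCost without awaiting it, so the SDK's branch keeps streaming; that is an assumption about how the SDK invokes hooks.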
Type surface (additive, backwards-compatible)
Extend RunFinishedEvent.usage in @tanstack/ai with optional cost and costDetails. The middleware UsageInfo (onUsage) and FinishInfo.usage (onFinish) reuse the same shape so middleware authors can read cost without casts. Adapters that don't populate cost are unaffected.
interface UsageTotals {
  promptTokens: number
  completionTokens: number
  totalTokens: number
  cost?: number
  costDetails?: {
    upstreamInferenceCost?: number
    cacheDiscount?: number
  }
}
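One way the adapter might map the snake_case wire fields onto the camelCase cost fields above is sketched here. WireCost, CostFields, and toCostFields are hypothetical names used only for illustration; the field names themselves come from this proposal.

```typescript
// Wire shape (snake_case) as described in the Problem section.
interface WireCost {
  cost?: number
  cost_details?: {
    upstream_inference_cost?: number
    cache_discount?: number
  }
}

// The optional cost fields proposed for UsageTotals.
interface CostFields {
  cost?: number
  costDetails?: {
    upstreamInferenceCost?: number
    cacheDiscount?: number
  }
}

// Hypothetical mapper: copies only the numeric fields that are present,
// so adapters or responses without cost data yield an empty object.
function toCostFields(wire: WireCost): CostFields {
  const out: CostFields = {}
  if (typeof wire.cost === 'number') out.cost = wire.cost
  const d = wire.cost_details
  if (d) {
    const details: NonNullable<CostFields['costDetails']> = {}
    if (typeof d.upstream_inference_cost === 'number') {
      details.upstreamInferenceCost = d.upstream_inference_cost
    }
    if (typeof d.cache_discount === 'number') {
      details.cacheDiscount = d.cache_discount
    }
    out.costDetails = details
  }
  return out
}
```

The result can be spread into the token totals, for example `{ promptTokens, completionTokens, totalTokens, ...toCostFields(wire) }`, which keeps the fields absent rather than undefined when no cost was captured.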
Consumer example
for await (const chunk of stream) {
  if (chunk.type === 'RUN_FINISHED') {
    console.log('USD:', chunk.usage?.cost)
    console.log('Cache discount:', chunk.usage?.costDetails?.cacheDiscount)
  }
}
Scope
- @tanstack/ai-openrouter: the adapter attaches the cost-capture hook and defers RUN_FINISHED until the stream fully drains, so trailing usage-only chunks are included.
- @tanstack/ai: UsageTotals type plus additive fields on RunFinishedEvent.usage, UsageInfo, and FinishInfo.usage.
- No change to other adapters. Other providers can opt in later via the same type surface.
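The "defer RUN_FINISHED until the stream drains" behaviour could look roughly like the sketch below. It is not the adapter's actual code: Chunk, TextDelta, RunFinished, DeferredFinish, and the TEXT_DELTA event name are hypothetical stand-ins, and only RUN_FINISHED and the usage.cost field come from this proposal.

```typescript
// Simplified upstream chunk: any combination of text, finish reason, usage.
interface Chunk {
  text?: string
  finishReason?: string
  usage?: { totalTokens: number; cost?: number }
}

interface TextDelta {
  type: 'TEXT_DELTA'
  text: string
}

interface RunFinished {
  type: 'RUN_FINISHED'
  finishReason: string | undefined
  usage: Chunk['usage'] | undefined
}

// Buffers the finish reason and usage instead of emitting RUN_FINISHED
// as soon as a finish reason appears.
class DeferredFinish {
  private finishReason: string | undefined = undefined
  private usage: Chunk['usage'] | undefined = undefined

  // Call once per upstream chunk; returns the events that are safe to emit now.
  push(chunk: Chunk): TextDelta[] {
    const out: TextDelta[] = []
    if (chunk.text) out.push({ type: 'TEXT_DELTA', text: chunk.text })
    if (chunk.finishReason) this.finishReason = chunk.finishReason
    if (chunk.usage) this.usage = chunk.usage
    return out
  }

  // Call once after the upstream stream has fully drained.
  flush(): RunFinished {
    return { type: 'RUN_FINISHED', finishReason: this.finishReason, usage: this.usage }
  }
}
```

In an adapter loop this would be used as `for await (const c of upstream) yield* state.push(c)` followed by `yield state.flush()`, so a usage-only chunk that arrives after the finish reason still ends up on RUN_FINISHED.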
Alternatives considered
- /generation?id=… lookup after the stream ends: adds an extra HTTP round-trip per request; it also tends to race OpenRouter's internal cost aggregation and can return null if called too quickly.
- Local tokens × price table: unreliable (see above).
- New event type (COST_REPORTED) instead of extending usage: more intrusive on consumers; usage is the natural home for cost and keeps the agent loop and middleware unaware of the new field if they don't want it.
Happy to open a PR — branch is ready and full PR suite (lint, types, build, tests, docs, knip, sherif) is green across all 40 projects.