feat(ai-openrouter): capture per-request cost from chat responses #469
season179 wants to merge 11 commits into TanStack:main from …
OpenRouter returns authoritative per-request USD cost inline in the chat completion response (SSE trailing chunk or non-streaming JSON), but the `@openrouter/sdk` Zod parser strips the `cost` and `cost_details` fields since its schema doesn't declare them. Cost also can't be reconstructed locally from token counts: OpenRouter routes the same model id to different upstream providers with different pricing and applies cache discounts the SDK can't see.

Capture cost via the SDK's public `HTTPClient` response hook: `Response.clone()` tees the body, a background parser reads the clone to pull `usage.cost`/`cost_details` out-of-band, and the adapter reads the result when emitting RUN_FINISHED. The SDK's stream consumer sees an untouched response, so there's no extra HTTP request and no added end-of-stream latency.

Adapter changes:
- `OpenRouterTextAdapter` always allocates a `CostStore` and wraps the caller's `HTTPClient` via `attachCostCapture`, cloning it so any caller-registered hooks/fetchers are preserved and the caller's instance stays untouched.
- Defer the `RUN_FINISHED` emission until after the upstream stream fully drains, so a trailing usage-only chunk (empty `choices`) is included in `usage` instead of being dropped.

Type changes (`@tanstack/ai`, additive + backwards-compatible):
- New `UsageTotals` named type with optional `cost` and `costDetails`.
- `RunFinishedEvent.usage` and middleware `UsageInfo`/`FinishInfo.usage` reuse `UsageTotals` so they can't drift.

Tests: 53/53 in `@tanstack/ai-openrouter` (new `cost-capture.test.ts` covering SSE/JSON parsing, `attachCostCapture` clone semantics, `CostStore` race regression, resilience when a preceding response hook disturbs the body). Full PR suite green across 40 projects.
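The tee described above can be sketched with nothing but the standard Fetch `Response` API. This is an illustration under simplifying assumptions, not the adapter's code: `captureCost`, `parseCostFromClone`, and `CapturedCost` are hypothetical names, the clone is read in one piece with `text()` rather than incrementally, and only LF-framed events with a single `data:` line are handled.

```typescript
// Hypothetical shape for the captured values; not the adapter's real type.
interface CapturedCost {
  cost: number
  costDetails?: Record<string, number | null>
}

// Reads a cloned SSE body to the end and returns the last `usage.cost` seen.
// Simplified: assumes LF-only framing and one `data:` line per event.
async function parseCostFromClone(
  clone: Response,
): Promise<CapturedCost | undefined> {
  const text = await clone.text()
  let found: CapturedCost | undefined
  for (const frame of text.split('\n\n')) {
    const line = frame.split('\n').find((l) => l.startsWith('data:'))
    if (!line) continue
    const payload = line.slice('data:'.length).trim()
    if (payload === '' || payload === '[DONE]') continue
    try {
      const json = JSON.parse(payload)
      const cost = json?.usage?.cost
      if (typeof cost === 'number') {
        found = { cost, costDetails: json.usage.cost_details ?? undefined }
      }
    } catch {
      // Ignore frames that are not JSON (comments, keep-alives).
    }
  }
  return found
}

// Tee the response: the clone is parsed in the background while the
// original body is handed back unread to whoever consumes the stream.
function captureCost(res: Response): {
  response: Response
  cost: Promise<CapturedCost | undefined>
} {
  const cost = parseCostFromClone(res.clone()).catch(() => undefined)
  return { response: res, cost }
}
```

The clone has to be taken before anything reads the original body, which is why `res.clone()` is called synchronously inside `captureCost`.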
… CRLF SSE

Two correctness gaps surfaced during review:

1. Concurrent requests on the same adapter blocked each other. `take(id)` fell through to `Promise.allSettled([...pendingParses])`, which waited for every in-flight parse, so an unrelated long-running stream could delay or effectively hang an already-completed request's RUN_FINISHED. Parses now announce their response id (`announceId`) as soon as they see it, and `take(id)` awaits only the matching parse, falling back to a race between the next announcement and the current wave of parses draining when no match has been announced yet.
2. The SSE parser only split on `\n\n`, so spec-compliant CRLF-framed streams (`\r\n\r\n`) yielded no events and cost was silently dropped. Normalize CR/CRLF to LF at decode time before splitting.

Adds two regression tests: per-id isolation on CostStore, and a CRLF-framed body through the full hook.
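A minimal sketch of the per-id isolation idea follows. `announceId` and `take` are names from the commit message, but the class, its signatures, and `CostInfo` are assumptions made for illustration. The fallback race for ids that have not been announced yet is deliberately left out: this sketch simply returns `undefined` in that case.

```typescript
type CostInfo = { cost: number }

// Hypothetical, simplified store: one promise per announced response id.
class PerIdCostStore {
  private readonly byId = new Map<string, Promise<CostInfo | undefined>>()

  // Called by a parse as soon as it has seen the response id.
  announceId(id: string, result: Promise<CostInfo | undefined>): void {
    this.byId.set(id, result)
  }

  // Awaits only the parse registered for `id`; unrelated in-flight
  // parses are never consulted. Entries are consumed once.
  async take(id: string): Promise<CostInfo | undefined> {
    const pending = this.byId.get(id)
    if (!pending) return undefined
    this.byId.delete(id)
    return pending
  }
}
```

Because `take` looks up a single map entry, a parse that never settles for one id cannot delay the lookup for another.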
… frame

Two more correctness holes surfaced in review of the previous fix:

1. A parse that announced its id but produced no `usage.cost` used to clear its `idToParse` entry in `parse.finally()`, so a subsequent `take(id)` found no match and fell through to the pending-parses wait. That reintroduced head-of-line blocking on unrelated concurrent streams, exactly for the common "response had no cost field" case. `idToParse` now keeps settled entries around briefly (TTL cleanup) so `take(id)` can resolve undefined without waiting on other parses.
2. The SSE loop only processed `\n\n`-delimited frames and then broke on EOF, so an EOF-terminated response missing the trailing blank line dropped its final usage chunk. Flush whatever is left in the buffer as a final event when `done` fires, reusing a small `applyEvent` helper so the inline and flush paths don't diverge.

Adds regression tests for both paths: a no-cost completion on a shared store (proves it doesn't block on an in-flight unrelated parse) and an EOF-terminated SSE body (proves the trailing frame is flushed).
The adapter-wide HTTPClient hook is shared by every `chat.send` call, including `structuredOutput()` which uses `stream: false`. That path never reads `costStore`, so cloning and second-parsing the JSON body was pure overhead: extra allocation for every structured-output response, plus a short-lived entry held in `CostStore` until TTL.

Gate the hook on `content-type: text/event-stream` and drop the now-unreachable JSON parse branch. The existing URL filter narrows to `/chat/completions`; the content-type check further narrows to streaming responses only.

Updates the `non-streaming JSON` test to verify the hook leaves those responses untouched.
…ate stream abort

Deferring RUN_FINISHED until after the for-await loop drains (so the trailing usage chunk and tee'd cost parse can settle) meant that any stream error after `finishReason` but before the stream closed would fall into the outer catch and surface RUN_ERROR, even though the run was logically complete and the SDK had already delivered the final choice. Pre-patch, processChoice emitted RUN_FINISHED inline on finishReason, so a late abort was harmless.

Wrap the for-await in an inner try/catch. If the stream errors before a terminal finishReason was seen, rethrow so the outer catch emits RUN_ERROR unchanged. If it errors after, swallow and fall through to the RUN_FINISHED emission; cost is read via the usual costStore.take path and is simply omitted if the parser didn't make it that far.

Adds a regression test with an async iterable that throws after yielding the finishReason chunk; asserts RUN_FINISHED is present and RUN_ERROR is not.
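The inner/outer try/catch arrangement can be sketched as an async generator. Everything here is illustrative: `runStream`, `Chunk`, and `RunEvent` are hypothetical stand-ins for the adapter's real types, and defaulting a missing `finishReason` to `'stop'` is an assumption of this sketch only.

```typescript
type Chunk = { finishReason?: string; text?: string }
type RunEvent =
  | { type: 'TEXT'; text: string }
  | { type: 'RUN_FINISHED'; finishReason: string }
  | { type: 'RUN_ERROR'; message: string }

async function* runStream(
  source: AsyncIterable<Chunk>,
): AsyncGenerator<RunEvent> {
  let finishReason: string | undefined
  try {
    try {
      for await (const chunk of source) {
        if (chunk.text) yield { type: 'TEXT', text: chunk.text }
        if (chunk.finishReason) finishReason = chunk.finishReason
      }
    } catch (err) {
      // No terminal finishReason yet: a real failure, so rethrow.
      if (finishReason === undefined) throw err
      // Otherwise the run is logically complete; fall through.
    }
    // Emitted only after the stream drained (or aborted late).
    yield { type: 'RUN_FINISHED', finishReason: finishReason ?? 'stop' }
  } catch (err) {
    yield {
      type: 'RUN_ERROR',
      message: err instanceof Error ? err.message : String(err),
    }
  }
}
```

A source that throws after yielding its `finishReason` chunk ends with `RUN_FINISHED`; one that throws before it ends with `RUN_ERROR`.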
… if cost captured
When the stream aborts after finishReason but before the trailing
usage chunk, the tee'd cost parser can still have finished earlier
(the branches are independent) and populated `costStore`. The previous
helper turned a missing `finalUsage` into `{0, 0, 0, cost: X}` — a
"successful run with 0 tokens but $X cost" signal that's actively
wrong for downstream billing and telemetry consumers.
Drop the zero-fill: if no token counts arrived, return `undefined`
for `usage` even if `costInfo` was captured. Losing cost in this rare
edge case is better than fabricating token counts that never existed.
Adds a regression test that stubs an async iterable throwing after
finishReason (with no usage on that chunk) and pre-seeds costStore to
simulate the tee beating the SDK consumer. Asserts RUN_FINISHED's
usage is absent.
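The "no zero-fill" rule can be illustrated with a small helper. `buildRunFinishedUsage` is the name shown in the review walkthrough, but the body, the token field names (`promptTokens`, `completionTokens`, `totalTokens`), and the `CostInfo` shape below are assumptions for the sake of a self-contained sketch.

```typescript
interface TokenUsage {
  promptTokens: number
  completionTokens: number
  totalTokens: number
}

interface CostInfo {
  cost: number
  costDetails?: Record<string, number | null>
}

type UsageTotals = TokenUsage & Partial<CostInfo>

// Returns undefined when no token counts arrived, even if cost was
// captured: fabricating zero token counts would mislead billing consumers.
function buildRunFinishedUsage(
  finalUsage: TokenUsage | undefined,
  costInfo: CostInfo | undefined,
): UsageTotals | undefined {
  if (!finalUsage) return undefined
  return costInfo ? { ...finalUsage, ...costInfo } : { ...finalUsage }
}
```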
…k CRLF normalize

The previous `\r\n?` → `\n` normalization ran per chunk, which is unsafe when CRLF straddles a read boundary. A lone `\r` at the end of chunk N got rewritten to `\n` before the paired `\n` arrived in chunk N+1; concatenating the two produced a false `\n\n` frame separator in the middle of a line.

Match all spec-compliant separator patterns (`\r\n\r\n`, `\r\r`, `\n\n`) against the un-normalized buffer via regex, and split `extractDataPayload` lines on any of `\r\n|\r|\n`. Frame detection no longer depends on byte-level chunk alignment, and frames keep their original bytes so the data payload is parsed as the server sent it.
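A sketch of separator matching against the un-normalized buffer, including the EOF flush from the earlier commit. `extractDataPayload` is named in the commit message; its body here, the `SseFrameSplitter` class, and `FRAME_SEPARATOR` are hypothetical and cover only the three separator patterns listed above.

```typescript
// The three separator patterns named above, matched on raw text.
const FRAME_SEPARATOR = /\r\n\r\n|\r\r|\n\n/

class SseFrameSplitter {
  private buffer = ''

  // Appends decoded text and returns every complete frame now available.
  push(text: string): string[] {
    this.buffer += text
    const frames: string[] = []
    for (;;) {
      const match = FRAME_SEPARATOR.exec(this.buffer)
      if (!match) break
      frames.push(this.buffer.slice(0, match.index))
      this.buffer = this.buffer.slice(match.index + match[0].length)
    }
    return frames
  }

  // Called at EOF: returns the unterminated trailing frame, if any.
  flush(): string | undefined {
    const rest = this.buffer
    this.buffer = ''
    return rest.trim() === '' ? undefined : rest
  }
}

// Joins the `data:` lines of one frame, splitting on any SSE line ending.
function extractDataPayload(frame: string): string | undefined {
  const data = frame
    .split(/\r\n|\r|\n/)
    .filter((line) => line.startsWith('data:'))
    .map((line) => line.slice('data:'.length).replace(/^ /, ''))
  return data.length > 0 ? data.join('\n') : undefined
}
```

Because nothing is rewritten before matching, a `\r` left at the end of one chunk simply waits in the buffer until the next chunk shows whether it belongs to a `\r\n\r\n` separator.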
…code

Clarify in the adapter why RUN_FINISHED is held until the stream fully drains: cost/tokens are part of the cross-package `RUN_FINISHED.usage` contract (and the middleware `onUsage` hook fires off that exact payload), so emitting early would force callers into a separate event or drop cost altogether. The added wait is bounded by the server flushing its trailing usage chunk on the same SSE connection (~10-20ms), not an additional network round-trip.
No actionable comments were generated in the recent review. 🎉
📝 Walkthrough
Adds out-of-band per-request USD cost capture for the OpenRouter adapter by cloning and parsing SSE responses via an HTTPClient response hook, buffering results in a CostStore, and attaching …
Sequence Diagram(s)
sequenceDiagram
participant Client
participant Adapter as OpenRouter Adapter
participant Hook as HTTPClient Hook
participant SSE as SSE Stream
participant CostStore as CostStore
participant App
Client->>Adapter: chatStream(request)
Adapter->>Hook: build client with cost-capture hook
Adapter->>SSE: Open chat completion stream
SSE->>Hook: HTTP Response (text/event-stream)
Hook->>Hook: clone Response body (tee)
Hook->>SSE: return untouched stream to adapter
Hook->>Hook: parse cloned SSE (extract usage.cost & id)
Hook->>CostStore: set(id, {cost, costDetails})
par Streaming messages and cost extraction
SSE->>Adapter: yield chunk deltas (content, token usage)
Adapter->>Adapter: accumulate finalUsage, finishReason
and
Hook->>CostStore: store extracted cost
end
SSE->>Adapter: trailing usage-only chunk (finalUsage)
Adapter->>CostStore: take(responseId)
CostStore-->>Adapter: {cost, costDetails} | undefined
Adapter->>Adapter: buildRunFinishedUsage(finalUsage, costInfo)
Adapter->>App: emit RUN_FINISHED with usage (+ cost)
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 1
🧹 Nitpick comments (2)
packages/typescript/ai/src/types.ts (1)
796-806: `costDetails` comment says "loosely typed" but the shape is actually locked to OpenRouter's two fields.
The JSDoc argues the type must be loose to accommodate provider divergence (BYOK upstream, cache discounts, per-tier rates, ...), but `costDetails` only declares `upstreamInferenceCost` and `cacheDiscount`. Any future adapter that reports, say, a tier-specific rate would be forced to either omit it or `as any`-cast, which is exactly what the comment says should be avoided.
If the intent is genuinely "loose", consider adding an index signature so additional provider-specific keys are type-legal without a cast. Otherwise, tighten the comment to match the actual (narrow, OpenRouter-shaped) contract.
♻️ Suggested adjustment
     costDetails?: {
       upstreamInferenceCost?: number | null
       cacheDiscount?: number | null
    +  [key: string]: number | null | undefined
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai/src/types.ts` around lines 796 - 806, The JSDoc promises a "loosely typed" costDetails but the declared shape only allows upstreamInferenceCost and cacheDiscount; update the type for costDetails (in packages/typescript/ai/src/types.ts, symbol: costDetails) to include an index signature (or use Record<string, number | null>) while keeping the existing named keys so provider-specific numeric fields are allowed without casting; this ensures adapters can add tier/rate/BYOK fields legally without changing the comment.
packages/typescript/ai-openrouter/src/adapters/cost-capture.ts (1)
161-167: Minor: content-type check is case-sensitive.
`content-type` header values are case-insensitive per RFC; OpenRouter today uses lowercase `text/event-stream`, but a proxy on the path could legitimately return e.g. `Text/Event-Stream` and cost capture would silently skip. Cheap to harden:
♻️ Proposed tweak
    -    const contentType = res.headers.get('content-type') ?? ''
    +    const contentType = (res.headers.get('content-type') ?? '').toLowerCase()
         // Cost capture is only wired for streaming chat completions. Non-SSE
         // responses on `/chat/completions` (e.g. `structuredOutput()` which
         // calls `chat.send({ stream: false })`) never consume `costStore`;
         // skipping them here avoids cloning the response and second-parsing
         // potentially large JSON bodies for no downstream consumer.
         if (!contentType.includes('text/event-stream')) return
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-openrouter/src/adapters/cost-capture.ts` around lines 161 - 167, The content-type check in the cost capture branch is case-sensitive and may miss valid SSE responses; update the check around the contentType variable so it compares case-insensitively (e.g. normalize contentType with toLowerCase() or use a case-insensitive regex) before testing for 'text/event-stream' and keep the early return behavior unchanged to avoid extra parsing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/typescript/ai-openrouter/tests/openrouter-adapter.test.ts`:
- Around line 6-19: Merge the two separate type imports from
'../src/adapters/text' into one line (exported types OpenRouterTextAdapter and
OpenRouterTextModelOptions) to remove import/no-duplicates, and replace the
typed vi.importActual call in the vi.mock helper (currently using await
vi.importActual<typeof import('@openrouter/sdk')>('@openrouter/sdk')) with an
untyped runtime import (await vi.importActual('@openrouter/sdk')) or cast the
result to any so you avoid the banned import() type annotation; keep the other
type imports (CostStore, StreamChunk, Tool) and the module-scope mockSend
variable unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 90bbcd91-81f2-44c8-9463-74a171a180c3
📒 Files selected for processing (9)
- .changeset/openrouter-cost-tracking.md
- README.md
- docs/adapters/openrouter.md
- packages/typescript/ai-openrouter/src/adapters/cost-capture.ts
- packages/typescript/ai-openrouter/src/adapters/text.ts
- packages/typescript/ai-openrouter/tests/cost-capture.test.ts
- packages/typescript/ai-openrouter/tests/openrouter-adapter.test.ts
- packages/typescript/ai/src/activities/chat/middleware/types.ts
- packages/typescript/ai/src/types.ts
Merge the two separate type imports from '../src/adapters/text' into
one statement (import/no-duplicates) and replace the inline `typeof
import('@openrouter/sdk')` type annotation with a top-level
`import type * as OpenRouterSDK` (@typescript-eslint/consistent-type-imports).
Picked up in CodeRabbit review of TanStack#469.
Content-Type header values are case-insensitive per RFC 9110. OpenRouter today serves lowercase `text/event-stream` but a proxy on the path could return a different casing, which would make cost capture silently skip a real SSE response. Lowercase the header before the substring match. Picked up in CodeRabbit review of TanStack#469.
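The two filters mentioned in these commits (URL, then content type) can be sketched as a single predicate. `shouldCaptureCost` is a hypothetical helper written for illustration; the adapter performs the equivalent checks inline in its response hook.

```typescript
// Returns true only for SSE responses on the chat completions endpoint.
function shouldCaptureCost(url: string, res: Response): boolean {
  if (!url.includes('/chat/completions')) return false
  // Lowercase first so `Text/Event-Stream` from a proxy still matches.
  const contentType = (res.headers.get('content-type') ?? '').toLowerCase()
  return contentType.includes('text/event-stream')
}
```

Using `includes` rather than strict equality keeps the check tolerant of parameters such as `; charset=utf-8`.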
…etails The JSDoc promised "loosely typed" costDetails to accommodate provider divergence (BYOK upstream costs, cache discounts, per-tier rates) but the declared shape only allowed two OpenRouter-specific keys, so any other adapter would have been forced to `as any`-cast to report its own breakdown. Add a numeric index signature so additional keys are type-legal without a cast, matching the documented intent. Picked up in CodeRabbit review of TanStack#469.
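For illustration, the widened shape from the review suggestion looks roughly like this. The `costDetails` keys and index signature follow the suggested diff; the token fields on `UsageTotals` and the `tierRate` key are assumptions added only to make the sketch self-contained.

```typescript
interface CostDetails {
  upstreamInferenceCost?: number | null
  cacheDiscount?: number | null
  // Index signature: any other numeric key is legal without a cast.
  [key: string]: number | null | undefined
}

// Token field names are assumed here; only cost/costDetails come from the PR.
interface UsageTotals {
  promptTokens: number
  completionTokens: number
  totalTokens: number
  cost?: number
  costDetails?: CostDetails
}

// `tierRate` is a hypothetical provider-specific key.
const exampleUsage: UsageTotals = {
  promptTokens: 12,
  completionTokens: 30,
  totalTokens: 42,
  cost: 0.0012,
  costDetails: { upstreamInferenceCost: 0.001, cacheDiscount: null, tierRate: 0.25 },
}
```

The named keys still get autocompletion and documentation, while the index signature accepts any additional numeric key an adapter wants to report.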
🎯 Changes
Surface OpenRouter's authoritative per-request USD cost on `RUN_FINISHED`. OpenRouter returns `usage.cost` and `usage.cost_details` inline in every chat response (docs), but the `@openrouter/sdk` Zod parser strips those fields because the SDK schema doesn't declare them. Cost also can't be reconstructed locally (different upstream routes → different prices, plus cache discounts the SDK can't see).

Closes #468

How
- `@tanstack/ai-openrouter` attaches a hook on the SDK's public `HTTPClient` (`addHook('response', …)`). The hook calls `Response.clone()` to tee the body and parses the clone out-of-band to pull `cost`/`cost_details` before Zod strips them. The SDK's stream consumer reads the other branch untouched: no extra HTTP request, no added latency.
- The hook only runs on `text/event-stream` chat responses, so structured-output and non-streaming paths aren't cloned.
- `OpenRouterTextAdapter` reads the captured cost when the stream ends and emits it on `RUN_FINISHED.usage.{cost, costDetails}`.
- `RUN_FINISHED` is deferred until the upstream stream fully drains so the trailing usage-only chunk (empty `choices`) is included in `usage`.
- A caller-supplied `httpClient` is preserved: the adapter clones the caller's client (inheriting their fetcher, retries, tracing, and any pre-registered hooks) and appends cost capture to the clone. The caller's instance is never mutated.

Types (`@tanstack/ai`, additive + backwards-compatible)
- New `UsageTotals` type with optional `cost` and `costDetails` (`upstreamInferenceCost`, `cacheDiscount`).
- `RunFinishedEvent.usage`, middleware `UsageInfo` (consumed by `onUsage`), and `FinishInfo.usage` (consumed by `onFinish`) all reuse `UsageTotals` so they can't drift. No required field changes; adapters that don't populate cost keep working without modification.

Correctness
The out-of-band parse runs concurrently with the main stream, so the implementation handles:
- Per-id isolation in `take(id)`: one slow stream doesn't block another's `RUN_FINISHED`.
- `\n\n`, `\r\n\r\n`, and `\r\r` event separators, including when `\r`/`\n` straddle read-chunk boundaries, plus EOF-terminated frames with no trailing separator.
- A stream error after `finishReason` is captured does not downgrade the run to `RUN_ERROR`.
- `take(id)` fast-paths to `undefined` as soon as the matching parse settles without recording cost; cost-less providers or non-cost responses aren't penalized.

Docs + changeset
- Docs updated in `docs/adapters/openrouter.md`.
- Changeset bumps `@tanstack/ai-openrouter` and `@tanstack/ai` as minor.

Tests
- `cost-capture.test.ts` (19 new tests): SSE parse w/ and w/o details, non-streaming skip, CRLF separators, EOF flush, preceding-hook body-disturbed path, per-id isolation, race regressions.
- `openrouter-adapter.test.ts` (+7 tests): basic attach, cost w/o details, trailing usage-only chunk, consume-once, custom `httpClient` preservation, late-abort still emits `RUN_FINISHED`, zero-token not fabricated.
- `@tanstack/ai-openrouter` unit tests green; full PR suite (lint, types, build, tests, docs, knip, sherif) green across 40 projects.

✅ Checklist
- `pnpm run test:pr`.

🚀 Release Impact

Summary by CodeRabbit
New Features
- `RUN_FINISHED` now may include `usage.cost` and optional `usage.costDetails` (upstream inference cost, cache discounts).
Documentation
Tests