feat(ai-openrouter): capture per-request cost from chat responses#469

Open
season179 wants to merge 11 commits into TanStack:main from season179:feat/openrouter-cost-tracking
@season179 season179 commented Apr 18, 2026

🎯 Changes

Surface OpenRouter's authoritative per-request USD cost on RUN_FINISHED. OpenRouter returns usage.cost and usage.cost_details inline in every chat response (docs), but the @openrouter/sdk Zod parser strips those fields because the SDK schema doesn't declare them. Cost also can't be reconstructed locally (different upstream routes → different prices, plus cache discounts the SDK can't see).

Closes #468

How

  • Capture: @tanstack/ai-openrouter attaches a hook on the SDK's public HTTPClient (addHook('response', …)). The hook calls Response.clone() to tee the body and parses the clone out-of-band to pull cost / cost_details before Zod strips them. The SDK's stream consumer reads the other branch untouched — no extra HTTP request, no added latency.
  • Scope guard: hook only fires on text/event-stream chat responses, so structured-output and non-streaming paths aren't cloned.
  • Propagate: OpenRouterTextAdapter reads the captured cost when the stream ends and emits it on RUN_FINISHED.usage.{cost, costDetails}. RUN_FINISHED is deferred until the upstream stream fully drains so the trailing usage-only chunk (empty choices) is included in usage.
  • Caller-provided httpClient is preserved: the adapter clones the caller's client (inheriting their fetcher, retries, tracing, and any pre-registered hooks) and appends cost capture to the clone. The caller's instance is never mutated.
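
The capture step above can be sketched roughly as follows. This is a minimal illustration, not the adapter's code: `teeForCost` and `costFromSseText` are hypothetical names, the sketch reads the whole clone as text rather than parsing it incrementally, and it assumes a runtime with a global `Response` (for example Node 18+). Only `Response.clone()` and the `usage.cost` field come from the description.

```typescript
// Hypothetical helper: pull the last `usage.cost` seen in an SSE body.
function costFromSseText(text: string): number | undefined {
  let cost: number | undefined
  for (const frame of text.split(/\r\n\r\n|\r\r|\n\n/)) {
    for (const line of frame.split(/\r\n|\r|\n/)) {
      if (!line.startsWith('data:')) continue
      const payload = line.slice(5).trim()
      if (payload === '' || payload === '[DONE]') continue
      try {
        const parsed = JSON.parse(payload) as { usage?: { cost?: unknown } }
        const candidate = parsed.usage?.cost
        if (typeof candidate === 'number') cost = candidate
      } catch {
        // Ignore frames that are not JSON; the main consumer handles them.
      }
    }
  }
  return cost
}

// Hypothetical hook body: clone first (before anyone reads the body),
// then parse the clone in the background. The original Response is left
// untouched for the SDK's own stream consumer.
function teeForCost(res: Response): Promise<number | undefined> {
  const clone = res.clone()
  return clone.text().then(costFromSseText, () => undefined)
}
```

Because the clone is taken before the body is read, both branches see the full stream and no second HTTP request is made.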

Types (@tanstack/ai, additive + backwards-compatible)

  • New UsageTotals type with optional cost and costDetails (upstreamInferenceCost, cacheDiscount).
  • RunFinishedEvent.usage, middleware UsageInfo (consumed by onUsage), and FinishInfo.usage (consumed by onFinish) all reuse UsageTotals so they can't drift. No required field changes; adapters that don't populate cost keep working without modification.
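
As a rough sketch of what the additive type looks like to a consumer: `cost`, `costDetails`, `upstreamInferenceCost`, and `cacheDiscount` are the names given above, while the token-count field names and the trimmed-down `RunFinishedEvent` shape are assumptions made for illustration. `totalCostUsd` is a hypothetical helper.

```typescript
// Token-count field names are assumed; the cost fields follow the PR text.
interface UsageTotals {
  promptTokens: number
  completionTokens: number
  totalTokens: number
  cost?: number
  costDetails?: {
    upstreamInferenceCost?: number | null
    cacheDiscount?: number | null
  }
}

// Simplified stand-in for the real event type.
interface RunFinishedEvent {
  type: 'RUN_FINISHED'
  usage?: UsageTotals
}

// Hypothetical consumer: sum cost across runs, treating a missing cost
// (adapters that do not populate it) as zero.
function totalCostUsd(events: ReadonlyArray<RunFinishedEvent>): number {
  return events.reduce((sum, event) => sum + (event.usage?.cost ?? 0), 0)
}
```

Since every new field is optional, events from adapters that never set `cost` still type-check and simply contribute nothing to the total.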

Correctness

The out-of-band parse runs concurrently with the main stream, so the implementation handles:

  • Per-request isolation: each request's cost is keyed by its response id and only its own parse is awaited on take(id) — one slow stream doesn't block another's RUN_FINISHED.
  • SSE framing: parser handles \n\n, \r\n\r\n, and \r\r event separators including when \r / \n straddle read-chunk boundaries, plus EOF-terminated frames with no trailing separator.
  • Late stream aborts: a stream error after finishReason is captured does not downgrade the run to RUN_ERROR.
  • Missing tokens with cost present: usage is omitted entirely rather than fabricating zero-token counts alongside a captured cost.
  • No-cost responses: take(id) fast-paths to undefined as soon as the matching parse settles without recording cost; cost-less providers or non-cost responses aren't penalized.

Docs + changeset

  • New "Cost Tracking" section in docs/adapters/openrouter.md.
  • Changeset marks @tanstack/ai-openrouter and @tanstack/ai as minor.

Tests

  • cost-capture.test.ts (19 new tests): SSE parse w/ and w/o details, non-streaming skip, CRLF separators, EOF flush, preceding-hook body-disturbed path, per-id isolation, race regressions.
  • openrouter-adapter.test.ts (+7 tests): basic attach, cost w/o details, trailing usage-only chunk, consume-once, custom httpClient preservation, late-abort still emits RUN_FINISHED, zero-token not fabricated.
  • 59/59 @tanstack/ai-openrouter unit tests green; full PR suite (lint, types, build, tests, docs, knip, sherif) green across 40 projects.

✅ Checklist

  • I have followed the steps in the Contributing guide.
  • I have tested this code locally with pnpm run test:pr.

🚀 Release Impact

  • This change affects published code, and I have generated a changeset.
  • This change is docs/CI/dev-only (no release).

Summary by CodeRabbit

  • New Features

    • Added per-request USD cost tracking for OpenRouter. RUN_FINISHED now may include usage.cost and optional usage.costDetails (upstream inference cost, cache discounts).
  • Documentation

    • Added cost-tracking docs and README entry with examples showing how to read cost from streamed output.
  • Tests

    • Added comprehensive tests covering SSE and non-streaming cost capture, edge cases, and integration with custom HTTP clients.

OpenRouter returns authoritative per-request USD cost inline in the
chat completion response (SSE trailing chunk or non-streaming JSON),
but the @openrouter/sdk Zod parser strips the `cost` and `cost_details`
fields since its schema doesn't declare them. Cost also can't be
reconstructed locally from token counts — OpenRouter routes the same
model id to different upstream providers with different pricing and
applies cache discounts the SDK can't see.

Capture cost via the SDK's public `HTTPClient` response hook:
`Response.clone()` tees the body, a background parser reads the clone
to pull `usage.cost`/`cost_details` out-of-band, and the adapter reads
the result when emitting RUN_FINISHED. The SDK's stream consumer sees
an untouched response, so there's no extra HTTP request and no added
end-of-stream latency.

Adapter changes:
- `OpenRouterTextAdapter` always allocates a `CostStore` and wraps the
  caller's `HTTPClient` via `attachCostCapture`, cloning it so any
  caller-registered hooks/fetchers are preserved and the caller's
  instance stays untouched.
- Defer the `RUN_FINISHED` emission until after the upstream stream
  fully drains, so a trailing usage-only chunk (empty `choices`) is
  included in `usage` instead of being dropped.

Type changes (`@tanstack/ai`, additive + backwards-compatible):
- New `UsageTotals` named type with optional `cost` and `costDetails`.
- `RunFinishedEvent.usage` and middleware `UsageInfo`/`FinishInfo.usage`
  reuse `UsageTotals` so they can't drift.

Tests: 53/53 in `@tanstack/ai-openrouter` (new `cost-capture.test.ts`
covering SSE/JSON parsing, `attachCostCapture` clone semantics,
`CostStore` race regression, resilience when a preceding response hook
disturbs the body). Full PR suite green across 40 projects.
… CRLF SSE

Two correctness gaps surfaced during review:

1. Concurrent requests on the same adapter blocked each other. `take(id)`
   fell through to `Promise.allSettled([...pendingParses])`, which waited
   for every in-flight parse — so an unrelated long-running stream could
   delay or effectively hang an already-completed request's RUN_FINISHED.
   Parses now announce their response id (`announceId`) as soon as they
   see it, and `take(id)` awaits only the matching parse, falling back
   to a race between the next announcement and the current wave of
   parses draining when no match has been announced yet.

2. The SSE parser only split on `\n\n`, so spec-compliant CRLF-framed
   streams (`\r\n\r\n`) yielded no events and cost was silently dropped.
   Normalize CR/CRLF to LF at decode time before splitting.

Adds two regression tests: per-id isolation on CostStore, and a
CRLF-framed body through the full hook.
… frame

Two more correctness holes surfaced in review of the previous fix:

1. A parse that announced its id but produced no `usage.cost` used to
   clear its `idToParse` entry in `parse.finally()`, so a subsequent
   `take(id)` found no match and fell through to the pending-parses
   wait — reintroducing head-of-line blocking on unrelated concurrent
   streams, exactly for the common "response had no cost field" case.
   `idToParse` now keeps settled entries around briefly (TTL cleanup)
   so `take(id)` can resolve undefined without waiting on other parses.

2. The SSE loop only processed `\n\n`-delimited frames and then broke
   on EOF, so an EOF-terminated response missing the trailing blank
   line dropped its final usage chunk. Flush whatever is left in the
   buffer as a final event when `done` fires, reusing a small
   `applyEvent` helper so the inline and flush paths don't diverge.

Adds regression tests for both paths: a no-cost completion on a shared
store (proves it doesn't block on an in-flight unrelated parse) and an
EOF-terminated SSE body (proves the trailing frame is flushed).
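
The per-id isolation described in the two fixes above can be illustrated with a much-simplified store. This sketch assumes the response id is already known when a parse is registered, so it omits the `announceId` step, the fallback race, and TTL cleanup. `CostStore`, `CostInfo`, and `take` are names from this PR; `track` is hypothetical.

```typescript
interface CostInfo {
  cost: number
}

class CostStore {
  // One entry per response id. Settled entries stay in the map until taken,
  // so a parse that finished without cost still resolves its own take(id).
  private readonly parses = new Map<string, Promise<CostInfo | undefined>>()

  // Hypothetical registration method: a failed parse counts as "no cost".
  track(id: string, parse: Promise<CostInfo | undefined>): void {
    this.parses.set(id, parse.catch(() => undefined))
  }

  // Await only the parse that belongs to `id`; unrelated in-flight parses
  // are never consulted. Entries are consumed once.
  async take(id: string): Promise<CostInfo | undefined> {
    const parse = this.parses.get(id)
    if (parse === undefined) return undefined
    this.parses.delete(id)
    return parse
  }
}
```

With this shape, a long-running stream registered under another id cannot delay `take(id)` for a request that has already finished, including the common case where the response carried no cost.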
The adapter-wide HTTPClient hook is shared by every `chat.send` call,
including `structuredOutput()` which uses `stream: false`. That path
never reads `costStore`, so cloning and second-parsing the JSON body
was pure overhead — extra allocation for every structured-output
response, plus a short-lived entry held in `CostStore` until TTL.

Gate the hook on `content-type: text/event-stream` and drop the
now-unreachable JSON parse branch. The existing URL filter narrows
to `/chat/completions`; the content-type check further narrows to
streaming responses only. Updates the `non-streaming JSON` test to
verify the hook leaves those responses untouched.
…ate stream abort

Deferring RUN_FINISHED until after the for-await loop drains (so the
trailing usage chunk and tee'd cost parse can settle) meant that any
stream error after `finishReason` but before the stream closed would
fall into the outer catch and surface RUN_ERROR — even though the run
was logically complete and the SDK had already delivered the final
choice. Pre-patch, processChoice emitted RUN_FINISHED inline on
finishReason, so a late abort was harmless.

Wrap the for-await in an inner try/catch. If the stream errors before
a terminal finishReason was seen, rethrow so the outer catch emits
RUN_ERROR unchanged. If it errors after, swallow and fall through to
the RUN_FINISHED emission — cost is read via the usual costStore.take
path and is simply omitted if the parser didn't make it that far.

Adds a regression test with an async iterable that throws after
yielding the finishReason chunk; asserts RUN_FINISHED is present and
RUN_ERROR is not.
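
A minimal sketch of the nested try/catch described above. The `Chunk` and `RunEvent` shapes and the `drain` function are hypothetical simplifications; the real adapter yields events and does much more per chunk.

```typescript
interface Chunk {
  delta?: string
  finishReason?: string
}

type RunEvent =
  | { type: 'RUN_FINISHED'; finishReason: string | undefined }
  | { type: 'RUN_ERROR'; message: string }

async function drain(stream: AsyncIterable<Chunk>): Promise<RunEvent> {
  let finishReason: string | undefined
  try {
    try {
      for await (const chunk of stream) {
        if (chunk.finishReason !== undefined) finishReason = chunk.finishReason
      }
    } catch (err) {
      // Before a terminal finishReason the run really failed: rethrow so the
      // outer catch reports RUN_ERROR. After it, the run is logically
      // complete, so swallow the error and fall through.
      if (finishReason === undefined) throw err
    }
    return { type: 'RUN_FINISHED', finishReason }
  } catch (err) {
    return { type: 'RUN_ERROR', message: err instanceof Error ? err.message : String(err) }
  }
}
```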
… if cost captured

When the stream aborts after finishReason but before the trailing
usage chunk, the tee'd cost parser can still have finished earlier
(the branches are independent) and populated `costStore`. The previous
helper turned a missing `finalUsage` into `{0, 0, 0, cost: X}` — a
"successful run with 0 tokens but $X cost" signal that's actively
wrong for downstream billing and telemetry consumers.

Drop the zero-fill: if no token counts arrived, return `undefined`
for `usage` even if `costInfo` was captured. Losing cost in this rare
edge case is better than fabricating token counts that never existed.

Adds a regression test that stubs an async iterable throwing after
finishReason (with no usage on that chunk) and pre-seeds costStore to
simulate the tee beating the SDK consumer. Asserts RUN_FINISHED's
usage is absent.
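
The rule above (no token counts means no `usage`, even when cost was captured) might look like this. `buildRunFinishedUsage` is the helper name that appears in the review walkthrough; its signature here, and the token-count field names, are assumptions for illustration.

```typescript
interface TokenCounts {
  promptTokens: number
  completionTokens: number
  totalTokens: number
}

interface CostInfo {
  cost: number
  costDetails?: {
    upstreamInferenceCost?: number | null
    cacheDiscount?: number | null
  }
}

type UsageTotals = TokenCounts & Partial<CostInfo>

function buildRunFinishedUsage(
  finalUsage: TokenCounts | undefined,
  costInfo: CostInfo | undefined,
): UsageTotals | undefined {
  // Never fabricate zero-token counts to carry a cost: omit usage entirely.
  if (finalUsage === undefined) return undefined
  // Cost is optional; without it the token counts are returned unchanged.
  if (costInfo === undefined) return { ...finalUsage }
  return { ...finalUsage, ...costInfo }
}
```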
…k CRLF normalize

The previous `\r\n?` → `\n` normalization ran per chunk, which is unsafe
when CRLF straddles a read boundary. A lone `\r` at the end of chunk N
got rewritten to `\n` before the paired `\n` arrived in chunk N+1;
concatenating the two produced a false `\n\n` frame separator in the
middle of a line.

Match all spec-compliant separator patterns (`\r\n\r\n`, `\r\r`, `\n\n`)
against the un-normalized buffer via regex, and split `extractDataPayload`
lines on any of `\r\n|\r|\n`. Frame detection no longer depends on byte-
level chunk alignment, and frames keep their original bytes so the data
payload is parsed as the server sent it.
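
To make the chunk-boundary issue concrete, here is a small sketch of frame detection against the un-normalized buffer. `SseFrameSplitter` is a hypothetical class, and the body of `extractDataPayload` is an assumption (only its name and line-splitting pattern are mentioned above); it is not the code in `cost-capture.ts`.

```typescript
// The three separator patterns named above, matched against the raw buffer.
const FRAME_SEPARATOR = /\r\n\r\n|\r\r|\n\n/

class SseFrameSplitter {
  private buffer = ''

  // Append a decoded chunk and return every frame that is now complete.
  push(chunk: string): string[] {
    this.buffer += chunk
    const frames: string[] = []
    let match = FRAME_SEPARATOR.exec(this.buffer)
    while (match !== null) {
      frames.push(this.buffer.slice(0, match.index))
      this.buffer = this.buffer.slice(match.index + match[0].length)
      match = FRAME_SEPARATOR.exec(this.buffer)
    }
    return frames
  }

  // At EOF, emit whatever is left as a final frame (no trailing separator).
  flush(): string[] {
    const rest = this.buffer
    this.buffer = ''
    return rest.trim() === '' ? [] : [rest]
  }
}

// Join the `data:` lines of one frame, splitting on any SSE line ending.
function extractDataPayload(frame: string): string | undefined {
  const data = frame
    .split(/\r\n|\r|\n/)
    .filter((line) => line.startsWith('data:'))
    .map((line) => line.slice(5).replace(/^ /, ''))
  return data.length > 0 ? data.join('\n') : undefined
}
```

A lone `\r` at the end of one chunk stays in the buffer as-is, so when the paired `\n` arrives the two are seen together as a single CRLF rather than as a blank line.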
…code

Clarify in the adapter why RUN_FINISHED is held until the stream fully
drains: cost/tokens are part of the cross-package `RUN_FINISHED.usage`
contract (and the middleware `onUsage` hook fires off that exact
payload), so emitting early would force callers into a separate event
or drop cost altogether. The added wait is bounded by the server
flushing its trailing usage chunk on the same SSE connection (~10-20ms),
not an additional network round-trip.
coderabbitai bot commented Apr 18, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4cc16100-ec90-4fd5-ab3d-3e4412d4c93b

📥 Commits

Reviewing files that changed from the base of the PR and between 55adc43 and ab571ba.

📒 Files selected for processing (3)
  • packages/typescript/ai-openrouter/src/adapters/cost-capture.ts
  • packages/typescript/ai-openrouter/tests/openrouter-adapter.test.ts
  • packages/typescript/ai/src/types.ts
✅ Files skipped from review due to trivial changes (1)
  • packages/typescript/ai-openrouter/tests/openrouter-adapter.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/typescript/ai/src/types.ts
  • packages/typescript/ai-openrouter/src/adapters/cost-capture.ts

📝 Walkthrough

Walkthrough

Adds out-of-band per-request USD cost capture for the OpenRouter adapter by cloning and parsing SSE responses via an HTTPClient response hook, buffering results in a CostStore, and attaching cost/costDetails to RUN_FINISHED.usage. Extends core types to include optional cost fields; updates adapter flow and tests.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Type System Extensions: packages/typescript/ai/src/types.ts, packages/typescript/ai/src/activities/chat/middleware/types.ts | Introduce UsageTotals (includes optional cost and costDetails); switch RunFinishedEvent.usage, UsageInfo, and FinishInfo.usage to use the new type. |
| Cost Capture Infrastructure: packages/typescript/ai-openrouter/src/adapters/cost-capture.ts | New module: CostInfo type, CostStore with TTL and concurrency handling, SSE parser hook createCostCaptureHook(), and attachCostCapture() to augment an HTTPClient without mutating callers. |
| OpenRouter Adapter Integration: packages/typescript/ai-openrouter/src/adapters/text.ts | Adapter now constructs a cost-enabled client, defers RUN_FINISHED until stream drain, accumulates final usage, queries CostStore.take(responseId) and synthesizes RUN_FINISHED.usage including cost; adjusts error and finish handling and some config typing. |
| Documentation & Changeset: .changeset/openrouter-cost-tracking.md, README.md, docs/adapters/openrouter.md | Add changeset, README bullet, and adapter docs covering cost tracking, examples, and behavior when cost is absent. |
| Tests: packages/typescript/ai-openrouter/tests/cost-capture.test.ts, packages/typescript/ai-openrouter/tests/openrouter-adapter.test.ts | Add comprehensive unit and integration tests for SSE parsing, hook behavior, CostStore concurrency semantics, adapter integration, custom httpClient wrapping, and stream-abort regressions. |
| Misc (small): packages/typescript/ai-openrouter/src/adapters/text.ts (type alias change) | Change OpenRouterConfig from interface extending SDKOptions to type OpenRouterConfig = SDKOptions; update related param types. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Adapter as OpenRouter Adapter
    participant Hook as HTTPClient Hook
    participant SSE as SSE Stream
    participant CostStore as CostStore
    participant App

    Client->>Adapter: chatStream(request)
    Adapter->>Hook: build client with cost-capture hook
    Adapter->>SSE: Open chat completion stream

    SSE->>Hook: HTTP Response (text/event-stream)
    Hook->>Hook: clone Response body (tee)
    Hook->>SSE: return untouched stream to adapter
    Hook->>Hook: parse cloned SSE (extract usage.cost & id)
    Hook->>CostStore: set(id, {cost, costDetails})

    par Streaming messages and cost extraction
        SSE->>Adapter: yield chunk deltas (content, token usage)
        Adapter->>Adapter: accumulate finalUsage, finishReason
    and
        Hook->>CostStore: store extracted cost
    end

    SSE->>Adapter: trailing usage-only chunk (finalUsage)
    Adapter->>CostStore: take(responseId)
    CostStore-->>Adapter: {cost, costDetails} | undefined
    Adapter->>Adapter: buildRunFinishedUsage(finalUsage, costInfo)
    Adapter->>App: emit RUN_FINISHED with usage (+ cost)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇
I nibble streams and clone with care,
Sniff out the cents hiding in the air.
No extra hops, no latency cost—
I tuck USD secrets where they won't be lost.
Hooray, a rabbit's feast: cost tracking at last!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 30.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change: capturing per-request cost from OpenRouter chat responses and surfacing it on RUN_FINISHED. |
| Description check | ✅ Passed | The description comprehensively covers all aspects of the change, including the approach, implementation details, type changes, correctness considerations, and testing; all sections of the template are addressed. |
| Linked Issues check | ✅ Passed | The PR fully implements all objectives from issue #468: captures cost via HTTPClient hook [#468], defers RUN_FINISHED for trailing chunks [#468], extends RunFinishedEvent.usage with optional cost fields [#468], preserves caller httpClient [#468], adds tests and docs [#468]. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to #468: cost-capture hook implementation, type extensions for cost fields, adapter integration, tests, and documentation; no unrelated changes detected. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
packages/typescript/ai/src/types.ts (1)

796-806: costDetails comment says "loosely typed" but the shape is actually locked to OpenRouter's two fields.

The JSDoc argues the type must be loose to accommodate provider divergence (BYOK upstream, cache discounts, per-tier rates, ...), but costDetails only declares upstreamInferenceCost and cacheDiscount. Any future adapter that reports, say, a tier-specific rate would be forced to either omit it or as any-cast — exactly what the comment says should be avoided.

If the intent is genuinely "loose", consider adding an index signature so additional provider-specific keys are type-legal without a cast. Otherwise, tighten the comment to match the actual (narrow, OpenRouter-shaped) contract.

♻️ Suggested adjustment
   costDetails?: {
     upstreamInferenceCost?: number | null
     cacheDiscount?: number | null
+    [key: string]: number | null | undefined
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai/src/types.ts` around lines 796 - 806, The JSDoc
promises a "loosely typed" costDetails but the declared shape only allows
upstreamInferenceCost and cacheDiscount; update the type for costDetails (in
packages/typescript/ai/src/types.ts, symbol: costDetails) to include an index
signature (or use Record<string, number | null>) while keeping the existing
named keys so provider-specific numeric fields are allowed without casting; this
ensures adapters can add tier/rate/BYOK fields legally without changing the
comment.
packages/typescript/ai-openrouter/src/adapters/cost-capture.ts (1)

161-167: Minor: content-type check is case-sensitive.

content-type header values are case-insensitive per RFC; OpenRouter today uses lowercase text/event-stream, but a proxy on the path could legitimately return e.g. Text/Event-Stream and cost capture would silently skip. Cheap to harden:

♻️ Proposed tweak
-    const contentType = res.headers.get('content-type') ?? ''
+    const contentType = (res.headers.get('content-type') ?? '').toLowerCase()
     // Cost capture is only wired for streaming chat completions. Non-SSE
     // responses on `/chat/completions` (e.g. `structuredOutput()` which
     // calls `chat.send({ stream: false })`) never consume `costStore` —
     // skipping them here avoids cloning the response and second-parsing
     // potentially large JSON bodies for no downstream consumer.
     if (!contentType.includes('text/event-stream')) return
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/typescript/ai-openrouter/src/adapters/cost-capture.ts` around lines
161 - 167, The content-type check in the cost capture branch is case-sensitive
and may miss valid SSE responses; update the check around the contentType
variable so it compares case-insensitively (e.g. normalize contentType with
toLowerCase() or use a case-insensitive regex) before testing for
'text/event-stream' and keep the early return behavior unchanged to avoid extra
parsing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/typescript/ai-openrouter/tests/openrouter-adapter.test.ts`:
- Around line 6-19: Merge the two separate type imports from
'../src/adapters/text' into one line (exported types OpenRouterTextAdapter and
OpenRouterTextModelOptions) to remove import/no-duplicates, and replace the
typed vi.importActual call in the vi.mock helper (currently using await
vi.importActual<typeof import('@openrouter/sdk')>('@openrouter/sdk')) with an
untyped runtime import (await vi.importActual('@openrouter/sdk')) or cast the
result to any so you avoid the banned import() type annotation; keep the other
type imports (CostStore, StreamChunk, Tool) and the module-scope mockSend
variable unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 90bbcd91-81f2-44c8-9463-74a171a180c3

📥 Commits

Reviewing files that changed from the base of the PR and between 2d1fd08 and 55adc43.

📒 Files selected for processing (9)
  • .changeset/openrouter-cost-tracking.md
  • README.md
  • docs/adapters/openrouter.md
  • packages/typescript/ai-openrouter/src/adapters/cost-capture.ts
  • packages/typescript/ai-openrouter/src/adapters/text.ts
  • packages/typescript/ai-openrouter/tests/cost-capture.test.ts
  • packages/typescript/ai-openrouter/tests/openrouter-adapter.test.ts
  • packages/typescript/ai/src/activities/chat/middleware/types.ts
  • packages/typescript/ai/src/types.ts

Comment thread packages/typescript/ai-openrouter/tests/openrouter-adapter.test.ts Outdated
Merge the two separate type imports from '../src/adapters/text' into
one statement (import/no-duplicates) and replace the inline `typeof
import('@openrouter/sdk')` type annotation with a top-level
`import type * as OpenRouterSDK` (@typescript-eslint/consistent-type-imports).

Picked up in CodeRabbit review of TanStack#469.
Content-Type header values are case-insensitive per RFC 9110.
OpenRouter today serves lowercase `text/event-stream` but a proxy on
the path could return a different casing, which would make cost
capture silently skip a real SSE response. Lowercase the header before
the substring match.

Picked up in CodeRabbit review of TanStack#469.
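
For reference, the scope guard after this change can be sketched as a pure function. `shouldCaptureCost` is a hypothetical name, and the exact form of the URL filter is an assumption (the commits only say it narrows to `/chat/completions`); the lowercase-then-substring match follows the commit message above.

```typescript
// Hypothetical guard: only streaming chat completions are cloned and parsed.
function shouldCaptureCost(url: string, contentType: string | null): boolean {
  // Assumed form of the existing URL filter.
  if (!url.includes('/chat/completions')) return false
  // Media types are case-insensitive, so normalize before matching.
  return (contentType ?? '').toLowerCase().includes('text/event-stream')
}
```

Non-streaming JSON responses, such as those from `structuredOutput()`, fail the second check and are returned without being cloned.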
…etails

The JSDoc promised "loosely typed" costDetails to accommodate provider
divergence (BYOK upstream costs, cache discounts, per-tier rates) but
the declared shape only allowed two OpenRouter-specific keys, so any
other adapter would have been forced to `as any`-cast to report its
own breakdown. Add a numeric index signature so additional keys are
type-legal without a cast, matching the documented intent.

Picked up in CodeRabbit review of TanStack#469.
Development

Successfully merging this pull request may close these issues.

feat(ai-openrouter): surface per-request cost on RUN_FINISHED

1 participant