Skip to content

feat(ai-openrouter): surface per-request cost on RUN_FINISHED #468

@season179

Description

@season179

Problem

OpenRouter returns the authoritative per-request USD cost inline in every chat completion response (usage.cost, plus usage.cost_details with upstream_inference_cost and cache_discount) — see https://openrouter.ai/docs/use-cases/usage-accounting. TanStack AI's @tanstack/ai-openrouter adapter currently discards this data, so apps that rely on OpenRouter for prod traffic have no first-class way to observe spend from the adapter's event stream.

Two things make this tricky to solve at the call site:

  1. The @openrouter/sdk Zod parser strips the fields. The SDK's response schema doesn't declare cost / cost_details, so by the time a chunk reaches the adapter, the fields are gone. A separate /generation?id=… lookup is possible but costs an extra HTTP round-trip per request.
  2. Cost can't be reconstructed locally from tokens × price. OpenRouter routes the same model id to different upstream providers (primary, fallback, BYOK), each with different prices, and applies cache discounts and BYOK upstream costs the SDK can't see. A static price table would silently drift and produce wrong numbers.

Proposal

Capture cost out-of-band, from the same response, via the SDK's public HTTPClient.addHook('response', …) API:

  • The hook calls Response.clone() (tees the body via ReadableStream.tee()) and parses the clone to pull usage.cost / usage.cost_details before the Zod parser sees them.
  • The SDK's stream consumer reads the other branch untouched — no extra HTTP request, no added end-of-stream latency. Cost arrives in the trailing SSE chunk, which the adapter was already waiting on for final token usage.
  • Captured values are attached to RUN_FINISHED under usage.cost (USD) and usage.costDetails.{upstreamInferenceCost, cacheDiscount}.

Type surface (additive, backwards-compatible)

Extend RunFinishedEvent.usage in @tanstack/ai with optional cost and costDetails. The middleware UsageInfo (onUsage) and FinishInfo.usage (onFinish) reuse the same shape so middleware authors can read cost without casts. Adapters that don't populate cost are unaffected.

interface UsageTotals {
  promptTokens: number
  completionTokens: number
  totalTokens: number
  cost?: number
  costDetails?: {
    upstreamInferenceCost?: number
    cacheDiscount?: number
  }
}

Consumer example

for await (const chunk of stream) {
  if (chunk.type === 'RUN_FINISHED') {
    console.log('USD:', chunk.usage?.cost)
    console.log('Cache discount:', chunk.usage?.costDetails?.cacheDiscount)
  }
}

Scope

  • @tanstack/ai-openrouter — adapter attaches the cost-capture hook, defers RUN_FINISHED until the stream fully drains so trailing usage-only chunks are included.
  • @tanstack/aiUsageTotals type + additive fields on RunFinishedEvent.usage, UsageInfo, FinishInfo.usage.
  • No change to other adapters. Other providers can opt in later via the same type surface.

Alternatives considered

  • /generation?id=… lookup after the stream ends — adds an extra HTTP round-trip per request; also tends to race OpenRouter's internal cost aggregation and can return null if called too quickly.
  • Local tokens × price table — unreliable (see above).
  • New event type (COST_REPORTED) instead of extending usage — more intrusive on consumers; usage is the natural home for cost and keeps the agent loop and middleware unaware of the new field if they don't want it.

Happy to open a PR — branch is ready and full PR suite (lint, types, build, tests, docs, knip, sherif) is green across all 40 projects.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions