
fal adapters discard x-fal-billable-units — surface actual billed cost as result.usage #722

@tombeckenham

Description


Summary

The fal adapters (@tanstack/ai-fal) discard fal's response headers, so consumers cannot recover the actual billed cost of a generation. fal returns the real billed quantity in the x-fal-billable-units response header on the result fetch. Combined with the endpoint's unit price (the unit_price field from GET https://api.fal.ai/v1/models/pricing?endpoint_id=…), the cost is simply billableUnits * unitPrice. This is unit-agnostic: the header is already denominated in the priced unit.
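A minimal sketch of that arithmetic, assuming the caller has already fetched unit_price for the endpoint and has the raw header value in hand. The helper name computeBilledCost is hypothetical and not part of @tanstack/ai-fal or @fal-ai/client.

```typescript
// Hypothetical helper: billed cost = billable units * endpoint unit price.
// Returns undefined when the header is missing or not a usable number,
// so callers can distinguish "unknown cost" from "zero cost".
function computeBilledCost(
  billableUnitsHeader: string | null,
  unitPrice: number,
): number | undefined {
  if (billableUnitsHeader === null || billableUnitsHeader.trim() === "") {
    return undefined; // fal sent no x-fal-billable-units value
  }
  const units = Number(billableUnitsHeader);
  if (!Number.isFinite(units) || units < 0) {
    return undefined; // malformed header value
  }
  // No unit conversion: the header is already in the priced unit.
  return units * unitPrice;
}
```

Returning undefined rather than 0 keeps a missing header from being reported as a free generation.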

Today there's no way to get this through the SDK, so apps have to wrap fetch and scrape the header themselves. It'd be great to surface it as result.usage.

Where it's lost

  • @fal-ai/client's resultResponseHandler reads x-fal-request-id (REQUEST_ID_HEADER) but ignores x-fal-billable-units, returning only { data, requestId }.
  • The fal adapters then return only the transformed payload + id:
    • adapters/image.js: fal.subscribe(...) → { id, model, images }
    • adapters/video.js: fal.queue.result(...) in getVideoUrl → { jobId, url }
    • adapters/audio.js: → { id, audio }

So the units never reach the activity layer.
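For illustration, here is a sketch of a result handler that keeps the units instead of dropping them. It is not @fal-ai/client's code: the function name resultResponseHandlerWithUnits is hypothetical, and error handling is reduced to a plain Error where the real client has its own error types. It relies only on the standard Fetch API Response.

```typescript
// Hypothetical variant of a result handler that also returns billable units.
const REQUEST_ID_HEADER = "x-fal-request-id";
const BILLABLE_UNITS_HEADER = "x-fal-billable-units";

interface ResultWithUnits<T> {
  data: T;
  requestId: string;
  billableUnits?: number;
}

// Parse the header into a finite number, or undefined if absent/invalid.
function parseUnits(raw: string | null): number | undefined {
  if (raw === null || raw.trim() === "") return undefined;
  const n = Number(raw);
  return Number.isFinite(n) ? n : undefined;
}

async function resultResponseHandlerWithUnits<T>(
  response: Response,
): Promise<ResultWithUnits<T>> {
  if (!response.ok) {
    throw new Error(`fal result request failed with status ${response.status}`);
  }
  const data = (await response.json()) as T;
  return {
    data,
    requestId: response.headers.get(REQUEST_ID_HEADER) ?? "",
    // The piece the current handler discards.
    billableUnits: parseUnits(response.headers.get(BILLABLE_UNITS_HEADER)),
  };
}
```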

The hook already exists

@tanstack/ai's generateImage/generateAudio activities already pass a usage field through when an adapter provides one, e.g. in runGenerateImage:

if (result.usage) {
  // emitted as usage
}

The adapters just never populate it. generateVideo's result type would need a usage slot added; image/audio already have the passthrough.
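A sketch of the missing video slot, assuming it would mirror the image/audio passthrough. The type and function names (MediaUsage, VideoUrlResult, forwardUsage) are hypothetical; the real @tanstack/ai types may be named and shaped differently.

```typescript
// Hypothetical usage shape, matching the proposal in this issue.
interface MediaUsage {
  billableUnits: number;
  unit?: string;
}

// Today getVideoUrl in adapters/video.js yields { jobId, url }.
// Proposed: an optional usage slot alongside them.
interface VideoUrlResult {
  jobId: string;
  url: string;
  usage?: MediaUsage;
}

// Activity-side passthrough, mirroring the `if (result.usage)` check
// in runGenerateImage: usage is emitted only when the adapter set it.
function forwardUsage(
  result: VideoUrlResult,
  emit: (usage: MediaUsage) => void,
): void {
  if (result.usage) {
    emit(result.usage);
  }
}
```

Keeping usage optional means adapters that cannot report units continue to type-check unchanged.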

Proposed change

Have each fal adapter read x-fal-billable-units off the result response and surface it:

result.usage = { billableUnits: number, unit?: string }

No global state or correlation is needed: the adapter already knows the requestId it's resolving (fal.queue.result({ requestId }) or subscribe's result), so it can read the header off that fetch and attach usage to the object it's about to return.
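A minimal sketch of the "attach usage to the object it's about to return" step, assuming the adapter has already obtained the units for its requestId. withUsage and MediaUsage are hypothetical names used only for illustration.

```typescript
// Hypothetical usage shape from the proposal above.
interface MediaUsage {
  billableUnits: number;
  unit?: string;
}

// Returns the payload untouched when no units were reported, otherwise a
// copy of the payload with a usage field added.
function withUsage<T extends object>(
  payload: T,
  billableUnits: number | undefined,
  unit?: string,
): T | (T & { usage: MediaUsage }) {
  if (billableUnits === undefined) return payload;
  // Omit `unit` entirely rather than setting it to undefined.
  const usage: MediaUsage =
    unit === undefined ? { billableUnits } : { billableUnits, unit };
  return { ...payload, usage };
}
```

In the image adapter this would wrap the existing { id, model, images } return value, leaving the payload shape unchanged for consumers that ignore usage.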

Implementation note: use config.fetch, not responseHandler

A global responseHandler set via fal.config(...) won't work for this: the fal client forces resultResponseHandler per queue operation (queue.js overrides the config's response handler for submit/status/result), clobbering any global one. config.fetch, however, is honoured for every request, so it's the stable place to read the header before the body is parsed. Reading response.headers doesn't consume the body, so the response can be returned untouched.
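A sketch of that config.fetch approach, assuming (as described above) that the fal client routes every request through a user-supplied fetch. createBillableUnitsFetch is a hypothetical name; it records units in an adapter-scoped map keyed by the x-fal-request-id response header, so the adapter can look them up for the requestId it is resolving.

```typescript
// Minimal fetch signature, so any fetch-compatible function can be wrapped.
type FetchLike = (
  input: string | URL | Request,
  init?: RequestInit,
) => Promise<Response>;

// Hypothetical wrapper: reads headers only, never the body.
function createBillableUnitsFetch(
  baseFetch: FetchLike,
  unitsByRequestId: Map<string, number>,
): FetchLike {
  return async (input, init) => {
    const response = await baseFetch(input, init);
    const requestId = response.headers.get("x-fal-request-id");
    const raw = response.headers.get("x-fal-billable-units");
    if (requestId !== null && raw !== null && raw.trim() !== "") {
      const units = Number(raw);
      if (Number.isFinite(units)) {
        unitsByRequestId.set(requestId, units);
      }
    }
    // Headers are read without consuming the body, so the caller's
    // response handler can still parse it normally.
    return response;
  };
}
```

Usage would look something like fal.config({ fetch: createBillableUnitsFetch(fetch, units) }), with the adapter reading units.get(requestId) after the result fetch resolves.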

Why this matters

With LLM chat() cost already flowing through (token usage → cost), media generation is the missing half of the spend picture. Surfacing usage.billableUnits lets consumers compute exact USD cost and emit it to their analytics (PostHog $ai_generation, etc.) without each app reinventing a fetch interceptor + request-id registry.

Context

Found while restoring fal media cost analytics in an app on @tanstack/ai-fal@0.7.21 / @tanstack/ai@0.26.1. We currently work around it with an app-side fetch wrapper that captures x-fal-billable-units keyed by request id; that whole layer would be deleted if result.usage carried the units.

Happy to open a PR if this direction sounds right.

Metadata


Assignees

No one assigned

    Labels

    enhancement: New feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
