workers-ai-provider@3.2.0

Latest

github-actions released this 16 Jun 10:01

632e561

Minor Changes

#573 4f19489 Thanks @threepointone! - Add AI Gateway routing for third-party catalog models to createWorkersAI, with capability-driven transport selection, the full provider registry, a bring-your-own-provider wrapper, typed errors, and client/server fallback.

Experimental. This is a substantial new surface for the package — well beyond its original job of wrapping Workers AI — and several behaviors rely on undocumented AI Gateway internals (the cf-aig-run-id resume buffer, per-provider run-path wire formats). Treat the entire third-party / gateway surface as experimental: the API may change, and provider coverage maturity varies (only the run-catalog providers are live-verified end-to-end). It does not affect the existing stable Workers AI / AI Search APIs.

createWorkersAI is the single public entry point. Pass an optional providers array (wire-format plugins from the sub-paths below). When set, a "<provider>/<model>" catalog slug passed to the provider (or .chat) is routed through AI Gateway automatically, while @cf/... ids continue to build Workers AI models.

Each slug is resolved against a registry of every AI Gateway provider, and the transport is picked from the requested options: the run path (env.AI.run) for resumable streaming (cf-aig-run-id, the default, on the unified-billing run catalog), or the gateway path (env.AI.gateway(id).run([…])) for BYOK providers, server-side fallback, and caching. Incompatible option combinations (e.g. resume: true with fallback.mode: "server", or resume/transport: "run" on a BYOK provider) throw a clear GatewayDelegateError; resume-disabling combinations warn loudly.

This is fully additive: leaving providers unset preserves the prior behavior exactly, and passing a catalog slug without it throws a helpful error. The chat factory's settings argument is typed from the model id literal: a "<provider>/<model>" slug autocompletes DelegateCallOptions, while a @cf/... id autocompletes WorkersAIChatSettings. gateway is optional for catalog routing; when unset, requests use the account's "default" AI Gateway. Set gateway (here or per call) to target a specific one.
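The id-based routing described above can be pictured with a small, self-contained sketch. It is not the package's implementation: the `Route` type, the `classifyModelId` function, and the error messages are hypothetical names chosen for illustration, and the real delegate also resolves the slug against its provider registry and picks a transport.

```typescript
// Hypothetical sketch of id-based routing; none of these names come from the package.
type Route =
  | { kind: "workers-ai"; model: string }
  | { kind: "gateway-catalog"; provider: string; model: string };

function classifyModelId(id: string, providersConfigured: boolean): Route {
  // Native Workers AI ids keep their existing behavior.
  if (id.startsWith("@cf/")) {
    return { kind: "workers-ai", model: id };
  }

  // Anything else must look like "<provider>/<model>".
  const slash = id.indexOf("/");
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Unrecognized model id: ${id}`);
  }

  // A catalog slug without a providers array is rejected with a hint.
  if (!providersConfigured) {
    throw new Error(
      `"${id}" looks like a catalog slug; pass a providers array to route it through AI Gateway.`,
    );
  }

  return {
    kind: "gateway-catalog",
    provider: id.slice(0, slash),
    model: id.slice(slash + 1),
  };
}
```

Only the first slash separates provider from model in this sketch, so a model name that itself contains a slash stays intact.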

New sub-path exports:
- workers-ai-provider/openai, workers-ai-provider/anthropic, workers-ai-provider/google — provider plugins keyed by wire format. One openai plugin serves the OpenAI-compatible long tail (deepseek, xai/grok, groq, mistral, perplexity, cerebras, openrouter, fireworks) plus the unified-catalog chat providers alibaba (Qwen) and minimax. @ai-sdk/openai, @ai-sdk/anthropic, and @ai-sdk/google are optional peer dependencies; install only the ones whose wire formats you use. The openai plugin is required for the run path (see below). Providers whose gateway-path URL isn't reproducible from the shared builder (cohere, baseten, parallel, azure-openai, google-vertex) and provider-native/non-chat providers are bring-your-own-provider only.
- workers-ai-provider/gateway — createGatewayFetch / createGatewayProvider wrap any @ai-sdk/* provider so its traffic flows through AI Gateway (provider id detected from the request URL, or set explicitly). Use it for provider-native or non-chat providers the slug routing can't auto-wire (bedrock, replicate, audio/image), or for full control of the underlying provider.
The transport types, error classes (WorkersAIGatewayError, WorkersAIFallbackError, GatewayDelegateError), the registry helpers, DelegateCallOptions, and createResumableStream are re-exported from the package root.

Features:
- Provider registry (GATEWAY_PROVIDERS, findProviderBySlug, detectProviderByUrl) maps slugs to gateway provider ids, wire formats, billing model, and run-catalog membership. Covers every provider in the AI Gateway directory (OpenAI, Anthropic, Google AI Studio/Vertex, xAI, Groq, DeepSeek, Mistral, Perplexity, Cerebras, OpenRouter, Cohere, Baseten, Parallel, Azure OpenAI, Amazon Bedrock, HuggingFace, Replicate, Fal, Ideogram, Cartesia, Deepgram, ElevenLabs — plus Fireworks), with URL host patterns so createGatewayFetch auto-detects each from the wrapped provider's request URL. Also includes the unified-catalog chat providers alibaba (Qwen) and minimax on the resumable run catalog (verified live: OpenAI-wire, cf-aig-run-id on streams); these are run-path only (gatewayPath: false — not native gateway providers), so caching, server-side fallback, and transport: "gateway" are rejected with a clear GatewayDelegateError instead of failing upstream.
- Metadata & logging — metadata (custom log attributes for spend attribution) and collectLog are first-class call options on both transports. On the run path they fold into the typed gateway options; on the gateway path they become cf-aig-metadata / cf-aig-collect-log headers (bigint metadata values are coerced to strings). Call-level metadata merges over (and wins against) any metadata set via gateway: { metadata }.
- BYOK — set byok: true (+ supply the key via extraHeaders) to forward the upstream provider key on the gateway path; otherwise provider auth headers are stripped so unified billing / the gateway's stored key applies.
- Client-side fallback (fallback.mode: "client") keeps resume per leg — a failed pre-stream dispatch falls through to the next model; if all fail, a WorkersAIFallbackError carries the per-attempt tree. Server-side fallback (fallback.mode: "server") routes same-vendor fallbacks through the gateway path.
- Typed errors — WorkersAIGatewayError (with a coarse code, a recoverable hint, and the parsed CF/provider envelope) and WorkersAIFallbackError (attempt tree). Helpers classifyStatus / extractErrorMessage are exported.
- Abort + gateway options are passed through on both transports.
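As a rough illustration of the client-side fallback bullet above, the sketch below tries each model in order and collects one record per failed attempt. `FallbackError`, `Attempt`, and `withClientFallback` are hypothetical stand-ins, not the package's WorkersAIFallbackError or its internals, and the flat attempt list is a simplification of the attempt tree the release describes.

```typescript
// Hypothetical shapes for illustration only.
interface Attempt {
  model: string;
  error: unknown;
}

class FallbackError extends Error {
  attempts: Attempt[];

  constructor(attempts: Attempt[]) {
    super(`All ${attempts.length} fallback attempts failed`);
    this.name = "FallbackError";
    this.attempts = attempts;
  }
}

// Tries each model in order. A dispatch that rejects before producing a
// result falls through to the next model; if every model fails, the
// collected attempts are thrown together.
async function withClientFallback<T>(
  models: string[],
  dispatch: (model: string) => Promise<T>,
): Promise<T> {
  const attempts: Attempt[] = [];
  for (const model of models) {
    try {
      return await dispatch(model);
    } catch (error) {
      attempts.push({ model, error });
    }
  }
  throw new FallbackError(attempts);
}
```

Because each leg is dispatched independently, any per-leg behavior (such as stream resume) can stay attached to whichever leg succeeds.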
On the run path, the response stream is wrapped so a transient mid-stream drop reconnects through the gateway resume endpoint (resume?from=N) transparently — the @ai-sdk parser never sees the break. from is an SSE event index, so the wrapper emits only complete events and realigns on the boundary after a drop (no duplicated or truncated bytes). When the gateway buffer expires (404, ~5.5 min TTL), an onResumeExpired policy controls whether the stream errors ("error", the default) or ends with partial output ("accept-partial").
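The "complete events only" rule can be sketched as follows. This is an illustration under stated assumptions, not the package's wrapper: it assumes events are separated by a blank line ("\n\n"), and it assumes the offset to resume from equals the number of complete events already delivered. `takeCompleteEvents` and `EventCursor` are hypothetical names.

```typescript
// Splits buffered SSE text into complete events (terminated by a blank line)
// and returns the unterminated tail separately.
function takeCompleteEvents(buffer: string): { events: string[]; rest: string } {
  const parts = buffer.split("\n\n");
  const rest = parts.pop() ?? "";
  return { events: parts, rest };
}

class EventCursor {
  delivered = 0; // count of complete events passed downstream
  pending = ""; // bytes of a not-yet-complete event

  // Feed a network chunk; returns only events that are fully received.
  push(chunk: string): string[] {
    const { events, rest } = takeCompleteEvents(this.pending + chunk);
    this.pending = rest;
    this.delivered += events.length;
    return events;
  }

  // After a dropped connection: discard the partial event and report the
  // offset to resume from (assumption: the next undelivered event index).
  resumeOffset(): number {
    this.pending = "";
    return this.delivered;
  }
}
```

Dropping the partial tail before resuming is what avoids duplicated or truncated bytes in this sketch: the interrupted event is re-read whole from the resumed stream.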

For cross-invocation recovery (e.g. a new Durable Object invocation after eviction), createResumableStream is exported and accepts no initial body plus a fromEvent offset — it re-attaches by resuming directly from that event index. An onProgress(eventOffset) callback (also surfaced on the delegate as a call option) reports the live SSE event offset so callers can persist { runId, eventOffset } and re-attach later.

Run-path wire format (per-provider): on the resumable run path (env.AI.run), Cloudflare's unified catalog normalizes most providers to OpenAI chat-completions wire (so google/… is parsed with the openai plugin on the run path, even though the gateway path uses the native google plugin), but passes Anthropic through natively (content[].text, native tool shape) — so anthropic/… is parsed with the anthropic plugin on both paths. The registry records this as runWireFormat (defaults to "openai"). Include openai for the openai-wire run-path providers (openai, google, xai/grok, groq) and anthropic to use anthropic/…; the delegate throws a clear GatewayDelegateError naming the exact plugin a transport needs if it's missing.
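The per-transport plugin choice described above can be sketched with a toy registry. Only the runWireFormat field name and its "openai" default come from the release notes; the `wireFormat` field, the `registry` object, and `requiredPlugin` are hypothetical, and the three entries are a small subset chosen for illustration.

```typescript
type WireFormat = "openai" | "anthropic" | "google";
type Transport = "run" | "gateway";

interface RegistryEntry {
  wireFormat: WireFormat; // hypothetical field: format used on the gateway path
  runWireFormat?: WireFormat; // format on the run path; treated as "openai" when omitted
}

// Toy registry covering only the providers named in the paragraph above.
const registry: Record<string, RegistryEntry> = {
  openai: { wireFormat: "openai" },
  google: { wireFormat: "google" },
  anthropic: { wireFormat: "anthropic", runWireFormat: "anthropic" },
};

// Returns the plugin a provider needs on a given transport, or throws an
// error naming the missing plugin.
function requiredPlugin(
  provider: string,
  transport: Transport,
  installed: WireFormat[],
): WireFormat {
  const entry = registry[provider];
  if (!entry) {
    throw new Error(`Unknown provider: ${provider}`);
  }
  const needed =
    transport === "run" ? entry.runWireFormat ?? "openai" : entry.wireFormat;
  if (!installed.includes(needed)) {
    throw new Error(
      `${provider} on the ${transport} path needs the "${needed}" plugin`,
    );
  }
  return needed;
}
```

In this sketch, google resolves to the openai plugin on the run path and to the google plugin on the gateway path, while anthropic resolves to the anthropic plugin on both.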

Patch Changes

#563 231c19b Thanks @slegarraga! - Validate file parts in chat messages before sending them to Workers AI.

Previously every file part in a user message was unconditionally wrapped as
an image_url, regardless of its mediaType. Non-image files (e.g.
application/pdf, audio/*, video/*, application/octet-stream) were
forwarded as if they were valid vision inputs, and a missing mediaType
silently defaulted to image/png, producing a corrupt data URL.

Now convertToWorkersAIChatMessages:
- throws an UnsupportedFunctionalityError when a file part has a
  non-image/* mediaType, or no mediaType at all, instead of forwarding
  broken multimodal content;
- matches the image/ prefix case-insensitively (per RFC 2045), so media
  types such as IMAGE/JPEG are accepted while the caller's original casing
  is preserved in the emitted data URL;
- preserves the provided image mediaType instead of defaulting missing
  media types to image/png.
This is a behavior change: inputs that previously "succeeded" with broken or
defaulted media types now throw a clear, catchable error. Type-correct callers
(the AI SDK always sets mediaType on file parts) are unaffected for valid
image inputs.
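A minimal sketch of the validation rule described in this entry, assuming a simplified file-part shape with base64 data. `toImageDataUrl` and the `FilePartLike` type are hypothetical, and a plain Error stands in for the AI SDK's UnsupportedFunctionalityError.

```typescript
// Simplified, hypothetical stand-in for a file part in a user message.
interface FilePartLike {
  mediaType?: string;
  base64: string;
}

function toImageDataUrl(part: FilePartLike): string {
  const { mediaType, base64 } = part;

  // Reject missing or non-image media types instead of defaulting to image/png.
  // The prefix check is case-insensitive, so "IMAGE/JPEG" passes.
  if (!mediaType || !mediaType.toLowerCase().startsWith("image/")) {
    throw new Error(
      `Unsupported file part media type: ${mediaType ?? "(missing)"}`,
    );
  }

  // The caller's original casing is preserved in the emitted data URL.
  return `data:${mediaType};base64,${base64}`;
}
```

Lowercasing is applied only for the comparison, never to the value that is emitted.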
#575 65e0735 Thanks @threepointone! - Map the AI SDK's forced single-tool choice to the documented named-function form.

Previously toolChoice: { type: "tool", toolName } was downgraded to
tool_choice: "required" (with the tool list filtered to the single function).
Workers AI treats "required" as advisory: on long contexts and reasoning
models (e.g. @cf/google/gemma-4-26b-a4b-it, @cf/qwen/qwq-32b,
@cf/qwen/qwen3-30b-a3b-fp8) the model would "fail open" and answer in prose
instead of calling the requested tool.

Now the provider sends the OpenAI-style named-function form
tool_choice: { type: "function", function: { name } }, which Workers AI
enforces server-side, and keeps the full tool list (matching OpenAI semantics
and preserving tool-result context fidelity).
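The mapping in this entry can be sketched as a pure function. The types below are simplified approximations of the AI SDK's tool-choice shapes and the OpenAI-style wire form, and `mapToolChoice` is a hypothetical name rather than the provider's actual function.

```typescript
// Simplified input shapes (assumption: mirrors the AI SDK's tool-choice union).
type SdkToolChoice =
  | { type: "auto" }
  | { type: "none" }
  | { type: "required" }
  | { type: "tool"; toolName: string };

// OpenAI-style wire form, including the named-function variant.
type WireToolChoice =
  | "auto"
  | "none"
  | "required"
  | { type: "function"; function: { name: string } };

function mapToolChoice(choice: SdkToolChoice): WireToolChoice {
  switch (choice.type) {
    case "tool":
      // Forced single tool: send the named-function form rather than
      // downgrading to "required".
      return { type: "function", function: { name: choice.toolName } };
    default:
      // "auto", "none", and "required" pass through unchanged.
      return choice.type;
  }
}
```

Note that the tool list itself is not filtered in this sketch, matching the entry's statement that the full list is kept.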

Note: forcing a tool on a reasoning model with insufficient max_tokens is
validated server-side and now surfaces as a clear error (Workers AI 8006)
rather than silently producing no tool call.

Additionally, recover forced tool calls that gpt-oss models leak as text.
When a tool is forced, gpt-oss (harmony format) sometimes emits the tool call
as raw JSON in message.content with an empty tool_calls array and
finish_reason: "stop". The provider now detects this — only when a tool was
forced and the leaked JSON's name matches a requested tool — and
reinterprets it as a structured tool call (with finishReason: "tool-calls"
and a warning), across both generateText and streamText. Ambiguous leaks
(harmony channel/role names, hallucinated names) are left untouched to avoid
fabricating bogus calls.
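The guard conditions for recovering a leaked call can be sketched like this. The leaked payload shape ({ name, arguments }) is an assumption made for illustration, since the release notes only say the leaked JSON carries a name; `recoverLeakedToolCall` and `RecoveredCall` are hypothetical names.

```typescript
interface RecoveredCall {
  toolName: string;
  input: string; // JSON-encoded arguments
}

function recoverLeakedToolCall(
  content: string,
  forcedTool: string | undefined,
  requestedTools: string[],
): RecoveredCall | undefined {
  // Only attempt recovery when a tool was actually forced.
  if (!forcedTool) return undefined;

  let parsed: unknown;
  try {
    parsed = JSON.parse(content);
  } catch {
    return undefined; // ordinary prose, not leaked JSON
  }
  if (typeof parsed !== "object" || parsed === null) return undefined;

  // Assumed leak shape: { name: string, arguments: object }.
  const { name, arguments: args } = parsed as {
    name?: unknown;
    arguments?: unknown;
  };

  // Ambiguous or hallucinated names are left untouched.
  if (typeof name !== "string" || !requestedTools.includes(name)) {
    return undefined;
  }
  return { toolName: name, input: JSON.stringify(args ?? {}) };
}
```

Returning undefined in every doubtful case keeps the original text response intact, which is the conservative behavior the entry describes.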
#570 104c4a7 Thanks @threepointone! - Refresh Workers AI model references from the deprecated @cf/moonshotai/kimi-k2.5 to the current @cf/moonshotai/kimi-k2.7-code in the README and inline source documentation.

#576 a360e7a Thanks @threepointone! - Keep structured-output name/description instead of dropping them on native Workers AI models.

Output.object({ schema, name, description }) and generateObject({ schema, schemaName, schemaDescription }) pass a name/description alongside the JSON
schema. On the native @cf/... path the provider previously forwarded only the
bare schema as response_format.json_schema and silently discarded both.

Native Workers AI expects json_schema to be a bare JSON Schema, not
OpenAI's { name, schema, strict } envelope, so we can't just wrap it (that
would break native models). Instead the name is folded into the schema's
standard title keyword and the description into its description keyword —
the payload stays a valid bare schema while the guidance reaches the model.
Existing schema-level title/description are never overwritten and the input
schema is not mutated.
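The folding rule above can be sketched in a few lines. `withSchemaGuidance` and the `JsonSchema` alias are hypothetical names; the sketch only demonstrates the three stated properties (fold into title/description, never overwrite, never mutate the input).

```typescript
type JsonSchema = Record<string, unknown>;

function withSchemaGuidance(
  schema: JsonSchema,
  name?: string,
  description?: string,
): JsonSchema {
  // Shallow copy so the caller's schema object is left untouched.
  const out: JsonSchema = { ...schema };

  // Fold the name into "title" unless the schema already has one.
  if (name !== undefined && out.title === undefined) {
    out.title = name;
  }

  // Fold the description in unless the schema already has one.
  if (description !== undefined && out.description === undefined) {
    out.description = description;
  }

  return out;
}
```

The result is still a bare JSON Schema, so it can be sent as-is where an OpenAI-style { name, schema, strict } envelope would not be accepted.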

Note on issue #559: the reported failure was OpenAI partner models (e.g.
openai/gpt-5.4-mini) rejecting requests with Missing required parameter: 'response_format.json_schema.name'. Partner-model slugs are no longer handled
by this code path at all — they route through the AI Gateway delegate and the
real @ai-sdk/* providers, which build the required json_schema.name envelope
themselves (configure them via createWorkersAI({ binding, providers: [openai] })). This change covers the remaining native-model gap where that guidance was
being dropped.

See #559.
