@threepointone
Collaborator

This PR is a ground-up cleanup of the workers-ai-provider package. It fixes real bugs (streaming was broken, tool round-trips through the binding failed), adds workarounds for documented Workers AI quirks, renames AutoRAG to AI Search, rewrites the README, and raises the unit test count from 82 to 149, alongside full integration tests across 12 models.

Also migrates all 13 demos off deprecated AI SDK APIs (generateObject -> generateText + Output.object, experimental_generateImage -> generateImage), and adds a new examples/workers-ai demo app.

What was broken

  1. Streaming didn't actually stream. Two bugs: (a) the ReadableStream used an eager async start() that buffered everything before the consumer pulled, and (b) a heuristic silently fell back to non-streaming doGenerate whenever tools were defined. Every streamText call with tools was secretly a generateText wrapped in a fake stream.

  2. fetch-event-stream was a phantom dependency. Listed in the root package.json but not in the package's own — anyone installing from npm outside the monorepo got a runtime crash on any streaming call.

  3. Tool round-trips through the binding were broken. The binding generates IDs like chatcmpl-tool-875d3ec6179676ae but validates them against [a-zA-Z0-9]{9}. It also rejects content: null on tool-calling messages. Both are now normalized before every binding.run() call.

  4. REST API errors were silently swallowed. A 401 or 404 would flow through to response.json() and throw an opaque parse error.

  5. reasoning-delta events had wrong IDs in the simulated streaming path, causing reasoning content to be dropped by the AI SDK.
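
To illustrate the first bug, here is a minimal sketch contrasting an eager `start()` with a demand-driven `pull()`. It is not the provider's code: `tokens`, `eagerStream`, and `lazyStream` are hypothetical names, and the `log` array exists only to make production order observable.

```typescript
// Hypothetical chunk source standing in for the model's token stream.
async function* tokens(log: string[]): AsyncGenerator<string> {
  for (const t of ["Hel", "lo", "!"]) {
    log.push(`produced ${t}`);
    yield t;
  }
}

// Eager: start() drains the whole source up front, whether or not
// anyone is reading. This is the shape of the buffering bug.
function eagerStream(log: string[]): ReadableStream<string> {
  return new ReadableStream<string>({
    async start(controller) {
      for await (const t of tokens(log)) controller.enqueue(t);
      controller.close();
    },
  });
}

// Lazy: pull() produces one chunk each time the consumer asks for more.
function lazyStream(log: string[]): ReadableStream<string> {
  const it = tokens(log);
  return new ReadableStream<string>(
    {
      async pull(controller) {
        const { done, value } = await it.next();
        if (done) controller.close();
        else controller.enqueue(value);
      },
    },
    { highWaterMark: 0 }, // no read-ahead: produce only on demand
  );
}
```

With the eager variant, every chunk has been produced before the first `read()`; with the lazy variant, production follows consumption.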

What's new

  • Proper streaming via a TransformStream pipeline (SSEDecoder -> event mapper) with backpressure. Verified with a timing-based test.
  • Workers AI quirk workarounds: sanitizeToolCallId, normalizeMessagesForBinding, null-finalization chunk filtering, numeric value coercion, dual stream format detection (native vs OpenAI-compatible).
  • createAISearch replaces createAutoRAG (which still works with a deprecation warning). Reflects the AutoRAG -> AI Search rename.
  • Graceful degradation: if binding.run() returns a non-streaming response despite stream: true, it wraps the response as a simulated stream instead of throwing.
  • Premature stream termination detection: emits finishReason: "error" instead of silently reporting "stop".
  • 149 unit tests (was 82), 10 test files (was 7), plus integration tests for REST + binding across 12 models.
  • New example app at examples/workers-ai: Vite + React + Cloudflare Workers with chat (streaming + tool calling + reasoning), image generation, and embeddings. Model dropdowns for each.
  • README rewritten from scratch.
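
The graceful-degradation bullet above can be sketched roughly as follows. This is an illustration only: `SimEvent`, `CompleteResponse`, and `asEventStream` are hypothetical names, and the real event shapes come from the AI SDK rather than from this snippet.

```typescript
// Hypothetical, simplified event shape (the real one is richer).
type SimEvent =
  | { type: "text-delta"; delta: string }
  | { type: "finish"; finishReason: "stop" };

// Hypothetical shape of a complete (non-streaming) binding result.
interface CompleteResponse {
  response: string;
}

// If the binding handed back a stream, pass it through untouched.
// Otherwise replay the complete response as a short simulated stream.
function asEventStream(
  result: ReadableStream<SimEvent> | CompleteResponse,
): ReadableStream<SimEvent> {
  if (result instanceof ReadableStream) return result;
  const complete = result; // narrowed to CompleteResponse
  return new ReadableStream<SimEvent>({
    start(controller) {
      controller.enqueue({ type: "text-delta", delta: complete.response });
      controller.enqueue({ type: "finish", finishReason: "stop" });
      controller.close();
    },
  });
}
```

The caller sees the same output either way; the only difference is that a wrapped response arrives as one delta instead of token by token.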

Breaking changes

None. createAutoRAG preserves "autorag.chat" as the provider name. Models that don't support streaming with tools get graceful degradation (same output, just not token-by-token). Removed peer deps (zod, @ai-sdk/provider-utils) were unused in source.

Demo migrations

All 13 demos updated from deprecated AI SDK APIs:

  • generateObject -> generateText + Output.object (12 demos, ~25 call sites)
  • experimental_generateImage -> generateImage (1 demo)

Notes for reviewers

  1. The streaming fix is the most important change. Before: everything was buffered. After: proper TransformStream pipeline. The backpressure test in stream-text.test.ts ("should deliver chunks incrementally") verifies this with timing assertions.

  2. The simulateStreamFromGenerate fallback is gone, replaced by graceful degradation at the response level. If binding.run() returns an object instead of a stream, doStream wraps it. Three dedicated tests cover this (text, tool calls, reasoning). This is better than the old approach because it tries streaming first and falls back only when the binding refuses.

  3. sanitizeToolCallId is deterministic — same input always produces same output. This matters because the AI SDK stores the tool call ID from step 1 and passes it back in step 2. Both sides go through sanitizeToolCallId, so they match. The normalization is only applied on the binding path (isBinding flag), not REST.

  4. The SSEDecoder class in streaming.ts replaces both the fetch-event-stream dependency and the old parseSSEStream async generator. It's a TransformStream subclass that handles line buffering, partial chunks, and both the "data: " (with a space) and "data:" (no space) prefixes.

  5. Integration tests require credentials. REST tests need CLOUDFLARE_ACCOUNT_ID + CLOUDFLARE_API_TOKEN in .env. Binding tests start a wrangler dev server. Both skip gracefully without credentials. Run with pnpm test:e2e:rest / pnpm test:e2e:binding.

  6. The example app at examples/workers-ai is a standalone Vite + React app, not a demo. It uses the Cloudflare Vite plugin in SPA mode with a plain Workers fetch handler (no Hono). Good for testing the provider end-to-end with a real UI.

  7. The workersai-models.ts TODO (// This needs to be fixed to allow more models) is still there. The Exclude<TextGen, TextToImage> type could incorrectly exclude models that satisfy both interfaces. Worth investigating separately.
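
For reviewers who want a mental model of note 4, here is a minimal sketch of a `TransformStream` subclass that buffers lines across chunk boundaries and accepts both prefix forms. It is not the code in streaming.ts: the class name `SSEDecoderSketch` is hypothetical, and it ignores SSE fields other than `data`.

```typescript
// Decodes bytes into the payload of each "data:" line.
class SSEDecoderSketch extends TransformStream<Uint8Array, string> {
  constructor() {
    const decoder = new TextDecoder();
    let buffer = ""; // holds the trailing partial line between chunks

    const emit = (
      line: string,
      controller: TransformStreamDefaultController<string>,
    ) => {
      if (!line.startsWith("data:")) return; // skip blanks and other fields
      const payload = line.slice(5);
      // Accept both "data: x" and "data:x".
      controller.enqueue(payload.startsWith(" ") ? payload.slice(1) : payload);
    };

    super({
      transform(chunk, controller) {
        buffer += decoder.decode(chunk, { stream: true });
        const lines = buffer.split(/\r?\n/);
        buffer = lines.pop() ?? ""; // last element is incomplete (or empty)
        for (const line of lines) emit(line, controller);
      },
      flush(controller) {
        buffer += decoder.decode();
        if (buffer) emit(buffer, controller); // final line without newline
      },
    });
  }
}
```

A payload split across two network chunks is held in `buffer` until its newline arrives, so the downstream event mapper only ever sees whole payloads.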

**Bug fixes:**

- Fixed phantom dependency on `fetch-event-stream` that caused runtime crashes when installed outside the monorepo. Replaced with a built-in SSE parser.
- Fixed streaming buffering: responses now stream token-by-token instead of arriving all at once. The root cause was twofold — an eager `ReadableStream` `start()` pattern that buffered all chunks, and a heuristic that silently fell back to non-streaming `doGenerate` whenever tools were defined. Both are fixed. Streaming now uses a proper `TransformStream` pipeline with backpressure.
- Fixed `reasoning-delta` ID mismatch in simulated streaming — was using `generateId()` instead of the `reasoningId` from the preceding `reasoning-start` event, causing the AI SDK to drop reasoning content.
- Fixed REST API client (`createRun`) silently swallowing HTTP errors. Non-200 responses now throw with status code and response body.
- Fixed `response_format` being sent as `undefined` on every non-JSON request. Now only included when actually set.
- Fixed `json_schema` field evaluating to `false` (a boolean) instead of `undefined` when schema was missing.
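
The `createRun` error fix amounts to checking the status before parsing. A minimal sketch, assuming a helper named `parseJsonOrThrow` (hypothetical) and using `response.ok`, which treats any 2xx status as success; the exact check and error message in the package may differ:

```typescript
// Hypothetical helper: surface HTTP failures instead of letting
// response.json() fail with an opaque parse error.
async function parseJsonOrThrow(response: Response): Promise<unknown> {
  if (!response.ok) {
    const body = await response.text();
    throw new Error(`Workers AI API error (${response.status}): ${body}`);
  }
  return response.json();
}
```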

**Workers AI quirk workarounds:**

- Added `sanitizeToolCallId()` — strips non-alphanumeric characters and pads/truncates to 9 chars, fixing tool call round-trips through the binding which rejects its own generated IDs.
- Added `normalizeMessagesForBinding()` — converts `content: null` to `""` and sanitizes tool call IDs before every binding call. Only applied on the binding path (REST preserves original IDs).
- Added null-finalization chunk filtering for streaming tool calls.
- Added numeric value coercion in native-format streams (Workers AI sometimes returns numbers instead of strings for the `response` field).
- Improved image model to handle all output types from `binding.run()`: `ReadableStream`, `Uint8Array`, `ArrayBuffer`, `Response`, and `{ image: base64 }` objects.
- Graceful degradation: if `binding.run()` returns a non-streaming response despite `stream: true`, it wraps the complete response as a simulated stream instead of throwing.
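
A rough sketch of the two normalization helpers follows. The function names match the ones above, but the bodies are assumptions: the changelog does not say which 9 characters are kept or what the padding character is, so this version keeps the last 9 and pads with `"0"`. The `ChatMessage` shape is likewise a simplified, hypothetical one.

```typescript
// Assumption: keep the trailing 9 alphanumerics, pad short IDs with "0".
function sanitizeToolCallId(id: string): string {
  const alnum = id.replace(/[^a-zA-Z0-9]/g, "");
  return alnum.slice(-9).padEnd(9, "0");
}

// Hypothetical, simplified message shape.
interface ChatMessage {
  role: string;
  content: string | null;
  tool_call_id?: string;
  tool_calls?: { id: string; [key: string]: unknown }[];
}

// Returns new message objects; the input array is not mutated.
function normalizeMessagesForBinding(messages: ChatMessage[]): ChatMessage[] {
  return messages.map((m) => {
    const out: ChatMessage = { ...m, content: m.content ?? "" };
    if (m.tool_call_id !== undefined) {
      out.tool_call_id = sanitizeToolCallId(m.tool_call_id);
    }
    if (m.tool_calls) {
      out.tool_calls = m.tool_calls.map((tc) => ({
        ...tc,
        id: sanitizeToolCallId(tc.id),
      }));
    }
    return out;
  });
}
```

Because the function is pure, the ID on the assistant's tool call and the ID on the matching tool result map to the same sanitized value, which is what keeps the round-trip consistent.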

**Premature stream termination detection:**

- Streams that end without a `[DONE]` sentinel now report `finishReason: "error"` with `raw: "stream-truncated"` instead of silently reporting `"stop"`.
- Stream read errors are caught and emit `finishReason: "error"` with `raw: "stream-error"`.
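
The truncation check can be pictured as a `flush()` hook that remembers whether the sentinel was seen. This is a sketch under assumptions: `StreamEvent` and `detectTruncation` are hypothetical, the event shape is simplified, and the `raw: "stop"` value on the success path is a placeholder.

```typescript
// Hypothetical, simplified event shape.
type StreamEvent =
  | { type: "text-delta"; delta: string }
  | { type: "finish"; finishReason: "stop" | "error"; raw: string };

// Takes decoded SSE payloads and appends a finish event on close.
function detectTruncation(): TransformStream<string, StreamEvent> {
  let sawDone = false;
  return new TransformStream<string, StreamEvent>({
    transform(payload, controller) {
      if (payload === "[DONE]") {
        sawDone = true;
        return;
      }
      controller.enqueue({ type: "text-delta", delta: payload });
    },
    // flush() runs when the upstream closes, with or without [DONE].
    flush(controller) {
      controller.enqueue(
        sawDone
          ? { type: "finish", finishReason: "stop", raw: "stop" }
          : { type: "finish", finishReason: "error", raw: "stream-truncated" },
      );
    },
  });
}
```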

**AI Search (formerly AutoRAG):**

- Added `createAISearch` and `AISearchChatLanguageModel` as the canonical exports, reflecting the rename from AutoRAG to AI Search.
- `createAutoRAG` still works but emits a one-time deprecation warning pointing to `createAISearch`.
- `createAutoRAG` preserves `"autorag.chat"` as the provider name for backward compatibility.
- AI Search now warns when tools or JSON response format are provided (unsupported by the `aiSearch` API).
- Simplified AI Search internals — removed dead tool/response-format processing code.
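
The deprecation behaviour can be sketched as a thin alias with a module-level flag. Everything about the shapes here is hypothetical (`SearchOptions`, `SearchProvider`, `buildProvider`, the warning text, and the `"aisearch.chat"` name); only the `"autorag.chat"` provider name and the one-time warning come from the notes above.

```typescript
// Hypothetical option and provider shapes, for illustration only.
interface SearchOptions {
  binding: unknown;
}
interface SearchProvider {
  provider: string;
  binding: unknown;
}

function buildProvider(options: SearchOptions, provider: string): SearchProvider {
  return { provider, binding: options.binding };
}

function createAISearch(options: SearchOptions): SearchProvider {
  // "aisearch.chat" is an assumed name, not confirmed by the changelog.
  return buildProvider(options, "aisearch.chat");
}

let warnedAboutAutoRAG = false;

// Deprecated alias: warns once per process, keeps the old provider name.
function createAutoRAG(options: SearchOptions): SearchProvider {
  if (!warnedAboutAutoRAG) {
    warnedAboutAutoRAG = true;
    console.warn("createAutoRAG is deprecated; use createAISearch instead.");
  }
  return buildProvider(options, "autorag.chat");
}
```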

**Code quality:**

- Removed dead code: `workersai-error.ts` (never imported), `workersai-image-config.ts` (inlined).
- Consistent file naming: renamed `workers-ai-embedding-model.ts` to `workersai-embedding-model.ts`.
- Replaced `StringLike` catch-all index signatures with `[key: string]: unknown` on settings types.
- Replaced `any` types with proper interfaces (`FlatToolCall`, `OpenAIToolCall`, `PartialToolCall`).
- Tightened `processToolCall` format detection to check `function.name` instead of just the presence of a `function` property.
- Removed `@ai-sdk/provider-utils` and `zod` peer dependencies (no longer used in source).
- Added `imageModel` to the `WorkersAI` interface type for consistency.
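
The tightened format detection is roughly the difference between "has a `function` key" and "has a `function` object with a string `name`". A sketch with a type guard; the interface names come from the list above, but their fields and the guard `isOpenAIToolCall` are assumptions:

```typescript
// Assumed field layouts; the real interfaces may carry more fields.
interface FlatToolCall {
  name: string;
  arguments: unknown;
}
interface OpenAIToolCall {
  id?: string;
  function: { name: string; arguments: string };
}

// Checks function.name rather than mere presence of a function property.
function isOpenAIToolCall(tc: unknown): tc is OpenAIToolCall {
  if (typeof tc !== "object" || tc === null) return false;
  const fn = (tc as { function?: unknown }).function;
  return (
    typeof fn === "object" &&
    fn !== null &&
    typeof (fn as { name?: unknown }).name === "string"
  );
}
```

An object such as `{ function: {} }` passes a presence-only check but fails this guard, so it is no longer misread as an OpenAI-style call.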

**Tests:**

- 149 unit tests across 10 test files (up from 82).
- New test coverage: `sanitizeToolCallId`, `normalizeMessagesForBinding`, `prepareToolsAndToolChoice`, `processText`, `mapWorkersAIUsage`, image model output types, streaming error scenarios (malformed SSE, premature termination, empty stream), backpressure verification, graceful degradation (non-streaming fallback with text/tools/reasoning), REST API error handling (401/404/500), AI Search warnings, embedding `TooManyEmbeddingValuesForCallError`, message conversion with images and reasoning.
- Integration tests for REST API and binding across 12 models and 8 categories (chat, streaming, multi-turn, tool calling, tool round-trip, structured output, image generation, embeddings).
- All tests use the AI SDK's public APIs (`generateText`, `streamText`, `generateImage`, `embedMany`) instead of internal `.doGenerate()`/`.doStream()` methods.

**README:**

- Rewritten from scratch with concise examples, model recommendations, configuration guide, and known limitations section.
- Updated to use current AI SDK v6 APIs (`generateText` + `Output.object` instead of deprecated `generateObject`, `generateImage` instead of `experimental_generateImage`, `stopWhen: stepCountIs(2)` instead of `maxSteps`).
- Added sections for tool calling, structured output, embeddings, image generation, and AI Search.
- Uses `wrangler.jsonc` format for configuration examples.
@changeset-bot

changeset-bot bot commented Feb 10, 2026

🦋 Changeset detected

Latest commit: b0e91e6

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
workers-ai-provider Patch


@pkg-pr-new

pkg-pr-new bot commented Feb 10, 2026

npx https://pkg.pr.new/cloudflare/ai/ai-gateway-provider@393
npx https://pkg.pr.new/cloudflare/ai/workers-ai-provider@393

commit: b0e91e6

Update demos/structured-output-node/biome.json to use Biome schema 2.3.13. Add explicit type="button" to tab buttons in examples/workers-ai/src/client/App.tsx to avoid accidental form submission. Remove the unused HttpResponse import from packages/workers-ai-provider/test/stream-text.test.ts to address linter/test warnings.
@threepointone threepointone merged commit c07bc8c into main Feb 10, 2026
3 checks passed
@threepointone threepointone deleted the fix-workers-ai-provider branch February 10, 2026 21:04
@github-actions github-actions bot mentioned this pull request Feb 10, 2026