feat: add sessionAffinity setting for prefix-cache optimization #444

Merged
threepointone merged 2 commits into cloudflare:main from mchenco:mchen/session-affinity
Mar 18, 2026


@mchenco mchenco commented Mar 16, 2026

Summary

Adds a sessionAffinity option to both workers-ai-provider and @cloudflare/tanstack-ai that sends an x-session-affinity header with inference requests, routing requests with the same key to the same backend replica for prefix-cache reuse.

workers-ai-provider

const model = workersai("@cf/meta/llama-3.3-70b-instruct-fp8-fast", {
  sessionAffinity: "my-unique-session-id",
});

@cloudflare/tanstack-ai

const adapter = createWorkersAiChat("@cf/meta/llama-3.3-70b-instruct-fp8-fast", {
  binding: env.AI,
  sessionAffinity: "my-unique-session-id",
});

Context

Workers AI already accepts the x-session-affinity header and routes accordingly. This PR adds the client-side plumbing so both packages can send it. A follow-up PR to the agents package will auto-set this using the Durable Object ID so AIChatAgent users get prefix-cache optimization for free.

Changes

workers-ai-provider

| File | Change |
| --- | --- |
| workersai-chat-settings.ts | Add sessionAffinity?: string field |
| workersai-chat-language-model.ts | Map sessionAffinity → extraHeaders, merged with any user-provided extraHeaders so neither is silently dropped |
| utils.ts | Forward extraHeaders in REST fetch headers instead of discarding them |
| README.md | Document per-model settings (sessionAffinity, safePrompt) |
| text-generation.test.ts | 6 new tests (binding + REST × set / unset / merge with user-provided extraHeaders) |
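
The merge described for workersai-chat-language-model.ts can be illustrated with a minimal sketch. This is not the code from the PR: the ChatSettingsSketch interface and the buildExtraHeaders helper are hypothetical names, and the choice to let sessionAffinity win over a user-supplied x-session-affinity header is an assumption made for this example.

```typescript
// Hypothetical settings shape; only the two field names come from the PR.
interface ChatSettingsSketch {
  sessionAffinity?: string;
  extraHeaders?: Record<string, string>;
}

// Merge user-provided extraHeaders with the session-affinity header.
// Returns undefined when there is nothing to send.
function buildExtraHeaders(
  settings: ChatSettingsSketch,
): Record<string, string> | undefined {
  // Start from a copy so the caller's object is never mutated.
  const merged: Record<string, string> = { ...(settings.extraHeaders ?? {}) };
  if (settings.sessionAffinity !== undefined) {
    // Assumption: the dedicated setting takes precedence over a
    // same-named header passed in extraHeaders.
    merged["x-session-affinity"] = settings.sessionAffinity;
  }
  return Object.keys(merged).length > 0 ? merged : undefined;
}
```

With both fields set, the result carries the user's headers plus x-session-affinity; with neither set, no extra headers are produced at all.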

@cloudflare/tanstack-ai

| File | Change |
| --- | --- |
| create-fetcher.ts | Add sessionAffinity?: string to WorkersAiAdapterConfig; update createWorkersAiBindingFetch to forward extraHeaders to binding.run() |
| workers-ai.ts | Thread sessionAffinity through all three config modes (binding via extraHeaders, REST via defaultHeaders, gateway via createGatewayFetch headers) |
| README.md | Document sessionAffinity config option |
| binding-fetch.test.ts | 2 new tests (extraHeaders forwarded / not forwarded) |
| workers-ai-adapter.test.ts | 2 new tests (end-to-end sessionAffinity through binding path) |
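
As a rough illustration of how one setting fans out across the three config modes listed for workers-ai.ts, the sketch below maps each mode to the channel that carries the header. The AdapterConfigSketch type, its mode discriminator, and routeSessionAffinity are all hypothetical; the real WorkersAiAdapterConfig is not shown in this PR description and likely distinguishes modes differently.

```typescript
// Hypothetical, simplified config. The `mode` field is invented for this sketch.
type AdapterConfigSketch = {
  mode: "binding" | "rest" | "gateway";
  sessionAffinity?: string;
};

// The channel that carries the header in each mode, per the change list above.
type HeaderChannel =
  | "extraHeaders"
  | "defaultHeaders"
  | "createGatewayFetch headers";

const CHANNEL_BY_MODE: Record<AdapterConfigSketch["mode"], HeaderChannel> = {
  binding: "extraHeaders",
  rest: "defaultHeaders",
  gateway: "createGatewayFetch headers",
};

// Decide where the x-session-affinity header goes, or return undefined
// when the option is not set and no header should be sent.
function routeSessionAffinity(
  config: AdapterConfigSketch,
): { channel: HeaderChannel; headers: Record<string, string> } | undefined {
  if (config.sessionAffinity === undefined) return undefined;
  return {
    channel: CHANNEL_BY_MODE[config.mode],
    headers: { "x-session-affinity": config.sessionAffinity },
  };
}
```

The point of the sketch is only that the header name and value stay the same in every mode; what differs is the mechanism that delivers them.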

Test plan

  • All 200 existing workers-ai-provider tests pass, 6 new tests added
  • All 214 existing @cloudflare/tanstack-ai tests pass, 4 new tests added
  • Tests cover: header set, header not set, and merge with user-provided extraHeaders for both binding and REST paths

Add sessionAffinity option to WorkersAIChatSettings that sends an
x-session-affinity header with inference requests. This routes requests
with the same key to the same backend replica, improving prefix-cache
hit rates across conversation turns.

- Binding path: sessionAffinity is passed as extraHeaders to binding.run()
- REST path: extraHeaders are now forwarded in fetch headers instead of
  being discarded
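
The two paths in the commit message above can be sketched side by side. This is a simplified illustration, not the provider's actual code: BindingSketch models only the call shape needed here, runViaBinding and buildRestHeaders are hypothetical helpers, and letting the auth and content-type headers override same-named extra headers is an assumption.

```typescript
type HeaderMap = Record<string, string>;

// Minimal stand-in for the AI binding; only the call shape used here is modeled.
interface BindingSketch {
  run(
    model: string,
    inputs: unknown,
    options?: { extraHeaders?: HeaderMap },
  ): Promise<unknown>;
}

// Binding path: hand the headers to binding.run() as extraHeaders.
function runViaBinding(
  binding: BindingSketch,
  model: string,
  inputs: unknown,
  extraHeaders?: HeaderMap,
): Promise<unknown> {
  return binding.run(model, inputs, extraHeaders ? { extraHeaders } : {});
}

// REST path: include the headers in the fetch headers instead of dropping them.
function buildRestHeaders(apiKey: string, extraHeaders: HeaderMap = {}): HeaderMap {
  return {
    ...extraHeaders,
    // Assumption: these two always win over same-named extra headers.
    "Content-Type": "application/json",
    Authorization: `Bearer ${apiKey}`,
  };
}
```

Calling buildRestHeaders("key", { "x-session-affinity": "s1" }) yields an object containing all three headers, which could then be passed as the headers field of a fetch call.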

changeset-bot bot commented Mar 16, 2026

🦋 Changeset detected

Latest commit: 414b4d5

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 2 packages:

| Name | Type |
| --- | --- |
| workers-ai-provider | Patch |
| @cloudflare/tanstack-ai | Patch |



pkg-pr-new bot commented Mar 16, 2026

npx https://pkg.pr.new/cloudflare/ai/ai-gateway-provider@444
npx https://pkg.pr.new/cloudflare/ai/@cloudflare/tanstack-ai@444
npx https://pkg.pr.new/cloudflare/ai/workers-ai-provider@444

commit: 414b4d5

Add a sessionAffinity option across Workers AI adapters/providers to route requests with the same key to the same backend replica via the x-session-affinity header for prefix-cache optimization. Implementation details:

- Extend WorkersAiAdapterConfig with an optional sessionAffinity string.
- Propagate sessionAffinity as x-session-affinity to binding.run() via createWorkersAiBindingFetch (as extraHeaders), to REST requests via defaultHeaders, and to gateway mode via the headers passed to createGatewayFetch.
- Merge sessionAffinity with user-provided extraHeaders in the WorkersAI provider so both headers are forwarded together.

Other changes:

- Add and update tests covering binding.fetch, adapter behavior, and REST/binding header merging.
- Update README docs for tanstack-ai and workers-ai-provider to document sessionAffinity usage.
- Add changeset files to trigger a patch release for the relevant packages, and apply minor formatting updates to demos.json.

@threepointone threepointone left a comment

added a version for tanstack-ai as well

@threepointone threepointone merged commit 8b82164 into cloudflare:main Mar 18, 2026
3 checks passed
@github-actions github-actions bot mentioned this pull request Mar 18, 2026