
feat: enable prompt caching and cache token tracking for google-vertex-anthropic#20266

Merged
rekram1-node merged 2 commits into anomalyco:dev from major:feat/vertex-caching on Mar 31, 2026

Conversation

@major
Contributor

@major major commented Mar 31, 2026

Issue for this PR

Closes #20265

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

Adds explicit prompt caching and cache token tracking support for the google-vertex-anthropic provider.

Commit 1 - prompt caching: Adds model.providerID === "google-vertex-anthropic" to the applyCaching() gate condition in transform.ts. The gate already catches this implicitly via model.api.id.includes("anthropic"), but an explicit providerID check is more stable and readable. Includes a test verifying cache control options are applied.
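A minimal sketch of the gate condition described above. The function name `shouldApplyCaching` and the `Model` shape are hypothetical stand-ins; the real transform.ts code may differ.

```typescript
// Hypothetical model shape; the real type in transform.ts may differ.
interface Model {
  providerID: string
  api: { id: string }
}

// Sketch of the applyCaching() gate: the explicit providerID check is new,
// the api.id substring check is the pre-existing implicit path.
function shouldApplyCaching(model: Model): boolean {
  return (
    model.providerID === "google-vertex-anthropic" || // explicit, stable check
    model.api.id.includes("anthropic") // implicit check the gate already had
  )
}
```

The explicit check keeps the gate correct even if a future refactor changes how `api.id` is derived for Vertex-hosted Claude models.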

Commit 2 - cache token tracking: Adds input.metadata?.["vertex"]?.["cacheCreationInputTokens"] to the cache write extraction chain in session/index.ts. The Anthropic SDK on Vertex uses provider string vertex.anthropic.messages, which derives a custom metadata key of "vertex". Response metadata is stored under both "anthropic" (canonical) and "vertex" (custom). The existing "anthropic" check handles the common case; the "vertex" fallback is defensive. Includes a test verifying extraction from the "vertex" metadata key.
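The fallback chain can be sketched as follows. The function name `cacheWriteTokens` and the metadata type are hypothetical; the real extraction in session/index.ts may be shaped differently.

```typescript
// Hypothetical provider-metadata shape; the real one in session/index.ts may differ.
type ProviderMetadata = Record<string, Record<string, unknown> | undefined> | undefined

// Prefer the canonical "anthropic" key; fall back to the custom "vertex" key
// the SDK derives from the provider string "vertex.anthropic.messages".
function cacheWriteTokens(metadata: ProviderMetadata): number {
  const value =
    metadata?.["anthropic"]?.["cacheCreationInputTokens"] ??
    metadata?.["vertex"]?.["cacheCreationInputTokens"] ??
    0
  return typeof value === "number" ? value : 0
}
```

Because `??` only falls through on `null`/`undefined`, a zero under the canonical key is still reported as zero rather than shadowed by the fallback.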

Why no native google-vertex (Gemini) changes? Gemini uses implicit server-side caching, not Anthropic-style per-message cache breakpoints. Adding Gemini to applyCaching() would send cache control options the SDK ignores. Gemini's implicit caching already works without client-side changes (verified with test script: 97.8% cache hit on second request).

How did you verify your code works?

  • 120 tests pass in test/provider/transform.test.ts (including new google-vertex-anthropic cache control test)
  • 37 tests pass in test/session/compaction.test.ts (including new vertex metadata key extraction test)
  • bun typecheck passes clean
  • Tested with bun run dev against live Vertex API:
    • Vertex Anthropic (claude-opus-4-6): cache write tokens tracked correctly (46,946 tokens)
    • Vertex Gemini (gemini-3.1-pro-preview): responses work, provider identified as google-vertex
    • Gemini implicit caching: standalone test showed cachedContentTokenCount: 28645 out of 29,293 input tokens (97.8% cache hit)
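The 97.8% figure above follows directly from the reported token counts; a quick sketch of the arithmetic:

```typescript
// Cache-hit ratio from the Gemini test run reported above.
const cachedContentTokenCount = 28645 // tokens served from Gemini's implicit cache
const promptTokenCount = 29293 // total input tokens on the second request
const hitRate = (cachedContentTokenCount / promptTokenCount) * 100
console.log(`${hitRate.toFixed(1)}% cache hit`) // "97.8% cache hit"
```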

Screenshots / recordings

N/A - no UI changes.

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

@github-actions
Contributor

The following comment was made by an LLM; it may be inaccurate:

Based on my search results, I found two potentially related PRs that address similar topics:

Related PRs:

  1. PR feat(provider): add Google Vertex/AI context caching annotations #17569 - feat(provider): add Google Vertex/AI context caching annotations

  2. PR fix(opencode): use api.npm to detect Anthropic SDK for cache control #14643 - fix(opencode): use api.npm to detect Anthropic SDK for cache control

These PRs are related to overlapping concerns around prompt caching and Anthropic/Vertex provider detection, though they may be addressing different aspects or previous implementations. The current PR (20266) appears to be a follow-up or enhancement to consolidate this functionality.

@major major force-pushed the feat/vertex-caching branch from ca0bae4 to 7f9a967 Compare March 31, 2026 13:04
@major major changed the title feat: wire up prompt caching and cache token tracking for Google Vertex providers feat: enable prompt caching and cache token tracking for google-vertex-anthropic Mar 31, 2026
major added 2 commits March 31, 2026 08:09
Add explicit npm check for @ai-sdk/google-vertex/anthropic in the
applyCaching gate condition. The existing includes('anthropic') check
on model.api.id catches this implicitly, but an explicit npm check
is more robust against future refactoring and matches the pattern
used elsewhere (e.g. kimi-k2.5 thinking config at line 802).

The Anthropic SDK's AnthropicMessagesLanguageModel reads cache control
from providerOptions.anthropic (canonical key), which applyCaching
already sets. No changes to the cache format are needed.

Signed-off-by: Major Hayden <major@mhtx.net>
Extract cacheCreationInputTokens from the 'vertex' metadata key in
addition to the existing 'anthropic' key. The Anthropic SDK always
stores cache metadata under 'anthropic' (canonical), but for
google-vertex-anthropic it also stores under 'vertex' (custom key
derived from provider string 'vertex.anthropic.messages').

This ensures cache write token tracking works regardless of which
metadata key the SDK prioritizes in future versions.

For native google-vertex (Gemini), cache read tokens are already
handled by the SDK's normalization to cachedInputTokens. Gemini uses
implicit caching (automatic for 2.5+) with no client-reported cache
writes.

Signed-off-by: Major Hayden <major@mhtx.net>
@major major force-pushed the feat/vertex-caching branch from 7f9a967 to 941453b Compare March 31, 2026 13:09
@rekram1-node
Collaborator

This LGTM. I can't merge until CI passes, and it's not your fault that it fails.

@rekram1-node rekram1-node merged commit 26cc924 into anomalyco:dev Mar 31, 2026
16 of 18 checks passed


Development

Successfully merging this pull request may close these issues.

[FEATURE]: Enable prompt caching and cache token tracking for google-vertex-anthropic
