feat: enable prompt caching and cache token tracking for google-vertex-anthropic #20266
rekram1-node merged 2 commits into anomalyco:dev
Conversation
The following comment was made by an LLM; it may be inaccurate: Based on my search results, I found two potentially related PRs that address similar topics.
Related PRs:
These PRs are related to overlapping concerns around prompt caching and Anthropic/Vertex provider detection, though they may be addressing different aspects or previous implementations. The current PR (20266) appears to be a follow-up or enhancement that consolidates this functionality.
Force-pushed ca0bae4 to 7f9a967
Add explicit npm check for @ai-sdk/google-vertex/anthropic in the
applyCaching gate condition. The existing includes('anthropic') check
on model.api.id catches this implicitly, but an explicit npm check
is more robust against future refactoring and matches the pattern
used elsewhere (e.g. kimi-k2.5 thinking config at line 802).
The Anthropic SDK's AnthropicMessagesLanguageModel reads cache control
from providerOptions.anthropic (canonical key), which applyCaching
already sets. No changes to the cache format are needed.
Signed-off-by: Major Hayden <major@mhtx.net>
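As a hedged sketch, the gate described in this commit might look like the following. The `ModelInfo` shape, the `shouldApplyCaching` name, and the exact field layout are assumptions for illustration; the real condition lives in `transform.ts`.

```typescript
// Hypothetical sketch of the applyCaching() gate described in the commit
// message above. ModelInfo and these field shapes are illustrative
// assumptions, not the actual source.
interface ModelInfo {
  providerID: string
  api: { id: string; npm?: string }
}

function shouldApplyCaching(model: ModelInfo): boolean {
  return (
    // explicit providerID check added by this PR
    model.providerID === "google-vertex-anthropic" ||
    // explicit npm check described in the commit message
    model.api.npm === "@ai-sdk/google-vertex/anthropic" ||
    // pre-existing implicit check on the model id
    model.api.id.includes("anthropic")
  )
}
```

The explicit checks are redundant with the `includes("anthropic")` check today, but they keep the gate correct if the model id naming ever changes.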
Extract cacheCreationInputTokens from the 'vertex' metadata key in addition to the existing 'anthropic' key. The Anthropic SDK always stores cache metadata under 'anthropic' (canonical), but for google-vertex-anthropic it also stores it under 'vertex' (a custom key derived from the provider string 'vertex.anthropic.messages'). This ensures cache write token tracking works regardless of which metadata key the SDK prioritizes in future versions.

For native google-vertex (Gemini), cache read tokens are already handled by the SDK's normalization to cachedInputTokens. Gemini uses implicit caching (automatic for 2.5+) with no client-reported cache writes.

Signed-off-by: Major Hayden <major@mhtx.net>
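The extraction chain this commit extends could be sketched roughly as below. The `ProviderMetadata` type and `cacheWriteTokens` helper are illustrative assumptions; the real chain reads `input.metadata` in `session/index.ts`.

```typescript
// Hedged sketch of the cache-write token extraction described above.
// The "anthropic" key is canonical; "vertex" is the custom key the SDK
// derives from the provider string "vertex.anthropic.messages".
type ProviderMetadata = Record<string, Record<string, unknown> | undefined>

function cacheWriteTokens(metadata?: ProviderMetadata): number {
  const raw =
    // canonical key, common case
    metadata?.["anthropic"]?.["cacheCreationInputTokens"] ??
    // defensive fallback for google-vertex-anthropic
    metadata?.["vertex"]?.["cacheCreationInputTokens"]
  return typeof raw === "number" ? raw : 0
}
```

Whichever key the SDK prioritizes, the chain resolves to the same token count, which is the robustness property the commit is after.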
Force-pushed 7f9a967 to 941453b
I wonder if this needs to change too or not:
This LGTM. I can't merge until CI passes, and it's not your fault that it fails.
Issue for this PR
Closes #20265
Type of change
What does this PR do?
Adds explicit prompt caching and cache token tracking support for the `google-vertex-anthropic` provider.

Commit 1 - prompt caching: Adds `model.providerID === "google-vertex-anthropic"` to the `applyCaching()` gate condition in `transform.ts`. The gate already catches this implicitly via `model.api.id.includes("anthropic")`, but an explicit `providerID` check is more stable and readable. Includes a test verifying cache control options are applied.

Commit 2 - cache token tracking: Adds `input.metadata?.["vertex"]?.["cacheCreationInputTokens"]` to the cache write extraction chain in `session/index.ts`. The Anthropic SDK on Vertex uses the provider string `vertex.anthropic.messages`, which derives a custom metadata key of `"vertex"`. Response metadata is stored under both `"anthropic"` (canonical) and `"vertex"` (custom). The existing `"anthropic"` check handles the common case; the `"vertex"` fallback is defensive. Includes a test verifying extraction from the `"vertex"` metadata key.

Why no native google-vertex (Gemini) changes? Gemini uses implicit server-side caching, not Anthropic-style per-message cache breakpoints. Adding Gemini to `applyCaching()` would send cache control options the SDK ignores. Gemini's implicit caching already works without client-side changes (verified with a test script: 97.8% cache hit on the second request).

How did you verify your code works?
- `test/provider/transform.test.ts` (including new google-vertex-anthropic cache control test)
- `test/session/compaction.test.ts` (including new vertex metadata key extraction test)
- `bun typecheck` passes clean
- `bun run dev` against the live Vertex API: `google-vertex` reported `cachedContentTokenCount: 28645` out of 29,293 input tokens (97.8% cache hit)

Screenshots / recordings
N/A - no UI changes.
Checklist