Heads up: v0.1.0-alpha.19 lands cacheControl typed write path + cost.cacheSavingsUSD rename (BREAKING) #41
baabakk
announced in
Announcements
Replies: 1 comment
-
|
Shipped: alpha.19 is live. Everything previewed in this thread landed exactly as described. Ship announcement with migration details and per-provider behavior table: #42 Release notes: https://github.com/baabakk/llm-ports/releases/tag/%40llm-ports%2Fcore%400.1.0-alpha.19 Install: |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
v0.1.0-alpha.19 (heads up) —
cacheControlwrite path lands; BREAKING field-name changeStatus: Pre-release announcement. alpha.19 ships ~2026-06-13.
Where to post: GitHub Discussions, Announcements category. Same shape as the alpha.16 / .17 / .18 pre-cuts.
Audience: Anyone pinning to
@alphawho reads token usage or cost telemetry offCostUsage/TokenUsagefields, OR currently usingproviderExtrasto inject Anthropiccache_controlas a workaround.TL;DR
@llm-ports/core@0.1.0-alpha.19lands the typed write path for provider-native prompt caching: a singleCacheControlinterface that covers Anthropic explicit breakpoints, Vercel-style auto placement, and Gemini's pre-createdCachedContenthandle pattern. Pluscost.cacheSavingsUSDbecomes unconditionally populated on cache hits across all adapters.BREAKING the
CostUsage.cacheDiscountUSDfield is renamed tocacheSavingsUSDto align with OpenInferencellm.cost.*conventions. Consumers reading the cache-discount field directly will need a one-line update.This is the third alpha-line breaking change. It's an alpha bump precisely because we want this churn to happen now, before beta.0 cuts on 2026-06-30 and the contract freezes.
Where alpha.19 sits in the release line
The alpha line through beta.0 follows one principle: lock shape in alphas; freeze contract in beta.0. Each alpha addresses one specific structural decision surfaced by the v1 (8 dogfood projects) + v2 (~40 JS + Python competitor libraries) research effort. Recent history:
providerExtrasescape hatch for provider-specific request fields (vLLMchat_template_kwargs, SGLangregex/ebnf, vLLMguided_json); shippeduseStrictResponseFormatauto-detect for OpenAI / Cerebras / Groq / SambaNovaRerankPortskeleton (sibling ofLLMPort+EmbeddingsPortso adapter-cohere has a target for beta.0); jittered exponential backoff config (BackoffConfig+computeBackoffDelay);onRetryparity for adapter-google + adapter-ollamaLLMPortErrorbase;BadRequestErrorroot withContextWindowExceededError+ContentPolicyViolationErrorsubclasses;AuthenticationError;RateLimitErrorwith parsedretryAfterMs;ServiceUnavailableErrorroot;errorMatchershelperCacheControlwrite path +cost.cacheSavingsUSDfield-name normalization (BREAKING) — this releaseBudgetScope)gen_ai.*field-name alignment@llm-ports/adapter-cohere(firstRerankPortimplementation)The whole alpha sequence exists to surface breakage early, when adopters expect alpha churn, rather than at a beta cut that's supposed to be stable. After beta.0 ships, additive features go in beta.1 / beta.2 / beta.3; the typed surface itself doesn't move.
Why alpha.19 exists
alpha.16 added cache READ telemetry: when a provider returns cache-hit data, the adapter parses it into
TokenUsage.cacheReadTokens+CostUsage.cacheDiscountUSD. But there's still no typed write path. Adopters who want to actually opt into Anthropic'scache_control: { type: "ephemeral" }breakpoints, or trigger Vercel-style auto-placement, have to inject the field throughproviderExtrasand lose cross-provider portability.The competitor research (v2 §3.3) showed three provider patterns in the wild, none of which fit a single flat strategy enum:
cache_control: { type: "ephemeral" }markers on tool / system / message blocks; 5-minute or 1-hour TTL; manual placementusage.prompt_tokens_details.cached_tokensclient.caches.create({ contents, systemInstruction, tools, ttl })returns acachedContents/<id>handle that subsequent generations referenceA binary
strategy: "anthropic" | "openai"enum (which v1 originally proposed) can't represent the Gemini handle. Vercel AI Gateway'scaching: 'auto'mode covers the 80% case by letting the adapter decide per provider. The locked shape covers all four scenarios with one interface.The locked
CacheControlshapeIt gets threaded into every
*Optionsinterface (GenerateTextOptions,GenerateStructuredOptions,StreamTextOptions,StreamStructuredOptions,RunAgentOptions) and all 7 capability factories.cost.cacheSavingsUSDbecomes unconditionally populated on cache hits across all adapters. This is the field-name normalization that mirrors OpenInference'sllm.cost.cache_readconventions.The breaking change in concrete terms
One field rename:
That's the entire migration. Search-replace
cacheDiscountUSD→cacheSavingsUSDin your consumer code and you're done. Reason for the rename: "discount" framed the field as a price reduction; "savings" frames it as recoverable spend, which is the semantic OpenInference and Langfuse both standardize on.Not breaking:
CacheControlinterface is additive. Existing calls that don't passcacheControlcontinue to behave exactly as before.providerExtrasinjection of Anthropiccache_controlcontinues to work as an escape hatch.TokenUsage.cacheReadTokensis unchanged.ModelPricing.cacheReadPer1Mis unchanged.The typed-error taxonomy from alpha.18 (which was the first big alpha-line breaking change) is unchanged here — that contract is now stable through beta.0.
How to prepare for alpha.19
1. Pin to alpha.18 until you migrate. If your code reads
cacheDiscountUSDanywhere, pin@llm-ports/core@0.1.0-alpha.18explicitly until you've planned the field-name update. The npm@alphadist-tag will move to alpha.19 on ship day.2. Grep your codebase.
If you find hits, they all need to become
cacheSavingsUSD. Most consumers won't have any — the field was introduced in alpha.13 and is read mostly by observability wrappers.3. If you're injecting Anthropic
cache_controlviaproviderExtras, plan a migration. It will keep working in alpha.19, but the typedcacheControlfield is now the recommended path. Drop-in equivalent:4. If you're a Google Gemini user wanting context caching, you now have a typed surface:
This wasn't possible at all in alpha.18.
5. If you care about cost-savings observability, the new
cacheSavingsUSDis unconditionally populated on cache hits, so you can wire it to your dashboard without checking for undefined:The
?? 0becomes unnecessary in alpha.19 but harmless.What about the remaining alphas?
Two more alphas land before beta.0, both with smaller breakage:
BudgetScope5-tier hierarchy (tenant/customer/user/agent/session) replaces the binaryuser/sessionenum proposed in v1 research. Addsreq:N/minute+cost:N/minute+ session-scoped gating tokens. Affects budget configuration only; runtime call signatures unchanged.onCost/onTokenUsage/onFallback/onValidationRetry/onCacheHit). Field-name alignment with OTelgen_ai.*+ OpenInferencellm.*. SomeTokenUsagefield renames (e.g.inputTokenskeeps its name;outputTokenskeeps its name; cache field naming finalized). Mostly additive but treat as alpha churn.After alpha.21 ships, beta.0 is just a promotion of the alpha.21 surface plus
@llm-ports/adapter-cohere(which implements theRerankPortskeleton from alpha.17). Beta.0 cuts on 2026-06-30 and the contract freezes; from there beta.1 / beta.2 / beta.3 ship additive features (more provider adapters: Bedrock, Vertex, Mistral; BAML Schema-Aligned Repair; observability backends; etc.) without breaking beta.0 adopters.Why so much breakage now?
If this looks like a lot of breaking changes packed into June, that's intentional. The strategy is: do all the shape work in alphas where everyone expects it, ship beta.0 with a contract worth pinning to, and let the beta line iterate additively from there. Five alphas of churn over three weeks is the cost; two-plus years of stable v0.1.x is the buy.
The alternative was Option A from the v2 research: ship one mega-beta.0 with all the shape decisions baked in at once. We rejected it because it's too easy to get one signature wrong and discover it after the freeze, at which point the only options are (a) break the beta promise or (b) ship the wrong shape forever.
The chosen path makes each shape decision its own release with its own changelog and migration guide. That's how a typed library should evolve.
Install + bake clock
When alpha.19 ships:
All 7
@llm-ports/*packages bump together via the linked-group config.Target ship date: 2026-06-13.
To skip alpha.19 (and migrate at beta.0 instead), pin to alpha.18 explicitly:
pnpm add @llm-ports/core@0.1.0-alpha.18 \ @llm-ports/adapter-openai@0.1.0-alpha.18 \ @llm-ports/capabilities@0.1.0-alpha.18You'll do the
cacheDiscountUSD→cacheSavingsUSDrename in one shot at beta.0 instead of incrementally.Questions before ship?
cost.cacheSavingsUSD > 0is the load-bearing signal. Before alpha.19, checkTokenUsage.cacheReadTokens > 0.automode try to cache short prompts?" → No. The adapter respects each provider's threshold; OpenAI's implicit cache only kicks in at ≥1024 tokens regardless of whatcacheControlsays.providerExtrasfor both the typed surface AND escape-hatch fields?" → Yes. They're independent.cacheControlsets the typed surface;providerExtrasis for fields the port doesn't model.Discussion welcome below. The locked shape is what ships unless someone flags a real edge case the design missed.
Cross-references for adopters:
Beta Was this translation helpful? Give feedback.
All reactions