Heads up: v0.1.0-alpha.19 lands cacheControl typed write path + cost.cacheSavingsUSD rename (BREAKING) #41

baabakk · 2026-06-10T23:27:34Z

baabakk
Jun 10, 2026
Maintainer

v0.1.0-alpha.19 (heads up) — `cacheControl` write path lands; BREAKING field-name change

Status: Pre-release announcement. alpha.19 ships ~2026-06-13.

Where to post: GitHub Discussions, Announcements category. Same shape as the alpha.16 / .17 / .18 pre-cuts.

Audience: Anyone pinning to @alpha who reads token usage or cost telemetry off CostUsage / TokenUsage fields, OR currently using providerExtras to inject Anthropic cache_control as a workaround.

TL;DR

@llm-ports/core@0.1.0-alpha.19 lands the typed write path for provider-native prompt caching: a single CacheControl interface that covers Anthropic explicit breakpoints, Vercel-style auto placement, and Gemini's pre-created CachedContent handle pattern. Plus cost.cacheSavingsUSD becomes unconditionally populated on cache hits across all adapters.

BREAKING the CostUsage.cacheDiscountUSD field is renamed to cacheSavingsUSD to align with OpenInference llm.cost.* conventions. Consumers reading the cache-discount field directly will need a one-line update.

This is the third alpha-line breaking change. It's an alpha bump precisely because we want this churn to happen now, before beta.0 cuts on 2026-06-30 and the contract freezes.

Where alpha.19 sits in the release line

The alpha line through beta.0 follows one principle: lock shape in alphas; freeze contract in beta.0. Each alpha addresses one specific structural decision surfaced by the v1 (8 dogfood projects) + v2 (~40 JS + Python competitor libraries) research effort. Recent history:

Release	Date	What it locked
alpha.16	2026-05-30	`providerExtras` escape hatch for provider-specific request fields (vLLM `chat_template_kwargs`, SGLang `regex` / `ebnf`, vLLM `guided_json`); shipped `useStrictResponseFormat` auto-detect for OpenAI / Cerebras / Groq / SambaNova
alpha.17	2026-06-05	`RerankPort` skeleton (sibling of `LLMPort` + `EmbeddingsPort` so adapter-cohere has a target for beta.0); jittered exponential backoff config (`BackoffConfig` + `computeBackoffDelay`); `onRetry` parity for adapter-google + adapter-ollama
alpha.18	2026-06-05	Typed-error taxonomy commit (BREAKING): new `LLMPortError` base; `BadRequestError` root with `ContextWindowExceededError` + `ContentPolicyViolationError` subclasses; `AuthenticationError`; `RateLimitError` with parsed `retryAfterMs`; `ServiceUnavailableError` root; `errorMatchers` helper
alpha.19	2026-06-13 (target)	`CacheControl` write path + `cost.cacheSavingsUSD` field-name normalization (BREAKING) — this release
alpha.20	2026-06-17 (target)	BudgetScope 5-tier hierarchy + minute/session gating tokens (BREAKING on `BudgetScope`)
alpha.21	2026-06-20 (target)	Observability hook signatures + OTel `gen_ai.*` field-name alignment
beta.0	2026-06-30 (target)	Contract freeze + `@llm-ports/adapter-cohere` (first `RerankPort` implementation)

The whole alpha sequence exists to surface breakage early, when adopters expect alpha churn, rather than at a beta cut that's supposed to be stable. After beta.0 ships, additive features go in beta.1 / beta.2 / beta.3; the typed surface itself doesn't move.

Why alpha.19 exists

alpha.16 added cache READ telemetry: when a provider returns cache-hit data, the adapter parses it into TokenUsage.cacheReadTokens + CostUsage.cacheDiscountUSD. But there's still no typed write path. Adopters who want to actually opt into Anthropic's cache_control: { type: "ephemeral" } breakpoints, or trigger Vercel-style auto-placement, have to inject the field through providerExtras and lose cross-provider portability.

The competitor research (v2 §3.3) showed three provider patterns in the wild, none of which fit a single flat strategy enum:

Anthropic explicit breakpoints: up to 4 cache_control: { type: "ephemeral" } markers on tool / system / message blocks; 5-minute or 1-hour TTL; manual placement
OpenAI implicit prefix: automatic for prompts ≥1024 tokens; no API surface; cache hits observed via usage.prompt_tokens_details.cached_tokens
Google Gemini handle: client.caches.create({ contents, systemInstruction, tools, ttl }) returns a cachedContents/<id> handle that subsequent generations reference

A binary strategy: "anthropic" | "openai" enum (which v1 originally proposed) can't represent the Gemini handle. Vercel AI Gateway's caching: 'auto' mode covers the 80% case by letting the adapter decide per provider. The locked shape covers all four scenarios with one interface.

The locked `CacheControl` shape

// @llm-ports/core (alpha.19+)
interface CacheControl {
  /**
   * - "auto":       adapter decides per provider. Default for most adopters.
   *                 Anthropic: place cache_control on the last static block.
   *                 OpenAI: no-op (implicit). Gemini: opt-in only if cachedContentHandle is set.
   * - "manual":     caller passes breakpoints[]; adapter translates per provider.
   * - "preCreated": caller passes cachedContentHandle (Gemini-style; ignored for others).
   * - "off":        disable caching even when the adapter would otherwise enable it.
   */
  mode: "auto" | "manual" | "preCreated" | "off";

  /**
   * Anthropic accepts 300 (5m) or 3600 (1h). Gemini accepts arbitrary durations.
   * OpenAI ignores this (implicit caching has its own TTL policy).
   */
  ttlSeconds?: number;

  /**
   * Used when mode: "manual". Each breakpoint marks a position in the request
   * where the adapter should insert a cache_control marker (Anthropic) or
   * truncate the cache prefix (others).
   */
  breakpoints?: Array<{
    at: "tools" | "system" | "message-index";
    index?: number;
  }>;

  /**
   * Used when mode: "preCreated". Provide the cachedContents/<id> handle
   * returned by Gemini's createCachedContent. Ignored by Anthropic / OpenAI.
   */
  cachedContentHandle?: string;

  /**
   * Per-tenant cache partition. Mirrors Helicone's Cache-Seed pattern.
   * When set, cache keys are namespaced so two tenants can't share cached
   * prefixes even when their prompts collide.
   */
  namespace?: string;
}

It gets threaded into every *Options interface (GenerateTextOptions, GenerateStructuredOptions, StreamTextOptions, StreamStructuredOptions, RunAgentOptions) and all 7 capability factories.

cost.cacheSavingsUSD becomes unconditionally populated on cache hits across all adapters. This is the field-name normalization that mirrors OpenInference's llm.cost.cache_read conventions.

The breaking change in concrete terms

One field rename:

// alpha.18 and earlier
interface CostUsage {
  inputUSD: number;
  outputUSD: number;
  totalUSD: number;
  cacheDiscountUSD?: number;  // ← gets renamed
}

// alpha.19+
interface CostUsage {
  inputUSD: number;
  outputUSD: number;
  totalUSD: number;
  cacheSavingsUSD?: number;   // ← new name; unconditionally populated on cache hits
}

That's the entire migration. Search-replace cacheDiscountUSD → cacheSavingsUSD in your consumer code and you're done. Reason for the rename: "discount" framed the field as a price reduction; "savings" frames it as recoverable spend, which is the semantic OpenInference and Langfuse both standardize on.

Not breaking:

The new CacheControl interface is additive. Existing calls that don't pass cacheControl continue to behave exactly as before.
providerExtras injection of Anthropic cache_control continues to work as an escape hatch.
TokenUsage.cacheReadTokens is unchanged.
ModelPricing.cacheReadPer1M is unchanged.

The typed-error taxonomy from alpha.18 (which was the first big alpha-line breaking change) is unchanged here — that contract is now stable through beta.0.

How to prepare for alpha.19

1. Pin to alpha.18 until you migrate. If your code reads cacheDiscountUSD anywhere, pin @llm-ports/core@0.1.0-alpha.18 explicitly until you've planned the field-name update. The npm @alpha dist-tag will move to alpha.19 on ship day.

2. Grep your codebase.

rg 'cacheDiscountUSD' --type ts
rg 'cacheDiscountUSD' --type js

If you find hits, they all need to become cacheSavingsUSD. Most consumers won't have any — the field was introduced in alpha.13 and is read mostly by observability wrappers.

3. If you're injecting Anthropic cache_control via providerExtras, plan a migration. It will keep working in alpha.19, but the typed cacheControl field is now the recommended path. Drop-in equivalent:

// Old (still works, but discouraged from alpha.19+)
await port.generateText({
  prompt: "...",
  providerExtras: {
    system: [
      { type: "text", text: "...", cache_control: { type: "ephemeral" } },
    ],
  },
});

// New (typed; portable across Anthropic + OpenAI + Gemini)
await port.generateText({
  prompt: "...",
  cacheControl: {
    mode: "auto",         // adapter places the cache breakpoint for you
    ttlSeconds: 300,      // Anthropic 5-minute cache; Gemini honors arbitrary values
  },
});

4. If you're a Google Gemini user wanting context caching, you now have a typed surface:

// Step 1: create the cached content (one-time, before the conversation)
const handle = await geminiClient.caches.create({
  contents: [...],
  ttl: "3600s",
});

// Step 2: reference the handle on every subsequent generation
await port.generateText({
  prompt: "...",
  cacheControl: {
    mode: "preCreated",
    cachedContentHandle: handle.name,  // "cachedContents/xyz"
  },
});

This wasn't possible at all in alpha.18.

5. If you care about cost-savings observability, the new cacheSavingsUSD is unconditionally populated on cache hits, so you can wire it to your dashboard without checking for undefined:

const onCost = (e: CostUsage) => {
  metrics.gauge("llm.cost.total_usd", e.totalUSD);
  metrics.gauge("llm.cost.cache_savings_usd", e.cacheSavingsUSD ?? 0);  // safe in alpha.18; explicit fallback in alpha.19
};

The ?? 0 becomes unnecessary in alpha.19 but harmless.

What about the remaining alphas?

Two more alphas land before beta.0, both with smaller breakage:

alpha.20 (~2026-06-17): BudgetScope 5-tier hierarchy (tenant / customer / user / agent / session) replaces the binary user / session enum proposed in v1 research. Adds req:N/minute + cost:N/minute + session-scoped gating tokens. Affects budget configuration only; runtime call signatures unchanged.
alpha.21 (~2026-06-20): observability hook signatures committed (5 hooks: onCost / onTokenUsage / onFallback / onValidationRetry / onCacheHit). Field-name alignment with OTel gen_ai.* + OpenInference llm.*. Some TokenUsage field renames (e.g. inputTokens keeps its name; outputTokens keeps its name; cache field naming finalized). Mostly additive but treat as alpha churn.

After alpha.21 ships, beta.0 is just a promotion of the alpha.21 surface plus @llm-ports/adapter-cohere (which implements the RerankPort skeleton from alpha.17). Beta.0 cuts on 2026-06-30 and the contract freezes; from there beta.1 / beta.2 / beta.3 ship additive features (more provider adapters: Bedrock, Vertex, Mistral; BAML Schema-Aligned Repair; observability backends; etc.) without breaking beta.0 adopters.

Why so much breakage now?

If this looks like a lot of breaking changes packed into June, that's intentional. The strategy is: do all the shape work in alphas where everyone expects it, ship beta.0 with a contract worth pinning to, and let the beta line iterate additively from there. Five alphas of churn over three weeks is the cost; two-plus years of stable v0.1.x is the buy.

The alternative was Option A from the v2 research: ship one mega-beta.0 with all the shape decisions baked in at once. We rejected it because it's too easy to get one signature wrong and discover it after the freeze, at which point the only options are (a) break the beta promise or (b) ship the wrong shape forever.

The chosen path makes each shape decision its own release with its own changelog and migration guide. That's how a typed library should evolve.

Install + bake clock

When alpha.19 ships:

pnpm add @llm-ports/core@alpha @llm-ports/adapter-openai@alpha @llm-ports/capabilities@alpha

All 7 @llm-ports/* packages bump together via the linked-group config.

Target ship date: 2026-06-13.

To skip alpha.19 (and migrate at beta.0 instead), pin to alpha.18 explicitly:

pnpm add @llm-ports/core@0.1.0-alpha.18 \
         @llm-ports/adapter-openai@0.1.0-alpha.18 \
         @llm-ports/capabilities@0.1.0-alpha.18

You'll do the cacheDiscountUSD → cacheSavingsUSD rename in one shot at beta.0 instead of incrementally.

Questions before ship?

"How do I know my Anthropic cache_control is firing?" → After alpha.19, cost.cacheSavingsUSD > 0 is the load-bearing signal. Before alpha.19, check TokenUsage.cacheReadTokens > 0.
"Will alpha.19's auto mode try to cache short prompts?" → No. The adapter respects each provider's threshold; OpenAI's implicit cache only kicks in at ≥1024 tokens regardless of what cacheControl says.
"Can I use cacheControl with providerExtras for both the typed surface AND escape-hatch fields?" → Yes. They're independent. cacheControl sets the typed surface; providerExtras is for fields the port doesn't model.
"What's the v0.2 cache story?" → Semantic caching (embedding similarity) was deferred to v0.2 per the v2 research; reference implementations (GPTCache, Portkey enterprise) aren't shape-clean for a port-layer abstraction yet.

Discussion welcome below. The locked shape is what ships unless someone flags a real edge case the design missed.

Cross-references for adopters:

Master plan §4.3 (BEPA-internal): the design decisions and rationale
v2 §3.3: competitor research showing the 3 provider patterns + Vercel auto consensus
Unified priorities §6: where this candidate landed in the ranked beta scope (P0 rank 1; combined score 20)
alpha.18 release notes: the prior breaking change, for context on the alpha-line churn pattern

baabakk · 2026-06-12T19:26:20Z

baabakk
Jun 12, 2026
Maintainer Author

Shipped: alpha.19 is live. Everything previewed in this thread landed exactly as described.

Ship announcement with migration details and per-provider behavior table: #42

Release notes: https://github.com/baabakk/llm-ports/releases/tag/%40llm-ports%2Fcore%400.1.0-alpha.19

Install: npm install @llm-ports/core@alpha resolves to 0.1.0-alpha.19 now.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Heads up: v0.1.0-alpha.19 lands cacheControl typed write path + cost.cacheSavingsUSD rename (BREAKING) #41

Uh oh!

{{title}}

Uh oh!

Replies: 1 comment

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Heads up: v0.1.0-alpha.19 lands cacheControl typed write path + cost.cacheSavingsUSD rename (BREAKING) #41

Uh oh!

baabakk Jun 10, 2026 Maintainer

v0.1.0-alpha.19 (heads up) — cacheControl write path lands; BREAKING field-name change

TL;DR

Where alpha.19 sits in the release line

Why alpha.19 exists

The locked CacheControl shape

The breaking change in concrete terms

How to prepare for alpha.19

What about the remaining alphas?

Why so much breakage now?

Install + bake clock

Questions before ship?

Replies: 1 comment

Uh oh!

baabakk Jun 12, 2026 Maintainer Author

baabakk
Jun 10, 2026
Maintainer

v0.1.0-alpha.19 (heads up) — `cacheControl` write path lands; BREAKING field-name change

The locked `CacheControl` shape

baabakk
Jun 12, 2026
Maintainer Author