Shipped: 0.1.0-alpha.19 — CacheControl shape lock + cacheSavingsUSD rename (BREAKING) #42

baabakk · 2026-06-12T19:25:44Z

baabakk
Jun 12, 2026
Maintainer

Shipped: 0.1.0-alpha.19. npm install @llm-ports/core@alpha resolves to it now. Everything previewed in #41 landed exactly as described, including the BREAKING rename.

This is the third of four shape-locks before beta.0 on 2026-06-30. Two more alphas in the next 8 days, then the surface stops moving and the beta minors start shipping the things you actually want (more on that below).

The breaking thing (do this first)

cost.cacheDiscountUSD is dead. Long live cost.cacheSavingsUSD.

The old name implied the provider was generously giving you a discount. They aren't. It's your money you didn't have to spend because you re-sent the same context. OpenInference's llm.cost.cache_savings, Helicone's dashboards, and every observability vendor in the field already use "savings". We were the odd one out. Fixed.

- if (result.cost.cacheDiscountUSD !== undefined) {
+ if (result.cost.cacheSavingsUSD !== undefined) {

TypeScript catches every read site. Runtime code that hand-rolled the old name silently resolves to undefined, so check your dashboards too.

The new typed surface

import type { CacheControl } from "@llm-ports/core";

const result = await port.generateText({
  taskType: "summary",
  instructions: longSystemPrompt,
  prompt: shortUserTurn,
  cacheControl: {
    mode: "manual",
    breakpoints: [{ at: "system" }],
    ttlSeconds: 3600,
  },
});

One field, three providers, four modes. Anthropic explicit markers, OpenAI implicit always-on, Gemini handle pattern — all collapsed behind one optional field that the adapter translates per provider:

Mode	Anthropic	OpenAI	Gemini
`auto`	place marker at last static block	no-op (implicit cache always on)	no-op
`manual`	place markers at your breakpoints	no-op	no-op
`preCreated`	no-op	no-op	uses your `cachedContentHandle`
`off`	strip `cache_control` from blocks	no-op (no API)	no-op (no API)

Plus namespace for per-tenant cache partitioning through Helicone-style proxies, and ttlSeconds for Anthropic's 300/3600 tiers or Gemini's arbitrary positive values.

Shape is locked. Per-mode adapter behaviors mature across beta minors without breaking the shape. Write call sites against it today; they won't change.

Full per-provider behavior in docs/concepts/cache.md.

What's coming (since you'll ask anyway)

The next 18 days finish the shape-lock sequence:

alpha.20 (Tue 2026-06-17). BudgetScope. Five tiers: tenant → customer → user → agent → session. Your noisy customer can't burn the rest of your budget. Minute-grain windows because Cerebras's 30-RPM cap is not expressible as cost:N/day no matter how creatively you squint at the math.
alpha.21 (Fri 2026-06-20). Observability hook signatures aligned with OTel gen_ai.*. onCost / onTokenUsage / onFallback / onValidationRetry / onCacheHit. Drop-in Langfuse / Phoenix / OpenLLMetry / Datadog. Zero adapter code on your side.
beta.0 (Tue 2026-06-30). Scope-closed. The shape contract is the contract.

Then the beta minors stop locking and start delivering:

beta.1 (Tue 2026-07-28). Resilient fallback preset (no more rolling your own circuit breaker around runtimeFallback.shouldFallback). Adapter-anthropic stops burning your tokens on extended-thinking models that spend their entire output budget on hidden reasoning. Auto multi-turn cache placement. Four capability factories you've been writing yourself: createDetector, createTagger, createAnswerer, createResponder.
beta.2 (Tue 2026-08-11). Persistent budget backend. Your cost gates survive process restart. Token-denominated mode for "$X per 1M tokens per tenant per day". Pluggable CacheBackend so your exact-prompt cache lives where you want it (Redis, file, your own KV). Three more factories: createRewriter, createDecider, createExpander.
beta.3 (Tue 2026-08-25). RerankPort gets its first adapter. Three harder factories: createRedactor for PII-safe prompts, createAgent ergonomics so multi-turn tool-use stops feeling like assembling IKEA without instructions, capability-wrapper around RerankPort.
1.0.0 (Mon 2026-09-08). The contract goes load-bearing.

Test stats

626 passing across the workspace (up from ~615 in alpha.18). 11 new tests in packages/core/tests/cache-control.test.ts cover the shape lock and the field rename.

Links

Release notes: https://github.com/baabakk/llm-ports/releases/tag/%40llm-ports%2Fcore%400.1.0-alpha.19
Migration: docs/migration/alpha-18-to-alpha-19.md
Cache concept doc: docs/concepts/cache.md
Pre-announcement thread: Heads up: v0.1.0-alpha.19 lands cacheControl typed write path + cost.cacheSavingsUSD rename (BREAKING) #41

Found a real gap in the four modes before beta.0? Reply here or file an issue. Additive fields are still on the table; structural changes aren't.

baabakk · 2026-06-12T22:59:51Z

baabakk
Jun 12, 2026
Maintainer Author

Follow-up: shipped alpha.19.1. npm install @llm-ports/core@alpha resolves to 0.1.0-alpha.19.1 now.

Honest reason for the dot-release: alpha.19 locked the CacheControl shape, but the per-mode adapter behaviors and capability pass-through were not wired in the same commit. The release notes used present tense ("Anthropic places a marker at the last static block") for code that didn't exist yet. alpha.19.1 fixes that.

What landed:

adapter-anthropic now emits cache_control markers for mode: "auto" and mode: "manual" (with breakpoints at system, tools, or message-index), no-op for mode: "off" and mode: "preCreated". ttlSeconds: 3600 → ttl: "1h". All five SDK call sites covered. 10 new tests.
adapter-google wires mode: "preCreated" with cachedContentHandle to Gemini's config.cachedContent. Other modes are documented no-ops. 7 new tests.
adapter-openai / ollama / vercel confirmed deliberate no-ops (OpenAI implicit cache is always-on; Ollama no surface; Vercel defers to the bridged provider). Field accepted everywhere so forward-compatible callers don't switch on provider.
@llm-ports/capabilities all 7 factories accept cacheControl? and forward it. onResult.cost.cacheSavingsUSD propagates when the underlying port returned cache telemetry. 11 new tests.
docs/concepts/cache.md rewritten as a verified per-adapter, per-mode behavior matrix. Each cell points at the test that proves it.

Test stats: 654 passing across 7 packages (was 626 in alpha.19; +28 new).

No breaking changes vs alpha.19. The four-alpha sequence holds: alpha.20 BudgetScope on 2026-06-17, alpha.21 observability hooks on 2026-06-20, beta.0 on 2026-06-30.

Full release notes: https://github.com/baabakk/llm-ports/releases/tag/%40llm-ports%2Fcore%400.1.0-alpha.19.1

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Shipped: 0.1.0-alpha.19 — CacheControl shape lock + cacheSavingsUSD rename (BREAKING) #42

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Replies: 1 comment

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Shipped: 0.1.0-alpha.19 — CacheControl shape lock + cacheSavingsUSD rename (BREAKING) #42

Uh oh!

Uh oh!

baabakk Jun 12, 2026 Maintainer

The breaking thing (do this first)

The new typed surface

What's coming (since you'll ask anyway)

Test stats

Links

Replies: 1 comment

Uh oh!

baabakk Jun 12, 2026 Maintainer Author

baabakk
Jun 12, 2026
Maintainer

baabakk
Jun 12, 2026
Maintainer Author