🤖 fix: avoid extra Anthropic cache breakpoints with explicit TTL#3112

Merged
ThomasK33 merged 1 commit into main from anthropic-models-6zjk on Apr 2, 2026

Conversation

@ThomasK33
Member

Summary
This PR fixes a direct Anthropic regression where explicitly setting anthropic.cacheTtl caused Mux to emit one extra cache-control breakpoint, pushing tool-enabled requests over Anthropic's four-breakpoint limit.

Background
Mux already applies Anthropic prompt caching through manual cache markers on the cached system prompt, conversation tail, and last tool. When buildProviderOptions() also emitted top-level anthropic.cacheControl, the Anthropic SDK serialized an additional top-level cache_control block on direct requests. That produced the user-visible failure: "A maximum of 4 blocks with cache_control may be provided. Found 5."

Implementation
The fix stops emitting top-level Anthropic cacheControl from buildProviderOptions() while preserving the existing manual cache-marker flow. To guard against future regressions, the PR also adds a helper that counts Anthropic cache breakpoints in shaped request payloads and tests that pin the intended breakpoint budget. A targeted StreamManager regression test verifies that explicit 1h TTL values still propagate through the manual cache path even without the top-level provider option.
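The breakpoint-counting helper might look roughly like the sketch below. This is illustrative only: the type and function names are assumptions, not the exact symbols added to providerModelFactory.ts.

```typescript
// Illustrative sketch of a cache-breakpoint counter; the real helper in
// src/node/services/providerModelFactory.ts may differ in name and shape.
interface CacheControl {
  type: "ephemeral";
  ttl?: "5m" | "1h";
}
interface CacheBearing {
  cache_control?: CacheControl;
}
interface ShapedAnthropicBody {
  system?: CacheBearing[];
  tools?: CacheBearing[];
  messages?: { content: CacheBearing[] }[];
}

// Count every cache_control marker across the locations Anthropic inspects:
// system blocks, tool definitions, and message content parts.
function countAnthropicCacheBreakpoints(body: ShapedAnthropicBody): number {
  const blocks: CacheBearing[] = [
    ...(body.system ?? []),
    ...(body.tools ?? []),
    ...(body.messages ?? []).flatMap((m) => m.content),
  ];
  return blocks.filter((b) => b.cache_control !== undefined).length;
}
```

A helper of this shape lets tests pin the breakpoint budget directly against the final request body rather than against intermediate provider options.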

Validation

  • bun test src/common/utils/ai/providerOptions.test.ts src/node/services/providerModelFactory.test.ts src/common/utils/ai/cacheStrategy.test.ts src/node/services/streamManager.test.ts
  • nix shell nixpkgs#hadolint -c make static-check
  • Dogfooded in an isolated make dev-server-sandbox instance using env-backed direct Anthropic credentials:
    • selected Anthropic in onboarding
    • set prompt cache TTL to 1 hour
    • added the current repo as the first project
    • opened an Exec workspace and sent a tool-using request
    • verified the request completed successfully without the previous "Found 5" Anthropic error
    • verified the UI showed prompt-cache read/create stats for the successful request

Risks
The main regression risk is Anthropic request shaping across direct and routed paths. This change is intentionally narrow: it removes the redundant top-level direct-provider cache marker while keeping the existing manual cache markers intact, and adds tests at both the provider-options layer and the final shaped-request layer.

Pains
make static-check requires hadolint, which was not installed in the workspace environment. I ran it through nix shell nixpkgs#hadolint -c make static-check so the full required local validation still passed.


📋 Implementation Plan

Fix plan: direct Anthropic cache-marker duplication when explicit cache TTL is set

Recommendation

Recommended approach: keep Mux's existing 3 manual Anthropic cache breakpoints, and stop emitting the extra top-level Anthropic cacheControl field from buildProviderOptions().

  • Net product-code LoC estimate: +20 to +55
  • Why this is the best fit:
    • It removes the one repo-visible behavior change that occurs only when anthropic.cacheTtl is explicitly set.
    • It preserves the current manual breakpoint strategy already documented in src/common/utils/ai/cacheStrategy.ts:
      1. cached system prompt
      2. cached conversation tail / last message
      3. cached last tool
    • It avoids a wider refactor across messagePipeline.ts, streamManager.ts, and providerModelFactory.ts unless follow-up cleanup is still desired after the regression is fixed.

Evidence supporting the root-cause diagnosis

  • The user hit Anthropic's runtime error: "A maximum of 4 blocks with cache_control may be provided. Found 5." on a direct Anthropic request.
  • The repo already applies 3 manual Anthropic cache breakpoints across these files:
    • src/common/utils/ai/cacheStrategy.ts
      • createCachedSystemMessage()
      • applyCacheControl()
      • applyCacheControlToTools()
    • src/node/services/messagePipeline.ts applies applyCacheControl() after message transforms.
    • src/node/services/streamManager.ts prepends the cached system message and marks the last tool.
  • src/common/utils/ai/cacheStrategy.ts explicitly documents Anthropic's 4-breakpoint limit and says the intended design is to use 3 total.
  • src/common/utils/ai/providerOptions.ts is the one place that adds an extra top-level Anthropic cacheControl field, and it does so only when muxProviderOptions.anthropic.cacheTtl is explicitly set.
  • src/node/services/aiService.ts already passes the explicit TTL separately into both:
    • prepareMessagesForProvider(...) (anthropicCacheTtl argument)
    • streamManager.startStream(...) (anthropicCacheTtlOverride argument)
  • That means the explicit TTL already reaches the manual cache-marker path without needing top-level providerOptions.anthropic.cacheControl.
  • So the most conservative repo-backed explanation is:
    • unset TTL -> manual 3-breakpoint path
    • explicit TTL -> same manual 3-breakpoint path plus an extra top-level Anthropic cache-control path
    • Anthropic rejects the resulting request once the effective marker count reaches 5.

Alternate approach (not recommended for the first fix)

Centralize all Anthropic cache injection in src/node/services/providerModelFactory.ts and remove the higher-level cache-marker transforms.

  • Net product-code LoC estimate: -40 to -110
  • Upside: one source of truth for the wire payload.
  • Downside: materially larger behavior change, touches more call sites, and increases regression surface for system prompts, tools, retries, and gateway routing.
  • Recommendation: defer this unless the surgical fix fails to cover another hidden duplication path.

Implementation plan

Phase 1 — Remove the redundant top-level Anthropic cache-control path

Files/symbols

  • src/common/utils/ai/providerOptions.ts
  • src/common/utils/ai/providerOptions.test.ts

Changes

  1. Update buildProviderOptions() so Anthropic models do not emit top-level anthropic.cacheControl, even when muxProviderOptions.anthropic.cacheTtl is set to "5m" or "1h".
  2. Keep the rest of the Anthropic provider options intact:
    • thinking
    • effort
    • disableParallelToolUse
    • sendReasoning
  3. Add a short code comment documenting why the top-level field is intentionally omitted:
    • explicit Anthropic TTL is already threaded through Mux's manual cache-marker helpers
    • sending an extra top-level cache-control field can create duplicate cache breakpoints and violate Anthropic's 4-breakpoint limit
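Sketched under assumptions about the option shapes (the real buildProviderOptions() signature and types in src/common/utils/ai/providerOptions.ts may differ), the intended Phase 1 change looks like:

```typescript
// Hypothetical option shapes; the actual types live in
// src/common/utils/ai/providerOptions.ts and may differ.
interface MuxAnthropicOptions {
  cacheTtl?: "5m" | "1h";
  thinking?: { type: "enabled"; budgetTokens: number };
  disableParallelToolUse?: boolean;
  sendReasoning?: boolean;
}

function buildAnthropicProviderOptions(opts: MuxAnthropicOptions) {
  // Intentionally omit top-level cacheControl: the explicit TTL is already
  // threaded through Mux's manual cache-marker helpers, and emitting it here
  // adds a duplicate breakpoint that can exceed Anthropic's 4-breakpoint limit.
  const { cacheTtl: _omitted, ...passthrough } = opts;
  return passthrough;
}
```

The other Anthropic options pass through unchanged, which is exactly the invariant the Phase 1 tests assert.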

Quality gate after Phase 1

  • Update src/common/utils/ai/providerOptions.test.ts to assert that explicit Anthropic TTL no longer appears in top-level provider options.
  • Cover both:
    • standard Anthropic models
    • effort/adaptive-thinking Anthropic models (for example Opus 4.6 / Sonnet 4.6 cases already exercised in this test file)

Phase 2 — Add a narrow regression guard at the wire-shaping layer

Files/symbols

  • src/node/services/providerModelFactory.ts
  • src/node/services/providerModelFactory.test.ts

Changes

  1. Extract or add a small pure helper near wrapFetchWithAnthropicCacheControl() that can count Anthropic cache breakpoints in the final request body.
  2. Count all cache-bearing locations relevant to this repo's current shaping strategy, including:
    • cached system blocks/messages
    • cached tools
    • cached last-message content parts
    • gateway-style providerOptions.anthropic.cacheControl message markers if present
  3. Reuse that helper in tests, and optionally add a defensive runtime assertion or warning right before sending the mutated request body.
    • Goal: fail loudly in development/tests if a future change pushes the request above Anthropic's limit again.
    • Keep the runtime behavior minimal; do not expand this into a broad fallback/rewrite mechanism in the first fix.

Quality gate after Phase 2

  • Add direct-provider regression coverage in src/node/services/providerModelFactory.test.ts that builds a representative Anthropic request shape with:
    • cached system prompt
    • cached last tool
    • cached last message
    • explicit cacheTtl: "1h"
  • Assert that the final shaped request stays at <= 4 breakpoints, and preferably at the intended 3.
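The quality gate above could be pinned by a test shaped roughly like this; the fixture and helper names are illustrative, not the real test code:

```typescript
// Representative direct-provider shape: cached system prompt, cached last
// tool, cached last message part, explicit 1h TTL. Names are illustrative.
type Marked = { cache_control?: { type: "ephemeral"; ttl: "5m" | "1h" } };

const countMarks = (blocks: Marked[]): number =>
  blocks.filter((b) => b.cache_control !== undefined).length;

const shaped: Record<"system" | "tools" | "lastMessageParts", Marked[]> = {
  system: [{ cache_control: { type: "ephemeral", ttl: "1h" } }],
  tools: [{}, { cache_control: { type: "ephemeral", ttl: "1h" } }],
  lastMessageParts: [{ cache_control: { type: "ephemeral", ttl: "1h" } }],
};

const total =
  countMarks(shaped.system) +
  countMarks(shaped.tools) +
  countMarks(shaped.lastMessageParts);

// Budget: the intended 3 manual breakpoints, never above Anthropic's limit of 4.
if (total !== 3) throw new Error(`expected 3 breakpoints, found ${total}`);
```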

Phase 3 — Verify TTL still propagates through the manual cache-marker path

Files/symbols

  • src/node/services/aiService.ts
  • src/node/services/messagePipeline.ts
  • src/node/services/streamManager.ts
  • src/common/utils/ai/cacheStrategy.ts
  • existing tests in:
    • src/common/utils/ai/cacheStrategy.test.ts
    • src/node/services/streamManager.test.ts or src/node/services/aiService.test.ts (only if a small targeted regression test is needed)

Changes

  1. Leave the existing manual cache-marker plumbing intact for the first fix.
  2. Add or update one targeted regression test proving that explicit Anthropic TTL still reaches the manual cache path even after top-level cacheControl is removed.
    • Best case: reuse an existing unit seam rather than adding a new integration harness.
    • Only expand into aiService / streamManager tests if providerOptions + providerModelFactory tests are not enough to pin the behavior down.
  3. Preserve the documented 3-breakpoint strategy in cacheStrategy.ts; do not refactor that layer yet.

Quality gate after Phase 3

  • Confirm the test suite still proves:
    • system prompt caching works
    • last tool caching works
    • last message caching works
    • explicit TTL values ("1h") are preserved on the manual path
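As a hedged sketch of the Phase 3 invariant (the real applyCacheControl() in src/common/utils/ai/cacheStrategy.ts may differ in signature and defaults), the manual path should stamp the explicit TTL onto the marked part:

```typescript
// Illustrative stand-in for the manual cache-marker helper; the real
// applyCacheControl() in src/common/utils/ai/cacheStrategy.ts may differ.
type CacheTtl = "5m" | "1h";
interface Part {
  text: string;
  cache_control?: { type: "ephemeral"; ttl: CacheTtl };
}

// Mark the last part with a cache breakpoint, honoring an explicit TTL override.
function markLastPart(parts: Part[], ttl: CacheTtl = "5m"): Part[] {
  if (parts.length === 0) return parts;
  const last = parts[parts.length - 1];
  return [
    ...parts.slice(0, -1),
    { ...last, cache_control: { type: "ephemeral", ttl } },
  ];
}
```

The regression test then asserts that passing "1h" through this path survives to the marker, even with top-level cacheControl removed.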

Acceptance criteria

  • Direct Anthropic requests with explicit anthropic.cacheTtl: "1h" no longer exceed Anthropic's 4-breakpoint limit.
  • The final direct-provider request shape remains at the intended 3 manual cache breakpoints unless a future Anthropic-specific feature intentionally adds another.
  • Explicit TTL still applies to the existing manual cache markers; removing top-level providerOptions.anthropic.cacheControl must not silently disable 1-hour prompt caching.
  • Anthropic models without explicit TTL continue to use the existing manual cache-marker strategy.
  • The change does not introduce a regression for gateway-routed Anthropic models.

Validation plan

  1. Targeted unit tests
    • bun test src/common/utils/ai/providerOptions.test.ts
    • bun test src/node/services/providerModelFactory.test.ts
    • bun test src/common/utils/ai/cacheStrategy.test.ts
  2. Focused service regression test
    • run the smallest relevant additional test file only if Phase 3 adds coverage in aiService.test.ts or streamManager.test.ts
  3. Static validation
    • make typecheck
    • make lint if the touched files introduce new lint exposure
  4. Optional integration check
    • If Anthropic credentials are available in the environment, run a narrow Anthropic integration exercise after the unit tests pass.
    • Prefer a direct-provider reproduction with explicit cacheTtl: "1h" and at least one tool-enabled request.

Dogfooding plan

Goal: reproduce the original failure mode on the app path the user actually hit, then verify the fix with evidence a reviewer can inspect.

Setup

  • Configure the direct Anthropic provider (not mux-gateway).
  • Use an Anthropic model that supports the affected prompt-caching path.
  • Enable explicit anthropic.cacheTtl: "1h".
  • Use Exec mode or any other tool-enabled flow that exercises tool definitions in the request.

Repro / verification flow

  1. Start the app in a local dev session.
  2. Select the direct Anthropic provider and confirm cacheTtl is "1h".
  3. Run a simple tool-eligible request in Exec mode.
  4. Verify the request completes without the Anthropic API error about 5 cache-control blocks.
  5. If a debug request snapshot or local debug logging is available, verify the final outgoing Anthropic payload is at <= 4 breakpoints.

Evidence to capture

  • Screenshot 1: provider/model settings showing direct Anthropic + explicit 1h TTL
  • Screenshot 2: successful Exec/tool-enabled response where the old error no longer appears
  • Screenshot 3 (if available): debug snapshot or log evidence showing the final cache-breakpoint count
  • Video recording: a short end-to-end repro/verification run covering provider selection, request submission, and successful completion

Suggested tooling for verification

  • In exec mode, use the repo's normal desktop/dev workflow to reproduce the conversation flow.
  • If automation is helpful during implementation review, use the desktop/browser automation tools available in exec mode to drive the app and capture screenshots/video artifacts.

Risks / non-goals

  • Non-goal for this fix: full cache-system centralization across messagePipeline, streamManager, and providerModelFactory.
  • Risk: if another hidden Anthropic SDK path also materializes extra cache markers, removing top-level cacheControl may not be sufficient by itself.
    • Mitigation: add the wire-level breakpoint counter test in Phase 2 so the final payload shape is asserted directly.
  • Risk: some tests may currently treat top-level anthropic.cacheControl as the source of truth for TTL propagation.
    • Mitigation: update those tests to assert the new invariant: TTL is carried by the manual cache-marker path, not by top-level provider options.

Generated with mux • Model: openai:gpt-5.4 • Thinking: xhigh • Cost: $12.62

Stop emitting top-level Anthropic cacheControl from buildProviderOptions,
add regression coverage for final cache-breakpoint counts, and verify that
explicit TTLs still propagate through Mux's manual cache-marker path.

@ThomasK33
Member Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. You're on a roll.


@ThomasK33 ThomasK33 added this pull request to the merge queue Apr 2, 2026
Merged via the queue into main with commit f3a2722 Apr 2, 2026
24 checks passed
@ThomasK33 ThomasK33 deleted the anthropic-models-6zjk branch April 2, 2026 17:04