fix(session): cache messages across prompt loop to preserve prompt cache byte-identity#24842

Open
BYK wants to merge 1 commit into anomalyco:dev from BYK:fix/prompt-cache-stability

Conversation

Contributor

@BYK BYK commented Apr 28, 2026

Issue for this PR

Closes #24841. Related: #20110, #20565, #14743.

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

filterCompactedEffect(sessionID) reloads ALL messages from the DB at the start of every prompt loop iteration. Between tool-call steps, tool parts transition from pending → completed with output text. toModelMessages() serializes these states differently, as sketched below:

  • Pending (message-v2.ts line 912): state: "output-error", errorText: "[Tool execution was interrupted]"
  • Completed (message-v2.ts line 853): state: "output-available", output: <actual text>
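
In TypeScript terms, the two shapes differ like this (a minimal sketch; the field names follow the bullets above, while the union type and its name are illustrative):

```ts
// The two serialized states, per the bullets above. Field names come from
// the PR description; the union type and its name are illustrative, not
// OpenCode's actual types.
type ToolPartState =
  | { state: "output-error"; errorText: string }   // pending at send time
  | { state: "output-available"; output: string }  // completed

const pending: ToolPartState = {
  state: "output-error",
  errorText: "[Tool execution was interrupted]",
}

const completed: ToolPartState = {
  state: "output-available",
  output: "<actual text>",
}

// Anthropic matches the prompt cache on bytes, so once this part flips
// from `pending` to `completed`, everything at and after this message
// position re-enters the context as a cache write:
console.log(JSON.stringify(pending) === JSON.stringify(completed)) // false
```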

Anthropic's prompt cache matches on byte-identity. The changed bytes at that message position invalidate the cache from there forward — the entire remaining context becomes a cache WRITE at $6.25/MTok (12.5× the cache-read price of $0.50/MTok for Opus).

The fix: move filterCompactedEffect() above the loop and cache the result. On tool-call continuation, reload from DB but only append messages with genuinely new IDs — existing messages retain their original serialized state as the API last saw them. Full reloads still happen after compaction, subtask handling, and overflow recovery since those structurally change the conversation.
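
A minimal sketch of the fix's shape, assuming hypothetical stand-ins (loadMessagesFromDB, callModel, and the finishReason value are illustrative, not OpenCode's actual internals):

```ts
// Sketch only: every declared helper below is a hypothetical stand-in so
// the example type-checks; the real session module has its own shapes.
interface Msg { id: string }
declare function filterCompactedEffect(sessionID: string): Promise<Msg[]>
declare function loadMessagesFromDB(sessionID: string): Promise<Msg[]>
declare function toModelMessages(msgs: Msg[]): unknown
declare function callModel(body: unknown): Promise<{ finishReason: string }>

async function promptLoop(sessionID: string) {
  // Load once, above the loop, instead of on every iteration.
  let messages = await filterCompactedEffect(sessionID)

  while (true) {
    const response = await callModel(toModelMessages(messages))

    if (response.finishReason === "tool-calls") {
      // Tool-call continuation: append only genuinely new messages.
      // Cached entries keep the serialized state the API last saw,
      // so their bytes stay identical for the prompt cache.
      const fresh = await loadMessagesFromDB(sessionID)
      const known = new Set(messages.map((m) => m.id))
      messages = [...messages, ...fresh.filter((m) => !known.has(m.id))]
      continue
    }

    // Compaction, subtask handling, and overflow recovery structurally
    // change the conversation, so those paths would still do a full
    // reload: messages = await filterCompactedEffect(sessionID)
    break
  }
}
```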

Why this works: the only message whose tool parts change between API calls is the most recent assistant message (the one whose tool just executed). All prior messages were already in their completed state when the previous API call sent them. By not re-reading that one message from the DB, its serialized form stays byte-identical to what Anthropic cached.

Cost data from real sessions (Opus 4.7, 1M context, April 21st; a back-of-envelope per-turn check follows the list):

  • Cache writes: $2,264 (63% of total daily spend)
  • Cache reads: $1,234 (34%)
  • 95% of rapid cache busts (<60s gap) have a tool call in the preceding message
  • Warm turns: cache_read=614K, cache_write=1K
  • Bust turns: cache_read=54K, cache_write=560K (only system prompt survives)
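
Plugging the warm/bust token counts above into the prices quoted earlier shows the per-turn gap (straight arithmetic from the figures in this PR):

```ts
// Prices from the PR: cache read $0.50/MTok, cache write $6.25/MTok.
const READ = 0.50 / 1_000_000  // $ per cached-read token
const WRITE = 6.25 / 1_000_000 // $ per cache-write token

const warmTurn = 614_000 * READ + 1_000 * WRITE   // ≈ $0.31
const bustTurn = 54_000 * READ + 560_000 * WRITE  // ≈ $3.53, ~11x a warm turn
```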

How did you verify your code works?

Analyzed cache patterns from the OpenCode DB across multiple sessions. Verified the root cause by correlating bust events with tool-call timing: 95% of rapid busts (<60s) are preceded by a tool-bearing message, and the exact cache_read/cache_write pattern matches the pending → completed byte change (correlation check sketched below). The fix preserves message-array identity across tool-call steps while still correctly appending new messages.
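
The correlation check is simple to reproduce; a sketch, assuming a hypothetical per-turn record extracted from the DB (the Turn shape and the bust heuristic are illustrative, not the actual schema or analysis code):

```ts
// Sketch of the bust-vs-tool-call correlation, using a hypothetical
// per-turn record pulled from the OpenCode DB.
interface Turn {
  at: number               // unix ms timestamp of the API call
  cacheRead: number        // cached-read tokens for this call
  cacheWrite: number       // cache-write tokens for this call
  prevHadToolCall: boolean // did the preceding message carry a tool call?
}

function rapidBustToolRate(turns: Turn[]): number {
  const rapidBusts = turns.filter((t, i) =>
    i > 0 &&
    t.at - turns[i - 1].at < 60_000 && // <60s gap to the previous turn
    t.cacheWrite > t.cacheRead)        // write-dominated turn = cache bust
  const withTool = rapidBusts.filter((t) => t.prevHadToolCall)
  return withTool.length / rapidBusts.length // ≈ 0.95 in the sampled data
}
```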

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

@github-actions github-actions Bot added the needs:compliance and needs:title labels Apr 28, 2026
@github-actions
Contributor

Hey! Your PR title perf(session): cache messages across prompt loop to preserve prompt cache byte-identity doesn't follow conventional commit format.

Please update it to start with one of:

  • feat: or feat(scope): new feature
  • fix: or fix(scope): bug fix
  • docs: or docs(scope): documentation changes
  • chore: or chore(scope): maintenance tasks
  • refactor: or refactor(scope): code refactoring
  • test: or test(scope): adding or updating tests

Where scope is the package name (e.g., app, desktop, opencode).

See CONTRIBUTING.md for details.

@github-actions
Contributor

The following comment was made by an LLM; it may be inaccurate:

Based on my search, here are the potentially related PRs:

  1. fix(cache): improve Anthropic prompt cache hit rate with system split and tool stability (#14743)

    • Related: Also focuses on improving Anthropic prompt cache performance, though with a different approach (system split and tool stability)
  2. fix(session): use transcript position instead of lexical ID compare in prompt loop (#24379)

    • Related: Recent PR addressing session/prompt loop logic that could interact with message caching behavior

The primary PR #24842 appears to be novel in its specific approach of caching messages across prompt loop iterations. The related issues mentioned (#24841, #20110, #20565) suggest this is addressing a known performance problem, but no other open PRs are directly duplicating this work.

No duplicate PRs found

@BYK BYK changed the title perf(session): cache messages across prompt loop to preserve prompt cache byte-identity fix(session): cache messages across prompt loop to preserve prompt cache byte-identity Apr 28, 2026
@github-actions github-actions Bot removed the needs:title and needs:compliance labels Apr 28, 2026
@github-actions
Contributor

Thanks for updating your PR! It now meets our contributing guidelines. 👍

@BYK BYK force-pushed the fix/prompt-cache-stability branch from d5baeca to 7cb41f9 Compare April 28, 2026 22:07
@BYK BYK requested a review from adamdotdevin as a code owner April 28, 2026 22:07
fix(session): cache messages across prompt loop to preserve prompt cache byte-identity

OpenCode updates tool part states in-place (pending → completed + output)
between consecutive API calls in the tool-execution loop. When the next
API call serializes the conversation, the previous assistant message has
different bytes (completed state + output vs pending/error placeholder),
breaking Anthropic's prompt cache from that point forward.

On real sessions this causes ~20% of turns to re-write the entire context
at the cache-write price (12.5× cache-read). On April 21st alone, this
cost $2,264 in cache writes vs $1,234 in cache reads.

Fix: cache the conversation array across prompt loop iterations. On tool-
call continuation steps, only append genuinely NEW messages instead of
reloading all messages from the DB. Existing messages retain their
original part states (as the API last saw them), preserving byte-identity
for the prompt cache.

Full reloads still happen after compaction, subtask handling, and overflow
recovery — these operations structurally change the conversation.
@BYK BYK force-pushed the fix/prompt-cache-stability branch from 7cb41f9 to 25724b7 Compare April 28, 2026 22:14

Development

Successfully merging this pull request may close these issues.

Prompt loop DB reload breaks Anthropic cache after tool calls (63% of spend)
