Move scratchpad injection to tail of message list for cache efficiency #147

Merged
chinmaymk merged 5 commits into main from
claude/analyze-caching-efficiency-eOMXg
Mar 23, 2026

Conversation

@chinmaymk
Owner

Summary

Refactored the scratchpad middleware to inject the scratchpad message just before the final user message (in the tail region) instead of after the first user message (in the pinned prefix zone). This optimization keeps the long, stable prefix (system prompt + context files + conversation history) byte-identical across turns, maximizing provider prompt-cache hit rates.
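
To make the caching argument concrete, the sketch below compares how many leading messages stay identical between two requests when only the scratchpad text changes. The `Message` shape, the `<scratchpad>` wrapper, and the helper names are assumptions made for illustration, not the project's actual types.

```typescript
// Hypothetical message shape; the real middleware's types may differ.
type Message = { role: "system" | "user" | "assistant"; content: string };

// Count how many leading messages are identical in two request payloads.
function sharedPrefixLength(a: Message[], b: Message[]): number {
  let i = 0;
  while (
    i < a.length &&
    i < b.length &&
    a[i].role === b[i].role &&
    a[i].content === b[i].content
  ) {
    i++;
  }
  return i;
}

const history: Message[] = [
  { role: "system", content: "system prompt" },
  { role: "user", content: "first question" },
  { role: "assistant", content: "first answer" },
  { role: "user", content: "second question" },
];

// Assumed wrapper format for the injected scratchpad message.
const pad = (text: string): Message => ({
  role: "user",
  content: `<scratchpad>${text}</scratchpad>`,
});

// Old placement: right after the first user message.
const headInjected = (text: string): Message[] => [
  ...history.slice(0, 2),
  pad(text),
  ...history.slice(2),
];

// New placement: just before the final user message.
const tailInjected = (text: string): Message[] => [
  ...history.slice(0, -1),
  pad(text),
  ...history.slice(-1),
];

const headShared = sharedPrefixLength(headInjected("v1"), headInjected("v2"));
const tailShared = sharedPrefixLength(tailInjected("v1"), tailInjected("v2"));
```

With head injection only the system prompt and first user message survive a scratchpad change; with tail injection everything up to the scratchpad itself is unchanged, and that gap widens as the history grows.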

Key Changes

  • Injection position: Changed from inserting after the first user message to inserting before the last user message
  • Stripping logic: Updated to strip ALL previously injected scratchpad blocks (iterating backward) instead of stopping after the first removal, preventing stale copies from accumulating even when compaction merges messages
  • Loop control: Changed break statements to continue in the stripping loop to ensure all scratchpad blocks are removed before re-injection
  • Search direction: Reversed the loop that finds the insertion point to search backward from the end of the message list
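
The changes listed above can be sketched roughly as follows. This is a minimal illustration, not the merged code: the `isScratchpad` flag, the `injectScratchpad` name, and the fallback of appending when no user message exists are all assumptions.

```typescript
// Hypothetical message shape; `isScratchpad` is an assumed marker for injected blocks.
type Message = {
  role: "system" | "user" | "assistant";
  content: string;
  isScratchpad?: boolean;
};

function injectScratchpad(messages: Message[], scratchpad: string): Message[] {
  const result = [...messages];

  // Strip every previously injected block. Iterating backward means a splice
  // never shifts the indices that are still to be visited.
  for (let i = result.length - 1; i >= 0; i--) {
    if (!result[i].isScratchpad) continue;
    result.splice(i, 1);
  }

  // Search backward for the last user message; fall back to the end of the
  // list if there is none (an assumption of this sketch).
  let insertAt = result.length;
  for (let i = result.length - 1; i >= 0; i--) {
    if (result[i].role === "user") {
      insertAt = i;
      break;
    }
  }

  result.splice(insertAt, 0, {
    role: "user",
    content: scratchpad,
    isScratchpad: true,
  });
  return result;
}

// Two stale copies are removed and one fresh copy lands before the final user message.
const next = injectScratchpad(
  [
    { role: "system", content: "system prompt" },
    { role: "user", content: "q1" },
    { role: "user", content: "notes v0", isScratchpad: true },
    { role: "assistant", content: "a1" },
    { role: "user", content: "notes v1", isScratchpad: true },
    { role: "user", content: "q2" },
  ],
  "notes v2",
);
```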

Implementation Details

  • The scratchpad now lives in the "tail region" of the message list, keeping the prefix stable for caching
  • All previously injected scratchpad blocks are stripped first (backward iteration ensures splice indices remain valid)
  • Updated test expectations to reflect the new injection position (messages[1] instead of messages[2] in most cases, with adjustments for scenarios where new scratchpad is inserted before stripped messages)
  • Comments updated to clarify the caching rationale and the comprehensive stripping behavior

https://claude.ai/code/session_011wxfX5yFjds1KXZSzrRYhC

claude added 5 commits March 23, 2026 07:33
Previously the scratchpad was injected right after the pinned zone
(position ~2), meaning any content change invalidated the cache for
the entire conversation that followed. Now it's inserted just before
the final user message, keeping the long prefix byte-identical across
turns and maximizing provider prompt-cache hit rates.

Also strips ALL stale scratchpad blocks (not just the first match)
to prevent duplicates after compaction merges messages.

https://claude.ai/code/session_011wxfX5yFjds1KXZSzrRYhC
Instead of splicing before the last user message, just push to
the end. Simpler code, same cache benefit, and the scratchpad
is the last thing the model sees before generating.
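
A minimal sketch of the push-to-end variant this commit describes, assuming a hypothetical `isScratchpad` marker and `appendScratchpad` helper (neither name is taken from the repository):

```typescript
// Hypothetical message shape; `isScratchpad` is an assumed marker for injected blocks.
type Message = {
  role: "system" | "user" | "assistant";
  content: string;
  isScratchpad?: boolean;
};

function appendScratchpad(messages: Message[], scratchpad: string): Message[] {
  // Drop all stale copies, then append the fresh scratchpad as the final message.
  const kept = messages.filter((m) => !m.isScratchpad);
  kept.push({ role: "user", content: scratchpad, isScratchpad: true });
  return kept;
}

const appended = appendScratchpad(
  [
    { role: "system", content: "system prompt" },
    { role: "user", content: "notes v1", isScratchpad: true },
    { role: "user", content: "q1" },
  ],
  "notes v2",
);
```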

https://claude.ai/code/session_011wxfX5yFjds1KXZSzrRYhC
Tool ordering from Map insertion order is fragile — conditionally
enabled tools shift the list and break provider prompt caches.
Sorting by name ensures the tool block is byte-identical across
sessions regardless of registration order.
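
As an illustration of the idea, the sketch below serializes a tool registry in name order so that registration order no longer affects the output. `ToolDef` and `serializeTools` are hypothetical names, and JSON is used only as a stand-in for whatever format the provider request actually uses.

```typescript
// Hypothetical tool definition; the real registry's shape may differ.
type ToolDef = { name: string; description: string };

function serializeTools(registry: Map<string, ToolDef>): string {
  // Compare by code unit rather than localeCompare so the order does not
  // depend on the host locale.
  const sorted = Array.from(registry.values()).sort((a, b) =>
    a.name < b.name ? -1 : a.name > b.name ? 1 : 0,
  );
  return JSON.stringify(sorted);
}

const read: ToolDef = { name: "read_file", description: "Read a file" };
const bash: ToolDef = { name: "bash", description: "Run a shell command" };

// Same tools, different registration order.
const sessionA = serializeTools(new Map([["read_file", read], ["bash", bash]]));
const sessionB = serializeTools(new Map([["bash", bash], ["read_file", read]]));
```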

https://claude.ai/code/session_011wxfX5yFjds1KXZSzrRYhC
Skill iteration order from Map depends on filesystem glob order,
which isn't deterministic across runs. Sorting by name ensures the
skills XML block in the pinned zone is byte-identical across sessions.
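
The same approach applied to the skills block might look like the sketch below. The `<skills>` and `<skill>` tag names, the `Skill` type, and `renderSkillsBlock` are assumptions for illustration, and XML escaping is omitted for brevity.

```typescript
// Hypothetical skill record; the real loader's shape may differ.
type Skill = { name: string; description: string };

function renderSkillsBlock(skills: Map<string, Skill>): string {
  // Sort the keys (default sort is by UTF-16 code unit) before rendering,
  // so discovery order from the filesystem does not leak into the prompt.
  const names = Array.from(skills.keys()).sort();
  const lines = names.map(
    (name) => `  <skill name="${name}">${skills.get(name)!.description}</skill>`,
  );
  return ["<skills>", ...lines, "</skills>"].join("\n");
}

const review: Skill = { name: "review", description: "Review a diff" };
const deploy: Skill = { name: "deploy", description: "Deploy a service" };

// Same skills discovered in different orders.
const runA = renderSkillsBlock(new Map([["review", review], ["deploy", deploy]]));
const runB = renderSkillsBlock(new Map([["deploy", deploy], ["review", review]]));
```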

https://claude.ai/code/session_011wxfX5yFjds1KXZSzrRYhC
@chinmaymk chinmaymk merged commit e614633 into main Mar 23, 2026
1 check passed
@chinmaymk chinmaymk deleted the claude/analyze-caching-efficiency-eOMXg branch March 23, 2026 07:51
2 participants