Feature hasn't been suggested before.
Describe the enhancement you want to request
Why compaction matters
opencode runs long multi-turn agent sessions. As conversation history grows, we periodically compact it — asking the LLM to summarize old messages and replacing them with a concise summary. This is essential for staying within the context window.
The paradox
Compaction is supposed to save tokens. But to summarize the history, we have to re-send that same history to the LLM. The compaction request itself becomes one of the largest requests in the entire session — often 40K–50K tokens of input.
That would be acceptable if the provider's prompt cache helped. But it doesn't.
Why the cache misses
opencode's normal chat requests share a common prefix — system prompt, tool definitions — that hits the provider's prompt cache across turns. This is already working well.
The compaction request, however, builds a completely different structure:
- No system prompt (compaction agent has none)
- No tools (empty tool set)
- Different message history (prior compaction messages filtered out)
- Different serialization (media stripped, tool output truncated)
Because the request shares zero prefix overlap with normal chat requests, every token is charged at full input price. The cache that was built up over dozens of turns is completely wasted.
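To make the failure mode concrete, here is a minimal sketch (with illustrative request shapes, not opencode's actual internals) of why the two requests share no cacheable prefix: provider prompt caches key on an exact prefix match of the serialized request, so a divergence at byte 0 invalidates everything.

```typescript
// Hypothetical request shape — illustrative only, not opencode's types.
type Message = { role: string; content: string };

interface Request {
  system?: string;
  tools: string[]; // serialized tool definitions
  messages: Message[];
}

// Simplified stand-in for how a provider sees the request on the wire.
function serialize(req: Request): string {
  return [
    req.system ?? "",
    ...req.tools,
    ...req.messages.map((m) => `${m.role}:${m.content}`),
  ].join("\n");
}

// Length of the shared prefix between two serialized requests —
// a proxy for how much the provider could serve from cache.
function sharedPrefix(a: string, b: string): number {
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  return i;
}

const history: Message[] = [
  { role: "user", content: "refactor the parser" },
  { role: "assistant", content: "done, see diff" },
];

const chatRequest: Request = {
  system: "You are a coding agent...",
  tools: ["bash", "edit", "read"],
  messages: history,
};

// Today's compaction request: no system prompt, no tools, different
// serialization of the same history — it diverges at the first byte.
const compactionRequest: Request = {
  tools: [],
  messages: [...history, { role: "user", content: "Summarize the conversation." }],
};

console.log(sharedPrefix(serialize(chatRequest), serialize(compactionRequest))); // 0
```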
The cost in numbers
Compacting ~45K tokens of history with Claude Sonnet 4 ($3.00/MTok input, $0.30/MTok cached):
|                     | Tokens | Full price | If cached            |
|---------------------|--------|------------|----------------------|
| System prompt       | ~2K    | $0.006     | $0.0006              |
| Tools               | ~3K    | $0.009     | $0.0009              |
| Dropped messages    | ~40K   | $0.120     | $0.012               |
| Summary instruction | ~200   | $0.0006    | $0.0006 (not cached) |
| Total               | ~45.2K | $0.136     | $0.014 (~90% saved)  |
The difference is ~10x. Over a long session with multiple compactions, this adds up — especially ironic for an operation whose entire purpose is cost reduction.
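The arithmetic behind the table can be checked directly. Prices are the Claude Sonnet 4 figures quoted above ($3.00/MTok input, $0.30/MTok cached reads); the per-segment token counts are the estimates from the table.

```typescript
const INPUT_PER_MTOK = 3.0; // $/MTok, full-price input
const CACHED_PER_MTOK = 0.3; // $/MTok, cached input reads

const cost = (tokens: number, perMTok: number) => (tokens / 1_000_000) * perMTok;

const segments = [
  { name: "system prompt", tokens: 2_000, cacheable: true },
  { name: "tools", tokens: 3_000, cacheable: true },
  { name: "dropped messages", tokens: 40_000, cacheable: true },
  { name: "summary instruction", tokens: 200, cacheable: false },
];

const fullPrice = segments.reduce((sum, s) => sum + cost(s.tokens, INPUT_PER_MTOK), 0);
const withCache = segments.reduce(
  (sum, s) => sum + cost(s.tokens, s.cacheable ? CACHED_PER_MTOK : INPUT_PER_MTOK),
  0,
);

console.log(fullPrice.toFixed(4)); // "0.1356" (~$0.136)
console.log(withCache.toFixed(4)); // "0.0141" (~$0.014)
console.log((1 - withCache / fullPrice).toFixed(2)); // "0.90" — ~90% saved
```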
The fix is straightforward
The compaction request should share the same prefix as normal chat requests. Instead of building a separate request structure, keep the system prompt, tool definitions, and message serialization identical — only append a short "please summarize" instruction at the end.
```
Normal chat:  [system] + [tools] + [old messages] + [kept messages] + [user input]
Compaction:   [system] + [tools] + [old messages] + [summary prompt]
                                                  ↑ cache hit up to here
```
The prefix [system] + [tools] + [old messages] was already sent in previous turns. The provider serves it from cache at 10% of the normal price. Only the ~200-token summary instruction is new.
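The invariant the fix needs is small: the compaction request must be a byte-for-byte extension of the shared prefix. A minimal sketch, with assumed request shapes rather than opencode's real builder:

```typescript
// Illustrative turn shape — not opencode's actual types.
type Turn = { role: "system" | "tool" | "user" | "assistant"; content: string };

// The prefix every request in the session shares, in the same order
// and serialization as normal chat turns.
function sessionPrefix(system: Turn, tools: Turn[], oldMessages: Turn[]): Turn[] {
  return [system, ...tools, ...oldMessages];
}

function chatRequest(prefix: Turn[], kept: Turn[], userInput: Turn): Turn[] {
  return [...prefix, ...kept, userInput];
}

function compactionRequest(prefix: Turn[]): Turn[] {
  // Only this short instruction is new; everything before it repeats
  // prior turns exactly and should be served from the prompt cache.
  return [...prefix, { role: "user", content: "Summarize the conversation above concisely." }];
}

const prefix = sessionPrefix(
  { role: "system", content: "coding agent prompt" },
  [{ role: "tool", content: "bash, edit, read" }],
  [{ role: "user", content: "fix the bug" }],
);
const compaction = compactionRequest(prefix);

// Invariant: the compaction request extends the shared prefix unchanged.
console.log(
  JSON.stringify(compaction.slice(0, prefix.length)) === JSON.stringify(prefix),
); // true
```

The design point is that cache alignment falls out of reuse: as long as both request builders consume the same prefix array, no separate cache logic is needed.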
This is the same approach described in the bash-agent wiki on Cache-Aligned Summarization, where it cuts compaction cost by ~90% in practice.