Description
The prune function in SessionCompaction uses hardcoded thresholds (PRUNE_PROTECT = 40_000 and PRUNE_MINIMUM = 20_000 tokens) to decide when to clear old tool call outputs.
These values were reasonable when most models had a 200K context window (40K = 20%), but with Claude Opus 4.6, GPT-5.4, and Gemini 2.5 Pro all at 1M+ context, 40K is only 4% of the window. Tool results from just a few turns ago get pruned even though total context usage is well under the limit.
Steps to reproduce
- Use a 1M context model (Claude Opus, GPT-5.4, etc.)
- Have a multi-turn conversation with tool calls (reading/editing files)
- After several turns, observe that tool outputs from earlier turns are pruned prematurely
- The model loses access to recent work results
Expected behavior
Prune thresholds should scale with model context size so that larger-context models retain more tool output history.
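One possible fix, sketched below under the assumption that the original constants encode ratios of a 200K baseline (40K protect = 20%, 20K minimum = 10%): derive the thresholds from the model's context limit instead of hardcoding them. The function name `prune_thresholds` and the ratio constants are hypothetical, not part of the existing SessionCompaction code.

```python
# Hypothetical sketch: scale PRUNE_PROTECT / PRUNE_MINIMUM with context size
# rather than hardcoding them. Ratios assume the 200K-context baseline
# (PRUNE_PROTECT = 40_000 -> 20%, PRUNE_MINIMUM = 20_000 -> 10%).
PROTECT_RATIO = 0.20  # fraction of context protected from pruning
MINIMUM_RATIO = 0.10  # fraction below which pruning never triggers

def prune_thresholds(context_limit: int) -> tuple[int, int]:
    """Return (protect, minimum) token thresholds for a given context window."""
    return int(context_limit * PROTECT_RATIO), int(context_limit * MINIMUM_RATIO)

# A 200K model keeps the historical values; a 1M model scales up proportionally.
print(prune_thresholds(200_000))    # (40000, 20000)
print(prune_thresholds(1_000_000))  # (200000, 100000)
```

This keeps behavior unchanged for 200K-context models while letting 1M-context models retain proportionally more tool output history.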