fix(core): apply deferred summaries when count reaches cutoff #1306
Merged
Deferred tool pair summaries were never applied in practice. After prepare_context calls recompute_prompt_tokens() at the end of each turn, cached_prompt_tokens reflects the actual post-pruning value (low), so should_apply_deferred (70% of budget) was never triggered. Content was silently replaced with [pruned] instead of [tool summary].

Add a count-based fallback: apply deferred summaries when the number of pending summaries reaches tool_call_cutoff (default 6). Once prune_stale_tool_outputs has already invalidated the cache prefix, there is no benefit to further deferring application.

Add regression test: tier0_count_trigger_fires_without_budget_pressure
Problem

Deferred tool pair summaries (introduced in #1303) were never applied in practice, so tool outputs accumulated as [pruned] content instead of being replaced by summaries.

Root cause: prepare_context calls recompute_prompt_tokens() at the end of every turn, resetting cached_prompt_tokens to the actual post-pruning value. Since pruning keeps the actual token count low, the token-based trigger (cached_prompt_tokens > budget * 0.70) was never satisfied. Deferred summaries accumulated indefinitely in message metadata without ever being applied.

Observable symptoms (visible in --debug-dump output): tool output content shows as [pruned]; no [tool summary] messages appear.

Fix
Add a count-based fallback in maybe_apply_deferred_summaries: apply all pending deferred summaries when pending >= tool_call_cutoff (default 6), regardless of token count.

Rationale: once prune_stale_tool_outputs has already replaced content with [pruned], the cache prefix is already invalidated, so there is no benefit to deferring summary application further.

Test plan
- tier0_count_trigger_fires_without_budget_pressure: verifies that 6 deferred summaries are applied when cached_prompt_tokens is well below the 70% budget threshold
- tier0_does_not_set_compacted_this_turn: unchanged
- cargo nextest run --workspace --features full --lib --bins: 4436 passed
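
For reviewers who want the trigger logic in one place, the two conditions described under Problem and Fix can be sketched roughly as below. This is an illustrative sketch, not the code in this PR: the ContextState struct, its token_budget and pending_deferred fields, the token_trigger and count_trigger helpers, and the return value of maybe_apply_deferred_summaries are all assumptions made for the example. Only the 70% threshold, the tool_call_cutoff default of 6, and the identifiers cached_prompt_tokens and maybe_apply_deferred_summaries come from the description above.

```rust
/// Hypothetical stand-in for the context state; only the pieces named in
/// the PR description are modeled here.
struct ContextState {
    /// Recomputed at the end of every turn, so it reflects post-pruning usage.
    cached_prompt_tokens: u64,
    /// Assumed name for the total prompt token budget.
    token_budget: u64,
    /// Count cutoff for the fallback (default 6 per the PR description).
    tool_call_cutoff: usize,
    /// Assumed name: number of deferred summaries waiting to be applied.
    pending_deferred: usize,
}

impl ContextState {
    /// Token-based trigger: cached_prompt_tokens > budget * 0.70,
    /// written with integers to avoid floating-point comparison.
    fn token_trigger(&self) -> bool {
        self.cached_prompt_tokens * 10 > self.token_budget * 7
    }

    /// Count-based fallback: enough summaries are pending, regardless of
    /// how many tokens are currently in use.
    fn count_trigger(&self) -> bool {
        self.pending_deferred >= self.tool_call_cutoff
    }

    /// Applies all pending summaries if either trigger fires and returns
    /// how many were applied (0 when neither condition holds).
    /// The real function presumably rewrites message content; this sketch
    /// only tracks the count.
    fn maybe_apply_deferred_summaries(&mut self) -> usize {
        if self.token_trigger() || self.count_trigger() {
            let applied = self.pending_deferred;
            self.pending_deferred = 0;
            applied
        } else {
            0
        }
    }
}
```

With made-up numbers (a budget of 100,000 tokens and 20,000 cached after pruning), the token trigger alone never fires, which matches the behavior described under Problem. In this sketch, the sixth pending summary is what causes all six to be applied.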