fix(core): apply deferred summaries when count reaches cutoff #1306

Merged
bug-ops merged 1 commit into main from fix/deferred-summary-count-trigger
Mar 6, 2026
@bug-ops (Owner) commented Mar 6, 2026

Problem

Deferred tool pair summaries (introduced in #1303) were never applied in practice, causing tool outputs to accumulate as [pruned] content instead of being replaced by summaries.

Root cause: prepare_context calls recompute_prompt_tokens() at the end of every turn, resetting cached_prompt_tokens to the actual post-pruning value. Since pruning keeps the actual token count low, the token-based trigger (cached_prompt_tokens > budget * 0.70) was never satisfied. Deferred summaries accumulated indefinitely in message metadata without ever being applied.
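
The failure mode described above can be reproduced with a small model. This is only a sketch: `ContextState` and `count_trigger_firings` are hypothetical names invented for illustration, and the real structures in zeph-core may look quite different. Only `cached_prompt_tokens`, `recompute_prompt_tokens`, and the 70% threshold come from the description.

```rust
/// Hypothetical stand-in for the agent's context state. The field names
/// follow the PR description; the struct itself is an assumption.
struct ContextState {
    cached_prompt_tokens: usize,
    budget: usize,
}

impl ContextState {
    /// Token-based trigger as described: fires only above 70% of the budget.
    fn token_trigger(&self) -> bool {
        self.cached_prompt_tokens as f64 > self.budget as f64 * 0.70
    }

    /// Models recompute_prompt_tokens(): the cached value is reset to the
    /// actual post-pruning prompt size at the end of the turn.
    fn recompute_prompt_tokens(&mut self, post_pruning_tokens: usize) {
        self.cached_prompt_tokens = post_pruning_tokens;
    }
}

/// Simulates `turns` turns in which pruning holds the prompt at
/// `pruned_size` tokens, and returns how often the token trigger fired.
fn count_trigger_firings(budget: usize, pruned_size: usize, turns: usize) -> usize {
    let mut state = ContextState { cached_prompt_tokens: 0, budget };
    let mut fired = 0;
    for _ in 0..turns {
        state.recompute_prompt_tokens(pruned_size);
        if state.token_trigger() {
            fired += 1;
        }
    }
    fired
}
```

With a budget of 100,000 tokens and pruning that holds the prompt near 30,000, `count_trigger_firings(100_000, 30_000, 50)` returns 0: the trigger never fires no matter how many turns pass, which matches the symptoms listed in this description.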

Observable symptoms (visible in --debug-dump output):

  • Request sizes shrink while message count grows (pruning active, no summarization)
  • All old tool results contain [pruned]; no [tool summary] messages appear
  • Message count grows without bound

Fix

Add a count-based fallback in maybe_apply_deferred_summaries: apply all pending deferred summaries when pending >= tool_call_cutoff (default 6), regardless of token count.

Rationale: once prune_stale_tool_outputs has already replaced content with [pruned], the cache prefix is already invalidated — there is no benefit to deferring summary application further.
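
A minimal sketch of the combined decision, assuming the count check is evaluated alongside the existing token check. `DeferredState`, `ApplyReason`, and `should_apply` are hypothetical names, and both the ordering of the two checks and the early return on zero pending summaries are assumptions. The actual change lives in `maybe_apply_deferred_summaries`.

```rust
/// Why deferred summaries are being applied (hypothetical enum).
#[derive(Debug, PartialEq)]
enum ApplyReason {
    TokenPressure,
    CountCutoff,
}

/// Hypothetical snapshot of the values the decision depends on.
struct DeferredState {
    cached_prompt_tokens: usize,
    budget: usize,
    /// Number of deferred summaries waiting in message metadata.
    pending: usize,
    /// Count threshold; the PR states a default of 6.
    tool_call_cutoff: usize,
}

/// Returns the reason to apply all pending summaries, or None to keep deferring.
fn should_apply(state: &DeferredState) -> Option<ApplyReason> {
    // Assumption: nothing to apply means nothing to do.
    if state.pending == 0 {
        return None;
    }
    // Original trigger: prompt tokens above 70% of the budget.
    if state.cached_prompt_tokens as f64 > state.budget as f64 * 0.70 {
        return Some(ApplyReason::TokenPressure);
    }
    // New fallback: enough summaries have piled up, regardless of token count.
    if state.pending >= state.tool_call_cutoff {
        return Some(ApplyReason::CountCutoff);
    }
    None
}
```

In this model, six pending summaries at 10% budget usage yield `Some(ApplyReason::CountCutoff)`, while five pending summaries at the same usage keep deferring.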

Test plan

  • New test tier0_count_trigger_fires_without_budget_pressure: verifies that 6 deferred summaries are applied when cached_prompt_tokens is well below the 70% budget threshold
  • Existing test tier0_does_not_set_compacted_this_turn unchanged
  • cargo nextest run --workspace --features full --lib --bins: 4436 passed

Deferred tool pair summaries were never applied in practice. After
prepare_context calls recompute_prompt_tokens() at the end of each
turn, cached_prompt_tokens reflects the actual post-pruning value
(low), so should_apply_deferred (70% of budget) was never triggered.
Content was silently replaced with [pruned] instead of [tool summary].

Add a count-based fallback: apply deferred summaries when the number
of pending summaries reaches tool_call_cutoff (default 6). Once
prune_stale_tool_outputs has already invalidated the cache prefix,
there is no benefit to further deferring application.

Add regression test: tier0_count_trigger_fires_without_budget_pressure
github-actions bot added the documentation, rust, core, bug, and size/M labels on Mar 6, 2026
bug-ops merged commit 3c8a3db into main on Mar 6, 2026
25 checks passed
bug-ops deleted the fix/deferred-summary-count-trigger branch on March 6, 2026 22:47
Labels

  • bug: Something isn't working
  • core: zeph-core crate
  • documentation: Improvements or additions to documentation
  • rust: Rust code changes
  • size/M: Medium PR (51-200 lines)
