## Summary
Add two agent-callable tools (`start_focus` / `complete_focus`) to the agent's tool set. The agent autonomously decides when to consolidate exploration history into a persistent Knowledge block, deleting raw interaction logs and retaining only structured summaries.
Source: arXiv 2601.07190 — "Active Context Compression: Autonomous Memory Management in LLM Agents" (Verma)
## Technique
Two primitives added to the agent's tool set:
- `start_focus(scope: str)` — declares a sub-investigation scope and marks a checkpoint in conversation history
- `complete_focus(summary: str)` — appends a structured summary to the pinned Knowledge block at the top of the context, then deletes all messages between the checkpoint and the current step

The context follows a sawtooth pattern: it grows during exploration and collapses at consolidation. The agent decides when to call `complete_focus`. The system prompt instructs compression every 10–15 tool calls; a system-injected reminder fires after 15 calls without compression.

The Knowledge block is cumulative across all focus sessions — retained facts survive future compressions.
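The lifecycle of the two primitives can be sketched as operations on a message buffer. A minimal Rust sketch, where `Agent` and its field names are illustrative assumptions rather than the paper's implementation:

```rust
// Minimal sketch of the start_focus / complete_focus lifecycle.
// The Agent struct and its fields are illustrative, not from the paper.
struct Agent {
    messages: Vec<String>,     // raw interaction log
    knowledge: Vec<String>,    // cumulative pinned Knowledge block
    checkpoint: Option<usize>, // index marked by start_focus
}

impl Agent {
    // start_focus(scope): record the scope, mark a checkpoint at the
    // current history length.
    fn start_focus(&mut self, scope: &str) {
        self.messages.push(format!("[focus: {scope}]"));
        self.checkpoint = Some(self.messages.len());
    }

    // complete_focus(summary): append the summary to the Knowledge
    // block, then delete every message recorded since the checkpoint
    // (the "sawtooth collapse").
    fn complete_focus(&mut self, summary: &str) {
        if let Some(cp) = self.checkpoint.take() {
            self.knowledge.push(summary.to_string());
            self.messages.truncate(cp);
        }
    }
}
```

Because `knowledge` is never truncated, facts retained from one focus session survive every later compression, matching the cumulative behavior described above.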
## Results (SWE-bench Lite, 5 instances, Claude Haiku 4.5)
- Total token reduction: 22.7% (14.9M → 11.5M tokens)
- Task success rate: 3/5 = 60% — identical to uncompressed baseline (no accuracy loss)
- Average compressions per task: 6.0
- Average messages dropped per compression: 70.2
- Per-instance range: −57% to +110% (the +110% outlier was pylint-7080, where 8 compressions produced summaries that cost more tokens than they saved)
Key ablation: passive prompting alone yielded only 1–2 compressions per task and 6% savings, with accuracy degraded to 80%. Aggressive prompting (compress every 10–15 calls) restored accuracy and achieved the full 22.7% savings.
Evaluation note: 5-instance SWE-bench Lite is a thin sample — numbers should not be taken as strong evidence of general effectiveness.
## Applicability to Zeph
HIGH conceptually. MEDIUM-HIGH implementation scope. The proactive agent-directed model is complementary to Zeph's existing reactive compaction tiers:
- Focus compression fires mid-task at the agent's discretion (semantic task boundaries)
- Soft/Hard compaction still fires at budget thresholds (60%/90%) as a safety net
- These do not conflict, but require careful integration (see constraints below)
## Critical design constraints
- **Knowledge block pinning:** The Knowledge block produced by `complete_focus` must be marked as pinned/protected in the Soft pruner — it is the compressed residue of already-deleted messages and must never be evicted.
- **Hard compaction awareness:** The Hard compaction summarizer must treat Knowledge block entries as pre-summarized — pass them through unchanged rather than re-summarizing them. Double-compression degrades quality and wastes tokens.
- **Minimum-messages guard:** The +110% failure mode on pylint-7080 (8 compressions, summaries cost more than saved) is a real production risk. A minimum message count between checkpoints (e.g., min 8–10 messages) prevents over-eager compression from backfiring.
- **Context assembly ordering:** The Knowledge block must be injected at a fixed position (before skills, after base system prompt) and marked with `cache_control` appropriately to avoid prompt cache invalidation on each update.
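The pinning and minimum-messages constraints are mechanical enough to sketch. In this hypothetical fragment, `Message`, `soft_prune_candidates`, and `focus_allowed` are illustrative names, not Zeph's actual types:

```rust
// Sketch of two guards from the constraints above; all names are
// illustrative assumptions.
struct Message {
    pinned: bool,         // Knowledge entries set this; Soft pruner skips them
    pre_summarized: bool, // Hard compaction passes these through unchanged
}

const MIN_MESSAGES_PER_FOCUS: usize = 8;

// The Soft pruner may only consider unpinned messages for eviction.
fn soft_prune_candidates(msgs: &[Message]) -> Vec<usize> {
    msgs.iter()
        .enumerate()
        .filter(|(_, m)| !m.pinned)
        .map(|(i, _)| i)
        .collect()
}

// Reject a complete_focus call that would consolidate too few messages,
// avoiding the pylint-7080 failure mode where summaries cost more than
// the tokens they saved.
fn focus_allowed(checkpoint: usize, current_len: usize) -> bool {
    current_len.saturating_sub(checkpoint) >= MIN_MESSAGES_PER_FOCUS
}
```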
## Implementation sketch (Zeph-specific, MEDIUM-HIGH complexity)
- Two new tool definitions (`start_focus`, `complete_focus`) in `zeph-tools`, conditionally enabled by config.
- Checkpoint tracking in `ContextManager`: when `start_focus` is called, store the current message index as a checkpoint.
- Compression on `complete_focus`: extract messages since the checkpoint → LLM summarization call (scoped, similar to existing Hard compaction) → prepend the result to the Knowledge block → delete the bracketed messages.
- Knowledge block as a pinned `Role::System` message at a fixed position, excluded from Soft pruning and Hard re-summarization.
- Context assembly changes (`assembly.rs`): inject the Knowledge block, mark it as pre-summarized, and handle `complete_focus` as a compression event (not a normal tool result).
- Config: `[agent.focus] enabled = false, compression_interval = 12, reminder_interval = 15, min_messages_per_focus = 8`
- Integration points: config section, CLI flag, TUI command, `--init` wizard, `--migrate-config` step.
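The config surface and the reminder logic could look roughly like the following. This is a sketch, not Zeph's actual code: `FocusConfig` mirrors the proposed `[agent.focus]` keys, and `needs_reminder` is a hypothetical helper on `ContextManager`:

```rust
// Hypothetical sketch of the config and reminder plumbing; field and
// method names are assumptions mirroring the [agent.focus] keys above.
struct FocusConfig {
    enabled: bool,
    compression_interval: usize,   // prompt compression every N tool calls
    reminder_interval: usize,      // inject reminder after N calls w/o focus
    min_messages_per_focus: usize, // minimum-messages guard
}

impl Default for FocusConfig {
    fn default() -> Self {
        Self {
            enabled: false,
            compression_interval: 12,
            reminder_interval: 15,
            min_messages_per_focus: 8,
        }
    }
}

struct ContextManager {
    cfg: FocusConfig,
    calls_since_focus: usize,
}

impl ContextManager {
    // Fire the system-injected reminder once the agent has gone
    // reminder_interval tool calls without calling complete_focus.
    fn needs_reminder(&self) -> bool {
        self.cfg.enabled && self.calls_since_focus >= self.cfg.reminder_interval
    }
}
```

Defaulting `enabled = false` keeps the feature opt-in, so existing configs migrate without behavior change until the flag is flipped.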
Complements #1851 (SWE-Pruner passive pruning) and #1885 (SideQuest cursor eviction) — different triggers, can coexist with careful tier ordering.