research(context): active context compression via agent-controlled focus primitives (Focus Agent) #1850

@bug-ops

Summary

Add two agent-callable tools (start_focus / complete_focus) to the agent's tool set. The agent autonomously decides when to consolidate exploration history into a persistent Knowledge block, deleting raw interaction logs and retaining only structured summaries.

Source: arXiv 2601.07190 — "Active Context Compression: Autonomous Memory Management in LLM Agents" (Verma)

Technique

Two primitives added to the agent's tool set:

  • start_focus(scope: str): declares the scope of a sub-investigation and marks a checkpoint in the conversation history
  • complete_focus(summary: str): appends a structured summary to the pinned Knowledge block at the top of the context, then deletes all messages between the checkpoint and the current step

Context size follows a sawtooth pattern: it grows during exploration and collapses at each consolidation. The agent decides when to call complete_focus. The system prompt instructs it to compress every 10–15 tool calls, and a system-injected reminder fires after 15 calls without compression.

Knowledge block is cumulative across all focus sessions — retained facts survive future compressions.
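
The two primitives can be sketched as operations on a message list. This is a minimal illustration, not Zeph's or the paper's implementation: the `Message` and `FocusContext` types, the `push` helper, and the `[scope] summary` entry format are all assumptions made for the example.

```rust
/// Hypothetical message type; Zeph's real message struct will differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Assumed focus state: raw history, the cumulative Knowledge block,
/// and at most one open checkpoint (scope name + history index).
#[derive(Debug, Default)]
pub struct FocusContext {
    pub history: Vec<Message>,
    pub knowledge: Vec<String>,
    checkpoint: Option<(String, usize)>,
}

impl FocusContext {
    /// Append one message to the raw history.
    pub fn push(&mut self, role: &str, content: &str) {
        self.history.push(Message {
            role: role.to_string(),
            content: content.to_string(),
        });
    }

    /// start_focus: remember where the sub-investigation begins.
    pub fn start_focus(&mut self, scope: &str) {
        self.checkpoint = Some((scope.to_string(), self.history.len()));
    }

    /// complete_focus: append the summary to the Knowledge block and drop
    /// every message recorded since the checkpoint. Returns the number of
    /// dropped messages, or None when no focus session is open.
    pub fn complete_focus(&mut self, summary: &str) -> Option<usize> {
        let (scope, start) = self.checkpoint.take()?;
        let dropped = self.history.len().saturating_sub(start);
        self.history.truncate(start);
        self.knowledge.push(format!("[{scope}] {summary}"));
        Some(dropped)
    }
}
```

Calling `start_focus`, pushing a few exploration messages, then calling `complete_focus` leaves the history at its pre-checkpoint length with one new Knowledge entry, which is the sawtooth described above. Because `knowledge` is only ever appended to, entries from earlier focus sessions survive later compressions.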

Results (SWE-bench Lite, 5 instances, Claude Haiku 4.5)

  • Total token reduction: 22.7% (14.9M → 11.5M tokens)
  • Task success rate: 3/5 = 60% — identical to uncompressed baseline (no accuracy loss)
  • Average compressions per task: 6.0
  • Average messages dropped per compression: 70.2
  • Per-instance token change: −57% to +110%. The +110% case is pylint-7080, where 8 compressions cost more in summary tokens than they saved.

Key ablation: with passive prompting the agent compressed only 1–2 times per task, saving 6% of tokens, and accuracy degraded to 80%. Aggressive prompting (compress every 10–15 calls) restored accuracy and produced the 22.7% savings.

Evaluation note: 5-instance SWE-bench Lite is a thin sample — numbers should not be taken as strong evidence of general effectiveness.

Applicability to Zeph

HIGH conceptually. MEDIUM-HIGH implementation scope. The proactive agent-directed model is complementary to Zeph's existing reactive compaction tiers:

  • Focus compression fires mid-task at the agent's discretion (semantic task boundaries)
  • Soft/Hard compaction still fires at budget thresholds (60%/90%) as a safety net
  • These do not conflict, but require careful integration (see constraints below)
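
How the two mechanisms layer can be sketched as a threshold check that runs independently of focus compression. The 60%/90% thresholds come from the text above; the `Tier` enum and `reactive_tier` function are hypothetical names, not Zeph's actual API.

```rust
/// Reactive compaction tiers (names are assumptions for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub enum Tier {
    Idle,
    Soft,
    Hard,
}

/// Pick the reactive tier from context usage, using the 60% / 90%
/// budget thresholds quoted above. Integer arithmetic avoids float rounding.
pub fn reactive_tier(used_tokens: u64, budget_tokens: u64) -> Tier {
    if budget_tokens == 0 {
        // Degenerate budget: treat as fully exhausted.
        return Tier::Hard;
    }
    if used_tokens * 10 >= budget_tokens * 9 {
        Tier::Hard
    } else if used_tokens * 10 >= budget_tokens * 6 {
        Tier::Soft
    } else {
        Tier::Idle
    }
}
```

Under this model a focus compression simply lowers `used_tokens` before the next check, so a context at 95% of budget that drops 40% of the budget's worth of raw messages falls back to `Tier::Idle`. The reactive tiers only fire when the agent has not compressed enough on its own.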

Critical design constraints

  1. Knowledge block pinning: The Knowledge block produced by complete_focus must be marked as pinned/protected in the Soft pruner — it is the compressed residue of already-deleted messages and must never be evicted.

  2. Hard compaction awareness: The Hard compaction summarizer must treat Knowledge block entries as pre-summarized — pass them through unchanged rather than re-summarizing them. Double-compression degrades quality and wastes tokens.

  3. Minimum-messages guard: The +110% failure mode on pylint-7080 (8 compressions, summaries cost more than saved) is a real production risk. A minimum message count between checkpoints (e.g., min 8–10 messages) prevents over-eager compression from backfiring.

  4. Context assembly ordering: The Knowledge block must be injected at a fixed position (before skills, after base system prompt) and marked with cache_control appropriately to avoid prompt cache invalidation on each update.
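
Constraints 1 and 3 can be illustrated with a small sketch. It assumes a hypothetical `Entry` type carrying a `pinned` flag and a precomputed token count, and an oldest-first eviction policy; Zeph's real Soft pruner may select victims differently.

```rust
/// Hypothetical context entry; `pinned` marks the Knowledge block.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub content: String,
    pub tokens: usize,
    pub pinned: bool,
}

/// Soft-prune sketch: evict the oldest unpinned entries until the total
/// is at or below `target_tokens`. Pinned entries are never evicted.
/// Returns the number of evicted entries.
pub fn soft_prune(entries: &mut Vec<Entry>, target_tokens: usize) -> usize {
    let mut total: usize = entries.iter().map(|e| e.tokens).sum();
    let mut evicted = 0;
    let mut i = 0;
    while total > target_tokens && i < entries.len() {
        if entries[i].pinned {
            // Skip protected entries such as the Knowledge block.
            i += 1;
        } else {
            total -= entries[i].tokens;
            entries.remove(i);
            evicted += 1;
        }
    }
    evicted
}

/// Minimum-messages guard: allow complete_focus only when at least
/// `min_messages` messages have accumulated since the checkpoint.
pub fn focus_allowed(checkpoint: usize, history_len: usize, min_messages: usize) -> bool {
    history_len.saturating_sub(checkpoint) >= min_messages
}
```

With a pinned 100-token Knowledge entry followed by three 50-token messages and a 160-token target, `soft_prune` evicts the two oldest unpinned messages and leaves the Knowledge entry in place. `focus_allowed(10, 15, 8)` rejects a compression that would cover only 5 messages, which is the kind of over-eager call behind the pylint-7080 regression.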

Implementation sketch (Zeph-specific, MEDIUM-HIGH complexity)

  1. Two new tool definitions (start_focus, complete_focus) in zeph-tools, conditionally enabled by config.

  2. Checkpoint tracking in ContextManager: when start_focus is called, store the message index as a checkpoint.

  3. Compression on complete_focus: extract messages since checkpoint → LLM summarization call (scoped, similar to existing Hard compaction) → append result to Knowledge block → delete bracketed messages.

  4. Knowledge block as a pinned Role::System message at a fixed position, excluded from Soft pruning and Hard re-summarization.

  5. Context assembly changes (assembly.rs): inject Knowledge block, mark as pre-summarized, handle complete_focus as a compression event (not a normal tool result).

  6. Config: a new [agent.focus] section with defaults enabled = false, compression_interval = 12, reminder_interval = 15, min_messages_per_focus = 8.

  7. Integration points: config section, CLI flag, TUI command, --init wizard, --migrate-config step.
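
Steps 1 and 6 could be backed by a config struct along these lines. The field names and default values mirror the proposed [agent.focus] section; the `FocusConfig` and `ReminderTracker` type names and the fire-every-N-calls reminder policy are assumptions for illustration, and the sketch omits whatever deserialization Zeph's config loader uses.

```rust
/// Mirrors the proposed [agent.focus] section (type name is an assumption).
#[derive(Debug, Clone, PartialEq)]
pub struct FocusConfig {
    pub enabled: bool,
    pub compression_interval: u32,
    pub reminder_interval: u32,
    pub min_messages_per_focus: usize,
}

impl Default for FocusConfig {
    fn default() -> Self {
        Self {
            enabled: false,
            compression_interval: 12,
            reminder_interval: 15,
            min_messages_per_focus: 8,
        }
    }
}

/// Counts tool calls since the last compression to drive the reminder.
#[derive(Debug, Default)]
pub struct ReminderTracker {
    calls_since_compression: u32,
}

impl ReminderTracker {
    /// Record one tool call. Returns true when a reminder should be
    /// injected: every `reminder_interval` calls without a compression.
    pub fn on_tool_call(&mut self, cfg: &FocusConfig) -> bool {
        if !cfg.enabled || cfg.reminder_interval == 0 {
            return false;
        }
        self.calls_since_compression += 1;
        self.calls_since_compression % cfg.reminder_interval == 0
    }

    /// Reset the counter after a successful complete_focus.
    pub fn on_compression(&mut self) {
        self.calls_since_compression = 0;
    }
}
```

With the feature enabled and default intervals, the tracker stays silent for 14 tool calls, signals on the 15th, and goes quiet again until either another 15 calls pass or `on_compression` resets it. Keeping `enabled = false` as the default means the tools and the reminder are inert unless a user opts in.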

Complements #1851 (SWE-Pruner passive pruning) and #1885 (SideQuest cursor eviction) — different triggers, can coexist with careful tier ordering.
