research(context): active context compression via agent-controlled focus primitives (Focus Agent) #1850

## Summary

Add two agent-callable tools (`start_focus` / `complete_focus`) to the agent's tool set. The agent autonomously decides when to consolidate exploration history into a persistent Knowledge block, deleting raw interaction logs and retaining only structured summaries.

**Source**: arXiv 2601.07190 — "Active Context Compression: Autonomous Memory Management in LLM Agents" (Verma)

## Technique

Two primitives added to the agent's tool set:
- **`start_focus(scope: str)`** — declares a sub-investigation scope, marks a checkpoint in conversation history
- **`complete_focus(summary: str)`** — triggers: append structured summary to pinned Knowledge block at context top; delete all messages between checkpoint and current step
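
The two primitives can be sketched as operations on an in-memory history. This is a minimal illustration, not the paper's or Zeph's implementation: `Message`, `FocusContext`, and the `[scope] summary` entry format are hypothetical names chosen for the example, and the summary is passed in directly rather than produced by an LLM call.

```rust
/// Hypothetical message type; Zeph's real message struct is not shown in this note.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: &'static str,
    pub content: String,
}

/// Hypothetical context holding a cumulative Knowledge block plus raw history.
#[derive(Debug, Default)]
pub struct FocusContext {
    /// Summaries from completed focus sessions, oldest first.
    pub knowledge: Vec<String>,
    /// Raw interaction history.
    pub messages: Vec<Message>,
    /// Scope and history index recorded by `start_focus`, if a focus is open.
    checkpoint: Option<(String, usize)>,
}

impl FocusContext {
    /// Append one raw message to the history.
    pub fn push(&mut self, role: &'static str, content: &str) {
        self.messages.push(Message { role, content: content.to_string() });
    }

    /// Declare a sub-investigation and mark the current end of history.
    pub fn start_focus(&mut self, scope: &str) {
        self.checkpoint = Some((scope.to_string(), self.messages.len()));
    }

    /// Append the summary to the Knowledge block and delete every message
    /// recorded since the checkpoint. Returns the number of messages dropped,
    /// or `None` if no focus was open.
    pub fn complete_focus(&mut self, summary: &str) -> Option<usize> {
        let (scope, start) = self.checkpoint.take()?;
        let dropped = self.messages.len().saturating_sub(start);
        self.messages.truncate(start);
        self.knowledge.push(format!("[{}] {}", scope, summary));
        Some(dropped)
    }
}
```

Calling `start_focus`, pushing a few messages, and then calling `complete_focus` leaves the history as it was at the checkpoint, with one new Knowledge entry.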

Context follows a sawtooth pattern: grows during exploration, collapses at consolidation. The agent decides when to call `complete_focus`. System prompt instructs compression every 10–15 tool calls; system-injected reminder fires after 15 calls without compression.
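
The reminder policy can be sketched as a counter that resets on each compression. `CompressionReminder` and the reminder wording are hypothetical. The sketch also assumes the reminder repeats every 15 further calls if the agent keeps ignoring it, which the source does not specify.

```rust
/// Hypothetical tracker for the system-injected reminder.
pub struct CompressionReminder {
    /// Tool calls since the last `complete_focus` (or since the task began).
    calls_since_compression: u32,
    /// Inject a reminder each time this many calls pass without compression.
    reminder_after: u32,
}

impl CompressionReminder {
    /// `reminder_after` is clamped to at least 1 to keep the modulo well-defined.
    pub fn new(reminder_after: u32) -> Self {
        Self {
            calls_since_compression: 0,
            reminder_after: reminder_after.max(1),
        }
    }

    /// Record one tool call. Returns reminder text to inject into the context
    /// when the threshold is reached, and resets the count on `complete_focus`.
    pub fn on_tool_call(&mut self, tool_name: &str) -> Option<String> {
        if tool_name == "complete_focus" {
            self.calls_since_compression = 0;
            return None;
        }
        self.calls_since_compression += 1;
        // Assumption: re-fire at every multiple of the threshold.
        if self.calls_since_compression % self.reminder_after == 0 {
            Some(format!(
                "{} tool calls since the last compression; consider calling complete_focus.",
                self.calls_since_compression
            ))
        } else {
            None
        }
    }
}
```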

Knowledge block is cumulative across all focus sessions — retained facts survive future compressions.

## Results (SWE-bench Lite, 5 instances, Claude Haiku 4.5)

- Total token reduction: **22.7%** (14.9M → 11.5M tokens)
- Task success rate: **3/5 = 60%** — identical to uncompressed baseline (no accuracy loss)
- Average compressions per task: **6.0**
- Average messages dropped per compression: **70.2**
- Per-instance range: **−57% to +110%**; the +110% case (pylint-7080, 8 compressions) spent more tokens on summaries than compression saved

Key ablation: passive prompting → 1–2 compressions/task, 6% savings, accuracy degraded to 80%. Aggressive prompting (every 10–15 calls) → restored accuracy + 22.7% savings.

**Evaluation note**: 5-instance SWE-bench Lite is a thin sample — numbers should not be taken as strong evidence of general effectiveness.

## Applicability to Zeph

Conceptual fit: HIGH. Implementation scope: MEDIUM-HIGH. The proactive agent-directed model is **complementary** to Zeph's existing reactive compaction tiers:

- Focus compression fires mid-task at the agent's discretion (semantic task boundaries)
- Soft/Hard compaction still fires at budget thresholds (60%/90%) as a safety net
- These do not conflict, but require careful integration (see constraints below)

## Critical design constraints

1. **Knowledge block pinning**: The Knowledge block produced by `complete_focus` must be marked as pinned/protected in the Soft pruner — it is the compressed residue of already-deleted messages and must never be evicted.

2. **Hard compaction awareness**: The Hard compaction summarizer must treat Knowledge block entries as pre-summarized — pass them through unchanged rather than re-summarizing them. Double-compression degrades quality and wastes tokens.

3. **Minimum-messages guard**: The +110% failure mode on pylint-7080 (8 compressions, summaries cost more than saved) is a real production risk. A minimum message count between checkpoints (e.g., min 8–10 messages) prevents over-eager compression from backfiring.

4. **Context assembly ordering**: The Knowledge block must be injected at a fixed position (after the base system prompt, before skills), with `cache_control` placed so that updating the block invalidates as little of the cached prompt prefix as possible.
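
Constraints 1 to 3 can be illustrated with three small functions. This is a sketch under assumptions: `Entry`, `soft_prune`, `split_for_hard_compaction`, and `focus_is_worth_compressing` are hypothetical names, and token counts are supplied by the caller rather than computed by Zeph's real pruner.

```rust
/// Hypothetical context entry with a precomputed token count.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub text: String,
    pub tokens: usize,
    /// True for the Knowledge block: never evicted, never re-summarized.
    pub pinned: bool,
}

/// Constraint 1: evict the oldest unpinned entries until the total fits
/// `budget`. Pinned entries stay even if the budget is still exceeded.
pub fn soft_prune(entries: &mut Vec<Entry>, budget: usize) {
    let mut total: usize = entries.iter().map(|e| e.tokens).sum();
    let mut i = 0;
    while total > budget && i < entries.len() {
        if entries[i].pinned {
            i += 1; // skip protected entries
        } else {
            total -= entries[i].tokens;
            entries.remove(i);
        }
    }
}

/// Constraint 2: split entries into (pass through unchanged, hand to the
/// summarizer) so pre-summarized Knowledge is not compressed a second time.
pub fn split_for_hard_compaction(entries: Vec<Entry>) -> (Vec<Entry>, Vec<Entry>) {
    entries.into_iter().partition(|e| e.pinned)
}

/// Constraint 3: only compress when enough messages accumulated since the
/// checkpoint for the summary to plausibly pay for itself.
pub fn focus_is_worth_compressing(
    checkpoint: usize,
    history_len: usize,
    min_messages: usize,
) -> bool {
    history_len.saturating_sub(checkpoint) >= min_messages
}
```

With `min_messages = 8`, a focus that spans only 5 messages would be rejected, which is the guard against the pylint-7080 pattern of many small compressions.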

## Implementation sketch (Zeph-specific, MEDIUM-HIGH complexity)

1. **Two new tool definitions** (`start_focus`, `complete_focus`) in `zeph-tools`, conditionally enabled by config.

2. **Checkpoint tracking** in `ContextManager`: when `start_focus` is called, store the message index as a checkpoint.

3. **Compression on `complete_focus`**: extract messages since checkpoint → LLM summarization call (scoped, similar to existing Hard compaction) → append result to Knowledge block → delete bracketed messages.

4. **Knowledge block** as a pinned `Role::System` message at a fixed position, excluded from Soft pruning and Hard re-summarization.

5. **Context assembly changes** (`assembly.rs`): inject Knowledge block, mark as pre-summarized, handle `complete_focus` as a compression event (not a normal tool result).

6. **Config**: `[agent.focus] enabled = false, compression_interval = 12, reminder_interval = 15, min_messages_per_focus = 8`

7. **Integration points**: config section, CLI flag, TUI command, `--init` wizard, `--migrate-config` step.
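
The proposed `[agent.focus]` section could map to a struct like the one below. `FocusConfig` and its `validate` rules are hypothetical: the defaults come from item 6, while the sanity checks are assumptions of this sketch rather than stated requirements.

```rust
/// Hypothetical Rust mirror of the proposed `[agent.focus]` config section.
#[derive(Debug, Clone, PartialEq)]
pub struct FocusConfig {
    /// Whether `start_focus` / `complete_focus` are exposed to the agent.
    pub enabled: bool,
    /// Tool-call interval the system prompt asks the agent to compress at.
    pub compression_interval: u32,
    /// Tool calls without compression before a reminder is injected.
    pub reminder_interval: u32,
    /// Minimum messages between checkpoint and `complete_focus`.
    pub min_messages_per_focus: usize,
}

impl Default for FocusConfig {
    /// Defaults taken from the config line in item 6.
    fn default() -> Self {
        Self {
            enabled: false,
            compression_interval: 12,
            reminder_interval: 15,
            min_messages_per_focus: 8,
        }
    }
}

impl FocusConfig {
    /// Assumed sanity checks: intervals must be non-zero, and the reminder
    /// should not fire before the agent was expected to compress.
    pub fn validate(&self) -> Result<(), String> {
        if self.compression_interval == 0 || self.reminder_interval == 0 {
            return Err("intervals must be greater than zero".to_string());
        }
        if self.reminder_interval < self.compression_interval {
            return Err("reminder_interval must be >= compression_interval".to_string());
        }
        Ok(())
    }
}
```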

Complements `#1851` (SWE-Pruner passive pruning) and `#1885` (SideQuest cursor eviction) — different triggers, can coexist with careful tier ordering.

