fix: enable auto-compaction for sub-agents and improve context overflow detection #25180
emco1234 wants to merge 4 commits into anomalyco:dev
Conversation
fix: enable auto-compaction for sub-agents and improve context overflow detection

Sub-agents (task tool delegates) can hang indefinitely when their context overflows because compaction never triggers. This happens through three gaps in the current implementation:

1. **overflow.ts**: When a model doesn't report context limits (context === 0), overflow detection is completely disabled (`return false`). This affects providers like z.ai that may not expose model limits in a standard way.
2. **processor.ts**: Overflow detection is purely reactive — it only checks token counts AFTER the API responds. Some providers (notably z.ai) accept overflows silently without returning errors or accurate usage stats, so the processor never detects the overflow and the sub-agent hangs.
3. **error.ts**: Not all z.ai/GLM overflow error responses are matched by existing patterns, so some overflow errors are classified as generic API errors instead of ContextOverflowError. These get retried (not compacted), causing the "loads for days" behavior.

Changes:

- overflow.ts: Replace `if (context === 0) return 0` with a 128k default fallback, so overflow detection is never fully disabled. Remove the corresponding guard in isOverflow().
- processor.ts: Add proactive context estimation BEFORE the LLM stream starts. Estimates token count from the outgoing message payload using Token.estimate(). If the estimated size exceeds 85% of the model's context window, triggers compaction immediately without making an API call that would likely fail or hang.
- error.ts: Add additional overflow detection patterns for z.ai GLM models and generic token-limit messages.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
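A minimal TypeScript sketch of the overflow.ts fallback described in this commit. The names and structure below are illustrative rather than the actual opencode source, and per the later review comments this particular change was subsequently reverted in favor of scoping the fallback to the pre-stream check:

```ts
// Illustrative only: the real overflow.ts is organized differently.
const DEFAULT_CONTEXT = 128_000 // assumed default when a model reports no limit

// Previously the equivalent of `if (context === 0) return 0` disabled
// overflow detection entirely for providers that don't report limits.
function effectiveContext(reported: number): number {
  return reported > 0 ? reported : DEFAULT_CONTEXT
}

function isOverflow(tokens: number, reported: number): boolean {
  return tokens > effectiveContext(reported)
}
```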
Thanks for your contribution! This PR doesn't have a linked issue. All PRs must reference an existing issue. Please see CONTRIBUTING.md for details.
The following comment was made by an LLM; it may be inaccurate.

Based on my search, I found several related PRs that address compaction and context overflow issues. Potentially related PRs:
These PRs all address aspects of compaction and overflow detection, though your PR appears to be the first comprehensive fix specifically targeting proactive estimation before API calls for sub-agents and unknown model limits. You may want to review #20718 and #17936 to ensure compatibility.
The proactive check in processor.ts is the primary fix for sub-agent context overflow. It has its own context-limit fallback (128k) scoped to the pre-stream estimation, so it doesn't affect the main agent's overflow behavior. Restoring overflow.ts to its original state avoids premature compaction for models with large context windows (200k+). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Thanks for updating your PR! It now meets our contributing guidelines. 👍
The proactive compaction check now uses the model's real context limit directly. At 200k context it triggers at 170k, at 500k at 425k. When the model doesn't report a limit (context === 0), the proactive check is skipped entirely — reactive overflow detection still applies. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
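Under the behavior described in this comment (85% threshold against the reported limit, skip when the model reports no limit), the check would look roughly like the sketch below; identifiers are placeholders, not the actual processor.ts code:

```ts
const COMPACTION_THRESHOLD = 0.85 // 170k at a 200k context window, 425k at 500k

// Placeholder signature; the real processor.ts wires this into its streaming flow.
function shouldCompactBeforeStream(estimatedTokens: number, modelContext: number): boolean {
  // No reported limit (context === 0): skip the proactive check and rely on
  // the existing reactive overflow detection after the API responds.
  if (modelContext === 0) return false
  return estimatedTokens > modelContext * COMPACTION_THRESHOLD
}
```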
Four PRs across opencode (3) and codex (1):

- anomalyco/opencode#25166: merge-as-is, server.mdx adds /global/config docs
- anomalyco/opencode#25143: merge-as-is, ecosystem.mdx adds opencode-swarm row
- anomalyco/opencode#25180: merge-after-nits, pre-stream Token.estimate proactive compaction at processor.ts:543-568 + 3 new overflow regex arms
- openai/codex#20524: merge-as-is, notify deprecation across README + TOML doc + JSON schema + struct doc + 2 metrics + DeprecationNotice event + 153-line user_notification.rs delete
Issue for this PR
Closes #25187
Type of change
What does this PR do?
Sub-agents hang indefinitely on context overflow because compaction never triggers for them. The root cause is in `session/processor.ts`: overflow detection is purely reactive — it only checks token counts from the API response AFTER the stream finishes. Some providers (z.ai, some OpenAI-compatible endpoints) accept context overflows silently without returning errors or accurate usage stats, so the processor never detects the overflow and the sub-agent hangs forever.

This PR adds a proactive context estimation step in the processor that runs BEFORE the LLM stream starts. It estimates the outgoing payload size using `Token.estimate()` (char-count / 4 heuristic) and compares it against the model's context window. If the estimate exceeds 85% of the limit, compaction is triggered immediately — without making an API call that would likely fail or hang.
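For reference, the char-count / 4 heuristic can be sketched as below; this is a rough stand-in, not the actual `Token.estimate()` implementation, whose input shape may differ:

```ts
// Rough stand-in for a char-count / 4 token estimate over the outgoing payload.
function estimateTokens(messages: { content: string }[]): number {
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0)
  return Math.ceil(chars / 4)
}
```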
The second change adds missing overflow error patterns in `provider/error.ts`. Some z.ai/GLM error messages (e.g. "token limit exceeded") weren't matched by existing patterns, so they were classified as generic API errors → retry loop → hang.
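To illustrate the classification change, here is a hedged sketch; the regexes are examples, not the exact patterns added to provider/error.ts:

```ts
// Example patterns only; the actual provider/error.ts regexes differ.
const OVERFLOW_PATTERNS = [
  /context length exceeded/i, // style of an existing pattern
  /token limit exceeded/i,    // z.ai / GLM wording cited in this PR
  /maximum context/i,         // generic token-limit phrasing
]

// A match is classified as a context overflow (triggering compaction) instead
// of a generic API error (which would be retried, causing the hang).
function isContextOverflowMessage(message: string): boolean {
  return OVERFLOW_PATTERNS.some((pattern) => pattern.test(message))
}
```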
How did you verify your code works?

- `npx tsc --noEmit --skipLibCheck` passes
- `Token.estimate()` function already used elsewhere in compaction.ts

Screenshots / recordings
N/A — backend logic change, no UI impact.
Checklist