Bug: Auto-compaction fails to trigger when context window fills up — garbled gzip error responses
The problem
When a Droid CLI session fills its context window, the session hard-fails instead of auto-compacting. The user is left with an unresponsive session and must manually run /compress to recover. This happens every time the context window fills up.
The underlying cause is that the API error response body is gzip-compressed but never decompressed before being parsed. Instead of receiving a readable error message like "prompt is too long: 215043 tokens > 200000 token limit", Droid receives garbled binary data. Because it can't read the error, it classifies it as reason: "unknown" and treats it as an unrecoverable failure rather than a signal to auto-compact.
Manual /compress works immediately after the failure, confirming the compaction system itself is fine.
Related: #161 (auto-compress not triggering with custom models). That issue attributes the problem to missing usage stats. This report identifies a different root cause: gzip-encoded error responses that can't be parsed.
Environment
- Droid CLI: v0.65.0
- OS: macOS 15.4 (darwin 25.3.0)
- Terminal: Ghostty 1.2.3
- Model: Claude Opus routed through a local proxy to use a Claude Pro/Max subscription (BYOK)
How to reproduce
- Use Droid CLI in a long session until the context approaches the model's token limit
- Continue working until the next LLM call would exceed the context window
- Expected: Droid auto-compacts and continues
- Actual: Droid retries 4 times across different providers; every attempt fails with a garbled error, and the session hard-fails
What the logs show
All logs are from ~/.factory/logs/droid-log-single.log.
1. The session was working normally, then hit the context limit
The session made 212 successful LLM calls over ~57 minutes. The last successful call shows the context was nearly full:
[Agent] Streaming result | cacheReadInputTokens: 199198, contextCount: 497, outputTokens: 110
The very next call fails.
2. The error response body is gzip-compressed binary, not readable JSON
Every error response has the same structure — a valid JSON envelope with a type field that's readable, but a message field that's garbled binary:
{
"error": {
"message": "\u001f\ufffd\b\u0000\u0000\u0000\u0000\u0000\u0000\u00034\ufffd\n\ufffd0\u0014E\ufffd\ufffd\ufffd\u000e...",
"type": "invalid_request_error"
}
}
The first bytes of the message field are 0x1F 0x8B 0x08 0x00 — the gzip magic header (0x1F 0x8B = gzip identification, 0x08 = deflate compression, 0x00 = no flags). This is gzip-compressed data being treated as a UTF-8 string, which is why many bytes show up as \ufffd (the Unicode replacement character for invalid byte sequences).
Each retry attempt has different compressed bytes (fresh compression each time) but the same approximate length (~160-170 bytes), consistent with a short error message being independently compressed per request. The original error message is unrecoverable from the logs because the invalid bytes were replaced with U+FFFD during UTF-8 encoding.
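The diagnosis is easy to sanity-check locally. A minimal Node/TypeScript sketch (the error string here is the hypothetical limit message quoted above, not the actual upstream text, which the logs can't recover):

```typescript
import { gzipSync } from "node:zlib";

// Reproduce the symptom from the logs: gzip-compress a short error message,
// then (incorrectly) decode the compressed bytes as UTF-8.
const original = "prompt is too long: 215043 tokens > 200000 token limit";
const compressed = gzipSync(Buffer.from(original, "utf-8"));

// The first bytes are the gzip magic header described above: 1f 8b 08 00
console.log(compressed.subarray(0, 4).toString("hex")); // "1f8b0800"

// Decoding the compressed bytes as UTF-8 replaces invalid sequences with
// U+FFFD, producing the same kind of garbled string seen in the "message"
// field of the error envelope.
const mangled = new TextDecoder("utf-8").decode(compressed);
console.log(JSON.stringify(mangled.slice(0, 12))); // starts with "\u001f" followed by replacement characters
```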
3. All four retry attempts fail identically across three different upstream providers
Attempt 1: apiProvider: "anthropic" → garbled 400
Attempt 2: apiProvider: "bedrock_anthropic" → garbled 400
Attempt 3: apiProvider: "vertex_anthropic" → garbled 400
Attempt 4: apiProvider: "anthropic" → garbled 400
The same gzip encoding issue occurs across Anthropic direct, AWS Bedrock, and Google Vertex. It's unlikely that all three upstream providers independently gzip-encode their error responses in the same way. The common factor is Factory's gateway layer sitting in front of all three.
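If that hypothesis is right, the fix belongs wherever the raw upstream bytes are still available, most likely that gateway layer. A sketch of the shape such a fix could take, assuming a Node-style handler; the function and parameter names are illustrative, not Factory's actual code:

```typescript
import { gunzipSync } from "node:zlib";

// Hypothetical gateway-side handling: decompress the upstream error body
// (per Content-Encoding, with the gzip magic header as a fallback) before
// converting it to a string and embedding it in the JSON error envelope.
function readUpstreamErrorBody(raw: Uint8Array, contentEncoding?: string): string {
  const looksGzipped = raw.length >= 2 && raw[0] === 0x1f && raw[1] === 0x8b;
  const bytes =
    contentEncoding === "gzip" || looksGzipped ? gunzipSync(raw) : Buffer.from(raw);
  return bytes.toString("utf-8"); // now readable, e.g. "prompt is too long: ..."
}
```

Per section 2, the key constraint is that this has to happen before any UTF-8 decoding: once the invalid bytes have been replaced with U+FFFD, nothing downstream (including Droid itself) can recover the message.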
4. The error is classified as "unknown", so auto-compaction never fires
After all retries are exhausted, the session fails with:
[Chat route failure] | reason: "unknown", statusCode: 400, severity: "severe"
[metrics_log_agent_progress_count] | outcome: "error"
The reason: "unknown" classification is the key. Whatever logic determines whether an error is a recoverable context-length error (and should trigger auto-compaction) cannot match against garbled binary. So the error falls through to the "unknown" path, and the session dies.
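Droid's actual classifier isn't visible from the logs, but the failure mode has the generic shape below: a matcher keyed on readable error text has nothing to match in gzip bytes, so everything falls through to the unrecoverable path. The names and patterns here are illustrative, not Droid's internals:

```typescript
// Illustrative only -- not Droid's real classification logic.
type FailureReason = "context_length_exceeded" | "unknown";

function classifyLLMError(errorMessage: string): FailureReason {
  // A readable context-limit error matches and would trigger auto-compaction.
  if (/prompt is too long|maximum context length|context window/i.test(errorMessage)) {
    return "context_length_exceeded";
  }
  // Garbled gzip bytes match nothing, so the error falls through to "unknown"
  // and the session hard-fails, as seen in the Chat route failure log.
  return "unknown";
}

classifyLLMError("prompt is too long: 215043 tokens > 200000 token limit"); // "context_length_exceeded"
classifyLLMError("\u001f\uFFFD\b\u0000..."); // "unknown"
```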
5. Manual compaction works immediately after
Running /compress manually right after the failure succeeds without issue — the summarizer processes all 431 messages and creates a new session. The session continues normally from there. The compaction machinery is fine; it just never gets triggered automatically.
Raw log excerpts
Last successful LLM call before failure
[2026-03-01T18:50:20.362Z] INFO: [Agent] Streaming result | Context: {
"count": 1,
"cacheReadInputTokens": 199198,
"contextCount": 497,
"outputTokens": 110,
"hasReasoningContent": false,
"isByok": true
}
First error (attempt 1 — anthropic provider)
[2026-03-01T18:50:21.817Z] WARN: [useLLMStreaming] LLM error | Context: {
"error": {
"name": "Error",
"message": "400 {\"error\":{\"message\":\"\\u001f\ufffd\\b\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u00034\ufffd\ufffd\\n\ufffd0\\u0014E\ufffd\ufffd\ufffd\\u000e\ufffd\\nJ\\u0006\ufffd:\ufffdW\ufffd\\u0010l\\b\ufffd&/\ufffd$j)\ufffd\ufffdr.\ufffd3a\\u0019\ufffdE\ufffd\ufffd\ufffd\\u0018...\\u0000\\u0000\",\"type\":\"invalid_request_error\"}}",
"stack": "Error: 400 {...}\n at generate (../../node_modules/@anthropic-ai/sdk/core/error.mjs:37:24)\n at makeRequest (../../node_modules/@anthropic-ai/sdk/client.mjs:309:30)\n at processTicksAndRejections (native:7:39)"
},
"attempt": 1,
"modelId": "claude-opus-4-6-thinking-32000",
"apiProvider": "anthropic"
}
Retries across all three providers (attempts 2-4)
[2026-03-01T18:50:24.200Z] WARN: [useLLMStreaming] LLM error | attempt: 2, apiProvider: "bedrock_anthropic"
[2026-03-01T18:50:29.758Z] WARN: [useLLMStreaming] LLM error | attempt: 3, apiProvider: "vertex_anthropic"
[2026-03-01T18:50:35.773Z] WARN: [useLLMStreaming] LLM error | attempt: 4, apiProvider: "anthropic"
All three contain the same garbled gzip message field with "type": "invalid_request_error".
Fatal session failure
[2026-03-01T18:50:47.446Z] WARN: [Chat route failure] | Context: {
"reason": "unknown",
"statusCode": 400,
"severity": "severe"
}
[2026-03-01T18:50:47.449Z] INFO: [metrics_log_agent_progress_count] | Context: {
"outcome": "error",
"errorMessage": "400 {\"error\":{\"message\":\"\\u001f\ufffd\\b...garbled...\",\"type\":\"invalid_request_error\"}}"
}
Manual compaction succeeds immediately after
[2026-03-01T18:51:16.881Z] INFO: [Compaction] Start | reason: "manual", source: "tui"
[2026-03-01T18:51:16.881Z] INFO: [Summarizer] Start | messagesToSummarizeCount: 431
[2026-03-01T18:51:16.881Z] INFO: [Summarizer] Using model | modelId: "claude-opus-4-6-thinking-32000"
[2026-03-01T18:52:24.655Z] INFO: [Summarizer] Response OK | summaryLength: 9136
[2026-03-01T18:52:25.207Z] INFO: [Compaction] End | succeeded: true, compactionDurationMs: 68326, numMessagesRemoved: 431
[2026-03-01T18:52:25.207Z] INFO: [Compaction] New session created