Response Truncation Bug Report (max_tokens / Context Limit) #1183

# Factory Droid — Response Truncation Bug Report (max_tokens / Context Limit)

## Summary
During long conversation sessions, assistant responses are cut off mid-sentence across multiple BYOK models. Investigation points to one of two causes, depending on the model: the **Factory client does not enforce model-specific output token limits**, or the **session context exceeds the model's input token limit**.

## Affected Models (Confirmed)

| Model | Error Pattern | Notes |
|---|---|---|
| `custom:kimi-k2.6:cloud` | `413 Request Entity Too Large` (`llmContextExceeded`) | Persists even after compaction |
| `custom:deepseek-v4-pro:cloud` | `400 max_tokens exceeds model's maximum output tokens (65536)` | Requested 170,394 tokens |
| `custom:qwen3-vl:235b-cloud` | `400 max_tokens exceeds model's maximum output tokens (32768)` | Requested 64,000 tokens |
| `custom:glm-5.1:cloud` | `400 max_tokens exceeds model's maximum output tokens (131072)` | Requested 194,000 tokens |

All models have `isByok: true`.

## Error Log Excerpts

### Pattern A: max_tokens Exceeds Output Limit
```
400 "max_tokens (170394) exceeds model's maximum output tokens (65536) 
for model deepseek-v4-pro (ref: ...)"
```

### Pattern B: Context Size Exceeds Input Limit
```
413 "Request Entity Too Large (ref: ...)"
reason: "llmContextExceeded"
```

### Pattern C: Invalid Message Format After Compaction
```
400 "invalid message format"
Occurs after Compaction (reason: context_limit)
```
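
For triage, the three patterns can be told apart from the HTTP status, the message text, and the `reason` field. The sketch below is a minimal illustration in Python, not Factory's code: the function name `classify_error` and the returned dictionary shape are hypothetical, and the regular expression assumes the Pattern A message arrives as a single line in the form quoted above.

```python
import re
from typing import Optional

# Matches the Pattern A message quoted above and captures both numbers.
_MAX_TOKENS_RE = re.compile(
    r"max_tokens \((\d+)\) exceeds model's maximum output tokens \((\d+)\)"
)


def classify_error(status: int, message: str, reason: Optional[str] = None) -> dict:
    """Map an error to pattern A, B, or C (hypothetical helper for triage)."""
    match = _MAX_TOKENS_RE.search(message)
    if status == 400 and match:
        # Pattern A: requested output budget is above the model's cap.
        return {
            "pattern": "A",
            "requested": int(match.group(1)),
            "limit": int(match.group(2)),
        }
    if status == 413 or reason == "llmContextExceeded":
        # Pattern B: the request body (session context) is too large.
        return {"pattern": "B"}
    if status == 400 and "invalid message format" in message:
        # Pattern C: seen after compaction in this report.
        return {"pattern": "C"}
    return {"pattern": "unknown"}
```

Extracting `requested` and `limit` from Pattern A makes it easy to confirm that the failure is a missing clamp rather than an oversized context.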

## Reproduction Steps
1. Select any of the BYOK models above.
2. Continue a long conversation (including tool calls and large file reads).
3. Input tokens accumulate in the session.
4. On the next turn, the model call fails with `400` or `413`, and the response is truncated.

## Root Cause Analysis

### 1. Broken max_tokens Calculation
The client appears to compute `max_tokens = context_window - input_tokens` but **ignores the model-specific maximum output token limit**, so the API returns `400 Bad Request` whenever the remaining context is larger than that limit.

The correct calculation is:
```
max_tokens = min(context_window - input_tokens, model_max_output_tokens)
```
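
A minimal Python sketch of that clamp follows. The function name `clamp_max_tokens` is hypothetical, and the 200,000-token context window in the usage note is an invented value, chosen only so that the unclamped remainder equals the 170,394 tokens seen in the deepseek-v4-pro error. The report does not state the real window sizes.

```python
def clamp_max_tokens(
    context_window: int,
    input_tokens: int,
    model_max_output_tokens: int,
) -> int:
    """Return an output budget that respects both limits."""
    remaining = context_window - input_tokens
    if remaining <= 0:
        # No room left at all: this is the Pattern B situation, not Pattern A.
        raise ValueError("input already fills the context window")
    # The clamp that the report says is missing.
    return min(remaining, model_max_output_tokens)
```

With the illustrative values, `clamp_max_tokens(200_000, 29_606, 65_536)` returns `65536` instead of the 170,394 that triggered the `400`. Raising when nothing remains keeps the two failure modes separate, so the caller can compact the session instead of sending a request that is certain to fail.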

### 2. Incomplete BYOK Model Spec Resolution
Logs show `Unknown model, falling back to default` when resolving `getTuiModelConfig` for BYOK models, indicating the client may not be retrieving the correct token limits for `custom:<model>:cloud` aliases.
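
One way the alias lookup could work is sketched below in Python. This is an assumption about the alias shape based only on the four names in this report (`custom:<model>:<tag>`, where the tag is `cloud` or `235b-cloud`). The helper names and the lookup table are hypothetical, and the caps are copied from the error messages above. `kimi-k2.6` is left out because the report does not state its cap.

```python
from typing import Optional

# Output caps taken from the 400 error messages in this report.
MAX_OUTPUT_TOKENS = {
    "deepseek-v4-pro": 65_536,
    "qwen3-vl": 32_768,
    "glm-5.1": 131_072,
}


def base_model_name(alias: str) -> str:
    """Strip the 'custom:' prefix and the trailing tag such as ':cloud'."""
    prefix = "custom:"
    if alias.startswith(prefix):
        alias = alias[len(prefix):]
    # Everything before the first remaining colon is treated as the model name.
    return alias.split(":", 1)[0]


def resolve_max_output_tokens(alias: str) -> Optional[int]:
    """Return the known cap, or None so the caller can handle an unknown model."""
    return MAX_OUTPUT_TOKENS.get(base_model_name(alias))
```

Returning `None` for an unknown model, rather than silently substituting a default, would surface the `Unknown model` case to the caller instead of letting an incorrect limit reach the API.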

### 3. Compaction Side-Effects
When `llmContextExceeded` occurs, Factory compacts the session, but the resulting message structure can trigger `invalid message format` on some models (notably `kimi-k2.6`).
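
The logs do not say which format rule the compacted session violates. As a starting point for debugging, the Python sketch below checks a few constraints that chat-style APIs commonly impose. All of them are assumptions rather than documented requirements of `kimi-k2.6`, and the message shape (a list of dictionaries with `role` and `content`) and the function name are hypothetical.

```python
ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def find_format_problems(messages: list) -> list:
    """Return human-readable problems found in a compacted message list."""
    if not messages:
        return ["message list is empty"]
    problems = []
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in ALLOWED_ROLES:
            problems.append(f"message {i}: unknown role {role!r}")
        if role == "system" and i != 0:
            # Assumed rule: a system message may only appear first.
            problems.append(f"message {i}: system message is not first")
        if not msg.get("content"):
            # Assumed rule; may need relaxing for tool-call messages.
            problems.append(f"message {i}: empty content")
    non_system = [m for m in messages if m.get("role") != "system"]
    if non_system and non_system[0].get("role") != "user":
        # Assumed rule: the conversation must open with a user turn.
        problems.append("first non-system message is not from the user")
    return problems
```

Running a check like this on the output of compaction, before the next model call, would show whether the compacted summary was inserted with an unexpected role or position.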

## Requested Fixes

1. **Add per-model `max_output_tokens` hard caps**
   - deepseek-v4-pro → 65,536
   - qwen3-vl → 32,768
   - glm-5.1 → 131,072
   - kimi-k2.6 → limit according to model specs

2. **Fix `max_tokens` computation**
   - Always clip to `model_max_output_tokens`

3. **Improve BYOK model spec resolution**
   - Ensure `custom:<model>:cloud` aliases resolve to correct limits

4. **Validate message structure after compaction**
   - Ensure the compacted message list conforms to each model's format constraints before the next request is sent

## Environment
- OS: Windows 11 (win32 10.0.26200)
- Factory Droid versions: 0.105.0 through 0.137.1 (reproduced across versions)
- Installation ID: `af46be50-2fc2-4e76-b07b-30aeab5ee2b0`

---
*This is a client-side model invocation control issue that cannot be mitigated by the end user.*

