Response Truncation Bug Report (max_tokens / Context Limit) #1183

@macosai

Factory Droid — Response Truncation Bug Report (max_tokens / Context Limit)

Summary

During long conversation sessions, assistant responses are cut off mid-sentence across multiple BYOK models. The failures trace back to one of two causes: the Factory client does not enforce model-specific output token limits when setting max_tokens, or the accumulated session context exceeds the model's input token limit.

Affected Models (Confirmed)

| Model | Error | Notes |
|---|---|---|
| custom:kimi-k2.6:cloud | 413 Request Entity Too Large (llmContextExceeded) | Persists even after compaction |
| custom:deepseek-v4-pro:cloud | 400 max_tokens exceeds model's maximum output tokens (65536) | Requested 170,394 tokens |
| custom:qwen3-vl:235b-cloud | 400 max_tokens exceeds model's maximum output tokens (32768) | Requested 64,000 tokens |
| custom:glm-5.1:cloud | 400 max_tokens exceeds model's maximum output tokens (131072) | Requested 194,000 tokens |

All models have isByok: true.

Error Log Excerpts

Pattern A: max_tokens Exceeds Output Limit

400 "max_tokens (170394) exceeds model's maximum output tokens (65536) 
for model deepseek-v4-pro (ref: ...)"

Pattern B: Context Size Exceeds Input Limit

413 "Request Entity Too Large (ref: ...)"
reason: "llmContextExceeded"

Pattern C: Invalid Message Format After Compaction

400 "invalid message format"
Occurs after Compaction (reason: context_limit)
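As a client-side diagnostic, the three patterns above can be told apart from the status code and error text alone. The following Python sketch assumes the error bodies contain the strings quoted in these excerpts; the helper names `parse_output_limit_error` and `classify_error` are hypothetical and not part of Factory Droid. For Pattern A, the parsed limit could in principle be used to retry with a clipped max_tokens.

```python
import re
from typing import Optional, Tuple

# Matches the Pattern A text quoted in this report, e.g.
# max_tokens (170394) exceeds model's maximum output tokens (65536)
_PATTERN_A = re.compile(
    r"max_tokens \((\d+)\) exceeds model's maximum output tokens \((\d+)\)"
)


def parse_output_limit_error(message: str) -> Optional[Tuple[int, int]]:
    """Return (requested, limit) for a Pattern A error, or None otherwise."""
    match = _PATTERN_A.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def classify_error(status: int, body: str) -> str:
    """Map a failed model call to Pattern "A", "B", "C", or "unknown"."""
    if status == 400 and parse_output_limit_error(body) is not None:
        return "A"  # max_tokens above the model's output cap
    if status == 413 or "llmContextExceeded" in body:
        return "B"  # input context larger than the model accepts
    if status == 400 and "invalid message format" in body:
        return "C"  # malformed history, e.g. after compaction
    return "unknown"
```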

Reproduction Steps

  1. Select any of the BYOK models above.
  2. Continue a long conversation (including tool calls and large file reads).
  3. Input tokens accumulate in the session.
  4. On the next turn, the model call fails with 400 or 413, and the response is truncated.

Root Cause Analysis

1. Broken max_tokens Calculation

The client appears to compute max_tokens = context_window - input_tokens without applying the model-specific maximum output token limit, so the API rejects the request with 400 Bad Request (for example, 170,394 tokens requested against deepseek-v4-pro's 65,536 cap).

Correct calculation should be:

max_tokens = min(context_window - input_tokens, model_max_output_tokens)
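A minimal Python sketch of this calculation. The function name `compute_max_tokens` is hypothetical, and the explicit error when the input already fills the context window is an added assumption: in that case no max_tokens value can make the request valid, and the client would need to compact first (Pattern B).

```python
def compute_max_tokens(context_window: int, input_tokens: int,
                       model_max_output_tokens: int) -> int:
    """Clip the output budget to both the remaining context and the model cap."""
    remaining = context_window - input_tokens
    if remaining <= 0:
        # The prompt alone fills the window; clipping cannot fix this.
        raise ValueError(
            f"input_tokens ({input_tokens}) leaves no room in the "
            f"context window ({context_window}); compact before sending"
        )
    return min(remaining, model_max_output_tokens)
```

For illustration only, with a hypothetical 262,144-token context window and 20,000 input tokens, the uncapped formula would request 242,144 tokens, while `compute_max_tokens(262144, 20000, 65536)` returns 65,536, the deepseek-v4-pro cap from the error log.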

2. Incomplete BYOK Model Spec Resolution

Logs show "Unknown model, falling back to default" when getTuiModelConfig resolves BYOK models, which indicates the client may not be retrieving the correct token limits for custom:<model>:cloud aliases.
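A sketch of alias resolution, assuming aliases follow a custom:<model>:<suffix> shape. Note that custom:qwen3-vl:235b-cloud carries a size tag rather than plain cloud as its last segment, so the resolver keeps only the segment after custom:. The output caps come from the 400 errors in this report; kimi-k2.6 is left out because its limit is not stated. The function names and the 8,192-token fallback are hypothetical choices, not Factory Droid's actual configuration.

```python
import logging
from typing import Dict

logger = logging.getLogger(__name__)

# Output caps taken from the 400 errors quoted in this report.
MAX_OUTPUT_TOKENS: Dict[str, int] = {
    "deepseek-v4-pro": 65_536,
    "qwen3-vl": 32_768,
    "glm-5.1": 131_072,
}

# Hypothetical conservative fallback for models with no known cap.
FALLBACK_MAX_OUTPUT_TOKENS = 8_192


def base_model_name(alias: str) -> str:
    """custom:deepseek-v4-pro:cloud -> deepseek-v4-pro,
    custom:qwen3-vl:235b-cloud -> qwen3-vl."""
    prefix = "custom:"
    name = alias[len(prefix):] if alias.startswith(prefix) else alias
    return name.split(":", 1)[0]


def resolve_max_output_tokens(alias: str) -> int:
    """Look up the output cap for a BYOK alias, warning on a miss."""
    limit = MAX_OUTPUT_TOKENS.get(base_model_name(alias))
    if limit is None:
        # Surface the miss instead of silently using a large default.
        logger.warning("no output limit known for %r; using %d",
                       alias, FALLBACK_MAX_OUTPUT_TOKENS)
        return FALLBACK_MAX_OUTPUT_TOKENS
    return limit
```

A small fallback trades some output length on unrecognized models for never sending a max_tokens value above their real cap.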

3. Compaction Side-Effects

When llmContextExceeded occurs, Factory compacts the session, but the resulting message structure can trigger a 400 "invalid message format" error on some models (notably kimi-k2.6).
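The report does not say which constraint kimi-k2.6 rejects. One common cause of this class of error, assumed here rather than confirmed, is compaction keeping a tool result while dropping the assistant tool call it answers (or the reverse). The Python sketch below checks for both cases, assuming OpenAI-style chat messages where assistant messages may carry "tool_calls" with an "id" and "tool" messages reference it via "tool_call_id". The function name `find_structure_problems` is hypothetical.

```python
from typing import Any, Dict, List, Set

Message = Dict[str, Any]


def find_structure_problems(messages: List[Message]) -> List[str]:
    """Report orphaned tool results and tool calls that never got a result."""
    problems: List[str] = []
    pending: Set[str] = set()  # tool_call ids still awaiting a result
    for index, message in enumerate(messages):
        role = message.get("role")
        if role == "tool":
            call_id = message.get("tool_call_id")
            if call_id in pending:
                pending.discard(call_id)
            else:
                problems.append(
                    f"message {index}: tool result {call_id!r} has no matching tool call")
            continue
        # Any non-tool message closes the previous assistant turn's tool calls.
        for call_id in sorted(pending):
            problems.append(f"message {index}: tool call {call_id!r} has no result")
        pending.clear()
        if role == "assistant":
            for call in message.get("tool_calls") or []:
                pending.add(call["id"])
    for call_id in sorted(pending):
        problems.append(f"end of history: tool call {call_id!r} has no result")
    return problems
```

Running a check like this on the compacted history before sending would turn a provider-side 400 into a local error that names the offending message.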

Requested Fixes

  1. Add per-model max_output_tokens hard caps

    • deepseek-v4-pro → 65,536
    • qwen3-vl → 32,768
    • glm-5.1 → 131,072
    • kimi-k2.6 → limit according to model specs
  2. Fix max_tokens computation

    • Always clip to model_max_output_tokens
  3. Improve BYOK model spec resolution

    • Ensure custom:<model>:cloud aliases resolve to correct limits
  4. Validate message structure after Compaction

    • Ensure compacted summaries conform to each model's format constraints

Environment

  • OS: Windows 11 (win32 10.0.26200)
  • Factory Droid versions: 0.105.0 through 0.137.1 (reproduced across versions)
  • Installation ID: af46be50-2fc2-4e76-b07b-30aeab5ee2b0

This is a client-side model invocation control issue that cannot be mitigated by the end user.
