fix: report finish_reason "length" when output hits the token limit #10

Merged: Defilan merged 1 commit into defilantech:main from Defilan:fix/finish-reason-length on May 18, 2026
Conversation

Defilan (Member) commented May 18, 2026

What

Chat completions truncated at max_tokens now report finish_reason: "length" instead of "stop".

Why

Both the streaming and non-streaming paths in ChatCompletion.swift hardcoded finish_reason to "tool_calls" or "stop", so a response cut off at the token limit was reported as a natural "stop". OpenAI clients use finish_reason to decide whether to continue a truncated response, so the wrong value misled them. Found while dogfooding the LLMKube metal-agent mlx-server runtime: three runs each generated exactly max_tokens (512) tokens, and all three reported "stop".
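
To illustrate why the value matters to a client, here is a minimal sketch of a consumer that branches on finish_reason. The `Choice` type and `shouldContinue` function are hypothetical names invented for this example; only the `finish_reason` field name and its "length" value come from the description above.

```swift
import Foundation

// Hypothetical client-side model of a single choice in a chat completion
// response. Only the "finish_reason" key is taken from the PR text.
struct Choice: Decodable {
    let finishReason: String

    enum CodingKeys: String, CodingKey {
        case finishReason = "finish_reason"
    }
}

// Assumed client policy: ask for a continuation only when the server
// says the output was cut off at the token limit.
func shouldContinue(_ choice: Choice) -> Bool {
    choice.finishReason == "length"
}

let json = Data(#"{"finish_reason": "length"}"#.utf8)
let choice = try JSONDecoder().decode(Choice.self, from: json)
print(shouldContinue(choice)) // prints "true"
```

With the old behavior, the same truncated response would have arrived as "stop", and a client following this policy would have treated it as complete.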

How

A finishReason helper returns "tool_calls" when the model emitted tool calls, "length" when the generated token count reached the requested limit, otherwise "stop". Truncation is inferred by comparing generationTokenCount against parameters.maxTokens, since the generator does not surface a stop reason directly. Both completion paths call the helper, which is covered by unit tests.
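
A rough sketch of the helper's logic, for readers without the diff open. The signature, parameter names, and the optional `maxTokens` are assumptions made for illustration and are not copied from the merged code. The check order follows the description above, with tool calls tested first.

```swift
// Illustrative sketch only: signature and parameter names are assumed,
// not taken from ChatCompletion.swift.
func finishReason(hasToolCalls: Bool, generatedTokens: Int, maxTokens: Int?) -> String {
    // Tool calls are checked first, matching the order in the description.
    if hasToolCalls {
        return "tool_calls"
    }
    // The generator surfaces no stop reason, so truncation is inferred:
    // reaching the requested limit is reported as "length".
    if let limit = maxTokens, generatedTokens >= limit {
        return "length"
    }
    // Anything else is treated as a natural stop.
    return "stop"
}
```

For the dogfooding case described earlier, `finishReason(hasToolCalls: false, generatedTokens: 512, maxTokens: 512)` returns "length". Because truncation is inferred rather than reported, a response that happens to end naturally on exactly the last permitted token would presumably also be labeled "length" under this sketch.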

Fixes #9

Both the streaming and non-streaming chat-completion paths hardcoded
finish_reason to "tool_calls" or "stop", so a response truncated at
max_tokens was reported as a natural "stop". OpenAI clients rely on
finish_reason to decide whether to continue a cut-off response.

Add a finishReason helper: "length" when the generated token count
reaches the requested limit, "tool_calls" when the model emitted tool
calls, otherwise "stop". Used by both completion paths.

Fixes defilantech#9

Signed-off-by: Christopher Maher <chris@mahercode.io>
Defilan merged commit 6f0061f into defilantech:main on May 18, 2026
1 check passed
Defilan deleted the fix/finish-reason-length branch on May 18, 2026 07:31
Development

Successfully merging this pull request may close these issues.

[BUG] finish_reason is "stop" when output is truncated at max_tokens
