Emit a usage chunk on streamed completions by Defilan · Pull Request #3 · defilantech/mlx-server

Defilan · 2026-05-15T21:20:00Z

What

Add a token-usage block to streamed chat completions.

Why

Streaming responses ended with [DONE] and no token counts, so clients (opencode) could not report context-window consumption for a streamed turn. Only non-streaming responses carried usage.

How

The .finished stream event already carried the Usage, but the streaming handler discarded it. This change adds an optional usage field to ChatCompletionChunk and emits an OpenAI-style trailing usage chunk (empty choices, populated usage) just before [DONE]. The usage field is omitted on every other chunk.
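
A minimal sketch of the shape described above, not the code from this PR. The type layouts, the `sseLine` and `streamTail` helpers, and the field set are assumptions for illustration; only the names ChatCompletionChunk and Usage, the empty choices array, and the [DONE] terminator come from the description. It relies on Swift's synthesized Codable conformance, which skips nil optionals when encoding, so usage appears only on the trailing chunk.

```swift
import Foundation

// Hypothetical types: field names follow the OpenAI wire format,
// and may not match the actual definitions in mlx-server.
struct Usage: Codable, Equatable {
    let promptTokens: Int
    let completionTokens: Int
    let totalTokens: Int

    enum CodingKeys: String, CodingKey {
        case promptTokens = "prompt_tokens"
        case completionTokens = "completion_tokens"
        case totalTokens = "total_tokens"
    }
}

struct Delta: Codable {
    let content: String?
}

struct Choice: Codable {
    let index: Int
    let delta: Delta
}

struct ChatCompletionChunk: Codable {
    let id: String
    let object: String
    let choices: [Choice]
    // Optional: synthesized Codable omits the key entirely when this is nil.
    let usage: Usage?
}

// Hypothetical helper: encode one chunk as a server-sent-event frame.
func sseLine(_ chunk: ChatCompletionChunk) throws -> String {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]
    let data = try encoder.encode(chunk)
    return "data: " + String(decoding: data, as: UTF8.self) + "\n\n"
}

// Hypothetical helper: the frames that close a stream.
// If usage is available, a chunk with empty choices and populated usage
// is emitted immediately before the [DONE] terminator.
func streamTail(id: String, usage: Usage?) throws -> [String] {
    var frames: [String] = []
    if let usage = usage {
        let chunk = ChatCompletionChunk(
            id: id,
            object: "chat.completion.chunk",
            choices: [],
            usage: usage
        )
        frames.append(try sseLine(chunk))
    }
    frames.append("data: [DONE]\n\n")
    return frames
}
```

Keeping usage optional means ordinary content chunks serialize exactly as before, so clients that ignore the trailing chunk should be unaffected.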

Verified

swift build + 28 tests pass; the routes streaming test now asserts the stream carries prompt_tokens.
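
For illustration, the kind of check the streaming test makes could look like the sketch below. This is not the repository's test: `usageFromStream`, `StreamUsage`, and `ChunkEnvelope` are hypothetical names, and the sketch assumes each event arrives as a single `data: ` line with a JSON payload. It scans a streamed body and returns the usage from the last chunk that carries one before [DONE].

```swift
import Foundation

// Hypothetical client-side view of the usage block.
struct StreamUsage: Decodable, Equatable {
    let promptTokens: Int
    let completionTokens: Int

    enum CodingKeys: String, CodingKey {
        case promptTokens = "prompt_tokens"
        case completionTokens = "completion_tokens"
    }
}

// Only the usage key matters here; all other chunk fields are ignored.
struct ChunkEnvelope: Decodable {
    let usage: StreamUsage?
}

// Walk the SSE body line by line and pick up usage if any chunk carries it.
func usageFromStream(_ body: String) -> StreamUsage? {
    var found: StreamUsage? = nil
    for line in body.components(separatedBy: "\n") {
        guard line.hasPrefix("data: ") else { continue }
        let payload = String(line.dropFirst("data: ".count))
        if payload == "[DONE]" { break }
        if let chunk = try? JSONDecoder().decode(ChunkEnvelope.self, from: Data(payload.utf8)),
           let usage = chunk.usage {
            found = usage
        }
    }
    return found
}
```

A client can use the same approach to report context-window consumption for a streamed turn, falling back to nil when the server sends no usage chunk.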

Streaming chat completions ended with `[DONE]` and no token counts, so clients (opencode) could not report context-window consumption for a streamed turn; only non-streaming responses carried `usage`. The `.finished` stream event already carried the `Usage`, but the streaming handler discarded it. This commit adds an optional `usage` to `ChatCompletionChunk` and emits an OpenAI-style trailing usage chunk (empty `choices`, populated `usage`) before `[DONE]`.

Defilan merged commit 18775f8 into main May 15, 2026
1 check passed

Defilan merged 1 commit into main from feat/streaming-usage