OpenAI Chat Completions streaming silently drops token usage (Usage chunk arrives after Finish, never consumed) #48

@TYRMars

Summary

On the default OpenAI Chat Completions streaming path, token usage is never reported. OpenAI ships the usage payload as a separate final SSE chunk (choices: []) that arrives after the finish_reason chunk, but the agent loop breaks out of the stream the moment it sees Finish — so the later Usage chunk is never consumed.

Details

The provider sets stream_options.include_usage: true unconditionally for streaming, and the accumulator emits Usage and Finish from whichever chunk carries each.
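The emit-per-chunk behaviour described above can be sketched roughly as follows. This is a simplified model, not the project's actual accumulator: `Event`, `Chunk`, `Accumulator`, and `openai_like_stream` are hypothetical names, and each chunk is reduced to the three fields that matter for this bug.

```rust
// Hypothetical, simplified types for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Text(String),
    Finish(String), // carries the finish_reason
    Usage { prompt: u32, completion: u32 },
}

/// One decoded SSE chunk, reduced to the fields relevant here.
#[derive(Debug, Clone, Default)]
pub struct Chunk {
    pub delta: Option<String>,
    pub finish_reason: Option<String>,
    pub usage: Option<(u32, u32)>, // (prompt_tokens, completion_tokens)
}

#[derive(Default)]
pub struct Accumulator {
    pub text: String,
}

impl Accumulator {
    /// Emits events from whichever chunk carries them, with no reordering.
    pub fn ingest(&mut self, chunk: &Chunk) -> Vec<Event> {
        let mut out = Vec::new();
        if let Some(delta) = &chunk.delta {
            self.text.push_str(delta);
            out.push(Event::Text(delta.clone()));
        }
        if let Some(reason) = &chunk.finish_reason {
            out.push(Event::Finish(reason.clone()));
        }
        if let Some((prompt, completion)) = chunk.usage {
            out.push(Event::Usage { prompt, completion });
        }
        out
    }
}

/// Chunk order as described in this issue: content, then finish_reason,
/// then a usage-only chunk (choices: []) last.
pub fn openai_like_stream() -> Vec<Chunk> {
    vec![
        Chunk { delta: Some("Hi".to_string()), ..Default::default() },
        Chunk { finish_reason: Some("stop".to_string()), ..Default::default() },
        Chunk { usage: Some((9, 1)), ..Default::default() },
    ]
}
```

Feeding `openai_like_stream()` through `ingest` yields Text, Finish, Usage in that order, so Finish is already out before the usage-only chunk has been read.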

OpenAI delivers these in separate chunks: the finish_reason chunk first, then a usage-only chunk (choices: []) last. So Finish is emitted during an earlier ingest(). The agent consumer then stops reading the stream as soon as it sees Finish.
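A minimal sketch of why stopping at Finish loses the usage data, assuming a consumer shaped like the one described here. `Event` and `consume_until_finish` are hypothetical stand-ins, not the agent loop's real code.

```rust
// Hypothetical event type for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Text(String),
    Finish,
    Usage { total_tokens: u32 },
}

/// Models a consumer that breaks out of the stream on Finish.
/// Returns the usage total it managed to observe, if any.
pub fn consume_until_finish(events: impl IntoIterator<Item = Event>) -> Option<u32> {
    let mut usage = None;
    for ev in events {
        match ev {
            Event::Text(_) => {}
            Event::Usage { total_tokens } => usage = Some(total_tokens),
            // Anything yielded after this point is never read.
            Event::Finish => break,
        }
    }
    usage
}

/// Order described for OpenAI Chat Completions: Finish, then Usage.
pub fn finish_then_usage() -> Vec<Event> {
    vec![
        Event::Text("Hi".to_string()),
        Event::Finish,
        Event::Usage { total_tokens: 10 },
    ]
}

/// Order described for the other providers: Usage, then Finish.
pub fn usage_then_finish() -> Vec<Event> {
    vec![
        Event::Text("Hi".to_string()),
        Event::Usage { total_tokens: 10 },
        Event::Finish,
    ]
}
```

With these inputs, `consume_until_finish(finish_then_usage())` returns `None`, while `consume_until_finish(usage_then_finish())` returns `Some(10)`.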

The subsequently-yielded Usage chunk is dropped. The other three providers emit Usage before Finish (Anthropic anthropic.rs:933, Responses responses.rs:1164, Google google.rs:644), so only OpenAI Chat Completions is affected.

Impact

No functional break, but billing/telemetry Usage events are silently lost on the default provider's streaming path — Usage accounting reads zero. Severity: medium.

Suggested fix

Either keep draining the stream after Finish until it closes (consuming a trailing Usage), or have the OpenAI accumulator buffer the Finish until the usage-only terminal chunk has been ingested so Usage is emitted before Finish like the other providers.
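Both options can be sketched as below, under the same simplifying assumptions as the rest of this issue. All names (`Event`, `Chunk`, `consume_draining`, `BufferingAccumulator`, `end_of_stream`) are hypothetical. The `end_of_stream` flush is an added assumption for streams that close without a usage chunk, so that a buffered Finish is not lost in turn.

```rust
// Hypothetical, simplified types for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Text(String),
    Finish(String),
    Usage { total_tokens: u32 },
}

#[derive(Debug, Clone, Default)]
pub struct Chunk {
    pub delta: Option<String>,
    pub finish_reason: Option<String>,
    pub usage: Option<u32>,
}

/// Option A: record Finish but keep reading until the stream closes,
/// so a trailing Usage event is still consumed.
pub fn consume_draining(
    events: impl IntoIterator<Item = Event>,
) -> (Option<String>, Option<u32>) {
    let (mut finish, mut usage) = (None, None);
    for ev in events {
        match ev {
            Event::Text(_) => {}
            Event::Finish(reason) => finish = Some(reason), // no break
            Event::Usage { total_tokens } => usage = Some(total_tokens),
        }
    }
    (finish, usage)
}

/// Option B: hold Finish back until the usage-only chunk has been
/// ingested, so Usage is emitted before Finish.
#[derive(Default)]
pub struct BufferingAccumulator {
    pending_finish: Option<String>,
}

impl BufferingAccumulator {
    pub fn ingest(&mut self, chunk: &Chunk) -> Vec<Event> {
        let mut out = Vec::new();
        if let Some(delta) = &chunk.delta {
            out.push(Event::Text(delta.clone()));
        }
        if let Some(reason) = &chunk.finish_reason {
            self.pending_finish = Some(reason.clone());
        }
        if let Some(total_tokens) = chunk.usage {
            out.push(Event::Usage { total_tokens });
            if let Some(reason) = self.pending_finish.take() {
                out.push(Event::Finish(reason));
            }
        }
        out
    }

    /// Flushes a buffered Finish if the stream ends without a usage chunk.
    pub fn end_of_stream(&mut self) -> Vec<Event> {
        self.pending_finish.take().map(Event::Finish).into_iter().collect()
    }
}
```

Option A changes only the consumer and works regardless of event order. Option B keeps the consumer as it is and makes the OpenAI path match the Usage-before-Finish order of the other providers, at the cost of delaying Finish by one chunk.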

Labels: bug
