fix(usage): accept MiniMax-style chat.completion final SSE frame #792
Merged
MiniMax (and possibly other OpenAI-compatible providers) close a streaming response with a non-streaming `chat.completion` frame instead of a true `chat.completion.chunk`. That final frame is the only one carrying the authoritative usage block; earlier chunks send only zero placeholders.

Before this fix BitFun dropped the frame at two layers:

1. The weak validator in `stream_handler/openai.rs` required the object string to be exactly `"chat.completion.chunk"` and labeled MiniMax's final frame as `skip:non_standard_event`, so it never reached deserialization.
2. Even if the validator had let it through, the choice in that frame uses `message` instead of `delta`, and `OpenAISSEData` required `Choice.delta` to be present.

The net effect: BitFun recorded 0 input, 0 output, and 0 cached tokens for every MiniMax call.

Fix both layers:

- The validator now accepts both `chat.completion.chunk` and `chat.completion` object strings.
- `Choice.delta` is now `#[serde(default)]`, with `Delta` deriving `Default`, so frames lacking `delta` parse cleanly. We don't need the frame's content (earlier chunks already streamed it); we only need the top-level usage and `finish_reason` to propagate.

Each fix has a regression test reproducing MiniMax's observed on-the-wire shape, captured from a live MiniMax-M2.7-highspeed response.
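For reviewers who want the shape of the two fixes in one place, here is a minimal, std-only sketch. It is not the code in this PR: the `Frame` and `Usage` types and the functions `accepts_object`, `delta_or_default`, and `last_usage` are hypothetical names chosen for illustration, and an `Option<Delta>` with `unwrap_or_default()` stands in for what `#[serde(default)]` does during deserialization. Only the two object strings and the `Delta: Default` idea come from the description above.

```rust
// Illustrative sketch only; all type and function names are assumptions,
// not BitFun's actual definitions.

#[derive(Debug, Clone, Default, PartialEq)]
struct Delta {
    content: Option<String>,
}

#[derive(Debug, Clone, Copy, Default, PartialEq)]
struct Usage {
    prompt_tokens: u64,
    completion_tokens: u64,
}

#[derive(Debug, Clone, PartialEq)]
struct Frame {
    object: String,
    // `None` models a choice that carries `message` instead of `delta`.
    delta: Option<Delta>,
    finish_reason: Option<String>,
    usage: Option<Usage>,
}

/// Layer 1: the validator accepts both object strings instead of
/// only "chat.completion.chunk".
fn accepts_object(object: &str) -> bool {
    matches!(object, "chat.completion.chunk" | "chat.completion")
}

/// Layer 2: a missing `delta` falls back to `Delta::default()`,
/// mirroring the effect of `#[serde(default)]` on the field.
fn delta_or_default(frame: &Frame) -> Delta {
    frame.delta.clone().unwrap_or_default()
}

/// Usage from the last accepted frame that carries a usage block.
/// Earlier chunks may report zero placeholders; the final frame wins.
fn last_usage(frames: &[Frame]) -> Usage {
    frames
        .iter()
        .filter(|f| accepts_object(&f.object))
        .filter_map(|f| f.usage)
        .last()
        .unwrap_or_default()
}
```

With the old exact-match check, the final `chat.completion` frame would be filtered out and `last_usage` would return the zero placeholder from an earlier chunk, which matches the symptom described above.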