
[Bug]: Streaming not functioning for 'gemini-2.5-flash' and 'gemini-2.5-pro' models in version litellm: main-v1.72.2.rc #11582

@YuriyTW

Description


What happened?

Description:
Thank you for the well-optimized new version; it is indeed faster and more economical!

However, in the latest version of LiteLLM, main-v1.72.2.rc, streaming does not work for the gemini-2.5-flash and gemini-2.5-pro models. Streaming works correctly with earlier models such as gemini-2.0-flash, but not with the newer ones. All other models from OpenAI and Claude work fine!
Everything worked on the earlier release litellm_stable_release_branch-v1.72.0.rc1.

Steps to reproduce:

import OpenAI from "openai";

const litellmClient = new OpenAI({
    apiKey: "sk-.........",
    baseURL: `http://litellm.........:4000`,
});
(async () => {
    const stream = await litellmClient.chat.completions.create({
        model: "gemini-2.5-flash", // also try "gemini-2.5-pro"
        messages: [
            {
                role: "user",
                content: "Say 'double bubble bath' ten times fast.",
            }
        ],
        stream: true,
    });

    let content = "";
    let countChunk = 0;
    for await (const chunk of stream) {
        console.log(chunk);
        console.log(chunk.choices[0].delta);
        content += chunk.choices[0].delta.content || "";
        countChunk++;
        console.log("****************");
    }
    console.log(">>> Total chunks received:", countChunk);
    console.log(">>> Final content:\n", content);
})();
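To check whether the chunks are dropped by the proxy or by the SDK, it can help to look at the raw server-sent-events payload the proxy returns. Below is a minimal, self-contained sketch of a parser for the OpenAI-style `data:` framing; the sample payload is illustrative (hand-written, not captured from LiteLLM), but comparing real wire output against the SDK's chunks would show where the content deltas disappear.

```javascript
// Hypothetical helper: parse a raw OpenAI-style SSE payload into chunk objects.
// Lines look like `data: {...json...}`, terminated by `data: [DONE]`.
function parseSseChunks(raw) {
    return raw
        .split("\n")
        .filter((line) => line.startsWith("data: ") && !line.includes("[DONE]"))
        .map((line) => JSON.parse(line.slice("data: ".length)));
}

// Illustrative sample payload (not a real LiteLLM capture).
const sample = [
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"}}]}',
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
].join("\n");

const chunks = parseSseChunks(sample);
console.log(chunks.length); // 2
console.log(chunks[0].choices[0].delta.content); // "Hello"
```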

For the gemini-2.0-flash model, everything is OK. These are the latest messages in the logs:

****************
{
  id: '6gBIaLfDPPesgLUPp9GE4Q0',
  created: 1749549291,
  model: 'gemini-2.0-flash-001',
  object: 'chat.completion.chunk',
  choices: [ { index: 0, delta: [Object] } ]
}
{
  content: ' double bubble bath, double bubble bath, double bubble bath.\n'
}
****************
{
  id: '6gBIaLfDPPesgLUPp9GE4Q0',
  created: 1749549291,
  model: 'gemini-2.0-flash-001',
  object: 'chat.completion.chunk',
  choices: [ { finish_reason: 'stop', index: 0, delta: {} } ]
}
{}
****************
>>> Total chunks received: 6
>>> Final content:
 Okay, I'll try!

Double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath.

But for the gemini-2.5-flash and gemini-2.5-pro models, I get:

{
  id: 'RQVIaI-qDZGsgLUP09ePqAE',
  created: 1749550405,
  model: 'gemini-2.5-flash-preview-05-20',
  object: 'chat.completion.chunk',
  choices: [ { finish_reason: 'stop', index: 0, delta: {} } ]
}
{}
****************
>>> Total chunks received: 1
>>> Final content:

As a result, the final content is empty.

All other endpoints are working fine.
Please help as soon as possible.
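The empty result follows directly from the accumulation loop in the snippet above: if the stream delivers only a finish chunk with an empty delta, nothing is ever appended. A small sketch on the two observed chunk shapes (simplified from the logs, ids and timestamps trimmed) makes this concrete:

```javascript
// Same accumulation logic as the repro script, extracted into a function.
function collectContent(chunks) {
    let content = "";
    for (const chunk of chunks) {
        content += chunk.choices[0].delta.content || "";
    }
    return content;
}

// gemini-2.0-flash: content deltas arrive before the stop chunk.
const workingStream = [
    { choices: [{ index: 0, delta: { content: "double bubble bath" } }] },
    { choices: [{ index: 0, delta: {}, finish_reason: "stop" }] },
];

// gemini-2.5-flash / gemini-2.5-pro: only the stop chunk arrives.
const brokenStream = [
    { choices: [{ index: 0, delta: {}, finish_reason: "stop" }] },
];

console.log(collectContent(workingStream)); // "double bubble bath"
console.log(collectContent(brokenStream)); // ""
```

So the client code is behaving correctly; the proxy simply never emits any content deltas for the 2.5 models.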

Relevant log output

Are you a ML Ops Team?

Yes

What LiteLLM version are you on ?

1.72.2

Twitter / LinkedIn details

No response
