What happened?
Description:
Thank you for the well-optimized new version, it is indeed faster and more economical!

However, in the latest version of LiteLLM, main-v1.72.2.rc, I encountered an issue where streaming does not work for the gemini-2.5-flash and gemini-2.5-pro models. Streaming works correctly with earlier models such as gemini-2.0-flash, but not with the newer ones. All other models (OpenAI, Claude) stream fine. Everything worked on the earlier release litellm_stable_release_branch-v1.72.0.rc1.
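For context, the models are routed through the proxy via a standard model_list entry. An illustrative sketch (not my exact config; the provider prefix, e.g. gemini/ vs vertex_ai/, and key handling depend on the deployment):

```yaml
model_list:
  - model_name: gemini-2.5-flash
    litellm_params:
      model: gemini/gemini-2.5-flash
      api_key: os.environ/GEMINI_API_KEY
  - model_name: gemini-2.0-flash
    litellm_params:
      model: gemini/gemini-2.0-flash
      api_key: os.environ/GEMINI_API_KEY
```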
Steps to reproduce:
import OpenAI from "openai";

const litellmClient = new OpenAI({
  apiKey: "sk-.........",
  baseURL: `http://litellm.........:4000`,
});

(async () => {
  const stream = await litellmClient.chat.completions.create({
    model: "gemini-2.5-flash", // also try "gemini-2.5-pro"
    messages: [
      {
        role: "user",
        content: "Say 'double bubble bath' ten times fast.",
      },
    ],
    stream: true,
  });

  let content = "";
  let countChunk = 0;
  for await (const chunk of stream) {
    console.log(chunk);
    console.log(chunk.choices[0].delta);
    content += chunk.choices[0].delta.content || "";
    countChunk++;
    console.log("****************");
  }
  console.log(">>> Total chunks received:", countChunk);
  console.log(">>> Final content:\n", content);
})();
For the gemini-2.0-flash model it works correctly. These are the latest messages in the logs:
****************
{
id: '6gBIaLfDPPesgLUPp9GE4Q0',
created: 1749549291,
model: 'gemini-2.0-flash-001',
object: 'chat.completion.chunk',
choices: [ { index: 0, delta: [Object] } ]
}
{
content: ' double bubble bath, double bubble bath, double bubble bath.\n'
}
****************
{
id: '6gBIaLfDPPesgLUPp9GE4Q0',
created: 1749549291,
model: 'gemini-2.0-flash-001',
object: 'chat.completion.chunk',
choices: [ { finish_reason: 'stop', index: 0, delta: {} } ]
}
{}
****************
>>> Total chunks received: 6
>>> Final content:
Okay, I'll try!
Double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath, double bubble bath.
But for the gemini-2.5-flash and gemini-2.5-pro models, I get:
{
id: 'RQVIaI-qDZGsgLUP09ePqAE',
created: 1749550405,
model: 'gemini-2.5-flash-preview-05-20',
object: 'chat.completion.chunk',
choices: [ { finish_reason: 'stop', index: 0, delta: {} } ]
}
{}
****************
>>> Total chunks received: 1
>>> Final content:
As a result, empty content is displayed: the only chunk received is the final one with finish_reason: 'stop' and an empty delta. All other endpoints are working fine. Please help as soon as possible.
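For reference, the accumulation logic in the repro script can be exercised offline against the chunk shapes shown in the logs above (a minimal sketch; the simulated chunks are hypothetical stand-ins for a real stream):

```javascript
// Simulated chunks matching the shapes logged above: one content delta,
// then a final chunk with an empty delta and finish_reason "stop".
const simulatedChunks = [
  { choices: [{ index: 0, delta: { content: "double bubble bath" } }] },
  { choices: [{ finish_reason: "stop", index: 0, delta: {} }] },
];

let content = "";
let countChunk = 0;
for (const chunk of simulatedChunks) {
  // Guard against a missing delta.content, as in the final "stop" chunk.
  content += chunk.choices[0]?.delta?.content ?? "";
  countChunk++;
}

console.log(">>> Total chunks received:", countChunk);
console.log(">>> Final content:\n", content);
```

In the failing case the stream behaves as if only the final "stop" chunk were ever emitted, so countChunk stays at 1 and content stays empty.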
Relevant log output
Are you a ML Ops Team?
Yes
What LiteLLM version are you on ?
1.72.2
Twitter / LinkedIn details
No response