[Bug]: Strange Behavior in HuggingChat (Chat-UI) #1222
Comments
Hey @gururise, do we know if the large chunk yielding is happening on TogetherAI's side? Re: the newline, what's the fix for this? I believe this is part of the string being returned by TogetherAI.
EDIT: I think I've confirmed there is something wrong/different with the together_ai implementation. If I use openai as the LLM provider with LiteLLM proxy, the application works as expected, but if I switch to together_ai as the LLM provider, things do not work nicely.
When I run litellm in debug mode, I can see the tokens being streamed individually.
Looking at the debug log when using together_ai, the newlines are escaped. Any ideas why? Is this something LiteLLM is doing? Here is a snippet of the debug log when I am using together_ai (notice the newline towards the end is escaped):
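The decoded delta content comes through roughly like this (an illustrative reconstruction of an OpenAI-style streaming chunk, not the exact log line; the point is that the newline arrives as a literal backslash-n sequence rather than a real line break):

```text
data: {"choices":[{"delta":{"content":" line one\\n\\nline two"},"index":0,"finish_reason":null}]}
```

A correctly streamed chunk would carry `\n` in the content rather than `\\n`, which is why the client renders a literal "\n" instead of a line break.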
Alright, I think I have confirmed it is something to do with the way LiteLLM is handling TogetherAI. When I continue to use the LiteLLM proxy but switch the provider to openai (gpt-3.5-turbo), everything works exactly as expected. The streaming occurs token by token and the output is parsed correctly. Testing the LiteLLM proxy using OpenAI (gpt-3.5-turbo): snippet of the debug log with openai as the provider.
Acknowledging this - will work on it today. Thank you for the debugging so far, @gururise.
Testing with this curl request:
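Roughly along these lines (the port and model name are placeholders for whatever the proxy is configured to serve):

```bash
# streaming chat completion against the LiteLLM proxy (placeholder port/model)
curl -s http://0.0.0.0:8000/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/Mistral-7B-Instruct-v0.2",
    "messages": [{"role": "user", "content": "Write two short paragraphs separated by a blank line"}],
    "stream": true
  }'
```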
I'm unable to repro the large chunk problem (see "content" for each line). cc @gururise: do you know what the exact call being received by litellm is?
@gururise bump on this.
Seeing the same issue with formatting: TogetherAI with Mixtral-8x7B-Instruct-v0.1. The output is not formatted correctly, as reported above by OP.
I can see the tokens streamed individually as well, but like OP mentioned they are displayed in chunks, as if the response is first cached until it hits some sort of limit and is then displayed in Chat-UI.
Same behaviour in LibreChat as well, so it looks like it's an issue with the proxy when using TogetherAI, and it happens with any model on TogetherAI.
@gururise have you found a workaround for this issue, or are you not using Together's API?
Unfortunately, I have found no workaround in LiteLLM. I haven't had time to look further into this issue; perhaps if you have time to provide some more debugging information, @krrishdholakia can fix it.
I'll do some further testing here and try to repro this. I'm not seeing this when I just test the proxy chat completion endpoint with TogetherAI and streaming in Postman.
Thanks @gururise.
@nigh8w0lf can you let me know if you're seeing this issue when making a normal curl request to the proxy endpoint? And also the version of litellm being used?
@krrishdholakia I can see that the tokens are streamed when running curl or when running the proxy in debug mode; the chunking seems to happen when the tokens are displayed in HF Chat-UI and LibreChat. There is also the formatting issue when the tokens are displayed in HF Chat-UI and LibreChat.
Sorry, forgot to mention the litellm version: I'm using 1.17.5.
Updated to LiteLLM 1.17.14, still the same issue.
Is this then a client-side issue with LibreChat / HF Chat-UI? cc @Manouchehri, I believe you're also using us with LibreChat; are you seeing similar buffering?
@nigh8w0lf do you see this buffering happening for a regular OpenAI call via the proxy? I remember trying LibreChat with Bedrock and that seemed to work fine.
I've been using Azure OpenAI, Bedrock, and Cohere. None of them had this issue from what I remember. =)
@krrishdholakia it doesn't happen with any other API, only with TogetherAI.
@krrishdholakia Just to add, I tried HF Chat-UI with LiteLLM (OpenAI API) and it worked as expected. As @nigh8w0lf says, this issue only occurs when using LiteLLM with TogetherAI. EDIT: If you look at the debug log I attached to an earlier comment, you can see that LiteLLM is returning escaped newline characters when used with TogetherAI.
Related PR: I saw this with SageMaker: #1569
Pushed a fix for TogetherAI; it will be live on 1.18.13. @gururise @nigh8w0lf can I get your help confirming the issue is fixed on 1.18.13+?
The chunking issue seems to be fixed; I can see the response being streamed correctly. The formatting issue is still present for the TogetherAI API. I'm also seeing some new behavior after this update: HF Chat-UI thinks the response is incomplete? The Continue button appears after the response has been streamed completely. I have not seen this before, and it's happening with all APIs.
Don't see the "Continue" button issue when using HF Chat-UI without the proxy.
Do you see this when calling OpenAI directly? @nigh8w0lf
No, I don't see it when using OpenAI directly; I see it only when using the proxy.
When I say directly, I mean using HF Chat-UI without LiteLLM.
I have switched to LibreChat as the frontend; the "Continue" issue is no longer a concern, but the formatting issue still exists on v1.20.0.
@nigh8w0lf can we track the formatting bug in a new issue, since this issue was about the TogetherAI chunks hanging?
@ishaan-jaff Sure, I can log a new bug report. The initial bug report above mentions both issues, by the way, hence why I was continuing here.
What happened?
When using LiteLLM as a proxy for Together.ai and Mistral-7B-Instruct-v0.2, there are two strange issues that occur during inference when using the Chat-UI frontend by Hugging Face: the streamed response is displayed in large chunks rather than word by word, and the output formatting (newlines) is broken.
As seen below, when using gpt-3.5-turbo, the formatting is fine and word-by-word streaming works:
Here is MODELS from .env.local for litellm proxy:
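Roughly along these lines (the baseURL and model name here are placeholders for my local proxy):

```env
# chat-ui .env.local entry pointing at the LiteLLM proxy (placeholder baseURL/model name)
MODELS=`[
  {
    "name": "mistralai/Mistral-7B-Instruct-v0.2",
    "displayName": "Mistral 7B Instruct v0.2 (via LiteLLM)",
    "endpoints": [
      { "type": "openai", "baseURL": "http://localhost:8000/v1" }
    ]
  }
]`
```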
To set up the gpt-3.5-turbo model:
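A minimal sketch of the proxy config used for the comparison run (the api_key reference is a placeholder; the actual setup may differ):

```yaml
# litellm proxy config.yaml (sketch) - gpt-3.5-turbo entry used for the comparison run
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY
```

The proxy is then started with `litellm --config config.yaml`, and Chat-UI points its baseURL at it, the same as for the Together.ai model above.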
Relevant log output
No response
Twitter / LinkedIn details
No response