[Bug]: Strange Behavior in HuggingChat (Chat-UI) #1222
Comments
Hey @gururise, do we know if the large chunk yielding is happening on TogetherAI's side? Re: the newline, what's the fix for this? I believe this is part of the string being returned by TogetherAI.
EDIT: I think I've confirmed there is something wrong/different with the together_ai implementation. If I use openai as the LLM provider with LiteLLM proxy, the application works as expected, but if I switch to together_ai as the LLM provider, things do not work nicely.
When I run litellm in debug mode, I can see the tokens being streamed individually.
Looking at the debug log when using together_ai, the newlines are escaped. Any ideas why? Is this something LiteLLM is doing? Here is a snippet of the debug log when I am using together_ai (notice the newline towards the end is escaped):
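The decoded delta content comes through roughly like this (an illustrative reconstruction of an OpenAI-style streaming chunk, not the exact log line; the point is that the newline arrives as a literal backslash-n sequence rather than a real line break):

```text
data: {"choices":[{"delta":{"content":" line one\\n\\nline two"},"index":0,"finish_reason":null}]}
```

A correctly streamed chunk would carry `\n` in the content rather than `\\n`, which is why the client renders a literal "\n" instead of a line break.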
Alright, I think I have confirmed it is something to do with the way LiteLLM is handling TogetherAI. When I continue to use the LiteLLM proxy but switch the provider to openai (gpt-3.5-turbo), everything works exactly as expected. The streaming occurs token by token and the output is parsed correctly. Testing the LiteLLM proxy using OpenAI (gpt-3.5-turbo): snippet of the debug log with openai as the provider.
Acknowledging this - will work on it today. Thank you for the debugging so far, @gururise.
Testing with this curl request:
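Roughly along these lines (the port and model name are placeholders for whatever the proxy is configured to serve):

```bash
# streaming chat completion against the LiteLLM proxy (placeholder port/model)
curl -s http://0.0.0.0:8000/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/Mistral-7B-Instruct-v0.2",
    "messages": [{"role": "user", "content": "Write two short paragraphs separated by a blank line"}],
    "stream": true
  }'
```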
I'm unable to repro the large chunk problem (see "content" for each line). cc @gururise: do you know what the exact call being received by litellm is?
@gururise bump on this.
Seeing the same issue with formatting: TogetherAI with Mixtral-8x7B-Instruct-v0.1. The output is not formatted correctly, as reported above by OP.
I can see the tokens streamed individually as well, but like OP mentioned they are displayed in chunks, as if the response is first cached until it hits some sort of limit and is then displayed in Chat-UI.
Same behaviour in LibreChat as well, so it looks like it's an issue with the proxy when using TogetherAI, and it happens with any model on TogetherAI.
@gururise have you found a workaround for this issue, or are you not using Together's API?
Unfortunately, I have found no workaround in LiteLLM. I haven't had time to look further into this issue; perhaps if you have time to provide some more debugging information, @krrishdholakia can fix it.
I'll do some further testing here and try to repro this. I'm not seeing this when I just test the proxy chat completion endpoint with TogetherAI and streaming in Postman.
Thanks @gururise.
@nigh8w0lf can you let me know if you're seeing this issue when making a normal curl request to the proxy endpoint? And also the version of litellm being used?
@krrishdholakia I can see that the tokens are streamed when running curl or when running the proxy in debug mode; the chunking seems to happen when the tokens are displayed in HF Chat-UI and LibreChat. There is also the formatting issue when the tokens are displayed in HF Chat-UI and LibreChat.
Sorry, forgot to mention the litellm version: I'm using 1.17.5.
Updated to LiteLLM 1.17.14, still the same issue.
Is this then a client-side issue with LibreChat / HF Chat-UI? cc @Manouchehri, I believe you're also using us with LibreChat; are you seeing similar buffering?
@nigh8w0lf do you see this buffering happening for a regular OpenAI call via the proxy? I remember trying LibreChat with Bedrock and that seemed to work fine.
I've been using Azure OpenAI, Bedrock, and Cohere. None of them had this issue from what I remember. =)
@krrishdholakia it doesn't happen with any other API, only with TogetherAI.
@krrishdholakia Just to add, I tried HF Chat-UI with LiteLLM (OpenAI API) and it worked as expected. As @nigh8w0lf says, this issue only occurs when using LiteLLM with TogetherAI. EDIT: If you look at the debug log I attached to an earlier comment, you can see that LiteLLM is returning escaped newline characters when used with TogetherAI.
Related PR: I saw this with SageMaker: #1569
Pushed a fix for TogetherAI; it will be live on 1.18.13. @gururise @nigh8w0lf can I get your help confirming the issue is fixed on 1.18.13+?
The chunking issue seems to be fixed; I can see the response being streamed correctly. The formatting issue is still present for the TogetherAI API. I'm also seeing some new behavior after this update: HF Chat-UI thinks the response is incomplete? The Continue button appears after the response has been streamed completely. I have not seen this before, and it's happening with all APIs.
Don't see the "Continue" button issue when using HF Chat-UI without the proxy.
Do you see this when calling OpenAI directly? @nigh8w0lf
No, I don't see it when using OpenAI directly; I see it only when using the proxy.
When I say directly, I mean using HF Chat-UI without LiteLLM.
I have switched to LibreChat as the frontend; the "Continue" issue is no longer a concern, but the formatting issue still exists on v1.20.0.
@nigh8w0lf can we track the formatting bug in a new issue, since this issue was about the TogetherAI chunks hanging?
@ishaan-jaff Sure, I can log a new bug report. The initial bug report above mentions both issues, by the way, hence why I was continuing here.
What happened?
When using LiteLLM as a proxy for Together.ai and Mistral-7B-Instruct-v0.2, there are two strange issues that occur during inference when using the Chat-UI frontend by Hugging Face: the streamed response is displayed in large chunks rather than word by word, and the output formatting (newlines) is broken.
As seen below, when using gpt-3.5-turbo, the formatting is fine and word-by-word streaming works:
Here is MODELS from .env.local for litellm proxy:
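Roughly along these lines (the baseURL and model name here are placeholders for my local proxy):

```env
# chat-ui .env.local entry pointing at the LiteLLM proxy (placeholder baseURL/model name)
MODELS=`[
  {
    "name": "mistralai/Mistral-7B-Instruct-v0.2",
    "displayName": "Mistral 7B Instruct v0.2 (via LiteLLM)",
    "endpoints": [
      { "type": "openai", "baseURL": "http://localhost:8000/v1" }
    ]
  }
]`
```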
To set up the gpt-3.5-turbo model:
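A minimal sketch of the proxy config used for the comparison run (the api_key reference is a placeholder; the actual setup may differ):

```yaml
# litellm proxy config.yaml (sketch) - gpt-3.5-turbo entry used for the comparison run
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY
```

The proxy is then started with `litellm --config config.yaml`, and Chat-UI points its baseURL at it, the same as for the Together.ai model above.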
Relevant log output
No response
Twitter / LinkedIn details
No response