
[BUG]: Unable to decode chunks from my OpenAI server #1475

Closed
odrobnik opened this issue May 21, 2024 · 14 comments · Fixed by #1487

Labels: possible bug (Bug was reported but is not confirmed or is unable to be replicated.)

Comments

odrobnik commented May 21, 2024

How are you running AnythingLLM?

AnythingLLM desktop app

What happened?

I am working on my own OpenAI-compatible local server; for now I decode the chunks coming from OpenAI and re-encode them. That changes the order of the fields somewhat, but otherwise the JSON is identical. AnythingLLM is unable to decode the actual message and shows an empty message instead.

Are there known steps to reproduce?

These are the streamed lines that it should be able to decode:

data: {"choices":[{"delta":{"content":"","role":"assistant"},"index":0}],"created":1716325454,"id":"chatcmpl-9RQwYj6NgMZrCth2IHxS4INd1ZRue","model":"gpt-4-turbo-2024-04-09","object":"chat.completion.chunk","system_fingerprint":"fp_e9446dc58f"}

data: {"choices":[{"delta":{"content":"Hello"},"index":0}],"created":1716325454,"id":"chatcmpl-9RQwYj6NgMZrCth2IHxS4INd1ZRue","model":"gpt-4-turbo-2024-04-09","object":"chat.completion.chunk","system_fingerprint":"fp_e9446dc58f"}

data: {"choices":[{"delta":{"content":"!"},"index":0}],"created":1716325454,"id":"chatcmpl-9RQwYj6NgMZrCth2IHxS4INd1ZRue","model":"gpt-4-turbo-2024-04-09","object":"chat.completion.chunk","system_fingerprint":"fp_e9446dc58f"}

data: {"choices":[{"delta":{"content":" How"},"index":0}],"created":1716325454,"id":"chatcmpl-9RQwYj6NgMZrCth2IHxS4INd1ZRue","model":"gpt-4-turbo-2024-04-09","object":"chat.completion.chunk","system_fingerprint":"fp_e9446dc58f"}

data: {"choices":[{"delta":{"content":" can"},"index":0}],"created":1716325454,"id":"chatcmpl-9RQwYj6NgMZrCth2IHxS4INd1ZRue","model":"gpt-4-turbo-2024-04-09","object":"chat.completion.chunk","system_fingerprint":"fp_e9446dc58f"}

data: {"choices":[{"delta":{"content":" I"},"index":0}],"created":1716325454,"id":"chatcmpl-9RQwYj6NgMZrCth2IHxS4INd1ZRue","model":"gpt-4-turbo-2024-04-09","object":"chat.completion.chunk","system_fingerprint":"fp_e9446dc58f"}

data: {"choices":[{"delta":{"content":" assist"},"index":0}],"created":1716325454,"id":"chatcmpl-9RQwYj6NgMZrCth2IHxS4INd1ZRue","model":"gpt-4-turbo-2024-04-09","object":"chat.completion.chunk","system_fingerprint":"fp_e9446dc58f"}

data: {"choices":[{"delta":{"content":" you"},"index":0}],"created":1716325454,"id":"chatcmpl-9RQwYj6NgMZrCth2IHxS4INd1ZRue","model":"gpt-4-turbo-2024-04-09","object":"chat.completion.chunk","system_fingerprint":"fp_e9446dc58f"}

data: {"choices":[{"delta":{"content":" today"},"index":0}],"created":1716325454,"id":"chatcmpl-9RQwYj6NgMZrCth2IHxS4INd1ZRue","model":"gpt-4-turbo-2024-04-09","object":"chat.completion.chunk","system_fingerprint":"fp_e9446dc58f"}

data: {"choices":[{"delta":{"content":"?"},"index":0}],"created":1716325454,"id":"chatcmpl-9RQwYj6NgMZrCth2IHxS4INd1ZRue","model":"gpt-4-turbo-2024-04-09","object":"chat.completion.chunk","system_fingerprint":"fp_e9446dc58f"}

data: {"choices":[{"delta":{},"finish_reason":"stop","index":0}],"created":1716325454,"id":"chatcmpl-9RQwYj6NgMZrCth2IHxS4INd1ZRue","model":"gpt-4-turbo-2024-04-09","object":"chat.completion.chunk","system_fingerprint":"fp_e9446dc58f"}

data: [DONE]

This is how the streamed lines from OpenAI look; you can see that the order of the JSON fields is different. Your decoder should be robust enough not to care about that.

data: {"id":"chatcmpl-9RR01Csi806nWQZJARjtgaCGD5Nhz","object":"chat.completion.chunk","created":1716325669,"model":"gpt-4-turbo-2024-04-09","system_fingerprint":"fp_e9446dc58f","choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-9RR01Csi806nWQZJARjtgaCGD5Nhz","object":"chat.completion.chunk","created":1716325669,"model":"gpt-4-turbo-2024-04-09","system_fingerprint":"fp_e9446dc58f","choices":[{"index":0,"delta":{"content":"Hello"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-9RR01Csi806nWQZJARjtgaCGD5Nhz","object":"chat.completion.chunk","created":1716325669,"model":"gpt-4-turbo-2024-04-09","system_fingerprint":"fp_e9446dc58f","choices":[{"index":0,"delta":{"content":"!"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-9RR01Csi806nWQZJARjtgaCGD5Nhz","object":"chat.completion.chunk","created":1716325669,"model":"gpt-4-turbo-2024-04-09","system_fingerprint":"fp_e9446dc58f","choices":[{"index":0,"delta":{"content":" How"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-9RR01Csi806nWQZJARjtgaCGD5Nhz","object":"chat.completion.chunk","created":1716325669,"model":"gpt-4-turbo-2024-04-09","system_fingerprint":"fp_e9446dc58f","choices":[{"index":0,"delta":{"content":" can"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-9RR01Csi806nWQZJARjtgaCGD5Nhz","object":"chat.completion.chunk","created":1716325669,"model":"gpt-4-turbo-2024-04-09","system_fingerprint":"fp_e9446dc58f","choices":[{"index":0,"delta":{"content":" I"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-9RR01Csi806nWQZJARjtgaCGD5Nhz","object":"chat.completion.chunk","created":1716325669,"model":"gpt-4-turbo-2024-04-09","system_fingerprint":"fp_e9446dc58f","choices":[{"index":0,"delta":{"content":" assist"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-9RR01Csi806nWQZJARjtgaCGD5Nhz","object":"chat.completion.chunk","created":1716325669,"model":"gpt-4-turbo-2024-04-09","system_fingerprint":"fp_e9446dc58f","choices":[{"index":0,"delta":{"content":" you"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-9RR01Csi806nWQZJARjtgaCGD5Nhz","object":"chat.completion.chunk","created":1716325669,"model":"gpt-4-turbo-2024-04-09","system_fingerprint":"fp_e9446dc58f","choices":[{"index":0,"delta":{"content":" today"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-9RR01Csi806nWQZJARjtgaCGD5Nhz","object":"chat.completion.chunk","created":1716325669,"model":"gpt-4-turbo-2024-04-09","system_fingerprint":"fp_e9446dc58f","choices":[{"index":0,"delta":{"content":"?"},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-9RR01Csi806nWQZJARjtgaCGD5Nhz","object":"chat.completion.chunk","created":1716325669,"model":"gpt-4-turbo-2024-04-09","system_fingerprint":"fp_e9446dc58f","choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"stop"}]}
data: [DONE]

The message should appear as "Hello! How can I assist you today?", but you see only empty messages:

(screenshot: the chat shows only empty assistant messages)
odrobnik (Author) commented May 21, 2024

PS: I tested the same endpoint with Ollama's Open WebUI and it works without problems:

data: {"choices":[{"delta":{"content":"","role":"assistant"},"index":0}],"created":1716326479,"id":"chatcmpl-9RRD5zO2CmeOn84O4H9KdFqItKvUy","model":"gpt-4-0613","object":"chat.completion.chunk"}


data: {"choices":[{"delta":{"content":"Hello"},"index":0}],"created":1716326479,"id":"chatcmpl-9RRD5zO2CmeOn84O4H9KdFqItKvUy","model":"gpt-4-0613","object":"chat.completion.chunk"}


data: {"choices":[{"delta":{"content":"!"},"index":0}],"created":1716326479,"id":"chatcmpl-9RRD5zO2CmeOn84O4H9KdFqItKvUy","model":"gpt-4-0613","object":"chat.completion.chunk"}


data: {"choices":[{"delta":{"content":" How"},"index":0}],"created":1716326479,"id":"chatcmpl-9RRD5zO2CmeOn84O4H9KdFqItKvUy","model":"gpt-4-0613","object":"chat.completion.chunk"}


data: {"choices":[{"delta":{"content":" can"},"index":0}],"created":1716326479,"id":"chatcmpl-9RRD5zO2CmeOn84O4H9KdFqItKvUy","model":"gpt-4-0613","object":"chat.completion.chunk"}


data: {"choices":[{"delta":{"content":" I"},"index":0}],"created":1716326479,"id":"chatcmpl-9RRD5zO2CmeOn84O4H9KdFqItKvUy","model":"gpt-4-0613","object":"chat.completion.chunk"}


data: {"choices":[{"delta":{"content":" help"},"index":0}],"created":1716326479,"id":"chatcmpl-9RRD5zO2CmeOn84O4H9KdFqItKvUy","model":"gpt-4-0613","object":"chat.completion.chunk"}


data: {"choices":[{"delta":{"content":" you"},"index":0}],"created":1716326479,"id":"chatcmpl-9RRD5zO2CmeOn84O4H9KdFqItKvUy","model":"gpt-4-0613","object":"chat.completion.chunk"}


data: {"choices":[{"delta":{"content":" today"},"index":0}],"created":1716326479,"id":"chatcmpl-9RRD5zO2CmeOn84O4H9KdFqItKvUy","model":"gpt-4-0613","object":"chat.completion.chunk"}


data: {"choices":[{"delta":{"content":"?"},"index":0}],"created":1716326479,"id":"chatcmpl-9RRD5zO2CmeOn84O4H9KdFqItKvUy","model":"gpt-4-0613","object":"chat.completion.chunk"}


data: {"choices":[{"delta":{},"finish_reason":"stop","index":0}],"created":1716326479,"id":"chatcmpl-9RRD5zO2CmeOn84O4H9KdFqItKvUy","model":"gpt-4-0613","object":"chat.completion.chunk"}


data: {"choices":[],"created":1716326479,"id":"chatcmpl-9RRD5zO2CmeOn84O4H9KdFqItKvUy","model":"gpt-4-0613","object":"chat.completion.chunk","usage":{"completion_tokens":9,"prompt_tokens":72,"total_tokens":81}}


data: [DONE]

That is to say: in Open WebUI you see the text appear, and it is exactly what was streamed.

timothycarambat (Member)

What connector are you specifically using? Generic OpenAI?

odrobnik (Author)

Yes, Generic OpenAI

timothycarambat (Member)

@odrobnik Ah, I think I see what is going on here. Your intermediate chunks do not contain finish_reason; only the last chunk does. OpenAI currently returns a finish_reason (null) on every response chunk. If we patch it now, the fix will not reach the desktop app until the next release.

timothycarambat (Member)

Who is the provider behind the connector you are using? They are mostly OpenAI-compatible, but not exactly 1:1.

odrobnik (Author)

@timothycarambat It's my own provider; I am working on an agent framework. I'll try to add finish_reason: null on my side, although I believe you'd be more robust if you could handle its absence. That's what Open WebUI does.
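
For illustration, a simplified sketch of what adding it on my side would look like (encodeChunk is a made-up helper, not my actual code):

// Include "finish_reason": null on every intermediate delta so clients that
// key off the field's presence can still find it. Hypothetical helper only.
function encodeChunk(id, created, model, delta, finishReason = null) {
  const chunk = {
    id,
    object: "chat.completion.chunk",
    created,
    model,
    choices: [{ index: 0, delta, logprobs: null, finish_reason: finishReason }],
  };
  return `data: ${JSON.stringify(chunk)}\n\n`;
}

// Intermediate token: finish_reason stays null.
encodeChunk("chatcmpl-example", 1716325454, "gpt-4-turbo-2024-04-09", { content: "Hello" });
// Final chunk: empty delta plus finish_reason "stop".
encodeChunk("chatcmpl-example", 1716325454, "gpt-4-turbo-2024-04-09", {}, "stop");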

timothycarambat (Member)

@odrobnik Oh cool, okay well we are handling that as you suggest via #1487

Thanks for pointing it out!

odrobnik (Author)

@timothycarambat I think you have one more problem here. When passing the option to include usage information you get a chunk like this:

{\"choices\":[],\"created\":1716408014,\"id\":\"chatcmpl-9RmQAhXIqjaq2YfOjCo6pjWYzgJNN\",\"model\":\"gpt-4-turbo-2024-04-09\",\"object\":\"chat.completion.chunk\",\"system_fingerprint\":\"fp_e9446dc58f\",\"usage\":{\"completion_tokens\":9,\"prompt_tokens\":75,\"total_tokens\":84}}\n\n"

There will be an empty choices array and an additional usage dict. Are you doing anything with this information?
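
For reference, this is roughly the request that produces such a chunk. In the OpenAI API the switch is the stream_options.include_usage flag; the endpoint, model, and key handling below are just placeholders:

// Minimal sketch of a streaming request that asks for the usage chunk.
const res = await fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: "gpt-4-turbo-2024-04-09",
    stream: true,
    // Asks the server to append one final chunk with usage and an empty choices array.
    stream_options: { include_usage: true },
    messages: [{ role: "user", content: "Hello" }],
  }),
});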

Anyway, I'm now sending null when there is no finish reason, and I saw the text begin to appear, but then it got replaced by this:

(screenshot)

odrobnik (Author)

PS: Once you get into this state and try to send again, there is some sort of endless loop where the user message and this error appear, disappear, appear, disappear, and so on ad infinitum. A parsing error shouldn't leave the app in an unusable state.

odrobnik (Author)

PPS: If I omit the stream option includeUsage(true), everything is fine.

(screenshot)

odrobnik (Author)

ChatGPT found your issue: you always access choices[0], which is bad practice because it yields undefined in the case of the usage chunk.

It suggests this change:

function handleDefaultStreamResponseV2(response, stream, responseProps) {
  const { uuid = uuidv4(), sources = [] } = responseProps;

  return new Promise(async (resolve) => {
    let fullText = "";

    // Establish listener to early-abort a streaming response
    // in case things go sideways or the user does not like the response.
    // We preserve the generated text but continue as if chat was completed
    // to preserve previously generated content.
    const handleAbort = () => clientAbortedHandler(resolve, fullText);
    response.on("close", handleAbort);

    for await (const chunk of stream) {
      if (Array.isArray(chunk?.choices) && chunk.choices.length > 0) {
        const message = chunk.choices[0];
        const token = message?.delta?.content;

        if (token) {
          fullText += token;
          writeResponseChunk(response, {
            uuid,
            sources: [],
            type: "textResponseChunk",
            textResponse: token,
            close: false,
            error: false,
          });
        }

        // LocalAi returns '' and others return null on chunks - the last chunk is not "" or null.
        // Either way, the key `finish_reason` must be present to determine ending chunk.
        if (
          message.hasOwnProperty("finish_reason") &&
          message.finish_reason !== "" &&
          message.finish_reason !== null
        ) {
          writeResponseChunk(response, {
            uuid,
            sources,
            type: "textResponseChunk",
            textResponse: "",
            close: true,
            error: false,
          });
          response.removeListener("close", handleAbort);
          resolve(fullText);
        }
      }
    }
  });
}

timothycarambat (Member)

Even with that patch, wrapping the entire body in if (Array.isArray(chunk?.choices) && chunk.choices.length > 0), there are still situations with some providers where the promise will never resolve, so it does not address that problem. It happens to work in this instance, but we use this stream handler in many places.

odrobnik (Author)

Sorry, don't get hung up on ChatGPT's attempt. My point was that an empty choices array is a valid scenario which needs to be handled, or at the very least ignored, without putting the app into an unusable state.

odrobnik (Author)

The problem is that encountering undefined also causes the promise to never resolve, because it throws, right? Why don't you remove the listener and resolve unconditionally after the for loop? Then you could just break out of the loop when you see a finish reason. That would also deal with the mentioned case of runaway whitespace generation after a non-null finish reason.
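
Something along these lines is what I mean: a rough sketch on top of the snippet above, assuming the same helpers (uuidv4, writeResponseChunk, clientAbortedHandler):

function handleDefaultStreamResponseV2(response, stream, responseProps) {
  const { uuid = uuidv4(), sources = [] } = responseProps;

  return new Promise(async (resolve) => {
    let fullText = "";
    const handleAbort = () => clientAbortedHandler(resolve, fullText);
    response.on("close", handleAbort);

    for await (const chunk of stream) {
      // Usage-only chunks have an empty choices array; just skip them.
      const message = chunk?.choices?.[0];
      if (!message) continue;

      const token = message?.delta?.content;
      if (token) {
        fullText += token;
        writeResponseChunk(response, {
          uuid,
          sources: [],
          type: "textResponseChunk",
          textResponse: token,
          close: false,
          error: false,
        });
      }

      // Any non-empty, non-null finish_reason ends the stream.
      if (message.finish_reason) break;
    }

    // Always close out and resolve, even if no finish_reason ever arrives.
    writeResponseChunk(response, {
      uuid,
      sources,
      type: "textResponseChunk",
      textResponse: "",
      close: true,
      error: false,
    });
    response.removeListener("close", handleAbort);
    resolve(fullText);
  });
}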
