
Feature: Chat stream response #5

Merged (2 commits, Apr 9, 2024)
Conversation

@devcxl (Contributor) commented Apr 7, 2024

Hello, I tried to add streaming response functionality, but it doesn't seem to truly stream the output. Could you help me take a look at this piece of code?

@devcxl (Contributor, Author) commented Apr 7, 2024

$ curl http://localhost:8787/v1/chat/completions   -H "Content-Type: application/json"   -H "Authorization: Bearer sk-123123123123123"   -d '{
    "stream":true,
    "model": "@cf/qwen/qwen1.5-0.5b-chat",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello"
      }
    ]
  }'
data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":"Hello"},"index":0,"finish_reason":null}]}

data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":"!"},"index":0,"finish_reason":null}]}

data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":" How"},"index":0,"finish_reason":null}]}

data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":" may"},"index":0,"finish_reason":null}]}

data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":" I"},"index":0,"finish_reason":null}]}

data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":" assist"},"index":0,"finish_reason":null}]}

data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":" you"},"index":0,"finish_reason":null}]}

data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":" today"},"index":0,"finish_reason":null}]}

data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":"?"},"index":0,"finish_reason":null}]}

data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":""},"index":0,"finish_reason":null}]}

data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":""},"index":0,"finish_reason":"stop"}]}

I tested the response like this, and the output conforms to the streaming format. However, it still seems to arrive all at once rather than incrementally.

@chand1012 (Owner) commented
I'll definitely take a look, this is a needed feature.

@chand1012 (Owner) commented
Looks like you may be using the streaming code incorrectly. Here is some example streaming code from the official Workers docs:

export default {
  async fetch(request, env, ctx) {
    // Fetch from origin server.
    let response = await fetch(request);

    // Create an identity TransformStream (a.k.a. a pipe).
    // The readable side will become our new response body.
    let { readable, writable } = new TransformStream();

    // Start pumping the body. NOTE: No await!
    response.body.pipeTo(writable);

    // ... and deliver our Response while that’s running.
    return new Response(readable, response);
  }
}
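The crucial detail in that snippet is the NOTE: `pipeTo(writable)` is deliberately not awaited, so the Response is returned while the pump is still running. A minimal sketch of the same identity-pipe pattern, runnable outside the Workers runtime (assuming a Node 18+ environment where `ReadableStream` and `TransformStream` are globals; all names here are illustrative):

```javascript
// Sketch: piping a source through an identity TransformStream preserves
// chunk boundaries -- the readable side yields chunks one at a time
// instead of buffering everything.
async function demo() {
  // A source that emits three SSE-style chunks.
  const source = new ReadableStream({
    start(controller) {
      for (const chunk of ['data: one\n\n', 'data: two\n\n', 'data: three\n\n']) {
        controller.enqueue(chunk);
      }
      controller.close();
    },
  });

  // Identity pipe: the readable side would become the response body.
  const { readable, writable } = new TransformStream();

  // NOTE: no await -- the pump runs while we consume `readable`.
  source.pipeTo(writable);

  const received = [];
  for await (const chunk of readable) {
    received.push(chunk);
  }
  return received;
}

demo().then((chunks) => console.log(chunks.length)); // 3
```

If `pipeTo` were awaited before constructing the Response, the handler would block until the upstream body finished, which produces exactly the "everything arrives at once" symptom described above.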

Here's a code fragment that ChatGPT suggested that might help.

if (json.stream) {
    let {
        readable,
        writable
    } = new TransformStream();
    aiResp.body.pipeThrough(transformer).pipeTo(writable);
    return new Response(readable, {
        headers: {
            'Content-Type': 'text/event-stream',
            'Cache-Control': 'no-cache',
            'Connection': 'keep-alive',
        }
    });
}
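One caveat with that fragment: `transformer` is never defined. A hedged sketch of what it might look like (purely illustrative, not the PR's actual code) is a standard-constructor TransformStream that forwards each chunk as soon as it arrives:

```javascript
// Hypothetical `transformer` for the fragment above: decodes each incoming
// byte chunk, leaves the text untouched (an event-rewrite step could go
// here), and re-encodes it so the client receives events incrementally.
const decoder = new TextDecoder();
const encoder = new TextEncoder();

const transformer = new TransformStream({
  transform(chunk, controller) {
    const text = decoder.decode(chunk, { stream: true });
    controller.enqueue(encoder.encode(text));
  },
});
```

Note that constructing a `TransformStream` with a transformer object on Workers requires either the `transformstream_enable_standard_constructor` compatibility flag or a sufficiently recent `compatibility_date` — which is exactly what the later comments in this thread run into.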

@devcxl (Contributor, Author) commented Apr 9, 2024

The answer from ChatGPT is not reliable. I tried splitting the TransformStream into its readable and writable sides and returning the readable side, but the result still seems to arrive all at once.

const { readable, writable } = new TransformStream({
    // omit some code
});

// For now, nothing else does anything. Load the AI model.
const aiResp = await ai.run(model, { stream: json.stream, messages });

// Pipe the AI response stream through the TransformStream.
aiResp.pipeTo(writable);

return json.stream ? new Response(readable, {
    headers: {
        'content-type': 'text/event-stream',
        'Cache-Control': 'no-cache',
        'Connection': 'keep-alive',
    },
})
@chand1012 (Owner) commented
Your code above seems to work on my side; how are you testing this? Here is how I tested the streaming:

curl -H 'Content-Type: application/json' -d '{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant"
    },
    {
      "role": "user",
      "content": "Write me an essay on the formation of black holes"
    }
  ],
  "stream": true
}' http://localhost:8787/chat/completions

And it seemed to work fine. Another thing that should be updated is here. On my branch (linked below) I had to update the compatibility date to the latest version (2024-04-05 as of today) in order to get everything working properly.

I pushed the code to a separate branch for testing, however I intend to merge this PR and delete that branch once the code is working.

@devcxl (Contributor, Author) commented Apr 9, 2024

name = "openai-cf"                # todo
main = "index.js"
compatibility_date = "2022-05-03"
compatibility_flags = [ "transformstream_enable_standard_constructor","streams_enable_constructors"]

I forgot to update the configuration in this branch as well.

@devcxl (Contributor, Author) commented Apr 9, 2024

"Upgrading compatibility_date without using compatibility_flags is also possible."
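Per the comments above, bumping the date alone should be enough. A sketch of the updated config (the 2024-04-05 date is the one mentioned earlier in this thread; everything else mirrors the snippet quoted above):

```toml
name = "openai-cf"                # todo
main = "index.js"
# A compatibility date of 2024-04-05 or later enables the standard
# TransformStream constructor without explicit compatibility_flags.
compatibility_date = "2024-04-05"
```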

@chand1012 chand1012 merged commit a1b278c into chand1012:main Apr 9, 2024
@chand1012 (Owner) commented
Thank you for your contribution!
