
Feature: Chat stream response #5

Merged (2 commits, Apr 9, 2024)
Conversation

@devcxl (Contributor) commented Apr 7, 2024

Hello, I tried to add streaming response functionality, but it doesn't seem to truly stream the output. Could you help me take a look at this piece of code?

@devcxl (Contributor, Author) commented Apr 7, 2024

$ curl http://localhost:8787/v1/chat/completions   -H "Content-Type: application/json"   -H "Authorization: Bearer sk-123123123123123"   -d '{
    "stream":true,
    "model": "@cf/qwen/qwen1.5-0.5b-chat",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello"
      }
    ]
  }'
data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":"Hello"},"index":0,"finish_reason":null}]}

data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":"!"},"index":0,"finish_reason":null}]}

data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":" How"},"index":0,"finish_reason":null}]}

data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":" may"},"index":0,"finish_reason":null}]}

data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":" I"},"index":0,"finish_reason":null}]}

data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":" assist"},"index":0,"finish_reason":null}]}

data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":" you"},"index":0,"finish_reason":null}]}

data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":" today"},"index":0,"finish_reason":null}]}

data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":"?"},"index":0,"finish_reason":null}]}

data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":""},"index":0,"finish_reason":null}]}

data: {"id":"ed370b7a-c20d-46f2-a553-0d8d71caf336","created":1712475793,"object":"chat.completion.chunk","model":"@cf/qwen/qwen1.5-0.5b-chat","choices":[{"delta":{"content":""},"index":0,"finish_reason":"stop"}]}

I tested the response like this, and the output conforms to the streaming format. However, it still seems to arrive all at once rather than incrementally.

@chand1012 (Owner) commented
I'll definitely take a look, this is a needed feature.

@chand1012 (Owner) commented
Looks like you may be using the streaming code incorrectly. Here is some example streaming code from the official Workers docs:

export default {
  async fetch(request, env, ctx) {
    // Fetch from origin server.
    let response = await fetch(request);

    // Create an identity TransformStream (a.k.a. a pipe).
    // The readable side will become our new response body.
    let { readable, writable } = new TransformStream();

    // Start pumping the body. NOTE: No await!
    response.body.pipeTo(writable);

    // ... and deliver our Response while that’s running.
    return new Response(readable, response);
  }
}
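The crucial detail in that snippet is the NOTE: `pipeTo(writable)` is deliberately not awaited, so the Response is returned while the pump is still running. A minimal sketch of the same identity-pipe pattern, runnable outside the Workers runtime (assuming a Node 18+ environment where `ReadableStream` and `TransformStream` are globals; all names here are illustrative):

```javascript
// Sketch: piping a source through an identity TransformStream preserves
// chunk boundaries -- the readable side yields chunks one at a time
// instead of buffering everything.
async function demo() {
  // A source that emits three SSE-style chunks.
  const source = new ReadableStream({
    start(controller) {
      for (const chunk of ['data: one\n\n', 'data: two\n\n', 'data: three\n\n']) {
        controller.enqueue(chunk);
      }
      controller.close();
    },
  });

  // Identity pipe: the readable side would become the response body.
  const { readable, writable } = new TransformStream();

  // NOTE: no await -- the pump runs while we consume `readable`.
  source.pipeTo(writable);

  const received = [];
  for await (const chunk of readable) {
    received.push(chunk);
  }
  return received;
}

demo().then((chunks) => console.log(chunks.length)); // 3
```

If `pipeTo` were awaited before constructing the Response, the handler would block until the upstream body finished, which produces exactly the "everything arrives at once" symptom described above.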

Here's a code fragment that ChatGPT suggested that might help.

if (json.stream) {
    let {
        readable,
        writable
    } = new TransformStream();
    aiResp.body.pipeThrough(transformer).pipeTo(writable);
    return new Response(readable, {
        headers: {
            'Content-Type': 'text/event-stream',
            'Cache-Control': 'no-cache',
            'Connection': 'keep-alive',
        }
    });
}
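One caveat with that fragment: `transformer` is never defined. A hedged sketch of what it might look like (purely illustrative, not the PR's actual code) is a standard-constructor TransformStream that forwards each chunk as soon as it arrives:

```javascript
// Hypothetical `transformer` for the fragment above: decodes each incoming
// byte chunk, leaves the text untouched (an event-rewrite step could go
// here), and re-encodes it so the client receives events incrementally.
const decoder = new TextDecoder();
const encoder = new TextEncoder();

const transformer = new TransformStream({
  transform(chunk, controller) {
    const text = decoder.decode(chunk, { stream: true });
    controller.enqueue(encoder.encode(text));
  },
});
```

Note that constructing a `TransformStream` with a transformer object on Workers requires either the `transformstream_enable_standard_constructor` compatibility flag or a sufficiently recent `compatibility_date` — which is exactly what the later comments in this thread run into.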

@devcxl (Contributor, Author) commented Apr 9, 2024

The answer from ChatGPT is not reliable. I tried splitting the TransformStream into its readable and writable sides and returning the readable side, but the result still seems to arrive all at once.

const { readable, writable } = new TransformStream({
    // omit some code
});

// For now, nothing else does anything. Load the AI model.
const aiResp = await ai.run(model, { stream: json.stream, messages });

// Pipe the AI response stream through the TransformStream.
aiResp.pipeTo(writable);

return json.stream ? new Response(readable, {
    headers: {
        'content-type': 'text/event-stream',
        'Cache-Control': 'no-cache',
        'Connection': 'keep-alive',
    },
})
@chand1012 (Owner) commented
Your code above seems to work on my side; how are you testing this? Here is how I tested the streaming:

curl -H 'Content-Type: application/json' -d '{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant"
    },
    {
      "role": "user",
      "content": "Write me an essay on the formation of black holes"
    }
  ],
  "stream": true
}' http://localhost:8787/chat/completions

And it seemed to work fine. Another thing that should be updated is here. On my branch (linked below) I had to update the compatibility date to the latest version (2024-04-05 as of today) in order to get everything working properly.

I pushed the code to a separate branch for testing, however I intend to merge this PR and delete that branch once the code is working.

@devcxl (Contributor, Author) commented Apr 9, 2024

name = "openai-cf"                # todo
main = "index.js"
compatibility_date = "2022-05-03"
compatibility_flags = [ "transformstream_enable_standard_constructor","streams_enable_constructors"]

I forgot to update the configuration in this branch as well.

@devcxl (Contributor, Author) commented Apr 9, 2024

"Upgrading compatibility_date without using compatibility_flags is also possible."
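Per the comments above, bumping the date alone should be enough. A sketch of the updated config (the 2024-04-05 date is the one mentioned earlier in this thread; everything else mirrors the snippet quoted above):

```toml
name = "openai-cf"                # todo
main = "index.js"
# A compatibility date of 2024-04-05 or later enables the standard
# TransformStream constructor without explicit compatibility_flags.
compatibility_date = "2024-04-05"
```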

@chand1012 chand1012 merged commit a1b278c into chand1012:main Apr 9, 2024
@chand1012 (Owner) commented
Thank you for your contribution!
