
Streaming #189

Closed
kilimchoi opened this issue Feb 15, 2023 · 16 comments

Comments

@kilimchoi

Is there a way to stream the response from the API, like how ChatGPT streams one token at a time?

@alexrudall
Owner

Hi @kilimchoi, thanks for your question. Can you explain a bit more how this would work?

@rmontgomery429
Contributor

rmontgomery429 commented Feb 17, 2023

I think you'd need to use something like ruby-eventsource to listen for the Server-Sent Events and then in turn provide a streaming API to the caller.
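For reference, a minimal sketch of the ruby-eventsource (ld-eventsource) client against a hypothetical SSE endpoint; note the POST-body limitation raised further down in this thread:

require "ld-eventsource"

# SSE::Client opens the connection and yields parsed events as they arrive.
sse = SSE::Client.new("https://example.com/eventstream") do |client|
  client.on_event do |event|
    puts "#{event.type}: #{event.data}"
  end
end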

@kilimchoi
Author

@alexrudall In Python you can do it as follows:

import sys
import openai

for resp in openai.Completion.create(model='code-davinci-002', prompt='def hello():', max_tokens=512, stream=True):
    sys.stdout.write(resp.choices[0].text)
    sys.stdout.flush()

I don't believe this Ruby library supports the streaming option, as it uses HTTParty under the hood, which doesn't support server-sent events. But if you added streaming, it would definitely help other devs who are looking to minimize response wait time, since streaming tokens as they arrive is much faster than waiting for the whole completion.

@kilimchoi
Author

kilimchoi commented Feb 17, 2023

> I think you'd need to use something like ruby-eventsource to listen for the Server-Sent Events and then in turn provide a streaming API to the caller.

Have you implemented streaming using this library? If so, I'd appreciate it if you could share a sample repository or even a gist.
I looked at it briefly, but it doesn't seem like you can set the request data while initializing the client object.

@alexrudall
Owner

Thanks for the extra info. It looks like Typhoeus supports streaming, so maybe that could work; I'm thinking of switching from HTTParty to it in the next major release anyway.

@gastonmorixe

This is an example of a raw response when setting stream: true:

HTTP/2 200
date: Sat, 11 Mar 2023 21:34:10 GMT
content-type: text/event-stream
access-control-allow-origin: *
cache-control: no-cache, must-revalidate
openai-model: gpt-3.5-turbo-0301
openai-organization: user-XXXXXXXXXXXXX
openai-processing-ms: 169
openai-version: 2020-10-01
strict-transport-security: max-age=15724800; includeSubDomains
x-request-id: 3f263e38be42c4825c159feccf93e313

data: {"id":"chatcmpl-6t18serxbjpyqkF79x20bAiZANJjX","object":"chat.completion.chunk","created":1678570450,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"role":"assistant"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-6t18serxbjpyqkF79x20bAiZANJjX","object":"chat.completion.chunk","created":1678570450,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":"\n\n"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-6t18serxbjpyqkF79x20bAiZANJjX","object":"chat.completion.chunk","created":1678570450,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":"Hello"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-6t18serxbjpyqkF79x20bAiZANJjX","object":"chat.completion.chunk","created":1678570450,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" there"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-6t18serxbjpyqkF79x20bAiZANJjX","object":"chat.completion.chunk","created":1678570450,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":","},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-6t18serxbjpyqkF79x20bAiZANJjX","object":"chat.completion.chunk","created":1678570450,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" how"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-6t18serxbjpyqkF79x20bAiZANJjX","object":"chat.completion.chunk","created":1678570450,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" may"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-6t18serxbjpyqkF79x20bAiZANJjX","object":"chat.completion.chunk","created":1678570450,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" I"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-6t18serxbjpyqkF79x20bAiZANJjX","object":"chat.completion.chunk","created":1678570450,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" assist"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-6t18serxbjpyqkF79x20bAiZANJjX","object":"chat.completion.chunk","created":1678570450,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" you"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-6t18serxbjpyqkF79x20bAiZANJjX","object":"chat.completion.chunk","created":1678570450,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" today"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-6t18serxbjpyqkF79x20bAiZANJjX","object":"chat.completion.chunk","created":1678570450,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":"?"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-6t18serxbjpyqkF79x20bAiZANJjX","object":"chat.completion.chunk","created":1678570450,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{},"index":0,"finish_reason":"stop"}]}

data: [DONE]
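A minimal Ruby sketch of consuming a body like this, buffering raw chunks and emitting one parsed event per data: line (handle_sse_chunk is a hypothetical name; the [DONE] marker terminates the stream):

require "json"

buffer = +""
# Feed each raw chunk from the HTTP client into this; it prints the
# assistant's text as deltas arrive and skips the final [DONE] marker.
handle_sse_chunk = lambda do |chunk|
  buffer << chunk
  while (event = buffer.slice!(/\A.*?\n\n/m)) # one complete SSE event
    data = event[/^data: (.*)/, 1]
    next if data.nil? || data == "[DONE]"
    print JSON.parse(data).dig("choices", 0, "delta", "content")
  end
end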

@lucasluitjes

Made an example Sinatra app that uses Typhoeus to stream responses: https://gist.github.com/lucasluitjes/0bf82de475ac91fe2ad8e71d5c2df164
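The core of that approach is Typhoeus's on_body callback, sketched here against the chat endpoint (request parameters are illustrative; handle_sse_chunk is the kind of parser sketched above):

require "typhoeus"
require "json"

request = Typhoeus::Request.new(
  "https://api.openai.com/v1/chat/completions",
  method: :post,
  headers: {
    "Content-Type" => "application/json",
    "Authorization" => "Bearer #{ENV['OPENAI_API_KEY']}"
  },
  body: JSON.dump(
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: "Hello!" }],
    stream: true
  )
)
# on_body is invoked with each piece of the response body as it arrives,
# instead of buffering the whole response.
request.on_body { |chunk| handle_sse_chunk.call(chunk) }
request.run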

@chloerei

Maybe another HTTP client that supports streaming could work: https://github.com/httprb/http
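For comparison, a sketch of the same request with http.rb, whose response body can be consumed incrementally (again with illustrative parameters):

require "http"

response = HTTP
  .headers("Authorization" => "Bearer #{ENV['OPENAI_API_KEY']}")
  .post("https://api.openai.com/v1/chat/completions",
        json: { model: "gpt-3.5-turbo",
                messages: [{ role: "user", content: "Hello!" }],
                stream: true })

# Iterating the body yields chunks as they come off the socket.
response.body.each { |chunk| handle_sse_chunk.call(chunk) }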

@templeman15

templeman15 commented Mar 29, 2023

I'm happy to create a PR for this to help get it moving along, as we need this right now.

Would we want to use one of these libraries?

HTTP
https://github.com/httprb/http

Faraday
https://github.com/lostisland/faraday

Typhoeus
https://github.com/typhoeus/typhoeus

@Velora

Velora commented Mar 30, 2023

I'd personally vote for HTTP or Faraday, as most Rails apps I work on already use those, whereas Typhoeus isn't as popular.

@gastonmorixe

gastonmorixe commented Mar 30, 2023

I have tried Typhoeus for exactly this (streaming) and it works phenomenally. It relies on libcurl (ten billion installations [1]), which is extremely solid. Maybe we can default to Typhoeus through Faraday, since Faraday is transport-agnostic.

[1] https://curl.se
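A sketch of what that could look like, assuming Faraday 1.x, where Typhoeus ships its own Faraday adapter:

require "faraday"
require "typhoeus"
require "typhoeus/adapters/faraday"

# Faraday stays the public interface; Typhoeus (libcurl) does the transport.
connection = Faraday.new(url: "https://api.openai.com") do |faraday|
  faraday.adapter :typhoeus
end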

@alexrudall
Owner

@gastonmorixe That sounds like a good path to me, taking that route in #234

@bf4

bf4 commented Apr 3, 2023

xref #196

@gastonmorixe

gastonmorixe commented Apr 3, 2023

Thank you for the good work and progress, guys. We'll definitely use it. Maybe I can help this week.

One thing that's important to test (I did it with lsof) is that TCP keep-alive really happens and that the connection is HTTP/2.

If the connection is reused, the next request is really fast. If it isn't, there's a connection and handshake delay of at least half a second.

I'm not sure if Typhoeus's Hydra is thread-safe; if it is, I was planning to create one Hydra instance in an initializer and queue requests through it, as that was the only way I found to get the TCP connection to keep alive and be reused. A sketch follows below.

There's also a way to force the connection to stay alive if you want to optimize this and be sure the next request doesn't have to wait for the TCP handshake. I will post how later, as I'm on my mobile now.
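The shared-Hydra idea, as a sketch (assuming a single instance can safely be reused; thread-safety unverified, as noted above):

require "typhoeus"

# Created once, e.g. in a Rails initializer, so queued requests can
# reuse the same underlying libcurl connections.
HYDRA = Typhoeus::Hydra.new

def run_via_hydra(request)
  HYDRA.queue(request)
  HYDRA.run # blocks until all queued requests complete
end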

@alexrudall
Owner

#234 adds streaming with Faraday - final reviews much appreciated 👍

@alexrudall
Owner

alexrudall commented Apr 26, 2023

ruby-openai v4 adds chat streaming with Faraday! Thanks everyone for your input and ideas! Let us know how you get on :)

Streaming ChatGPT

You can stream from the API in real time, which can be much faster and can be used to create a more engaging user experience. Pass a Proc to the stream parameter to receive the stream of text chunks as they are generated. Each time one or more chunks is received, the Proc will be called once per chunk, parsed as a Hash. If OpenAI returns an error, ruby-openai will pass that to your Proc as a Hash.

client.chat(
    parameters: {
        model: "gpt-3.5-turbo", # Required.
        messages: [{ role: "user", content: "Describe a character called Anna!"}], # Required.
        temperature: 0.7,
        stream: proc do |chunk, _bytesize|
            print chunk.dig("choices", 0, "delta", "content")
        end
    })
# => "Anna is a young woman in her mid-twenties, with wavy chestnut hair that falls to her shoulders..."
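Because errors are delivered through the same proc, a handler can branch on them; this sketch assumes they arrive in OpenAI's standard { "error" => { ... } } shape:

stream_handler = proc do |chunk, _bytesize|
  if (error = chunk["error"]) # assumed error shape: { "message" => ..., "type" => ... }
    warn "OpenAI error: #{error['message']}"
  else
    print chunk.dig("choices", 0, "delta", "content")
  end
end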
