
Streaming #189

Closed
kilimchoi opened this issue Feb 15, 2023 · 16 comments

Comments

@kilimchoi

Is there a way to stream the response from the API, like how ChatGPT streams one token at a time?

@alexrudall
Owner

Hi @kilimchoi, thanks for your question. Can you explain a bit more how this would work?

@rmontgomery429
Contributor

rmontgomery429 commented Feb 17, 2023

I think you'd need to use something like ruby-eventsource to listen for the Server-Sent Events and then in turn provide a streaming API to the caller.
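For reference, a minimal sketch of the ruby-eventsource (ld-eventsource) client against a hypothetical SSE endpoint; note the POST-body limitation raised further down in this thread:

require "ld-eventsource"

# SSE::Client opens the connection and yields parsed events as they arrive.
sse = SSE::Client.new("https://example.com/eventstream") do |client|
  client.on_event do |event|
    puts "#{event.type}: #{event.data}"
  end
end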

@kilimchoi
Author

@alexrudall In Python you can do it as follows:

import sys
import openai

for resp in openai.Completion.create(model='code-davinci-002', prompt='def hello():', max_tokens=512, stream=True):
    sys.stdout.write(resp.choices[0].text)
    sys.stdout.flush()

I don't believe this Ruby library supports the streaming option, as it uses HTTParty under the hood, which doesn't support server-sent events. But if you added streaming, it would definitely help other devs who are looking to minimize response wait time, since streaming tokens as they arrive is much faster than waiting for the whole completion.

@kilimchoi
Author

kilimchoi commented Feb 17, 2023

> I think you'd need to use something like ruby-eventsource to listen for the Server-Sent Events and then in turn provide a streaming API to the caller.

Have you implemented streaming using this library? If so, I'd appreciate it if you could share a sample repository or even a gist.
I looked at it briefly, but it doesn't seem like you can set the request data while initializing the client object.

@alexrudall
Owner

Thanks for the extra info. It looks like Typhoeus supports streaming, so maybe that could work; I'm thinking of switching from HTTParty to it in the next major release anyway.

@gastonmorixe

This is an example of a raw response when setting stream: true:

HTTP/2 200
date: Sat, 11 Mar 2023 21:34:10 GMT
content-type: text/event-stream
access-control-allow-origin: *
cache-control: no-cache, must-revalidate
openai-model: gpt-3.5-turbo-0301
openai-organization: user-XXXXXXXXXXXXX
openai-processing-ms: 169
openai-version: 2020-10-01
strict-transport-security: max-age=15724800; includeSubDomains
x-request-id: 3f263e38be42c4825c159feccf93e313

data: {"id":"chatcmpl-6t18serxbjpyqkF79x20bAiZANJjX","object":"chat.completion.chunk","created":1678570450,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"role":"assistant"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-6t18serxbjpyqkF79x20bAiZANJjX","object":"chat.completion.chunk","created":1678570450,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":"\n\n"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-6t18serxbjpyqkF79x20bAiZANJjX","object":"chat.completion.chunk","created":1678570450,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":"Hello"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-6t18serxbjpyqkF79x20bAiZANJjX","object":"chat.completion.chunk","created":1678570450,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" there"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-6t18serxbjpyqkF79x20bAiZANJjX","object":"chat.completion.chunk","created":1678570450,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":","},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-6t18serxbjpyqkF79x20bAiZANJjX","object":"chat.completion.chunk","created":1678570450,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" how"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-6t18serxbjpyqkF79x20bAiZANJjX","object":"chat.completion.chunk","created":1678570450,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" may"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-6t18serxbjpyqkF79x20bAiZANJjX","object":"chat.completion.chunk","created":1678570450,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" I"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-6t18serxbjpyqkF79x20bAiZANJjX","object":"chat.completion.chunk","created":1678570450,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" assist"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-6t18serxbjpyqkF79x20bAiZANJjX","object":"chat.completion.chunk","created":1678570450,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" you"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-6t18serxbjpyqkF79x20bAiZANJjX","object":"chat.completion.chunk","created":1678570450,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" today"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-6t18serxbjpyqkF79x20bAiZANJjX","object":"chat.completion.chunk","created":1678570450,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":"?"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-6t18serxbjpyqkF79x20bAiZANJjX","object":"chat.completion.chunk","created":1678570450,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{},"index":0,"finish_reason":"stop"}]}

data: [DONE]
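A minimal Ruby sketch of consuming a body like this, buffering raw chunks and emitting one parsed event per data: line (handle_sse_chunk is a hypothetical name; the [DONE] marker terminates the stream):

require "json"

buffer = +""
# Feed each raw chunk from the HTTP client into this; it prints the
# assistant's text as deltas arrive and skips the final [DONE] marker.
handle_sse_chunk = lambda do |chunk|
  buffer << chunk
  while (event = buffer.slice!(/\A.*?\n\n/m)) # one complete SSE event
    data = event[/^data: (.*)/, 1]
    next if data.nil? || data == "[DONE]"
    print JSON.parse(data).dig("choices", 0, "delta", "content")
  end
end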

@lucasluitjes

Made an example Sinatra app that uses Typhoeus to stream responses: https://gist.github.com/lucasluitjes/0bf82de475ac91fe2ad8e71d5c2df164
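The core of that approach is Typhoeus's on_body callback, sketched here against the chat endpoint (request parameters are illustrative; handle_sse_chunk is the kind of parser sketched above):

require "typhoeus"
require "json"

request = Typhoeus::Request.new(
  "https://api.openai.com/v1/chat/completions",
  method: :post,
  headers: {
    "Content-Type" => "application/json",
    "Authorization" => "Bearer #{ENV['OPENAI_API_KEY']}"
  },
  body: JSON.dump(
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: "Hello!" }],
    stream: true
  )
)
# on_body is invoked with each piece of the response body as it arrives,
# instead of buffering the whole response.
request.on_body { |chunk| handle_sse_chunk.call(chunk) }
request.run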

@chloerei

Maybe another HTTP client that supports streaming could work: https://github.com/httprb/http
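For comparison, a sketch of the same request with http.rb, whose response body can be consumed incrementally (again with illustrative parameters):

require "http"

response = HTTP
  .headers("Authorization" => "Bearer #{ENV['OPENAI_API_KEY']}")
  .post("https://api.openai.com/v1/chat/completions",
        json: { model: "gpt-3.5-turbo",
                messages: [{ role: "user", content: "Hello!" }],
                stream: true })

# Iterating the body yields chunks as they come off the socket.
response.body.each { |chunk| handle_sse_chunk.call(chunk) }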

@templeman15

templeman15 commented Mar 29, 2023

I'm happy to create a PR for this to help get it moving along, as we need this right now.

Would we want to use one of these libraries?

HTTP
https://github.com/httprb/http

Faraday
https://github.com/lostisland/faraday

Typhoeus
https://github.com/typhoeus/typhoeus

@Velora

Velora commented Mar 30, 2023

I'd personally vote for HTTP or Faraday, as most Rails apps I work on already use those, whereas Typhoeus isn't as popular.

@gastonmorixe

gastonmorixe commented Mar 30, 2023

I have tried Typhoeus for exactly this (streaming) and it works phenomenally. It relies on libcurl (ten billion installations [1]), which is extremely solid. Maybe we can default to Typhoeus through Faraday, since Faraday is transport-agnostic.

[1] https://curl.se
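A sketch of what that could look like, assuming Faraday 1.x, where Typhoeus ships its own Faraday adapter:

require "faraday"
require "typhoeus"
require "typhoeus/adapters/faraday"

# Faraday stays the public interface; Typhoeus (libcurl) does the transport.
connection = Faraday.new(url: "https://api.openai.com") do |faraday|
  faraday.adapter :typhoeus
end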

@alexrudall
Owner

@gastonmorixe That sounds like a good path to me, taking that route in #234

@bf4

bf4 commented Apr 3, 2023

xref #196

@gastonmorixe

gastonmorixe commented Apr 3, 2023

Thank you for the good work and progress, guys. We'll definitely use it. Maybe I can help this week.

One thing that's important to test (I did it with lsof) is that TCP keep-alive really happens and that the connection is HTTP/2.

If the connection is reused, the next request is really fast. If it isn't, there's a connection and handshake delay of at least half a second.

I'm not sure if Typhoeus's Hydra is thread-safe; if it is, I was planning to create one Hydra instance in an initializer and queue requests through it, as that was the only way I found to get the TCP connection to keep alive and be reused. A sketch follows below.

There's also a way to force the connection to stay alive if you want to optimize this and be sure the next request doesn't have to wait for the TCP handshake. I will post how later, as I'm on my mobile now.
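The shared-Hydra idea, as a sketch (assuming a single instance can safely be reused; thread-safety unverified, as noted above):

require "typhoeus"

# Created once, e.g. in a Rails initializer, so queued requests can
# reuse the same underlying libcurl connections.
HYDRA = Typhoeus::Hydra.new

def run_via_hydra(request)
  HYDRA.queue(request)
  HYDRA.run # blocks until all queued requests complete
end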

@alexrudall
Owner

#234 adds streaming with Faraday - final reviews much appreciated 👍

@alexrudall
Owner

alexrudall commented Apr 26, 2023

ruby-openai v4 adds chat streaming with Faraday! Thanks everyone for your input and ideas! Let us know how you get on :)

Streaming ChatGPT

You can stream from the API in real time, which can be much faster and can be used to create a more engaging user experience. Pass a Proc to the stream parameter to receive the stream of text chunks as they are generated. Each time one or more chunks is received, the Proc will be called once per chunk, parsed as a Hash. If OpenAI returns an error, ruby-openai will pass that to your Proc as a Hash.

client.chat(
    parameters: {
        model: "gpt-3.5-turbo", # Required.
        messages: [{ role: "user", content: "Describe a character called Anna!"}], # Required.
        temperature: 0.7,
        stream: proc do |chunk, _bytesize|
            print chunk.dig("choices", 0, "delta", "content")
        end
    })
# => "Anna is a young woman in her mid-twenties, with wavy chestnut hair that falls to her shoulders..."
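Because errors are delivered through the same proc, a handler can branch on them; this sketch assumes they arrive in OpenAI's standard { "error" => { ... } } shape:

stream_handler = proc do |chunk, _bytesize|
  if (error = chunk["error"]) # assumed error shape: { "message" => ..., "type" => ... }
    warn "OpenAI error: #{error['message']}"
  else
    print chunk.dig("choices", 0, "delta", "content")
  end
end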
