
Incomplete response with streaming #251

Closed · deikka opened this issue Apr 28, 2023 · 20 comments
@deikka commented Apr 28, 2023

I have a chat system working perfectly. When I updated to version 4 of the gem, everything still worked normally. However, when I add the 'stream' option to the OpenAI API call, the content of the response comes back incomplete.

This code works normally:

response = openai_client.chat(
  parameters: {
    model: "gpt-4",
    messages: [
      {role: "system", content: system_prompt},
      {role: "user", content: user_prompt}
    ],
    temperature: 0.4,
    user: "user_#{user_id}"
  }
)
puts "RESPONSE: #{response}"
response.dig("choices", 0, "message", "content")

However, when adding the streaming option:

response = openai_client.chat(
  parameters: {
    model: "gpt-4",
    messages: [
      {role: "system", content: system_prompt},
      {role: "user", content: user_prompt}
    ],
    temperature: 0.4,
    stream: proc do |chunk, _bytesize|
      new_content = chunk.dig("choices", 0, "delta", "content")
      if new_content
        answer.content = (answer.content || "") + new_content
        answer.save!
      end
    end,
    user: "user_#{user_id}"
  }
)
puts "RESPONSE: #{response}"
response.dig("choices", 0, "message", "content")

Screenshot with responses: [screenshot attached, "Screenshot 2023-04-28 at 12:29:24"]

Any clue as to why it could happen?

  • Rails 7.0.4.3
  • Ruby 3.2.1
  • OS: macOS
  • Browser: Arc/Chrome
@deikka changed the title from "Response with streaming activated" to "Incomplete response with streaming activated" on Apr 28, 2023
@deikka changed the title from "Incomplete response with streaming activated" to "Incomplete response with streaming" on Apr 28, 2023
@alexrudall (Owner)

Thanks for sharing this here! I don't have gpt-4 access and I don't know what your system prompt is, but with gpt-3.5 I'm not able to reproduce this so far:

openai_client.chat(
  parameters: {
    model: "gpt-3.5-turbo",
    messages: [
      {role: "system", content: "Answer politely."},
      {role: "user", content: "What is the maximum weight allowed in the closed drawer or in combination of inside and on top of the open drawer?"}
    ],
    temperature: 0.4,
    stream: proc do |chunk, _bytesize|
      print chunk.dig("choices", 0, "delta", "content")
    end,
  }
)
#=> I'm sorry, but I don't have enough information to answer your question. Could you please specify which drawer or product you are referring to?

Are you able to print out all chunks and see if they are coming through correctly?
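
For example (a rough sketch reusing openai_client, the prompts, and user_id from your snippets above; the logging target is just illustrative), you could collect every raw chunk and then reassemble the deltas to compare against what ends up persisted:

chunks = []

openai_client.chat(
  parameters: {
    model: "gpt-4",
    messages: [
      {role: "system", content: system_prompt},
      {role: "user", content: user_prompt}
    ],
    temperature: 0.4,
    stream: proc do |chunk, _bytesize|
      chunks << chunk                               # keep every raw chunk for inspection
      Rails.logger.info("CHUNK: #{chunk.inspect}")  # or plain `puts` outside Rails
    end,
    user: "user_#{user_id}"
  }
)

# Reassemble the streamed deltas and compare against what was saved to `answer`
puts chunks.filter_map { |c| c.dig("choices", 0, "delta", "content") }.join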

@rmontgomery429 (Contributor)

Hopefully you get access soon @alexrudall. In the meantime I've added a spec and tested this scenario in #252.

@alexrudall (Owner)

@rmontgomery429 thanks, that's really helpful! Seems like you couldn't reproduce the issue either?

@rmontgomery429 (Contributor)

@alexrudall Correct.

@alexrudall (Owner)

@deikka can you print all the chunks you're getting and share them here? We can't reproduce this; it's a strange one.

@deikka (Author) commented Apr 28, 2023

It's super strange... I'm receiving these incomplete chunks. Perhaps there's an installed gem causing interference?

Here's how it comes out. For this question: What are some potential applications of SCI?

According to applications innovative services human, public safety resource management,, and example, an provide information and contact analyze data from mobile and to understand behavior patterns include community linking mobile social.

@deikka (Author) commented Apr 28, 2023

Hey folks, I was just about to throw in the towel when I tried commenting out the Helicone part. Now everything is working like a charm!

require "openai"

OpenAI.configure do |config|
  config.access_token = Rails.application.credentials.dig(:openai, :access_token)
  config.organization_id = Rails.application.credentials.dig(:openai, :organization_id)
  # config.uri_base = "https://oai.hconeai.com/" # Optional
  config.request_timeout = 240 # Optional
end

@alexrudall (Owner)

Legend, that's really good to know. Will try to reproduce this in a test and then we can feed back to Helicone if the issue is on their end. Or do you fancy doing it in your PR, @rmontgomery429? 😎

@rmontgomery429 (Contributor) commented Apr 29, 2023

@alexrudall my suggestion would be to merge that PR, given that it confirms GPT-4 streaming is working and there are currently no specs for that, and to address any changes required to accommodate Helicone in a separate PR.

@rmontgomery429 (Contributor)

Maybe @chitalian @ScottMktn have some insight they can share that would help troubleshoot this issue.

@rmontgomery429 (Contributor)

Never mind @alexrudall, I didn't realize that Helicone was in the readme, so it's less orthogonal than I thought. I tried to reproduce the issue, but to no avail. I've updated my PR to include those request specs as well.

@alexrudall (Owner) commented Apr 29, 2023 via email

@jicheng1014
Here's a simple test case: with streaming enabled and the prompt "please output 1 to 100", the content returned by ruby-openai loses some data.

To make sure data isn't overwritten incorrectly, I used Kredis to maintain the order of the chunks:

def stream_proc(message:)
  answer = Kredis.list "message-#{message.id}"

  proc do |chunk, _bytesize|
    new_content = chunk.dig("choices", 0, "delta", "content")

    answer << new_content if new_content
    message.update(content: answer.elements.join("")) if new_content
  end
end
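
For context, a sketch of how that helper might be wired into the chat call (assuming a hypothetical Message ActiveRecord-style model backing the answer and an openai_client configured as earlier in this thread):

message = Message.create!(content: "")  # hypothetical model holding the streamed answer

openai_client.chat(
  parameters: {
    model: "gpt-3.5-turbo",
    messages: [{role: "user", content: "please output 1 to 100"}],
    stream: stream_proc(message: message)
  }
)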

@ScotterC
I can also confirm "it's Helicone" from my experience of this bug, using Alex's example from the readme. Commenting/uncommenting uri_base makes the difference. I'm seeing a significant drop in chunks though; roughly every 3rd or 4th chunk is gone.

OpenAI.configure do |config|
  # config.uri_base = "https://oai.hconeai.com/"
  config.access_token = ENV.fetch("OPENAI_ACCESS_TOKEN")
  config.organization_id = ENV.fetch("OPENAI_ORGANIZATION")
  config.extra_headers = {
    "Helicone-Auth" => "Bearer #{ENV.fetch("HELICONE_API_KEY")}",
    "Helicone-Cache-Enabled" => "true"
  }
end

task scratch: :environment do
  client = OpenAI::Client.new
  client.chat(
    parameters: {
      model: "gpt-3.5-turbo", # Required.
      messages: [{role: "user", content: "Describe a character called Anna!"}], # Required.
      temperature: 0.7,
      stream: proc do |chunk, _bytesize|
                print chunk.dig("choices", 0, "delta", "content")
              end
    }
  )
end

@colegottdank
Hi! I was able to recreate it locally and fixed it by setting this header:

"helicone-stream-force-format" => "true"

Let me know if this works; I will look into a fix that does not require that header.
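
For anyone following along, a sketch of where that header would go, based on deikka's configure block and ScotterC's extra_headers usage above (the Helicone key is assumed to live in an env var here):

OpenAI.configure do |config|
  config.access_token = Rails.application.credentials.dig(:openai, :access_token)
  config.organization_id = Rails.application.credentials.dig(:openai, :organization_id)
  config.uri_base = "https://oai.hconeai.com/"
  config.request_timeout = 240
  config.extra_headers = {
    "Helicone-Auth" => "Bearer #{ENV.fetch("HELICONE_API_KEY")}",
    "helicone-stream-force-format" => "true" # workaround for dropped streaming chunks via Helicone
  }
end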

@alexrudall (Owner)

@colegottdank thanks, that's very helpful! @ScotterC does that work for you?

@ScotterC
Yup. It works on both a small-scale test and a larger implementation. I'd love to better understand what's happening here, @colegottdank.
Here's the source https://github.com/Helicone/helicone/blob/5f1e190f84f01068649b11b014bac9f51bb5a5b5/worker/src/lib/HeliconeHeaders.ts#L105

@deikka (Author) commented Aug 16, 2023

Hey folks, adding the header works perfectly for me too. I think we can consider this issue resolved, don't you?
Or would you prefer to wait for an explanation of how it works?

@alexrudall (Owner) commented Aug 17, 2023

It looks like the effect is here. It seems to queue chunks into an array? It seems like it either fixes or mostly fixes it. If you join the Helicone Discord here and then go to this thread, you can follow any further discussion that comes out of this or request updates.

I've also added a note to our README here to use this flag with Helicone for now. But yeah, our work here is done. Thanks for raising this, @deikka and others, and for the fix, @colegottdank!

client = OpenAI::Client.new(
  access_token: "access_token_goes_here",
  uri_base: "https://oai.hconeai.com/",
  request_timeout: 240,
  extra_headers: {
    "Helicone-Auth" => "Bearer HELICONE_API_KEY", # For https://docs.helicone.ai/getting-started/integration-method/openai-proxy
    "helicone-stream-force-format" => "true" # Use this with Helicone, otherwise streaming drops chunks: https://github.com/alexrudall/ruby-openai/issues/251
  }
)

@atesgoral (Contributor)

The reason this issue manifests itself when accessing OpenAI through Helicone (or any other proxy-like intermediary) could be that the completion JSON chunks from OpenAI are being buffered/joined/split at non-JSON boundaries during transit. #332 should fix it.
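
To illustrate the suspected failure mode (a rough sketch only, not the actual change in #332): if a proxy re-chunks the SSE stream so that a single network read can end mid-JSON, a parser that handles each read in isolation and discards anything unparseable will silently drop those deltas. Buffering the partial tail until the next newline arrives avoids the loss:

require "json"

@buffer = +""

# Accumulate raw network reads and only parse complete "data: {...}" lines.
def feed(raw)
  @buffer << raw
  events = []
  while (idx = @buffer.index("\n"))
    line = @buffer.slice!(0..idx).strip
    next unless line.start_with?("data: ")
    payload = line.delete_prefix("data: ")
    events << JSON.parse(payload) unless payload == "[DONE]"
  end
  events
end

# Simulate a proxy splitting one event across two reads:
p feed(%(data: {"choices":[{"delta":{"conte))  #=> [] (incomplete JSON stays buffered)
p feed(%(nt":"Hi"}}]}\n))                      #=> [{"choices"=>[{"delta"=>{"content"=>"Hi"}}]}]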
