
Add parallel request queuing for Typhoeus request replacement #196

Closed
cguess wants to merge 11 commits

Conversation


@cguess cguess commented Feb 25, 2023

All Submissions:

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?

This PR branches off #194 to add optional batching and queuing of requests to better handle large data sets. This is especially useful for generating embeddings in bulk. In testing it cut the total time to generate 32k embeddings from ~5 hours to ~10 minutes.

Usage

To use this new functionality, there are two new optional parameters added to the initializer:
client = OpenAI::Client.new(access_token: "xxxxx", rate_limit: 2500, max_concurrency: 10)

New Defaults:

  • rate_limit: 2500. OpenAI's limit is 3,000 requests/minute, but its timing seems to be slightly off, so we back off a bit to ensure success without unpredictably slamming against the limit.
  • max_concurrency: 200. This is Typhoeus's default limit, but you can change it depending on what you need/can handle. In my testing this doesn't significantly change the time it takes.

After initialization, this PR adds a queue_ version of each function, which takes a block to handle the response, e.g.

client.queue_embeddings(parameters: { model: "text-embedding-ada-002", input: "It was a dark and stormy evening..." }) do |result|
    puts result.body
end

After queuing, a user calls client.run_queued_requests to begin execution. Each block will run as its request finishes.

Errors and such should be managed in the response block.
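
For example, a batch of inputs can be queued up front and then executed together. A minimal sketch follows; the JSON parsing and the "error" key check are assumptions about the response body, not part of this PR:

require "json"

client = OpenAI::Client.new(access_token: "xxxxx", rate_limit: 2500, max_concurrency: 10)

texts = ["first document", "second document", "third document"]

texts.each do |text|
  client.queue_embeddings(parameters: { model: "text-embedding-ada-002", input: text }) do |result|
    body = JSON.parse(result.body) # assumes the raw response body is JSON
    if body["error"] # hypothetical check; OpenAI returns an "error" object on failure
      warn "Embedding failed: #{body["error"]["message"]}"
    else
      puts body["data"].first["embedding"].length
    end
  end
end

# Nothing is sent until the queue is run; each block fires as its request completes.
client.run_queued_requests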

Notes:

  • There are a few linting issues with this because of the length of the Client class and because a few of the methods have more than ten lines. I'd recommend relaxing these rules if the team is OK with that; I couldn't see a way to refactor the lengths without adding a significant amount of complexity.
  • There are no tests for this. As I discussed with @alexrudall, I'm just not familiar enough with VCR and RSpec to do this properly (I've been a minitest guy for 17 years). I'm happy to add them if there's help to do so.
  • Because there are no tests, there are probably breaking changes that will need to be resolved before a final merge. I'm happy to do so once we get the testing up and running.
  • This adds one dependency, ruby-limiter, an up-to-date and maintained gem by Shopify (so it's legit), to manage the rate limiting; otherwise you pretty instantly slam up against the OpenAI rate limits, and that's a lot of work to manage outside this gem (a sketch of how it throttles is included after this list).
  • The commits are poorly documented, mostly because I wasn't considering upstreaming this at the time. However, the changes are pretty limited and self-explanatory for the most part.
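
For reference, this is roughly how ruby-limiter's rate queue throttles calls. This is a sketch based on the gem's Limiter::RateQueue API, not code from this PR:

require "limiter"

# Allow at most 2,500 calls per 60-second interval (matching the rate_limit default).
rate_queue = Limiter::RateQueue.new(2500, interval: 60)

100.times do |i|
  rate_queue.shift # blocks until the limiter allows another call through
  puts "sending request #{i}" # stand-in for queuing/sending an API request
end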

@cguess cguess mentioned this pull request Feb 25, 2023
@alexrudall
Owner

alexrudall commented Feb 25, 2023

Thanks a lot @cguess !

A few questions, to help me understand:

  1. Could we always run the queue automatically so we don't need run_queued_requests?
  2. Is there ever a situation where a user would want to NOT batch and use concurrency? I.e. could we just use batches and queues for every request, with the overrides rate_limit and max_concurrency for advanced users only?
  3. How do you personally observe and test this stuff? How do you know you were hitting the limit of 3000, how do you test that more concurrency than 200 doesn't help?
  4. The team is just me by the way ;)

@cguess
Author

cguess commented Feb 26, 2023

@alexrudall happy to answer!

  1. I considered that, but the Hydra documentation clearly states that you have to queue everything before running the queue (see the sketch after this list). It also helps even out the rate limiting.
  2. The main reason you wouldn't is simplicity. You may want the request to block to simplify a script, for instance. It would also be a massive breaking change that would require restructuring all apps currently built with this tool. I figured I'd keep your architecture as it is rather than mess with it.
  3. I tested this within a script I wrote specifically for this task, where I had to get embeddings for about 150k strings.
    • The OpenAI API returns an error with a message if you hit the rate limit.
    • There are a few other errors too, such as when the server is overloaded, and sometimes just a standard 500.
    • To make sure it was working I put together some error checking in the script and observed progress using https://github.com/jfelchner/ruby-progressbar
    • If you do switch to an intermediary object then that's probably a good place to add error checking, but until then it's probably best to let the user implement it themselves.
  4. Thanks for all your work!
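
For reference, plain Typhoeus usage looks roughly like this; a sketch of the upstream Typhoeus::Hydra queue-then-run pattern, not code from this PR:

require "typhoeus"

hydra = Typhoeus::Hydra.new(max_concurrency: 200)

requests = 3.times.map do |i|
  request = Typhoeus::Request.new("https://example.com/items/#{i}")
  request.on_complete { |response| puts "#{i}: #{response.code}" } # runs as each request finishes
  request
end

requests.each { |request| hydra.queue(request) } # everything must be queued first...
hydra.run # ...then run executes the whole batch in parallel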

@alexrudall
Owner

Thanks so much for this @cguess. Would you mind adding a section to the README in this PR on how to use this? I generally try to aim docs at a junior dev, with copy-paste commands.

@cguess
Author

cguess commented Mar 7, 2023 via email

@cguess
Author

cguess commented Mar 8, 2023

@alexrudall I've added the instructions, I hope they're simple enough.

@bf4 bf4 mentioned this pull request Apr 3, 2023
@bf4

bf4 commented Apr 25, 2023

Recommend closing.

@cguess
Author

cguess commented Apr 25, 2023

@bf4 why? This does work and significantly speeds up requests; if there's more work to be done on it, let me know.

@alexrudall
Owner

@cguess sorry for the very slow response! In the end we went with Faraday for its modularity and popularity versus Typhoeus. I really appreciate this work and would love to include it somehow; I guess we would need to add the ability to select Typhoeus as an optional dependency via https://github.com/dleavitt/faraday-typhoeus. Would love to hear your thoughts if you're still interested in this. Thanks

@alexrudall alexrudall closed this Aug 14, 2023