
Add parallel request queuing for Typhoeus request replacement #196

Closed
cguess wants to merge 11 commits

Conversation


@cguess cguess commented Feb 25, 2023

All Submissions:

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?

This PR branches off #194 to add optional batching and queuing of requests to better handle large data sets. This is especially useful for generating embeddings in bulk. In testing it cut the total time to generate 32k embeddings from ~5 hours to ~10 minutes.

Usage

To use this new functionality, there are two new optional parameters added to the initializer:
client = OpenAI::Client.new(access_token: "xxxxx", rate_limit: 2500, max_concurrency: 10)

New Defaults:

  • rate_limit: 2500. OpenAI's limit is 3,000 requests/minute, but its timing seems to be slightly off, so we back off a bit to ensure success without unpredictably slamming against the limit.
  • max_concurrency: 200. This is Typhoeus's default limit, but you can change it depending on what you need/can handle. In my testing this doesn't significantly change the time it takes.

After initialization, this PR adds a queue_ version of each function, which takes a block to handle the response, e.g.

client.queue_embeddings(parameters: { model: "text-embedding-ada-002", input: "It was a dark and stormy evening..." }) do |result|
    puts result.body
end

After queuing, a user calls client.run_queued_requests to begin execution. Each block will run as its request finishes.

Errors and such should be managed in the response block.
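
For example, a batch of inputs can be queued up front and then executed together. A minimal sketch follows; the JSON parsing and the "error" key check are assumptions about the response body, not part of this PR:

require "json"

client = OpenAI::Client.new(access_token: "xxxxx", rate_limit: 2500, max_concurrency: 10)

texts = ["first document", "second document", "third document"]

texts.each do |text|
  client.queue_embeddings(parameters: { model: "text-embedding-ada-002", input: text }) do |result|
    body = JSON.parse(result.body) # assumes the raw response body is JSON
    if body["error"] # hypothetical check; OpenAI returns an "error" object on failure
      warn "Embedding failed: #{body["error"]["message"]}"
    else
      puts body["data"].first["embedding"].length
    end
  end
end

# Nothing is sent until the queue is run; each block fires as its request completes.
client.run_queued_requests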

Notes:

  • There are a few linting issues with this because of the length of the Client class and because a few of the methods have more than ten lines. I'd recommend relaxing these rules if the team is OK with that; I couldn't see a way to refactor the lengths without adding a significant amount of complexity.
  • There are no tests for this. As I discussed with @alexrudall, I'm just not familiar enough with VCR and RSpec to do this properly (I've been a minitest guy for 17 years). I'm happy to add them if there's help to do so.
  • Because there are no tests, there are probably breaking changes that will need to be resolved before a final merge. I'm happy to do so once we get the testing up and running.
  • This adds one dependency, ruby-limiter, an up-to-date and maintained gem by Shopify (so it's legit), to manage the rate limiting; otherwise you pretty instantly slam up against the OpenAI rate limits, and that's a lot of work to manage outside this gem (a sketch of how it throttles is included after this list).
  • The commits are poorly documented, mostly because I wasn't considering upstreaming this at the time. However, the changes are pretty limited and self-explanatory for the most part.
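
For reference, this is roughly how ruby-limiter's rate queue throttles calls. This is a sketch based on the gem's Limiter::RateQueue API, not code from this PR:

require "limiter"

# Allow at most 2,500 calls per 60-second interval (matching the rate_limit default).
rate_queue = Limiter::RateQueue.new(2500, interval: 60)

100.times do |i|
  rate_queue.shift # blocks until the limiter allows another call through
  puts "sending request #{i}" # stand-in for queuing/sending an API request
end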

@cguess cguess mentioned this pull request Feb 25, 2023
@alexrudall
Owner

alexrudall commented Feb 25, 2023

Thanks a lot @cguess !

A few questions, to help me understand:

  1. Could we always run the queue automatically so we don't need run_queued_requests?
  2. Is there ever a situation where a user would want to NOT batch and use concurrency? I.e. could we just use batches and queues for every request, with the overrides rate_limit and max_concurrency for advanced users only?
  3. How do you personally observe and test this stuff? How do you know you were hitting the limit of 3000, how do you test that more concurrency than 200 doesn't help?
  4. The team is just me by the way ;)

@cguess
Author

cguess commented Feb 26, 2023

@alexrudall happy to answer!

  1. I considered that, but the Hydra documentation clearly states that you have to queue everything before running the queue (see the sketch after this list). It also helps even out the rate limiting.
  2. The main reason you wouldn't is simplicity. You may want the request to block to simplify a script, for instance. It would also be a massive breaking change that would require restructuring all apps currently built with this tool. I figured I'd keep your architecture as it is rather than mess with it.
  3. I tested this within a script I wrote specifically for this task, where I had to get embeddings for about 150k strings.
    • The OpenAI API returns an error with a message if you hit the rate limit.
    • There are a few other errors too, such as when the server is overloaded, and sometimes just a standard 500.
    • To make sure it was working I put together some error checking in the script and observed progress using https://github.com/jfelchner/ruby-progressbar
    • If you do switch to an intermediary object then that's probably a good place to add error checking, but until then it's probably best to let the user implement it themselves.
  4. Thanks for all your work!
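
For reference, plain Typhoeus usage looks roughly like this; a sketch of the upstream Typhoeus::Hydra queue-then-run pattern, not code from this PR:

require "typhoeus"

hydra = Typhoeus::Hydra.new(max_concurrency: 200)

requests = 3.times.map do |i|
  request = Typhoeus::Request.new("https://example.com/items/#{i}")
  request.on_complete { |response| puts "#{i}: #{response.code}" } # runs as each request finishes
  request
end

requests.each { |request| hydra.queue(request) } # everything must be queued first...
hydra.run # ...then run executes the whole batch in parallel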

@alexrudall
Owner

Thanks so much for this @cguess. Would you mind adding a section to the README in this PR on how to use this? I generally try to aim docs at a junior dev, with copy-paste commands.

@cguess
Author

cguess commented Mar 7, 2023 via email

@cguess
Author

cguess commented Mar 8, 2023

@alexrudall I've added the instructions, I hope they're simple enough.

@bf4 bf4 mentioned this pull request Apr 3, 2023
@bf4

bf4 commented Apr 25, 2023

Recommend closing.

@cguess
Author

cguess commented Apr 25, 2023

@bf4 why? This does work and significantly speeds up requests; if there's more work to be done on it, let me know.

@alexrudall
Owner

@cguess sorry for the very slow response! In the end we went with Faraday for its modularity and popularity versus Typhoeus. I really appreciate this work and would love to include it somehow; I guess we would need to add the ability to select Typhoeus as an optional dependency via https://github.com/dleavitt/faraday-typhoeus. Would love to hear your thoughts if you're still interested in this. Thanks

@alexrudall alexrudall closed this Aug 14, 2023