[Feature]: Rate limit aware batch completion calls #527
Comments
@ishaan-jaff are the related issues in scope for this feature?
As a v0 of this, sticking to just this: https://docs.litellm.ai/docs/rate_limit_manager
Is there current or planned support for rate-limit-aware batch completion calls? Or any workaround/tweaks to the current implementation?
@vilmar-hillow the recommended approach here is to use the Router for handling the reliability logic, and let users write their own parallel calling logic - https://docs.litellm.ai/docs/routing. Reason: the batch completion logic seems to be just multi-threaded / parallel async calls to completion(). Is there a reason you want litellm to handle this?
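A minimal sketch of what that combination could look like, assuming a Router configured per the routing docs plus user-written asyncio logic. The model name and tpm/rpm values are placeholders, and the exact placement of the `tpm`/`rpm` keys may vary by litellm version:

```python
# Hedged sketch: Router handles deployment selection/reliability; the
# parallel-calling logic (asyncio.gather) is user-written, as suggested above.
import asyncio
from litellm import Router

# Illustrative deployment config; tpm/rpm values are placeholders.
model_list = [
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {"model": "gpt-3.5-turbo"},
        "tpm": 240000,
        "rpm": 1800,
    }
]
router = Router(model_list=model_list)

async def batch(messages_list):
    # Fire all requests concurrently; the Router routes each call.
    tasks = [
        router.acompletion(model="gpt-3.5-turbo", messages=m)
        for m in messages_list
    ]
    return await asyncio.gather(*tasks, return_exceptions=True)

results = asyncio.run(batch([[{"role": "user", "content": "hi"}]]))
```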
Thanks! I'll explore the router. The use case is simple - batch processing for experimentation, sending as many requests as possible to a single model given the set TPM/RPM.
@CLARKBENHAM caught a relevant issue around making the router better for this. We've added testing for it and will work on fixing the issues it raises. If there's something specific, we'll make it a separate issue for tracking.
The Feature
Pass in a bunch of batch completion calls and a rate limit. The requests should not fail.
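A minimal sketch of the requested behavior, assuming only `litellm.acompletion` and a hypothetical `rate_limited_batch` helper that staggers request starts client-side so the batch never exceeds a given RPM (the helper is illustrative, not a litellm API):

```python
# Hedged sketch: client-side RPM throttling so a batch stays under the
# stated rate limit. rate_limited_batch is a hypothetical helper.
import asyncio
import litellm

async def rate_limited_batch(messages_list, model, rpm):
    interval = 60.0 / rpm  # minimum spacing between request starts

    async def one_call(messages, delay):
        await asyncio.sleep(delay)  # stagger starts to stay under the RPM cap
        return await litellm.acompletion(model=model, messages=messages)

    tasks = [one_call(m, i * interval) for i, m in enumerate(messages_list)]
    return await asyncio.gather(*tasks)

# Usage: a batch of prompts against one model, capped at 60 requests/minute.
# asyncio.run(rate_limited_batch(prompts, "gpt-3.5-turbo", rpm=60))
```

Note this only bounds RPM; a fuller version would also track token usage against the TPM limit.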
Motivation, pitch
User request
Twitter / LinkedIn details
No response