Feature Req: Handle the Rate Limits in chatreadretrieve #496

Open

vrajroutu opened this issue Aug 3, 2023 · 7 comments

@vrajroutu

Please provide us with the following information:

This issue is for a: (mark with an x)

- [ ] bug report -> please search issues before submitting
- [x] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Users often see rate limit issues; we should have the ability to add exponential backoff, as described by OpenAI:

https://github.com/openai/openai-cookbook/blob/main/examples/How_to_handle_rate_limits.ipynb
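
For reference, a minimal sketch of the cookbook's backoff pattern using the tenacity library, against the 0.x openai SDK this repo used at the time (the deployment name and the assumption that the SDK is already configured for Azure are placeholders, not the repo's actual settings):

```python
import openai
from tenacity import retry, stop_after_attempt, wait_random_exponential

@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
def chat_completion_with_backoff(**kwargs):
    # Retries with jittered exponential backoff (1s..60s) on any exception,
    # including openai.error.RateLimitError, giving up after 6 attempts.
    return openai.ChatCompletion.create(**kwargs)

response = chat_completion_with_backoff(
    engine="chat",  # placeholder Azure OpenAI deployment name
    messages=[{"role": "user", "content": "What does a product manager do?"}],
)
```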

Any log messages given by the failure

Expected/desired behavior

OS and Version?

Windows 7, 8 or 10. Linux (which distribution). macOS (Yosemite? El Capitan? Sierra?)

azd version?

run azd version and copy paste here.

Versions

Mention any other details that might be useful


Thanks! We'll be in touch soon.

@pamelafox
Collaborator

Linking your PR here: #500

In our load tests, we were able to increase TPM to the max (120K/240K depending on the model) and then did not run into rate limits with the simulated users (50). Developers should first increase TPM as much as possible and then consider implementing backoff, but keep in mind that backoff is most useful for smoothing over spikes, not for sustained excess TPM. In that case, developers need to increase TPM further or load balance (as you've noted in another issue).

@vikramhn

@pamelafox I'm unsure how the load test was done and how many tokens each request consumed. If we take 3000 tokens for a chat request with 40 simultaneous users, a 120K TPM deployment will start to experience RateLimitError. A 240K TPM max can handle 80 users; beyond that, custom handling of load/retry needs to be implemented. Is my assumption correct?
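
For context, a back-of-the-envelope version of that arithmetic (the per-request token count and the one-request-per-user-per-minute rate are assumptions):

```python
# Rough capacity check: how many simultaneous users a TPM quota can sustain,
# assuming ~3000 tokens per chat turn and one request per user per minute.
tokens_per_request = 3_000
requests_per_user_per_minute = 1

for tpm_quota in (120_000, 240_000):  # 120K / 240K tokens per minute
    max_users = tpm_quota // (tokens_per_request * requests_per_user_per_minute)
    print(f"{tpm_quota} TPM -> ~{max_users} users")  # prints ~40 and ~80
```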

@vrajroutu
Author

@vikramhn also consider adding an Application Gateway and additional OpenAI instances.

Reference Article : https://www.raffertyuy.com/raztype/azure-openai-load-balancing/
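
For illustration, a minimal round-robin sketch of spreading requests across multiple Azure OpenAI instances in application code (endpoint URLs, keys, API version, and deployment name are placeholders; the article above does the balancing at the Application Gateway layer instead):

```python
import itertools
import openai

# Hypothetical pool of Azure OpenAI backends to rotate through.
BACKENDS = itertools.cycle([
    {"api_base": "https://aoai-eastus.openai.azure.com", "api_key": "<key-1>"},
    {"api_base": "https://aoai-westus.openai.azure.com", "api_key": "<key-2>"},
])

def chat_round_robin(messages):
    # Point the 0.x openai SDK at the next backend before each call.
    backend = next(BACKENDS)
    openai.api_type = "azure"
    openai.api_version = "2023-05-15"
    openai.api_base = backend["api_base"]
    openai.api_key = backend["api_key"]
    return openai.ChatCompletion.create(engine="chat", messages=messages)
```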

@vikramhn

@vrajroutu thank you for the reference article. Very cool solution for the time being. I hope it can be abstracted and incorporated into Azure as an enterprise-grade OpenAI premium product offering in the future.

@pamelafox
Collaborator

@vikramhn For my test, each request took about 1000 tokens, so it could handle a bit more. However, I think 3000 is also a reasonable assumption, since requests get longer as users ask more questions and some questions may have longer answers. You can see my load test in the locustfile.py in the root of this repo.

https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/locustfile.py
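
A stripped-down sketch of what such a Locust test looks like (the endpoint path, payload shape, and wait times are assumptions; the locustfile.py linked above is the source of truth):

```python
from locust import HttpUser, between, task

class ChatUser(HttpUser):
    # Each simulated user waits 5-20 seconds between questions.
    wait_time = between(5, 20)

    @task
    def ask_question(self):
        # Hypothetical request shape for this repo's /chat endpoint.
        self.client.post("/chat", json={
            "history": [{"user": "What does a product manager do?"}],
            "approach": "rrr",
        })
```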

I have passed on feedback to the Azure OpenAI teams on how difficult it can be to work with the current TPM and rate-limits.

@vikramhn

Thanks @pamelafox for taking the feedback to the product team. Nice work on the load test and thanks for the link.

@github-actions bot

This issue is stale because it has been open 60 days with no activity. Remove the Stale label or add a comment, or this issue will be closed.

github-actions bot added the Stale label Dec 16, 2023