
Exponential Backoff Mechanism for RateLimit Issues in /Chat #500

Closed

Conversation

vrajroutu

Purpose

  • ...

Does this introduce a breaking change?

[ ] Yes
[x] No

Pull Request Type

What kind of change does this Pull Request introduce?

When releasing this app in production with 100+ users, there is a possibility of hitting rate limits when using the chat feature. To mitigate this, we can implement exponential backoff with the tenacity library. This approach automatically retries chat requests when a rate limit is encountered and returns the response once the rate limit has lifted. By incorporating tenacity's exponential backoff, the app will handle rate limit scenarios more gracefully and provide a smoother user experience during periods of high traffic or usage.

Reference: https://github.com/openai/openai-cookbook/blob/main/examples/How_to_handle_rate_limits.ipynb (a sketch of this pattern follows the checklist below)

[ ] Bugfix
[x] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no API changes)
[ ] Documentation content changes
[ ] Other... Please describe:
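
For context, the retry pattern from the cookbook referenced above looks roughly like the minimal sketch below. It assumes the openai 0.x SDK and the tenacity library; the wrapper name is illustrative and not necessarily the exact code in this PR.

# Minimal sketch: jittered exponential backoff on 429s, per the OpenAI cookbook.
# Assumes the openai 0.x SDK; chat_completion_with_backoff is an illustrative name.
import openai
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_random_exponential

@retry(
    retry=retry_if_exception_type(openai.error.RateLimitError),  # retry only on rate limits
    wait=wait_random_exponential(min=1, max=60),                 # 1s-60s jittered exponential backoff
    stop=stop_after_attempt(6),                                  # give up (and re-raise) after 6 attempts
)
def chat_completion_with_backoff(**kwargs):
    return openai.ChatCompletion.create(**kwargs)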

How to Test

  • Get the code
    git clone [repo-address]
    cd [repo-name]
    git checkout [branch-name]
    npm install
  • Deploy with azd deploy
  • Test the code

What to Check

Verify that the following are valid

  • ...

Other Information

@pamelafox
Collaborator

Thanks @vrajroutu! We were just discussing this today. Did you do load testing with this in place? I'm wondering whether the backoff ends up increasing requests overall, putting users in competition with each other, or whether you find it works well to alleviate "burst" situations.

Also, I assume you're maxing your deployments out to 240K for this situation.

@vrajroutu
Author

vrajroutu commented Aug 4, 2023

@pamelafox

Firstly, enabling streaming in the environment has helped reduce rate limit issues, but even with 100+ users on the platform simultaneously, I still observed a few rate limit problems. While exploring solutions, I came across various articles, including this one: AvoidRateLimits. This change won't impact existing functionality, but it will be beneficial during high load on the environment and requests to the OAI model. A similar change was successfully implemented in the prepdocs script to address rate limit issues when indexing documents. I plan to conduct load testing and will update the results accordingly.

This change is especially beneficial for users utilizing GPT-4, as the TPM is limited to 90k per subscription. By implementing the suggested approach, we can better manage the rate limits and ensure a smoother user experience, even during peak usage periods.

@pamelafox
Collaborator

Ah, I didn't realize GPT-4 had a lower TPM limit. Yeah, the backoff technique works well in prepdocs.py, where there's a single caller to the API. My concern with using it for the per-user API calls is that it may increase load overall if there truly are more users than the deployment can handle. I could see this approach smoothing over spikes of activity, but not being the solution to a long period of high load. Other ideas: request more quota, put up a message asking users to reload the page (thus reducing their history), or remove the history from messages entirely.

Let us know how it goes in production!

@vrajroutu
Author

Absolutely, I agree with your points. Increasing the capacity of GPT-4 would certainly be a long-term solution, but since capacity increases are currently paused by the MS team, we need to find ways to optimize the current setup. Enabling streaming has already shown positive results in reducing rate limit issues, and incorporating the backoff technique will further help smooth out spikes in activity. I understand that users might value keeping their chat history and continuing conversations seamlessly. We can explore different options to manage the rate limit, like gradually requesting more quota and monitoring usage patterns. If the backoff strategy combined with streaming can alleviate the majority of the rate limit issues and provide a good user experience, that would be a positive step forward. I will keep an eye on the system's performance.

@github-actions

github-actions bot commented Oct 4, 2023

This PR is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed.

@github-actions github-actions bot added the Stale label Oct 4, 2023
@pamelafox
Collaborator

I think we may still want to merge this; I just want to do more load testing first.

@github-actions github-actions bot removed the Stale label Oct 5, 2023
@github-actions

This PR is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed.

@pamelafox
Collaborator

@vrajroutu Are you still using tenacity in this situation, now that the OpenAI SDK has built-in retries and a customizable max_retries parameter? It seems less necessary these days.
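
For reference, the built-in behavior mentioned here can be tuned roughly as in the sketch below (against the openai 1.x AzureOpenAI client; the endpoint, key, and API version are placeholders).

# Sketch only: the 1.x openai client retries rate-limit and connection errors
# with exponential backoff on its own; max_retries raises the default of 2.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",  # placeholder
    api_key="YOUR-API-KEY",                                   # placeholder
    api_version="2024-02-01",                                 # placeholder API version
    max_retries=5,
)

# A per-request override is also possible:
# client.with_options(max_retries=0).chat.completions.create(...)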

@vrajroutu
Author


Hi @pamelafox, I don't think it's necessary anymore. Users can now use API Management or an app gateway to scale Azure OpenAI across multiple instances, and it works quite effectively.
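
Illustrative only: the gateway approach described here does the load spreading server-side; a naive client-side equivalent, with hypothetical endpoints, keys, and deployment name, would look something like this.

# Hypothetical sketch: round-robin chat requests across two Azure OpenAI
# instances so each call draws on a separate TPM quota. In practice this
# is done in API Management / an app gateway rather than in the app itself.
import itertools
from openai import AzureOpenAI

clients = itertools.cycle([
    AzureOpenAI(azure_endpoint="https://instance-1.openai.azure.com",  # placeholder
                api_key="KEY-1", api_version="2024-02-01"),
    AzureOpenAI(azure_endpoint="https://instance-2.openai.azure.com",  # placeholder
                api_key="KEY-2", api_version="2024-02-01"),
])

def chat(messages, deployment="chat"):
    # deployment is the Azure OpenAI deployment name (placeholder).
    # Each call goes to the next instance, spreading load across quotas.
    return next(clients).chat.completions.create(model=deployment, messages=messages)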

@vrajroutu vrajroutu closed this Mar 28, 2024