Investigate usage limits #55

Open
mruwnik opened this issue Jul 10, 2023 · 4 comments

mruwnik (Collaborator) commented Jul 10, 2023

Find out how many requests per second the current system can handle. This applies both to the server infrastructure and to the underlying LLM system.
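
One way to get a first number is a small concurrent load test; this is a minimal sketch, and the endpoint URL, payload shape, and concurrency figures are placeholder assumptions, not the project's actual API:

import asyncio
import time

import aiohttp

URL = "http://localhost:8000/chat"  # hypothetical endpoint, adjust to the real one

async def one_request(session: aiohttp.ClientSession) -> bool:
    try:
        async with session.post(URL, json={"query": "ping"}) as resp:
            await resp.read()
            return resp.status == 200
    except aiohttp.ClientError:
        return False

async def load_test(concurrency: int = 50, total: int = 500) -> None:
    # Cap in-flight requests at `concurrency`, then measure completions/second.
    sem = asyncio.Semaphore(concurrency)

    async def bounded(session: aiohttp.ClientSession) -> bool:
        async with sem:
            return await one_request(session)

    start = time.monotonic()
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(*(bounded(session) for _ in range(total)))
    elapsed = time.monotonic() - start
    ok = sum(results)
    print(f"{ok}/{total} succeeded in {elapsed:.1f}s -> {ok / elapsed:.1f} req/s")

if __name__ == "__main__":
    asyncio.run(load_test())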

ccstan99 (Collaborator) commented

OpenAI LLM chat rate limits
Current defaults as of July 2023:
3,500 RPM (requests per minute) and 90,000 TPM (tokens per minute)
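
For reference, a minimal client-side sketch of staying under those defaults with a 60-second sliding window; the class, its defaults, and the blocking strategy are illustrative assumptions, not part of the codebase:

import time
from collections import deque

class SlidingWindowLimiter:
    """Tracks requests and tokens sent over the last 60 seconds."""

    def __init__(self, rpm: int = 3500, tpm: int = 90000):
        self.rpm, self.tpm = rpm, tpm
        self.events = deque()  # (timestamp, tokens) per request

    def _prune(self, now: float) -> None:
        # Drop events older than the 60-second window.
        while self.events and now - self.events[0][0] > 60:
            self.events.popleft()

    def acquire(self, tokens: int) -> None:
        """Block until sending `tokens` more stays within both RPM and TPM."""
        while True:
            now = time.monotonic()
            self._prune(now)
            used_tokens = sum(t for _, t in self.events)
            if len(self.events) < self.rpm and used_tokens + tokens <= self.tpm:
                self.events.append((now, tokens))
                return
            time.sleep(0.1)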

henri123lemoine (Collaborator) commented

We can apply to increase our rate limits. Should we?
It's also important to note that, should we decide to use it, GPT-4's current rate limit is 200 requests per minute. OpenAI will increase that number over time.

henri123lemoine (Collaborator) commented

Current GPT-4 rate limits are 200 requests or 40,000 tokens per minute, which we would likely hit within 5-6 questions if we use the full context window for every chat. That might be a problem.
We need to set up a system that handles GPT-4 rate limits by falling back to ChatGPT when they are hit. Additionally, we could limit spamming by slowing down the streaming when a single user sends queries at an implausibly high speed.
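
A sketch of that fallback, using the pre-1.0 openai Python SDK that was current at the time; the helper name and the choice to degrade immediately rather than retry are our assumptions:

import openai
from openai.error import RateLimitError

def chat_with_fallback(messages, primary="gpt-4", fallback="gpt-3.5-turbo"):
    """Try GPT-4 first; fall back to ChatGPT if GPT-4's rate limit is hit."""
    try:
        return openai.ChatCompletion.create(model=primary, messages=messages)
    except RateLimitError:
        # Assumption: on a 429 from GPT-4 we degrade to ChatGPT rather than wait.
        return openai.ChatCompletion.create(model=fallback, messages=messages)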

ishaan-jaff commented

@mruwnik @ccstan99 @henri123lemoine

I'm the maintainer of LiteLLM. It lets you maximize your throughput and raise your effective rate limits by load balancing between multiple deployments (Azure, OpenAI).
I believe LiteLLM can be helpful here, and I'd love your feedback if we're missing something.

Here's how to use it
Docs: https://docs.litellm.ai/docs/routing

import os

from litellm import Router

model_list = [{ # list of model deployments 
    "model_name": "gpt-3.5-turbo", # model alias 
    "litellm_params": { # params for litellm completion/embedding call 
        "model": "azure/chatgpt-v-2", # actual model name
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE")
    }
}, {
    "model_name": "gpt-3.5-turbo", 
    "litellm_params": { # params for litellm completion/embedding call 
        "model": "azure/chatgpt-functioncalling", 
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE")
    }
}, {
    "model_name": "gpt-3.5-turbo", 
    "litellm_params": { # params for litellm completion/embedding call 
        "model": "vllm/TheBloke/Marcoroni-70B-v1-AWQ", 
        "api_key": os.getenv("OPENAI_API_KEY"),
    }
}]

router = Router(model_list=model_list)

# openai.ChatCompletion.create replacement
response = router.completion(model="gpt-3.5-turbo", 
                messages=[{"role": "user", "content": "Hey, how's it going?"}])

print(response)
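
The point of giving all three deployments the same model_name alias is that the Router spreads completion calls across them, so one deployment hitting its rate limit doesn't stall the others.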
