Add Budget Manager + Support for Anthropic, Cohere, Palm (100+ LLMs using LiteLLM) #99

Status: Open. Wants to merge 2 commits into base branch `main`.

Conversation

ishaan-jaff

Addresses #47 and #55.

This PR makes two changes:

  • Adds support for 100+ LLMs
  • Adds a budget manager for limiting $ spend per session or per user

Add support for 100+ LLMs

Using LiteLLM: https://github.com/BerriAI/litellm/
LiteLLM is a lightweight package that simplifies LLM API calls: use any LLM as a drop-in replacement for gpt-3.5-turbo.

Example

import os
from litellm import completion

## set ENV variables
os.environ["OPENAI_API_KEY"] = "openai key"
os.environ["COHERE_API_KEY"] = "cohere key"

messages = [{"content": "Hello, how are you?", "role": "user"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)

# cohere call
response = completion(model="command-nightly", messages=messages)

Use a budget manager for limiting $ spend per session or per user

LiteLLM exposes a budget manager for each session/user

  • Initialize a budget manager
  • Check the budget manager before making completion calls
  • Update it with costs after each completion call
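The flow above can be sketched with a minimal stand-in budget tracker (simplified Python, not LiteLLM's actual `BudgetManager` implementation; the method names mirror the `create_budget` / `get_current_cost` / `get_total_budget` / `update_cost` calls used in this PR, but real `update_cost` takes a completion object rather than a raw dollar amount):

```python
# Simplified per-user budget tracker illustrating the check/update flow.
# Stand-in sketch only; LiteLLM's BudgetManager computes costs from
# completion responses and can persist budgets between runs.
class SimpleBudgetManager:
    def __init__(self, project_name: str):
        self.project_name = project_name
        self.budgets = {}  # user -> total budget in $
        self.costs = {}    # user -> accumulated cost in $

    def create_budget(self, total_budget: float, user: str):
        self.budgets[user] = total_budget
        self.costs[user] = 0.0

    def get_total_budget(self, user: str) -> float:
        return self.budgets[user]

    def get_current_cost(self, user: str) -> float:
        return self.costs[user]

    def update_cost(self, user: str, cost: float):
        self.costs[user] += cost


manager = SimpleBudgetManager(project_name="stampy_chat")
manager.create_budget(total_budget=10.0, user="session-1")

# check before calling, update after
if manager.get_current_cost(user="session-1") <= manager.get_total_budget(user="session-1"):
    # ... make the completion call here ...
    manager.update_cost(user="session-1", cost=0.25)

print(manager.get_current_cost(user="session-1"))  # 0.25
```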

@ishaan-jaff (Author):

@FraserLee @henri123lemoine can I get a review on this PR?

If this initial commit looks good, I can add docs/testing too.

@mruwnik (Collaborator) left a comment:

nice idea!

@@ -26,6 +28,9 @@

ENCODER = tiktoken.get_encoding("cl100k_base")

# initialize a budget manager to control costs for gpt-4/other llms
budget_manager = litellm.BudgetManager(project_name="stampy_chat")
Collaborator:

how does the budget per user get configured? Could you add a new item to env.py so that it can be configured? Also, what would the units be? (I had a very quick glance at the litellm docs, but otherwise don't know anything about it)

Collaborator:

(I'll give it a proper look tomorrow)

@@ -225,13 +235,16 @@ def talk_to_robot_internal(index, query: str, mode: str, history: List[Dict[str,

# convert talk_to_robot_internal from dict generator into json generator
def talk_to_robot(index, query: str, mode: str, history: List[Dict[str, str]], k: int = STANDARD_K, log: Callable = print):
yield from (json.dumps(block) for block in talk_to_robot_internal(index, query, mode, history, k, log))
session_id = str(uuid.uuid4())
Collaborator:

If I understand this correctly, the budget manager keeps an internal dict counting how much a given session has used? But if you're creating a new id with each call to this function, then each session will have at most 1 call. I'm planning on adding session ids anyway, as they will be needed for logging, so could you do this by extracting the session id from the request params in the main.py functions?
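The suggestion above could look something like this (a hypothetical sketch; the real `main.py` request handling and parameter names may differ):

```python
import uuid


def get_session_id(params: dict) -> str:
    """Reuse a caller-supplied session id, or mint a fresh one if absent.

    Hypothetical helper: the actual request params and key name
    ("session_id") are assumptions, not code from this repo.
    """
    session_id = params.get("session_id")
    if not session_id:
        session_id = str(uuid.uuid4())
    return session_id


# the handler would then pass session_id down into talk_to_robot,
# instead of generating a fresh uuid inside it on every call
params = {"session_id": "abc-123", "query": "hi"}
print(get_session_id(params))  # abc-123
```

This keeps the budget counter tied to the client's session across calls, rather than resetting with every request.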

@@ -225,13 +235,16 @@ def talk_to_robot_internal(index, query: str, mode: str, history: List[Dict[str,

# convert talk_to_robot_internal from dict generator into json generator
def talk_to_robot(index, query: str, mode: str, history: List[Dict[str, str]], k: int = STANDARD_K, log: Callable = print):
yield from (json.dumps(block) for block in talk_to_robot_internal(index, query, mode, history, k, log))
session_id = str(uuid.uuid4())
budget_manager.create_budget(total_budget=10, user=session_id) # init $10 budget
Collaborator:

don't hardcode it - create a setting in env.py
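One way to do that (a sketch; the existing env.py conventions and the variable name `SESSION_BUDGET_USD` are assumptions):

```python
import os

# hypothetical addition to env.py: per-session budget in dollars,
# overridable via an environment variable, defaulting to $10
SESSION_BUDGET_USD = float(os.environ.get("SESSION_BUDGET_USD", "10"))
```

The hardcoded call would then become `budget_manager.create_budget(total_budget=SESSION_BUDGET_USD, user=session_id)`.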

@@ -181,14 +186,19 @@ def talk_to_robot_internal(index, query: str, mode: str, history: List[Dict[str,
t1 = time.time()
response = ''

for chunk in openai.ChatCompletion.create(
# check if budget exceeded for session
if budget_manager.get_current_cost(user=session_id) <= budget_manager.get_total_budget(session_id):
Collaborator:

Is this for the number of allowed tokens or chat calls? Is it a hard total, or does it get reset every now and then? The code runs on gunicorn workers: how will that affect it, since I'm guessing litellm won't communicate across processes?

@ccstan99 ccstan99 mentioned this pull request Jun 10, 2024