# Estimating the likely cost of an API call

I've created a simple Python library for estimating the cost of an LLM API call before you make it. Install it with `pip install llm_cost_estimation`. It exports the following objects:

- `models`: A list of dictionaries with essential details about various LLMs: cost per prompt token, cost per completion token, model description, and maximum allowed context length (in tokens).
- `count_tokens`: A utility function that counts the tokens in a prompt or chat history using a given model's encoding, and makes a rough estimate of the completion tokens.
- `estimate_cost`: A utility function that estimates the cost of an API call to a specified LLM, based on the selected model and the length of the text prompt or the average length of messages in the chat history.

# Fetching a summary of model costs

To view the models object:

In [1]:
from llm_cost_estimation import models
import pandas as pd

# Convert the list of dictionaries to a DataFrame
models_df = pd.DataFrame(models)

# Display the DataFrame
models_df.style\
    .hide(axis="index")\
    .set_properties(**{'max-width': '80px'})\
    .set_properties(subset=['description'], **{'max-width': '280px'})\
    .set_table_styles([dict(selector="th",props=[('max-width', '85px'),('word-break', 'break-all')])])

| completion_cost_per_token | description | max_tokens | name | prompt_cost_per_token |
| --- | --- | --- | --- | --- |
| 0.002 / 1000 | Most capable GPT-3.5 model and optimized for chat at 1/10th the cost of text-davinci-003. Will be updated with our latest model iteration. | 4096 | gpt-3.5-turbo | 0.002 / 1000 |
| 0.06 / 1000 | Same capabilities as the base gpt-4 mode but with 4x the context length. Will be updated with our latest model iteration. | 32768 | gpt-4-32k | 0.12 / 1000 |
| 0.06 / 1000 | More capable than any GPT-3.5 model, able to do more complex tasks, and optimized for chat. Will be updated with our latest model iteration. | 8192 | gpt-4 | 0.03 / 1000 |
| 0.0004 / 1000 | Capable of very simple tasks, usually the fastest model in the GPT-3 series, and lowest cost. | 2049 | text-ada-001 | 0.0004 / 1000 |
| 0.0005 / 1000 | Capable of straightforward tasks, very fast, and lower cost. | 2049 | text-babbage-001 | 0.0005 / 1000 |
| 0.002 / 1000 | Very capable, faster and lower cost than Davinci. | 2049 | text-curie-001 | 0.002 / 1000 |
| 0.02 / 1000 | | 8001 | text-davinci-001 | 0.02 / 1000 |
| 0.02 / 1000 | Similar capabilities to text-davinci-003 but trained with supervised fine-tuning instead of reinforcement learning | 4097 | text-davinci-002 | 0.02 / 1000 |
| 0.02 / 1000 | Most capable GPT-3 model. Can do any task the other models can do, often with higher quality. | 4097 | text-davinci-003 | 0.02 / 1000 |
| 0.06 / 1000 | Snapshot of gpt-4 from March 14th 2023. Unlike gpt-4, this model will not receive updates, and will be deprecated 3 months after a new version is released. | 8192 | gpt-4-0314 | 0.03 / 1000 |
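
The prices in the table are displayed as expressions such as `0.03 / 1000`. As a rough sketch, assuming these values may be stored as strings of that form (the display suggests it, but the library's actual representation may differ), a small helper can turn them into floats and look up a model by its `name`. The `sample_models` list below is a hand-copied excerpt of the table, not the library's `models` object, and `parse_price` and `find_model` are hypothetical names used only for illustration.

```python
def parse_price(value):
    """Convert a price such as '0.03 / 1000' (or a plain number) to a float."""
    if isinstance(value, (int, float)):
        return float(value)
    numerator, _, denominator = str(value).partition("/")
    if denominator.strip():
        return float(numerator) / float(denominator)
    return float(numerator)


def find_model(model_list, name):
    """Return the first entry whose 'name' matches, or raise KeyError."""
    for entry in model_list:
        if entry["name"] == name:
            return entry
    raise KeyError(f"Unknown model: {name}")


# Hand-copied excerpt of the table above (assumed structure: list of dicts).
sample_models = [
    {"name": "gpt-3.5-turbo", "max_tokens": 4096,
     "prompt_cost_per_token": "0.002 / 1000",
     "completion_cost_per_token": "0.002 / 1000"},
    {"name": "gpt-4", "max_tokens": 8192,
     "prompt_cost_per_token": "0.03 / 1000",
     "completion_cost_per_token": "0.06 / 1000"},
]

gpt4 = find_model(sample_models, "gpt-4")
prompt_price = parse_price(gpt4["prompt_cost_per_token"])
print(f"gpt-4 prompt price per token: {prompt_price:.5f}")
```

Handling both strings and plain numbers keeps the helper usable whichever way the prices turn out to be stored.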


# Counting tokens in a prompt and estimating completion tokens

The `count_tokens` function counts the tokens in a prompt and makes a crude estimate of the completion tokens. The estimate rests on a simple heuristic: the completion is likely to be about as long as either the prompt or the previous messages in the conversation. This heuristic is fast, but it may not be very accurate. I'd love to have somebody [contribute](https://github.com/chriscarrollsmith/llm-cost-estimator) an optional alternative estimation method, such as using a real-world measured ratio of prompt length to completion length, or asking a very low-cost model like text-ada-001 to guess how long the completion might be.

To use the `count_tokens` function:

In [30]:
from llm_cost_estimation import count_tokens

text = "Hello, how are you?"
model = "gpt-4"

# Count tokens in the text
prompt_tokens, estimated_completion_tokens = count_tokens(text, model)

print(f"Number of tokens in the prompt: {prompt_tokens}")
print(f"Estimated number of tokens in the completion: {estimated_completion_tokens}")

Number of tokens in the prompt: 6
Estimated number of tokens in the completion: 6
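
The length heuristic, together with the ratio-based alternative suggested above, might be sketched as follows. This is a minimal illustration that works on token counts only, not the library's implementation: the function name `estimate_completion_tokens` and its `history_token_counts` and `ratio` parameters are assumptions made for this example.

```python
from statistics import mean


def estimate_completion_tokens(prompt_tokens, history_token_counts=None, ratio=1.0):
    """Guess the completion length from the prompt or chat history length.

    prompt_tokens: number of tokens in the prompt.
    history_token_counts: optional token counts of previous messages; when
        given, their average is used as the baseline instead of the prompt.
    ratio: assumed completion-to-baseline length ratio (1.0 reproduces the
        "similar length" heuristic; a measured ratio could be plugged in).
    """
    if prompt_tokens < 0 or ratio < 0:
        raise ValueError("prompt_tokens and ratio must be non-negative")
    baseline = mean(history_token_counts) if history_token_counts else prompt_tokens
    return round(baseline * ratio)


print(estimate_completion_tokens(6))                 # 6
print(estimate_completion_tokens(6, [10, 20, 30]))   # 20
print(estimate_completion_tokens(6, ratio=1.5))      # 9
```

With `ratio=1.0` and no history, the sketch returns the prompt length, which matches the 6 and 6 shown in the output above.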


# Estimating cost of a prompt + completion

The `estimate_cost` function estimates the cost of a text completion. Internally, it calls `count_tokens` to get the prompt token count and the estimated completion token count.

To use the `estimate_cost` function:

In [31]:
from llm_cost_estimation import estimate_cost

prompt = "Hello, how are you?"
model = "gpt-4"

# Estimate the cost for the completion
estimated_cost = estimate_cost(prompt, model)

print(f"Estimated cost of this completion: {estimated_cost}")

Estimated cost of this completion: 0.0005399999999999999
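
The figure above is consistent with pricing 6 prompt tokens and 6 estimated completion tokens at the gpt-4 rates listed in the table. The sketch below reproduces that arithmetic by hand. It is an assumption about how the estimate is combined, inferred from the output rather than from the library's source, and `cost_from_counts` is a hypothetical name. The long run of 9s in the library's output is ordinary floating-point rounding.

```python
def cost_from_counts(prompt_tokens, completion_tokens, prompt_price, completion_price):
    """Total cost = prompt tokens * prompt price + completion tokens * completion price."""
    return prompt_tokens * prompt_price + completion_tokens * completion_price


# gpt-4 prices per token, copied from the table above.
GPT4_PROMPT_PRICE = 0.03 / 1000
GPT4_COMPLETION_PRICE = 0.06 / 1000

# 6 prompt tokens and 6 estimated completion tokens, as in the example above.
cost = cost_from_counts(6, 6, GPT4_PROMPT_PRICE, GPT4_COMPLETION_PRICE)
print(f"Estimated cost: ${cost:.5f}")  # Estimated cost: $0.00054
```

Formatting the result to a fixed number of decimal places, as in the `print` call, is one way to hide the floating-point noise when showing the estimate to a user.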
