## Calculating the costs

### Total Size of the Database 

The total size of the database can be calculated using the following relationship:

$$n_c \times c \geq N$$

where 

* $N$ is the total number of characters in all documents.
* $n_c$ is the number of chunks.
* $c$ is the chunk size.

If the overlap between chunks is zero, then $n_c \times c = N$.

More generally, if consecutive chunks overlap by a fraction $v \in [0,1)$ of the chunk size $c$, each chunk contributes only $(1-v)\,c$ new characters, so $n_c \times (1-v)\,c \approx N$. The total size of the database $T = n_c \times c$ is then:

$$T=\frac{1}{1-v} N$$

For example, with an overlap of $v=1/3$ (a 200-character overlap for a chunk size of 600), the database is 1.5 times the total number of characters in all documents: $T=1.5 \; N$.
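The estimate can be checked numerically. The following is a minimal sketch, assuming a simple sliding-window chunker; `estimated_db_size` and `chunk_text` are hypothetical helper names introduced here for illustration, not part of any library used in this document. It compares the formula with the number of characters actually stored.

```python
def estimated_db_size(n_chars: int, overlap_fraction: float) -> float:
    """Estimate T = N / (1 - v) for an overlap fraction v in [0, 1)."""
    if not 0 <= overlap_fraction < 1:
        raise ValueError("overlap_fraction must be in [0, 1)")
    return n_chars / (1 - overlap_fraction)


def chunk_text(text: str, chunk_size: int, overlap: int) -> list:
    """Split text into chunks of chunk_size characters that share `overlap` characters."""
    step = chunk_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than chunk_size")
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


# Assumed example: a 60,000-character corpus, chunk size 600, overlap 200 (v = 1/3)
text = "x" * 60_000
chunks = chunk_text(text, chunk_size=600, overlap=200)

actual = sum(len(chunk) for chunk in chunks)        # characters actually stored
estimate = estimated_db_size(len(text), 200 / 600)  # value given by the formula

print(f"chunks: {len(chunks)}, stored: {actual}, estimated: {estimate:.0f}")
```

The two values differ slightly because the last chunk is shorter than $c$; the formula is an approximation that improves as the number of chunks grows.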

### Convert Characters to Tokens

A common rule of thumb is that one token corresponds to approximately four characters: 

$$N_{tokens} = N_{characters}/ 4$$

The actual number of characters per token varies with the language, the tokenizer, and the type of text (prose, code, numbers), so this conversion is only a rough estimate.
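As a quick sketch of this rule of thumb, the helper below converts a character count into an approximate token count. The function name `estimate_tokens` and the default ratio of four characters per token are assumptions for illustration; an exact count requires running the model's tokenizer.

```python
import math


def estimate_tokens(n_characters: int, chars_per_token: float = 4.0) -> int:
    """Approximate the number of tokens in a text of n_characters characters."""
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    # Round up so that a partial token is counted as a whole one.
    return math.ceil(n_characters / chars_per_token)


# Under this rule, a 2,000-character chunk is roughly 500 tokens.
chunk_tokens = estimate_tokens(2000)
print(chunk_tokens)
```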


The table below compares GPT-4o, GPT-4o mini, and GPT-3.5 Turbo (costs in dollars per 1M tokens):

| Model | Max Input Tokens (Context) | Max Output Tokens (Answer) | Input Cost (USD per 1M tokens) | Output Cost (USD per 1M tokens) |
|-------|----------------------------|----------------------------|--------------------------------|---------------------------------|
| GPT-4o | 128,000 | 16,384 | 5.00 | 15.00 |
| GPT-4o mini | 128,000 | 16,384 | 0.15 | 0.60 |
| GPT-3.5 Turbo | 16,384 | 4,096 | 3.00 | 6.00 |


Suppose that, on average, we send 150 tokens per request and receive 250 tokens in response. How much will it cost to run 1,000 requests with GPT-4o mini?


* Input: $0.15 \$ $ per 1M tokens

* Output: $0.60 \$ $ per 1M tokens


$(150\times 0.15 \; + \; 250\times0.60)\times 10^3/10^6 \approx 0.17 \$ $

In general:

$$(\overline{X}\times\text{cost}_X \; + \; \overline{Y}\times\text{cost}_Y)\times \frac{N_{req}}{N_{tokens}}= \text{price}\; \$ $$

* $\overline{X}$ - average number of input tokens per request
* $\overline{Y}$ - average number of output tokens per request
* $\text{cost}_X$ - price of input tokens, in dollars per $N_{tokens}$ tokens
* $\text{cost}_Y$ - price of output tokens, in dollars per $N_{tokens}$ tokens
* $N_{req}$ - number of requests
* $N_{tokens}$ - number of tokens that the quoted price refers to (here $10^6$)
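To make the formula concrete, here is a small sketch that applies it to every model in the table above. The dictionary `PRICES` simply copies the table values (dollars per 1M tokens), and `request_cost` is a hypothetical helper name; actual prices change over time and should be checked against the provider's pricing page.

```python
# Prices copied from the table above, in dollars per 1M tokens: (input, output)
PRICES = {
    "GPT-4o": (5.00, 15.00),
    "GPT-4o mini": (0.15, 0.60),
    "GPT-3.5 Turbo": (3.00, 6.00),
}

TOKENS_PER_PRICE = 1_000_000  # the prices above are quoted per 1M tokens


def request_cost(avg_in, avg_out, price_in, price_out, n_requests):
    """(X * cost_X + Y * cost_Y) * N_req / N_tokens, in dollars."""
    return (avg_in * price_in + avg_out * price_out) * n_requests / TOKENS_PER_PRICE


# 1,000 requests with 150 input tokens and 250 output tokens each
costs = {
    model: request_cost(150, 250, price_in, price_out, 1000)
    for model, (price_in, price_out) in PRICES.items()
}

for model, cost in costs.items():
    print(f"{model}: {cost:.4f} $")
```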

---

The following helper for counting tokens is taken from the [OpenAI](https://platform.openai.com/docs/guides/embeddings/embedding-models) documentation:

```python
import tiktoken

def num_tokens_from_string(string: str, encoding_name: str) -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

num_tokens_from_string("tiktoken is great!", "cl100k_base")

```

`cl100k_base` is the encoding used by gpt-4, gpt-3.5-turbo, text-embedding-ada-002, text-embedding-3-small, and text-embedding-3-large. GPT-4o and GPT-4o mini use the newer `o200k_base` encoding.

Embedding cost depends on the model: larger models cost more per token but generally produce higher-quality embeddings.

| Model                  | Dimensions | Price (USD per 1M tokens) | Performance on MTEB Eval | Max Input (tokens) |
|------------------------|------------|---------------------------|--------------------------|--------------------|
| text-embedding-3-small | 1536       | 0.020                     | 62.3%                    | 8191               |
| text-embedding-3-large | 3072       | 0.130                     | 64.6%                    | 8191               |
| text-embedding-ada-002 | 1536       | 0.100                     | 61.0%                    | 8191               |
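As an illustration, the sketch below estimates the one-time cost of embedding a corpus with each model in the table. The corpus size (100 chunks of 500 tokens) is an assumed example, and `EMBEDDING_PRICES` and `embedding_cost` are hypothetical names; the prices are copied from the table and may be out of date.

```python
# Prices copied from the table above, in dollars per 1M tokens
EMBEDDING_PRICES = {
    "text-embedding-3-small": 0.020,
    "text-embedding-3-large": 0.130,
    "text-embedding-ada-002": 0.100,
}


def embedding_cost(total_tokens, price_per_million):
    """Cost in dollars of embedding total_tokens tokens once."""
    return total_tokens * price_per_million / 1_000_000


corpus_tokens = 100 * 500  # assumed: 100 chunks of 500 tokens each

embedding_costs = {
    model: embedding_cost(corpus_tokens, price)
    for model, price in EMBEDDING_PRICES.items()
}
cheapest = min(embedding_costs, key=embedding_costs.get)

for model, cost in embedding_costs.items():
    print(f"{model}: {cost:.4f} $")
print(f"cheapest: {cheapest}")
```

Embeddings are paid for once per indexed corpus, whereas the LLM cost is paid on every request, so for a small corpus the embedding cost is usually the minor part of the total.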





The cost formula can be wrapped in two helper functions:

```python
def calculate_llm_cost(avg_input_tokens, avg_output_tokens, input_cost, output_cost, num_requests, tokens_per_cost):
    """
    Parameters:
    avg_input_tokens (float): Average number of input tokens per request
    avg_output_tokens (float): Average number of output tokens per request
    input_cost (float): Price of input tokens, in dollars per `tokens_per_cost` tokens
    output_cost (float): Price of output tokens, in dollars per `tokens_per_cost` tokens
    num_requests (int): Number of requests
    tokens_per_cost (int): Number of tokens that the quoted price refers to

    Returns:
    float: The calculated price in dollars
    """
    # Cost of one request, still expressed per `tokens_per_cost` tokens
    cost_per_request = (avg_input_tokens * input_cost) + (avg_output_tokens * output_cost)

    # Scale by the number of requests and convert to dollars
    price = cost_per_request * (num_requests / tokens_per_cost)

    return round(price, 3)


def calculate_embeddings_cost(avg_input_tokens, input_cost, num_requests, tokens_per_cost):
    """
    Parameters:
    avg_input_tokens (float): Average number of input tokens per request
    input_cost (float): Price of input tokens, in dollars per `tokens_per_cost` tokens
    num_requests (int): Number of requests
    tokens_per_cost (int): Number of tokens that the quoted price refers to

    Returns:
    float: The calculated price in dollars
    """
    # Cost of one request, still expressed per `tokens_per_cost` tokens
    cost_per_request = avg_input_tokens * input_cost

    # Scale by the number of requests and convert to dollars
    price = cost_per_request * (num_requests / tokens_per_cost)

    return round(price, 3)
```



```python
# All prices are in dollars.
chunk_size_char = 2000
chunk_size = chunk_size_char / 4  # in tokens (about 4 characters per token)
rank_size = 5  # how many chunks are retrieved for the context
num_chunks = 100  # total number of chunks in the database

# 13,000 students each ask a single question
avg_input_tokens = 150  # the length of a question
avg_output_tokens = 250  # the length of an answer
num_requests = 13000  # number of students/queries

# Each request sends the question plus the retrieved chunks
rag_input = chunk_size * rank_size + avg_input_tokens

# COST OF MODELS (GPT-4o mini, dollars per 1M tokens)
input_cost = 0.15
output_cost = 0.60

# emb_cost = 0.020  # cheapest OpenAI embeddings: text-embedding-3-small
emb_cost = 0.10  # text-embedding-ada-002
tokens_per_cost = 1000000

price_llm = calculate_llm_cost(rag_input,
                               avg_output_tokens,
                               input_cost,
                               output_cost,
                               num_requests,
                               tokens_per_cost)

# All chunks in the database are embedded
rag_input_tokens = chunk_size * num_chunks

# In principle the embeddings are created once; in practice it can take several attempts
price_emb = calculate_embeddings_cost(rag_input_tokens,
                                      emb_cost, 1,
                                      tokens_per_cost)

print(f"Price LLM: {price_llm}$")
print(f"Price Embeddings: {price_emb}$")
print(f"Price Total: {round(price_llm + price_emb, 3)}$")
```

```
Price LLM: 7.117$
Price Embeddings: 0.005$
Price Total: 7.122$
```
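Since the retrieved chunks make up most of the input, it can be useful to see how the LLM cost depends on the number of retrieved chunks. The sketch below is an illustration under stated assumptions: it uses the GPT-4o mini prices from the table (0.15 and 0.60 dollars per 1M input and output tokens), 500-token chunks, a 150-token question, a 250-token answer, 13,000 requests, and a hypothetical helper named `rag_llm_cost`.

```python
def rag_llm_cost(rank_size, chunk_tokens=500, question_tokens=150, answer_tokens=250,
                 input_price=0.15, output_price=0.60, num_requests=13000,
                 tokens_per_price=1_000_000):
    """LLM cost in dollars when `rank_size` chunks are added to every question."""
    # Input = retrieved chunks + the question itself
    input_tokens = chunk_tokens * rank_size + question_tokens
    per_request = input_tokens * input_price + answer_tokens * output_price
    return per_request * num_requests / tokens_per_price


# Sweep over a few assumed values of rank_size
sweep = {k: rag_llm_cost(k) for k in (1, 3, 5, 10)}

for k, cost in sweep.items():
    print(f"rank_size={k:>2}: {cost:.2f} $")
```

The cost grows linearly with `rank_size`, so retrieving fewer or shorter chunks is the most direct way to lower the bill under these assumptions.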


The calculator is also available as a standalone script:

```
python3 RAG_calculator.py
```


![rag_cost](rag_cost.png)

The notebook version of the calculator below does not run reliably, so it is left commented out; use `RAG_calculator.py` instead.

```python
# Not stable inside the notebook; see RAG_calculator.py

# import tkinter as tk
# from tkinter import ttk

# def calculate():
#     try:
#         chunk_size_char = int(chunk_size_entry.get())
#         chunk_size = chunk_size_char / 4  # in tokens
#
#         rank_size = int(rank_size_entry.get())
#         num_chunks = int(num_chunks_entry.get())
#         avg_input_tokens = int(avg_input_tokens_entry.get())
#         avg_output_tokens = int(avg_output_tokens_entry.get())
#         num_requests = int(num_requests_entry.get())

#         rag_input = chunk_size * rank_size + avg_input_tokens

#         # COST OF MODELS (GPT-4o mini, dollars per 1M tokens)
#         input_cost = 0.15
#         output_cost = 0.60
#         emb_cost = 0.020
#         tokens_per_cost = 1000000

#         price_llm = calculate_llm_cost(rag_input, avg_output_tokens, input_cost, output_cost, num_requests, tokens_per_cost)

#         rag_input_tokens = chunk_size * num_chunks

#         price_emb = calculate_embeddings_cost(rag_input_tokens, emb_cost, 1, tokens_per_cost)

#         price_total = round(price_llm + price_emb, 3)

#         result_label.config(text=f"Price LLM: ${price_llm}\nPrice Embeddings: ${price_emb}\nPrice Total: ${price_total}")
#     except ValueError:
#         result_label.config(text="Please enter valid numbers")

# # Create the main window
# root = tk.Tk()
# root.title("RAG Calculator")

# # Create and place input fields
# ttk.Label(root, text="Chunk Size:").grid(row=0, column=0, padx=5, pady=5)
# chunk_size_entry = ttk.Entry(root)
# chunk_size_entry.grid(row=0, column=1, padx=5, pady=5)

# ttk.Label(root, text="Rank Size:").grid(row=1, column=0, padx=5, pady=5)
# rank_size_entry = ttk.Entry(root)
# rank_size_entry.grid(row=1, column=1, padx=5, pady=5)

# ttk.Label(root, text="Number of Chunks:").grid(row=2, column=0, padx=5, pady=5)
# num_chunks_entry = ttk.Entry(root)
# num_chunks_entry.grid(row=2, column=1, padx=5, pady=5)

# ttk.Label(root, text="Avg Input Tokens:").grid(row=3, column=0, padx=5, pady=5)
# avg_input_tokens_entry = ttk.Entry(root)
# avg_input_tokens_entry.grid(row=3, column=1, padx=5, pady=5)

# ttk.Label(root, text="Avg Output Tokens:").grid(row=4, column=0, padx=5, pady=5)
# avg_output_tokens_entry = ttk.Entry(root)
# avg_output_tokens_entry.grid(row=4, column=1, padx=5, pady=5)

# ttk.Label(root, text="Number of Requests:").grid(row=5, column=0, padx=5, pady=5)
# num_requests_entry = ttk.Entry(root)
# num_requests_entry.grid(row=5, column=1, padx=5, pady=5)

# # Create and place the calculate button
# calculate_button = ttk.Button(root, text="Calculate", command=calculate)
# calculate_button.grid(row=6, column=0, columnspan=2, pady=10)

# # Create and place the result label
# result_label = ttk.Label(root, text="")
# result_label.grid(row=7, column=0, columnspan=2, pady=5)

# # Start the main event loop
# root.mainloop()
```

---
* Author: Anastasiia Popova
* Email: anastasiia.popova@stud.unibas.ch

[Perplexity AI](https://www.perplexity.ai/) assisted in code writing, editing, and more effective information searches. The generated output underwent critical evaluation. The author is solely responsible for the content.
