# Beginner's Guide to Understanding LLM Pricing 💰

Welcome to this guide on Large Language Model (LLM) pricing! If you're building with AI, understanding costs is crucial. In this notebook, we'll break down:

1.  **Tokens**: What they are and why they matter.
2.  **Pricing Models**: Input vs. Output costs.
3.  **Cost Calculator**: A Python tool to estimate your bills.
4.  **Model Comparison**: Choosing the right model for your budget.

Let's dive in! 🚀

In [None]:
# Install necessary libraries
!pip install tiktoken openai

import os
from google.colab import userdata

# Setup environment variables using colab.userdata
try:
    os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
    print("API Key loaded successfully!")
except Exception:  # e.g. the secret is missing or notebook access was not granted
    print("Warning: OPENAI_API_KEY not found in userdata. Some examples might need it.")
    print("Please set it in the 'Secrets' tab in Google Colab.")

## 1. What are Tokens? 🔠

LLMs don't read words like we do; they read "tokens".

-   A token can be a word, part of a word, or even a space.
-   **Rule of Thumb**: 1,000 tokens $\approx$ 750 words of English text.
-   **Pricing**: Prices are quoted per **1 million tokens (1M)**, and you are billed for the tokens you actually use.
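
Before counting real tokens, the rule of thumb above is enough for a back-of-the-envelope estimate. The sketch below is only an approximation: the helper names are made up for this example, and the \$2.50 per 1M tokens figure is a placeholder price, so swap in the rate for the model you actually use.

```python
def estimate_tokens(word_count):
    """Rough token estimate using the rule of thumb: 1,000 tokens per 750 words."""
    return round(word_count * 1000 / 750)


def estimate_cost(token_count, price_per_million):
    """Cost in USD for `token_count` tokens at a price quoted per 1M tokens."""
    return token_count * price_per_million / 1_000_000


words = 3_000  # e.g. a long blog post
tokens = estimate_tokens(words)
cost = estimate_cost(tokens, price_per_million=2.50)  # placeholder price, not a quote

print(f"{words:,} words is roughly {tokens:,} tokens")
print(f"Estimated cost at $2.50 per 1M tokens: ${cost:.4f}")
```

Real token counts depend on the tokenizer and the language of the text, so treat this as a first guess and measure with a tokenizer when it matters.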

### The Theory: Byte Pair Encoding (BPE)
Most modern LLMs tokenize text with **Byte Pair Encoding (BPE)** or a close variant. (This is separate from **Causal Language Modeling**, which is the training objective of predicting the next token.)

Instead of storing every word in the dictionary, the tokenizer learns a vocabulary of "sub-word" units from its training text.
- Common words like "apple" are often single tokens.
- Rare words like "antidisestablishmentarianism" are split into multiple tokens (for example `anti`, `dis`, `establishment`...; the exact split depends on the tokenizer).

This lets the model handle any text, even made-up words, while keeping the vocabulary size manageable.
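
To make the idea concrete, here is a toy, character-level sketch of how BPE learns merges. It is a simplification for illustration, not how `tiktoken` is implemented (production tokenizers work on bytes and add extra pre-processing), and the corpus and function names are invented for this example.

```python
from collections import Counter


def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words and return the most common one."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


# Toy corpus: word -> frequency; each word starts as a tuple of single characters
corpus = {tuple("low"): 5, tuple("lower"): 2, tuple("newest"): 6, tuple("widest"): 3}

merges = []
for _ in range(3):  # learn three merges
    pair = most_frequent_pair(corpus)
    if pair is None:
        break
    corpus = merge_pair(corpus, pair)
    merges.append(pair)

print("Learned merges:", merges)
for symbols, freq in corpus.items():
    print(freq, symbols)
```

After a few merges, the frequent ending `est` becomes a single symbol, while rarer letter combinations stay split into smaller pieces. That is the same effect that makes common words cheap and rare words expensive in token terms.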

In [None]:
import tiktoken

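def count_tokens(text, model="gpt-4o"):
    """Return (token count, list of token IDs) for `text`."""
    try:
        # Look up the tokenizer that matches the given OpenAI model name
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # Unknown model (e.g. a non-OpenAI one): fall back to a general-purpose
        # encoding, so the count is only an approximation
        encoding = tiktoken.get_encoding("cl100k_base")

    tokens = encoding.encode(text)
    return len(tokens), tokens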

text_sample = "Generative AI is transforming the world!"
count, token_list = count_tokens(text_sample)

print(f"Text: '{text_sample}'")
print(f"Word Count: {len(text_sample.split())}")
print(f"Token Count: {count}")
print(f"Token IDs: {token_list}")

## 2. The Pricing Model 🏷️

Most API providers (OpenAI, Anthropic, etc.) split costs into two parts:

1.  **Input Tokens (Prompt)**: What you send to the AI. Usually cheaper.
2.  **Output Tokens (Completion)**: What the AI writes back. Usually more expensive (often 3x to 5x the input price).

### Deep Dive: Why is Output more expensive?
You might think processing text is harder than writing it, but for an LLM (Transformer architecture), it's the opposite!

1.  **Parallel vs. Serial**: When the model reads your **Input**, it processes all tokens *in parallel* (all at once), which GPUs handle very efficiently.
2.  **Autoregression**: When the model generates **Output**, it must do it *one token at a time*. It predicts token A, adds it to the list, then predicts token B, and so on. Each step has to read the model weights and the cached context from GPU memory again, so generation is limited by memory bandwidth rather than raw compute. This serial process is much slower per token, which is why providers charge more for output.
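
The loop below is a toy sketch of that serial process. `toy_next_token` is a made-up stand-in for a real model forward pass (it just does arithmetic on token IDs), so only the shape of the loop matters: the number of sequential steps grows with the output length, not the input length.

```python
def toy_next_token(context):
    """Hypothetical stand-in for one model forward pass over the context."""
    return (sum(context) + len(context)) % 50


def generate(prompt_tokens, max_new_tokens):
    """Generate tokens one at a time, counting the sequential steps needed."""
    context = list(prompt_tokens)
    sequential_steps = 0
    for _ in range(max_new_tokens):
        # Every new token depends on all tokens before it, so these steps
        # cannot run in parallel the way reading the prompt can.
        next_token = toy_next_token(context)
        context.append(next_token)
        sequential_steps += 1
    return context[len(prompt_tokens):], sequential_steps


long_prompt = list(range(1000))  # 1,000 input tokens
short_prompt = list(range(10))   # 10 input tokens

_, steps_a = generate(long_prompt, max_new_tokens=5)
_, steps_b = generate(short_prompt, max_new_tokens=50)

print(f"1,000-token prompt, 5 output tokens: {steps_a} sequential steps")
print(f"10-token prompt, 50 output tokens: {steps_b} sequential steps")
```

A long prompt with a short answer needs only a handful of sequential steps, while a short prompt with a long answer needs many.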

## 3. Cost Estimator Calculator 🧮

Let's build a calculator to estimate how much a task will cost.

**Example Rates (approximate, as of late 2024; verify current rates):**
- **GPT-4o**: \$2.50 / 1M input, \$10.00 / 1M output
- **GPT-4o-mini**: \$0.15 / 1M input, \$0.60 / 1M output
- **Claude 3.5 Sonnet**: \$3.00 / 1M input, \$15.00 / 1M output
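
As a quick worked example of how these rates turn into a bill, write $n_{\text{in}}$ and $n_{\text{out}}$ for the token counts and $p_{\text{in}}$ and $p_{\text{out}}$ for the prices per 1M tokens (these symbols are introduced here just for illustration):

$$
\text{cost} = \frac{n_{\text{in}}}{10^{6}} \cdot p_{\text{in}} + \frac{n_{\text{out}}}{10^{6}} \cdot p_{\text{out}}
$$

For a hypothetical GPT-4o request with 2,000 input tokens and 500 output tokens at the example rates above:

$$
\text{cost} = \frac{2000}{10^{6}} \cdot 2.50 + \frac{500}{10^{6}} \cdot 10.00 = 0.005 + 0.005 = 0.01 \text{ USD}
$$

Even though the response is a quarter of the prompt's length, it accounts for half of the cost.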

In [None]:
def calculate_cost(input_text, output_text, model_name="gpt-4o"):
    # Standard pricing per 1M tokens (as of late 2024 - verify current rates)
    pricing = {
        "gpt-4o": {"input": 2.50, "output": 10.00},
        "gpt-4o-mini": {"input": 0.15, "output": 0.60},
        "claude-3-5-sonnet": {"input": 3.00, "output": 15.00}
    }

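    if model_name not in pricing:
        # Raise instead of returning a string so callers always get a dict back
        raise ValueError(f"Model pricing not found for '{model_name}'.")

    # Note: tiktoken only knows OpenAI models; for others (e.g. Claude),
    # count_tokens falls back to cl100k_base, so counts are approximate
    in_count, _ = count_tokens(input_text, model_name)
    out_count, _ = count_tokens(output_text, model_name)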

    input_cost = (in_count / 1_000_000) * pricing[model_name]["input"]
    output_cost = (out_count / 1_000_000) * pricing[model_name]["output"]
    total_cost = input_cost + output_cost

    return {
        "Model": model_name,
        "Input Tokens": in_count,
        "Output Tokens": out_count,
        "Total Cost ($)": f"${total_cost:.6f}"
    }

# Example Usage
user_prompt = "Summarize this 5000-word annual report..." * 100
ai_response = "Here is the summary..." * 20

print(calculate_cost(user_prompt, ai_response, "gpt-4o"))
print(calculate_cost(user_prompt, ai_response, "gpt-4o-mini"))

## 4. Comparison & Strategy 📊

| Model Tier | Best For | Typical Price per 1M Tokens (Input / Output) |
| :--- | :--- | :--- |
| **Flagship** (GPT-4o, Claude 3.5 Sonnet) | Complex logic, coding, creative writing | \$2.50 - \$3.00 / \$10.00 - \$15.00 |
| **Efficient** (GPT-4o-mini, Haiku) | Summaries, simple chat, high volume | \$0.15 - \$0.25 / \$0.60 - \$1.25 |
| **Open Source** (Hosted Llama 3) | Privacy, specific tasks, lowest cost | Varies (often very cheap on Groq/TogetherAI) |

### 💡 Pro Tips for Saving Money:
1.  **Use efficient models** for simple tasks.
2.  **Optimize prompts** to be concise (reduce input tokens).
3.  **Limit output length** (don't ask for 1000 words if 100 will do).
4.  **Use batch processing** for non-urgent requests (some providers offer 50% off).
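
To see how much each tip could save, here is a rough sketch that applies them to one workload. The workload (100,000 requests per month, 1,000 input and 500 output tokens each) is invented for illustration, the prices are the example rates from section 3, and the 50% batch discount is the figure mentioned in tip 4, so check your provider's actual terms.

```python
def monthly_cost(requests, in_tokens, out_tokens, in_price, out_price, discount=0.0):
    """Estimated monthly cost in USD; prices are per 1M tokens."""
    total_in = requests * in_tokens
    total_out = requests * out_tokens
    cost = (total_in * in_price + total_out * out_price) / 1_000_000
    return cost * (1 - discount)


# Hypothetical workload: 100,000 requests/month, 1,000 input + 500 output tokens each
base = dict(requests=100_000, in_tokens=1_000, out_tokens=500)
gpt4o = dict(in_price=2.50, out_price=10.00)      # example rates, verify before use
gpt4o_mini = dict(in_price=0.15, out_price=0.60)  # example rates, verify before use

scenarios = {
    "Baseline (GPT-4o)": monthly_cost(**base, **gpt4o),
    "Tip 1: switch to GPT-4o-mini": monthly_cost(**base, **gpt4o_mini),
    "Tip 2: halve the prompt (500 tokens)": monthly_cost(**{**base, "in_tokens": 500}, **gpt4o),
    "Tip 3: cap output at 100 tokens": monthly_cost(**{**base, "out_tokens": 100}, **gpt4o),
    "Tip 4: 50% batch discount": monthly_cost(**base, **gpt4o, discount=0.5),
}

for name, cost in scenarios.items():
    print(f"{name}: ${cost:,.2f}")
```

With these assumed numbers, choosing a smaller model has by far the largest effect, and trimming output saves more than trimming the prompt because output tokens are priced higher.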