# Token Usage & Cost Estimator — OpenAI Models

This notebook helps you estimate token usage and **approximate** API costs for several OpenAI models based on a **test input text** that you provide.  
It counts the tokens in your text (or estimates them when `tiktoken` is unavailable) and multiplies the result by each model's per‑million‑token rates listed below.

### Models covered
- `gpt-4.1-2025-04-14`
- `gpt-4.1-mini-2025-04-14`
- `gpt-4.1-nano-2025-04-14`
- `o3-2025-04-16`
- `o4-mini-2025-04-16`
- `o3-mini-2025-01-31`

### What you’ll get
- Estimated input token count for your test text
- Optional assumed output tokens (defaults to 25% of input)
- Estimated cost (USD) per model for input + output tokens



## Pricing sources (as of Sep 2025)
The per‑million token prices below are taken from OpenAI's official docs and pricing pages:

- **GPT‑4.1**: $2.00 input / $0.50 cached input / $8.00 output per 1M tokens.  
- **GPT‑4.1 mini**: $0.40 input / $0.10 cached input / $1.60 output per 1M tokens.  
- **GPT‑4.1 nano**: $0.10 input / $0.025 cached input / $0.40 output per 1M tokens.  
- **o3** (reasoning): $2.00 input / $0.50 cached input / $8.00 output per 1M tokens.  
- **o4‑mini** (reasoning): $2.00 input / $0.50 cached input / $8.00 output per 1M tokens.  
- **o3‑mini** (reasoning): $1.10 input / $0.55 cached input / $4.40 output per 1M tokens.

Notes
- “Cached input” rates apply when using prompt caching features.
- Only text token rates are considered here.
- Prices can change; always verify against the official pricing page before budgeting.
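
The cached-input note can be made concrete with a small sketch. It assumes a hypothetical `cached_fraction` parameter (the share of input tokens billed at the cached rate), which is not part of this notebook; the rates used are the GPT‑4.1 figures listed above.

```python
def blended_input_cost(input_tokens: int, input_per_m: float,
                       cached_input_per_m: float, cached_fraction: float) -> float:
    """Input cost in USD when a share of the input tokens hits the prompt cache.

    `cached_fraction` is a hypothetical knob for illustration
    (0.0 = no cache hits, 1.0 = every input token billed at the cached rate).
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached_tokens = input_tokens * cached_fraction
    fresh_tokens = input_tokens - cached_tokens
    return (fresh_tokens * input_per_m + cached_tokens * cached_input_per_m) / 1_000_000


# GPT-4.1 rates from the list above: $2.00 input, $0.50 cached input per 1M tokens.
for fraction in (0.0, 0.5, 1.0):
    cost = blended_input_cost(1_000_000, 2.00, 0.50, fraction)
    print(f"cached_fraction={fraction:.1f} -> ${cost:.2f}")
```

The two extremes correspond to the standard and cached input rates; anything in between is a linear mix of the two.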


In [1]:
# === Configuration: model pricing (USD per 1M tokens) ===
MODEL_PRICES = {
    "gpt-4.1-2025-04-14":    {"input_per_m": 2.00,  "cached_input_per_m": 0.50,  "output_per_m": 8.00},
    "gpt-4.1-mini-2025-04-14": {"input_per_m": 0.40,  "cached_input_per_m": 0.10,  "output_per_m": 1.60},
    "gpt-4.1-nano-2025-04-14": {"input_per_m": 0.10,  "cached_input_per_m": 0.025, "output_per_m": 0.40},
    "o3-2025-04-16":         {"input_per_m": 2.00,  "cached_input_per_m": 0.50,  "output_per_m": 8.00},
    "o4-mini-2025-04-16":    {"input_per_m": 2.00,  "cached_input_per_m": 0.50,  "output_per_m": 8.00},
    "o3-mini-2025-01-31":    {"input_per_m": 1.10,  "cached_input_per_m": 0.55,  "output_per_m": 4.40},
}

MODEL_LIST = list(MODEL_PRICES.keys())

print(f"Loaded pricing for {len(MODEL_LIST)} models.")

Loaded pricing for 6 models.



## 1) Paste your test text
Change the `TEST_TEXT` variable below to your own text. This notebook will compute token estimates and costs.


In [2]:
TEST_TEXT = """Paste your test text here. You can include multiple paragraphs, code, etc.
The notebook will estimate tokens and compute costs for each model listed above."""

print(TEST_TEXT[:200] + ("..." if len(TEST_TEXT) > 200 else ""))

Paste your test text here. You can include multiple paragraphs, code, etc.
The notebook will estimate tokens and compute costs for each model listed above.



## 2) Token counting
We try to use the official `tiktoken` tokenizer if available locally.  
If it's not available, we fall back to a heuristic:
- **Heuristic estimate**: `ceil(len(text) / 4)` tokens (≈ 4 characters per token for English)


In [3]:
# === Token counting helpers ===
import math

def estimate_tokens_heuristic(text: str) -> int:
    # Approximate: ~4 characters per token
    return int(math.ceil(len(text) / 4)) if text else 0

# Attempt to use tiktoken if available (optional)
def count_tokens(text: str) -> tuple[int, str]:
    """Return (token_count, method_used)."""
    try:
        import tiktoken  # requires local install
        # Use a modern tokenizer as a reasonable default. Fall back to cl100k_base if o200k_base is unavailable.
        try:
            enc = tiktoken.get_encoding("o200k_base")
        except Exception:
            enc = tiktoken.get_encoding("cl100k_base")
        tokens = enc.encode(text)
        return len(tokens), f"tiktoken ({enc.name})"
    except Exception:
        return estimate_tokens_heuristic(text), "heuristic (~4 chars ≈ 1 token)"

INPUT_TOKENS, TOKENIZER_USED = count_tokens(TEST_TEXT)
print(f"Estimated input tokens: {INPUT_TOKENS} (using {TOKENIZER_USED})")

Estimated input tokens: 30 (using tiktoken (o200k_base))
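
As a rough check on the fallback, the sketch below applies the same 4‑characters‑per‑token rule to the default `TEST_TEXT` and compares it with the 30 tokens that `tiktoken` reported in the output above. The value 30 is copied from that output rather than recomputed, so the block runs without `tiktoken` installed.

```python
import math

# Default TEST_TEXT from the notebook, reproduced so the block is self-contained.
sample_text = (
    "Paste your test text here. You can include multiple paragraphs, code, etc.\n"
    "The notebook will estimate tokens and compute costs for each model listed above."
)

heuristic_tokens = math.ceil(len(sample_text) / 4)  # same rule as estimate_tokens_heuristic
tiktoken_tokens = 30  # copied from the cell output above (o200k_base), not recomputed here

relative_error = (heuristic_tokens - tiktoken_tokens) / tiktoken_tokens
print(f"heuristic: {heuristic_tokens}, tiktoken: {tiktoken_tokens}, "
      f"relative error: {relative_error:+.0%}")
```

On this sample the heuristic comes out higher than the tokenizer count. Text in other languages or dense code can deviate in either direction, so the fallback is best treated as a ballpark figure.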



## 3) Set an assumed output length (optional)
Set `ASSUMED_OUTPUT_TOKENS` if you already know roughly how long the model's response will be.  
Otherwise, we’ll default to `OUTPUT_TO_INPUT_RATIO = 0.25` (i.e., 25% as many output tokens as input, rounded up).


In [4]:
ASSUMED_OUTPUT_TOKENS = None   # e.g., set to an integer like 150, or leave as None to use the ratio
OUTPUT_TO_INPUT_RATIO = 0.25   # used only when ASSUMED_OUTPUT_TOKENS is None

if ASSUMED_OUTPUT_TOKENS is not None:
    OUTPUT_TOKENS = ASSUMED_OUTPUT_TOKENS
else:
    OUTPUT_TOKENS = math.ceil(INPUT_TOKENS * OUTPUT_TO_INPUT_RATIO)
print(f"Assumed output tokens: {OUTPUT_TOKENS}")

Assumed output tokens: 8
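
For the reasoning models in the list (`o3`, `o4-mini`, `o3-mini`), billed output also includes the model's hidden reasoning tokens, so a ratio based only on the visible answer can understate cost. The following is a minimal sketch of that adjustment; `reasoning_tokens` is a hypothetical estimate you would supply yourself, not something this notebook measures.

```python
import math
from typing import Optional


def assumed_output_tokens(input_tokens: int,
                          ratio: float = 0.25,
                          fixed: Optional[int] = None,
                          reasoning_tokens: int = 0) -> int:
    """Billable output tokens: visible answer plus (hypothetical) reasoning tokens.

    Mirrors the notebook's rule for the visible part: use `fixed` when given,
    otherwise ceil(input_tokens * ratio).
    """
    visible = fixed if fixed is not None else math.ceil(input_tokens * ratio)
    return visible + reasoning_tokens


print(assumed_output_tokens(30))                        # ratio only
print(assumed_output_tokens(30, fixed=150))             # explicit answer length
print(assumed_output_tokens(30, reasoning_tokens=500))  # 500 is an illustrative guess
```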



## 4) Compute costs
Costs are linear in tokens:
- **Input cost** = `input_tokens / 1e6 * input_price_per_million`
- **Output cost** = `output_tokens / 1e6 * output_price_per_million`

`cost_breakdown` supports both **standard** and **cached** input rates through its `use_cached` flag; the cell below tabulates the standard scenario.
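
As a worked instance of the two formulas, the sketch below prices 30 input tokens and 8 output tokens at the GPT‑4.1 rates listed earlier ($2.00 input, $8.00 output per 1M tokens). The variable names are illustrative.

```python
input_tokens, output_tokens = 30, 8
input_price_per_million, output_price_per_million = 2.00, 8.00  # GPT-4.1 rates (USD)

# Each cost is tokens scaled to millions, times the per-million price.
input_cost = input_tokens / 1e6 * input_price_per_million
output_cost = output_tokens / 1e6 * output_price_per_million

print(f"input:  ${input_cost:.6f}")
print(f"output: ${output_cost:.6f}")
print(f"total:  ${input_cost + output_cost:.6f}")
```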


In [5]:
import pandas as pd

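def cost_breakdown(model_name: str, in_tokens: int, out_tokens: int, use_cached: bool = False) -> dict:
    """Return input, output, and total cost in USD for one model, rounded to 6 decimals.

    With use_cached=True, all input tokens are billed at the cached-input rate.
    """
    p = MODEL_PRICES[model_name]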
    in_rate = p["cached_input_per_m"] if use_cached else p["input_per_m"]
    out_rate = p["output_per_m"]
    in_cost = (in_tokens / 1_000_000) * in_rate
    out_cost = (out_tokens / 1_000_000) * out_rate
    return {
        "model": model_name,
        "input_tokens": in_tokens,
        "output_tokens": out_tokens,
        "used_cached_input": use_cached,
        "input_cost_usd": round(in_cost, 6),
        "output_cost_usd": round(out_cost, 6),
        "total_cost_usd": round(in_cost + out_cost, 6),
    }

rows_standard = [cost_breakdown(m, INPUT_TOKENS, OUTPUT_TOKENS, use_cached=False) for m in MODEL_LIST]
df_standard = pd.DataFrame(rows_standard).sort_values("total_cost_usd").reset_index(drop=True)

print(df_standard)

                     model  input_tokens  output_tokens  used_cached_input  \
0  gpt-4.1-nano-2025-04-14            30              8              False   
1  gpt-4.1-mini-2025-04-14            30              8              False   
2       o3-mini-2025-01-31            30              8              False   
3       gpt-4.1-2025-04-14            30              8              False   
4            o3-2025-04-16            30              8              False   
5       o4-mini-2025-04-16            30              8              False   

   input_cost_usd  output_cost_usd  total_cost_usd  
0        0.000003         0.000003        0.000006  
1        0.000012         0.000013        0.000025  
2        0.000033         0.000035        0.000068  
3        0.000060         0.000064        0.000124  
4        0.000060         0.000064        0.000124  
5        0.000060         0.000064        0.000124  
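
The cached-input scenario can be tabulated in the same way. The sketch below is a pandas-free approximation of calling `cost_breakdown(..., use_cached=True)`; it copies three entries from `MODEL_PRICES` so that it runs on its own, and the names `PRICES_SUBSET` and `cached_total_usd` are illustrative.

```python
# Subset of MODEL_PRICES (USD per 1M tokens), copied here to keep the block self-contained.
PRICES_SUBSET = {
    "gpt-4.1-2025-04-14":      {"cached_input_per_m": 0.50,  "output_per_m": 8.00},
    "gpt-4.1-mini-2025-04-14": {"cached_input_per_m": 0.10,  "output_per_m": 1.60},
    "gpt-4.1-nano-2025-04-14": {"cached_input_per_m": 0.025, "output_per_m": 0.40},
}


def cached_total_usd(model: str, in_tokens: int, out_tokens: int) -> float:
    """Total cost when every input token is billed at the cached-input rate."""
    p = PRICES_SUBSET[model]
    in_cost = in_tokens / 1_000_000 * p["cached_input_per_m"]
    out_cost = out_tokens / 1_000_000 * p["output_per_m"]
    return in_cost + out_cost


# Same token counts as the standard table above: 30 input, 8 output.
rows = sorted((cached_total_usd(m, 30, 8), m) for m in PRICES_SUBSET)
for total, model in rows:
    print(f"{model:<26} ${total:.8f}")
```

Billing every input token at the cached rate is a best-case bound; it is shown here only to bracket the standard figures from below.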
