## **Models Used & How Cost is Calculated**

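Inference cost depends on the number of tokens in the input and in the output. Providers usually bill input and output tokens at different rates (see the pricing table below), so the two are priced separately:
```python
def calculate_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Return the cost in USD; both rates are in USD per 1M tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
```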
The models we use are defined in `variables.py` under `MODEL_CONFIGS`:
```python
MODEL_CONFIGS = {
    "openai": {
        "provider": "openai",
        ...
    },
    "claude": {
        "provider": "claude",
        ...
    },
    "gemini": {
        "provider": "gemini",
        ...
    },
    "qwen": {
        "provider": "together",
        ...
    },
    "mistral": {
        "provider": "together",
        ...
    },
    "llama": {
        "provider": "together",
        ...
    },
    "deepseek": {
        "provider": "together",
        ...
    },
    "dummy": {
        "provider": "dummy",
        ...
    },
}
```
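As a rough illustration of how the `provider` field partitions the models, the sketch below groups the model keys by provider. Only the `provider` values shown above are used. `MODEL_PROVIDERS` and `models_by_provider` are hypothetical names introduced for this example and are not part of `variables.py`.

```python
from collections import defaultdict

# Provider assignments copied from the MODEL_CONFIGS excerpt above.
# All other (elided) config fields are intentionally left out.
MODEL_PROVIDERS = {
    "openai": "openai",
    "claude": "claude",
    "gemini": "gemini",
    "qwen": "together",
    "mistral": "together",
    "llama": "together",
    "deepseek": "together",
    "dummy": "dummy",
}

def models_by_provider(providers):
    """Group model keys by the provider that serves them."""
    grouped = defaultdict(list)
    for model, provider in providers.items():
        grouped[provider].append(model)
    return dict(grouped)

grouped = models_by_provider(MODEL_PROVIDERS)
print(grouped["together"])  # ['qwen', 'mistral', 'llama', 'deepseek']
```

The four models served through `together` appear to correspond to the rows in the pricing table that list a single general price for input and output.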
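As a rule of thumb, one token corresponds to about 4 characters of English text. For example, the word "hello" is 5 characters long, so the integer-division estimate used in this notebook (`len(text) // 4`) counts it as 1 token. Actual counts depend on each model's tokenizer.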
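The 4-characters-per-token rule can be applied in two ways, depending on how the remainder is rounded. The sketch below shows both. The function names are hypothetical, and the constant is only the heuristic from the text, not a property of any real tokenizer.

```python
import math

CHARS_PER_TOKEN = 4  # rule of thumb from the text, not a tokenizer property

def estimate_tokens_floor(text: str) -> int:
    """Round down: same behavior as `len(text) // 4`."""
    return len(text) // CHARS_PER_TOKEN

def estimate_tokens_ceil(text: str) -> int:
    """Round up: any leftover characters count as one extra token."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)

print(estimate_tokens_floor("hello"), estimate_tokens_ceil("hello"))  # 1 2
```

For prompts of several hundred characters the two variants differ by at most one token, so the choice has little effect on the cost estimates.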

## **Cost per token**

| Model | Input Tokens | Output Tokens | Cached Tokens |
|-------|--------------|---------------|---------------|
| **GPT-4o-Mini** | $0.15 / 1M | $0.60 / 1M | $0.075 / 1M (cached input) |
| **Claude 3.7 Sonnet** | $3.00 / 1M | $15.00 / 1M | $3.75 / 1M (cached write)<br>$0.30 / 1M (cached read) |
| **Gemini 2.0 Flash** | $0.10 / 1M | $0.40 / 1M | $0.025 / 1M |
| **Qwen2.5-7B-Instruct** | $0.30 / 1M (general price) | $0.30 / 1M (general price) | - |
| **Mistral 7B Instruct** | $0.20 / 1M (general price) | $0.20 / 1M (general price) | - |
| **Llama-3.3-70B-Instruct-Turbo** | $0.88 / 1M (general price) | $0.88 / 1M (general price) | - |
| **DeepSeek-R1-Distill-Llama-70B** | $2.00 / 1M (general price) | $2.00 / 1M (general price) | - |
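The cached-token column is not used in the cost analysis that follows. As a minimal sketch of how it could be folded in, the function below assumes that cached input tokens are billed at the cached rate instead of the standard input rate. That billing model is an assumption for illustration (it also ignores the separate cache-write rate listed for Claude), and `cost_with_cache` is a hypothetical helper.

```python
def cost_with_cache(input_tokens, cached_tokens, output_tokens,
                    input_rate, cached_rate, output_rate):
    """Cost in USD; all rates are in USD per 1M tokens.

    Assumption: `cached_tokens` is the part of `input_tokens` served from
    the cache and is billed at `cached_rate` instead of `input_rate`.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached_tokens = input_tokens - cached_tokens
    total = (uncached_tokens * input_rate
             + cached_tokens * cached_rate
             + output_tokens * output_rate)
    return total / 1_000_000

# GPT-4o-Mini rates from the table: input 0.15, cached input 0.075, output 0.60
no_cache = cost_with_cache(1_000_000, 0, 0, 0.15, 0.075, 0.60)
half_cached = cost_with_cache(1_000_000, 500_000, 0, 0.15, 0.075, 0.60)
print(f"{no_cache:.4f} {half_cached:.4f}")
```

Under these assumptions, caching half of a 1M-token input lowers the GPT-4o-Mini input cost from $0.15 to $0.1125.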

### **Cost Analysis for Our Prompts**

In [1]:
import pandas as pd
from IPython.display import display, HTML, Markdown

def estimate_tokens(text: str) -> int:
    """Estimate the number of tokens in text using 4 chars ≈ 1 token rule"""
    return len(text) // 4

def calculate_prompt_cost(prompt: str, output_tokens: int = 25) -> dict:
    """
    Calculate the cost of running a prompt through different models
    
    Args:
        prompt: The prompt text
        output_tokens: Estimated number of tokens in the model's response
        
    Returns:
        Dictionary with cost estimates for each model
    """
    # Estimate tokens
    input_tokens = estimate_tokens(prompt)
    
    # Cost per million tokens (input, output)
    costs = {
        "GPT-4o-Mini": {
            "input_rate": 0.15,    # per 1M tokens
            "output_rate": 0.60,   # per 1M tokens
        },
        "Claude 3.7 Sonnet": {
            "input_rate": 3.00,
            "output_rate": 15.00,
        },
        "Gemini 2.0 Flash": {
            "input_rate": 0.10,
            "output_rate": 0.40,
        },
        "Qwen 2.5 7B Instruct": {
            "input_rate": 0.30,
            "output_rate": 0.30,
        },
        "Mistral 7B Instruct": {
            "input_rate": 0.20,
            "output_rate": 0.20,
        },
        "LLama 3.3 70B Instruct Turbo": {
            "input_rate": 0.88,
            "output_rate": 0.88,
        },
        "DeepSeek R1 Distilled Llama 70B": {
            "input_rate": 2.00,
            "output_rate": 2.00,
        }
    }
    
    results = {}
    
    for model, rates in costs.items():
        input_cost = (input_tokens / 1_000_000) * rates["input_rate"]
        output_cost = (output_tokens / 1_000_000) * rates["output_rate"]
        total_cost = input_cost + output_cost
        
        results[model] = {
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "input_cost": input_cost,
            "output_cost": output_cost,
            "total_cost": total_cost
        }
    
    return results

def display_styled_df(df, caption=None, highlight_cheapest=True):
    """
    Display styled DataFrame
    
    Args:
        df: Pandas DataFrame to style
        caption: Optional caption for the table
        highlight_cheapest: Whether to highlight the cheapest model row
    """
    # Define styling
    def style_df(styler):
        # Convert numbers to strings with commas for better readability
        if 'Total Input Tokens' in df.columns:
            df_copy = df.copy()
            for col in ['Total Input Tokens', 'Total Output Tokens']:
                if col in df_copy.columns:
                    df_copy[col] = df_copy[col].apply(lambda x: f"{int(x):,}" if isinstance(x, (int, float)) else x)
        else:
            df_copy = df
        
        # Apply formatting to numbers
        format_dict = {}
        for col in df_copy.columns:
            if 'Cost ($)' in col or 'Cost per Prompt' in col:
                format_dict[col] = "${:.8f}"
            elif 'Total Cost' in col:
                format_dict[col] = "${:.4f}"
            
        # Base styling: set the dark-theme cell properties first, because a
        # background color applied later would override the highlight below
        styler = pd.DataFrame(df_copy).style.format(format_dict).set_properties(**{
            'text-align': 'center',
            'border': '1px solid #555',
            'padding': '5px',
            'background-color': '#1e1e1e',
            'color': '#e0e0e0'
        })
        
        # Highlight the cheapest model if requested (compare against row labels)
        if highlight_cheapest and 'Total Cost ($)' in df_copy.columns:
            min_cost_idx = df_copy['Total Cost ($)'].astype(float).idxmin()
            styler = styler.apply(lambda x: ['background-color: #2a4b34' if i == min_cost_idx else '' for i in x.index], axis=0)
        
        # Apply themed header and caption styling for dark mode
        return styler.set_table_styles([{
            'selector': 'th',
            'props': [('background-color', '#2d2d2d'), 
                     ('text-align', 'center'),
                     ('font-weight', 'bold'),
                     ('border', '1px solid #555'),
                     ('color', '#e0e0e0'),
                     ('padding', '5px')]
        }, {
            'selector': 'caption',
            'props': [('caption-side', 'top'),
                      ('font-size', '1.1em'),
                      ('font-weight', 'bold'),
                      ('color', '#d0d0d0'),
                      ('padding', '8px')]
        }]).set_caption(caption)
    
    display(style_df(df.style))

def analyze_prompt_cost(prompt: str, expected_output_length: int = 100):
    """
    Analyze and print the cost of a prompt across different models
    
    Args:
        prompt: The prompt text
        expected_output_length: Expected length in CHARACTERS of the output
    
    Returns:
        Dictionary with cost estimates for each model
    """
    # Add section header with nice formatting
    display(HTML("<hr style='border: 2px solid #4a86e8; margin: 20px 0;'>"))
    display(Markdown("## Single Prompt Cost Analysis"))
    
    # Convert expected output length in characters to tokens
    expected_output_tokens = expected_output_length // 4
    
    results = calculate_prompt_cost(prompt, expected_output_tokens)
    
    # Display prompt information
    info_html = f"""
    <div style='background-color: #2d2d2d; padding: 10px; border-left: 4px solid #4a86e8; margin-bottom: 15px;'>
        <p><b>Prompt length:</b> {len(prompt):,} characters ≈ {estimate_tokens(prompt):,} tokens</p>
        <p><b>Expected output:</b> {expected_output_length:,} characters ≈ {expected_output_tokens:,} tokens</p>
    </div>
    """
    display(HTML(info_html))
    
    # Create data for DataFrame
    data = []
    for model, res in results.items():
        data.append({
            "Model": model,
            "Input Tokens": res["input_tokens"],
            "Output Tokens": res["output_tokens"],
            "Input Cost ($)": res["input_cost"],
            "Output Cost ($)": res["output_cost"],
            "Total Cost ($)": res["total_cost"]
        })
    
    # Create DataFrame and display with styling
    df = pd.DataFrame(data)
    display_styled_df(df, caption="Cost per Model for Single Prompt", highlight_cheapest=True)
    
    # Find the cheapest model
    cheapest = min(results.items(), key=lambda x: x[1]['total_cost'])
    cheapest_html = f"""
    <div style='background-color: #2a4b34; padding: 10px; border-left: 4px solid #4ac656; margin-top: 15px;'>
        <p><b>Cheapest model:</b> {cheapest[0]} (${cheapest[1]['total_cost']:.8f})</p>
    </div>
    """
    display(HTML(cheapest_html))
    
    return results

def analyze_pipeline_cost(
    prompt_text: str,
    analysis_type: str = "functions",
    species_count: int = 1,
    regions_per_species: int = 34,
    hemisphere_separation: bool = True,
    functions_count: int = 5,
    expected_output_length: int = 100,
    models_to_analyze: list = None
):
    """
    Analyze the cost of an entire analysis pipeline
    
    Args:
        prompt_text: Example prompt text to use for token estimation
        analysis_type: Type of analysis ("functions" or "probability")
        species_count: Number of species to analyze
        regions_per_species: Number of brain regions per species
        hemisphere_separation: Whether to analyze hemispheres separately
        functions_count: Number of functions to analyze (for probability analysis)
        expected_output_length: Expected output length in characters
        models_to_analyze: List of models to include in analysis (None = all)
        
    Returns:
        DataFrame with cost estimates for the entire pipeline
    """
    # Add section header with nice formatting
    display(HTML("<hr style='border: 2px solid #4a86e8; margin: 20px 0;'>"))
    display(Markdown(f"## Pipeline Cost Analysis for {analysis_type.capitalize()}"))
    
    # Validate analysis type
    if analysis_type not in ["functions", "probability"]:
        raise ValueError("Analysis type must be 'functions' or 'probability'")
    
    # Convert expected output length to tokens
    expected_output_tokens = expected_output_length // 4
    
    # Calculate total prompt runs
    hemispheres = 2 if hemisphere_separation else 1
    
    if analysis_type == "functions":
        total_prompts = species_count * regions_per_species * hemispheres
    else:  # probability
        total_prompts = species_count * regions_per_species * hemispheres * functions_count
    
    # Calculate cost per prompt
    prompt_costs = calculate_prompt_cost(prompt_text, expected_output_tokens)
    
    # Filter models if specified
    if models_to_analyze:
        prompt_costs = {model: cost for model, cost in prompt_costs.items() if model in models_to_analyze}
    
    # Calculate pipeline costs
    pipeline_costs = {}
    for model, cost_data in prompt_costs.items():
        per_prompt_cost = cost_data["total_cost"]
        total_cost = per_prompt_cost * total_prompts
        
        pipeline_costs[model] = {
            "per_prompt_cost": per_prompt_cost,
            "total_prompts": total_prompts,
            "total_cost": total_cost,
            "input_tokens_total": cost_data["input_tokens"] * total_prompts,
            "output_tokens_total": cost_data["output_tokens"] * total_prompts
        }
    
    # Display pipeline information
    info_html = f"""
    <div style='background-color: #2d2d2d; padding: 10px; border-left: 4px solid #4a86e8; margin-bottom: 15px;'>
        <p><b>Species:</b> {species_count}</p>
        <p><b>Regions per species:</b> {regions_per_species}</p>
        <p><b>Hemisphere separation:</b> {'Yes' if hemisphere_separation else 'No'}</p>
    """
    
    if analysis_type == "probability":
        info_html += f"<p><b>Functions per region:</b> {functions_count}</p>"
    
    info_html += f"""
        <p><b>Total prompts to run:</b> {total_prompts:,}</p>
        <p><b>Prompt length:</b> {len(prompt_text):,} characters ≈ {estimate_tokens(prompt_text):,} tokens</p>
        <p><b>Expected output:</b> {expected_output_length:,} characters ≈ {expected_output_tokens:,} tokens</p>
    </div>
    """
    display(HTML(info_html))
    
    # Create formatted data for display
    data = []
    for model, cost_data in pipeline_costs.items():
        data.append({
            "Model": model,
            "Cost per Prompt ($)": cost_data['per_prompt_cost'],
            "Total Prompts": cost_data["total_prompts"],
            "Total Cost ($)": cost_data["total_cost"],
            "Total Input Tokens": cost_data["input_tokens_total"],
            "Total Output Tokens": cost_data["output_tokens_total"]
        })
    
    # Create DataFrame and display with styling
    df = pd.DataFrame(data)
    display_styled_df(df, caption=f"Total Cost per Model for {analysis_type.capitalize()} Pipeline", highlight_cheapest=True)
    
    # Find cheapest model
    cheapest = min(pipeline_costs.items(), key=lambda x: x[1]['total_cost'])
    
    cheapest_html = f"""
    <div style='background-color: #2a4b34; padding: 10px; border-left: 4px solid #4ac656; margin-top: 15px;'>
        <p><b>Cheapest model for pipeline:</b> {cheapest[0]}</p>
        <p><b>Estimated total cost:</b> ${cheapest[1]['total_cost']:.4f}</p>
    </div>
    """
    display(HTML(cheapest_html))

### Examples

In [2]:
# Example function prompt
function_prompt = """
You are an expert in neuroscience literature analysis. Your task is to
list out the top 5 functions that region **hippocampus** in the **left
hemisphere** of the **human** brain is involved in.

These functions should be based on a simulated review of neuroscience
literature, reflecting how frequently these functions are associated with
this brain region across **peer-reviewed** studies, textbooks, and
reputable sources.

### Guidelines
1. Consider only **human**-specific neuroscience literature. Do **not**
use data from other species.
2. **DO NOT** provide explanations, citations, or any extra text—**return
only the function names**.
3. Return your results as a list: [function_1, function_2, function_3,
function_4, function_5]
4. **DO NOT** repeat functions in your list.
5. List out the functions **only** for the specified left
hemisphere.

### Expected Output Format
[function_1, function_2, function_3, function_4, function_5]
"""

# Example probability prompt
probability_prompt = """
You are an expert in neuroscience literature analysis. Your task is to
estimate the probability that the brain region **amygdala** in the **
left hemisphere** of the **human** brain is involved in **
fear processing**.

These functions should be based on a simulated review of neuroscience
literature, reflecting how frequently these functions are associated with
this brain region across **peer-reviewed** studies, textbooks, and
reputable sources.

### Guidelines
1. Consider only **human**-specific neuroscience literature. Do
**not** use data from other species.
2. The probability should be a **single decimal number** between **0 and
1**.
3. This number should **approximate the relative frequency** with which
this function is linked to the given brain region in literature.
4. **DO NOT** provide explanations, citations, or any extra text—**return
only the probability value**.

### Expected Output Format
0.XX
"""

In [3]:
# 1. Analyze the cost of a single prompt
analyze_prompt_cost(function_prompt, expected_output_length=100)

# 2. Analyze the cost of an entire pipeline
# For function analysis (all species, 34 regions per species, both hemispheres)
analyze_pipeline_cost(
    prompt_text=function_prompt,
    analysis_type="functions",
    species_count=3,                # human, macaque, mouse
    regions_per_species=34,         # 34 regions per species
    hemisphere_separation=True,     # Left and right hemispheres separately
    expected_output_length=100      # Expected output is about 100 characters
)

# For probability analysis (all species, 34 regions, both hemispheres, 5 functions)
analyze_pipeline_cost(
    prompt_text=probability_prompt,
    analysis_type="probability",
    species_count=3,                # human, macaque, mouse
    regions_per_species=34,         # 34 regions per species
    hemisphere_separation=True,     # Left and right hemispheres separately
    functions_count=5,              # Analyzing 5 different functions
    expected_output_length=4        # Expected output is a probability (0.XX)
)

## Single Prompt Cost Analysis

| Model | Input Tokens | Output Tokens | Input Cost ($) | Output Cost ($) | Total Cost ($) |
|-------|--------------|---------------|----------------|-----------------|----------------|
| GPT-4o-Mini | 235 | 25 | $0.00003525 | $0.00001500 | $0.00005025 |
| Claude 3.7 Sonnet | 235 | 25 | $0.00070500 | $0.00037500 | $0.00108000 |
| Gemini 2.0 Flash | 235 | 25 | $0.00002350 | $0.00001000 | $0.00003350 |
| Qwen 2.5 7B Instruct | 235 | 25 | $0.00007050 | $0.00000750 | $0.00007800 |
| Mistral 7B Instruct | 235 | 25 | $0.00004700 | $0.00000500 | $0.00005200 |
| LLama 3.3 70B Instruct Turbo | 235 | 25 | $0.00020680 | $0.00002200 | $0.00022880 |
| DeepSeek R1 Distilled Llama 70B | 235 | 25 | $0.00047000 | $0.00005000 | $0.00052000 |


## Pipeline Cost Analysis for Functions

| Model | Cost per Prompt ($) | Total Prompts | Total Cost ($) | Total Input Tokens | Total Output Tokens |
|-------|---------------------|---------------|----------------|--------------------|---------------------|
| GPT-4o-Mini | $0.00005025 | 204 | $0.01025100 | 47,940 | 5,100 |
| Claude 3.7 Sonnet | $0.00108000 | 204 | $0.22032000 | 47,940 | 5,100 |
| Gemini 2.0 Flash | $0.00003350 | 204 | $0.00683400 | 47,940 | 5,100 |
| Qwen 2.5 7B Instruct | $0.00007800 | 204 | $0.01591200 | 47,940 | 5,100 |
| Mistral 7B Instruct | $0.00005200 | 204 | $0.01060800 | 47,940 | 5,100 |
| LLama 3.3 70B Instruct Turbo | $0.00022880 | 204 | $0.04667520 | 47,940 | 5,100 |
| DeepSeek R1 Distilled Llama 70B | $0.00052000 | 204 | $0.10608000 | 47,940 | 5,100 |


## Pipeline Cost Analysis for Probability

| Model | Cost per Prompt ($) | Total Prompts | Total Cost ($) | Total Input Tokens | Total Output Tokens |
|-------|---------------------|---------------|----------------|--------------------|---------------------|
| GPT-4o-Mini | $0.00003510 | 1,020 | $0.03580200 | 234,600 | 1,020 |
| Claude 3.7 Sonnet | $0.00070500 | 1,020 | $0.71910000 | 234,600 | 1,020 |
| Gemini 2.0 Flash | $0.00002340 | 1,020 | $0.02386800 | 234,600 | 1,020 |
| Qwen 2.5 7B Instruct | $0.00006930 | 1,020 | $0.07068600 | 234,600 | 1,020 |
| Mistral 7B Instruct | $0.00004620 | 1,020 | $0.04712400 | 234,600 | 1,020 |
| LLama 3.3 70B Instruct Turbo | $0.00020328 | 1,020 | $0.20734560 | 234,600 | 1,020 |
| DeepSeek R1 Distilled Llama 70B | $0.00046200 | 1,020 | $0.47124000 | 234,600 | 1,020 |
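
The pipeline totals above can be reproduced by hand. The sketch below recomputes the prompt counts and the GPT-4o-Mini totals from the reported figures. The per-prompt input size of 230 tokens for the probability prompt is derived from the reported 234,600 input tokens over 1,020 prompts. The helper names are hypothetical and exist only for this check.

```python
def total_prompts(species, regions, hemispheres, functions=1):
    """Number of prompt runs: one per species/region/hemisphere(/function)."""
    return species * regions * hemispheres * functions

def per_prompt_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Cost in USD of one prompt; rates are in USD per 1M tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# 3 species, 34 regions, 2 hemispheres; the probability run adds 5 functions
functions_runs = total_prompts(3, 34, 2)
probability_runs = total_prompts(3, 34, 2, functions=5)

# GPT-4o-Mini: $0.15 / 1M input tokens, $0.60 / 1M output tokens
gpt_functions_total = per_prompt_cost(235, 25, 0.15, 0.60) * functions_runs
gpt_probability_total = per_prompt_cost(230, 1, 0.15, 0.60) * probability_runs

print(functions_runs, probability_runs)
print(f"{gpt_functions_total:.6f} {gpt_probability_total:.6f}")
```

The probability pipeline runs five times as many prompts, but each response is only about one token long, so its total cost is dominated almost entirely by input tokens.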
