# Model Selection & Parameter Tuning

This notebook explores how to select the appropriate models for different tasks and how to tune parameters to get the best results. Topics include:

- Comparing different models (capabilities/costs)
- Experimenting with temperature, top-p, and other parameters
- Understanding the effects of system prompts on model behavior
- Balancing deterministic and creative outputs

## 1. Setup and Imports

First, let's import the necessary libraries and set up our environment:

In [None]:
import os
import json
import time
import sys
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import display, HTML, clear_output, Markdown
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Import our custom utilities
sys.path.append('.')
from api_utils import (
    call_openrouter,
    extract_text_response,
    get_available_models,
    estimate_cost
)
from token_counter import count_tokens, count_message_tokens

Let's verify that our API key is loaded correctly:

In [None]:
# Verify that the API key is loaded
api_key = os.getenv("OPENROUTER_API_KEY")
if api_key:
    print("✅ API key loaded successfully!")
    # Show first and last three characters for verification
    masked_key = f"{api_key[:3]}...{api_key[-3:]}" if len(api_key) > 6 else "[key too short]"
    print(f"API key: {masked_key}")
else:
    print("❌ API key not found! Make sure you've created a .env file with your OPENROUTER_API_KEY.")

## 2. Model Comparison Framework

Let's create a framework to compare different models on the same prompts, so we can analyze their capabilities, performance, and cost-effectiveness.
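The cost side of the comparison comes down to simple arithmetic on token counts. The sketch below shows one way such an estimate might be computed. It is not the `estimate_cost` helper imported from `api_utils` (whose implementation is not shown here); the function name `sketch_estimate_cost` and the prices are hypothetical placeholders, not real rates.

```python
def sketch_estimate_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Estimate request cost in USD from token counts and per-million-token prices."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost

# Hypothetical prices (USD per million tokens), for illustration only
example_cost = sketch_estimate_cost(
    input_tokens=1_000,
    output_tokens=500,
    input_price_per_m=1.0,
    output_price_per_m=2.0,
)
print(f"Estimated cost: ${example_cost:.6f}")
```

Because output tokens are usually priced higher than input tokens, a verbose model can cost more than a terse one even at the same nominal rate.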

In [None]:
def compare_models(prompt, models, system_prompt=None, temperature=0.7, max_tokens=300):
    """Compare responses from multiple models on the same prompt.
    
    Args:
        prompt: The prompt to send to the models
        models: List of model names to compare
        system_prompt: Optional system prompt (applied only when prompt is a string)
        temperature: Sampling temperature (this notebook uses 0-1; some providers accept higher values)
        max_tokens: Maximum tokens in the response
        
    Returns:
        DataFrame with model responses and metrics
    """
    results = []
    
    # Format the prompt as a message if it's a string
    if isinstance(prompt, str):
        messages = [{"role": "user", "content": prompt}]
        if system_prompt:
            messages.insert(0, {"role": "system", "content": system_prompt})
    else:
        # Assume it's already in message format
        messages = prompt
    
    # Calculate input tokens
    input_tokens = count_message_tokens(messages)
    
    for model in models:
        print(f"📊 Testing model: {model}")
        start_time = time.time()
        
        # Call the API
        response = call_openrouter(
            prompt=messages,
            model=model,
            temperature=temperature,
            max_tokens=max_tokens
        )
        
        end_time = time.time()
        duration = end_time - start_time
        
        if response.get("success", False):
            text_response = extract_text_response(response)
            
            # Calculate output tokens
            output_tokens = count_tokens(text_response)
            
            # Estimate cost
            cost = estimate_cost(model, input_tokens, output_tokens)
            
            results.append({
                "model": model,
                "response": text_response,
                "duration": round(duration, 2),
                "input_tokens": input_tokens,
                "output_tokens": output_tokens,
                "total_tokens": input_tokens + output_tokens,
                "cost_usd": cost
            })
        else:
            results.append({
                "model": model,
                "response": f"Error: {response.get('error', 'Unknown error')}",
                "duration": round(duration, 2),
                "input_tokens": input_tokens,
                "output_tokens": 0,
                "total_tokens": input_tokens,
                "cost_usd": 0
            })
    
    # Create DataFrame from results
    df = pd.DataFrame(results)
    return df

def display_comparison(df, show_metrics=True):
    """Display the model comparison results in a readable format."""
    for idx, row in df.iterrows():
        model = row['model']
        response = row['response']
        
        # Create markdown content
        content = f"## Model: {model}\n\n"
        
        if show_metrics:
            content += f"**Metrics:** Time: {row['duration']}s | Input tokens: {row['input_tokens']} | Output tokens: {row['output_tokens']}\n\n"
            content += f"**Total tokens:** {row['total_tokens']} | **Estimated cost:** ${row['cost_usd']:.6f}\n\n"
        
        content += f"**Response:**\n\n{response}\n\n"
        content += "---\n\n"
        
        display(Markdown(content))

### 2.1 Compare Different Models

Let's compare some popular models on the same prompt to see how they differ in terms of quality, speed, and cost:

In [None]:
# Define models to compare
models_to_compare = [
    "openai/gpt-4o-mini-2024-07-18",          # Cost-efficient GPT-4o mini
    "openai/gpt-4o-2024-08-06",               # Full GPT-4o
    "anthropic/claude-3.5-haiku",             # Fastest Claude model
    "google/gemini-2.5-flash-preview-05-20"   # Fast Gemini model
]

# Test prompt
test_prompt = "Explain the concept of neural networks to a high school student"

# Compare models
comparison_results = compare_models(
    prompt=test_prompt,
    models=models_to_compare,
    temperature=0.7,
    max_tokens=300
)

# Display results
display_comparison(comparison_results)

### 2.2 Visualize Performance Metrics

Let's create some visualizations to better understand the differences between models:

In [None]:
def plot_model_metrics(df):
    """Plot performance metrics for compared models."""
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))
    
    # Plot response times
    axes[0, 0].bar(df['model'], df['duration'], color='skyblue')
    axes[0, 0].set_title('Response Time (seconds)')
    # Rotate existing tick labels (avoids the warning set_xticklabels raises without fixed ticks)
    plt.setp(axes[0, 0].get_xticklabels(), rotation=45, ha='right')
    axes[0, 0].grid(axis='y', linestyle='--', alpha=0.7)
    
    # Plot token counts
    axes[0, 1].bar(df['model'], df['input_tokens'], label='Input Tokens', color='lightgreen')
    axes[0, 1].bar(df['model'], df['output_tokens'], bottom=df['input_tokens'], label='Output Tokens', color='coral')
    axes[0, 1].set_title('Token Usage')
    plt.setp(axes[0, 1].get_xticklabels(), rotation=45, ha='right')
    axes[0, 1].legend()
    axes[0, 1].grid(axis='y', linestyle='--', alpha=0.7)
    
    # Plot cost
    axes[1, 0].bar(df['model'], df['cost_usd'], color='purple')
    axes[1, 0].set_title('Estimated Cost (USD)')
    plt.setp(axes[1, 0].get_xticklabels(), rotation=45, ha='right')
    axes[1, 0].grid(axis='y', linestyle='--', alpha=0.7)
    
    # Plot tokens per second (a zero duration becomes NaN to avoid dividing by zero)
    tokens_per_second = df['output_tokens'] / df['duration'].replace(0, float('nan'))
    axes[1, 1].bar(df['model'], tokens_per_second, color='orange')
    axes[1, 1].set_title('Output Tokens per Second')
    plt.setp(axes[1, 1].get_xticklabels(), rotation=45, ha='right')
    axes[1, 1].grid(axis='y', linestyle='--', alpha=0.7)
    
    plt.tight_layout()
    plt.show()

# Plot metrics for our comparison
plot_model_metrics(comparison_results)

## 3. Temperature and Creativity

The temperature parameter controls randomness in the model's responses. Let's experiment with different temperature settings to see how they affect creativity and determinism.
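To build intuition before calling the API, here is a minimal sketch of how temperature is commonly applied: the model's raw scores (logits) are divided by the temperature before being turned into probabilities. The logits below are made-up numbers, and treating temperature 0 as "always pick the top token" is an assumption; providers may handle that case differently.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores (logits) to probabilities, scaled by temperature."""
    if temperature <= 0:
        # Assumption: treat temperature 0 as greedy decoding (all mass on the top token)
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

temperature_demo_logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in [0.0, 0.3, 0.7, 1.0]:
    probs = softmax_with_temperature(temperature_demo_logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the highest-scoring token, while higher temperatures flatten the distribution so less likely tokens are sampled more often.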

In [None]:
def compare_temperatures(prompt, model, temperatures, system_prompt=None, max_tokens=300, iterations=1):
    """Compare the same model's responses at different temperature settings.
    
    Args:
        prompt: The prompt to send to the model
        model: The model to use
        temperatures: List of temperature values to test
        system_prompt: Optional system prompt
        max_tokens: Maximum tokens in the response
        iterations: Number of iterations per temperature (to check consistency)
        
    Returns:
        DataFrame with responses at different temperatures
    """
    results = []
    
    # Format the prompt as a message if it's a string
    if isinstance(prompt, str):
        messages = [{"role": "user", "content": prompt}]
        if system_prompt:
            messages.insert(0, {"role": "system", "content": system_prompt})
    else:
        # Assume it's already in message format
        messages = prompt
    
    for temp in temperatures:
        for i in range(iterations):
            print(f"📝 Testing temperature: {temp} (iteration {i+1}/{iterations})")
            
            # Call the API
            response = call_openrouter(
                prompt=messages,
                model=model,
                temperature=temp,
                max_tokens=max_tokens
            )
            
            if response.get("success", False):
                text_response = extract_text_response(response)
                
                results.append({
                    "temperature": temp,
                    "iteration": i+1,
                    "response": text_response,
                })
            else:
                results.append({
                    "temperature": temp,
                    "iteration": i+1,
                    "response": f"Error: {response.get('error', 'Unknown error')}",
                })
    
    # Create DataFrame from results
    df = pd.DataFrame(results)
    return df

def display_temperature_comparison(df):
    """Display responses at different temperatures in a readable format."""
    # Group by temperature and get unique temperatures
    temperatures = df['temperature'].unique()
    
    for temp in temperatures:
        temp_df = df[df['temperature'] == temp]
        
        # Create markdown content
        content = f"## Temperature: {temp}\n\n"
        
        for idx, row in temp_df.iterrows():
            iteration = row['iteration']
            response = row['response']
            
            content += f"**Iteration {iteration}:**\n\n{response}\n\n"
        
        content += "---\n\n"
        display(Markdown(content))

In [None]:
# Define temperatures to test
temperatures_to_test = [0.0, 0.3, 0.7, 1.0]

# Creative writing prompt to show temperature effects clearly
creative_prompt = "Write a short, creative story about a robot discovering emotions"

# Compare responses at different temperatures
temperature_results = compare_temperatures(
    prompt=creative_prompt,
    model="openai/gpt-4o-mini-2024-07-18",  # Using a cost-effective model
    temperatures=temperatures_to_test,
    max_tokens=200,
    iterations=2  # Run each temperature twice to check consistency
)

# Display results
display_temperature_comparison(temperature_results)

### 3.1 Temperature Selection Guidelines

Based on our experiments, here are some guidelines for temperature selection:

| Temperature | Best Used For | Characteristics |
|-------------|--------------|------------------|
| 0.0-0.2 | Factual responses, coding, logical tasks | Deterministic, consistent, focused |
| 0.3-0.5 | Balanced responses, explanations | Mostly consistent with some variation |
| 0.6-0.8 | Creative writing, brainstorming | Good balance of coherence and creativity |
| 0.9-1.0 | Maximum creativity, diverse ideas | Most random, potentially unfocused |

Let's test this on a factual task:

In [None]:
# Test a factual question with different temperatures
factual_prompt = "What are three ways to prevent overfitting in machine learning models?"

factual_temperature_results = compare_temperatures(
    prompt=factual_prompt,
    model="openai/gpt-4o-mini-2024-07-18",
    temperatures=[0.0, 0.7],  # Compare low vs. medium temperature
    max_tokens=300,
    iterations=1
)

# Display results
display_temperature_comparison(factual_temperature_results)

## 4. Top-p (Nucleus Sampling)

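Another parameter that controls text generation is `top_p` (nucleus sampling). It restricts sampling to the smallest set of most-likely tokens whose cumulative probability reaches `top_p`; for example, `top_p=0.9` discards the least likely tokens that make up the remaining 10% of probability mass. Let's see how it affects responses.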
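The following is a minimal sketch of the nucleus filtering step, assuming a toy distribution over four tokens. The function name `top_p_filter` and the probabilities are illustrative; real implementations operate on the full vocabulary inside the model's sampler.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability reaches top_p,
    then renormalize. `probs` maps token -> probability."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = {}
    cumulative = 0.0
    for token, p in ranked:
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}

# Made-up next-token probabilities, for illustration only
top_p_demo_probs = {"the": 0.5, "a": 0.3, "this": 0.15, "zebra": 0.05}
for demo_top_p in [0.1, 0.5, 0.9]:
    print(demo_top_p, top_p_filter(top_p_demo_probs, demo_top_p))
```

With a low `top_p`, only the single most likely token survives, which is why very low values behave much like low temperature.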

In [None]:
def compare_top_p(prompt, model, top_p_values, temperature=0.7, system_prompt=None, max_tokens=300):
    """Compare the model's responses with different top_p settings.
    
    Args:
        prompt: The prompt to send to the model
        model: The model to use
        top_p_values: List of top_p values to test
        temperature: Temperature setting
        system_prompt: Optional system prompt
        max_tokens: Maximum tokens in the response
        
    Returns:
        DataFrame with responses at different top_p values
    """
    results = []
    
    # Format the prompt as a message if it's a string
    if isinstance(prompt, str):
        messages = [{"role": "user", "content": prompt}]
        if system_prompt:
            messages.insert(0, {"role": "system", "content": system_prompt})
    else:
        # Assume it's already in message format
        messages = prompt
    
    for top_p in top_p_values:
        print(f"📊 Testing top_p: {top_p}")
        
        # Call the API
        response = call_openrouter(
            prompt=messages,
            model=model,
            temperature=temperature,
            top_p=top_p,
            max_tokens=max_tokens
        )
        
        if response.get("success", False):
            text_response = extract_text_response(response)
            
            results.append({
                "top_p": top_p,
                "response": text_response,
            })
        else:
            results.append({
                "top_p": top_p,
                "response": f"Error: {response.get('error', 'Unknown error')}",
            })
    
    # Create DataFrame from results
    df = pd.DataFrame(results)
    return df

def display_top_p_comparison(df):
    """Display responses at different top_p values in a readable format."""
    for idx, row in df.iterrows():
        top_p = row['top_p']
        response = row['response']
        
        # Create markdown content
        content = f"## Top-p: {top_p}\n\n"
        content += f"**Response:**\n\n{response}\n\n"
        content += "---\n\n"
        
        display(Markdown(content))

In [None]:
# Define top_p values to test
top_p_values = [0.1, 0.5, 0.9]

# Creative prompt to showcase top_p effects
creative_prompt = "Come up with five unusual uses for a paperclip"

# Compare responses with different top_p values
top_p_results = compare_top_p(
    prompt=creative_prompt,
    model="openai/gpt-4o-mini-2024-07-18",
    top_p_values=top_p_values,
    temperature=0.7,  # Keep temperature constant
    max_tokens=300
)

# Display results
display_top_p_comparison(top_p_results)

### 4.1 Temperature vs. Top-p

| Parameter | Function | When to Use | Effect of Low Value | Effect of High Value |
|-----------|----------|-------------|---------------------|----------------------|
| Temperature | Rescales the probabilities of all tokens | Primary control for creativity | Near-deterministic, focused | Creative, diverse |
| Top-p | Limits the pool of tokens eligible for sampling | Fine-tuning after setting temperature | Only most likely tokens used | Wider distribution of tokens |

In practice, a common approach is to adjust temperature first and leave top-p at its default (usually 1.0); changing both at once makes it hard to tell which parameter caused a given effect.

## 5. The Impact of System Prompts

System prompts are a powerful way to guide model behavior. Let's experiment with different system prompts and see how they affect responses.

In [None]:
def compare_system_prompts(prompt, model, system_prompts, temperature=0.7, max_tokens=300):
    """Compare the model's responses with different system prompts.
    
    Args:
        prompt: The user prompt to send to the model
        model: The model to use
        system_prompts: Dictionary of {label: system_prompt}
        temperature: Temperature setting
        max_tokens: Maximum tokens in the response
        
    Returns:
        DataFrame with responses using different system prompts
    """
    results = []
    
    for label, system_prompt in system_prompts.items():
        print(f"📝 Testing system prompt: {label}")
        
        # Create messages with the system prompt
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt}
        ]
        
        # Call the API
        response = call_openrouter(
            prompt=messages,
            model=model,
            temperature=temperature,
            max_tokens=max_tokens
        )
        
        if response.get("success", False):
            text_response = extract_text_response(response)
            
            results.append({
                "label": label,
                "system_prompt": system_prompt,
                "response": text_response,
            })
        else:
            results.append({
                "label": label,
                "system_prompt": system_prompt,
                "response": f"Error: {response.get('error', 'Unknown error')}",
            })
    
    # Create DataFrame from results
    df = pd.DataFrame(results)
    return df

def display_system_prompt_comparison(df):
    """Display responses with different system prompts in a readable format."""
    for idx, row in df.iterrows():
        label = row['label']
        system_prompt = row['system_prompt']
        response = row['response']
        
        # Create markdown content
        content = f"## System Prompt: {label}\n\n"
        content += f"**Prompt text:** {system_prompt}\n\n"
        content += f"**Response:**\n\n{response}\n\n"
        content += "---\n\n"
        
        display(Markdown(content))

In [None]:
# Define a set of different system prompts to test
system_prompts = {
    "Default": "You are a helpful AI assistant.",
    "Expert": "You are an expert in machine learning with a PhD in computer science. Provide detailed technical explanations.",
    "Teacher": "You are a patient teacher explaining concepts to high school students. Use simple language and analogies.",
    "Skeptic": "You are a skeptical thinker who points out limitations and potential problems in every concept.",
    "Concise": "You provide extremely concise answers using as few words as possible."
}

# Use a prompt that can be answered differently based on the system prompt
technical_prompt = "Explain how deep learning works"

# Compare responses with different system prompts
system_prompt_results = compare_system_prompts(
    prompt=technical_prompt,
    model="openai/gpt-4o-mini-2024-07-18",
    system_prompts=system_prompts,
    temperature=0.7,
    max_tokens=300
)

# Display results
display_system_prompt_comparison(system_prompt_results)

### 5.1 System Prompt Design Principles

Based on our experiments, here are some principles for effective system prompt design:

1. **Be specific about the role** - Define who the AI is supposed to be (expert, teacher, etc.)
2. **Set clear constraints** - Mention limitations or restrictions (word count, style, etc.)
3. **Define the format** - Specify how the response should be structured
4. **Outline the tone** - Indicate the desired tone (formal, casual, enthusiastic)
5. **Provide context** - Give relevant background information
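The five principles above can be sketched as a small builder that assembles a system prompt from its parts. The function name `build_system_prompt`, its parameters, and the example values are hypothetical; this is one possible layout, not a required structure.

```python
def build_system_prompt(role, constraints, response_format, tone, context):
    """Assemble a system prompt from the five design components."""
    sections = [
        f"You are {role}.",
        "Constraints:\n" + "\n".join(f"- {c}" for c in constraints),
        f"Format: {response_format}",
        f"Tone: {tone}",
        f"Context: {context}",
    ]
    return "\n\n".join(sections)

example_system_prompt = build_system_prompt(
    role="a patient teacher explaining concepts to high school students",
    constraints=["Keep answers under 150 words", "Use one analogy per answer"],
    response_format="A short paragraph followed by a bulleted summary",
    tone="Friendly and encouraging",
    context="Students know basic algebra but have no programming background",
)
print(example_system_prompt)
```

Keeping each component as a separate argument makes it easy to vary one element (for example, the tone) while holding the others fixed during experiments.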

Let's test a more comprehensive system prompt:

In [None]:
# Advanced system prompt example
advanced_system_prompt = """
You are a financial advisor with 15+ years of experience specializing in retirement planning. 
Follow these guidelines:
1. Provide personalized advice based on the information given
2. Always consider multiple strategies and explain tradeoffs
3. Include specific numbers and calculations when relevant
4. Use plain language, avoiding jargon when possible
5. Structure your response with clear headings
6. Always mention tax implications
7. End with 2-3 actionable next steps

Your goal is to help the user make informed financial decisions without overwhelming them.
"""

# Financial question
financial_prompt = "I'm 35 years old and want to retire by 60. I currently have $50,000 saved and can contribute $1,000 monthly. What strategies should I consider?"

# Test the advanced system prompt
advanced_prompt_response = call_openrouter(
    prompt=financial_prompt,
    model="openai/gpt-4o-mini-2024-07-18",
    system_prompt=advanced_system_prompt,
    temperature=0.7,
    max_tokens=700
)

if advanced_prompt_response.get("success", False):
    financial_answer = extract_text_response(advanced_prompt_response)
    display(Markdown(f"## Advanced System Prompt Response\n\n{financial_answer}"))
else:
    print(f"Error: {advanced_prompt_response.get('error', 'Unknown error')}")

## 6. Other Parameters

There are several other parameters that can influence model outputs. Let's explore some of them:

### 6.1 Frequency Penalty and Presence Penalty

These parameters control repetition in the model's responses:

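- **Frequency Penalty**: Lowers a token's likelihood in proportion to how many times it has already appeared, discouraging verbatim repetition
- **Presence Penalty**: Applies a flat penalty to any token that has appeared at least once, nudging the model toward new words and topics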
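The difference between the two penalties is easier to see as arithmetic on token scores. The sketch below follows the adjustment described in OpenAI's API documentation (score minus count times frequency penalty, minus a flat presence penalty once a token has appeared); other providers reached through OpenRouter may implement it differently, and the logits here are made-up values.

```python
from collections import Counter

def apply_repetition_penalties(logits, generated_tokens, frequency_penalty, presence_penalty):
    """Return adjusted logits given the tokens generated so far.
    `logits` maps token -> raw score."""
    counts = Counter(generated_tokens)
    adjusted = {}
    for token, score in logits.items():
        count = counts[token]
        seen = 1 if count > 0 else 0
        # Frequency term grows with each repeat; presence term is flat once seen
        adjusted[token] = score - count * frequency_penalty - seen * presence_penalty
    return adjusted

penalty_demo_logits = {"good": 3.0, "great": 2.5, "fine": 2.0}  # made-up scores
penalty_demo_history = ["good", "good", "great"]

frequency_only = apply_repetition_penalties(penalty_demo_logits, penalty_demo_history, 1.0, 0.0)
presence_only = apply_repetition_penalties(penalty_demo_logits, penalty_demo_history, 0.0, 1.0)
print("Frequency penalty only:", frequency_only)
print("Presence penalty only:", presence_only)
```

In this toy history, "good" appeared twice, so the frequency penalty hits it twice as hard as "great", whereas the presence penalty treats both the same.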

In [None]:
def test_repetition_parameters(prompt, model):
    """Test how frequency and presence penalties affect repetition."""
    parameter_sets = [
        {"frequency_penalty": 0.0, "presence_penalty": 0.0, "label": "No penalties"},
        {"frequency_penalty": 1.0, "presence_penalty": 0.0, "label": "High frequency penalty"},
        {"frequency_penalty": 0.0, "presence_penalty": 1.0, "label": "High presence penalty"},
        {"frequency_penalty": 1.0, "presence_penalty": 1.0, "label": "Both penalties high"}
    ]
    
    results = []
    
    for params in parameter_sets:
        print(f"📝 Testing: {params['label']}")
        
        # Call the API with specified parameters
        response = call_openrouter(
            prompt=prompt,
            model=model,
            temperature=0.7,
            max_tokens=300,
            frequency_penalty=params["frequency_penalty"],
            presence_penalty=params["presence_penalty"]
        )
        
        if response.get("success", False):
            text_response = extract_text_response(response)
            
            results.append({
                "label": params['label'],
                "frequency_penalty": params["frequency_penalty"],
                "presence_penalty": params["presence_penalty"],
                "response": text_response
            })
        else:
            results.append({
                "label": params['label'],
                "frequency_penalty": params["frequency_penalty"],
                "presence_penalty": params["presence_penalty"],
                "response": f"Error: {response.get('error', 'Unknown error')}"
            })
    
    # Create DataFrame from results
    df = pd.DataFrame(results)
    return df

# Prompt designed to potentially cause repetition
repetition_prompt = "List 10 synonyms for 'good'"

# Test repetition parameters
repetition_results = test_repetition_parameters(
    prompt=repetition_prompt,
    model="openai/gpt-4o-mini-2024-07-18"
)

# Display results
for idx, row in repetition_results.iterrows():
    print(f"\n{'=' * 80}")
    print(f"{row['label']} (frequency={row['frequency_penalty']}, presence={row['presence_penalty']})")
    print(f"\nResponse:\n{row['response']}")
    print(f"{'=' * 80}")

## 7. Deterministic vs. Creative Outputs

Different tasks require different levels of creativity. Let's compare approaches for deterministic vs. creative outputs.
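One rough way to quantify where a configuration falls between deterministic and creative is to run the same prompt several times and measure how similar the responses are. The sketch below uses character-level similarity from Python's standard `difflib` module; the function name `mean_pairwise_similarity` and the sample strings are illustrative, and this metric is only a crude proxy for semantic consistency.

```python
from difflib import SequenceMatcher
from itertools import combinations

def mean_pairwise_similarity(responses):
    """Average character-level similarity (0-1) across all pairs of responses."""
    pairs = list(combinations(responses, 2))
    if not pairs:
        return 1.0  # zero or one response: nothing to disagree with
    ratios = [SequenceMatcher(None, a, b).ratio() for a, b in pairs]
    return sum(ratios) / len(ratios)

# Made-up responses standing in for repeated API calls
identical_responses = ["The CPU has an ALU.", "The CPU has an ALU."]
varied_responses = ["The CPU has an ALU.", "Plants whisper at dawn."]
print(mean_pairwise_similarity(identical_responses))
print(mean_pairwise_similarity(varied_responses))
```

Scores near 1 suggest the settings produce stable output; lower scores indicate more variation between runs.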

In [None]:
# Settings for deterministic output
deterministic_settings = {
    "temperature": 0.0,
    "top_p": 1.0,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "system_prompt": "You are a precise, deterministic assistant. Provide consistent, factual responses."
}

# Settings for creative output
creative_settings = {
    "temperature": 0.9,
    "top_p": 0.9,
    "frequency_penalty": 0.5,
    "presence_penalty": 0.5,
    "system_prompt": "You are a creative, imaginative assistant. Think outside the box and provide diverse, unique perspectives."
}

# Test prompts for different types of tasks
test_prompts = {
    "Factual": "What are the key components of a computer's CPU?",
    "Creative": "Imagine a world where humans can communicate with plants. Describe what a day might be like."
}

# Test each prompt with both settings
for prompt_type, prompt in test_prompts.items():
    print(f"\n{'=' * 100}")
    print(f"Testing {prompt_type} prompt: {prompt}")
    print(f"{'=' * 100}")
    
    # Deterministic response
    print("\nDeterministic settings:")
    deterministic_response = call_openrouter(
        prompt=prompt,
        model="openai/gpt-4o-mini-2024-07-18",
        temperature=deterministic_settings["temperature"],
        top_p=deterministic_settings["top_p"],
        frequency_penalty=deterministic_settings["frequency_penalty"],
        presence_penalty=deterministic_settings["presence_penalty"],
        system_prompt=deterministic_settings["system_prompt"],
        max_tokens=300
    )
    
    if deterministic_response.get("success", False):
        print(extract_text_response(deterministic_response))
    else:
        print(f"Error: {deterministic_response.get('error', 'Unknown error')}")
    
    # Creative response
    print("\nCreative settings:")
    creative_response = call_openrouter(
        prompt=prompt,
        model="openai/gpt-4o-mini-2024-07-18",
        temperature=creative_settings["temperature"],
        top_p=creative_settings["top_p"],
        frequency_penalty=creative_settings["frequency_penalty"],
        presence_penalty=creative_settings["presence_penalty"],
        system_prompt=creative_settings["system_prompt"],
        max_tokens=300
    )
    
    if creative_response.get("success", False):
        print(extract_text_response(creative_response))
    else:
        print(f"Error: {creative_response.get('error', 'Unknown error')}")

## 8. Parameter Selection Guide

Based on our experiments, here's a guide for choosing parameters for different types of tasks:

In [None]:
# Create a parameter selection guide as an HTML table with latest 2025 models
parameter_guide = """
<table style="width:100%; border-collapse: collapse; border: 1px solid #ddd;">
  <tr style="background-color: #f2f2f2;">
    <th style="padding: 12px; border: 1px solid #ddd;">Task Type</th>
    <th style="padding: 12px; border: 1px solid #ddd;">Temperature</th>
    <th style="padding: 12px; border: 1px solid #ddd;">Top-p</th>
    <th style="padding: 12px; border: 1px solid #ddd;">Frequency Penalty</th>
    <th style="padding: 12px; border: 1px solid #ddd;">Presence Penalty</th>
    <th style="padding: 12px; border: 1px solid #ddd;">System Prompt Focus</th>
    <th style="padding: 12px; border: 1px solid #ddd;">Recommended Models (2025)</th>
  </tr>
  <tr>
    <td style="padding: 8px; border: 1px solid #ddd;">Factual Q&A</td>
    <td style="padding: 8px; border: 1px solid #ddd;">0.0-0.3</td>
    <td style="padding: 8px; border: 1px solid #ddd;">1.0</td>
    <td style="padding: 8px; border: 1px solid #ddd;">0.0</td>
    <td style="padding: 8px; border: 1px solid #ddd;">0.0</td>
    <td style="padding: 8px; border: 1px solid #ddd;">Accuracy, precision, factuality</td>
    <td style="padding: 8px; border: 1px solid #ddd;">GPT-4o, Claude 3.5 Sonnet, Gemini 2.0 Flash</td>
  </tr>
  <tr style="background-color: #f9f9f9;">
    <td style="padding: 8px; border: 1px solid #ddd;">Code Generation</td>
    <td style="padding: 8px; border: 1px solid #ddd;">0.0-0.2</td>
    <td style="padding: 8px; border: 1px solid #ddd;">0.9-1.0</td>
    <td style="padding: 8px; border: 1px solid #ddd;">0.0</td>
    <td style="padding: 8px; border: 1px solid #ddd;">0.0</td>
    <td style="padding: 8px; border: 1px solid #ddd;">Code correctness, documentation</td>
    <td style="padding: 8px; border: 1px solid #ddd;">GPT-4o, DeepSeek R1, Claude 3.5 Sonnet</td>
  </tr>
  <tr>
    <td style="padding: 8px; border: 1px solid #ddd;">Tutorials/Explanations</td>
    <td style="padding: 8px; border: 1px solid #ddd;">0.3-0.5</td>
    <td style="padding: 8px; border: 1px solid #ddd;">1.0</td>
    <td style="padding: 8px; border: 1px solid #ddd;">0.1-0.3</td>
    <td style="padding: 8px; border: 1px solid #ddd;">0.1-0.3</td>
    <td style="padding: 8px; border: 1px solid #ddd;">Clarity, examples, step-by-step</td>
    <td style="padding: 8px; border: 1px solid #ddd;">Claude 3.5 Sonnet, GPT-4o, Gemini 1.5 Pro</td>
  </tr>
  <tr style="background-color: #f9f9f9;">
    <td style="padding: 8px; border: 1px solid #ddd;">Creative Writing</td>
    <td style="padding: 8px; border: 1px solid #ddd;">0.7-0.9</td>
    <td style="padding: 8px; border: 1px solid #ddd;">0.9-1.0</td>
    <td style="padding: 8px; border: 1px solid #ddd;">0.5-0.8</td>
    <td style="padding: 8px; border: 1px solid #ddd;">0.5-0.8</td>
    <td style="padding: 8px; border: 1px solid #ddd;">Creativity, style, tone</td>
    <td style="padding: 8px; border: 1px solid #ddd;">GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro</td>
  </tr>
  <tr>
    <td style="padding: 8px; border: 1px solid #ddd;">Brainstorming</td>
    <td style="padding: 8px; border: 1px solid #ddd;">0.8-1.0</td>
    <td style="padding: 8px; border: 1px solid #ddd;">0.9-1.0</td>
    <td style="padding: 8px; border: 1px solid #ddd;">0.8-1.0</td>
    <td style="padding: 8px; border: 1px solid #ddd;">0.8-1.0</td>
    <td style="padding: 8px; border: 1px solid #ddd;">Diverse ideas, out-of-box thinking</td>
    <td style="padding: 8px; border: 1px solid #ddd;">GPT-4o, Claude 3.5 Sonnet, Gemini 2.0 Flash</td>
  </tr>
  <tr style="background-color: #f9f9f9;">
    <td style="padding: 8px; border: 1px solid #ddd;">Summarization</td>
    <td style="padding: 8px; border: 1px solid #ddd;">0.3-0.5</td>
    <td style="padding: 8px; border: 1px solid #ddd;">0.9-1.0</td>
    <td style="padding: 8px; border: 1px solid #ddd;">0.3-0.5</td>
    <td style="padding: 8px; border: 1px solid #ddd;">0.3-0.5</td>
    <td style="padding: 8px; border: 1px solid #ddd;">Conciseness, key points</td>
    <td style="padding: 8px; border: 1px solid #ddd;">Claude 3.5 Haiku, GPT-4o mini, Gemini 1.5 Flash</td>
  </tr>
  <tr>
    <td style="padding: 8px; border: 1px solid #ddd;">Conversational Chat</td>
    <td style="padding: 8px; border: 1px solid #ddd;">0.5-0.7</td>
    <td style="padding: 8px; border: 1px solid #ddd;">0.9-1.0</td>
    <td style="padding: 8px; border: 1px solid #ddd;">0.5</td>
    <td style="padding: 8px; border: 1px solid #ddd;">0.5</td>
    <td style="padding: 8px; border: 1px solid #ddd;">Personality, engagement</td>
    <td style="padding: 8px; border: 1px solid #ddd;">Claude 3.5 Haiku, Gemini 2.0 Flash, GPT-4o mini</td>
  </tr>
  <tr style="background-color: #f9f9f9;">
    <td style="padding: 8px; border: 1px solid #ddd;">Mathematical/Reasoning</td>
    <td style="padding: 8px; border: 1px solid #ddd;">0.0-0.2</td>
    <td style="padding: 8px; border: 1px solid #ddd;">1.0</td>
    <td style="padding: 8px; border: 1px solid #ddd;">0.0</td>
    <td style="padding: 8px; border: 1px solid #ddd;">0.0</td>
    <td style="padding: 8px; border: 1px solid #ddd;">Logical reasoning, step-by-step</td>
    <td style="padding: 8px; border: 1px solid #ddd;">DeepSeek R1, GPT-4o, Claude 3.5 Sonnet</td>
  </tr>
  <tr>
    <td style="padding: 8px; border: 1px solid #ddd;">Long Document Analysis</td>
    <td style="padding: 8px; border: 1px solid #ddd;">0.3-0.5</td>
    <td style="padding: 8px; border: 1px solid #ddd;">1.0</td>
    <td style="padding: 8px; border: 1px solid #ddd;">0.2-0.4</td>
    <td style="padding: 8px; border: 1px solid #ddd;">0.2-0.4</td>
    <td style="padding: 8px; border: 1px solid #ddd;">Comprehensive analysis, context retention</td>
    <td style="padding: 8px; border: 1px solid #ddd;">Gemini 1.5 Pro, Claude 3.5 Sonnet, GPT-4o</td>
  </tr>
</table>
"""

# Display the guide
display(HTML(parameter_guide))

## 9. Real-world Application: Customized Content Generator

Let's apply what we've learned to create a customized content generator where users can select parameters based on their needs.

In [None]:
def generate_customized_content(topic, content_type, audience, tone, model="openai/gpt-4o-mini-2024-07-18"):
    """Generate customized content based on user preferences.
    
    Args:
        topic: The topic to generate content about
        content_type: Type of content (article, social post, email, etc.)
        audience: Target audience (beginners, experts, children, etc.)
        tone: Desired tone (formal, casual, humorous, etc.)
        model: Model to use
        
    Returns:
        Generated content as a string, or an "Error: ..." message string if the request fails
    """
    # Select parameters based on content type
    params = {
        "article": {"temperature": 0.4, "presence_penalty": 0.3, "frequency_penalty": 0.3},
        "social post": {"temperature": 0.7, "presence_penalty": 0.5, "frequency_penalty": 0.5},
        "email": {"temperature": 0.3, "presence_penalty": 0.2, "frequency_penalty": 0.2},
        "tutorial": {"temperature": 0.3, "presence_penalty": 0.1, "frequency_penalty": 0.1},
        "creative": {"temperature": 0.8, "presence_penalty": 0.6, "frequency_penalty": 0.6}
    }
    
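    # Fall back to default parameters if the content type is not in the presets.
    # strip() guards against stray whitespace in user input such as " Article ".
    default_params = {"temperature": 0.7, "presence_penalty": 0.0, "frequency_penalty": 0.0}
    content_params = params.get(content_type.strip().lower(), default_params)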
    
    # Create system prompt based on inputs
    # Field-style phrasing avoids a/an mismatches such as "a article" or "a informative tone"
    system_prompt = f"""
    You are an expert content creator specializing in {content_type.lower()} writing.
    Topic: {topic}
    Audience: {audience.lower()}
    Tone: {tone.lower()}
    Use formatting appropriate to this content type.
    Ensure the content is engaging, accurate, and valuable to the reader.
    """
    
    # Create user prompt
    user_prompt = f"Write {content_type.lower()} content about {topic} for {audience.lower()}, keeping the tone {tone.lower()}."
    
    # Generate content
    response = call_openrouter(
        prompt=user_prompt,
        model=model,
        system_prompt=system_prompt,
        temperature=content_params["temperature"],
        frequency_penalty=content_params["frequency_penalty"],
        presence_penalty=content_params["presence_penalty"],
        max_tokens=700
    )
    
    if response.get("success", False):
        return extract_text_response(response)
    else:
        return f"Error: {response.get('error', 'Unknown error')}"

# Example usage
topic = "artificial intelligence ethics"
content_type = "article"
audience = "college students"
tone = "informative"

print(f"Generating {tone} {content_type} about '{topic}' for {audience}...\n")
content = generate_customized_content(topic, content_type, audience, tone)
print(content)
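
The generator above fixes its parameters by content type. To let users adjust individual values, one option is to merge validated overrides on top of a preset. The sketch below is a minimal illustration, not part of the notebook's utilities: `PARAM_RANGES` and `apply_overrides` are hypothetical names, and the ranges are the ones commonly documented for OpenAI-style APIs (temperature 0 to 2, penalties -2 to 2). Individual models may enforce different limits.

```python
# Assumed valid ranges for OpenAI-style sampling parameters; models may differ.
PARAM_RANGES = {
    "temperature": (0.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
    "frequency_penalty": (-2.0, 2.0),
}


def apply_overrides(base_params, **overrides):
    """Return a copy of base_params with range-checked user overrides applied."""
    merged = dict(base_params)  # copy so the preset itself is never mutated
    for name, value in overrides.items():
        if name not in PARAM_RANGES:
            raise ValueError(f"Unsupported parameter: {name}")
        low, high = PARAM_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        merged[name] = value
    return merged


# Same values as the "article" preset in generate_customized_content
article_params = {"temperature": 0.4, "presence_penalty": 0.3, "frequency_penalty": 0.3}
print(apply_overrides(article_params, temperature=0.6))
```

The merged dictionary has the same keys as `content_params`, so it could be passed to `call_openrouter` in the same way.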

## 10. Exercises

Here are some exercises to practice what you've learned:

1. Create a function that tests different temperature values on the same prompt and calculates a "creativity score" based on lexical diversity or uniqueness of the responses

2. Design a system prompt optimization tool that tries several variations of a system prompt and evaluates which one best achieves a specific goal

3. Implement a function that dynamically adjusts model parameters based on user feedback (e.g., "make it more creative" or "make it more factual")

4. Create a specialized parameter preset for a specific domain (e.g., medical explanations, technical tutorials, or marketing copy) and test it against generic parameters

5. Implement a cost optimization function that selects the most cost-effective model for a given task based on performance metrics
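
As a possible starting point for Exercise 1, the sketch below computes lexical diversity as a type-token ratio (unique words divided by total words) and averages it over several responses. `lexical_diversity` and `creativity_score` are hypothetical names, the sample strings are made-up stand-ins and not model outputs, and the ratio is only one of many ways to define a "creativity score".

```python
import re


def lexical_diversity(text):
    """Type-token ratio: unique words divided by total words (0.0 for empty text)."""
    words = re.findall(r"[a-z']+", text.lower())
    return len(set(words)) / len(words) if words else 0.0


def creativity_score(responses):
    """Average type-token ratio across several responses to the same prompt."""
    if not responses:
        return 0.0
    return sum(lexical_diversity(r) for r in responses) / len(responses)


# Made-up stand-ins for responses collected at two temperature settings
low_temp_responses = ["The cat sat. The cat sat."]
high_temp_responses = ["A quick brown fox jumps"]

print("low temperature: ", creativity_score(low_temp_responses))
print("high temperature:", creativity_score(high_temp_responses))
```

The type-token ratio tends to fall as text gets longer, so compare responses generated with the same `max_tokens`, or truncate them to a common length first.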

## 11. Summary

In this notebook, we've explored:

- How to compare different models for capability and cost-effectiveness
- The impact of temperature on model creativity and determinism
- How top-p (nucleus sampling) affects token selection
- The powerful effect of system prompts on model behavior
- Using frequency and presence penalties to reduce repetition
- Parameter selection guidelines for different types of tasks
- Building a customized content generator with appropriate parameters

By understanding and effectively using these parameters, you can get the most out of LLMs for different use cases, optimizing for quality, cost, or creativity as needed.