# Model Comparison: Local vs Frontier Models

**Goal:** Compare local and frontier LLMs on the same task to understand their tradeoffs

**Models tested:**
- Local (Ollama): llama3.2:1b, llama3.2, deepseek-r1:1.5b
- Frontier (OpenAI): gpt-4o-mini

**Focus:** Quality, speed, cost, privacy tradeoffs


In [None]:
import requests
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI
import time
import os
from dotenv import load_dotenv

# Test content (same for all models)
test_url = "https://ollama.com"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

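# Scrape test content once (the timeout keeps a slow server from hanging the cell)
response = requests.get(test_url, headers=headers, timeout=10)
response.raise_for_status()  # fail loudly on 4xx/5xx instead of summarizing an error page
soup = BeautifulSoup(response.content, "html.parser")
if soup.body:
    # Drop tags that contribute no readable text
    for irrelevant in soup.body(["script", "style", "img", "input"]):
        irrelevant.decompose()
    # Keep only the first 2000 characters to keep the prompt short
    test_content = soup.body.get_text(separator="\n", strip=True)[:2000]
else:
    test_content = ""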

print(f"Test content length: {len(test_content)} chars")
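
Slicing with `[:2000]` can cut the text in the middle of a word or sentence, which gives the models a ragged ending to summarize. A minimal sketch of a boundary-aware alternative follows; `truncate_text` is a hypothetical helper, not part of the original notebook, and it assumes the text uses `\n` between blocks, as `get_text(separator="\n")` produces.

```python
def truncate_text(text: str, limit: int = 2000) -> str:
    """Cut text to at most `limit` characters, preferring to end at a line break."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # Back off to the last newline so no line is split part-way through
    last_newline = cut.rfind("\n")
    if last_newline > 0:
        return cut[:last_newline]
    # No newline in range: fall back to the last space, else keep the hard cut
    last_space = cut.rfind(" ")
    return cut[:last_space] if last_space > 0 else cut
```

Usage: `test_content = truncate_text(soup.body.get_text(separator="\n", strip=True))`. The result can be slightly shorter than 2000 characters, but it never exceeds the limit.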


## Test Prompt

Every model receives the same system and user prompt, so differences in output reflect the model rather than the input.


In [None]:
system_prompt = "You are an assistant that provides concise summaries. Respond in markdown."
user_prompt = f"Summarize this website content:\n\n{test_content}"

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
]


## Local Models (Ollama)


In [None]:
OLLAMA_BASE_URL = "http://localhost:11434/v1"
ollama = OpenAI(base_url=OLLAMA_BASE_URL, api_key='ollama')

local_models = {
    "llama3.2:1b": "Fastest, smallest (1B params)",
    "llama3.2": "Balanced (3B params)",
    "deepseek-r1:1.5b": "Reasoning model, thinks before answering (1.5B params, distilled from DeepSeek-R1)"
}

results_local = {}

for model_name, description in local_models.items():
    print(f"\n{'='*60}")
    print(f"Testing: {model_name}")
    print(f"Description: {description}")
    print(f"{'='*60}")
    
    try:
        start = time.perf_counter()  # monotonic clock, suited to measuring durations
        response = ollama.chat.completions.create(
            model=model_name,
            messages=messages
        )
        elapsed = time.perf_counter() - start
        
        summary = response.choices[0].message.content
        results_local[model_name] = {
            "summary": summary,
            "time": elapsed,
            "tokens": getattr(response.usage, 'total_tokens', 'N/A')
        }
        
        print(f"Time: {elapsed:.2f}s")
        print(f"Tokens: {results_local[model_name]['tokens']}")
        display(Markdown(summary))
        
    except Exception as e:
        print(f"Error: {e}")
        results_local[model_name] = {"error": str(e)}
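
Each model is timed on a single request, and the first request to a model that Ollama has not yet loaded also pays the cost of loading its weights into memory. A sketch of a fairer measurement follows: run one untimed warm-up call, then take the median of several timed calls. `time_call` and its `runs`/`warmup` parameters are hypothetical names chosen for this example.

```python
import statistics
import time
from typing import Callable


def time_call(fn: Callable[[], object], runs: int = 3, warmup: int = 1) -> float:
    """Return the median wall-clock time, in seconds, of `runs` calls to fn.

    Warm-up calls run first and are not timed, so one-off costs such as
    loading model weights don't land in the measurement.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        fn()  # untimed warm-up call
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    # The median is less sensitive than the mean to a single slow outlier
    return statistics.median(timings)
```

Usage inside the loop: `elapsed = time_call(lambda: ollama.chat.completions.create(model=model_name, messages=messages))`. Every warm-up and timed run is a full request, so applying this to the OpenAI model multiplies its token cost by `warmup + runs`.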


## Frontier Model (OpenAI) - Optional


In [None]:
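load_dotenv()
openai_key = os.getenv('OPENAI_API_KEY')
results_frontier = {}

if openai_key:
    print(f"\n{'='*60}")
    print("Testing: gpt-4o-mini (Frontier)")
    print("Description: OpenAI frontier model, cloud-based")
    print(f"{'='*60}")
    
    try:
        openai_client = OpenAI(api_key=openai_key)
        start = time.perf_counter()
        response = openai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages
        )
        elapsed = time.perf_counter() - start  # includes network round-trip
        
        summary = response.choices[0].message.content
        # Same shape as results_local, so both can be compared side by side
        results_frontier["gpt-4o-mini"] = {
            "summary": summary,
            "time": elapsed,
            "tokens": response.usage.total_tokens
        }
        print(f"Time: {elapsed:.2f}s")
        print(f"Tokens: {response.usage.total_tokens}")
        display(Markdown(summary))
        
    except Exception as e:
        print(f"Error: {e}")
        results_frontier["gpt-4o-mini"] = {"error": str(e)}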
else:
    print("OpenAI API key not found - skipping frontier model comparison")
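
The cells print timings one model at a time, which makes them hard to compare. A minimal sketch for collecting them into one markdown table follows; `comparison_table` is a hypothetical helper, and it assumes each entry looks like the ones in `results_local`: either `{"time": ..., "tokens": ...}` or `{"error": ...}`.

```python
def comparison_table(results: dict) -> str:
    """Render {model: {"time": s, "tokens": n} or {"error": msg}} as a markdown table."""
    lines = [
        "| Model | Time (s) | Total tokens | Status |",
        "|-------|----------|--------------|--------|",
    ]
    for model, result in results.items():
        if "error" in result:
            # Escape pipes and flatten newlines so the message can't break the table
            message = str(result["error"]).replace("|", "\\|").replace("\n", " ")
            lines.append(f"| {model} | - | - | Error: {message} |")
        else:
            lines.append(f"| {model} | {result['time']:.2f} | {result.get('tokens', 'N/A')} | OK |")
    return "\n".join(lines)
```

Usage: `display(Markdown(comparison_table(results_local)))`. Other result dicts in the same shape can be merged in first with `{**results_local, **other_results}`.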


## Comparison Summary

**Key Tradeoffs:**

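| Model Type | Quality | Cost | Data location | Speed | Use Case |
|------------|---------|------|---------------|-------|----------|
| Frontier (OpenAI) | High | Paid per token | Sent to a cloud API | Fast, plus network latency | Production, quality-critical |
| Local (Ollama) | Lower for the 1B-3B models tested | No per-token fees; runs on your hardware | Stays on your machine | Depends on hardware and model size | Privacy, experimentation, cost-sensitive |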
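
Because the frontier model is billed per token, a rough per-request cost needs only the token counts the API reports (`response.usage.prompt_tokens` and `response.usage.completion_tokens` in the OpenAI Python SDK) and a price per million tokens. The sketch below deliberately leaves prices as parameters; take current gpt-4o-mini rates from OpenAI's pricing page rather than hard-coding them. `estimate_cost_usd` is a hypothetical helper.

```python
def estimate_cost_usd(prompt_tokens: int, completion_tokens: int,
                      input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate one request's cost in USD from token counts.

    Prices are per million tokens and are supplied by the caller, since
    input and output tokens are usually billed at different rates.
    """
    return (prompt_tokens * input_price_per_m
            + completion_tokens * output_price_per_m) / 1_000_000
```

For example, 500,000 input tokens at 1.00 per million plus 250,000 output tokens at 4.00 per million comes to 0.50 + 1.00 = 1.50 (illustrative prices only).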

**Model Size Tradeoffs:**
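- Smaller models (1B): faster and lighter on memory, but less capable
- Larger models (3B+): more capable, but slower and need more memory
- Reasoning models (e.g. deepseek-r1): generate intermediate reasoning before the answer, which can help on multi-step problems but adds output tokens and latency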
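
deepseek-r1 models typically wrap their reasoning in `<think>...</think>` tags ahead of the final answer, so the raw summary contains more than the summary itself. Depending on the Ollama version and settings, the reasoning may instead be returned separately. A minimal sketch that separates the two follows; it assumes the tag format above, and `split_reasoning` is a hypothetical helper.

```python
import re
from typing import Tuple

# Assumption: reasoning is wrapped in <think>...</think>; DOTALL lets "." span newlines
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", flags=re.DOTALL)


def split_reasoning(text: str) -> Tuple[str, str]:
    """Split raw model output into (reasoning, answer)."""
    # findall returns the captured text inside each <think> block
    reasoning = "\n".join(part.strip() for part in THINK_BLOCK.findall(text))
    # Remove the blocks entirely, leaving only the final answer
    answer = THINK_BLOCK.sub("", text).strip()
    return reasoning, answer
```

Usage, if the deepseek-r1 run succeeded: `reasoning, answer = split_reasoning(results_local["deepseek-r1:1.5b"]["summary"])`, then `display(Markdown(answer))`. Output without tags passes through unchanged, with an empty reasoning string.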

**Pattern:** Choose a model by deciding which requirement dominates: quality, cost, privacy, or speed.
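
The tradeoffs table can be read as a small decision rule. The sketch below encodes one possible reading; `choose_model`, its flags, and the order of the rules are hypothetical illustrations, not a recommendation beyond what the table states.

```python
def choose_model(needs_privacy: bool, quality_critical: bool,
                 has_api_key: bool, low_memory: bool = False) -> str:
    """Pick a model name from the tradeoffs table (illustrative rules only)."""
    local_choice = "llama3.2:1b" if low_memory else "llama3.2"
    if needs_privacy or not has_api_key:
        # Data must stay on this machine, or no cloud access: use a local model
        return local_choice
    if quality_critical:
        # Quality outweighs per-token cost: use the frontier model
        return "gpt-4o-mini"
    # Cost-sensitive default: local model with no per-token fees
    return local_choice
```

Privacy is checked first because it is the one requirement that rules out the cloud option entirely; the remaining choice is between quality and cost.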
