# Provider Comparison

This notebook compares different LLM providers side-by-side to help you choose the right one for your use case.

## Providers Compared

| Provider | Strengths | Best For |
|----------|-----------|----------|
| **OpenAI** | Quality, features | General use, function calling |
| **Anthropic** | Reasoning, instructions | Complex analysis, writing |
| **Gemini** | Free tier, multimodal | Experimentation, images |
| **Groq** | Speed | Real-time applications |
| **Ollama** | Privacy, cost | Local/offline use |

## Setup

Install the package and configure API keys. Ollama runs locally, needs no API key, and is not included in the live comparisons below.

In [None]:
# Install the package
!pip install -q git+https://github.com/deepakdeo/python-llm-playbook.git

In [None]:
# Setup API Keys from Colab Secrets
import os
from google.colab import userdata

# Add your keys in the Secrets pane (🔑 icon in the left sidebar)
keys = ['OPENAI_API_KEY', 'ANTHROPIC_API_KEY', 'GOOGLE_API_KEY', 'GROQ_API_KEY']

for key in keys:
    try:
        os.environ[key] = userdata.get(key)
        print(f"{key}: ✓")
    except Exception:
        # Missing secret or notebook access not granted; skip this provider
        print(f"{key}: ✗ (not found)")
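Outside Colab, `google.colab` is not importable, so the cell above will fail. A minimal sketch of an alternative, assuming the keys have already been exported as ordinary environment variables; `REQUIRED_KEYS` and `key_status` are illustrative names, not part of the package:

```python
import os

REQUIRED_KEYS = ['OPENAI_API_KEY', 'ANTHROPIC_API_KEY', 'GOOGLE_API_KEY', 'GROQ_API_KEY']

def key_status(names, env=None):
    """Map each key name to True if it is set to a non-empty value."""
    env = os.environ if env is None else env
    return {name: bool(env.get(name)) for name in names}

# Report which keys are available in the current environment
for name, present in key_status(REQUIRED_KEYS).items():
    print(f"{name}: {'✓' if present else '✗ (not found)'}")
```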

In [None]:
import time
from llm_playbook import OpenAIClient, AnthropicClient, GeminiClient, GroqClient

# Initialize each client; a provider whose setup fails (e.g. missing API key) is skipped
client_classes = {
    'OpenAI': OpenAIClient,
    'Anthropic': AnthropicClient,
    'Gemini': GeminiClient,
    'Groq': GroqClient,
}

providers = {}
for name, cls in client_classes.items():
    try:
        providers[name] = cls()
        print(f"{name}: Ready")
    except Exception as e:
        print(f"{name}: Failed - {e}")

## 1. Same Prompt, Different Providers

Let's send the same prompt to all providers and compare responses.

In [None]:
prompt = "Explain recursion in one sentence."

print(f"Prompt: {prompt}")
print("=" * 60)

for name, client in providers.items():
    try:
        response = client.chat(prompt)
        print(f"\n{name}:")
        print(f"  {response}")
    except Exception as e:
        print(f"\n{name}: Error - {e}")

## 2. Speed Comparison

Compare response times across providers.

In [None]:
prompt = "What is Python? Answer in exactly 2 sentences."

print(f"Prompt: {prompt}")
print("=" * 60)

times = {}

for name, client in providers.items():
    try:
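        # perf_counter is a monotonic, high-resolution clock suited to timing
        start = time.perf_counter()
        response = client.chat(prompt)
        elapsed = time.perf_counter() - start
        times[name] = elapsed
        print(f"\n{name} ({elapsed:.2f}s):")
        # Truncate long responses to keep the output readable
        preview = response[:150] + "..." if len(response) > 150 else response
        print(f"  {preview}")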
    except Exception as e:
        print(f"\n{name}: Error - {e}")

# Show ranking
if times:
    print("\n" + "=" * 60)
    print("Speed Ranking (fastest first):")
    for i, (name, t) in enumerate(sorted(times.items(), key=lambda x: x[1]), 1):
        print(f"  {i}. {name}: {t:.2f}s")
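A single request per provider is a noisy measurement, since network jitter and server load can dominate one sample. A minimal sketch of a steadier measurement that takes the median over a few runs; `median_latency` is an illustrative helper, and the `time.sleep` call is a stand-in workload so the cell runs without API keys:

```python
import statistics
import time

def median_latency(call, runs=3):
    """Invoke `call()` `runs` times and return the median wall-clock seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Stand-in workload; replace with a real request when clients are available
print(f"{median_latency(lambda: time.sleep(0.01)):.3f}s")
```

With the clients from this notebook, the call would look like `median_latency(lambda: client.chat(prompt))`. Each run is a billable request, so keep `runs` small.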

## 3. Following Instructions

Test how well each provider follows specific formatting instructions.

In [None]:
prompt = "List exactly 3 programming languages. Respond only in JSON format, no markdown."

print(f"Prompt: {prompt}")
print("=" * 60)

for name, client in providers.items():
    try:
        response = client.chat(prompt, temperature=0.0)
        print(f"\n{name}:")
        print(f"  {response}")
    except Exception as e:
        print(f"\n{name}: Error - {e}")
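Reading the raw output by eye makes it easy to miss a response that is almost JSON. A minimal sketch of an automatic check, assuming the response is a plain string; `parse_json_response` is an illustrative helper that reports both whether the text parses and whether the model wrapped it in a markdown fence despite the instruction:

```python
import json

FENCE = "`" * 3  # markdown code fence, built this way to keep the example readable

def parse_json_response(text):
    """Return (parsed, had_fence); parsed is None when the text is not valid JSON."""
    cleaned = text.strip()
    had_fence = cleaned.startswith(FENCE)
    if had_fence:
        # Drop the opening fence line and, if present, the closing fence line
        lines = cleaned.splitlines()[1:]
        if lines and lines[-1].strip().startswith(FENCE):
            lines = lines[:-1]
        cleaned = "\n".join(lines)
    try:
        return json.loads(cleaned), had_fence
    except json.JSONDecodeError:
        return None, had_fence

print(parse_json_response('["Python", "Go", "Rust"]'))
print(parse_json_response("Sure! Here are three languages: Python, Go, Rust"))
```

A provider followed the instruction fully only if `parsed` is not `None` and `had_fence` is `False`.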

## 4. Creative Writing

Compare creative output at a higher temperature (1.0), which makes sampling more varied.

In [None]:
prompt = "Write a haiku about artificial intelligence."

print(f"Prompt: {prompt}")
print("=" * 60)

for name, client in providers.items():
    try:
        response = client.chat(prompt, temperature=1.0)
        print(f"\n{name}:")
        print(f"  {response}")
    except Exception as e:
        print(f"\n{name}: Error - {e}")

## 5. System Prompt Adherence

Test how well providers follow system prompts.

In [None]:
system_prompt = "You are a pirate. Always respond in pirate speak with 'Arrr' and nautical terms."
user_prompt = "Tell me about the weather today."

print(f"System: {system_prompt}")
print(f"User: {user_prompt}")
print("=" * 60)

for name, client in providers.items():
    try:
        response = client.chat(user_prompt, system_prompt=system_prompt)
        print(f"\n{name}:")
        print(f"  {response}")
    except Exception as e:
        print(f"\n{name}: Error - {e}")
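Adherence can also be scored roughly instead of judged by eye. The sketch below is a crude keyword heuristic, not a real evaluation: `PIRATE_MARKERS` and `sounds_like_pirate` are hypothetical names, and the marker list is an assumption based on the system prompt above.

```python
# Hypothetical marker list; extend it to suit your own system prompt
PIRATE_MARKERS = ("arrr", "ahoy", "matey")

def sounds_like_pirate(response, markers=PIRATE_MARKERS):
    """Crude check: does the response contain any marker (case-insensitive)?"""
    lowered = response.lower()
    return any(marker in lowered for marker in markers)

print(sounds_like_pirate("Arrr, the skies be clear, matey!"))
print(sounds_like_pirate("I don't have access to live weather data."))
```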

## 6. Reasoning Task

Compare reasoning capabilities with a logic puzzle. The correct answer is no: the flowers that fade quickly need not include any roses.

In [None]:
prompt = """If all roses are flowers, and some flowers fade quickly, 
can we conclude that some roses fade quickly? Explain briefly."""

print(f"Prompt: {prompt}")
print("=" * 60)

for name, client in providers.items():
    try:
        response = client.chat(prompt, temperature=0.0, max_tokens=150)
        print(f"\n{name}:")
        print(f"  {response}")
    except Exception as e:
        print(f"\n{name}: Error - {e}")

## 7. When to Use Each Provider

Based on the comparisons above, here's a guide:

### OpenAI (GPT-4o, GPT-4o-mini)
**Best for:**
- General-purpose tasks
- Function calling and structured outputs
- When you need reliability and consistency

### Anthropic (Claude)
**Best for:**
- Complex reasoning tasks
- Following detailed instructions
- Long-form content and analysis
- When accuracy matters most

### Google Gemini
**Best for:**
- Budget-conscious development (generous free tier)
- Multimodal tasks (images, video, audio)
- Google ecosystem integration
- Experimentation and prototyping

### Groq
**Best for:**
- Real-time applications
- High-throughput processing
- When speed is critical
- Fast inference on open-source models

### Ollama (Local)
**Best for:**
- Privacy-sensitive data
- Offline use
- Avoiding API costs
- Development and testing

## 8. Pricing Comparison (Approximate)

| Provider | Model | Input (per 1M tokens) | Output (per 1M tokens) |
|----------|-------|----------------------|------------------------|
| OpenAI | GPT-4o-mini | $0.15 | $0.60 |
| OpenAI | GPT-4o | $2.50 | $10.00 |
| Anthropic | Claude 3 Haiku | $0.25 | $1.25 |
| Anthropic | Claude Sonnet | $3.00 | $15.00 |
| Gemini | Flash | Free tier / $0.075 | Free tier / $0.30 |
| Gemini | Pro | $1.25 | $5.00 |
| Groq | Llama 3.3 70B | Free tier | Free tier |
| Ollama | Any | Free (local) | Free (local) |

*Prices subject to change. Check provider websites for current pricing.*
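To turn the per-token prices into a budget figure, multiply each rate by the expected token volume. A minimal sketch using the paid rates from the table above; the numbers are copied from the table and may be out of date, and `PRICES` and `estimate_cost` are illustrative names:

```python
# USD per 1M tokens as (input, output), copied from the table above
PRICES = {
    "GPT-4o-mini": (0.15, 0.60),
    "GPT-4o": (2.50, 10.00),
    "Claude 3 Haiku": (0.25, 1.25),
    "Claude Sonnet": (3.00, 15.00),
    "Gemini Flash": (0.075, 0.30),
    "Gemini Pro": (1.25, 5.00),
}

def estimate_cost(model, input_tokens, output_tokens):
    """Estimated USD cost for the given token counts."""
    input_rate, output_rate = PRICES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Example workload: 2M input tokens and 500K output tokens
for model in PRICES:
    print(f"{model}: ${estimate_cost(model, 2_000_000, 500_000):.2f}")
```

Output tokens cost several times more than input tokens for every paid model listed, so workloads with long responses are priced mostly by output volume.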

## Summary

Key takeaways:

1. **No single best provider** - Each has strengths for different use cases
2. **Groq is usually fastest** - Confirm with the speed ranking from section 2, since latency varies with model, load, and network
3. **Claude excels at reasoning** - Best for complex instructions
4. **Gemini is most affordable** - Generous free tier
5. **Ollama is free** - But requires local hardware
6. **OpenAI is most balanced** - Solid across general tasks, function calling, and structured outputs

## Tips for Choosing

1. **Start with Gemini** for free experimentation
2. **Use Groq** for real-time applications
3. **Use Claude** for complex analysis and writing
4. **Use OpenAI** for production reliability
5. **Use Ollama** for privacy or offline needs