# Model Providers: Running AI Locally with Ollama

**Free, Private, and Powerful: Your Agents on Your Machine**

---

Welcome to the world of **local AI models**! This notebook demonstrates how to run powerful AI agents entirely on your own machine using Ollama and the Strands Agents SDK. By the end of this 10-minute tutorial, you'll be able to create agents that run completely offline, cost nothing after setup, and keep your data 100% private.

### 🎯 What You'll Learn

In this hands-on tutorial, you will:
- Set up Ollama for local model execution
- Create Strands agents using local models
- Compare different open-source models
- Understand when to use local vs cloud models
- Build privacy-first AI applications
- Save money on AI development

### 🏠 Why Run Models Locally?

Running AI models locally offers several advantages:
- **💰 Cost**: Zero API fees after initial setup
- **🔒 Privacy**: Your data never leaves your machine
- **⚡ Speed**: No network round trip for API calls (generation speed depends on your hardware)
- **🌐 Offline**: Works without an internet connection once the models are downloaded
- **🛠️ Control**: Full control over model behavior

## 📦 Pre-Setup: Installing Ollama and Required Packages

### 🚀 Installing Ollama (One-Time Setup)

Before we begin, you need to install Ollama on your system. This is a one-time setup that enables local AI model execution. Choose your platform below:

#### 🪟 Windows Installation
1. **Download the Installer**
   - Visit [ollama.com](https://ollama.com)
   - Click "Download for Windows"
   - Save the `.exe` installer

2. **Run the Installer**
   - Double-click the downloaded file
   - Follow the installation wizard
   - Ollama will run in the background (look for its icon in the system tray)

3. **Verify Installation**
   - Open a new Command Prompt or PowerShell
   - Type: `ollama --version`
   - You should see the version number

#### 🍎 macOS Installation

**Option 1: Using Homebrew (Recommended)**
```bash
# If you have Homebrew installed:
brew install ollama

# Start Ollama:
ollama serve
```

**Option 2: Direct Download**
1. Visit [ollama.com](https://ollama.com)
2. Click "Download for macOS"
3. Open the downloaded `.dmg` file
4. Drag Ollama to your Applications folder
5. Launch Ollama from Applications
6. You'll see the Ollama icon in your menu bar

#### 🐧 Linux Installation

**One-Line Install (Recommended)**
```bash
curl -fsSL https://ollama.com/install.sh | sh
```

This script will:
- Download the latest Ollama binary
- Install it to `/usr/local/bin`
- Set up systemd service (on supported systems)
- Start the Ollama service

**Manual Installation**
```bash
# Download the binary
sudo curl -L https://ollama.com/download/ollama-linux-amd64 -o /usr/local/bin/ollama

# Make it executable
sudo chmod +x /usr/local/bin/ollama

# Start Ollama
ollama serve
```

### ⏱️ Installation Time
- Download: 1-2 minutes (depending on internet speed)
- Installation: 1-2 minutes
- First model download: 3-5 minutes

**Note**: This setup time is NOT included in our 10-minute tutorial!

### 🐍 Installing Python Dependencies

Now let's install the Strands SDK with Ollama support:

In [None]:
# Install Strands with Ollama support (the "ollama" extra pulls in the Ollama client)
%pip install "strands-agents[ollama]" -q

print("✅ Strands SDK installed successfully!")

## 🔍 Step 1: Checking Ollama Installation

### Smart Installation Checker
This cell will check if Ollama is installed on your system and provide platform-specific installation instructions if needed.

### ⏱️ Time Note
If you need to install Ollama, this is a one-time setup that doesn't count toward our 10-minute tutorial time!

In [None]:
import platform
import subprocess
import sys
import os

def check_ollama_installation():
    """Check if Ollama is installed and provide installation instructions if not."""
    try:
        # Try to run ollama version command
        result = subprocess.run(['ollama', '--version'], 
                              capture_output=True, text=True, 
                              shell=(platform.system() == 'Windows'))
        
        if result.returncode == 0:
            print("✅ Great! Ollama is installed!")
            print(f"   Version: {result.stdout.strip()}")
            return True
        else:
            raise RuntimeError("Ollama command failed")
            
    except (OSError, RuntimeError):  # OSError covers a missing executable
        print("❌ Ollama is not installed on your system.")
        print("\n📋 Installation Instructions:\n")
        
        system = platform.system()
        
        if system == "Darwin":  # macOS
            print("🍎 macOS detected. You have two options:\n")
            print("Option 1 - Using Homebrew (recommended):")
            print("   brew install ollama\n")
            print("Option 2 - Direct download:")
            print("   1. Visit https://ollama.com")
            print("   2. Download the macOS installer")
            print("   3. Run the installer")
            print("   4. Start Ollama from your Applications folder")
            
        elif system == "Linux":
            print("🐧 Linux detected. Install with this command:\n")
            print("   curl -fsSL https://ollama.com/install.sh | sh\n")
            print("After installation, Ollama will run as a service.")
            
        elif system == "Windows":
            print("🪟 Windows detected. Installation steps:\n")
            print("   1. Visit https://ollama.com")
            print("   2. Download the Windows installer")
            print("   3. Run the downloaded .exe file")
            print("   4. Follow the installation wizard")
            print("   5. Ollama will run in the background (system tray)")
            
        else:
            print(f"⚠️  Unknown system: {system}")
            print("   Visit https://ollama.com for installation instructions")
        
        print("\n⏸️  After installing Ollama, restart this notebook and continue!")
        return False

# Check installation
ollama_installed = check_ollama_installation()

if not ollama_installed:
    print("\n⚡ Quick tip: Installation usually takes just 2-3 minutes!")
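
The check above only confirms that the `ollama` command is available. Agents talk to the Ollama server over HTTP, so it can also help to confirm the server is actually running. The sketch below assumes the default address `http://localhost:11434` and uses Ollama's `/api/version` endpoint; the function name `ollama_server_version` is made up for this example.

```python
import json
import urllib.error
import urllib.request


def ollama_server_version(host="http://localhost:11434", timeout=2.0):
    """Return the server's version string, or None if it cannot be reached."""
    try:
        with urllib.request.urlopen(f"{host}/api/version", timeout=timeout) as resp:
            return json.load(resp).get("version")
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a response that is not JSON
        return None


version = ollama_server_version()
if version:
    print(f"Ollama server is running (version {version})")
else:
    print("No Ollama server reachable; run `ollama serve` or launch the Ollama app")
```

If this reports no server even though the CLI check passed, starting the server usually resolves it.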

## 🤖 Step 2: Downloading AI Models

### Smart Model Management
We'll download three popular models for our demonstrations. Don't worry - if you already have them, we won't download them again!

### 📊 Models We'll Use:
1. **Llama 3.2 (3B)** - Latest from Meta, great all-around model
2. **Mistral (7B)** - Excellent for coding and technical tasks
3. **Phi-3 Mini (3.8B)** - Microsoft's efficient model, great for quick responses

In [None]:
def check_and_pull_model(model_name):
    """Check if a model is installed, pull if not."""
    try:
        # Get list of installed models
        result = subprocess.run(['ollama', 'list'], 
                              capture_output=True, text=True,
                              shell=(platform.system() == 'Windows'))
        
        if result.returncode != 0:
            print(f"❌ Error checking models: {result.stderr}")
            return False
            
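        # Compare full names from the first column (skipping the header row);
        # a bare name such as "mistral" refers to the "mistral:latest" tag
        installed_models = {line.split()[0].lower()
                            for line in result.stdout.splitlines()[1:] if line.strip()}
        wanted = model_name.lower() if ':' in model_name else f"{model_name.lower()}:latest"
        
        # Check if model is already installed (exact match, not substring)
        if wanted in installed_models: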
            print(f"✅ {model_name} is already installed!")
            return True
        else:
            print(f"📥 Downloading {model_name}... (this may take a few minutes)")
            
            # Pull the model
            result = subprocess.run(['ollama', 'pull', model_name],
                                  shell=(platform.system() == 'Windows'))
            
            if result.returncode == 0:
                print(f"✅ {model_name} downloaded successfully!")
                return True
            else:
                print(f"❌ Failed to download {model_name}")
                return False
                
    except Exception as e:
        print(f"❌ Error with {model_name}: {e}")
        return False

# Models to use in this tutorial
models_to_install = [
    "llama3.2",     # Meta's latest, 3B parameters
    "mistral",      # Great for coding, 7B parameters
    "phi3:mini"     # Microsoft's efficient model, 3.8B parameters
]

print("🚀 Setting up AI models for local execution...\n")

# Check and install each model
all_models_ready = True
for model in models_to_install:
    if not check_and_pull_model(model):
        all_models_ready = False
    print()  # Add spacing

if all_models_ready:
    print("🎉 All models are ready! Let's create some agents!")
else:
    print("⚠️  Some models couldn't be installed. You can continue with the available ones.")

## 📋 Step 3: Viewing Available Models

Let's see what models are available on your system.

In [None]:
# List all available models
print("📋 Available Local Models:")
print("=" * 50)

try:
    result = subprocess.run(['ollama', 'list'], 
                          capture_output=True, text=True,
                          shell=(platform.system() == 'Windows'))
    
    if result.returncode == 0:
        print(result.stdout)
    else:
        print("❌ Could not list models")
except Exception as e:
    print(f"❌ Error: {e}")

print("\n💡 Tip: Sizes shown are for the quantized weights on disk. Expect RAM use somewhat above that once a model and its context are loaded.")
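
The same list is available programmatically from the Ollama server, which avoids parsing CLI text. This is a minimal sketch that assumes the server is at the default `http://localhost:11434` and reads the `models` array returned by `/api/tags`; the helper names `fetch_tags` and `summarize_models` are illustrative.

```python
import json
import urllib.error
import urllib.request


def summarize_models(tags_payload):
    """Turn an /api/tags response into (name, size in GB) pairs, largest first."""
    models = tags_payload.get("models", [])
    pairs = [(m["name"], m.get("size", 0) / 1e9) for m in models]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def fetch_tags(host="http://localhost:11434", timeout=2.0):
    """Fetch the installed-model list, or an empty payload if the server is down."""
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return {"models": []}


for name, size_gb in summarize_models(fetch_tags()):
    print(f"{name:<30} {size_gb:6.1f} GB")
```

Sorting by size makes it easy to spot which models will need the most memory.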

## 🚀 Step 4: Creating Your First Local Agent

### From Cloud to Local
Creating an agent with Ollama is just as easy as with cloud providers. The main difference? It's FREE and PRIVATE!

### 🔄 Provider Comparison
```python
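# Cloud (AWS Bedrock)
from strands.models import BedrockModel
model = BedrockModel(model_id="claude-3")  # placeholder; use a full Bedrock model ID

# Local (Ollama): needs the address of the local Ollama server
from strands.models.ollama import OllamaModel
model = OllamaModel(host="http://localhost:11434", model_id="llama3.2")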
```

In [None]:
from strands import Agent
from strands.models.ollama import OllamaModel

# Create a local agent with Llama 3.2 (Ollama listens on port 11434 by default)
local_agent = Agent(
    model=OllamaModel(host="http://localhost:11434", model_id="llama3.2"),
    system_prompt="You are a helpful assistant running locally. Be concise and friendly."
)

print("🎉 Your first local AI agent is ready!")
print("   Model: Llama 3.2 (3B parameters)")
print("   Location: Running entirely on your machine")
print("   Cost: $0.00")
print("   Privacy: 100% - No data leaves your computer")
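
One way to keep the provider choice out of your agent code is to select it from configuration. The sketch below is only an illustration: the function names and the environment variables (`OLLAMA_HOST_URL`, `LOCAL_MODEL_ID`, `BEDROCK_MODEL_ID`) are hypothetical, and it assumes `OllamaModel` lives in `strands.models.ollama` and accepts `host` and `model_id`.

```python
import importlib
import os


def model_settings(provider):
    """Return (module path, class name, constructor kwargs) for a provider.

    The environment variable names and defaults here are illustrative.
    """
    if provider == "ollama":
        return (
            "strands.models.ollama",
            "OllamaModel",
            {
                "host": os.environ.get("OLLAMA_HOST_URL", "http://localhost:11434"),
                "model_id": os.environ.get("LOCAL_MODEL_ID", "llama3.2"),
            },
        )
    if provider == "bedrock":
        # No default on purpose: the Bedrock model ID must be supplied explicitly
        return ("strands.models", "BedrockModel",
                {"model_id": os.environ["BEDROCK_MODEL_ID"]})
    raise ValueError(f"Unknown provider: {provider!r}")


def build_model(provider):
    """Import the provider class lazily so unused providers need no extra packages."""
    module_path, class_name, kwargs = model_settings(provider)
    model_class = getattr(importlib.import_module(module_path), class_name)
    return model_class(**kwargs)


module_path, class_name, kwargs = model_settings("ollama")
print(f"{class_name} from {module_path} with {kwargs}")
```

With Strands installed, `Agent(model=build_model("ollama"))` would then create the same kind of local agent as the cell above.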

## 💬 Step 5: Your First Local Conversation

Let's test our local agent! Notice how it responds just like cloud-based models, but everything happens on your machine.

In [None]:
# Test the local agent
import time

question = "What are the benefits of running AI models locally?"

print(f"👤 You: {question}")
print("\n🤖 Local Llama 3.2 Agent:")
print("-" * 50)

start_time = time.time()
response = local_agent(question)
end_time = time.time()

print(response)
print("-" * 50)
print(f"\n⏱️  Response time: {end_time - start_time:.2f} seconds")
print("💡 Note: First response may be slower as the model loads into memory.")
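
To see the model-loading effect rather than take it on faith, you can time the first call separately from a few follow-up calls. This is a small sketch with made-up helper names; `fake_agent` is a stand-in so the code runs without Ollama, and you would pass `local_agent` in its place.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Call fn and return (result, elapsed seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def compare_cold_and_warm(fn, prompt, warm_runs=3):
    """Time one first call, then average a few follow-up calls."""
    _, cold = timed_call(fn, prompt)
    warm = [timed_call(fn, prompt)[1] for _ in range(warm_runs)]
    return {"cold_s": cold, "warm_avg_s": sum(warm) / len(warm)}


# Stand-in for local_agent so the sketch runs anywhere
def fake_agent(prompt):
    time.sleep(0.01)
    return f"echo: {prompt}"


print(compare_cold_and_warm(fake_agent, "hello"))
```

`time.perf_counter()` is used instead of `time.time()` because it is monotonic and meant for measuring intervals. If the agent keeps conversation history, later calls carry more context, so treat the warm average as a rough guide.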

## 🔄 Step 6: Model Switching - Same Code, Different Models

### The Power of Strands
One of the best features of Strands is how easy it is to switch between models. Let's create agents with different local models and see how they compare.

In [None]:
# Create agents with different models
models = {
    "llama3.2": "Meta's Llama 3.2 - Great all-around model",
    "mistral": "Mistral 7B - Excellent for technical tasks",
    "phi3:mini": "Microsoft Phi-3 - Fast and efficient"
}

agents = {}
for model_id, description in models.items():
    try:
        agents[model_id] = Agent(
            model=OllamaModel(host="http://localhost:11434", model_id=model_id),
            system_prompt="You are a helpful AI assistant. Be concise."
        )
        print(f"✅ Created agent with {model_id}")
        print(f"   {description}")
    except Exception as e:
        print(f"❌ Could not create agent with {model_id}: {e}")

print(f"\n🎯 Successfully created {len(agents)} different local agents!")

## 📊 Step 7: Model Comparison - Side by Side

### Real-World Testing
Let's ask the same question to different models and compare their responses. This helps you choose the right model for your use case.

In [None]:
# Compare models with the same question
test_question = "Write a Python function to calculate the factorial of a number."

print(f"🔬 Test Question: {test_question}")
print("=" * 80)

results = {}

for model_name, agent in agents.items():
    print(f"\n🤖 {model_name.upper()} Response:")
    print("-" * 40)
    
    try:
        start_time = time.time()
        response = agent(test_question)
        end_time = time.time()
        
        print(response)
        
        results[model_name] = {
            "time": end_time - start_time,
            "length": len(str(response))
        }
        
        print(f"\n⏱️  Time: {results[model_name]['time']:.2f}s")
        print(f"📏 Length: {results[model_name]['length']} characters")
        
    except Exception as e:
        print(f"❌ Error: {e}")
    
    print("-" * 40)

# Summary
print("\n📊 Performance Summary:")
for model, metrics in results.items():
    print(f"   {model}: {metrics['time']:.2f}s response time")
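
Raw response time favors models that simply write less. A rough throughput figure (characters per second) puts the numbers on a more even footing. The sketch below works on a dictionary shaped like `results` above; the sample values are invented for illustration and `rank_results` is a hypothetical helper.

```python
def rank_results(results):
    """Sort models by response time and add characters-per-second throughput."""
    rows = []
    for model, metrics in results.items():
        seconds = metrics["time"]
        rate = metrics["length"] / seconds if seconds > 0 else float("inf")
        rows.append((model, seconds, metrics["length"], rate))
    return sorted(rows, key=lambda row: row[1])


# Made-up numbers for illustration; pass the real `results` dict instead
sample_results = {
    "llama3.2": {"time": 4.0, "length": 800},
    "mistral": {"time": 10.0, "length": 1500},
    "phi3:mini": {"time": 5.0, "length": 1250},
}

for model, seconds, length, rate in rank_results(sample_results):
    print(f"{model:<12} {seconds:6.2f}s {length:6d} chars {rate:8.1f} chars/s")
```

A single prompt is a small sample, so repeat the comparison with a few prompts typical of your workload before settling on a model.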

## 🛠️ Step 8: Practical Use Cases

### When to Use Local Models
Let's explore scenarios where local models shine and create specialized agents for different use cases.

In [None]:
# Create specialized local agents for different use cases

# 1. Privacy-First Assistant (e.g., for personal notes, health data)
privacy_agent = Agent(
    model=OllamaModel(host="http://localhost:11434", model_id="llama3.2"),
    system_prompt="""You are a private assistant for personal and sensitive information. 
    Remind users that all data stays local and private."""
)

# 2. Offline Code Assistant
code_agent = Agent(
    model=OllamaModel(host="http://localhost:11434", model_id="mistral"),
    system_prompt="""You are a coding assistant that works offline. 
    Provide clear, concise code examples and explanations."""
)

# 3. Quick Response Agent (for real-time applications)
quick_agent = Agent(
    model=OllamaModel(host="http://localhost:11434", model_id="phi3:mini"),
    system_prompt="""You are optimized for quick responses. 
    Keep answers brief and to the point."""
)

print("🎯 Specialized Local Agents Created:\n")
print("1️⃣ Privacy-First Assistant (Llama 3.2)")
print("   Perfect for: Personal journals, health data, financial planning")
print("\n2️⃣ Offline Code Assistant (Mistral)")
print("   Perfect for: Development in secure environments, air-gapped systems")
print("\n3️⃣ Quick Response Agent (Phi-3)")
print("   Perfect for: Real-time chat, quick lookups, rapid prototyping")

# Demo: Privacy-first use case
print("\n" + "="*60)
print("🔒 Demo: Privacy-First Assistant")
private_query = "I want to analyze my personal health metrics. Is my data safe?"
print(f"\n👤 You: {private_query}")
print("\n🤖 Privacy Agent:")
response = privacy_agent(private_query)
print(response)
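
Once you have several specialized agents, a small dispatcher can send each request to the right one. This is a minimal sketch: `route` and the task labels are hypothetical, and plain functions stand in for the agents so the code runs without Ollama.

```python
def route(task_type, agents_by_task, default_task="general"):
    """Pick the agent registered for a task type, falling back to a default."""
    return agents_by_task.get(task_type, agents_by_task[default_task])


# Stand-ins for privacy_agent, code_agent and quick_agent
agents_by_task = {
    "general": lambda q: f"[general] {q}",
    "code": lambda q: f"[code] {q}",
    "quick": lambda q: f"[quick] {q}",
}

print(route("code", agents_by_task)("Reverse a list in Python"))
print(route("legal", agents_by_task)("Summarize this clause"))  # falls back to general
```

With the real agents, the mapping would look like `{"general": privacy_agent, "code": code_agent, "quick": quick_agent}`.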

## 💰 Step 9: Cost Analysis - Local vs Cloud

### The Economics of Local AI
Let's calculate how much you save by running models locally.

In [None]:
# Cost comparison calculator
print("💰 COST COMPARISON: Local vs Cloud AI")
print("=" * 60)

# Illustrative cloud prices in USD per 1K tokens (approximate; check current rates)
cloud_costs = {
    "GPT-4": {"input": 0.03, "output": 0.06},      # per 1K tokens
    "Claude-3": {"input": 0.015, "output": 0.075},
    "GPT-3.5": {"input": 0.0005, "output": 0.0015}
}

# Calculate monthly costs for different usage levels
usage_scenarios = [
    {"name": "Light Developer", "requests": 1000, "avg_tokens": 500},
    {"name": "Active Developer", "requests": 10000, "avg_tokens": 500},
    {"name": "Production App", "requests": 100000, "avg_tokens": 500}
]

print("\n📊 Monthly Cost Estimates (USD):\n")
print(f"{'Usage Level':<20} {'Cloud (GPT-3.5)':<18} {'Cloud (Claude-3)':<18} {'Local (Ollama)':<18}")
print("-" * 77)

for scenario in usage_scenarios:
    requests = scenario["requests"]
    tokens = scenario["avg_tokens"]
    total_tokens = requests * tokens / 1000  # Convert to thousands
    
    # Calculate costs, assuming avg_tokens of input plus avg_tokens of output per request
    gpt35_cost = total_tokens * (cloud_costs["GPT-3.5"]["input"] + cloud_costs["GPT-3.5"]["output"])
    claude_cost = total_tokens * (cloud_costs["Claude-3"]["input"] + cloud_costs["Claude-3"]["output"])
    local_cost = 0  # No per-token fees (hardware and electricity not counted)
    
    print(f"{scenario['name']:<20} ${gpt35_cost:<17.2f} ${claude_cost:<17.2f} ${local_cost:<17.2f}")

print("\n💡 Key Insights:")
print("   • Local models: No per-request fees (you still pay for hardware and electricity)")
print("   • Cloud models: Costs scale with usage")
print("   • Break-even: Depends on hardware cost and on which cloud model you would otherwise use")
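
Break-even is simple arithmetic once you pick the inputs: divide the one-time hardware cost by the monthly savings. The sketch below uses the Claude-3 rates from the table above; the $1,500 machine and the $15/month electricity figure are hypothetical placeholders, not measurements.

```python
import math


def monthly_cloud_cost(requests, tokens_in, tokens_out, price_in, price_out):
    """Monthly API cost in USD; prices are per 1K tokens."""
    return requests * (tokens_in * price_in + tokens_out * price_out) / 1000


def break_even_months(hardware_cost, cloud_cost_per_month, local_cost_per_month=0.0):
    """Months until a one-time hardware purchase pays for itself, or None if never."""
    savings = cloud_cost_per_month - local_cost_per_month
    if savings <= 0:
        return None
    return math.ceil(hardware_cost / savings)


# Illustrative inputs: 10,000 requests/month with 500 tokens in and 500 out
cloud = monthly_cloud_cost(10_000, 500, 500, 0.015, 0.075)
months = break_even_months(1500, cloud, local_cost_per_month=15)
print(f"Cloud: ${cloud:.2f}/month, break-even after {months} months")
```

Plugging in the GPT-3.5 rates instead gives a much longer payback period, which is why the answer depends so heavily on which cloud model you are replacing.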

## 🎓 Step 10: Best Practices and Tips

### Making the Most of Local Models
Here are key recommendations for using local models effectively.

In [None]:
print("🎓 BEST PRACTICES FOR LOCAL AI MODELS")
print("=" * 60)

# Quick reference guide
best_practices = {
    "🎯 Model Selection": [
        "Llama 3.2 (3B): Best general-purpose model",
        "Mistral (7B): Best for coding and technical tasks",
        "Phi-3 Mini: Fastest responses, good for real-time apps",
        "Llama 3.1 (8B): Best quality if you have 16GB+ RAM"
    ],
    "💻 Hardware Requirements": [
        "Minimum: 8GB RAM for 3B models",
        "Recommended: 16GB RAM for 7B models",
        "Optimal: 32GB RAM for multiple models",
        "GPU: Optional, but usually much faster if available"
    ],
    "⚡ Performance Tips": [
        "Keep models in memory between requests (Ollama's keep_alive setting)",
        "Use smaller models for quick tasks",
        "Batch similar requests together",
        "Consider smaller quantizations to trade some quality for speed and memory"
    ],
    "🔧 Development Workflow": [
        "Develop with local models (free & fast)",
        "Test edge cases without API limits",
        "Switch to cloud for production if needed",
        "Use same code for local and cloud models"
    ]
}

for category, tips in best_practices.items():
    print(f"\n{category}")
    for tip in tips:
        print(f"   • {tip}")

# When to use what
print("\n\n📋 QUICK DECISION GUIDE")
print("=" * 60)
print("\n✅ Use LOCAL models when:")
print("   • Working with sensitive/private data")
print("   • Developing and testing (no API costs)")
print("   • Need offline capability")
print("   • Want predictable latency")
print("   • Building privacy-first applications")

print("\n☁️  Use CLOUD models when:")
print("   • Need the absolute best quality")
print("   • Require specific model features (GPT-4 vision, etc.)")
print("   • Limited local compute resources")
print("   • Building for massive scale")
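
The hardware guidance can be sanity-checked with a back-of-the-envelope estimate: the weights take roughly parameters × bits per weight ÷ 8 bytes. The sketch below assumes 4-bit quantization and an overhead factor of 1.5 for context and runtime buffers; both numbers are assumptions for illustration, not Ollama guarantees.

```python
def estimate_weight_gb(params_billion, bits_per_weight=4):
    """Approximate size of the model weights alone, in GB."""
    return params_billion * bits_per_weight / 8


def fits_in_ram(params_billion, ram_gb, bits_per_weight=4, overhead_factor=1.5):
    """Rough check: weights times an assumed overhead factor must fit in RAM."""
    needed = estimate_weight_gb(params_billion, bits_per_weight) * overhead_factor
    return needed <= ram_gb


for name, size_b in [("Llama 3.2", 3), ("Phi-3 Mini", 3.8), ("Mistral", 7)]:
    gb = estimate_weight_gb(size_b)
    print(f"{name:<12} ~{gb:.1f} GB of weights at 4-bit, fits in 8 GB: {fits_in_ram(size_b, 8)}")
```

The operating system and other applications also need memory, so leave headroom beyond what this estimate suggests.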

## 🎉 Congratulations!

### 🏆 What You've Accomplished
In just 10 minutes (excluding setup), you've:
- ✅ Set up Ollama for local AI execution
- ✅ Downloaded and configured multiple AI models
- ✅ Created agents with different local models
- ✅ Compared performance and capabilities
- ✅ Built privacy-first AI applications
- ✅ Learned when to use local vs cloud models

### 🚀 Your Journey Continues

You now have the power to:
- Build AI applications with zero API costs
- Keep sensitive data completely private
- Develop offline-capable AI systems
- Switch seamlessly between local and cloud models

### 📚 Next Steps

Ready to dive deeper? Check out:
1. **Video 4.5**: Advanced Ollama Configuration
2. **Video 5**: Understanding the Agent Loop
3. **Video 6**: Streaming and Real-Time Responses

### 💡 Remember

With Ollama and Strands, you have:
- **Freedom**: No API rate limits or costs
- **Privacy**: Your data stays yours
- **Flexibility**: Same code works everywhere
- **Power**: State-of-the-art models on your machine

### 🌟 Final Tip

Start developing with local models to save money and iterate faster. When you're ready for production, you can easily switch to cloud models if needed - or stick with local for maximum privacy!

Happy coding with your new local AI superpowers! 🚀