# Model Providers: Running AI Locally with Ollama

**Free, Private, and Powerful: Your Agents on Your Machine**

---

Welcome to the world of **local AI models**! This notebook demonstrates how to run powerful AI agents entirely on your own machine using Ollama and the Strands Agents SDK. By the end of this 10-minute tutorial, you'll be able to create agents that run completely offline, cost nothing after setup, and keep your data 100% private.

### 🎯 What You'll Learn

In this hands-on tutorial, you will:
- Set up Ollama for local model execution
- Create Strands agents using local models
- Compare different open-source models
- Understand when to use local vs cloud models
- Build privacy-first AI applications
- Save money on AI development

### 🏠 Why Run Models Locally?

Running AI models locally offers several advantages:
- **💰 Cost**: Zero API fees after initial setup
- **🔒 Privacy**: Your data never leaves your machine
- **🌐 Offline**: Works without internet connection
- **🛠️ Control**: Full control over the model

Disadvantages:
- **⚡ Speed**: LLMs need highly optimized hardware; on a typical PC they run very slowly
- **💰 Cost**: For the price of decent GPU-based hardware, you could instead pay for **billions of tokens** from a cloud provider.

## 📦 Pre-Setup: Installing Ollama and Required Packages

### 🚀 Installing Ollama (One-Time Setup)

Before we begin, you need to install Ollama on your system. This is a one-time setup that enables local AI model execution. Choose your platform below:

#### 🪟 Windows Installation
1. **Download the Installer**
   - Visit [ollama.com](https://ollama.com)
   - Click "Download for Windows"
   - Save the `.exe` installer

2. **Run the Installer**
   - Double-click the downloaded file
   - Follow the installation wizard
   - Ollama will install and run in the background

3. **Verify Installation**
   - Open a new Command Prompt or PowerShell
   - Type: `ollama --version`
   - You should see the version number

#### 🍎 macOS Installation

**Option 1: Using Homebrew (Recommended)**
```bash
# If you have Homebrew installed:
brew install ollama

# Start Ollama:
ollama serve
```

**Option 2: Direct Download**
1. Visit [ollama.com](https://ollama.com)
2. Click "Download for macOS"
3. Open the downloaded `.dmg` file
4. Drag Ollama to your Applications folder
5. Launch Ollama from Applications
6. You'll see the Ollama icon in your menu bar

#### 🐧 Linux Installation

**One-Line Install (Recommended)**
```bash
curl -fsSL https://ollama.com/install.sh | sh
```

This script will:
- Download the latest Ollama binary
- Install it to `/usr/local/bin`
- Set up systemd service (on supported systems)
- Start the Ollama service

**Manual Installation**
```bash
# Download the binary
sudo curl -L https://ollama.com/download/ollama-linux-amd64 -o /usr/local/bin/ollama

# Make it executable
sudo chmod +x /usr/local/bin/ollama

# Start Ollama
ollama serve
```

### ⏱️ Installation Time
- Download: 1-2 minutes (depending on internet speed)
- Installation: 1-2 minutes
- First model download: 3-5 minutes

**Note**: This setup time is NOT included in our 10-minute tutorial!

### 🐍 Installing Python Dependencies

Now let's install the Strands SDK with Ollama support:

In [1]:
# Install Strands with Ollama support (the extra pulls in the base package too)
%pip install "strands-agents[ollama]" -q

print("✅ Strands SDK installed successfully!")

Note: you may need to restart the kernel to use updated packages.
✅ Strands SDK installed successfully!





## 🔍 Step 1: Checking Ollama Installation

### Smart Installation Checker
This cell will check if Ollama is installed on your system and provide platform-specific installation instructions if needed.

### ⏱️ Time Note
If you need to install Ollama, this is a one-time setup that doesn't count toward our 10-minute tutorial time!

In [2]:
import platform
import subprocess
import sys
import os

def check_ollama_installation():
    """Check if Ollama is installed and provide installation instructions if not."""
    try:
        # Try to run ollama version command
        result = subprocess.run(['ollama', '--version'], 
                              capture_output=True, text=True, 
                              shell=(platform.system() == 'Windows'))
        
        if result.returncode == 0:
            print("✅ Great! Ollama is installed!")
            print(f"   Version: {result.stdout.strip()}")
            return True
        else:
            raise RuntimeError("Ollama command failed")
            
    except Exception:  # covers a missing binary as well as a non-zero exit code
        print("❌ Ollama is not installed on your system.")
        print("\n📋 Installation Instructions:\n")
        
        system = platform.system()
        
        if system == "Darwin":  # macOS
            print("🍎 macOS detected. You have two options:\n")
            print("Option 1 - Using Homebrew (recommended):")
            print("   brew install ollama\n")
            print("Option 2 - Direct download:")
            print("   1. Visit https://ollama.com")
            print("   2. Download the macOS installer")
            print("   3. Run the installer")
            print("   4. Start Ollama from your Applications folder")
            
        elif system == "Linux":
            print("🐧 Linux detected. Install with this command:\n")
            print("   curl -fsSL https://ollama.com/install.sh | sh\n")
            print("After installation, Ollama will run as a service.")
            
        elif system == "Windows":
            print("🪟 Windows detected. Installation steps:\n")
            print("   1. Visit https://ollama.com")
            print("   2. Download the Windows installer")
            print("   3. Run the downloaded .exe file")
            print("   4. Follow the installation wizard")
            print("   5. Ollama will run in the background")
            
        else:
            print(f"⚠️  Unknown system: {system}")
            print("   Visit https://ollama.com for installation instructions")
        
        print("\n⏸️  After installing Ollama, restart this notebook and continue!")
        return False

# Check installation
ollama_installed = check_ollama_installation()

if not ollama_installed:
    print("\n⚡ Quick tip: Installation usually takes just 2-3 minutes!")

✅ Great! Ollama is installed!
   Version: ollama version is 0.9.2
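
The check above confirms that the `ollama` command exists, but the agents in later steps talk to the Ollama server over HTTP. The following is a minimal sketch of a reachability check, assuming the server listens on the default address `http://localhost:11434` used later in this notebook. The helper name `ollama_server_version` is hypothetical; it calls Ollama's `/api/version` endpoint using only the standard library.

```python
import json
import urllib.error
import urllib.request


def ollama_server_version(host="http://localhost:11434", timeout=3.0):
    """Return the server's version string, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{host}/api/version", timeout=timeout) as resp:
            return json.load(resp).get("version")
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply: treat as "not running"
        return None


version = ollama_server_version()
if version:
    print(f"Ollama server is running (version {version})")
else:
    print("Ollama server is not reachable. Try starting it with: ollama serve")
```

If this reports that the server is not reachable even though the CLI is installed, starting `ollama serve` in a terminal (or launching the desktop app) is usually the missing step.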


## 🤖 Step 2: Downloading AI Models

### Smart Model Management
We'll download three popular models for our demonstrations. Don't worry - if you already have them, we won't download them again!

### 📊 Models We'll Use:
1. **Llama 3.2 (3B)** - Latest from Meta, great all-around model
2. **Mistral (7B)** - Excellent for coding and technical tasks
3. **Phi-3 Mini (3.8B)** - Microsoft's efficient model, great for quick responses

In [3]:
def check_and_pull_model(model_name):
    """Check if a model is installed, pull if not."""
    try:
        # Get list of installed models
        result = subprocess.run(['ollama', 'list'], 
                              capture_output=True, text=True,
                              shell=(platform.system() == 'Windows'))
        
        if result.returncode != 0:
            print(f"❌ Error checking models: {result.stderr}")
            return False
            
        installed_models = result.stdout.lower()
        
        # Check if model is already installed
        if model_name.lower() in installed_models:
            print(f"✅ {model_name} is already installed!")
            return True
        else:
            print(f"📥 Downloading {model_name}... (this may take a few minutes)")
            
            # Pull the model
            result = subprocess.run(['ollama', 'pull', model_name],
                                  shell=(platform.system() == 'Windows'))
            
            if result.returncode == 0:
                print(f"✅ {model_name} downloaded successfully!")
                return True
            else:
                print(f"❌ Failed to download {model_name}")
                return False
                
    except Exception as e:
        print(f"❌ Error with {model_name}: {e}")
        return False

# Models to use in this tutorial
models_to_install = [
    "llama3.2",     # Meta's latest, 3B parameters
    "mistral",      # Great for coding, 7B parameters
    "phi3:mini"     # Microsoft's efficient model, 3.8B parameters
]

print("🚀 Setting up AI models for local execution...\n")

# Check and install each model
all_models_ready = True
for model in models_to_install:
    if not check_and_pull_model(model):
        all_models_ready = False
    print()  # Add spacing

if all_models_ready:
    print("🎉 All models are ready! Let's create some agents!")
else:
    print("⚠️  Some models couldn't be installed. You can continue with the available ones.")

🚀 Setting up AI models for local execution...

✅ llama3.2 is already installed!

✅ mistral is already installed!

✅ phi3:mini is already installed!

🎉 All models are ready! Let's create some agents!
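
One caveat about the check above: it looks for the model name as a substring of the whole `ollama list` output, so a name such as `phi3` would also match `phi3:mini`. A stricter variant could parse the first column and compare exact names. This is a sketch only; the helper names `installed_model_names` and `is_installed` are hypothetical, and the sample text is the listing shown later in this notebook.

```python
def installed_model_names(list_output):
    """Parse `ollama list` output into a set of lowercase model names."""
    names = set()
    for line in list_output.splitlines()[1:]:  # skip the header row
        parts = line.split()
        if parts:
            names.add(parts[0].lower())
    return names


def is_installed(model_name, names):
    """Exact match; a name without a tag is treated as `<name>:latest`."""
    wanted = model_name.lower()
    if ":" not in wanted:
        wanted += ":latest"
    return wanted in names


sample = """NAME               ID              SIZE      MODIFIED
phi3:mini          4f2222927938    2.2 GB    6 weeks ago
mistral:latest     3944fe81ec14    4.1 GB    6 weeks ago
llama3.2:latest    a80c4f17acd5    2.0 GB    6 weeks ago
"""

names = installed_model_names(sample)
print(is_installed("mistral", names))    # untagged name resolves to mistral:latest
print(is_installed("phi3:mini", names))  # exact tag match
print(is_installed("phi3", names))       # phi3:latest is not in the list
```

In the notebook, `sample` would be replaced by `result.stdout` from the `ollama list` call.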


## 📋 Optional Step 3: Viewing Available Models

Let's see what models are available on your system.

In [4]:
# List all available models
print("📋 Available Local Models:")
print("=" * 50)

try:
    result = subprocess.run(['ollama', 'list'], 
                          capture_output=True, text=True,
                          shell=(platform.system() == 'Windows'))
    
    if result.returncode == 0:
        print(result.stdout)
    else:
        print("❌ Could not list models")
except Exception as e:
    print(f"❌ Error: {e}")

print("\n💡 Tip: Model sizes shown are compressed. They'll use ~2x RAM when running.")

📋 Available Local Models:
NAME               ID              SIZE      MODIFIED    
phi3:mini          4f2222927938    2.2 GB    6 weeks ago    
mistral:latest     3944fe81ec14    4.1 GB    6 weeks ago    
llama3.2:latest    a80c4f17acd5    2.0 GB    6 weeks ago    


💡 Tip: Model sizes shown are compressed. They'll use ~2x RAM when running.
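
To make the tip concrete, here is a small sketch that reads the SIZE column from the listing above and applies the ~2x factor. The factor is this notebook's rule of thumb, not a measured value; real memory use depends on the model's quantization and context length. The helper name `model_sizes_gb` and the constant `RAM_FACTOR` are hypothetical.

```python
import re

UNITS_GB = {"MB": 0.001, "GB": 1.0, "TB": 1000.0}
RAM_FACTOR = 2.0  # the notebook's rule of thumb, an assumption


def model_sizes_gb(list_output):
    """Map each model name in `ollama list` output to its on-disk size in GB."""
    sizes = {}
    for line in list_output.splitlines()[1:]:  # skip the header row
        cols = re.split(r"\s{2,}", line.strip())
        if len(cols) < 3:
            continue
        size_parts = cols[2].split()
        if len(size_parts) != 2 or size_parts[1] not in UNITS_GB:
            continue
        sizes[cols[0]] = float(size_parts[0]) * UNITS_GB[size_parts[1]]
    return sizes


listing = """NAME               ID              SIZE      MODIFIED
phi3:mini          4f2222927938    2.2 GB    6 weeks ago
mistral:latest     3944fe81ec14    4.1 GB    6 weeks ago
llama3.2:latest    a80c4f17acd5    2.0 GB    6 weeks ago
"""

for name, size in model_sizes_gb(listing).items():
    print(f"{name}: {size:.1f} GB on disk, roughly {size * RAM_FACTOR:.1f} GB RAM")
```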


## 🚀 Step 4: Creating Your First Local Agent

### From Cloud to Local
Creating an agent with Ollama is just as easy as with cloud providers.

### 🔄 Provider Comparison
```python
# Cloud (AWS Bedrock)
from strands.models import BedrockModel
model = BedrockModel(model_id="us.anthropic.claude-3-7-sonnet-20250219-v1:0")

# Local (Ollama)
from strands.models.ollama import OllamaModel
model = OllamaModel(host="http://localhost:11434", model_id="llama3.2")
```

In [5]:
from strands import Agent
from strands.models.ollama import OllamaModel

ollama_host = "http://localhost:11434"

# Create a local agent with Llama 3.2
local_agent = Agent(
    model=OllamaModel(
        host=ollama_host,
        model_id="llama3.2"
    ),
    system_prompt="You are a helpful assistant running locally. Be concise and friendly."
)

print("🎉 Your first local AI agent is ready!")
print("   Model: Llama 3.2 (3B parameters)")
print("   Location: Running entirely on your machine")

🎉 Your first local AI agent is ready!
   Model: Llama 3.2 (3B parameters)
   Location: Running entirely on your machine


## 💬 Step 5: Your First Local Conversation

Let's test our local agent! Notice how it responds just like a cloud-based model, but everything happens on your machine. The first invocation can take a long time (even minutes) because the model has to load into memory. Depending on your hardware, later invocations may still take 20 seconds or more.

In [6]:
# Test the local agent
import time

question = "Give me one sentence with advantages of running LLMs locally."

print(f"👤 You: {question}")
print("\n🤖 Local Llama 3.2 Agent:")
print("-" * 50)

start_time = time.time()
response = local_agent(question)
end_time = time.time()

print("\n")
print("-" * 50)
print(f"\n⏱️  Response time: {end_time - start_time:.2f} seconds")
print("💡 Note: First response may be slower as the model loads into memory.")

👤 You: Give me one sentence with advantages of running LLMs locally.

🤖 Local Llama 3.2 Agent:
--------------------------------------------------
Running Large Language Models (LLMs) locally can provide faster inference speeds, reduced dependence on internet connectivity, and improved data privacy without sacrificing the accuracy and functionality of these powerful models.

--------------------------------------------------

⏱️  Response time: 52.24 seconds
💡 Note: First response may be slower as the model loads into memory.
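
To see the gap between the first (cold) call and later (warm) calls, you can time several invocations in a row. Below is a minimal sketch using `time.perf_counter`. The helper `time_calls` and the stand-in `fake_agent` are hypothetical, so the example runs without Ollama; in the notebook you would pass `local_agent` instead.

```python
import time


def time_calls(fn, prompt, runs=3):
    """Call fn(prompt) several times and return the elapsed seconds of each call."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(prompt)
        timings.append(time.perf_counter() - start)
    return timings


def fake_agent(prompt):
    # Hypothetical stand-in for `local_agent`, so the sketch runs without Ollama
    time.sleep(0.01)
    return f"echo: {prompt}"


timings = time_calls(fake_agent, "Hello")
cold, warm = timings[0], timings[1:]
print(f"First call: {cold:.2f}s, later calls avg: {sum(warm) / len(warm):.2f}s")
```

`time.perf_counter` is used instead of `time.time` because it is a monotonic, high-resolution clock intended for measuring intervals.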


## 🔄 Step 6: Model Switching - Same Code, Different Models

### The Power of Strands
One of the best features of Strands is how easy it is to switch between models. Let's create agents with different local models and see how they compare.

In [7]:
# Create agents with different models
models = {
    "llama3.2": "Meta's Llama 3.2 - Great all-around model",
    "mistral": "Mistral 7B - Excellent for technical tasks",
    "phi3:mini": "Microsoft Phi-3 - Fast and efficient"
}

agents = {}
for model_id, description in models.items():
    try:
        agents[model_id] = Agent(
            model=OllamaModel(host=ollama_host, model_id=model_id),
            system_prompt="You are a helpful AI assistant. Be concise."
        )
        print(f"✅ Created agent with {model_id}")
        print(f"   {description}")
    except Exception as e:
        print(f"❌ Could not create agent with {model_id}: {e}")

print(f"\n🎯 Successfully created {len(agents)} different local agents!")

✅ Created agent with llama3.2
   Meta's Llama 3.2 - Great all-around model
✅ Created agent with mistral
   Mistral 7B - Excellent for technical tasks
✅ Created agent with phi3:mini
   Microsoft Phi-3 - Fast and efficient

🎯 Successfully created 3 different local agents!


## 📊 Step 7: Model Comparison - Side by Side

### Real-World Testing
Let's ask the same question to different models and compare their responses. This helps you choose the right model for your use case.

In [8]:
# Compare models with the same question
test_question = "Write a Python HelloWorld program."

print(f"🔬 Test Question: {test_question}")
print("=" * 80)

results = {}

for model_name, agent in agents.items():
    print(f"\n🤖 {model_name.upper()} Response:")
    print("-" * 40)
    
    try:
        start_time = time.time()
        response = agent(test_question)
        end_time = time.time()
        
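        # The agent streams its reply to stdout as it is generated,
        # so calling print(response) here would show the text twice.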
        
        results[model_name] = {
            "time": end_time - start_time,
            "length": len(str(response))
        }
        
        print(f"\n⏱️  Time: {results[model_name]['time']:.2f}s")
        print(f"📏 Length: {results[model_name]['length']} characters")
        
    except Exception as e:
        print(f"❌ Error: {e}")
    
    print("-" * 40)

# Summary
print("\n📊 Performance Summary:")
for model, metrics in results.items():
    print(f"   {model}: {metrics['time']:.2f}s response time")

🔬 Test Question: Write a Python HelloWorld program.

🤖 LLAMA3.2 Response:
----------------------------------------
Here is a simple "Hello, World!" program in Python:

```python
# hello.py

def main():
    print("Hello, World!")

if __name__ == "__main__":
    main()
```

To run this program, save it to a file called `hello.py` and execute it with Python: 

```bash
python hello.py
```

This will output:

```
Hello, World!
```


⏱️  Time: 203.59s
📏 Length: 315 characters
----------------------------------------

🤖 MISTRAL Response:
----------------------------------------
 Here is a simple "Hello, World!" program in Python:

```python
print("Hello, World!")
```

When you run thi

## 💰 Step 10: Cost Analysis - Local vs Cloud

With cloud LLMs you pay per use, that is, for the number of tokens you send to the model and get back from it. For small models (still larger than the 3B model used above), this can cost **$0.15 per 1 million tokens** or even less (see [Bedrock Pricing](https://aws.amazon.com/bedrock/pricing/)). More powerful, larger models can cost up to $15 per million tokens, but commodity hardware like the machine we have been using here cannot run models of that size. So, to keep this an apples-to-apples comparison, let's look at the cheaper cloud models at $0.15 per 1 million tokens.

How much is a million tokens? About 750k words, which is roughly 2,500 single-spaced pages, 10 PhD dissertations, or the complete works of Shakespeare. In other words: **a lot of text for 15 cents**.

When you run your model locally, every token costs $0. But you pay for (a) the (potentially beefy) hardware, (b) the power to run it, and (c) potentially software licenses, including those for GPUs. Even if you spend just $100 in total on a PC, power, and licenses, you'd need to generate more than 666 million tokens to break even with the cloud price. That's more than 1.6 million single-spaced pages of text. In a nutshell, from a cost perspective it almost never makes sense to run models on your own hardware, but there may be other reasons to do so, as discussed above.
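
The break-even arithmetic above can be reproduced in a few lines. This is a sketch using the figures from the text ($100 of local spend, $0.15 per million cloud tokens, about 2,500 pages per million tokens); the function names are hypothetical.

```python
def break_even_tokens(local_cost_usd, cloud_price_per_million_usd):
    """Tokens you must generate locally before the spend matches the cloud price."""
    return local_cost_usd / cloud_price_per_million_usd * 1_000_000


def tokens_to_pages(tokens, pages_per_million_tokens=2500):
    """Convert tokens to single-spaced pages using the estimate from the text."""
    return tokens / 1_000_000 * pages_per_million_tokens


tokens = break_even_tokens(local_cost_usd=100, cloud_price_per_million_usd=0.15)
print(f"Break-even: {tokens / 1e6:.0f} million tokens")
print(f"Roughly {tokens_to_pages(tokens) / 1e6:.2f} million pages")
```

Plugging in your own hardware and electricity costs gives a quick estimate of how much text you would have to generate before local execution pays off on cost alone.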

## 🎉 Congratulations!

### 🏆 What You've Accomplished
In just 10 minutes (excluding setup), you've:
- ✅ Set up Ollama for local AI execution
- ✅ Downloaded and configured multiple AI models
- ✅ Created agents with different local models
- ✅ Compared performance and capabilities
- ✅ Built privacy-first AI applications
- ✅ Learned when to use local vs cloud models

### 🚀 Your Journey Continues

You now have the power to:
- Build AI applications with zero API costs
- Keep sensitive data completely private
- Develop offline-capable AI systems
- Switch seamlessly between local and cloud models

### 📚 Next Steps

Ready to dive deeper? Check out:
1. **Video 4.5**: Advanced Ollama Configuration
2. **Video 5**: Understanding the Agent Loop
3. **Video 6**: Streaming and Real-Time Responses

### 💡 Remember

With Ollama and Strands, you have:
- **Freedom**: No API rate limits or costs
- **Privacy**: Your data stays yours
- **Flexibility**: Same code works everywhere
- **Power**: Capable open-source models on your machine