# Module 04 - Notebook 03: Ollama Local Setup

## Learning Objectives
- Install and configure Ollama
- Pull and manage local models
- Run inference locally with Python SDK
- Compare local vs cloud inference

## Prerequisites
- 8GB+ RAM recommended
- Ollama installed (https://ollama.ai)

---

## 1. Installing Ollama

```bash
# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows
# Download from https://ollama.com/download/windows

# Verify installation
ollama --version
```
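
Once the install finishes, it can help to confirm from Python that the server is actually listening before moving on. The sketch below uses only the standard library and assumes Ollama is serving on its default address, `http://localhost:11434`; the helper name `ollama_is_running` is hypothetical and not part of any Ollama package.

```python
import http.client
import urllib.error
import urllib.request


def ollama_is_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, or a malformed reply
        return False


# Assumes the default host and port; change base_url if you set OLLAMA_HOST
print("Ollama reachable:", ollama_is_running())
```

If this prints `False`, starting the server with `ollama serve` (or launching the desktop app) is usually the next step.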

## 2. Pulling Models

In [None]:
# Pull a model (run in terminal first)
# ollama pull llama3.2:3b
# ollama pull mistral

# List available models
!ollama list
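
Note that the two pull commands above differ in form: `llama3.2:3b` names an explicit tag, while `mistral` has none, in which case Ollama falls back to the `latest` tag. A small sketch of that convention follows; `split_model_tag` is a hypothetical helper written for illustration, not an Ollama function.

```python
def split_model_tag(ref: str) -> tuple[str, str]:
    """Split a model reference into (name, tag), defaulting the tag to 'latest'."""
    name, sep, tag = ref.rpartition(":")
    # No colon, an empty tag, or a colon that belongs to a host:port prefix
    if not sep or not tag or "/" in tag:
        return ref.rstrip(":"), "latest"
    return name, tag


print(split_model_tag("llama3.2:3b"))  # ('llama3.2', '3b')
print(split_model_tag("mistral"))      # ('mistral', 'latest')
```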

## 3. Python SDK Setup

In [None]:
!pip install -q ollama

In [None]:
import ollama
import time

# Test connection
try:
    models = ollama.list()
    print("✓ Connected to Ollama")
    print(f"\nAvailable models: {len(models['models'])}")
    for model in models['models']:
        # Newer SDK versions expose the tag as 'model'; older ones use 'name'
        print(f"  - {model.get('model') or model.get('name')}")
except Exception as e:
    print(f"❌ Error: {e}")
    print("Make sure Ollama is running: ollama serve")

## 4. Basic Inference

In [None]:
# Simple chat
response = ollama.chat(
    model='llama3.2:3b',
    messages=[
        {'role': 'user', 'content': 'What is the capital of France?'}
    ]
)

print(response['message']['content'])
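
The SDK call above is a wrapper around Ollama's local HTTP API. As a rough illustration, the same question can be sent to the `/api/chat` endpoint with only the standard library. This is a sketch that assumes the server runs on the default `http://localhost:11434`; the helper names `build_chat_request` and `chat_via_rest` are hypothetical.

```python
import json
import urllib.request


def build_chat_request(prompt: str, model: str = "llama3.2:3b",
                       base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a non-streaming POST request for the /api/chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON object instead of a chunk stream
    }
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat_via_rest(prompt: str, **kwargs) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs), timeout=120) as resp:
        return json.load(resp)["message"]["content"]


# Requires a running server with the model pulled:
# print(chat_via_rest("What is the capital of France?"))
```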

## 5. Streaming Responses

In [None]:
# Stream tokens as they're generated
stream = ollama.chat(
    model='llama3.2:3b',
    messages=[{'role': 'user', 'content': 'Write a short story about a robot.'}],
    stream=True
)

print("Streaming response:")
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
print("\n\n✓ Complete")
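
The loop above prints each chunk and then discards it. If the full text is needed afterwards (for logging, or to append to a conversation history), the pieces can be accumulated while streaming. A minimal sketch, assuming each chunk has the same `['message']['content']` shape used above; `collect_stream` is a hypothetical helper, and the fake chunks stand in for a real `ollama.chat(..., stream=True)` iterator.

```python
from typing import Any, Iterable, Mapping


def collect_stream(chunks: Iterable[Mapping[str, Any]], echo: bool = True) -> str:
    """Optionally print streamed chunks as they arrive and return the joined text."""
    parts = []
    for chunk in chunks:
        piece = chunk["message"]["content"]
        if echo:
            print(piece, end="", flush=True)
        parts.append(piece)
    return "".join(parts)


# Fake chunks for illustration; pass the real stream object in practice
fake_stream = [{"message": {"content": "Beep "}}, {"message": {"content": "boop."}}]
story = collect_stream(fake_stream, echo=False)
print(story)  # Beep boop.
```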

## 6. Multi-Turn Conversations

In [None]:
# Maintain conversation context
messages = [
    {'role': 'system', 'content': 'You are a helpful Python tutor.'},
    {'role': 'user', 'content': 'How do I define a function in Python?'}
]

response1 = ollama.chat(model='llama3.2:3b', messages=messages)
print("Assistant:", response1['message']['content'])

# Continue conversation
messages.append(response1['message'])
messages.append({'role': 'user', 'content': 'Can you show me an example?'})

response2 = ollama.chat(model='llama3.2:3b', messages=messages)
print("\nAssistant:", response2['message']['content'])
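
Because the whole `messages` list is resent on every call, a long conversation eventually grows past what the model can attend to. One common mitigation is to keep the system prompt and only the most recent messages. The sketch below shows that idea; `trim_history` and the `max_messages` cutoff are hypothetical choices, not something the SDK provides.

```python
def trim_history(messages: list, max_messages: int = 6) -> list:
    """Keep every system message plus the last max_messages other messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if max_messages <= 0:
        return system
    return system + rest[-max_messages:]


history = [
    {"role": "system", "content": "You are a helpful Python tutor."},
    {"role": "user", "content": "How do I define a function in Python?"},
    {"role": "assistant", "content": "Use the def keyword."},
    {"role": "user", "content": "Can you show me an example?"},
    {"role": "assistant", "content": "def greet(): print('hi')"},
]
trimmed = trim_history(history, max_messages=2)
print([m["role"] for m in trimmed])  # ['system', 'user', 'assistant']
```

Calling `ollama.chat(model=..., messages=trim_history(messages))` would then bound the prompt size, at the cost of the model forgetting earlier turns.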

## 7. Performance Comparison: Local vs Cloud

In [None]:
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

prompt = "Explain the difference between lists and tuples in Python."

# Local inference (Ollama)
start = time.time()
local_response = ollama.chat(
    model='llama3.2:3b',
    messages=[{'role': 'user', 'content': prompt}]
)
local_time = time.time() - start

# Cloud inference (OpenAI)
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
start = time.time()
cloud_response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}]
)
cloud_time = time.time() - start

print(f"Local (Ollama): {local_time:.2f}s")
print(f"Cloud (OpenAI): {cloud_time:.2f}s")
print(f"\nLocal/cloud latency ratio: {local_time/cloud_time:.2f}x (above 1 means the cloud call was faster)")
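
A single timed call is a noisy measurement: the first local request typically includes loading the model into memory, and cloud latency varies with network conditions. A somewhat fairer comparison runs a warmup call and reports the median of several repeats. This is a sketch only; `median_latency` and its default `repeats` and `warmup` values are hypothetical.

```python
import statistics
import time
from typing import Callable


def median_latency(fn: Callable[[], object], repeats: int = 3, warmup: int = 1) -> float:
    """Call fn a few untimed times, then return the median wall-clock seconds per call."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Stand-in workload; in the notebook, wrap the real calls instead, e.g.
# median_latency(lambda: ollama.chat(model='llama3.2:3b',
#                                    messages=[{'role': 'user', 'content': prompt}]))
print(f"{median_latency(lambda: sum(range(100_000))):.4f}s")
```

Note that each repeat costs a real request, so keep `repeats` small when timing the paid cloud API.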

## 8. Model Management

In [None]:
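# Show model info (format, family, and parameter size are nested under 'details')
info = ollama.show('llama3.2:3b')
details = info.get('details') or {}
print("Model Info:")
print(f"Parameter size: {details.get('parameter_size') or 'Unknown'}")
print(f"Format: {details.get('format') or 'Unknown'}")
print(f"Family: {details.get('family') or 'Unknown'}")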
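
Each entry returned by `ollama.list()` also carries a `size` field, the model's on-disk size in bytes, which helps when deciding which models to keep. The helper below turns such a byte count into a readable string; `format_size` is hypothetical, and the use of decimal (1000-based) units is a choice made for this sketch.

```python
def format_size(num_bytes: int) -> str:
    """Format a byte count using decimal units (1 KB = 1000 B)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1000:
            return f"{size:.1f} {unit}"
        size /= 1000
    return f"{size:.1f} TB"


print(format_size(2_019_393_189))  # 2.0 GB
print(format_size(512))            # 512.0 B
```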

## Exercise: Build a Local Chat Interface

Create a simple chat loop that:
1. Takes user input
2. Maintains conversation history
3. Uses Ollama for responses
4. Allows switching models

In [None]:
# TODO: Complete this exercise
def local_chat(model: str = 'llama3.2:3b'):
    """Interactive chat with local model."""
    messages = []
    
    print(f"Chat with {model} (type 'quit' to exit)\n")
    
    # Your code here
    pass

# Uncomment to test
# local_chat()
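
As a hint for step 4, one possible approach is to treat certain inputs as commands rather than chat messages. The sketch below covers only the input parsing, leaving the loop and the Ollama call to you. The `/model <name>` syntax and the `parse_command` helper are hypothetical conventions invented for this hint.

```python
def parse_command(line: str, current_model: str) -> tuple[str, str]:
    """Classify one line of input as ('quit' | 'switch' | 'chat', model_to_use)."""
    text = line.strip()
    if text.lower() in {"quit", "exit"}:
        return "quit", current_model
    if text.startswith("/model "):
        # Everything after the command word is taken as the new model name
        return "switch", text[len("/model "):].strip()
    return "chat", current_model


print(parse_command("/model mistral", "llama3.2:3b"))  # ('switch', 'mistral')
print(parse_command("hello there", "llama3.2:3b"))     # ('chat', 'llama3.2:3b')
```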

## Summary

You learned:
- ✅ Installing and configuring Ollama
- ✅ Managing local models
- ✅ Running inference with Python SDK
- ✅ Comparing local vs cloud performance

## Next Steps
- 📘 Notebook 04: Prompt Injection Attacks