# Module 2: Frontier Models
## Lesson 12: Local Inference with the OpenAI SDK (Ollama)

### 📄 Overview
We apply the "Universal Client" pattern to run models entirely offline. By pointing the OpenAI SDK at our local Ollama server (`localhost`), we can run inference with no internet connection and no per-token API costs, and prompts never leave the machine.

### 🗝️ Key Concepts
* **Localhost Endpoint:** Ollama serves an OpenAI-compatible API at `http://localhost:11434/v1`, so the OpenAI SDK can talk to it without code changes.
* **Placeholder API Key:** Ollama does not validate API keys, but the SDK requires *some* string to be present. We typically use `"ollama"` as a placeholder.
* **Distillation (Deep Dive):** The lesson introduces **DeepSeek-R1-Distill-Qwen-1.5B**.
    * *What is it?* A small 1.5B-parameter Qwen model that was fine-tuned on outputs generated by a much larger reasoning model (DeepSeek-R1).
    * *Why use it?* It reasons better than its size suggests while needing little enough RAM to run on a typical laptop.
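
Reasoning-distilled models like this one often print their chain of thought before the final answer. The sketch below separates the two, assuming the response text wraps the reasoning in a single `<think>...</think>` block. That is a common output format for DeepSeek-R1 variants, but it may differ depending on the Ollama version and settings. `split_reasoning` is a hypothetical helper name, not part of any SDK.

```python
import re

# Assumption: the reasoning appears inside one <think>...</think> block.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is "" when no block is found."""
    match = _THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the <think> block is treated as the answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


if __name__ == "__main__":
    sample = "<think>The user wants one fact.</think>\nJellyfish have no brain."
    reasoning, answer = split_reasoning(sample)
    print("Reasoning:", reasoning)
    print("Answer:", answer)
```

In practice the input would be `response.choices[0].message.content` from a chat completion call; text without a `<think>` block passes through unchanged.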

### 🛠️ Technical Implementation
To connect to your local machine:
1.  **Base URL:** `http://localhost:11434/v1`
2.  **API Key:** `"ollama"` (or any string)
3.  **Model Name:** Must match exactly what you pulled (e.g., `llama3.2` or `deepseek-r1:1.5b`).

*Prerequisite: Ensure you have run `ollama pull deepseek-r1:1.5b` in your terminal first.*
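
Since the model name has to match a pulled tag exactly, it can help to check what is installed before sending a request. The sketch below queries Ollama's native `GET /api/tags` endpoint, which lists locally pulled models, using only the standard library. `normalize`, `has_model`, and `installed_models` are hypothetical helper names, and treating a bare name such as `llama3.2` as `llama3.2:latest` is an assumption about Ollama's default tagging.

```python
import json
import urllib.request

OLLAMA_HOST = "http://localhost:11434"  # default Ollama address used in this lesson


def normalize(name: str) -> str:
    """Append ':latest' when no tag is given (assumed Ollama default)."""
    return name if ":" in name else f"{name}:latest"


def has_model(wanted: str, installed: list[str]) -> bool:
    """True if `wanted` matches one of the installed model names."""
    return normalize(wanted) in {normalize(n) for n in installed}


def installed_models(host: str = OLLAMA_HOST) -> list[str]:
    """Ask the local Ollama server which models have been pulled."""
    with urllib.request.urlopen(f"{host}/api/tags", timeout=5) as resp:
        payload = json.load(resp)
    return [m["name"] for m in payload.get("models", [])]


if __name__ == "__main__":
    try:
        names = installed_models()
    except (OSError, ValueError) as err:  # server unreachable or unexpected reply
        print(f"Could not query Ollama: {err}")
    else:
        for wanted in ("deepseek-r1:1.5b", "llama3.2"):
            if has_model(wanted, names):
                print(f"{wanted}: installed")
            else:
                print(f"{wanted}: missing, run 'ollama pull {wanted}'")
```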

```python
from openai import OpenAI

# 1. Initialize the client, pointing it at localhost
# No internet connection required for this!
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # Required by the SDK but ignored by Ollama
)

# 2. Define the model
# Make sure you ran: 'ollama pull deepseek-r1:1.5b' (or llama3.2)
MODEL_NAME = "deepseek-r1:1.5b"

print(f"🏠 Asking {MODEL_NAME} (Running Locally)...")

try:
    response = client.chat.completions.create(
        model=MODEL_NAME,
        messages=[
            {"role": "user", "content": "Tell me a fun fact about jellyfish."}
        ]
    )

    print("\n--- Local Response ---")
    print(response.choices[0].message.content)

except Exception as e:
    print(f"❌ Error: {e}")
    print("💡 Tip: Is Ollama running? Run 'ollama serve' in a separate terminal.")
```

### 🧪 Lab Notes: The "Swap" Strategy

*This lesson confirms a powerful architectural pattern for my projects:*

| Component | Dev Environment (Laptop) | Production Environment (Server) |
| :--- | :--- | :--- |
| **Model** | `llama3.2` (via Ollama) | `gpt-4o` (via OpenAI) |
| **Cost** | $0.00 | Pay-per-token |
| **Privacy** | 100% Local | Data sent to Cloud |
| **Code Changes** | **None** (only `.env` config changes) | **None** |
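
One way the "only config changes" row could look in practice is to read the endpoint, key, and model from environment variables. This is a minimal sketch: the variable names `LLM_BASE_URL`, `LLM_API_KEY`, and `LLM_MODEL` and both helper functions are hypothetical, not part of the OpenAI SDK or the lesson, and loading the `.env` file itself is left out.

```python
import os

# Hypothetical variable names; the defaults target the local Ollama server.
DEFAULTS = {
    "LLM_BASE_URL": "http://localhost:11434/v1",
    "LLM_API_KEY": "ollama",  # placeholder, ignored by Ollama
    "LLM_MODEL": "llama3.2",
}


def load_settings(env=None) -> dict:
    """Resolve endpoint, key, and model from the environment, else defaults."""
    env = os.environ if env is None else env
    return {
        "base_url": env.get("LLM_BASE_URL", DEFAULTS["LLM_BASE_URL"]),
        "api_key": env.get("LLM_API_KEY", DEFAULTS["LLM_API_KEY"]),
        "model": env.get("LLM_MODEL", DEFAULTS["LLM_MODEL"]),
    }


def make_client(settings: dict):
    """Build an OpenAI SDK client from the resolved settings."""
    from openai import OpenAI  # lazy import: load_settings works without the SDK
    return OpenAI(base_url=settings["base_url"], api_key=settings["api_key"])


if __name__ == "__main__":
    dev = load_settings({})  # laptop: nothing set, falls back to Ollama
    prod = load_settings({
        "LLM_BASE_URL": "https://api.openai.com/v1",
        "LLM_API_KEY": "sk-...",  # a real key would come from the environment
        "LLM_MODEL": "gpt-4o",
    })
    print(dev["model"], "->", dev["base_url"])
    print(prod["model"], "->", prod["base_url"])
```

The application code then calls `make_client(load_settings())` in both environments and passes `settings["model"]` to `chat.completions.create`, so moving from laptop to server touches only the environment.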

**Homework Assignment:**
The instructor challenged us to replicate the **Website Summarizer** from Lesson 6, but using a local model.
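
A possible starting point, sketched under assumptions: Lesson 6's code is not reproduced in these notes, so the helper names (`extract_text`, `build_messages`, `summarize`), the prompt wording, the 4000-character cutoff, and the use of the standard library's `html.parser` in place of whatever scraper the original used are all hypothetical.

```python
from html.parser import HTMLParser
import urllib.request


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html: str) -> str:
    """Return the page's visible text, one fragment per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def build_messages(page_text: str, max_chars: int = 4000) -> list[dict]:
    """Truncate long pages to keep the prompt small (the limit is arbitrary)."""
    return [
        {"role": "system", "content": "Summarize the web page text in a few sentences."},
        {"role": "user", "content": page_text[:max_chars]},
    ]


def summarize(url: str, model: str = "deepseek-r1:1.5b") -> str:
    """Fetch a page and summarize it with the local Ollama model."""
    from openai import OpenAI  # lazy import: the helpers above run without the SDK
    with urllib.request.urlopen(url, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
    response = client.chat.completions.create(
        model=model, messages=build_messages(extract_text(html))
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    demo = "<html><head><style>p{}</style></head><body><h1>Hi</h1><p>Jellyfish.</p></body></html>"
    print(extract_text(demo))
```

Fetching the page still needs network access to that site; only the summarization step runs locally. Calling `summarize("https://example.com")` requires Ollama to be running with the model already pulled.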