# Local LLM (Ollama): Ollama Chat & OpenAI Chat Completions

This notebook demonstrates two different ways to interact with local LLMs using Ollama:
- **Ollama Chat API**: The native Ollama Python library - simple and direct
- **OpenAI Chat Completions API Format**: Use OpenAI's client library with Ollama's OpenAI-compatible endpoint - useful for code compatibility

Both methods call the same local Ollama models running on your machine!

## Prerequisites

- Create a virtual environment. Example:

```bash
uv venv --python=python3.12
source .venv/bin/activate
uv pip install -r requirements.txt -q
```

- For Ollama:
  - Ensure [Ollama is installed](https://ollama.ai/download)
  - Start the Ollama server by running `ollama serve` in a terminal
  - Pull a model (e.g., `llama3.2:3b`) by running: `ollama pull llama3.2:3b`
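
Before running the cells, it can help to confirm that the server is up and the model has been pulled. The following is a minimal sketch using only the standard library. It assumes Ollama's default address (`http://localhost:11434`) and its `/api/tags` endpoint for listing local models; the helper name `list_local_models` is hypothetical and not part of any library.

```python
import json
import urllib.error
import urllib.request


def list_local_models(base_url="http://localhost:11434", timeout=2.0):
    """Return the names of locally pulled models, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply: treat as "not available"
        return None
    return [m.get("name", "") for m in payload.get("models", [])]


models = list_local_models()
if models is None:
    print("Ollama is not reachable. Start it with: ollama serve")
elif "llama3.2:3b" not in models:
    print("Ollama is running, but llama3.2:3b is missing. Run: ollama pull llama3.2:3b")
else:
    print("Ollama is running and llama3.2:3b is available")
```

The function returns `None` instead of raising, so a notebook can skip the later cells gracefully when the server is down.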


## Part 1: Ollama Chat (Local LLM)

Ollama allows you to run large language models locally on your machine. This provides:
- **Privacy**: Your data never leaves your machine
- **No API costs**: Free to use once installed
- **Offline capability**: Works without internet connection
- **Customization**: Run any model you download


### Import Ollama Library


In [1]:
import ollama

### 1. Basic Ollama Chat Request

**What:** Simple single-message chat with a local model.

**Why:** Demonstrates the basic structure of Ollama's chat API - similar to OpenAI but runs locally.

**How:** Use `ollama.chat()` with a model name and messages list.


In [2]:
# Basic chat request with Ollama
# Using llama3.2:3b - a small, fast model good for demos
# You can use other models like llama3.2, mistral, codellama, etc.

try:
    response = ollama.chat(
        model='llama3.2:3b',
        messages=[
            {
                'role': 'user',
                'content': 'What is machine learning? Explain in 2-3 sentences.'
            }
        ]
    )
    
    print("Ollama Response:")
    print(response['message']['content'])
except Exception as e:
    print(f"Error: {e}")
    print("\nMake sure Ollama is running (ollama serve) and the model is pulled (ollama pull llama3.2:3b)")


Ollama Response:
Machine learning (ML) is a subset of artificial intelligence that enables computers to learn from data and improve their performance on specific tasks without being explicitly programmed. It involves training algorithms on large datasets, allowing them to identify patterns and make predictions or decisions based on the information they've learned. This process enables machines to become increasingly accurate and autonomous over time.


### 2. Ollama with System Message

**What:** Adding a system message to control the assistant's behavior.

**Why:** System messages help set the tone, style, and constraints for the model.

**How:** Include a message with `role: "system"` in the messages list.


In [3]:
response = ollama.chat(
    model='llama3.2:3b',
    messages=[
        {
            'role': 'system',
            'content': 'You are a helpful assistant that explains technical concepts in simple terms.'
        },
        {
            'role': 'user',
            'content': 'What is machine learning?'
        }
    ]
)

print("Ollama Response (with system message):")
print(response['message']['content'])


Ollama Response (with system message):
Machine learning (ML) is a type of artificial intelligence (AI) that enables computers to learn from data and improve their performance on a task without being explicitly programmed.

Think of it like this: Imagine you're trying to teach a child to recognize different types of animals. You show them pictures of dogs, cats, birds, etc., and they try to classify each one as "dog" or "cat". At first, they might get some of the answers wrong, but with more practice and exposure to more data (pictures), they become better at recognizing the differences.

Machine learning works in a similar way. You provide computers with large amounts of data, such as images, text, or audio files, and let them learn from it to make predictions, classify objects, or identify patterns.

There are three main types of machine learning:

1. **Supervised Learning**: The computer is shown labeled data (e.g., pictures of dogs with labels "dog") and learns to recognize the patt

### 3. Multi-Turn Conversation with Ollama

**What:** Building a conversation by maintaining message history.

**Why:** Demonstrates context retention. Each request is stateless, so the model only "remembers" earlier exchanges because they are sent again with every call.

**How:** Include all previous messages (user and assistant) in the messages list.
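
The history-passing pattern can be sketched without calling a model. In this illustration, `add_turn` is a hypothetical helper (not part of the `ollama` package), the assistant text is a stand-in for a real response, and the role check only covers the three roles used in this notebook.

```python
def add_turn(history, role, content):
    """Return a new message list with one more turn appended (hypothetical helper)."""
    if role not in {"system", "user", "assistant"}:
        raise ValueError(f"Unsupported role: {role}")
    return history + [{"role": role, "content": content}]


history = [{"role": "system", "content": "You are a helpful coding assistant."}]
history = add_turn(history, "user", "Write a factorial function.")
# In the notebook, this text would come from response['message']['content']
history = add_turn(history, "assistant", "def factorial(n): ...")
history = add_turn(history, "user", "Can you make it recursive?")

# The full list is what gets sent as `messages` on the next request
print([m["role"] for m in history])
```

Because the whole list is sent each time, the follow-up question "Can you make it recursive?" arrives together with the earlier turns that give "it" a meaning.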


In [4]:
# First turn
messages = [
    {
        'role': 'system',
        'content': 'You are a helpful coding assistant.'
    },
    {
        'role': 'user',
        'content': 'Write a simple, non-recursive Python function to calculate factorial.'
    }
]

response = ollama.chat(
    model='llama3.2:3b',
    messages=messages
)

assistant_message = response['message']['content']
print("First Response:")
print(assistant_message)

# Add assistant's response to the conversation
messages.append({
    'role': 'assistant',
    'content': assistant_message
})


First Response:
**Calculating Factorial with a Simple Loop**

Here is an example of a simple Python function that calculates the factorial of a given number using a loop:

```python
def calculate_factorial(n):
    """
    Calculate the factorial of a given number.

    Args:
        n (int): The number to calculate the factorial for.

    Returns:
        int: The factorial of the given number.
    """
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result

# Example usage:
number = 5
factorial_result = calculate_factorial(number)
print(f"The factorial of {number} is: {factorial_result}")
```

In this code:

*   We define a function `calculate_factorial` that takes an integer `n` as input.
*   We initialize a variable `result` to 1, which will store the factorial result.
*   We use a for loop to iterate from 1 to `n` (inclusive).
*   Inside the loop, we multiply the current value of `result` by the current iteration number `i`.
*   After the loop finishes, w

In [5]:
# Second turn - the full history is sent again, so the model sees the previous exchange
messages.append({
    'role': 'user',
    'content': 'Can you make it recursive?'
})

response = ollama.chat(
    model='llama3.2:3b',
    messages=messages
)

print("Second Response (with context):")
print(response['message']['content'])


Second Response (with context):
**Calculating Factorial using Recursion**

Here is an example of a Python function that calculates the factorial of a given number using recursion:

```python
def calculate_factorial_recursive(n):
    """
    Calculate the factorial of a given number recursively.

    Args:
        n (int): The number to calculate the factorial for.

    Returns:
        int: The factorial of the given number.
    """
    if n == 0 or n == 1:
        # Base case: factorial of 0 and 1 is 1
        return 1
    else:
        # Recursive case: factorial of n is n * factorial of (n-1)
        return n * calculate_factorial_recursive(n - 1)

# Example usage:
number = 5
factorial_result = calculate_factorial_recursive(number)
print(f"The factorial of {number} is: {factorial_result}")
```

In this code:

*   We define a function `calculate_factorial_recursive` that takes an integer `n` as input.
*   We check if the input number `n` is 0 or 1. If so, we return 1 (since the facto

### 4. Ollama Parameters: Temperature

**What:** Temperature controls randomness/creativity in responses.

**Why:** Lower temperatures (0.0-0.3) make responses more deterministic and focused. Higher temperatures (0.7-1.0) make them more varied and creative.

**How:** Set the `options` parameter with `temperature` value.
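
To build intuition for why temperature changes randomness, here is a sketch of the textbook temperature-scaled softmax. It is an illustration only, not a description of Ollama's internal sampler, and the scores in `logits` are made up for the example.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, scaled by temperature (> 0)."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; treat 0 as always picking the top score")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a low temperature nearly all of the probability lands on the highest-scoring candidate, while a higher temperature spreads it more evenly, which is why sampled output becomes more varied.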


In [6]:
content = "Write a haiku about coding."

# Low temperature - deterministic
response = ollama.chat(
    model='llama3.2:3b',
    messages=[
        {
            'role': 'user',
            'content': content
        }
    ],
    options={
        'temperature': 0.0
    }
)

print("Temperature 0.0 (deterministic):")
print(response['message']['content'])


Temperature 0.0 (deterministic):
Lines of code descend
Logic's gentle, guiding hand
Beauty in the byte


In [7]:
# High temperature - creative
response = ollama.chat(
    model='llama3.2:3b',
    messages=[
        {
            'role': 'user',
            'content': content
        }
    ],
    options={
        'temperature': 1.0
    }
)

print("Temperature 1.0 (creative):")
print(response['message']['content'])


Temperature 1.0 (creative):
Pixels dance on screen
Code whispers secrets in wind
Algorithms sing


---

## Comparison: Native Ollama API vs OpenAI-Compatible API

### Why Use Each Approach?

#### Why Use Native Ollama API?

The native Ollama API is the simplest and most direct way to interact with Ollama:

- **Simpler syntax**: `ollama.chat()` is more straightforward than `client.chat.completions.create()`
- **Direct access**: No need to configure base URLs or worry about API compatibility
- **Ollama-specific features**: Access to features that might not be available through the OpenAI-compatible endpoint
- **Less overhead**: Fewer layers between your code and Ollama
- **Better for learning**: When you're specifically learning Ollama, the native API is more intuitive

#### Why Use OpenAI-Compatible API Format?

The OpenAI-compatible API format provides several important benefits:

- **Code reusability**: If you have existing code written for OpenAI's API, you can use it with Ollama by just changing the `base_url`. This means you can:
  - Test your OpenAI code locally with Ollama before deploying
  - Switch between OpenAI and Ollama without rewriting code
  - Use libraries and frameworks designed for OpenAI's API

- **Provider flexibility**: Build your application once and easily switch between:
  - Local Ollama models (privacy, no cost)
  - OpenAI cloud models (more powerful, requires API key)
  - Other providers that expose an OpenAI-compatible endpoint

- **Team consistency**: If your team is already familiar with OpenAI's API structure, using the same format with Ollama reduces the learning curve

- **Library ecosystem**: Many Python libraries and tools are built for OpenAI's API format, and they'll work with Ollama too

- **Production readiness**: If you plan to deploy to production and might switch between local and cloud models, using the OpenAI format keeps that transition mostly a matter of configuration

**Example scenario**: You're building an application that needs to work both locally (for development/testing) and in production (using OpenAI). With the OpenAI-compatible format, you can use the same code and just change the `base_url` and `api_key` based on the environment.
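
One way to sketch this scenario is a small function that picks the client settings from environment variables. The variable names `LLM_PROVIDER` and `LLM_MODEL`, the helper `client_settings`, and the cloud model name `gpt-4o-mini` are assumptions chosen for illustration; adapt them to your own setup.

```python
import os


def client_settings(env=None):
    """Pick base_url, api_key, and model from an environment mapping (hypothetical helper)."""
    env = os.environ if env is None else env
    if env.get("LLM_PROVIDER", "ollama") == "openai":
        return {
            "base_url": "https://api.openai.com/v1",
            "api_key": env["OPENAI_API_KEY"],  # must be set for the cloud provider
            "model": env.get("LLM_MODEL", "gpt-4o-mini"),  # example model name
        }
    return {
        "base_url": "http://localhost:11434/v1",
        "api_key": "ollama",  # placeholder; Ollama does not check the value
        "model": env.get("LLM_MODEL", "llama3.2:3b"),
    }


settings = client_settings({"LLM_PROVIDER": "ollama"})
print(settings["base_url"], settings["model"])
```

The returned values can then be passed along as `OpenAI(base_url=settings["base_url"], api_key=settings["api_key"])`, with `settings["model"]` used in each request, so the calling code stays the same in both environments.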

### Key Differences

| Feature | Native Ollama API | OpenAI-Compatible API |
|---------|-------------------|----------------------|
| **Library** | `ollama` Python package | `openai` Python package |
| **Syntax** | `ollama.chat()` | `client.chat.completions.create()` |
| **Response Format** | Dictionary with `['message']['content']` | `response.choices[0].message.content` |
| **Parameters** | `options={'temperature': 0.7}` | `temperature=0.7` |
| **Code Compatibility** | Ollama-specific | Compatible with OpenAI code |
| **Base URL** | Built-in (localhost:11434) | Must specify `base_url="http://localhost:11434/v1"` |
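
The "Parameters" row is the difference most likely to cause trouble when porting code between the two styles. As a rough sketch, a small translation table can turn OpenAI-style keyword arguments into an Ollama `options` dictionary. The helper `to_ollama_options` is hypothetical, and the table covers only a few common settings (for example, OpenAI's `max_tokens` corresponds to Ollama's `num_predict`).

```python
# OpenAI-style keyword -> Ollama option name (only a few common ones)
OPENAI_TO_OLLAMA = {
    "temperature": "temperature",
    "top_p": "top_p",
    "seed": "seed",
    "stop": "stop",
    "max_tokens": "num_predict",
}


def to_ollama_options(**openai_kwargs):
    """Translate OpenAI-style sampling kwargs into an Ollama `options` dict."""
    unknown = set(openai_kwargs) - set(OPENAI_TO_OLLAMA)
    if unknown:
        raise ValueError(f"No mapping for: {sorted(unknown)}")
    # Drop settings that were passed as None so they fall back to the model defaults
    return {OPENAI_TO_OLLAMA[k]: v for k, v in openai_kwargs.items() if v is not None}


print(to_ollama_options(temperature=0.7, max_tokens=200))
```

The result can be passed as `options=...` to `ollama.chat()`. Raising on unmapped names is a deliberate choice here, so that an unsupported setting fails loudly instead of being silently ignored.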

### When to Use Each

- **Use Native Ollama API when:**
  - You're building Ollama-specific applications
  - You prefer simpler, more direct syntax
  - You want to use Ollama-specific features
  - You're learning or experimenting with Ollama

- **Use OpenAI-Compatible API when:**
  - You want to reuse existing OpenAI code
  - You're building applications that might switch between providers
  - You prefer OpenAI's API structure
  - You want to maintain compatibility with OpenAI-based codebases


---

## Part 2: Using OpenAI Chat Completions API Format with Ollama

Now that you understand the differences, let's see how to use OpenAI's API format with Ollama. Ollama provides an OpenAI-compatible API endpoint, allowing you to use OpenAI's client library to call local models:
- **Same API, local models**: Use familiar OpenAI syntax with your local Ollama models
- **Code compatibility**: Easily switch between OpenAI and Ollama by changing the base URL
- **No API key needed**: Still runs entirely locally
- **Same privacy**: All data stays on your machine

### Import Libraries and Setup


In [8]:
from openai import OpenAI

openai_format_available = False

# Initialize OpenAI client pointing to Ollama's OpenAI-compatible endpoint
# Ollama runs on localhost:11434 by default
# The /v1 endpoint provides OpenAI-compatible API
try:
    client = OpenAI(
        base_url="http://localhost:11434/v1",
        api_key="ollama"  # Ollama doesn't require a real API key, but the client needs one
    )
    # Creating the client sends no request, so list the models to confirm the server responds
    client.models.list()
    print("✓ Connected to Ollama's OpenAI-compatible endpoint")
    openai_format_available = True
except Exception as e:
    print(f"⚠️  Could not connect to Ollama: {e}")
    print("   Make sure Ollama is running (ollama serve)")


✓ Connected to Ollama's OpenAI-compatible endpoint


### 1. Basic Chat Completions (OpenAI Format)

**What:** Simple single-message chat using OpenAI's API format, but calling Ollama.

**Why:** Demonstrates how to use OpenAI's client library syntax with local Ollama models.

**How:** Use `client.chat.completions.create()` with Ollama model name and messages list.


In [9]:
if openai_format_available:
    response = client.chat.completions.create(
        model="llama3.2:3b",  # Using Ollama model name
        messages=[
            {"role": "user", "content": "What is machine learning? Explain in 2-3 sentences."}
        ]
    )
    
    print("Response (using OpenAI API format with Ollama):")
    print(response.choices[0].message.content)
    if hasattr(response, 'usage') and response.usage:
        print(f"\nTokens used: {response.usage.total_tokens}")
else:
    print("Ollama not available. Skipping this example.")


Response (using OpenAI API format with Ollama):
Machine learning (ML) is a subset of artificial intelligence that enables computers to learn and improve on their performance by analyzing data without being explicitly programmed. By recognizing patterns and relationships within the data, ML algorithms can make predictions, classify objects, and even create new insights, all without human intervention. This process is often driven by statistical and mathematical techniques.

Tokens used: 108


### 2. Chat Completions with System Message (OpenAI Format)

**What:** Adding a system message using OpenAI's API format.

**Why:** System messages help set the tone, style, and constraints for the model.

**How:** Include a message with `role: "system"` in the messages list.


In [None]:
if openai_format_available:
    response = client.chat.completions.create(
        model="llama3.2:3b",
        messages=[
            {"role": "system", "content": "You are a helpful assistant that explains technical concepts in simple terms."},
            {"role": "user", "content": "What is machine learning?"}
        ]
    )
    
    print("Response (using OpenAI API format with Ollama, with system message):")
    print(response.choices[0].message.content)
else:
    print("Ollama not available. Skipping this example.")


### 3. Multi-Turn Conversation (OpenAI Format)

**What:** Building a conversation using OpenAI's API format.

**Why:** Demonstrates context retention. Each request is stateless, so the model only "remembers" earlier exchanges because they are sent again with every call.

**How:** Include all previous messages (user and assistant) in the messages list.


In [None]:
if openai_format_available:
    # First turn
    messages = [
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a simple, non-recursive Python function to calculate factorial."}
    ]
    
    response = client.chat.completions.create(
        model="llama3.2:3b",
        messages=messages
    )
    
    assistant_message = response.choices[0].message.content
    print("First Response (using OpenAI API format):")
    print(assistant_message)
    
    # Add assistant's response to the conversation
    messages.append({"role": "assistant", "content": assistant_message})
else:
    print("Ollama not available. Skipping this example.")


In [None]:
if openai_format_available:
    # Second turn - the full history is sent again, so the model sees the previous exchange
    messages.append({"role": "user", "content": "Can you make it recursive?"})
    
    response = client.chat.completions.create(
        model="llama3.2:3b",
        messages=messages
    )
    
    print("Second Response (with context, using OpenAI API format):")
    print(response.choices[0].message.content)
else:
    print("Ollama not available. Skipping this example.")
