<a href="https://colab.research.google.com/github/Dntfreitas/introduction-agents-ai/blob/main/3_local_models_and_structured_outputs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Introduction to Ollama**

Ollama is a command-line tool for running and interacting with large language models locally. It is designed for speed, privacy, and simplicity, and lets developers and enthusiasts use open models directly on their own machines, with no cloud service required.

**Pulling a Model: `ollama pull`**

Before running a model, you need to download it using the `pull` command:

```
ollama pull <model>
```

Example:

```
ollama pull llama3
```

This downloads the `llama3` model to your local system so it’s ready for use.

**Running a Model: `ollama run`**

Once the model is pulled, start an interactive session with:

```
ollama run <model>
```

Example:

```
ollama run llama3
```

This opens a terminal chat session where you can interact with the model in real time.

## OpenAI-Compatible API

Ollama also provides an OpenAI-compatible API, served by default at `http://localhost:11434/v1`. This means you can use local models in place of OpenAI's models in existing applications with minimal changes, which is ideal for developers who want to integrate open-source LLMs into their tools while keeping control and privacy.
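
To make the idea of an "OpenAI-compatible" endpoint concrete, here is a minimal sketch that builds the raw HTTP request a client would send, using only the Python standard library. The helper names `build_chat_request` and `send_chat_request` are hypothetical, and the base URL assumes a local Ollama server on its default port. Only the request is built here; sending it requires a running server with the model already pulled.

```python
import json
import urllib.request

# Assumed address of a local Ollama server (default port).
BASE_URL = "http://localhost:11434/v1"


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request) -> str:
    """Send the request and return the assistant's reply text.

    Needs a running Ollama server, so it is not called in this sketch.
    """
    with urllib.request.urlopen(request) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


request = build_chat_request("llama3", "Say hello in one sentence.")
print(request.full_url)
print(request.data.decode("utf-8"))
```

Because the request body has the same shape as an OpenAI chat completion request, switching an application between a hosted and a local model is mostly a matter of changing the base URL and the model name.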


Keep in mind that the models are large and may take a while to download. Also check the hardware requirements of each model to make sure your machine has enough resources to run it locally.

> [!CAUTION]
> This notebook must be run in your local environment, not in Google Colab!

# Local Models: Advantages and Disadvantages

## Advantages

- **Privacy**: Your data stays on your machine, reducing the risk of data leaks.
- **Cost**: No ongoing cloud costs; you only pay for the hardware.
- **Customization**: You can fine-tune models to better suit your specific needs.
- **No API Limits**: You are not subject to API rate limits or usage caps.
- **Offline Access**: You can run models without an internet connection.

## Disadvantages

- **Hardware Requirements**: Running large models requires significant computational resources (typically a GPU with enough memory, or plenty of RAM for CPU-only inference).
- **Setup Complexity**: Initial setup can be more complex than using a cloud API.
- **Maintenance**: You are responsible for maintaining and updating the models.
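
As a rough illustration of the hardware requirement, the memory needed just to hold a model's weights is about the number of parameters times the bytes per parameter. The sketch below is a back-of-the-envelope estimate, not a measurement: the function name is hypothetical, the parameter counts are nominal, and the `overhead` multiplier for runtime buffers is an assumption that varies with the runtime and context length.

```python
def estimate_weight_memory_gb(n_params: float, bits_per_weight: int, overhead: float = 1.2) -> float:
    """Rough memory needed to run a model, in gigabytes (1 GB = 1e9 bytes).

    `overhead` is an assumed multiplier for runtime buffers and context cache.
    """
    weight_bytes = n_params * bits_per_weight / 8
    return weight_bytes * overhead / 1e9


# Nominal sizes: 1B and 7B parameters at 4-bit and 16-bit precision.
for n_params in (1e9, 7e9):
    for bits in (4, 16):
        estimate = estimate_weight_memory_gb(n_params, bits)
        print(f"{n_params / 1e9:.0f}B params @ {bits}-bit: ~{estimate:.1f} GB")
```

The estimate makes the trade-off visible: lower-precision (quantized) weights shrink the memory footprint roughly in proportion to the number of bits per weight.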

In [None]:
# Start by pulling the model
!ollama pull gemma3:1b
!ollama pull deepseek-r1:7b

In [None]:
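# Now, run the model. A notebook cell has no interactive terminal,
# so pass the prompt as an argument instead of opening a chat session.
!ollama run gemma3:1b "Introduce yourself in one sentence."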

In [None]:
# Congratulations! You have successfully pulled and run a model using Ollama.

In [None]:
# Now, let's import the necessary libraries and set up our environment.

import os

from dotenv import load_dotenv
from openai import OpenAI
from pydantic import BaseModel


In [None]:
# Load the environment variables from a `.env` file

load_dotenv(override=True)

In [None]:
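# Now, let's initialize the clients for our models.
# Ollama serves an OpenAI-compatible API on localhost. It ignores the API key,
# but the OpenAI client requires a non-empty value, so any placeholder works.

gemma = OpenAI(base_url="http://localhost:11434/v1", api_key="gemma")
deepseek = OpenAI(base_url="http://localhost:11434/v1", api_key="deepseek")
# The hosted OpenAI client reads its key from the environment (loaded from `.env`)
gpt = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))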

models = {
    "gemma3:1b": gemma,
    "deepseek-r1:7b": deepseek,
    "gpt-4.1-nano": gpt,
}

# Structured Outputs

Structured outputs are a feature of the OpenAI API that lets you define the format of the output you want from the model. You supply a schema (here, a Pydantic model) and the model returns JSON that conforms to it, which is particularly useful when the output has to be parsed and used by code rather than read by a person.
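
To show what "a specific structure" means in practice, here is a small sketch that uses only the standard library, so it runs without any API call. The `Rating` class, the `parse_rating` helper, and the sample JSON strings are all hypothetical; they illustrate the kind of check that a schema library such as Pydantic automates.

```python
import json
from dataclasses import dataclass


@dataclass
class Rating:
    """Hypothetical target structure for a model reply."""
    name: str
    score: float
    reason: str


# Expected field names and the JSON types accepted for each.
EXPECTED_TYPES = {"name": str, "score": (int, float), "reason": str}


def parse_rating(raw: str) -> Rating:
    """Parse a JSON string and check it has exactly the expected fields and types."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if set(data) != set(EXPECTED_TYPES):
        raise ValueError(f"expected keys {sorted(EXPECTED_TYPES)}, got {sorted(data)}")
    for key, expected_type in EXPECTED_TYPES.items():
        if not isinstance(data[key], expected_type):
            raise ValueError(f"field {key!r} has the wrong type")
    return Rating(name=data["name"], score=float(data["score"]), reason=data["reason"])


# A reply that matches the structure is turned into a typed object.
good = '{"name": "model-a", "score": 17, "reason": "Clear and well organised."}'
rating = parse_rating(good)
print(rating)

# A reply that does not match is rejected instead of silently passing through.
bad = '{"name": "model-a", "score": "high"}'
try:
    parse_rating(bad)
except ValueError as err:
    print(f"rejected: {err}")
```

With structured outputs, the API is asked to produce JSON matching the schema in the first place, so the parsing step works on data that already has the expected fields.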

In [None]:
# We are going to use Pydantic to define structured data types for our models.
# Field descriptions are included in the JSON schema, so the evaluating model sees them.
from pydantic import BaseModel, Field


class Competitor(BaseModel):
    """
    Competitor model to store the name and score of the competitor.
    """
    name: str = Field(description="The name of the competitor.")
    score: float = Field(
        description="The score of the competitor. The score is a number between 0 and 20, with higher scores being better."
    )
    reason: str = Field(description="The reason for the score.")

In [None]:
prompt = "Write a tourist guide for the city of Funchal, Madeira."

In [None]:
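def evaluate_response(prompt, model_name, model):
    """
    Ask a model to answer a prompt, then have the `gpt` client grade the answer.

    Returns a tuple `(evaluation, response)`: the evaluator's full parsed
    completion and the raw text produced by the model under test.
    """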
    # Call the model to get the response
    response_model = model.chat.completions.create(
        model=model_name,
        messages=[{"role": "user", "content": prompt}],
    )

    response = response_model.choices[0].message.content

    # Evaluate the response
    evaluation = gpt.beta.chat.completions.parse(
        model="gpt-4.1-2025-04-14",
        messages=[
            {"role": "user",
             "content": f"Evaluate the following response for clarity and strength of argument: {response}"},
        ],
        response_format=Competitor,
    )

    return evaluation, response

In [None]:
# Iterate over the models and get their evaluations
evaluations = []
for name, model in models.items():
    competitor, response = evaluate_response(prompt, name, model)
    evaluations.append(
        {
            "name": name,
            "evaluation": competitor.choices[0].message.parsed,
            "response": response,
        }
    )

In [None]:
# Select the best response
best_response_idx = max(
    range(len(evaluations)),
    key=lambda i: evaluations[i]["evaluation"].score,
)
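
The selection step can be tried without calling any model. The sketch below uses hypothetical stand-in records in place of real evaluations and picks the highest score with `max()` and a key function. It also skips records whose evaluation is `None`, on the assumption that the evaluator may occasionally return nothing parseable (for example, when it refuses to answer).

```python
from types import SimpleNamespace

# Hypothetical stand-ins for the notebook's evaluation records.
sample_evaluations = [
    {"name": "model-a", "evaluation": SimpleNamespace(score=12.0, reason="Too brief.")},
    {"name": "model-b", "evaluation": None},  # assumed case: no parsed evaluation
    {"name": "model-c", "evaluation": SimpleNamespace(score=17.5, reason="Clear and complete.")},
]

# Keep only the records that carry a parsed evaluation.
scored = [record for record in sample_evaluations if record["evaluation"] is not None]

# max() with a key function returns the record with the highest score
# (the first one, if several share the top score).
best = max(scored, key=lambda record: record["evaluation"].score)
print(best["name"], best["evaluation"].score)
```

Selecting the record itself rather than its index keeps the later lookups shorter; the notebook's index-based version gives the same winner when every evaluation is present.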

In [None]:
print(f"""

The best response was given by the model: {evaluations[best_response_idx]["name"]}
The score was: {evaluations[best_response_idx]["evaluation"].score}
The reason for the score was: {evaluations[best_response_idx]["evaluation"].reason}
The answer was: {evaluations[best_response_idx]["response"]}

""")