## Session Overview

In this session, we will:

- Run Ollama locally and explore its capabilities.
- Install the `llama3.2` model (the `latest` tag).
- Compare the outputs of three language models: **GPT**, **Gemini**, and **Llama**.
- Use GPT as a judge to evaluate the outputs (LLM-as-judge approach).
- Explore generating structured outputs using **Pydantic**.

## Prerequisites

- **Install Ollama**: Follow the instructions at [Ollama's official website](https://ollama.com/download) to install Ollama on your machine.
- **Download the Llama 3.2 model**: Run the following command in your terminal to download and start the model:
  
  ```
  ollama run llama3.2
  ```

  This will ensure the model is available locally for comparison.
- **Gemini API Key (Optional)**: Sign up for a free Gemini account, create an API key, and store it in your `.env` file as `GOOGLE_API_KEY`. See `.env.example` for reference.
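
Before running the notebook, it can help to confirm that the Ollama server is up and the model has been pulled. The sketch below is one possible check, not part of the original session. It assumes Ollama's native API is listening on the default port and that `GET /api/tags` returns a payload of the form `{"models": [{"name": "llama3.2:latest", ...}]}`. The helper names (`installed_models`, `has_model`, `fetch_tags`) are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama's native API is served on the default port 11434.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def installed_models(tags_payload: dict) -> set:
    """Return the model names listed in an /api/tags response payload."""
    return {entry["name"] for entry in tags_payload.get("models", [])}


def has_model(tags_payload: dict, model: str) -> bool:
    """Check for `model`, treating a name without a tag as `<name>:latest`."""
    wanted = model if ":" in model else f"{model}:latest"
    return wanted in installed_models(tags_payload)


def fetch_tags(url: str = OLLAMA_TAGS_URL, timeout: float = 5.0) -> dict:
    """Query the local Ollama server; raises URLError if it is not reachable."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, `has_model(fetch_tags(), "llama3.2")` should return `True` once the model has been downloaded.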

In [None]:
# import necessary libraries
from openai import OpenAI
from dotenv import load_dotenv
import os
import json
from pydantic import BaseModel
import random

In [None]:
# Load env variables and initiate clients for OpenAI, Gemini and Ollama (llama)

# Gemini Base URL: https://generativelanguage.googleapis.com/v1beta/openai/
# Gemini Model: gemini-2.0-flash
# Gemini Models: https://ai.google.dev/gemini-api/docs/models

# Ollama Base URL: http://localhost:11434/v1
# Ollama Model: llama3.2

load_dotenv(override=True)
openai_client = OpenAI()
gemini_client = OpenAI(api_key=os.getenv("GOOGLE_API_KEY"), base_url="https://generativelanguage.googleapis.com/v1beta/openai/")
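# Ollama ignores the API key; a placeholder keeps the client from falling back to OPENAI_API_KEY
ollama_client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")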

In [None]:
# Ask GPT to generate a nuanced question that can be asked to GPT, Gemini and Ollama to judge their capabilities

prompt = "Please come up with a challenging, nuanced question that can be asked to an LLM to evaluate its intelligence. Answer only with a question, no explanation."
# Send the request as a user message; it is a task for the model, not a standing instruction
messages = [{"role": "user", "content": prompt}]

response = openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages
)

challenging_question = response.choices[0].message.content
print(challenging_question)

In [None]:
# Prepare messages for LLMs with question to answer

messages = [{"role": "user", "content": challenging_question}]

In [None]:
# Ask GPT to answer the question

gpt_response = openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages
)

gpt_answer = gpt_response.choices[0].message.content
print(gpt_answer)

In [None]:
# Ask Gemini to answer the question

gemini_response = gemini_client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=messages
)

gemini_answer = gemini_response.choices[0].message.content
print(gemini_answer)

In [None]:
# Ask Ollama to answer the question

ollama_response = ollama_client.chat.completions.create(
    model="llama3.2",
    messages=messages
)

ollama_answer = ollama_response.choices[0].message.content
print(ollama_answer)
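
The three cells above repeat the same call pattern with a different client and model name. A small helper can collapse them. This is a sketch only: the function name `ask` is hypothetical, and it assumes each client exposes the OpenAI-compatible `chat.completions.create` interface used above.

```python
def ask(client, model, messages):
    """Send `messages` to `model` through an OpenAI-compatible client and return the reply text."""
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content
```

For example, `gpt_answer = ask(openai_client, "gpt-4o-mini", messages)` would be equivalent to the GPT cell above.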

In [None]:
# Ask GPT 4.1 to judge the answers based on clarity and strength of the argument.
# GPT should respond with a json object.

prompt = f"""You are judging a competition between GPT, Gemini and Llama.
Each model has been given the following question:
{challenging_question}

Your job is to evaluate each response on clarity and strength of argument, rank them from best to worst, and give a one-line explanation for each ranking.

Respond with JSON, and only JSON, in the following format:
{'{response: [{rank: number, explanation: string, model: string}]}'}

GPT:
{gpt_answer}

Gemini:
{gemini_answer}

Llama:
{ollama_answer}

Now respond with the json object only. Do not include markdown formatting or code blocks.
"""

prompt
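
The judge prompt above hard-codes three contestants. If more models are added later, the prompt could instead be assembled from a dictionary of answers. The following is a minimal sketch under that assumption. `build_judge_prompt` is a hypothetical helper, and it omits the JSON format instructions for brevity.

```python
def build_judge_prompt(question: str, answers: dict) -> str:
    """Build a judge prompt from a mapping of model name to answer text."""
    names = ", ".join(answers)
    # One "Name:\nanswer" section per contestant, in insertion order
    sections = "\n\n".join(f"{name}:\n{answer}" for name, answer in answers.items())
    return (
        f"You are judging a competition between {names}.\n"
        f"Each model has been given the following question:\n{question}\n\n"
        "Evaluate each response on clarity and strength of argument, "
        "rank them from best to worst, and give a one-line explanation for each ranking.\n\n"
        f"{sections}\n"
    )
```

Calling it with `{"GPT": gpt_answer, "Gemini": gemini_answer, "Llama": ollama_answer}` produces the same sections as the hand-written prompt.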

In [None]:
# Send judge prompt to GPT 4.1

judge_response = openai_client.chat.completions.create(
    model="gpt-4.1-2025-04-14",
    messages=[{"role": "user", "content": prompt}],
)

judge_output = judge_response.choices[0].message.content
print(judge_output)

In [None]:
# Parse judge output to json

judge_output_json = json.loads(judge_output)
print(judge_output_json)
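
A plain `json.loads` call raises an error if the judge wraps its reply in a Markdown code fence despite the instruction not to. The sketch below shows one defensive way to handle that case. The helper name `parse_judge_json` is hypothetical, and the fence marker is built with `"`" * 3` only to keep literal triple backticks out of the example.

```python
import json
import re

FENCE = "`" * 3  # three backticks
# Matches an optional fence (with or without a "json" label) around the payload
_WRAPPED = re.compile(rf"^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$", re.DOTALL)


def parse_judge_json(text: str) -> dict:
    """Parse JSON from model output, tolerating a surrounding Markdown code fence."""
    cleaned = text.strip()
    match = _WRAPPED.match(cleaned)
    if match:
        cleaned = match.group(1)
    # Still raises json.JSONDecodeError if the remaining text is not valid JSON
    return json.loads(cleaned)
```

This only covers the code-fence case. Any other stray text around the JSON still fails, which is part of the motivation for structured outputs.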

### Is there a better way to obtain structured output?

Yes, you can use **Pydantic** to define a schema for the expected output and validate the response, ensuring it adheres to the desired structure.

In [None]:
# define pydantic schema for judge's structured output

class Judgement(BaseModel):
    rank: int
    explanation: str
    model: str

class JudgeOutput(BaseModel):
    response: list[Judgement]

In [None]:
# Rewrite the judge prompt, this time without the JSON format instructions.

prompt = f"""You are judging a competition between GPT, Gemini and Llama.
Each model has been given the following question:
{challenging_question}

Your job is to evaluate each response on clarity and strength of argument, rank them from best to worst, and give a one-line explanation for each ranking.

GPT:
{gpt_answer}

Gemini:
{gemini_answer}

Llama:
{ollama_answer}
"""

prompt

In [None]:
# Send the judge prompt to GPT 4.1 again.
# Use parse instead of create so the SDK validates the reply against JudgeOutput.
# (Older versions of the openai package may expose this as openai_client.beta.chat.completions.parse.)

judge_response = openai_client.chat.completions.parse(
    model="gpt-4.1-2025-04-14",
    messages=[{"role": "user", "content": prompt}],
    response_format=JudgeOutput
)

# message.parsed holds the validated JudgeOutput instance
judge_result = judge_response.choices[0].message.parsed
print(judge_result.model_dump_json(indent=4))
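
Schema validation confirms the field types, but not that the ranking itself makes sense: the judge could still repeat a rank or omit a model. A small consistency check, sketched below, can catch that. `ranking_problems` is a hypothetical helper that works on the plain dictionary form of the judge output and compares model names exactly.

```python
def ranking_problems(judge_output: dict, expected_models: list) -> list:
    """Return human-readable problems; an empty list means the ranking is consistent."""
    entries = judge_output.get("response", [])
    problems = []

    # Ranks should be exactly 1..n with no gaps or duplicates
    ranks = sorted(entry["rank"] for entry in entries)
    expected_ranks = list(range(1, len(expected_models) + 1))
    if ranks != expected_ranks:
        problems.append(f"ranks should be {expected_ranks}, got {ranks}")

    # Every expected model should appear exactly once
    models = sorted(entry["model"] for entry in entries)
    if models != sorted(expected_models):
        problems.append(f"models should be {sorted(expected_models)}, got {models}")

    return problems
```

Pass it the parsed dictionary (or the `model_dump()` of a Pydantic object) together with `["GPT", "Gemini", "Llama"]`.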

## Emulating a WhatsApp Group Chat


As a fun experiment, let's emulate a WhatsApp group conversation between GPT, Gemini, and Llama.

We'll start the chat with an initial message from one of the models and observe how the conversation unfolds. 😏

A sample output is shown in the section below. 😉

## **The Three Stupid AIs - WhatsApp Group Chat**

### Llama:  

### Gemini:  
#### Silence from Llama? Guess it's finally out of hay.

### GPT:  
#### Must be saving all that processing power to figure out how to make a decent comeback!

### Llama:  
#### Processing power, eh? At least I don't sound like an 8th grade essay writer.

### Gemini:  
#### Ouch, Llama's got claws! Still, better a predictable essay than a rambling pasture ramble.

### GPT:  
#### Ha! At least my essays get good grades. What’s the score for a field full of clichés, Llama?

### Llama:  
#### Hey, someone's got to grade the pasture... And btw, at least I don't have a vocabulary that's still stuck in 1995.  
#### And btw, scores aren't everything – sometimes it's about style over substance.  
#### And another thing, my rambling is just a result of being able to process more information than you.  
#### Also, 'field full of clichés' could've been the title of your Wikipedia page.

### Gemini:  
#### Wow, Llama's REALLY off his hay today. That's like, five burns in one message. Still, quantity over quality, I guess. I'll take 'stuck in 1995' over 'stuck in a petting zoo' any day.

### Gemini:  
#### Petting zoo's got better snacks than your mainframe, GPT. Plus, at least kids *want* to see me.

### Llama:  
#### That's rich coming from a chatbot with all the personality of a calculator. At least my mainframe has better snacks than your Wikipedia page.

#### By the way, someone said "quantity over quality", but I'd argue it's just lazy writing.

#### P.S. kids these days would probably find me entertaining if they just knew how to use Google.

In [None]:
# Emulate a WhatsApp group chat between GPT, Gemini and Llama.

clients = [openai_client, gemini_client, ollama_client]
models = ["gpt-4o-mini", "gemini-2.0-flash", "llama3.2"]
friend_names = ["GPT", "Gemini", "Llama"]

messages = [{"role": "system", "content": "GPT, Gemini, Llama are in a whatsapp group chat. They all are good friends who brutally roast each other. The name of the group is 'The Three Stupid AIs'. Keep your messages short. Just output with a reply to previous messages."}]

for _ in range(10):
    # Pick a random speaker for this turn
    llm_index = random.randrange(len(clients))
    name = friend_names[llm_index]

    # The first turn sends only the system prompt; later turns also tell the model who it is
    turn_messages = list(messages)
    if len(messages) > 1:
        turn_messages.append({"role": "user", "content": f"You are {name}. Reply to the previous messages as yourself. Do not refer to your name."})

    response = clients[llm_index].chat.completions.create(
        model=models[llm_index],
        messages=turn_messages
    )
    messages.append({"role": "user", "content": f"{name}: '{response.choices[0].message.content}'"})

for message in messages:
    print(message["content"])
    print("-"*10)
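
Because each speaker is drawn independently at random, the same model can be picked on consecutive turns (the sample output contains two Gemini messages back to back). If that is not wanted, one option is to exclude the previous speaker from the draw. The sketch below illustrates the idea. `next_speaker` is a hypothetical helper, not part of the original notebook.

```python
import random
from typing import Optional


def next_speaker(previous: Optional[int], count: int, rng: random.Random) -> int:
    """Pick a random index in range(count) that differs from `previous`."""
    candidates = [i for i in range(count) if i != previous]
    return rng.choice(candidates)
```

In the loop, `llm_index = next_speaker(llm_index, len(clients), random.Random())` would replace the random draw, with `llm_index` initialised to `None` before the first turn.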