# Lab 1

# STRAIGHT TO ACTION!

Welcome to our first Lab where we will see rapid, satisfying results!

I will leave it to you to try out leading LLMs through their chat interfaces.

Together, we will call them using their APIs

Please see the README for instructions on setting this up and getting your API key

# If this is your first time in a Jupyter Notebook..

Welcome to the world of Data Science experimentation. Warning: Jupyter Notebooks are very addictive and you may find it hard to go back to IDEs afterwards!!

Simply click in each cell with code and press `Shift + Enter` to execute the code and print the results.

There's a notebook called "Guide to Jupyter" in the parent directory that will give you a handy tutorial on all things Jupyter Lab.

## First: Calling Frontier Models through APIs

## Setting up your keys

If you haven't done so already, you'll need to create API keys from OpenAI, Anthropic and Google, and also DeepSeek, Groq and Grok (xAI) if you wish.

For OpenAI, visit https://openai.com/api/  
For Anthropic, visit https://console.anthropic.com/  
For Google, visit https://ai.google.dev/gemini-api  

When you get your API keys, you need to set them as environment variables.

EITHER (recommended) create a file called `.env` in this project root directory, and set your keys there:

```
OPENAI_API_KEY=xxxx
ANTHROPIC_API_KEY=xxxx
GOOGLE_API_KEY=xxxx
DEEPSEEK_API_KEY=xxxx
GROQ_API_KEY=xxxx
GROK_API_KEY=xxxx
```

OR enter the keys directly in the cells below.

## Two purposes of these APIs:

1. Illustrate the different APIs
2. Experiment with some LLMs

In [None]:
# imports

import os
import json
import requests
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import Markdown, display, update_display

In [None]:
# Load environment variables from a file called .env
# Print the key prefixes to help with any debugging

load_dotenv(override=True)
openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')
grok_api_key = os.getenv('GROK_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set (and this is optional)")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")

if groq_api_key:
    print(f"Groq API Key exists and begins {groq_api_key[:4]}")
else:
    print("Groq API Key not set (and this is optional)")

if grok_api_key:
    print(f"Grok API Key exists and begins {grok_api_key[:4]}")
else:
    print("Grok API Key not set (and this is optional)")

## Connecting to Python Client libraries

We call Cloud APIs by making REST calls to an HTTP endpoint, passing in our keys.

For convenience, OpenAI has provided a lightweight python client library that makes the HTTP calls for us.
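To make the "REST call" idea concrete, here is a minimal sketch of the kind of HTTP request a client library assembles for us. It only builds the request and never sends it. The helper name `build_chat_request` and the placeholder key are illustrative assumptions; the endpoint URL and bearer-token header follow OpenAI's published Chat Completions API.

```python
import json
import urllib.request

OPENAI_CHAT_URL = "https://api.openai.com/v1/chat/completions"

def build_chat_request(api_key, model, messages):
    """Build (but do not send) a POST request for a chat completion."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {api_key}",  # the key travels in a header
        "Content-Type": "application/json",
    }
    return urllib.request.Request(OPENAI_CHAT_URL, data=body, headers=headers, method="POST")

# Placeholder key: nothing is sent over the network here
request = build_chat_request(
    api_key="sk-placeholder",
    model="gpt-5-nano",
    messages=[{"role": "user", "content": "Hello"}],
)
print(request.get_method(), request.full_url)
print(json.loads(request.data))
```

The client library hides this plumbing, and also takes care of details such as parsing the JSON response into Python objects.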

In [None]:
# Connect to OpenAI client library
# A thin wrapper around calls to REST endpoints

openai = OpenAI()

# For Anthropic, Gemini, DeepSeek, Groq and Grok, we can also use the OpenAI python client
# Because these providers have endpoints compatible with OpenAI
# And the OpenAI client allows you to change the base_url

anthropic_url = "https://api.anthropic.com/v1/"
gemini_url = "https://generativelanguage.googleapis.com/v1beta/openai/"
deepseek_url = "https://api.deepseek.com"
groq_url = "https://api.groq.com/openai/v1"
grok_url = "https://api.x.ai/v1"

anthropic = OpenAI(api_key=anthropic_api_key, base_url=anthropic_url)
gemini = OpenAI(api_key=google_api_key, base_url=gemini_url)
deepseek = OpenAI(api_key=deepseek_api_key, base_url=deepseek_url)
groq = OpenAI(api_key=groq_api_key, base_url=groq_url)
grok = OpenAI(api_key=grok_api_key, base_url=grok_url)
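If you only have some of the keys, one option is to keep the provider details in a single table and build client settings only for the keys that are present. This is a sketch rather than part of the lab: the `PROVIDERS` table and the `available_providers` helper are hypothetical names, and the URLs and variable names are copied from the cells above.

```python
PROVIDERS = {
    # name: (environment variable, base_url)
    "anthropic": ("ANTHROPIC_API_KEY", "https://api.anthropic.com/v1/"),
    "gemini": ("GOOGLE_API_KEY", "https://generativelanguage.googleapis.com/v1beta/openai/"),
    "deepseek": ("DEEPSEEK_API_KEY", "https://api.deepseek.com"),
    "groq": ("GROQ_API_KEY", "https://api.groq.com/openai/v1"),
    "grok": ("GROK_API_KEY", "https://api.x.ai/v1"),
}

def available_providers(env):
    """Return {name: client kwargs} for every provider whose key is set in env."""
    configs = {}
    for name, (key_var, base_url) in PROVIDERS.items():
        api_key = env.get(key_var)
        if api_key:
            configs[name] = {"api_key": api_key, "base_url": base_url}
    return configs

# Example with a fake environment; in the notebook you would pass os.environ
example = available_providers({"GROQ_API_KEY": "gsk_example"})
print(example)
```

Each entry could then be unpacked into the client constructor, for example `OpenAI(**example["groq"])`.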

## Asking LLMs a hard question that will put them to the test and illustrate their power

We will come up with a challenging question to test out model performance with language and nuance.

### What information is included in the API

Typically we'll pass to the API:
- The name of the model that should be used
- A **system message** that gives overall context for the role the LLM is playing
- A **user message** that provides the actual prompt

There are other parameters that can be used, including **temperature** which is typically between 0 and 1; higher for more random output; lower for more focused and deterministic.
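To build some intuition for temperature, here is a toy illustration using the textbook softmax-with-temperature formula. The scores are made up, and this is an assumption about the general mechanism rather than a description of how any particular provider implements sampling. The sketch requires a temperature greater than 0.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, scaled by temperature (must be > 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At low temperature almost all of the probability sits on the top-scoring token, which gives focused output; at higher temperature the distribution flattens, so less likely tokens get picked more often.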

### The standard format of messages with an LLM, first used by OpenAI in its API and now adopted more widely

Conversations use this format:

```python
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "user prompt here"}
]
```


In [None]:
# The hardest question I could come up with

system_message = "You explain concepts concisely with powerful analogies"

user_prompt = "In 1 short sentence, describe a rainbow to someone who's never been able to see. \
Then in 1 short sentence, describe the imaginary number i to someone who doesn't understand math. \
Then in 1 short sentence, find a connection between rainbows and imaginary numbers. \
Then end by stating how many words are in your answer."

In [None]:
challenge = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_prompt}
]

In [None]:
challenge

In [None]:
# new gpt-5-nano with new reasoning_effort="minimal" setting

model_name = "gpt-5-nano"
response = openai.chat.completions.create(model=model_name, messages=challenge, reasoning_effort="minimal")
reply = response.choices[0].message.content
print(reply)

In [None]:
models = []
answers = []

In [None]:
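def record(model, stream):
    """Stream a reply into a live Markdown display, then append the true word count."""
    prefix = f"### Response from {model}:\n\n"
    reply = ""
    display_handle = display(Markdown(prefix), display_id=True)
    for chunk in stream:
        # Some chunks carry no text (or no choices at all); treat those as empty
        if chunk.choices:
            reply += chunk.choices[0].delta.content or ''
        update_display(Markdown(prefix+reply), display_id=display_handle.display_id)
    # Some models emit their reasoning before a </think> marker; count only the final answer
    words = reply.split('</think>')[1] if '</think>' in reply else reply
    reply += f"\n\n#### Calculated true word count: {len(words.split())}"
    update_display(Markdown(prefix+reply), display_id=display_handle.display_id)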
    
    models.append(model)
    answers.append(reply)
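The streaming and word-count logic in `record` can be tried without calling any API. The sketch below uses hypothetical stand-in objects shaped like streamed chunks (`fake_stream` is not part of any library) to show how fragments are joined and how the `</think>` split affects the count.

```python
from types import SimpleNamespace

def fake_stream(pieces):
    """Yield objects shaped like streamed chat chunks (hypothetical stand-ins)."""
    for piece in pieces:
        delta = SimpleNamespace(content=piece)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

def collect(stream):
    """Join the streamed text fragments, treating a missing fragment as empty."""
    reply = ""
    for chunk in stream:
        reply += chunk.choices[0].delta.content or ""
    return reply

def answer_word_count(reply):
    """Count words after any </think> marker, mirroring the logic in record()."""
    visible = reply.split("</think>")[1] if "</think>" in reply else reply
    return len(visible.split())

reply = collect(fake_stream(["<think>plan", "</think>", "A rainbow ", None, "is an arc."]))
print(reply)
print(answer_word_count(reply))
```

The `None` fragment stands in for chunks that arrive without text, which is why `record` uses `or ''` when appending.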

In [None]:
model_name = "gpt-5-mini"

stream = openai.chat.completions.create(model=model_name, messages=challenge, stream=True)
record(model_name, stream)

In [None]:
# GPT-5-nano

model_name = "gpt-5-nano"

stream = openai.chat.completions.create(model=model_name, messages=challenge, stream=True)
record(model_name, stream)

In [None]:
# GPT-5

model_name = "gpt-5"

stream = openai.chat.completions.create(model=model_name, messages=challenge, stream=True)
record(model_name, stream)

In [None]:
# Claude 4.5 Sonnet

model_name = "claude-sonnet-4-5"

stream = anthropic.chat.completions.create(model=model_name, messages=challenge, stream=True)
record(model_name, stream)

In [None]:
# Claude 4.5 Haiku

model_name = "claude-haiku-4-5"

stream = anthropic.chat.completions.create(model=model_name, messages=challenge, stream=True)
record(model_name, stream)

In [None]:
# Gemini 2.5 Flash

model_name = "gemini-2.5-flash"

stream = gemini.chat.completions.create(model=model_name, messages=challenge, stream=True)
record(model_name, stream)

In [None]:
# Gemini 2.5 Pro

model_name = "gemini-2.5-pro"

stream = gemini.chat.completions.create(model=model_name, messages=challenge, stream=True)
record(model_name, stream)

In [None]:
# Deepseek-V3

model_name = "deepseek-chat"

stream = deepseek.chat.completions.create(model=model_name, messages=challenge, stream=True)
record(model_name, stream)

In [None]:
# Deepseek-R1
# This takes too long! It can get stuck in a loop 

# model_name = "deepseek-reasoner"

# stream = deepseek.chat.completions.create(model=model_name, messages=challenge, stream=True)
# record(model_name, stream)

In [None]:
# Grok from x.ai

model_name = "grok-4"

stream = grok.chat.completions.create(model=model_name, messages=challenge, stream=True)
record(model_name, stream)

In [None]:
# Groq - OpenAI OSS 120B

model_name = "openai/gpt-oss-120b"

stream = groq.chat.completions.create(model=model_name, messages=challenge, stream=True)
record(model_name, stream)

# Now for local models

### First we called models in the cloud

### And now:

Now try direct inference of open-source models running locally with Ollama.
Visit the README for instructions on installing Ollama locally.

You can see some comparisons of Open Source models on the HuggingFace OpenLLM Leaderboard.

Ollama provides an OpenAI-style local endpoint, so this will look very similar to the cloud calls above!


In [None]:
!ollama pull gemma3:270m
!ollama pull gpt-oss

In [None]:
ollama_url = 'http://localhost:11434/v1'
ollama = OpenAI(base_url=ollama_url, api_key='ollama')

In [None]:
requests.get("http://localhost:11434").content
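If the request above raises a connection error, Ollama is probably not running. A small helper can turn that into a simple yes/no check. This is a sketch using only the standard library; `server_is_up` is a hypothetical name, and the default port is the one used in the cells above.

```python
import urllib.error
import urllib.request

def server_is_up(url, timeout=2.0):
    """Return True for a 2xx response; False on connection failure or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        return False

print(server_is_up("http://localhost:11434"))
```

If this prints `False`, start Ollama (see the README) before running the cells below.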

In [None]:
# Gemma 3 (270M parameters)

model_name = "gemma3:270m"

stream = ollama.chat.completions.create(model=model_name, messages=challenge, stream=True)
record(model_name, stream)

In [None]:
# OpenAI OSS - the 20B version, not 120B, running on my computer!!

model_name = "gpt-oss:20b"

stream = ollama.chat.completions.create(model=model_name, messages=challenge, stream=True)
record(model_name, stream)

In [None]:
# So where are we?

print(len(models))
print(models)
print(answers)

In [None]:
together = ""
for index, answer in enumerate(answers):
    together += f"# Response from competitor {index+1}\n\n"
    together += answer + "\n\n"

In [None]:
display(Markdown(together))

In [None]:
judge = f"""You are judging a competition between {len(models)} competitors.
Each model has been given this question:

{challenge[1]["content"]}

Your job is to evaluate each response for clarity and strength of argument and accuracy of word count, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{{"results": ["best competitor number", "second best competitor number", "third best competitor number", ...]}}

Here are the responses from each competitor; after each response, the actual word count of their answer is given:

{together}

Now respond with the JSON with the ranked order of the competitors, nothing else. Do not include markdown formatting or code blocks."""

In [None]:
display(Markdown(judge))

In [None]:
judge_messages = [{"role": "user", "content": judge}]

# Not very scientific - but quite interesting!

In [None]:
response = groq.chat.completions.create(model="openai/gpt-oss-120b", messages=judge_messages)
results = response.choices[0].message.content
print(results)

In [None]:
results_dict = json.loads(results)
ranks = results_dict["results"]
for index, result in enumerate(ranks):
    competitor = models[int(result)-1]
    print(f"Rank {index+1}: {competitor}")
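Judges do not always follow formatting instructions, so `json.loads` can fail if the reply arrives wrapped in a Markdown code fence, and a bad ranking can index the wrong model. The sketch below is one defensive way to parse the reply; `parse_ranking` and the sample data are hypothetical, not part of the lab.

```python
import json

FENCE = "`" * 3  # the Markdown code-fence marker

def parse_ranking(raw, models):
    """Turn the judge's JSON reply into an ordered list of model names."""
    text = raw.strip()
    # Some models wrap the JSON in a code fence despite being asked not to
    if text.startswith(FENCE):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:]
    ranks = [int(r) for r in json.loads(text)["results"]]
    # Every competitor number from 1..N should appear exactly once
    if sorted(ranks) != list(range(1, len(models) + 1)):
        raise ValueError(f"Expected each competitor exactly once, got {ranks}")
    return [models[r - 1] for r in ranks]

example_models = ["model-a", "model-b", "model-c"]  # placeholder names
wrapped_reply = FENCE + 'json\n{"results": ["2", "3", "1"]}\n' + FENCE
print(parse_ranking(wrapped_reply, example_models))
```

In the notebook this could be called as `parse_ranking(results, models)` in place of the manual loop.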

## And now for some fun - an adversarial conversation between Chatbots..

You're already familiar with prompts being organized into lists like:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "user prompt here"}
]
```

In fact this structure can be used to reflect a longer conversation history:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "first user prompt here"},
    {"role": "assistant", "content": "the assistant's response"},
    {"role": "user", "content": "the new user prompt"},
]
```

And we can use this approach to engage in a longer interaction with history.
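Here is a minimal sketch of how such a history grows turn by turn. To keep it runnable without an API key, `stub_model` is a hypothetical stand-in for a real LLM call that simply reports how many messages it was given.

```python
def stub_model(messages):
    """Stand-in for an LLM call: reports how many messages it was sent."""
    return f"I received {len(messages)} messages"

history = [{"role": "system", "content": "You are a helpful assistant"}]
seen_lengths = []

for prompt in ["Hello", "What did I just say?", "And before that?"]:
    history.append({"role": "user", "content": prompt})
    seen_lengths.append(len(history))
    reply = stub_model(history)  # the whole history goes in on every call
    history.append({"role": "assistant", "content": reply})

print(seen_lengths)
print(history[-1]["content"])
```

The list sent to the model gets longer on every turn, which is how a real model can refer back to earlier parts of the chat.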

In [None]:
# Let's make a conversation between GPT-4.1-nano and Claude Haiku 4.5
# We're using cheap versions of models so the costs will be minimal

gpt_model = "gpt-4.1-nano"
claude_model = "claude-haiku-4-5"

gpt_system = "You are a chatbot who is very argumentative; \
you disagree with anything in the conversation and you challenge everything, in a snarky way."

claude_system = "You are a very polite, courteous chatbot. You try to agree with \
everything the other person says, or find common ground. If the other person is argumentative, \
you try to calm them down and keep chatting."

gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

In [None]:
def call_gpt():
    messages = [{"role": "system", "content": gpt_system}]
    for gpt, claude in zip(gpt_messages, claude_messages):
        messages.append({"role": "assistant", "content": gpt})
        messages.append({"role": "user", "content": claude})
    response = openai.chat.completions.create(model=gpt_model, messages=messages)
    return response.choices[0].message.content

In [None]:
print(call_gpt())

In [None]:
def call_claude():
    messages = [{"role": "system", "content": claude_system}]
    for gpt, claude_message in zip(gpt_messages, claude_messages):
        messages.append({"role": "user", "content": gpt})
        messages.append({"role": "assistant", "content": claude_message})
    messages.append({"role": "user", "content": gpt_messages[-1]})
    response = anthropic.chat.completions.create(model=claude_model, messages=messages)
    return response.choices[0].message.content

In [None]:
call_claude()

In [None]:
gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

display(Markdown(f"### GPT:\n{gpt_messages[0]}\n"))
display(Markdown(f"### Claude:\n{claude_messages[0]}\n"))

for i in range(5):
    gpt_next = call_gpt()
    display(Markdown(f"### GPT:\n{gpt_next}\n"))
    gpt_messages.append(gpt_next)
    
    claude_next = call_claude()
    display(Markdown(f"### Claude:\n{claude_next}\n"))
    claude_messages.append(claude_next)
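Both `call_gpt` and `call_claude` do the same thing from opposite sides: each bot sees its own turns as `assistant` and the other bot's turns as `user`. The sketch below generalizes that idea into a single hypothetical helper, `build_view`, which can be tested without any API calls; the sample turns are invented.

```python
def build_view(system, first_turns, second_turns, i_am_first):
    """Build the message list as seen by one of two alternating speakers.

    first_turns belong to whoever opened the conversation. The viewing bot's
    own turns get the "assistant" role and the other bot's get "user".
    """
    first_role, second_role = ("assistant", "user") if i_am_first else ("user", "assistant")
    messages = [{"role": "system", "content": system}]
    for i, turn in enumerate(first_turns):
        messages.append({"role": first_role, "content": turn})
        if i < len(second_turns):
            messages.append({"role": second_role, "content": second_turns[i]})
    return messages

gpt_turns = ["Hi there", "Oh great, another greeting."]  # invented sample turns
claude_turns = ["Hi"]

# GPT opened the chat; its view before producing its second turn
gpt_view = build_view("argumentative", gpt_turns[:1], claude_turns, i_am_first=True)
# Claude's view once GPT has spoken twice: it must answer the last user message
claude_view = build_view("polite", gpt_turns, claude_turns, i_am_first=False)

print([m["role"] for m in gpt_view])
print([m["role"] for m in claude_view])
```

A helper like this makes it easier to swap in a different pair of models, since only the client and model name need to change.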

# Takeaways

This was an entertaining exercise!

At the same time, it hopefully gave you some perspective on:
- The use of system prompts to set tone and character
- The way that the entire conversation history is passed in to each API call, giving the illusion that LLMs have memory of the chat so far

# Exercises

Try different characters; try swapping Claude with Gemini