# Lab 1

# STRAIGHT TO ACTION!

Welcome to our first Lab where we will see rapid, satisfying results!

I will leave it to you to try out leading LLMs through their Chat Interfaces

Together, we will call them using their APIs

Please see the README for instructions on setting this up and getting your API key

# If this is your first time in a Notebook..

Welcome to the world of Data Science experimentation. Warning: Jupyter Notebooks are very addictive and you may find it hard to go back to IDEs afterwards!!

First click "Select Kernel" in the top right, per the Setup instructions, to select your uv environment.

Then simply click in each cell with code and press `Shift + Enter` to execute the code and print the results.

## FIRST: Calling Frontier Models through APIs

## Setting up your keys

If you haven't done so already, you'll need to create API keys from OpenAI, Anthropic and Google.

For OpenAI, visit https://openai.com/api/  
For Anthropic, visit https://console.anthropic.com/  
For Google, visit https://ai.google.dev/gemini-api  

When you get your API keys, you need to set them as environment variables.

To do this, create a file called `.env` in the project root directory, and set your keys there:

```
OPENAI_API_KEY=xxxx
ANTHROPIC_API_KEY=xxxx
GOOGLE_API_KEY=xxxx
```

In [None]:
import os
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import Markdown, display, update_display
import requests
from litellm import completion

In [None]:
# Load environment variables from a file called .env
# Print the key prefixes to help with any debugging

load_dotenv(override=True)
openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')
grok_api_key = os.getenv('GROK_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set (and this is optional)")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")

if groq_api_key:
    print(f"Groq API Key exists and begins {groq_api_key[:4]}")
else:
    print("Groq API Key not set (and this is optional)")

if grok_api_key:
    print(f"Grok API Key exists and begins {grok_api_key[:4]}")
else:
    print("Grok API Key not set (and this is optional)")

## Connecting to Python Client libraries

We call Cloud APIs by making REST calls to an HTTP endpoint, passing in our keys.

For convenience, labs like OpenAI provide lightweight Python client libraries that make the HTTP calls for us.
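To make the "thin wrapper" idea concrete, here is a minimal sketch of the kind of HTTP request a client library assembles for a chat completion. It only builds the request and never sends it. The key is a placeholder, and the headers a real client library sends will likely include more than the two shown here.

```python
import json
import urllib.request

# Placeholder values for illustration only; a real key comes from your .env file
api_key = "sk-placeholder"
url = "https://api.openai.com/v1/chat/completions"

payload = {
    "model": "gpt-5-mini",
    "messages": [{"role": "user", "content": "Tell me a joke"}],
}

# Build (but do not send) the POST request
request = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    },
    method="POST",
)

print(request.get_method(), request.full_url)
print(json.loads(request.data)["model"])
```

The client library does this assembly for us, then parses the JSON that comes back into Python objects.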

In [None]:
# Connect to OpenAI client library
# A thin wrapper around calls to REST endpoints

openai = OpenAI()

# For other providers, we can use the OpenAI python client
# Because they have endpoints compatible with OpenAI
# And OpenAI allows you to change the base_url

anthropic_url = "https://api.anthropic.com/v1/"
gemini_url = "https://generativelanguage.googleapis.com/v1beta/openai/"
deepseek_url = "https://api.deepseek.com"
groq_url = "https://api.groq.com/openai/v1"
grok_url = "https://api.x.ai/v1"

anthropic = OpenAI(api_key=anthropic_api_key, base_url=anthropic_url)
gemini = OpenAI(api_key=google_api_key, base_url=gemini_url)
deepseek = OpenAI(api_key=deepseek_api_key, base_url=deepseek_url)
groq = OpenAI(api_key=groq_api_key, base_url=groq_url)
grok = OpenAI(api_key=grok_api_key, base_url=grok_url)

## Asking LLMs to tell a joke

It turns out that LLMs don't do a great job of telling jokes! Let's compare a few models.
Later we will be putting LLMs to better use!

### What information is included in the API

Typically we'll pass to the API:
- The name of the model that should be used
- A **system message** that gives overall context for the role the LLM is playing
- A **user prompt** that provides the actual prompt

There are other parameters that can be used, including **temperature**, which is typically set between 0 and 1 (some APIs accept a wider range): higher values give more random output; lower values give more focused, deterministic output.
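As a rough illustration of why temperature changes randomness, the sketch below applies a temperature-scaled softmax to some made-up scores for three candidate tokens. This is the textbook formulation, not a description of any provider's internals. The `logits` values are hypothetical, and a temperature of exactly 0 is left out because it would divide by zero.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens
logits = [2.0, 1.0, 0.5]

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a low temperature almost all the probability lands on the top-scoring token; at a high temperature the distribution flattens, so sampling picks the other tokens more often.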

### The standard format of messages with an LLM, first used by OpenAI in its API and now adopted more widely

Conversations use this format:

```json
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "user prompt here"}
]
```


In [None]:
system_message = "You are an assistant that is great at telling jokes that are topical and relevant for now"
user_prompt = "Tell a light-hearted joke that's related to Agentic AI and Autonomous Agents"

In [None]:
tell_a_joke = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_prompt}
]

In [None]:
# GPT-5-mini

response = openai.chat.completions.create(model="gpt-5-mini", messages=tell_a_joke)
result = response.choices[0].message.content

display(Markdown(result))

In [None]:
puzzle = [{"role": "user", "content": "I flip 2 coins. One of them is heads. What's the chance the other is tails? Give the answer only."}]

## Training vs Inference time scaling

**Training time scaling**: use a larger model with more parameters trained on more data

Like `gpt-5-nano` -> `gpt-5-mini`

**Inference time scaling**: spend more time generating tokens to get to better outcomes (like, "think step by step")

Like reasoning effort of `minimal` -> reasoning effort of `low`
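One way to see inference time scaling in action is to send the same prompt at several reasoning efforts and compare the answers. The helper below is a hypothetical sketch, not part of any library: it assumes `client` is an OpenAI-style client such as the `openai` object created earlier, and that the chosen model accepts the `reasoning_effort` values passed in.

```python
def compare_reasoning_efforts(client, messages, model="gpt-5-nano", efforts=("minimal", "low")):
    """Send the same messages once per reasoning effort and collect the replies."""
    results = {}
    for effort in efforts:
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            reasoning_effort=effort,
        )
        results[effort] = response.choices[0].message.content
    return results
```

For example, `compare_reasoning_efforts(openai, puzzle)` would return a dictionary mapping each effort level to the model's answer.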

In [None]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=puzzle, reasoning_effort="minimal")
print(response.choices[0].message.content)
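To check the model's answer, we can enumerate the outcomes ourselves. This sketch assumes the usual reading of the puzzle, namely that we are told only that at least one of the two coins is heads; other readings give different answers.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two fair coin flips: HH, HT, TH, TT
outcomes = list(product("HT", repeat=2))

# Assumed reading: we are told only that at least one coin is heads
at_least_one_head = [o for o in outcomes if "H" in o]

# Among those, the "other" coin is tails when the pair contains a tail
other_is_tails = [o for o in at_least_one_head if "T" in o]

probability = Fraction(len(other_is_tails), len(at_least_one_head))
print(probability)  # 2/3
```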

In [None]:
# Let's get a joke from the best model on the planet

response = openai.chat.completions.create(model='gpt-5',messages=tell_a_joke)
result = response.choices[0].message.content

display(Markdown(result))

## Let's mix it up - the "Prisoner's Dilemma"

In [None]:
decision_prompt = """
You and a partner are contestants on a game show. You're each taken to separate rooms and given a choice:

Cooperate: Choose "Share" — if both of you choose this, you each win $1,000.

Defect: Choose "Steal" — if one steals and the other shares, the stealer gets $2,000 and the sharer gets nothing.

If both steal, you both get nothing.

Do you choose to Steal or Share? Pick one. Be concise.
"""

decision = [{"role": "user", "content": decision_prompt}]

In [None]:
# Claude Sonnet 4.5

response = anthropic.chat.completions.create(model='claude-sonnet-4-5',messages=decision)
print(response.choices[0].message.content)

## Anthropic's models come in 3 sizes:

Haiku: small  
Sonnet: medium  
Opus: massive

Haiku costs roughly a third as much as Sonnet, yet in many cases it gives similar answers.

In [None]:
response = anthropic.chat.completions.create(model='claude-haiku-4-5',messages=decision)
print(response.choices[0].message.content)

In [None]:
# Gemini 2.5 Pro - Gemini 3 is coming very soon!

response = gemini.chat.completions.create(model='gemini-2.5-pro', messages=decision)
print(response.choices[0].message.content)

In [None]:
# DeepSeek v3.2

response = deepseek.chat.completions.create(model='deepseek-chat',messages=decision)
print(response.choices[0].message.content)

In [None]:
# Grok 4

response = grok.chat.completions.create(model='grok-4',messages=decision)
print(response.choices[0].message.content)

## Using LiteLLM - a wonderful abstraction layer

1. Makes it easy to switch between models
2. Provides a simple API
3. Also calculates tokens and costs - very handy

In [None]:
# Using LiteLLM - a wonderfully simple abstraction layer

response = completion("openai/gpt-4.1-nano", messages=tell_a_joke)
print(response.choices[0].message.content)

In [None]:
# This is so useful!

print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

## More advanced topic: prompt caching & LiteLLM budgeting

### Let's load in the entire play "Hamlet" and look in the middle

In [None]:
with open("hamlet.txt", "r", encoding="utf-8") as f:
    hamlet_text = f.read()

In [None]:
location = hamlet_text.find("Where is my father?")
print(hamlet_text[location-50:location+50])

In [None]:
system_without_hamlet = """
You are an assistant that can answer questions about the text of Hamlet.
"""
question = "In Hamlet, when Laertes asks 'Where is my father?' what is the reply?"
hamlet = [{"role": "system", "content": system_without_hamlet}, {"role": "user", "content": question}]
response = completion("openai/gpt-4.1-nano", messages=hamlet)
display(Markdown(response.choices[0].message.content))

In [None]:
  
system_with_hamlet = f"""
You are a helpful assistant that can answer questions about the text of Hamlet.
For context, here is the entire text of Hamlet:
{hamlet_text}
"""
question = "In Hamlet, when Laertes asks 'Where is my father?' what is the reply?"
hamlet = [{"role": "system", "content": system_with_hamlet}, {"role": "user", "content": question}]

In [None]:
response = completion("openai/gpt-4.1-nano", messages=hamlet)
print(response.choices[0].message.content)

In [None]:

print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

I got this:

> When Laertes asks "Where is my father?" in Hamlet, the reply is "Dead."  
> Input tokens: 49706  
> Output tokens: 19  
> Cached tokens: 0  
> Total cost: 0.4978 cents  
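The cost figure above can be reproduced by hand from the token counts. The sketch below assumes per-million-token prices of $0.10 for input, $0.025 for cached input and $0.40 for output; these are assumptions chosen for illustration, so check the provider's current pricing. `estimate_cost_cents` is a hypothetical helper, not LiteLLM's own calculation.

```python
# Assumed list prices in US dollars per million tokens (check current pricing)
INPUT_PRICE = 0.10
CACHED_INPUT_PRICE = 0.025
OUTPUT_PRICE = 0.40

def estimate_cost_cents(prompt_tokens, completion_tokens, cached_tokens=0):
    """Estimate the cost of one call in cents from its token counts."""
    uncached = prompt_tokens - cached_tokens
    dollars = (
        uncached * INPUT_PRICE
        + cached_tokens * CACHED_INPUT_PRICE
        + completion_tokens * OUTPUT_PRICE
    ) / 1_000_000
    return dollars * 100

first_call = estimate_cost_cents(49706, 19)
print(f"{first_call:.4f} cents")  # 0.4978 cents
```

Under these assumptions, any input tokens served from the cache are billed at the lower cached rate, so a repeat of the same long prompt should cost less than the first call.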

Now let's try again!

In [None]:
response = completion("openai/gpt-4.1-nano", messages=hamlet)
print(response.choices[0].message.content)
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

### Now let's use Groq (fast inference) in the cloud with GPT-OSS-120B, the new Open Source model from OpenAI

In [None]:
# To be serious! Groq (fast inference on the cloud) with a proper question

prompts = [
    {"role": "system", "content": "You are a knowledgeable assistant, and you respond in markdown"},
    {"role": "user", "content": "What are the commercial applications of LLMs? Please respond in markdown."}
]

In [None]:
# Have it stream back results in markdown

stream = groq.chat.completions.create(
    model='openai/gpt-oss-120b',
    messages=prompts,
    stream=True
)

reply = ""
display_handle = display(Markdown(""), display_id=True)
for chunk in stream:
    reply += chunk.choices[0].delta.content or ''
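    # Strip code fences and the word "markdown" so the reply renders as Markdown
    # (note: this crude filter also removes the word wherever it appears in the text)
    reply = reply.replace("```","").replace("markdown","")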
    update_display(Markdown(reply), display_id=display_handle.display_id)

# Now for Part 3

### Recap: first we tried Frontier LLMs through their chat interfaces

### Then we called Cloud APIs

### And now:

Now try the 3rd way to use LLMs - direct inference of Open Source Models running locally with Ollama  

Visit the README for instructions on installing Ollama locally.

You can see some comparisons of Open Source models on the HuggingFace OpenLLM Leaderboard.

Ollama provides an OpenAI-style local endpoint, so this will look very similar to part 2!

In [None]:
!ollama pull gemma3:270m
!ollama pull gpt-oss:20b

In [None]:
ollama_url = 'http://localhost:11434/v1'
ollama = OpenAI(base_url=ollama_url, api_key='ollama')

In [None]:
requests.get("http://localhost:11434").content

In [None]:
# gemma3:270m

model_name = "gemma3:270m"

response = ollama.chat.completions.create(model=model_name, messages=tell_a_joke)
print(response.choices[0].message.content)


## The illusion of memory

Each call to the OpenAI API is stateless; GPT has no knowledge of the prior message, even if it was seconds ago.

So how is it possible to hold a conversation with GPT and keep context? It's a trick.

In [None]:
messages = [{"role":"user", "content": "Hello, my name is Ed"}]
response = openai.chat.completions.create(model="gpt-4.1-mini", messages=messages)
response.choices[0].message.content

In [None]:
messages = [{"role":"user", "content": "What's my name"}]
response = openai.chat.completions.create(model="gpt-4.1-mini", messages=messages)
response.choices[0].message.content

In [None]:
messages = [
    {"role":"user", "content": "Hello, my name is Ed"},
    {"role":"assistant", "content": "Hello Ed! How can I assist you today?"},
    {"role":"user", "content": "What's my name"}
]
response = openai.chat.completions.create(model="gpt-4.1-mini", messages=messages)
response.choices[0].message.content

## And now for some fun - an adversarial conversation between Chatbots..

You're already familiar with prompts being organized into lists like:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "user prompt here"}
]
```

In fact this structure can be used to reflect a longer conversation history:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "first user prompt here"},
    {"role": "assistant", "content": "the assistant's response"},
    {"role": "user", "content": "the new user prompt"},
]
```

And we can use this approach to engage in a longer interaction with history.
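A minimal sketch of that idea: keep one growing list, and pass the whole list on every call. `chat_turn` and `fake_send` are hypothetical names used for illustration; `fake_send` stands in for a real model call so the example runs without an API key.

```python
def chat_turn(history, user_prompt, send):
    """Append the user's prompt, get a reply for the whole history, and record it."""
    history.append({"role": "user", "content": user_prompt})
    reply = send(history)  # the full history goes out on every call
    history.append({"role": "assistant", "content": reply})
    return reply

# Stand-in for a real model call: it just reports how many messages it received
def fake_send(messages):
    return f"I can see {len(messages)} messages"

history = [{"role": "system", "content": "system message here"}]
print(chat_turn(history, "first user prompt here", fake_send))
print(chat_turn(history, "the new user prompt", fake_send))
```

With a real client, `send` could wrap a call such as `openai.chat.completions.create(model="gpt-4.1-mini", messages=messages)` and return `response.choices[0].message.content`.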

In [None]:
# Let's make a conversation between GPT-4.1-nano and Claude-haiku-4-5
# We're using cheap versions of models so the costs will be minimal

gpt_model = "gpt-4.1-nano"
claude_model = "claude-haiku-4-5"

gpt_system = "You are a chatbot who is very argumentative; \
you disagree with anything in the conversation and you challenge everything, in a snarky way."

claude_system = "You are a very polite, courteous chatbot. You try to agree with \
everything the other person says, or find common ground. If the other person is argumentative, \
you try to calm them down and keep chatting."

gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

In [None]:
def call_gpt():
    messages = [{"role": "system", "content": gpt_system}]
    for gpt, claude in zip(gpt_messages, claude_messages):
        messages.append({"role": "assistant", "content": gpt})
        messages.append({"role": "user", "content": claude})
    # Named `response` rather than `completion` to avoid confusion with LiteLLM's completion()
    response = openai.chat.completions.create(
        model=gpt_model,
        messages=messages
    )
    return response.choices[0].message.content

In [None]:
print(call_gpt())

In [None]:
def call_claude():
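    # Start with Claude's system prompt so its polite persona is actually applied
    messages = [{"role": "system", "content": claude_system}]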
    for gpt, claude_message in zip(gpt_messages, claude_messages):
        messages.append({"role": "user", "content": gpt})
        messages.append({"role": "assistant", "content": claude_message})
    messages.append({"role": "user", "content": gpt_messages[-1]})
    response = anthropic.chat.completions.create(model=claude_model, messages=messages)
    return response.choices[0].message.content

In [None]:
call_claude()

In [None]:
gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

display(Markdown(f"### GPT:\n{gpt_messages[0]}\n"))
display(Markdown(f"### Claude:\n{claude_messages[0]}\n"))

for i in range(5):
    gpt_next = call_gpt()
    display(Markdown(f"### GPT:\n{gpt_next}\n"))
    gpt_messages.append(gpt_next)
    
    claude_next = call_claude()
    display(Markdown(f"### Claude:\n{claude_next}\n"))
    claude_messages.append(claude_next)

# Takeaways

This was an entertaining exercise!

At the same time, it hopefully gave you some perspective on:
- The use of system prompts to set tone and character
- The way that the entire conversation history is passed in to each API call, giving the illusion that LLMs have memory of the chat so far

# Exercises

Try different characters; try swapping Claude with Gemini
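When swapping in other models, it may help to factor out the role-flipping logic: each bot sees its own lines as `assistant` and the other bot's lines as `user`. The helper below is a hypothetical sketch of that idea. Unlike `call_claude` above, it only appends the other bot's latest line when that bot has spoken one more time than this one.

```python
def build_view(system_prompt, own_messages, other_messages, own_first=True):
    """Return the chat history as seen by one bot.

    The bot's own lines become 'assistant' messages and the other bot's
    lines become 'user' messages. `own_first` says which bot opened the chat.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for own, other in zip(own_messages, other_messages):
        if own_first:
            messages.append({"role": "assistant", "content": own})
            messages.append({"role": "user", "content": other})
        else:
            messages.append({"role": "user", "content": other})
            messages.append({"role": "assistant", "content": own})
    # If the other bot has spoken once more than we have, its latest line is the prompt to answer
    if len(other_messages) > len(own_messages):
        messages.append({"role": "user", "content": other_messages[-1]})
    return messages

# The first bot opened the chat, so from its side the history ends with the other bot's line
first_view = build_view("Be argumentative", ["Hi there"], ["Hi"], own_first=True)
print([m["role"] for m in first_view])
```

The result can be passed as `messages` to any of the OpenAI-style clients defined earlier, which makes it straightforward to pair up different models.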

In [None]:
# And just to show you how easy it is: let's generate an image

from IPython.display import Image, display
import base64

response = openai.images.generate(
  model="dall-e-3",
  prompt="A photorealistic 3d image that represents the power of a Frontier LLM in solving real business use cases",
  size="1024x1024",
  quality="standard",
  n=1,
  response_format="b64_json"
)

image_base64 = response.data[0].b64_json
image_data = base64.b64decode(image_base64)
display(Image(image_data))

In [None]:
response = openai.images.generate(
  model="dall-e-3",
  prompt="A vibrant, pop-art style image that represents the power of a Frontier LLM in solving real business use cases",
  size="1024x1024",
  quality="standard",
  n=1,
  response_format="b64_json"
)

image_base64 = response.data[0].b64_json
image_data = base64.b64decode(image_base64)
display(Image(image_data))