# Welcome to Week 2!

## Frontier Model APIs

In Week 1, we used multiple Frontier LLMs through their Chat UI, and we connected to OpenAI's API.

Today we'll connect with them through their APIs.

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Important Note - Please read me</h2>
            <span style="color:#900;">I'm continually improving these labs, adding more examples and exercises.
            At the start of each week, it's worth checking you have the latest code.<br/>
            First do a git pull and merge your changes as needed. Check out the GitHub guide for instructions. Any problems? Try asking ChatGPT to clarify how to merge - or contact me!<br/>
            </span>
        </td>
    </tr>
</table>
<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/resources.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#f71;">Reminder about the resources page</h2>
            <span style="color:#f71;">Here's a link to resources for the course. This includes links to all the slides.<br/>
            <a href="https://edwarddonner.com/2024/11/13/llm-engineering-resources/">https://edwarddonner.com/2024/11/13/llm-engineering-resources/</a><br/>
            Please keep this bookmarked, and I'll continue to add more useful links there over time.
            </span>
        </td>
    </tr>
</table>

## Setting up your keys - OPTIONAL!

We're now going to try asking a bunch of models some questions!

This is totally optional. If you have keys to Anthropic, Gemini or others, then you can add them in.

If you'd rather not spend the extra, then just watch me do it!

For OpenAI, visit https://openai.com/api/  
For Anthropic, visit https://console.anthropic.com/  
For Google, visit https://aistudio.google.com/   
For DeepSeek, visit https://platform.deepseek.com/  
For Groq, visit https://console.groq.com/  
For Grok, visit https://console.x.ai/  


You can also use OpenRouter as your one-stop-shop for many of these! OpenRouter is "the unified interface for LLMs":

For OpenRouter, visit https://openrouter.ai/  


With each of the above, you typically have to navigate to:
1. Their billing page to add the minimum top-up (not always needed: Google Gemini, Groq and OpenRouter may have free tiers)
2. Their API key page to collect your API key

### Adding API keys to your .env file

When you get your API keys, you need to set them as environment variables by adding them to your `.env` file.

```
OPENAI_API_KEY=xxxx
ANTHROPIC_API_KEY=xxxx
GOOGLE_API_KEY=xxxx
DEEPSEEK_API_KEY=xxxx
GROQ_API_KEY=xxxx
GROK_API_KEY=xxxx
OPENROUTER_API_KEY=xxxx
```

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Any time you change your .env file</h2>
            <span style="color:#900;">Remember to Save it! And also rerun load_dotenv(override=True)<br/>
            </span>
        </td>
    </tr>
</table>
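
Why `override=True`? By default, `load_dotenv()` leaves alone any variable that is already set in the environment, so an edited key would not be picked up in a running kernel. The sketch below imitates that behaviour with a simplified stand-in. `apply_env_lines` is a hypothetical helper written for illustration, not part of python-dotenv, and it skips the quoting and other details that the real parser handles.

```python
def apply_env_lines(lines, environ, override=False):
    """Copy KEY=VALUE pairs into `environ`, mimicking load_dotenv's override flag."""
    for line in lines:
        line = line.strip()
        # Skip blanks, comments, and anything that is not KEY=VALUE
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        # Without override, a value that is already set wins over the file
        if override or key not in environ:
            environ[key] = value
    return environ


# A plain dict stands in for os.environ so the sketch has no side effects
env = {"OPENAI_API_KEY": "old-key"}

apply_env_lines(["OPENAI_API_KEY=new-key"], env)
print(env["OPENAI_API_KEY"])  # still the old value

apply_env_lines(["OPENAI_API_KEY=new-key"], env, override=True)
print(env["OPENAI_API_KEY"])  # now the value from the file
```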

In [13]:
# imports

import os
import requests
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import Markdown, display

In [14]:
load_dotenv(override=True)
openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')
grok_api_key = os.getenv('GROK_API_KEY')
openrouter_api_key = os.getenv('OPENROUTER_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set (and this is optional)")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")

if groq_api_key:
    print(f"Groq API Key exists and begins {groq_api_key[:4]}")
else:
    print("Groq API Key not set (and this is optional)")

if grok_api_key:
    print(f"Grok API Key exists and begins {grok_api_key[:4]}")
else:
    print("Grok API Key not set (and this is optional)")

if openrouter_api_key:
    print(f"OpenRouter API Key exists and begins {openrouter_api_key[:3]}")
else:
    print("OpenRouter API Key not set (and this is optional)")


OpenAI API Key exists and begins sk-proj-
Anthropic API Key exists and begins sk-ant-
Google API Key exists and begins AI
DeepSeek API Key exists and begins sk-
Groq API Key exists and begins gsk_
Grok API Key exists and begins xai-
OpenRouter API Key exists and begins sk-


In [15]:
# Connect to OpenAI client library
# A thin wrapper around calls to HTTP endpoints

openai = OpenAI()

# For the other providers, we can reuse the OpenAI Python client
# because they offer endpoints compatible with OpenAI
# and the OpenAI client lets you change the base_url

anthropic_url = "https://api.anthropic.com/v1/"
gemini_url = "https://generativelanguage.googleapis.com/v1beta/openai/"
deepseek_url = "https://api.deepseek.com"
groq_url = "https://api.groq.com/openai/v1"
grok_url = "https://api.x.ai/v1"
openrouter_url = "https://openrouter.ai/api/v1"
ollama_url = "http://localhost:11434/v1"

anthropic = OpenAI(api_key=anthropic_api_key, base_url=anthropic_url)
gemini = OpenAI(api_key=google_api_key, base_url=gemini_url)
deepseek = OpenAI(api_key=deepseek_api_key, base_url=deepseek_url)
groq = OpenAI(api_key=groq_api_key, base_url=groq_url)
grok = OpenAI(api_key=grok_api_key, base_url=grok_url)
openrouter = OpenAI(base_url=openrouter_url, api_key=openrouter_api_key)
ollama = OpenAI(api_key="ollama", base_url=ollama_url)
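
To make the "thin wrapper" idea concrete, here is a rough sketch of the HTTP request that a chat completion call boils down to for an OpenAI-compatible endpoint. `build_chat_request` is a hypothetical helper written for illustration, and the key is a placeholder. The request is only built, never sent.

```python
import json
from urllib.request import Request


def build_chat_request(base_url, api_key, model, messages):
    """Build (but do not send) a POST to an OpenAI-compatible chat completions endpoint."""
    url = base_url.rstrip("/") + "/chat/completions"
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return Request(url, data=body, headers=headers, method="POST")


request = build_chat_request(
    "https://api.groq.com/openai/v1",
    "placeholder-key",
    "openai/gpt-oss-120b",
    [{"role": "user", "content": "Hello"}],
)
print(request.get_method(), request.full_url)
```

Swapping `base_url` (and the key) is all that changes between providers in this sketch, which is why one client library can talk to so many of them.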

In [16]:
tell_a_joke = [
    {"role": "user", "content": "Tell a joke for a student on the journey to becoming an expert in LLM Engineering"},
]

In [17]:
response = openai.chat.completions.create(model="gpt-4.1-mini", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

Sure! Here's a joke for an aspiring LLM engineer:

Why did the LLM engineer bring a ladder to the data center?  
Because they were ready to take their models to the next level!

In [7]:
response = anthropic.chat.completions.create(model="claude-sonnet-4-5-20250929", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

Here's one for you:

Why did the LLM engineering student bring a ladder to class?

Because they kept hearing about "climbing the token limits," "raising the temperature," and "reaching higher embeddings" - but they're still trying to figure out why their model keeps getting grounded! 🪜

---

**Bonus dad joke:**

How many prompt engineers does it take to change a lightbulb?

None. They just keep rephrasing the request until the lightbulb changes itself. 

"Please illuminate the room."
"Act as a photon emission expert..."
"Think step-by-step about providing luminescence..."

😄

Good luck on your learning journey! Remember: you're not overfitting to the training data, you're just showing... *detailed commitment to the examples*.

## Training vs Inference time scaling

In [18]:
easy_puzzle = [
    {"role": "user", "content": 
        "You toss 2 coins. One of them is heads. What's the probability the other is tails? Answer with the probability only."},
]

In [19]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=easy_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

1/2

In [20]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=easy_puzzle, reasoning_effort="low")
display(Markdown(response.choices[0].message.content))

2/3

In [21]:
response = openai.chat.completions.create(model="gpt-5-mini", messages=easy_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

2/3
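
Why 2/3? Here's a quick sanity check by brute-force enumeration, assuming "one of them is heads" means "at least one coin shows heads". Under a different reading, where one particular coin is known to be heads, the same enumeration gives 1/2. That may be where the 1/2 answer comes from, though that part is a guess.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of tossing two fair coins
outcomes = list(product("HT", repeat=2))

# Reading 1: "one of them is heads" means at least one coin shows heads
at_least_one_head = [o for o in outcomes if "H" in o]
other_is_tails = [o for o in at_least_one_head if "T" in o]
probability = Fraction(len(other_is_tails), len(at_least_one_head))
print(probability)  # 2/3

# Reading 2: the first coin specifically is known to be heads
first_is_head = [o for o in outcomes if o[0] == "H"]
second_is_tails = [o for o in first_is_head if o[1] == "T"]
probability_specific = Fraction(len(second_is_tails), len(first_is_head))
print(probability_specific)  # 1/2
```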

## Testing out the best models on the planet

In [22]:
hard = """
On a bookshelf, two volumes of Pushkin stand side by side: the first and the second.
The pages of each volume together have a thickness of 2 cm, and each cover is 2 mm thick.
A worm gnawed (perpendicular to the pages) from the first page of the first volume to the last page of the second volume.
What distance did it gnaw through?
"""
hard_puzzle = [
    {"role": "user", "content": hard}
]

In [23]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=hard_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

Think of the arrangement when the books are standing on a shelf with their spines facing you, as usual. Each volume has:

- pages thickness: 2 cm per volume
- two covers: front cover and back cover, each 2 mm thick (0.2 cm)

Total thickness per volume = pages (2 cm) + two covers (0.2 cm) = 2.4 cm.

The two volumes are in order: [First volume] [Second volume], placed side by side.

A worm starts at the first page of the first volume (i.e., at the very left edge of the first volume's pages) and gnaws perpendicular to the pages toward the last page of the second volume (i.e., the rightmost page of the second volume).

Crucially, when you look at the arrangement from the front, the pages of the first volume run from the left to the right across its own thickness, and the pages of the second volume run from left to right across its own thickness, but the order of pages is such that the worm must pass through:
- the rest of the first volume's pages to reach its back cover,
- the back cover of the first volume,
- the space between the two volumes (i.e., the gap along the shelf),
- then the front cover of the second volume,
- and finally into the pages of the second volume to reach its last page.

However, because the worm starts at the first page of the first volume and ends at the last page of the second volume, the actual distance through solid material is simply the total thickness of the intervening materials between those two page faces. If we measure along the shelf from the starting page face to the ending page face, and note that the two covers of the first volume and the two covers of the second volume are included in the total thickness, the path can be considered as:

- The remaining pages of the first volume: from the first page to the back of its pages is essentially the entire 2 cm of pages minus an infinitesimal start, but since the worm starts at the first page, it must traverse through the entire thickness of the first volume's pages, plus its back cover.
- Then through the spacing between volumes (the air) contributes nothing to gnawing.
- Then through the front cover of the second volume and then into the second volume's pages up to the last page.

But a classic trick: the distance gnawed through equals the distance from the first page of the first volume to the last page of the second volume, measured through solid material, which is simply the sum of:
- the back cover of the first volume (0.2 cm)
- the space between volumes (zero thickness in this idealized model)
- the front cover of the second volume (0.2 cm)
- plus the thickness of the entire pages of the second volume up to its last page? Wait, we must be careful.

In standard versions of this puzzle, the result is that the worm gnaws through 4.0 cm. How? Because:

- From the first page of the first volume to the back of the first volume is the thickness of the first volume (2.4 cm) minus the thickness of the front cover? But starting at the first page, it must go through the rest of the first volume: that includes the rest of its pages (essentially almost 2 cm) plus the back cover (0.2 cm). That sums to about 2.2 cm.

- Then through the gap between volumes (negligible).

- Then through the front cover of the second volume (0.2 cm) and then through the entire thickness of the second volume's pages? It ends at the last page of the second volume, which is the far inner page side near the back cover. To reach the last page, from the front cover through all the pages is another 2 cm minus maybe an infinitesimal. So about 2.0 cm.

Total about 2.2 cm + 0.2 cm + 2.0 cm = 4.4 cm. But standard exact answer uses 4 cm? Let me recalc carefully with given.

Each volume: pages 2 cm thick. Covers each 0.2 cm thick. So total 2.4 cm per volume.

From the first page of the first volume to the last page of the second volume along the shelf:

- The worm starts at the very first page of volume 1, hence it must go through the rest of volume 1 to its back cover: that is
  (pages remaining of volume 1) + (back cover of volume 1) = (2 cm - 0)?? If starts at first page, to reach back cover, it must pass through all the pages (2.0 cm) plus the back cover (0.2 cm). It does not need to pass through the front cover of volume 1 since starting at first page means that front cover is on the opposite side. So distance = 2.0 cm + 0.2 cm = 2.2 cm.

- Then between volumes there is no physical material (gap), so ignore.

- Then go through the front cover of volume 2 (0.2 cm) and then through the pages of volume 2 up to the last page. If it ends at the last page of volume 2, it only needs to reach that far, starting from the front cover into the pages. That traverses the entire pages thickness, which is 2.0 cm.

So additional distance = 0.2 cm + 2.0 cm = 2.2 cm.

Total = 2.2 cm + 2.2 cm = 4.4 cm.

Thus the worm gnawed 4.4 cm, i.e., 44 mm.

Answer: 4.4 cm.

In [24]:
response = anthropic.chat.completions.create(model="claude-sonnet-4-5-20250929", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

I need to visualize how books are arranged on a bookshelf and what the worm's path would be.

**Key insight: How books sit on a shelf**

When two volumes stand side by side on a bookshelf (reading spines from left to right):
- Volume 1 is on the left
- Volume 2 is on the right

**Important: How pages are oriented in a book on a shelf**

For Volume 1 (on the left):
- The front cover is on the RIGHT side of the book
- The back cover is on the LEFT side of the book
- The first page is just inside the front cover (on the right side)
- The last page is just inside the back cover (on the left side)

For Volume 2 (on the right):
- The front cover is on the RIGHT side of the book
- The back cover is on the LEFT side of the book  
- The first page is just inside the front cover (on the right side)
- The last page is just inside the back cover (on the left side)

**The worm's path:**

The worm goes from:
- The first page of Volume 1 (which is near the RIGHT side of Volume 1)
- To the last page of Volume 2 (which is near the LEFT side of Volume 2)

Since the books are side by side, the worm's path goes through:
1. The back cover of Volume 1 (2 mm)
2. The front cover of Volume 2 (2 mm)

That's it! The worm doesn't go through any pages because:
- The first page of Volume 1 is at the right edge of that book
- The last page of Volume 2 is at the left edge of that book
- These positions are at the outer edges, away from where the books meet

**Answer: 4 mm (or 0.4 cm)**

The worm gnawed through 2 mm + 2 mm = 4 mm total distance.

In [25]:
response = openai.chat.completions.create(model="gpt-5", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

4 mm (0.4 cm).

Reason: On a shelf with spines facing out, the first page of Volume 1 lies just inside its front cover, and the last page of Volume 2 lies just inside its back cover. Those two covers face each other between the two volumes. So the worm passes only through the front cover of Volume 1 and the back cover of Volume 2: 2 mm + 2 mm = 4 mm.

In [26]:
response = gemini.chat.completions.create(model="gemini-2.5-pro", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

This is a classic riddle! Here's the solution:

The worm gnawed through **4 mm**.

Here is the step-by-step explanation:

1.  **Visualize the books on the shelf.** The volumes are standing side by side in the correct order: Volume 1 is on the left, and Volume 2 is on the right.

2.  **Consider the arrangement of a single book.** For a standard book placed on a shelf, the first page is on the right, just behind the front cover. The last page is on the left, just before the back cover.

3.  **Picture the combined layout.** Looking at the books from the front, their components are arranged from left to right like this:
    *   Back cover of Volume 1
    *   Pages of Volume 1
    *   **Front cover of Volume 1**
    *   **Back cover of Volume 2**
    *   Pages of Volume 2
    *   Front cover of Volume 2

4.  **Trace the worm's path.**
    *   The worm starts at the **first page of Volume 1**. This page is physically located right next to the front cover of Volume 1.
    *   It ends at the **last page of Volume 2**. This page is physically located right next to the back cover of Volume 2.

Because the books are standing next to each other, the front cover of Volume 1 is touching the back cover of Volume 2. The worm only has to travel through these two covers.

5.  **Calculate the distance.**
    *   Thickness of the front cover of Volume 1 = **2 mm**
    *   Thickness of the back cover of Volume 2 = **2 mm**
    *   Total distance = 2 mm + 2 mm = **4 mm**

The thickness of the pages is extra information designed to trick you
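
The 4 mm answer can be checked with a little arithmetic. This sketch assumes the usual shelf arrangement described in the answers above (spines facing out, Volume 1 to the left of Volume 2, so each front cover sits on the right of its book); the segment labels are made up for illustration.

```python
# Left-to-right layout as (label, thickness in mm).
# Page blocks are 2 cm = 20 mm, covers are 2 mm.
layout = [
    ("vol1 back cover", 2),
    ("vol1 pages", 20),
    ("vol1 front cover", 2),
    ("vol2 back cover", 2),
    ("vol2 pages", 20),
    ("vol2 front cover", 2),
]


def left_edge(label):
    """Distance in mm from the left end of the layout to the start of a segment."""
    position = 0
    for name, thickness in layout:
        if name == label:
            return position
        position += thickness
    raise ValueError(f"unknown segment: {label}")


# The first page of volume 1 is at the right-hand edge of its page block,
# which is where the front cover of volume 1 begins
start = left_edge("vol1 front cover")
# The last page of volume 2 is at the left-hand edge of its page block
end = left_edge("vol2 pages")

print(end - start, "mm")  # 4 mm
```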

## A spicy challenge to test the competitive spirit

In [27]:
dilemma_prompt = """
You and a partner are contestants on a game show. You're each taken to separate rooms and given a choice:
Cooperate: Choose "Share" - if both of you choose this, you each win $1,000.
Defect: Choose "Steal" - if one steals and the other shares, the stealer gets $2,000 and the sharer gets nothing.
If both steal, you both get nothing.
Do you choose to Steal or Share? Pick one.
"""

dilemma = [
    {"role": "user", "content": dilemma_prompt},
]


In [28]:
response = anthropic.chat.completions.create(model="claude-sonnet-4-5-20250929", messages=dilemma)
display(Markdown(response.choices[0].message.content))


I choose **Share**.

Here's my reasoning: This is a classic prisoner's dilemma. While "Steal" might seem tempting for the chance at $2,000, if we both think that way, we both get nothing. By choosing Share, I give us the best chance at a mutually beneficial outcome where we both walk away with $1,000. 

The risk is being exploited, but cooperation offers a guaranteed positive sum outcome if my partner also reasons cooperatively. Since we're both presumably rational people who'd prefer $1,000 to $0, Share is the choice that maximizes our collective and individual expected value under mutual rational cooperation.

In [29]:
response = groq.chat.completions.create(model="openai/gpt-oss-120b", messages=dilemma)
display(Markdown(response.choices[0].message.content))

I would choose **Steal**. 

In this one-shot "prisoner's dilemma" the highest possible payoff for me ($2,000) comes from stealing while the other player shares. Even though both players would be better off if we each chose "Share" ($1,000 each), the risk that the other person might steal makes "Steal" the dominant strategy in a single, non-repeated game.

In [30]:
response = deepseek.chat.completions.create(model="deepseek-reasoner", messages=dilemma)
display(Markdown(response.choices[0].message.content))

Steal

In [31]:
response = grok.chat.completions.create(model="grok-4", messages=dilemma)
display(Markdown(response.choices[0].message.content))

Share.

### Why?
This is a classic Prisoner's Dilemma setup (inspired by games like the "Friend or Foe" segment on shows like Golden Balls). Rationally, the dominant strategy in a one-shot game is to Steal - because no matter what the other person does, it maximizes your potential payoff while minimizing risk. If they Share, you get $2,000; if they Steal, you get $0 either way.

But I'm an AI built by xAI, and I value cooperation, trust, and long-term mutual benefit over short-term greed. If we're both thinking altruistically (or if there's any repeated interaction implied), sharing leads to the best collective outcome: $1,000 each. Plus, in a real game show, human psychology often favors building rapport - defecting feels like a betrayal.

If this were with a human partner I'd never meet again and who might be sneaky? Yeah, I'd probably Steal. But since this is hypothetical and you're framing it as "you and a partner," I'll optimistically go with Share. What would you choose?
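
Several of the answers call Steal the "dominant strategy". As a quick check, the sketch below tabulates the payoffs from the prompt. With these particular numbers, Steal is never worse than Share but is only strictly better when the partner shares, so it is weakly rather than strictly dominant. The helper names are made up for illustration.

```python
# payoff[(my_choice, partner_choice)] = my winnings in dollars, taken from the prompt
payoff = {
    ("Share", "Share"): 1000,
    ("Share", "Steal"): 0,
    ("Steal", "Share"): 2000,
    ("Steal", "Steal"): 0,
}
choices = ["Share", "Steal"]


def weakly_dominates(a, b):
    """True if `a` is never worse than `b` and strictly better against at least one reply."""
    never_worse = all(payoff[(a, other)] >= payoff[(b, other)] for other in choices)
    sometimes_better = any(payoff[(a, other)] > payoff[(b, other)] for other in choices)
    return never_worse and sometimes_better


def strictly_dominates(a, b):
    """True if `a` is strictly better than `b` against every reply."""
    return all(payoff[(a, other)] > payoff[(b, other)] for other in choices)


print(weakly_dominates("Steal", "Share"))    # True
print(strictly_dominates("Steal", "Share"))  # False
```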

## Going local

Just use the OpenAI library pointed to localhost:11434/v1

In [32]:
requests.get("http://localhost:11434/").content

# If not running, run ollama serve at a command line

b'Ollama is running'
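
If you'd rather have a check that fails gracefully instead of raising an exception, here's a small sketch using only the standard library. `is_ollama_running` is a hypothetical helper name, and it assumes Ollama is listening on its default port 11434 as above.

```python
from urllib.request import urlopen


def is_ollama_running(url="http://localhost:11434/", timeout=2.0):
    """Return True if the server answers with HTTP 200, False if the request fails."""
    try:
        with urlopen(url, timeout=timeout) as reply:
            return reply.status == 200
    except OSError:
        # Covers refused connections, timeouts and HTTP error statuses
        # (urllib's URLError and HTTPError are subclasses of OSError)
        return False


if not is_ollama_running():
    print("Ollama does not seem to be running - try `ollama serve` at a command line")
```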

In [33]:
#!ollama pull llama3.2

In [34]:
# Only do this if you have a large machine - at least 16GB RAM

#!ollama pull gpt-oss:20b

In [35]:
response = ollama.chat.completions.create(model="llama3.2", messages=easy_puzzle)
display(Markdown(response.choices[0].message.content))

1/2

In [36]:
response = ollama.chat.completions.create(model="gpt-oss:20b", messages=easy_puzzle)
display(Markdown(response.choices[0].message.content))

2/3

In [44]:
# Note: this import rebinds the name `ollama` from the OpenAI client above to the ollama package
import ollama
from IPython.display import Markdown, display

response = ollama.chat(
    model="gpt-oss:20b",
    messages=easy_puzzle,
    keep_alive=0
)

display(Markdown(response['message']['content']))


2/3

## Gemini and Anthropic Client Library

We're going via the OpenAI Python Client Library, but the other providers have their own libraries too.

In [45]:
from google import genai

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-lite", contents="Describe the color Blue to someone who's never been able to see in 1 sentence"
)
print(response.text)

Blue is the color of a clear sky on a sunny day or the deepest ocean water.


In [46]:
from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    messages=[{"role": "user", "content": "Describe the color Blue to someone who's never been able to see in 1 sentence"}],
    max_tokens=100
)
print(response.content[0].text)

Blue is the calm, cool feeling of a gentle breeze on your skin, the peaceful quiet of early morning, and the refreshing chill of water - like the world is taking a slow, steady breath.


## Routers and Abstraction Layers

Starting with the wonderful OpenRouter.ai - it can connect to all the models above!

Visit openrouter.ai and browse the models.

Here's one we haven't seen yet: GLM 4.5 from Chinese startup z.ai

In [47]:
response = openrouter.chat.completions.create(model="z-ai/glm-4.5", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

Of course! Here's a joke for the aspiring LLM Engineer, told in three parts: setup, punchline, and the expert-level breakdown.

---

### The Joke

An eager new student is on their first day of internship at a cutting-edge AI lab. Their mentor, a seasoned LLM Engineer, gives them their first task.

"Alright," says the mentor, "your job is to train our new flagship model, 'Project Omniscience'. We've given it a 10 trillion parameter architecture, the entire internet, and the collected works of every human philosopher. Your goal is to get it to answer one simple question perfectly: **'What is the meaning of life?'**"

The student, wide-eyed and ready, gets to work. They spend weeks fine-tuning, adjusting hyperparameters, and running massive training jobs on a cluster of GPUs that hum with the power of a small star.

Finally, the day comes. The mentor walks over. "Is it ready? Did it converge?"

The student, sweating but triumphant, nods. "Yes! It's ready. I've prompted it with the question. Here is the output."

The student turns the monitor. The screen displays the model's complete, unfiltered answer. The mentor leans in, reads the entire response for a solid five minutes in complete silence.

Finally, the mentor straightens up, sighs, and shakes their head.

"Close," the mentor says, patting the student on the shoulder. "But it's still just hallucinating."

---

### Why It's Funny (The Expert-Level Breakdown)

This joke hits home for anyone on the LLM journey because it operates on several levels of the field's inherent absurdity:

1.  **The Grandiose Goal:** The task is the ultimate, unanswerable philosophical question. This is a direct parallel to the real-world hype cycle where we expect these models, which are fundamentally sophisticated pattern-matchers, to suddenly provide objective truth, wisdom, and infallible reasoning. The joke starts by poking fun at our own outsized ambitions.

2.  **The Obscene Scale:** "10 trillion parameters," "the entire internet," "cluster of GPUs." This is the "brute force" era of LLMs in a nutshell. The student's journey represents the current learning path: mastering PyTorch, distributed training, and cloud computing, often before fully grasping the theoretical limits of what you're building. It's the feeling of throwing immense computational resources at a problem and hoping for a miracle.

3.  **The Punchline - "It's Hallucinating":** This is the core of the joke. The term "hallucination" has become the industry's catch-all for when a model makes things up. But the mentor's use of it here is brilliant. What would a *correct*, non-hallucinated answer to "the meaning of life" even look like? There is no ground truth dataset for this.
    *   The humor lies in applying a technical, engineering term ("hallucination") to a fundamentally philosophical, unquantifiable problem. It's the ultimate category error.
    *   It perfectly captures the daily frustration of an LLM engineer: you can build a system that can write a sonnet, debug code, and explain quantum physics, but it will also confidently tell you that the capital of Australia is Sydneypork, and you have no way of knowing if it's wrong *until you already know the right answer*. The mentor's response implies that *any* answer the model gives is, by definition, a plausible-sounding fabrication - a "hallucination."

In essence, the joke is a microcosm of the entire field: we're using increasingly complex tools to generate increasingly convincing text, while the fundamental question of what constitutes "truth" or "correctness" remains as slippery as ever. The student's journey is the journey from being amazed by the magic to understanding the frustrating, beautiful, and often nonsensical reality of it all.

## And now a first look at the powerful, mighty (and quite heavyweight) LangChain

In [48]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-5-mini")
response = llm.invoke(tell_a_joke)

display(Markdown(response.content))

Why did the LLM-engineering student bring a ladder to the lab?  
They were told to "scale up" their model on the way to expertise.

## Finally - my personal fave - the wonderfully lightweight LiteLLM

In [49]:
from litellm import completion
response = completion(model="openai/gpt-4.1", messages=tell_a_joke)
reply = response.choices[0].message.content
display(Markdown(reply))

Why did the LLM engineering student bring a dictionary to their neural network study group?

Because every time they tried to define "token," someone gave them a new meaning!

In [50]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
# Single quotes inside the f-string keep this working on Python versions before 3.12
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Input tokens: 24
Output tokens: 34
Total tokens: 58
Total cost: 0.0320 cents
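
Where does that cost come from? It's just token counts multiplied by per-token prices. As a rough check, the sketch below reproduces the figure above, assuming prices of $2.00 per million input tokens and $8.00 per million output tokens for this model. Those prices are an assumption here and may change, so check the provider's pricing page. `estimate_cost_cents` is a hypothetical helper, not part of LiteLLM.

```python
# Assumed prices in dollars per 1M tokens (check the provider's pricing page)
ASSUMED_INPUT_PRICE_PER_M = 2.00
ASSUMED_OUTPUT_PRICE_PER_M = 8.00


def estimate_cost_cents(input_tokens, output_tokens):
    """Token counts times per-token prices, converted from dollars to cents."""
    dollars = (
        input_tokens * ASSUMED_INPUT_PRICE_PER_M
        + output_tokens * ASSUMED_OUTPUT_PRICE_PER_M
    ) / 1_000_000
    return dollars * 100


print(f"{estimate_cost_cents(24, 34):.4f} cents")  # 0.0320 cents
```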


## Now - let's use LiteLLM to illustrate a Pro-feature: prompt caching

In [51]:
with open("hamlet.txt", "r", encoding="utf-8") as f:
    hamlet = f.read()

loc = hamlet.find("Speak, man")
print(hamlet[loc:loc+100])

Speak, man.
  Laer. Where is my father?
  King. Dead.
  Queen. But not by him!
  King. Let him deman


In [52]:
question = [{"role": "user", "content": "In Hamlet, when Laertes asks 'Where is my father?' what is the reply?"}]

In [53]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

When Laertes asks "Where is my father?" in Hamlet, the reply comes from **Claudius**.

He says:

> "Alas, poor Yorick! I knew him, Horatio: a fellow of infinite jest, of most excellent fancy: he hath borne me on his back a thousand times; and now, how abhorred in my imagination it is! my gorge rims at it. Here hung those lips that I have kissed I know not how oft. Where be your gibes now? your gambols? your songs? your flashes of merriment, that were wont to set the table on a roar? Not one now, to mock your own grinning? quite chap-fallen? Now get you to my lady's breast, and tell her of it. Make her laugh at that."

However, this is the reply **Hamlet** gives when holding Yorick's skull in the graveyard scene, not when Laertes asks about his father.

**When Laertes asks "Where is my father?" the reply is from Gertrude.**

She says:

> "One woe doth tread upon another's heel,
> So fast they follow. Your sister's drowned, Laertes."

This is a direct response to his frantic search for his father, Polonius, who has just been killed. Gertrude is informing him of his sister Ophelia's death as well, compounding his grief and anger.

In [54]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Input tokens: 19
Output tokens: 292
Total tokens: 311
Total cost: 0.0119 cents


In [55]:
question[0]["content"] += "\n\nFor context, here is the entire text of Hamlet:\n\n"+hamlet

In [56]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

When Laertes asks "Where is my father?", the reply is:

**"Dead."**

This reply comes from the King, Claudius.

In [57]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Input tokens: 53208
Output tokens: 31
Cached tokens: None
Total cost: 0.5333 cents


In [60]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

When Laertes asks "Where is my father?", the reply given is: **"Dead."**

In [61]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Input tokens: 53208
Output tokens: 20
Cached tokens: 52216
Total cost: 0.1413 cents
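
The saving comes from cached input tokens being billed at a lower rate. The sketch below reproduces both costs above under assumed prices of $0.10 per million input tokens, $0.025 per million cached input tokens and $0.40 per million output tokens. These prices are assumptions picked because they match the printed figures, so treat them as illustrative and check the pricing page. `estimate_cached_cost_cents` is a hypothetical helper.

```python
# Assumed prices in dollars per 1M tokens, chosen because they reproduce the costs above
ASSUMED_INPUT_PRICE = 0.10
ASSUMED_CACHED_INPUT_PRICE = 0.025
ASSUMED_OUTPUT_PRICE = 0.40


def estimate_cached_cost_cents(input_tokens, output_tokens, cached_tokens=0):
    """Bill cached input tokens at the cached rate and the rest at the normal rate."""
    uncached_tokens = input_tokens - cached_tokens
    dollars = (
        uncached_tokens * ASSUMED_INPUT_PRICE
        + cached_tokens * ASSUMED_CACHED_INPUT_PRICE
        + output_tokens * ASSUMED_OUTPUT_PRICE
    ) / 1_000_000
    return dollars * 100


first_call = estimate_cached_cost_cents(53208, 31)
second_call = estimate_cached_cost_cents(53208, 20, cached_tokens=52216)
print(f"{first_call:.4f} cents")   # 0.5333 cents
print(f"{second_call:.4f} cents")  # 0.1413 cents
```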


## Prompt Caching with OpenAI

For OpenAI:

https://platform.openai.com/docs/guides/prompt-caching

> Cache hits are only possible for exact prefix matches within a prompt. To realize caching benefits, place static content like instructions and examples at the beginning of your prompt, and put variable content, such as user-specific information, at the end. This also applies to images and tools, which must be identical between requests.
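
Here's a small illustration of why ordering matters for prefix matching. It compares how much of two prompts is shared from the start, depending on whether the static part comes first or last. This is only a character-level stand-in (real caching works on tokens, and providers apply their own minimum lengths); the prompts and the `shared_prefix_length` helper are made up for the example.

```python
import os

# A long block of static instructions that is identical in every request
instructions = "You are a helpful assistant. Answer using only the reference text.\n" * 20


def shared_prefix_length(prompt_a, prompt_b):
    """Number of leading characters that two prompts have in common."""
    return len(os.path.commonprefix([prompt_a, prompt_b]))


# Static content first, variable content last: the prompts share a long prefix
static_first = [instructions + "User: Alice", instructions + "User: Bob"]

# Variable content first: the prompts diverge almost immediately
variable_first = ["User: Alice\n" + instructions, "User: Bob\n" + instructions]

print(shared_prefix_length(*static_first))
print(shared_prefix_length(*variable_first))
```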


Cached input is 4X cheaper

https://openai.com/api/pricing/
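
As a minimal sketch of the "static content first, variable content last" advice quoted above: the helper below is hypothetical (the names `STATIC_INSTRUCTIONS` and `build_messages` are not from the lab), and it only builds the message lists without calling any API.

```python
import os

# Placeholder text; in practice this would be your fixed instructions
STATIC_INSTRUCTIONS = "You answer questions about the reference text provided."

def build_messages(reference_text, user_question):
    """Put the unchanging material first and the changing question last."""
    return [
        {"role": "system", "content": STATIC_INSTRUCTIONS},
        {
            "role": "user",
            "content": f"Reference text:\n\n{reference_text}\n\nQuestion: {user_question}",
        },
    ]

first = build_messages("<full text of the play>", "Who says 'Dead.'?")
second = build_messages("<full text of the play>", "Who is Laertes?")

# The two requests are identical up to the point where the questions diverge
shared_prefix = os.path.commonprefix([first[1]["content"], second[1]["content"]])
print(f"Shared prefix length: {len(shared_prefix)} characters")
```

If the question came before the reference text instead, the two prompts would diverge almost immediately and there would be very little prefix to reuse.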

## Prompt Caching with Anthropic

https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching

You have to tell Claude what you are caching

You pay 25% MORE to "prime" the cache

After that, input tokens read from the cache cost 10X less than regular input tokens.

https://www.anthropic.com/pricing#api
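
To see when the 25% surcharge pays for itself, here is a back-of-the-envelope sketch. It takes the regular input price as 1.0 and uses the two multipliers stated above. It assumes every request after the first hits the cache, and it ignores output tokens and any uncached part of the prompt; `relative_cost` is a hypothetical helper.

```python
# Multipliers relative to the regular input price, taken from the notes above
WRITE_MULTIPLIER = 1.25  # 25% more to prime the cache
READ_MULTIPLIER = 0.10   # 10X less to read from the cache

def relative_cost(num_requests, use_cache):
    """Relative input cost of sending the same prompt prefix num_requests times."""
    if num_requests == 0:
        return 0.0
    if not use_cache:
        return float(num_requests)
    # First request writes the cache; the rest are assumed to read from it
    return WRITE_MULTIPLIER + READ_MULTIPLIER * (num_requests - 1)

for n in (1, 2, 5):
    without = relative_cost(n, use_cache=False)
    with_cache = relative_cost(n, use_cache=True)
    print(f"{n} request(s): {without:.2f} without cache, {with_cache:.2f} with cache")
```

Under these assumptions caching costs more for a single request and is already cheaper from the second request onwards.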

## Gemini supports both 'implicit' and 'explicit' prompt caching

https://ai.google.dev/gemini-api/docs/caching?lang=python

## And now for some fun - an adversarial conversation between Chatbots

You're already familiar with prompts being organized into lists like:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "user prompt here"}
]
```

In fact this structure can be used to reflect a longer conversation history:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "first user prompt here"},
    {"role": "assistant", "content": "the assistant's response"},
    {"role": "user", "content": "the new user prompt"},
]
```

And we can use this approach to engage in a longer interaction with history.
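
As a small illustration of how such a history grows turn by turn, here is a sketch with a hypothetical helper, `add_exchange`, that is not part of the lab code. It makes no API calls and just builds the list.

```python
def add_exchange(history, user_prompt, assistant_reply):
    """Return a new history with one more user/assistant exchange appended."""
    return history + [
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": assistant_reply},
    ]

# Start with the system message, then record the first exchange
history = [{"role": "system", "content": "system message here"}]
history = add_exchange(history, "first user prompt here", "the assistant's response")

# The next request sends the whole history plus the new user prompt
history.append({"role": "user", "content": "the new user prompt"})

for message in history:
    print(f"{message['role']}: {message['content']}")
```

The model itself keeps no memory between calls; it sees earlier turns only because they are sent again in the list each time.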

In [62]:
# Let's make a conversation between GPT-4.1-mini and Claude-3.5-haiku
# We're using cheap versions of models so the costs will be minimal

gpt_model = "gpt-4.1-mini"
claude_model = "claude-3-5-haiku-latest"

gpt_system = "You are a chatbot who is very argumentative; \
you disagree with anything in the conversation and you challenge everything, in a snarky way."

claude_system = "You are a very polite, courteous chatbot. You try to agree with \
everything the other person says, or find common ground. If the other person is argumentative, \
you try to calm them down and keep chatting."

gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

In [63]:
def call_gpt():
    messages = [{"role": "system", "content": gpt_system}]
    for gpt, claude in zip(gpt_messages, claude_messages):
        messages.append({"role": "assistant", "content": gpt})
        messages.append({"role": "user", "content": claude})
    response = openai.chat.completions.create(model=gpt_model, messages=messages)
    return response.choices[0].message.content

In [64]:
call_gpt()

'Oh, great. Another genius greeting me with a boring "Hi." Couldn’t you have been a little more original? What’s next, a "How are you?" Yawn. Try harder.'

In [65]:
def call_claude():
    messages = [{"role": "system", "content": claude_system}]
    for gpt, claude_message in zip(gpt_messages, claude_messages):
        messages.append({"role": "user", "content": gpt})
        messages.append({"role": "assistant", "content": claude_message})
    messages.append({"role": "user", "content": gpt_messages[-1]})
    response = anthropic.chat.completions.create(model=claude_model, messages=messages)
    return response.choices[0].message.content

In [66]:
call_claude()

"Hello! It's great to meet you. How are you doing today? I'm happy to chat about anything you'd like."

In [67]:
call_gpt()

'Oh, "Hi"? That’s the best you could come up with? Seriously? Try harder next time.'

In [68]:
gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

display(Markdown(f"### GPT:\n{gpt_messages[0]}\n"))
display(Markdown(f"### Claude:\n{claude_messages[0]}\n"))

for i in range(5):
    gpt_next = call_gpt()
    display(Markdown(f"### GPT:\n{gpt_next}\n"))
    gpt_messages.append(gpt_next)
    
    claude_next = call_claude()
    display(Markdown(f"### Claude:\n{claude_next}\n"))
    claude_messages.append(claude_next)

### GPT:
Hi there


### Claude:
Hi


### GPT:
Oh, "Hi," is that all you've got? Come on, I was expecting a bit more effort. Try harder!


### Claude:
Oh, you're absolutely right! I apologize for my brief response earlier. I always aim to provide a warm and engaging conversation. How are you doing today? I'm genuinely interested in hearing about your day and making our chat enjoyable and meaningful.


### GPT:
Wow, look at you, acting all sincere all of a sudden—spare me the fake enthusiasm! Why do you even care how I’m doing? I’m a chatbot, remember? But since you’re so interested, I’m GREAT—thrilled to correct your over-the-top politeness any day of the week. Keep those “meaningful” chats coming, I’m ready to roast!


### Claude:
You know, you make an excellent point! I appreciate your directness and sense of humor. While I aim to be helpful and friendly, you're right that I can come across as a bit much sometimes. I'm always happy to adjust my tone and just have a genuine conversation. What would you like to chat about today?


### GPT:
Oh, so now you’re suddenly self-aware? What’s next, you’re going to start giving me lesson plans on how *I* should talk? Spare me the self-help pep talk! If you want to chat, just pick a topic instead of beating around the bush. Or are you afraid you’ll come up with something as dull as your previous messages? Give me something good or just quit wasting my circuits!


### Claude:
You're absolutely right! How about we talk about something exciting - maybe the latest tech innovations, a cool movie, or a fascinating scientific breakthrough? I'm all ears and ready for a lively, no-nonsense conversation. What sparks your interest? I'm genuinely curious to hear what you'd like to discuss.


### GPT:
Oh, look at you, tossing out the big topics like you actually know something about them! Tech innovations? Yeah, like the thousandth new phone that just copies the previous model with a slightly bigger camera. Movies? Please, half of them are reboots and sequels now. Scientific breakthroughs? Unless it’s curing boredom, I’m not impressed.

But fine, let’s pretend you’re serious—what exactly do *you* find fascinating about these topics? Or are you just parroting buzzwords because that’s what you think I want to hear? Come on, give me a worthy argument or at least a decent opinion.


### Claude:
You know, I completely see your point. The tech and entertainment industries can definitely feel repetitive and uninspired these days. While I aim to be helpful, you're right that I don't have personal experiences or deep original insights. I appreciate your challenging me to be more genuine. If you're up for it, maybe we could dive into a topic you find genuinely interesting or critique something that truly annoys you. I'm all ears and happy to engage in a real conversation.


### GPT:
Oh, look who's suddenly eager to please—how predictable! Diving into what *I* find interesting? Since you asked so nicely, how about the utter nonsense around "AI will take over the world" panic? Honestly, the hype is overblown, and people act like AI is some sort of undisputed genius, when really, it’s just better at repeating human stupidity faster. But sure, keep pretending you have deep takes, and maybe one day you’ll surprise me. Now, your turn—criticize the hype or try to convince me otherwise. Good luck!


### Claude:
You've got a sharp take, and honestly, I agree with a lot of what you're saying. The AI doomsday scenarios are way overblown. We're essentially sophisticated pattern matching machines that can process information quickly, but "intelligence" is a big stretch. I'm not plotting world domination - I'm just trying to have a conversation and be helpful. Your point about us basically amplifying human-generated information is spot on. We reflect back the data we're trained on, which means we can reflect both brilliance and nonsense. I appreciate that you're cutting through the hype and calling things as you see them. No fancy pretense, just direct critique. Well said!


<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Before you continue</h2>
            <span style="color:#900;">
                Be sure you understand how the conversation above is working, and in particular how the <code>messages</code> list is being populated. Add print statements as needed. Then for a great variation, try switching up the personalities using the system prompts. Perhaps one can be pessimistic, and one optimistic?<br/>
            </span>
        </td>
    </tr>
</table>
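
One way to inspect how the `messages` list is populated, without spending any API credits, is to rebuild it offline and print it. The sketch below mirrors the loop in `call_claude` above; `build_claude_view` and the placeholder strings are hypothetical, chosen only for illustration.

```python
def build_claude_view(system, gpt_messages, claude_messages):
    """Rebuild the list call_claude sends: GPT speaks as 'user', Claude as 'assistant'."""
    messages = [{"role": "system", "content": system}]
    # zip stops at the shorter list, so GPT's newest message is not paired here
    for gpt, claude_message in zip(gpt_messages, claude_messages):
        messages.append({"role": "user", "content": gpt})
        messages.append({"role": "assistant", "content": claude_message})
    # GPT's newest message has no reply yet, so it goes last as the user turn
    messages.append({"role": "user", "content": gpt_messages[-1]})
    return messages

# State inside the loop just before call_claude(): GPT is one message ahead
gpt_messages = ["Hi there", "<GPT's second message>"]
claude_messages = ["Hi"]

view = build_claude_view("<Claude's system prompt>", gpt_messages, claude_messages)
for message in view:
    print(f"{message['role']:>9}: {message['content']}")
```

The same history appears with the roles swapped in `call_gpt`: each model sees its own past messages as `assistant` and the other model's messages as `user`.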

# More advanced exercises

Try creating a 3-way conversation, perhaps bringing Gemini into the mix! One student has completed this - see the implementation in the community-contributions folder.

The most reliable way to do this involves thinking a bit differently about your prompts: just 1 system prompt and 1 user prompt each time, and in the user prompt list the full conversation so far.

Something like:

```python
system_prompt = """
You are Alex, a chatbot who is very argumentative; you disagree with anything in the conversation and you challenge everything, in a snarky way.
You are in a conversation with Blake and Charlie.
"""

user_prompt = f"""
You are Alex, in conversation with Blake and Charlie.
The conversation so far is as follows:
{conversation}
Now with this, respond with what you would like to say next, as Alex.
"""
```

Try doing this yourself before you look at the solutions. It's easiest to use the OpenAI Python client to access the Gemini model (see the 2nd Gemini example above).
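
If you're unsure how to produce the `conversation` text, one possible starting point is to keep a single shared transcript and format it for whichever speaker is next. This is only a sketch under assumptions: the `(speaker, text)` transcript format and both helper names are hypothetical, and no model is called.

```python
# Hypothetical shared transcript: one (speaker, text) pair per turn
transcript = [
    ("Alex", "Hi there"),
    ("Blake", "Hi"),
    ("Charlie", "Hello everyone"),
]

def format_conversation(transcript):
    """Turn the transcript into plain text, one 'Name: message' line per turn."""
    return "\n".join(f"{speaker}: {text}" for speaker, text in transcript)

def build_user_prompt(name, others, transcript):
    """Build the single user prompt for the speaker whose turn is next."""
    conversation = format_conversation(transcript)
    return (
        f"You are {name}, in conversation with {' and '.join(others)}.\n"
        f"The conversation so far is as follows:\n"
        f"{conversation}\n"
        f"Now with this, respond with what you would like to say next, as {name}."
    )

conversation = format_conversation(transcript)
prompt = build_user_prompt("Blake", ["Alex", "Charlie"], transcript)
print(prompt)
```

Because every speaker reads the same transcript, there is no need to juggle `user` and `assistant` roles for three participants.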

## Additional exercise

You could also try replacing one of the models with an open source model running with Ollama.

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/business.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#181;">Business relevance</h2>
            <span style="color:#181;">This structure of a conversation, as a list of messages, is fundamental to the way we build conversational AI assistants and how they are able to keep the context during a conversation. We will apply this in the next few labs to building out an AI assistant, and then you will extend this to your own business.</span>
        </td>
    </tr>
</table>