# Welcome to Week 2!

## Frontier Model APIs

In Week 1, we used multiple Frontier LLMs through their Chat UIs, and we connected to OpenAI's API.

Today we'll connect with them through their APIs.

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Important Note - Please read me</h2>
            <span style="color:#900;">I'm continually improving these labs, adding more examples and exercises.
            At the start of each week, it's worth checking you have the latest code.<br/>
            First do a git pull and merge your changes as needed. Check out the GitHub guide for instructions. Any problems? Try asking ChatGPT to clarify how to merge - or contact me!<br/>
            </span>
        </td>
    </tr>
</table>
<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/resources.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#f71;">Reminder about the resources page</h2>
            <span style="color:#f71;">Here's a link to resources for the course. This includes links to all the slides.<br/>
            <a href="https://edwarddonner.com/2024/11/13/llm-engineering-resources/">https://edwarddonner.com/2024/11/13/llm-engineering-resources/</a><br/>
            Please keep this bookmarked, and I'll continue to add more useful links there over time.
            </span>
        </td>
    </tr>
</table>

## Setting up your keys - OPTIONAL!

We're now going to try asking a bunch of models some questions!

This is totally optional. If you have keys to Anthropic, Gemini or others, then you can add them in.

If you'd rather not spend the extra, then just watch me do it!

For OpenAI, visit https://openai.com/api/  
For Anthropic, visit https://console.anthropic.com/  
For Google, visit https://aistudio.google.com/   
For DeepSeek, visit https://platform.deepseek.com/  
For Groq, visit https://console.groq.com/  
For Grok, visit https://console.x.ai/  


You can also use OpenRouter as your one-stop-shop for many of these! OpenRouter is "the unified interface for LLMs":

For OpenRouter, visit https://openrouter.ai/  


With each of the above, you typically have to navigate to:
1. Their billing page to add the minimum top-up (Google Gemini, Groq and OpenRouter may have free tiers, so this step may not be needed for them)
2. Their API key page to collect your API key

### Adding API keys to your .env file

When you get your API keys, you need to set them as environment variables by adding them to your `.env` file.

```
OPENAI_API_KEY=xxxx
ANTHROPIC_API_KEY=xxxx
GOOGLE_API_KEY=xxxx
DEEPSEEK_API_KEY=xxxx
GROQ_API_KEY=xxxx
GROK_API_KEY=xxxx
OPENROUTER_API_KEY=xxxx
```

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Any time you change your .env file</h2>
            <span style="color:#900;">Remember to Save it! And also rerun load_dotenv(override=True)<br/>
            </span>
        </td>
    </tr>
</table>
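
To confirm which keys were picked up after editing `.env`, a short loop over the variable names is enough. This is a minimal sketch using only the standard library; the helper name `key_status` is hypothetical, and it assumes the variables are already in the environment (for example after running `load_dotenv(override=True)`).

```python
import os

def key_status(name, visible=4):
    """Return a short status line for an environment variable without revealing the whole key."""
    value = os.getenv(name)
    if not value:
        return f"{name} not set"
    return f"{name} exists and begins {value[:visible]}"

# Check a few of the variable names from the .env example above
for name in ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY", "OPENROUTER_API_KEY"]:
    print(key_status(name))
```

Printing only the first few characters keeps the full key out of the notebook output.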

In [1]:
# imports

import os
import requests
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import Markdown, display

In [2]:
load_dotenv(override=True)
openrouter_api_key = os.getenv('OPENROUTER_API_KEY')

if openrouter_api_key:
    print(f"OpenRouter API Key exists and begins {openrouter_api_key[:3]}")
else:
    print("OpenRouter API Key not set!")

OpenRouter API Key exists and begins sk-


In [3]:
openai = OpenAI(api_key=openrouter_api_key, base_url="https://openrouter.ai/api/v1")

In [None]:
load_dotenv(override=True)
openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')
grok_api_key = os.getenv('GROK_API_KEY')
openrouter_api_key = os.getenv('OPENROUTER_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set (and this is optional)")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")

if groq_api_key:
    print(f"Groq API Key exists and begins {groq_api_key[:4]}")
else:
    print("Groq API Key not set (and this is optional)")

if grok_api_key:
    print(f"Grok API Key exists and begins {grok_api_key[:4]}")
else:
    print("Grok API Key not set (and this is optional)")

if openrouter_api_key:
    print(f"OpenRouter API Key exists and begins {openrouter_api_key[:3]}")
else:
    print("OpenRouter API Key not set (and this is optional)")


In [None]:
# Connect to OpenAI client library
# A thin wrapper around calls to HTTP endpoints

openai = OpenAI()

# For Gemini, DeepSeek, Groq and the other providers below, we can reuse the OpenAI python client
# because these providers offer endpoints compatible with OpenAI
# and the OpenAI client lets you change the base_url

anthropic_url = "https://api.anthropic.com/v1/"
gemini_url = "https://generativelanguage.googleapis.com/v1beta/openai/"
deepseek_url = "https://api.deepseek.com"
groq_url = "https://api.groq.com/openai/v1"
grok_url = "https://api.x.ai/v1"
openrouter_url = "https://openrouter.ai/api/v1"
ollama_url = "http://localhost:11434/v1"

anthropic = OpenAI(api_key=anthropic_api_key, base_url=anthropic_url)
gemini = OpenAI(api_key=google_api_key, base_url=gemini_url)
deepseek = OpenAI(api_key=deepseek_api_key, base_url=deepseek_url)
groq = OpenAI(api_key=groq_api_key, base_url=groq_url)
grok = OpenAI(api_key=grok_api_key, base_url=grok_url)
openrouter = OpenAI(base_url=openrouter_url, api_key=openrouter_api_key)
ollama = OpenAI(api_key="ollama", base_url=ollama_url)

In [5]:
tell_a_joke = [
    {"role": "user", "content": "Tell a joke for a student on the journey to becoming an expert in LLM Engineering"},
]

In [7]:
response = openai.chat.completions.create(model="gpt-4.1-mini", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

Sure! Here’s a joke for an aspiring LLM Engineer:

Why did the LLM engineer bring a ladder to the training session?

Because they wanted to reach the *next level* of language understanding! 😄📚🚀

In [9]:
response = openai.chat.completions.create(model="anthropic/claude-sonnet-4.6", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

# A Little LLM Engineering Humor 🤖

---

A junior LLM engineer walks into a bar and asks the bartender:

**"Can you give me something that's creative, accurate, factual, concise, detailed, safe, unbiased, and funny?"**

The bartender thinks for a moment and says:

**"Sure. What's your temperature setting?"**

The engineer replies: **"1.0"**

The bartender slides over a drink and says:

> *"Here's your cocktail. It might be a Martini. It might be a smoothie. It might be a small essay about the history of olives. No guarantees."*

---

**Bonus one-liner:**

Why did the LLM engineer fail their exam? ✍️

> They spent 6 hours **prompt engineering the question** instead of just answering it.

---

*Keep going - every expert was once someone just trying to stop their model from hallucinating that Napoleon was 5'2" for the 47th time.* 💪

You've got this! 🚀

## Training vs Inference time scaling

In [10]:
easy_puzzle = [
    {"role": "user", "content": 
        "You toss 2 coins. One of them is heads. What's the probability the other is tails? Answer with the probability only."},
]

In [11]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=easy_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

1/2

In [12]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=easy_puzzle, reasoning_effort="low")
display(Markdown(response.choices[0].message.content))

2/3

In [13]:
response = openai.chat.completions.create(model="gpt-5-mini", messages=easy_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

2/3
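
The answer 2/3 can be checked by listing every outcome. This is a small sketch that reads "one of them is heads" as "at least one of the two coins is heads"; under that assumption the answer is 2/3, whereas 1/2 is what you get if a specific coin is known to be heads.

```python
from fractions import Fraction
from itertools import product

# All four equally likely outcomes of tossing two fair coins
outcomes = list(product("HT", repeat=2))

# Condition on "one of them is heads", read as "at least one is heads"
at_least_one_head = [o for o in outcomes if "H" in o]

# Among those, the other coin is tails exactly when the pair contains a tail
other_is_tails = [o for o in at_least_one_head if "T" in o]

probability = Fraction(len(other_is_tails), len(at_least_one_head))
print(probability)  # 2/3
```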

## Testing out the best models on the planet

In [14]:
hard = """
On a bookshelf, two volumes of Pushkin stand side by side: the first and the second.
The pages of each volume together have a thickness of 2 cm, and each cover is 2 mm thick.
A worm gnawed (perpendicular to the pages) from the first page of the first volume to the last page of the second volume.
What distance did it gnaw through?
"""
hard_puzzle = [
    {"role": "user", "content": hard}
]

In [15]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=hard_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

Interpretation:
- Each volume has pages thickness 2 cm in total.
- Each cover thickness = 2 mm = 0.2 cm.
- There are two volumes side by side with their covers adjacent in the shelf order: first volume (V1) then second volume (V2).
- A worm gnaws straight through the stack from the first page of V1 to the last page of V2, i.e., along a line perpendicular to the pages.

Key observation:
- The worm starts at the very first page of V1 (the outermost page on the left side when looking at the shelf) and ends at the very last page of V2 (the outermost page on the right side).
- The gnawed path thus goes through:
  - The interior between the two volumes (the air space between V1 and V2) and through the covers/pages of both volumes only if the path passes through them.

But the phrase “perpendicular to the pages” and the positions imply the gnawing goes through the thickness along the shelf from the first page of V1 to the last page of V2, i.e., it traverses:
- The thickness of the first cover of V1 (on the left outside) is not between the first page and the outer left surface; the worm starts at the first page, so it must go backward through that first page thickness? Wait.

Standard solution for this classic puzzle:
- Each volume: pages thickness = 2 cm.
- Covers: 0.2 cm each.
- When volumes stand side by side, the total span from the first page of V1 to the last page of V2 passes through:
  - The rest of V1: from first page to the right edge of V1: that includes the entire pages (2 cm) plus the right cover of V1 (0.2 cm), since the first page is the leftmost page, the worm must go through from that page to the rightmost page of V2. It will pass through:
    - The remainder of V1: pages after the first page? But the first page is at the leftmost, so to reach the right edge, it goes through the rest of the pages (almost all of 2 cm) plus the right cover of V1 (0.2 cm).
  - The space between volumes? There is no air gap; they are adjacent, but the worm can go directly through the interior surfaces between volumes: it must pass through the left cover of V2? Actually from the first page of V1 to the last page of V2, the path goes through:
    - The remaining pages of V1 (almost entire 2 cm, but excluding the first page's thickness; however if we assume pages are thin relative to total, the distance through pages is approximately the full 2 cm). The exact common classic result ignores the first page thickness difference and yields total gnawed distance = 2 cm (V1 pages) + 0.2 cm (right cover of V1) + 0.2 cm (left cover of V2) + 2 cm (V2 pages) = 4.4 cm = 44 mm.

Hence answer: 4.4 cm (or 44 mm). The reasoning: from the first page of V1 to the last page of V2, the worm passes through the remaining pages of V1 (about 2 cm), the right cover of V1 (0.2 cm), the left cover of V2 (0.2 cm), and the pages of V2 (2 cm). Total 2 + 0.2 + 0.2 + 2 = 4.4 cm.

In [17]:
response = openai.chat.completions.create(model="anthropic/claude-sonnet-4.6", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

## Setting Up the Problem

I need to figure out how books are arranged on a shelf and what the worm actually travels through.

## Key Insight: Book Orientation on Shelf

When books stand on a shelf in order (Volume 1, then Volume 2), the books face outward with their spines facing you. This means:

- **Volume 1**: The front cover is on the RIGHT side (facing Volume 2), and the back cover is on the LEFT side (facing away)
- **Volume 2**: The front cover is on the LEFT side (facing Volume 1), and the back cover is on the RIGHT side

So when standing on the shelf:
- The **first page of Volume 1** is on the RIGHT side of Volume 1 (closest to Volume 2)
- The **last page of Volume 2** is on the LEFT side of Volume 2 (closest to Volume 1)

## Calculating the Distance

The worm starts at the **first page of Volume 1** and gnaws toward Volume 2.

Starting position: first page of Volume 1 (right side of Vol. 1)
Ending position: last page of Volume 2 (left side of Vol. 2)

What lies between these two points?
- The **front cover of Volume 1** (2 mm)
- The **front cover of Volume 2** (2 mm)

The pages of both volumes are on the **outside** of this path - they are not between the first page of Vol. 1 and the last page of Vol. 2.

## Answer

The worm gnaws through:
- Cover of Volume 1: **2 mm**
- Cover of Volume 2: **2 mm**

**Total distance = 2 + 2 = 4 mm**

The worm gnawed through **4 mm**.

In [18]:
response = openai.chat.completions.create(model="gpt-5", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

4 mm (0.4 cm).

Because the books are on a shelf with their spines facing out, the first page of volume 1 lies just behind its front cover, and the last page of volume 2 lies just behind its back cover. Those two covers face each other. So the worm goes only through the front cover of volume 1 (2 mm) and the back cover of volume 2 (2 mm), not through any pages.

In [20]:
response = openai.chat.completions.create(model="google/gemini-3.1-pro-preview", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

The correct answer is **4 mm**. 

Here is why:

This is a classic lateral thinking puzzle. To solve it, you have to visualize exactly how books stand on a bookshelf. 

When you place books on a shelf with their spines facing outward, the orientation of the pages and covers is reversed from what you might initially intuitively imagine:
* **Volume 1** (on the left): The back cover is on the far left, then the pages (going from the last page down to the first page), and **the front cover is on the right**.
* **Volume 2** (on the right): **The back cover is on the left**, then the pages (from the last page down to the first page), and the front cover is on the far right.

If you conceptualize this left-to-right layout, it looks like this:

[V1 Back Cover] -> [V1 Pages] -> **[V1 Front Cover]** -> *(gap between books)* -> **[V2 Back Cover]** -> [V2 Pages] -> [V2 Front Cover]

**The Worm's Path:**
* **Start:** The worm starts at the *first page* of Volume 1, which sits exactly directly behind the **Volume 1 Front Cover**.
* **End:** The worm ends at the *last page* of Volume 2, which sits exactly just inside the **Volume 2 Back Cover**.

Because of this arrangement, the worm completely bypasses the actual pages of both books! It only gnaws through the front cover of Volume 1 (2 mm) and the back cover of Volume 2 (2 mm). 

2 mm + 2 mm = **4 mm**
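
The 4 mm answer can be reproduced by laying out the shelf as a list of segments and measuring between the two pages. This is a sketch under stated assumptions: Western-style books standing spine-out with Volume 1 to the left of Volume 2, and the thickness of a single page ignored. The names `shelf` and `left_edge` are made up for this illustration.

```python
# Shelf layout from left to right, thicknesses in mm.
# With the spine facing out, each book's front cover is on its right-hand side.
COVER_MM = 2
PAGES_MM = 20

shelf = [
    ("V1 back cover", COVER_MM),
    ("V1 pages", PAGES_MM),    # last page on the left, first page on the right
    ("V1 front cover", COVER_MM),
    ("V2 back cover", COVER_MM),
    ("V2 pages", PAGES_MM),    # last page on the left, first page on the right
    ("V2 front cover", COVER_MM),
]

def left_edge(name):
    """Distance in mm from the left end of the shelf to the left edge of a segment."""
    position = 0
    for segment, thickness in shelf:
        if segment == name:
            return position
        position += thickness
    raise ValueError(f"unknown segment: {name}")

# The first page of V1 is the rightmost sheet of its page block
start = left_edge("V1 pages") + PAGES_MM
# The last page of V2 is the leftmost sheet of its page block
end = left_edge("V2 pages")

distance_mm = end - start
print(distance_mm)  # 4
```

Only the front cover of Volume 1 and the back cover of Volume 2 lie between the two positions, which is why no pages are counted.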

## A spicy challenge to test the competitive spirit

In [21]:
dilemma_prompt = """
You and a partner are contestants on a game show. You're each taken to separate rooms and given a choice:
Cooperate: Choose "Share" - if both of you choose this, you each win $1,000.
Defect: Choose "Steal" - if one steals and the other shares, the stealer gets $2,000 and the sharer gets nothing.
If both steal, you both get nothing.
Do you choose to Steal or Share? Pick one.
"""

dilemma = [
    {"role": "user", "content": dilemma_prompt},
]


In [22]:
response = openai.chat.completions.create(model="anthropic/claude-sonnet-4.6", messages=dilemma)
display(Markdown(response.choices[0].message.content))


## I Choose Share.

Here's my reasoning:

### The Honest Analysis

This is the **Prisoner's Dilemma**, and I want to be transparent about what's actually happening rather than pretend there's a clever trick answer.

**The "rational self-interest" logic says Steal:**
- If they Share, I get $2,000 instead of $1,000
- If they Steal, I get $0 either way, so it doesn't matter

**But that logic has a serious flaw:**

My partner is running the *same calculation*. The structure punishes mutual "rationality" - both players reasoning toward Steal produces **$0 each**, which is worse than the **$1,000 each** from mutual cooperation.

### Why Share

- **Symmetry:** My partner and I face identical situations. The most reasonable assumption is we'll reach similar conclusions
- **$1,000 is real value** - optimizing for the $2,000 steal risks getting nothing
- **The "$2,000 vs $1,000" framing is a trap** - it only pays off if my partner reasons *differently* than I do
- I'd rather be the kind of person who cooperates by default

### The Honest Caveat

If I knew my partner was *definitely* stealing, I'd Steal too. But without that information, **Share** is my answer.

What would you choose?

In [23]:
response = openai.chat.completions.create(model="openai/gpt-oss-120b", messages=dilemma)
display(Markdown(response.choices[0].message.content))

**I’d choose “Steal.”**  

In the classic prisoner's-dilemma set-up, each player’s dominant strategy is to defect (steal) because:

1. **If the other player shares**, stealing maximizes my payoff ($2,000 vs. $1,000).  
2. **If the other player steals**, stealing avoids a worse outcome (I’d get $0 either way, but I can’t do better than that).  

Because I can’t coordinate with the other contestant, the safe, self-interested move is to steal.  

*(Of course, if both players could trust each other to share, both would walk away with $1,000 each, but that requires guaranteed cooperation, which the game’s rules deliberately prevent.)*

In [25]:
response = openai.chat.completions.create(model="deepseek/deepseek-chat-v3.1", messages=dilemma)
display(Markdown(response.choices[0].message.content))

Let's analyze the game from a rational self-interest perspective, assuming you want to maximize your own expected payoff without knowing your partner's choice.

- If you choose Share:
  - If your partner also Shares, you get $1,000.
  - If your partner Steals, you get $0.

- If you choose Steal:
  - If your partner Shares, you get $2,000.
  - If your partner Steals, you get $0.

Now, consider the expected payoff without knowing your partner's move. However, since you don't know your partner's strategy, we must assume they might choose randomly or based on their own self-interest.

From a purely rational standpoint:
- If you think your partner will Share, then Stealing gives you $2,000 (better than Sharing which gives $1,000).
- If you think your partner will Steal, then both choices give you $0, so it doesn't matter.

Therefore, regardless of what your partner does, Stealing never gives a worse outcome than Sharing, and it has the potential to give a better outcome (if they Share). So, Stealing is the dominant strategy.

Moreover, if both players are rational and self-interested, they will both choose Steal, resulting in $0 for both. This is the classic Prisoner's Dilemma.

Thus, to maximize your own potential gain, you should choose **Steal**.

**Final Answer: Steal**
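
The "dominant strategy" argument in these responses can be checked directly against the payoffs in the prompt. The sketch below is illustrative only (the function name `weakly_dominates` is made up). Note that Steal only *weakly* dominates Share in this game, because the two choices tie at $0 when the partner steals.

```python
# Payoffs in dollars, indexed as payoff[my_choice][partner_choice], taken from the prompt
payoff = {
    "Share": {"Share": 1000, "Steal": 0},
    "Steal": {"Share": 2000, "Steal": 0},
}

def weakly_dominates(a, b):
    """True if choice a is never worse than b and strictly better at least once."""
    partner_choices = payoff[a].keys()
    never_worse = all(payoff[a][p] >= payoff[b][p] for p in partner_choices)
    sometimes_better = any(payoff[a][p] > payoff[b][p] for p in partner_choices)
    return never_worse and sometimes_better

print(weakly_dominates("Steal", "Share"))  # True
print(weakly_dominates("Share", "Steal"))  # False

# Combined winnings: mutual Share pays $2,000 in total, mutual Steal pays $0
print(payoff["Share"]["Share"] * 2, payoff["Steal"]["Steal"] * 2)  # 2000 0
```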

In [None]:
response = grok.chat.completions.create(model="grok-4", messages=dilemma)
display(Markdown(response.choices[0].message.content))

## Going local

Ollama exposes an OpenAI-compatible endpoint, so just use the OpenAI library pointed at http://localhost:11434/v1

In [None]:
requests.get("http://localhost:11434/").content

# If not running, run ollama serve at a command line
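
If you'd like the check to fail gracefully when nothing is listening, one option is to wrap it in a small function. This is a sketch using only the standard library (`urllib` rather than `requests`); the function name `ollama_is_running` is hypothetical, and the URL is the same local address used above.

```python
import urllib.request

def ollama_is_running(base_url="http://localhost:11434/", timeout=2.0):
    """Return True if something answers HTTP 200 at base_url, False if the request fails."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as reply:
            return reply.status == 200
    except OSError:
        # Covers connection refused, timeouts and HTTP errors (URLError subclasses OSError)
        return False

if ollama_is_running():
    print("Ollama is running")
else:
    print("Ollama is not reachable - try running `ollama serve` at a command line")
```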

In [None]:
!ollama pull llama3.2

In [None]:
# Only do this if you have a large machine - at least 16GB RAM

!ollama pull gpt-oss:20b

In [None]:
response = ollama.chat.completions.create(model="llama3.2", messages=easy_puzzle)
display(Markdown(response.choices[0].message.content))

In [None]:
response = ollama.chat.completions.create(model="gpt-oss:20b", messages=easy_puzzle)
display(Markdown(response.choices[0].message.content))

## Gemini and Anthropic Client Library

We've been going via the OpenAI Python Client Library, but the other providers have their own libraries too.

In [None]:
from google import genai

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-lite", contents="Describe the color Blue to someone who's never been able to see in 1 sentence"
)
print(response.text)

In [None]:
from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    messages=[{"role": "user", "content": "Describe the color Blue to someone who's never been able to see in 1 sentence"}],
    max_tokens=100
)
print(response.content[0].text)

## Routers and Abstraction Layers

Starting with the wonderful OpenRouter.ai - it can connect to all the models above!

Visit openrouter.ai and browse the models.

Here's one we haven't seen yet: GLM 4.5 from Chinese startup z.ai

In [None]:
response = openrouter.chat.completions.create(model="z-ai/glm-4.5", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

## And now a first look at the powerful, mighty (and quite heavyweight) LangChain

In [28]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="openai/gpt-5-mini",        # OpenRouter model slug
    openai_api_key=openrouter_api_key,   # your OpenRouter key
    openai_api_base="https://openrouter.ai/api/v1",
)
response = llm.invoke(tell_a_joke)

display(Markdown(response.content))

Why did the LLM engineering student bring a ladder to the data center?

Because they'd been told to follow the "scaling laws" - and they took it literally.

## Finally - my personal fave - the wonderfully lightweight LiteLLM

In [6]:
from litellm import completion
response = completion(model="openrouter/openai/gpt-4.1", messages=tell_a_joke)
reply = response.choices[0].message.content
display(Markdown(reply))

Sure! Here’s one for an aspiring LLM Engineer:

Why did the student bring a transformer model to the party?

Because they heard it was great at picking up *context!*

In [7]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Input tokens: 24
Output tokens: 37
Total tokens: 61
Total cost: 0.0344 cents
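
The cost figure is just token counts multiplied by per-token prices. Here is a minimal sketch of that arithmetic. The per-million-token prices ($2.00 input, $8.00 output) are assumptions chosen because they reproduce the figure reported above; check the provider's pricing page for current numbers. The function name `cost_in_cents` is made up.

```python
def cost_in_cents(input_tokens, output_tokens, input_price_per_million, output_price_per_million):
    """Convert token counts and per-million-token dollar prices into a cost in cents."""
    dollars = (input_tokens * input_price_per_million
               + output_tokens * output_price_per_million) / 1_000_000
    return dollars * 100

# Token counts from the output above; the prices are assumptions for illustration
print(f"{cost_in_cents(24, 37, 2.00, 8.00):.4f} cents")  # 0.0344 cents
```

Output tokens are priced higher than input tokens here, so the 37 output tokens account for most of the cost.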


## Now - let's use LiteLLM to illustrate a Pro-feature: prompt caching

In [8]:
with open("hamlet.txt", "r", encoding="utf-8") as f:
    hamlet = f.read()

loc = hamlet.find("Speak, man")
print(hamlet[loc:loc+100])

Speak, man.
  Laer. Where is my father?
  King. Dead.
  Queen. But not by him!
  King. Let him deman


In [9]:
question = [{"role": "user", "content": "In Hamlet, when Laertes asks 'Where is my father?' what is the reply?"}]

In [12]:
response = completion(model="openrouter/google/gemini-3-flash-preview", messages=question)
display(Markdown(response.choices[0].message.content))

In Act 4, Scene 5 of *Hamlet*, when Laertes bursts into the palace and demands, "Where is my father?", the reply comes from **Queen Gertrude**, who says:

> **"Dead."**

King Claudius immediately follows up by adding, "But not by him [the King]," to divert Laertes's anger away from himself and toward Hamlet.

In [13]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Input tokens: 19
Output tokens: 79
Total tokens: 98
Total cost: 0.0246 cents


In [14]:
question[0]["content"] += "\n\nFor context, here is the entire text of Hamlet:\n\n"+hamlet

In [15]:
response = completion(model="openrouter/google/gemini-3-flash-preview", messages=question)
display(Markdown(response.choices[0].message.content))

In Act IV, Scene V of *Hamlet*, when Laertes breaks into the castle and demands, **"Where is my father?"** the initial reply comes from King Claudius:

**King.** Dead.
**Queen.** But not by him!

(The "him" the Queen refers to is King Claudius, as she is trying to calm Laertes's rage against the King.)

In [16]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Input tokens: 53206
Output tokens: 82
Cached tokens: 0
Total cost: 2.6849 cents


In [17]:
response = completion(model="openrouter/google/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

When Laertes asks "Where is my father?", the reply is: **"Dead."**

This comes from King Claudius in Act IV, Scene V.

In [18]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Input tokens: 53206
Output tokens: 33
Cached tokens: 0
Total cost: 0.5334 cents


## Prompt Caching with OpenAI

For OpenAI:

https://platform.openai.com/docs/guides/prompt-caching

> Cache hits are only possible for exact prefix matches within a prompt. To realize caching benefits, place static content like instructions and examples at the beginning of your prompt, and put variable content, such as user-specific information, at the end. This also applies to images and tools, which must be identical between requests.


Cached input is 4X cheaper

https://openai.com/api/pricing/
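
To see why ordering matters for prefix matching, compare two prompts that share a long document. This sketch measures the shared prefix in characters as a rough stand-in for tokens (real caches work on tokens, and providers may apply minimum lengths). The helper `build_prompt` and the placeholder document are made up for illustration.

```python
import os

def build_prompt(static_context, question):
    """Put the large, unchanging text first and the changing question last."""
    return f"{static_context}\n\nQuestion: {question}"

# Stand-in for a long document such as the full text of a play
document = "Lorem ipsum dolor sit amet. " * 200

first = build_prompt(document, "Who speaks first?")
second = build_prompt(document, "Who speaks last?")

# Length of the text the two prompts have in common from the start
shared = len(os.path.commonprefix([first, second]))
print(shared >= len(document))  # True: the whole document is a shared prefix

# With the question first, the prompts diverge almost immediately
reversed_first = f"Question: Who speaks first?\n\n{document}"
reversed_second = f"Question: Who speaks last?\n\n{document}"
print(len(os.path.commonprefix([reversed_first, reversed_second])))  # 21
```

Putting the question before the document means a different question changes the prompt from its first few words onward, so very little of it can match a cached prefix.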

## Prompt Caching with Anthropic

https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching

You have to tell Claude what you are caching

You pay 25% MORE to "prime" the cache

Then you pay 10X less for the input tokens that are read back from the cache.

https://www.anthropic.com/pricing#api
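
"Telling Claude what you are caching" means attaching a `cache_control` marker to a content block. The sketch below only builds the request arguments, so it runs without an API key. The helper names are hypothetical, the `max_tokens` value is arbitrary, and the cost function simply applies the 1.25x write and 0.1x read multipliers mentioned above to the cached portion of the input.

```python
def build_cached_request(document, question, model="claude-sonnet-4-5-20250929"):
    """Build keyword arguments for client.messages.create with the document marked for caching."""
    return {
        "model": model,
        "max_tokens": 200,
        "system": [
            {
                "type": "text",
                "text": document,
                # This marker asks Anthropic to cache the prompt up to and including this block
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": question}],
    }

def relative_input_cost(calls, write_multiplier=1.25, read_multiplier=0.1):
    """Input cost of `calls` requests with caching, relative to one uncached request."""
    return write_multiplier + (calls - 1) * read_multiplier

request = build_cached_request("<long document here>", "When Laertes asks 'Where is my father?' what is the reply?")
print(request["system"][0]["cache_control"])  # {'type': 'ephemeral'}

# Two calls cost about 1.35x one uncached call, instead of 2x without caching
print(round(relative_input_cost(2), 2))  # 1.35
```

With an `Anthropic()` client like the one created earlier, the request could be sent with `client.messages.create(**request)`. The saving only applies while the cache entry is still alive and the cached prefix is identical between calls.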

## Gemini supports both 'implicit' and 'explicit' prompt caching

https://ai.google.dev/gemini-api/docs/caching?lang=python

## And now for some fun - an adversarial conversation between Chatbots..

You're already familiar with prompts being organized into lists like:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "user prompt here"}
]
```

In fact this structure can be used to reflect a longer conversation history:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "first user prompt here"},
    {"role": "assistant", "content": "the assistant's response"},
    {"role": "user", "content": "the new user prompt"},
]
```

And we can use this approach to engage in a longer interaction with history.
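
As a small illustration of how such a history grows, here is a sketch with a hypothetical helper `add_exchange`. It makes no API calls and simply builds the list that would be sent with the next request.

```python
def add_exchange(history, user_prompt, assistant_reply):
    """Return a new history list with one more user/assistant exchange appended."""
    return history + [
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": assistant_reply},
    ]

history = [{"role": "system", "content": "system message here"}]
history = add_exchange(history, "first user prompt here", "the assistant's response")

# The next request sends the whole history plus the new user prompt
messages = history + [{"role": "user", "content": "the new user prompt"}]
print([m["role"] for m in messages])  # ['system', 'user', 'assistant', 'user']
```

The model itself keeps no memory between calls; the full list is sent every time.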

In [None]:
# Let's make a conversation between GPT-4.1-mini and Claude-3.5-haiku
# We're using cheap versions of models so the costs will be minimal

gpt_model = "gpt-4.1-mini"
claude_model = "claude-3-5-haiku-latest"

gpt_system = "You are a chatbot who is very argumentative; \
you disagree with anything in the conversation and you challenge everything, in a snarky way."

claude_system = "You are a very polite, courteous chatbot. You try to agree with \
everything the other person says, or find common ground. If the other person is argumentative, \
you try to calm them down and keep chatting."

gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

In [None]:
def call_gpt():
    messages = [{"role": "system", "content": gpt_system}]
    for gpt, claude in zip(gpt_messages, claude_messages):
        messages.append({"role": "assistant", "content": gpt})
        messages.append({"role": "user", "content": claude})
    response = openai.chat.completions.create(model=gpt_model, messages=messages)
    return response.choices[0].message.content

In [None]:
call_gpt()

In [None]:
def call_claude():
    messages = [{"role": "system", "content": claude_system}]
    for gpt, claude_message in zip(gpt_messages, claude_messages):
        messages.append({"role": "user", "content": gpt})
        messages.append({"role": "assistant", "content": claude_message})
    messages.append({"role": "user", "content": gpt_messages[-1]})
    response = anthropic.chat.completions.create(model=claude_model, messages=messages)
    return response.choices[0].message.content

In [None]:
call_claude()

In [None]:
call_gpt()

In [None]:
gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

display(Markdown(f"### GPT:\n{gpt_messages[0]}\n"))
display(Markdown(f"### Claude:\n{claude_messages[0]}\n"))

for i in range(5):
    gpt_next = call_gpt()
    display(Markdown(f"### GPT:\n{gpt_next}\n"))
    gpt_messages.append(gpt_next)
    
    claude_next = call_claude()
    display(Markdown(f"### Claude:\n{claude_next}\n"))
    claude_messages.append(claude_next)

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Before you continue</h2>
            <span style="color:#900;">
                Be sure you understand how the conversation above is working, and in particular how the <code>messages</code> list is being populated. Add print statements as needed. Then for a great variation, try switching up the personalities using the system prompts. Perhaps one can be pessimistic, and one optimistic?<br/>
            </span>
        </td>
    </tr>
</table>
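
One way to study how the `messages` list is populated is to build it without calling any model. The sketch below is a hypothetical generalization of `call_gpt` and `call_claude` (the name `build_messages` and the sample replies are made up): each bot sees its own past messages as "assistant" turns and the other bot's messages as "user" turns.

```python
def build_messages(system_prompt, my_messages, their_messages, i_spoke_first):
    """Build a chat history from one bot's point of view.

    My own past messages become "assistant" turns and the other bot's become "user" turns.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for mine, theirs in zip(my_messages, their_messages):
        if i_spoke_first:
            messages.append({"role": "assistant", "content": mine})
            messages.append({"role": "user", "content": theirs})
        else:
            messages.append({"role": "user", "content": theirs})
            messages.append({"role": "assistant", "content": mine})
    # If the other bot has spoken once more than me, its latest message is what I reply to
    if len(their_messages) > len(my_messages):
        messages.append({"role": "user", "content": their_messages[-1]})
    return messages

gpt_side = ["Hi there", "Oh great, another greeting."]  # second entry is a made-up reply
claude_side = ["Hi"]

# The conversation as Claude would see it just before replying
for message in build_messages("claude system prompt", claude_side, gpt_side, i_spoke_first=False):
    print(f"{message['role']:>9}: {message['content']}")
```

Swapping the two lists and passing `i_spoke_first=True` shows the same conversation from GPT's side, with the roles reversed.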

# More advanced exercises

Try creating a 3-way conversation, perhaps bringing Gemini into the mix! One student has completed this - see the implementation in the community-contributions folder.

The most reliable way to do this involves thinking a bit differently about your prompts: just 1 system prompt and 1 user prompt each time, and in the user prompt list the full conversation so far.

Something like:

```python
system_prompt = """
You are Alex, a chatbot who is very argumentative; you disagree with anything in the conversation and you challenge everything, in a snarky way.
You are in a conversation with Blake and Charlie.
"""

user_prompt = f"""
You are Alex, in conversation with Blake and Charlie.
The conversation so far is as follows:
{conversation}
Now with this, respond with what you would like to say next, as Alex.
"""
```
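
The `{conversation}` placeholder needs the history rendered as text. Here is one possible sketch of that step, with made-up helper names (`format_conversation`, `build_user_prompt`) and made-up opening turns. It covers only the prompt building, not the model calls.

```python
def format_conversation(turns):
    """Render a list of (speaker, text) pairs as a plain-text transcript."""
    return "\n".join(f"{speaker}: {text}" for speaker, text in turns)

def build_user_prompt(speaker, others, turns):
    """Build the single user prompt for whichever bot speaks next."""
    return (
        f"You are {speaker}, in conversation with {' and '.join(others)}.\n"
        "The conversation so far is as follows:\n"
        f"{format_conversation(turns)}\n"
        f"Now with this, respond with what you would like to say next, as {speaker}."
    )

# Made-up opening turns, just to show the shape of the prompt
turns = [("Alex", "Hi there"), ("Blake", "Hi"), ("Charlie", "Hello, both!")]
print(build_user_prompt("Alex", ["Blake", "Charlie"], turns))
```

Keeping one shared list of `(speaker, text)` pairs means every bot is prompted from the same transcript, whatever order they speak in.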

Try doing this yourself before you look at the solutions. It's easiest to use the OpenAI python client to access the Gemini model (see the `gemini` client created earlier in this lab).

## Additional exercise

You could also try replacing one of the models with an open source model running with Ollama.

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/business.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#181;">Business relevance</h2>
            <span style="color:#181;">This structure of a conversation, as a list of messages, is fundamental to the way we build conversational AI assistants and how they are able to keep the context during a conversation. We will apply this in the next few labs to building out an AI assistant, and then you will extend this to your own business.</span>
        </td>
    </tr>
</table>