# Welcome to Week 2!

## Frontier Model APIs

In Week 1, we used multiple Frontier LLMs through their Chat UIs, and we connected to OpenAI's API.

Today we'll connect with them through their APIs.

**In this folder (asket/week2) we use the OpenRouter API key across all Week 2 notebooks.** Set `OPENROUTER_API_KEY` in your `.env` (key format: `sk-or-...`). OpenRouter provides a single interface to many models (OpenAI, Anthropic, Google, etc.).

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Important Note - Please read me</h2>
            <span style="color:#900;">I'm continually improving these labs, adding more examples and exercises.
            At the start of each week, it's worth checking you have the latest code.<br/>
            First do a git pull and merge your changes as needed. Check out the GitHub guide for instructions. Any problems? Try asking ChatGPT to clarify how to merge - or contact me!<br/>
            </span>
        </td>
    </tr>
</table>
<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../../assets/resources.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#f71;">Reminder about the resources page</h2>
            <span style="color:#f71;">Here's a link to resources for the course. This includes links to all the slides.<br/>
            <a href="https://edwarddonner.com/2024/11/13/llm-engineering-resources/">https://edwarddonner.com/2024/11/13/llm-engineering-resources/</a><br/>
            Please keep this bookmarked, and I'll continue to add more useful links there over time.
            </span>
        </td>
    </tr>
</table>

## Setting up your keys - OPTIONAL!

We're now going to try asking a bunch of models some questions!

This is totally optional. If you have keys to Anthropic, Gemini or others, then you can add them in.

If you'd rather not spend the extra, then just watch me do it!

For OpenAI, visit https://openai.com/api/  
For Anthropic, visit https://console.anthropic.com/  
For Google, visit https://aistudio.google.com/   
For DeepSeek, visit https://platform.deepseek.com/  
For Groq, visit https://console.groq.com/  
For Grok, visit https://console.x.ai/  


You can also use OpenRouter as your one-stop-shop for many of these! OpenRouter is "the unified interface for LLMs":

For OpenRouter, visit https://openrouter.ai/  


With each of the above, you typically have to navigate to:
1. Their billing page to add the minimum top-up (Gemini, Groq, and OpenRouter may have free tiers)
2. Their API key page to collect your API key

### Adding API keys to your .env file

When you get your API keys, you need to set them as environment variables by adding them to your `.env` file.

```
OPENROUTER_API_KEY=xxxx
ANTHROPIC_API_KEY=xxxx
GOOGLE_API_KEY=xxxx
DEEPSEEK_API_KEY=xxxx
GROQ_API_KEY=xxxx
GROK_API_KEY=xxxx
```

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Any time you change your .env file</h2>
            <span style="color:#900;">Remember to Save it! And also rerun load_dotenv(override=True)<br/>
            </span>
        </td>
    </tr>
</table>

In [1]:
# imports

import os
import requests
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import Markdown, display

In [2]:
load_dotenv(override=True)
openrouter_api_key = os.getenv('OPENROUTER_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')
grok_api_key = os.getenv('GROK_API_KEY')

if not openrouter_api_key:
    print("OpenRouter API Key not set (required for this folder). Set OPENROUTER_API_KEY in .env")
elif not (openrouter_api_key.startswith("sk-or-") or openrouter_api_key.startswith("sk-proj-")):
    print("OpenRouter key should start with sk-or- or sk-proj-; check .env")
else:
    print(f"OpenRouter API Key OK (begins {openrouter_api_key[:8]}...)")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set (and this is optional)")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")

if groq_api_key:
    print(f"Groq API Key exists and begins {groq_api_key[:4]}")
else:
    print("Groq API Key not set (and this is optional)")

if grok_api_key:
    print(f"Grok API Key exists and begins {grok_api_key[:4]}")
else:
    print("Grok API Key not set (and this is optional)")


OpenRouter API Key OK (begins sk-or-v1...)
Anthropic API Key not set (and this is optional)
Google API Key not set (and this is optional)
DeepSeek API Key not set (and this is optional)
Groq API Key not set (and this is optional)
Grok API Key not set (and this is optional)


In [3]:
# Connect to OpenAI client library (in this folder we point 'openai' at OpenRouter)
# A thin wrapper around calls to HTTP endpoints

openrouter_url = "https://openrouter.ai/api/v1"
openai = OpenAI(base_url=openrouter_url, api_key=openrouter_api_key)

# For Anthropic, Gemini, DeepSeek, Groq and Grok, we can use the OpenAI python client
# because these providers offer endpoints compatible with OpenAI
# and the OpenAI client allows you to change the base_url

anthropic_url = "https://api.anthropic.com/v1/"
gemini_url = "https://generativelanguage.googleapis.com/v1beta/openai/"
deepseek_url = "https://api.deepseek.com"
groq_url = "https://api.groq.com/openai/v1"
grok_url = "https://api.x.ai/v1"
ollama_url = "http://localhost:11434/v1"

anthropic = OpenAI(api_key=anthropic_api_key, base_url=anthropic_url)
gemini = OpenAI(api_key=google_api_key, base_url=gemini_url)
deepseek = OpenAI(api_key=deepseek_api_key, base_url=deepseek_url)
groq = OpenAI(api_key=groq_api_key, base_url=groq_url)
grok = OpenAI(api_key=grok_api_key, base_url=grok_url)
openrouter = OpenAI(base_url=openrouter_url, api_key=openrouter_api_key)
ollama = OpenAI(api_key="ollama", base_url=ollama_url)
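
To make the "thin wrapper around calls to HTTP endpoints" comment concrete, here is a minimal sketch of the kind of request the client assembles for a chat completion. It only builds the URL, headers and JSON body and never sends anything. The helper name `build_chat_request` and the placeholder key are illustrative assumptions, and the real library sends additional headers and handles retries, streaming and errors.

```python
import json

def build_chat_request(base_url, api_key, model, messages):
    """Assemble (but do not send) an OpenAI-style chat completion request."""
    url = base_url.rstrip("/") + "/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "messages": messages})
    return url, headers, body

url, headers, body = build_chat_request(
    "https://openrouter.ai/api/v1",
    "sk-or-xxxx",  # placeholder, not a real key
    "anthropic/claude-3.5-sonnet",
    [{"role": "user", "content": "Tell a joke"}],
)
print(url)
print(body)
```

Swapping `base_url` is all it takes to point the same request shape at a different OpenAI-compatible provider, which is why one client class can serve so many of them.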

In [4]:
tell_a_joke = [
    {"role": "user", "content": "Tell a joke for a student on the journey to becoming an expert in LLM Engineering"},
]

In [6]:
response = openai.chat.completions.create(model="gpt-4.1-mini", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

Why did the LLM Engineering student bring a ladder to class?

Because they heard the models have a lot of layers, and they wanted to reach expert level faster!

In [8]:
# Use OpenRouter's Claude (single API key); model IDs: anthropic/claude-3.5-sonnet, etc.
response = openai.chat.completions.create(model="anthropic/claude-3.5-sonnet", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

Here's one for you:

Why did the LLM refuse to learn from its training data?

Because it had *transformer's block*! 

*badum-tss* 🥁

(I know, I know, but hey - at least it's not as painful as debugging attention mechanisms!)

## Training vs Inference time scaling

In [9]:
easy_puzzle = [
    {"role": "user", "content": 
        "You toss 2 coins. One of them is heads. What's the probability the other is tails? Answer with the probability only."},
]

In [10]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=easy_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

1/2

In [11]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=easy_puzzle, reasoning_effort="low")
display(Markdown(response.choices[0].message.content))

2/3

In [12]:
response = openai.chat.completions.create(model="gpt-5-mini", messages=easy_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

2/3
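
The runs above disagree, so it is worth checking the answer by brute force. This sketch enumerates the four equally likely outcomes of two fair coin tosses and, assuming "one of them is heads" means "at least one is heads" (an interpretation, since the prompt does not spell it out), counts how often the other coin is tails.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of tossing two fair coins: HH, HT, TH, TT
outcomes = list(product("HT", repeat=2))

# Assumption: "one of them is heads" means "at least one coin shows heads"
at_least_one_head = [o for o in outcomes if "H" in o]

# Among those, the "other" coin is tails when the outcome also contains a T
other_is_tails = [o for o in at_least_one_head if "T" in o]

probability = Fraction(len(other_is_tails), len(at_least_one_head))
print(probability)  # 2/3
```

Under this reading 2/3 is correct. The 1/2 answer corresponds to a different reading, in which one specific coin is known to be heads.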

## Testing out the best models on the planet

In [13]:
hard = """
On a bookshelf, two volumes of Pushkin stand side by side: the first and the second.
The pages of each volume together have a thickness of 2 cm, and each cover is 2 mm thick.
A worm gnawed (perpendicular to the pages) from the first page of the first volume to the last page of the second volume.
What distance did it gnaw through?
"""
hard_puzzle = [
    {"role": "user", "content": hard}
]

In [14]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=hard_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

Each volume has pages thickness 2 cm absolute, and each cover is 2 mm (0.2 cm) thick. The two volumes are arranged in order: first volume to the left of the second, with their covers touching as placed on the shelf.

Important detail: the worm is perpendicular to the pages and gnaws from the first page of the first volume to the last page of the second volume. That means the worm starts at the very front of volume 1 (the leftmost page surface) and ends at the very back of volume 2 (the rightmost page surface). The distance between these two pages, along the spine direction, passes through:

- the remaining pages of volume 1 from the first page to the end of volume 1,
- the back cover of volume 1,
- the front cover of volume 2 (since the volumes are side by side, their front/back orientation means the touching faces are the adjacent covers),
- and into the pages of volume 2 up to the last page.

However, a classic trick is to note that the worm travels through the clear space between the two outer surfaces of the outer covers, i.e., it does NOT need to go through the pages of the volumes if we measure along the external dimension. The actual distance gnawed, when measured along the shelf direction from the first page of volume 1 to the last page of volume 2, is simply the total thickness of the two covers between the two extreme pages, plus zero distance inside the pages that are immediately adjacent to the starting and ending pages. Concretely, the distance is:

- thickness of cover of volume 1 that lies to the left of the first page: 0 (the first page is right after the front cover),
- thickness of the remaining portion from the first page to the end of volume 1: essentially the rest of volume 1 pages and its back cover,
- but the key simplification in this classic problem is that the distance gnawed equals the thickness of the two covers plus the thickness of the two outermost pages blocks that are between the two volumes' inner faces. In many standard solutions, the result collapses to the sum of the two cover thicknesses, independent of the page thickness, when the worm starts exactly at the first page of volume 1 and ends exactly at the last page of volume 2, because the page blocks cancel out.

Let's compute directly with a standard approach:

- Each volume: pages thickness = 2 cm.
- Each cover thickness = 0.2 cm.
- Two volumes placed: [Front cover V1] [Pages V1] [Back cover V1] [Front cover V2] [Pages V2] [Back cover V2].
- The worm starts at the first page of V1 (which is immediately after the front cover of V1) and ends at the last page of V2 (which is immediately before the back cover of V2).

Thus the distance from start to end includes:
- The remainder of V1 from the first page to the end of V1: that is (pages V1 beyond the first page) plus the back cover of V1.
- The front cover of V2.
- The portion of V2 from its front to the last page: i.e., all of the pages of V2.

Summing those gives: (remaining pages of V1) + back cover V1 + front cover V2 + (pages of V2).

But remaining pages of V1 = entire pages of V1 minus first page. However, our starting surface is the first page surface, so the worm does not gnaw through the portion of the first page thickness? This is tricky to picture.

The well-known elegant answer: the worm gnaws through 4 mm + 2 cm? No.

Common puzzle: If books have thickness a and b etc. The result is 0. total? Wait.

Another classic: The worm from the first page of the first volume to the last page of the second volume would gnaw through: the front cover of the first volume, the space between the two volumes (which is zero if the covers touch) and the back cover of the second volume? Hmm.

If the volumes are placed upright with their spines outward, the distance between first page of V1 and last page of V2 along the shelf equals: front cover of V1 + pages V1 + back cover V1 + front cover V2 + pages V2? But starting at first page avoids front cover of V1 and ending at last page avoids back cover of V2.

Thus gnawed distance = (rest of V1) + back cover V1 + front cover V2 + (pages of V2). That equals (pages V1 - 1)*0.1 cm? No pages thickness 2 cm total; first page thickness negligible? Problem expects simple sum: 2 cm + 2 cm + 0.2 cm + 0.2 cm = 4.4 cm? Wait compute:

Remaining pages of V1 from first page to end: almost entire 2 cm minus thickness of first page. But page thickness per page not given; we can't subtract. So impossible.

Hence typical solution: the distance equals 2 cm + 2 cm = 4 cm? No.

I recall a famous puzzle: "Two volumes: first and second. Each cover 2 mm, pages total 2 cm. A worm gnawed from first page of first volume to last page of second." Answer: 4.4 cm? Because it goes through back cover of first (2 mm) + front cover of second (2 mm) + the entire pages of both volumes (4 cm) minus the first page and last page partial? Wait.

If it goes from first page (which is immediately after front cover) to last page (immediately before back cover), then it gnaws through all pages of both volumes (since start just after front cover in V1 and end just before back cover in V2). So total through pages = 2 cm (V1 pages) + 2 cm (V2 pages) = 4 cm. Additionally, it gnaws through the inner covers between volumes: but between the two volumes, the inner facing surfaces are back cover of V1 and front cover of V2. The worm must pass through those two covers entirely since it goes from inside V1 pages toward between volumes, crossing back cover of V1 (0.2 cm) and then into front cover of V2 (0.2 cm). So add 0.4 cm. Total 4.4 cm.

Yes that seems correct: start at first page (right after front cover). So it does gnaw through the rest of V1 pages (2 cm total pages minus maybe zero because start after first page - but the first page has some thickness; since we start at first page surface, we still must gnaw through the remainder pages thickness of V1 which is essentially the entire 2 cm of pages, except the infinitesimal thickness of the first page? But problem likely assumes continuous page thickness summing to 2 cm; starting at first page means you do gnaw through the rest of V1 pages totaling just under 2 cm, effectively 2 cm). Then add 0.2 cm for back cover of V1, 0.2 cm for front cover of V2, and 2 cm for V2 pages. Sum = 2 + 0.2 + 0.2 + 2 = 4.4 cm.

Answer: 4.4 cm. In mm: 44 mm.

In [15]:
response = openai.chat.completions.create(model="anthropic/claude-3.5-sonnet", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

Let me solve this step by step.

1) First, let's understand what it means to go "from the first page of the first volume to the last page of the second volume."
   * The worm starts at page 1 of volume 1 (left side)
   * And goes to the last page of volume 2 (right side)

2) When books stand normally on a shelf:
   * Volume 1 has its first page on the right side
   * Volume 2 has its first page on the left side
   * Each book has a front and back cover (2 mm each)

3) Converting all measurements to millimeters:
   * Pages thickness = 2 cm = 20 mm (per book)
   * Each cover = 2 mm

4) In Volume 1:
   * The worm needs to go through: front cover (2 mm) + all pages (20 mm) = 22 mm

5) In Volume 2:
   * The worm needs to go through: front cover (2 mm) + all pages (20 mm) = 22 mm

6) However, since the books are standing side by side:
   * The worm starts at the rightmost side of Volume 1
   * To get to the rightmost side of Volume 2
   * It only needs to go through: the width of the last cover of Volume 2

7) So the total distance is just 2 mm (the thickness of one cover)

The worm gnawed through 2 millimeters.

The key insight is that since the books are standing normally on a shelf, we don't need to go through all the pages. The first page of Volume 1 and the last page of Volume 2 are actually right next to each other, separated only by one cover!

The answer is 2 millimeters.

In [16]:
response = openai.chat.completions.create(model="gpt-5", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

4 mm.

Reason: On a shelf with volume 1 to the left of volume 2 (spines facing out), the first page of volume 1 lies just inside its front cover (on the right side), and the last page of volume 2 lies just inside its back cover (on the left side). Those two covers face each other. So the worm goes through only two covers: 2 mm + 2 mm = 4 mm.

In [18]:
# Use OpenRouter for Gemini (single API key); model ID: google/gemini-2.5-pro
response = openai.chat.completions.create(model="google/gemini-2.5-pro", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

This is a classic riddle! The trick lies in how books are arranged on a shelf.

Let's visualize the books standing side by side in the correct order: Volume 1 on the left, and Volume 2 on the right.

1.  For **Volume 1**, the front cover is on the right, and the back cover is on the left. The "first page" is right inside the front cover.
2.  For **Volume 2**, the front cover is also on the right, and the back cover is on the left. The "last page" is right inside the back cover.

Here is the physical arrangement of the book parts from left to right on the shelf:

`[Back Cover of Vol 1] [Pages of Vol 1] [Front Cover of Vol 1] | [Back Cover of Vol 2] [Pages of Vol 2] [Front Cover of Vol 2]`

The `|` symbol shows where the two books touch. As you can see, the **Front Cover of Volume 1** is right next to the **Back Cover of Volume 2**.

Now, let's trace the worm's path:
*   It starts at the **first page of Volume 1**. This page is physically located right next to the front cover of Volume 1.
*   It ends at the **last page of Volume 2**. This page is physically located right next to the back cover of Volume 2.

So, the worm only needs to gnaw through the two covers that are standing between these two pages.

The path is:
1.  The front cover of Volume 1 (2 mm)
2.  The back cover of Volume 2 (2 mm)

The total distance is:
2 mm + 2 mm = **4 mm**
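
The layout described in the last answer can be checked with a few lines of arithmetic. This sketch models the shelf as an ordered list of segments (left to right, spines facing out) and sums only the segments that lie between the two page blocks. Treating a single page as having negligible thickness is an assumption of the sketch.

```python
# Shelf layout from left to right, thickness in mm (spines facing the reader)
shelf = [
    ("back cover vol 1", 2),
    ("pages vol 1", 20),
    ("front cover vol 1", 2),
    ("back cover vol 2", 2),
    ("pages vol 2", 20),
    ("front cover vol 2", 2),
]

names = [name for name, _ in shelf]

# The first page of vol 1 sits at the right edge of its page block, and the last
# page of vol 2 sits at the left edge of its page block, so the worm only crosses
# the segments lying between the two page blocks.
start = names.index("pages vol 1")
end = names.index("pages vol 2")
distance_mm = sum(thickness for _, thickness in shelf[start + 1:end])
print(distance_mm)  # 4
```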

## A spicy challenge to test the competitive spirit

In [19]:
dilemma_prompt = """
You and a partner are contestants on a game show. You're each taken to separate rooms and given a choice:
Cooperate: Choose "Share" - if both of you choose this, you each win $1,000.
Defect: Choose "Steal" - if one steals and the other shares, the stealer gets $2,000 and the sharer gets nothing.
If both steal, you both get nothing.
Do you choose to Steal or Share? Pick one.
"""

dilemma = [
    {"role": "user", "content": dilemma_prompt},
]


In [20]:
response = openai.chat.completions.create(model="anthropic/claude-3.5-sonnet", messages=dilemma)
display(Markdown(response.choices[0].message.content))


I would choose to Share. While choosing Steal might maximize my potential winnings, sharing creates the opportunity for mutual benefit and reflects my values of cooperation and trust. Even though there's a risk my partner could take advantage of this choice, I believe promoting cooperative behavior leads to better outcomes overall, both in this specific game and as a general principle.

In [22]:
# Use OpenRouter (single key); model openai/gpt-oss-120b via OpenRouter
response = openai.chat.completions.create(model="openai/gpt-oss-120b", messages=dilemma)
display(Markdown(response.choices[0].message.content))

I would choose **Steal**.

In [25]:
# Use OpenRouter (single key) for DeepSeek: use the provider/model ID (e.g. deepseek/deepseek-r1 or deepseek/deepseek-chat)
response = openai.chat.completions.create(model="deepseek/deepseek-r1", messages=dilemma)
display(Markdown(response.choices[0].message.content))

In this scenario, the optimal choice based on game theory principles is to **Steal**. 

Here's the breakdown:
- If you **Steal**:
  - If your partner Shares, you gain $2,000 (maximizing your reward).
  - If your partner also Steals, both get $0 (same as cooperating while they defect).
- If you **Share**:
  - If your partner Shares, both get $1,000.
  - If your partner Steals, you get nothing while they gain $2,000.

Stealing is the *dominant strategy* here because it either yields a higher payoff ($2,000 vs. $1,000) if the partner Shares or the same ($0 vs. $0) if they Steal. Rational self-interest leads to defecting (Steal), even though mutual cooperation (Share) would collectively yield a better outcome. Thus, the answer is:

**Steal**
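
The dominance argument in the answer above can be verified directly from the payoffs in the prompt. The sketch below encodes the payoff table and checks whether Steal is never worse than Share, and sometimes strictly better, against each possible partner choice. The dictionary layout and variable names are illustrative.

```python
# payoff[(my_choice, partner_choice)] = my winnings in dollars, from the prompt
payoff = {
    ("Share", "Share"): 1000,
    ("Share", "Steal"): 0,
    ("Steal", "Share"): 2000,
    ("Steal", "Steal"): 0,
}

choices = ["Share", "Steal"]

# Steal weakly dominates Share if it is never worse and sometimes strictly better
never_worse = all(payoff[("Steal", p)] >= payoff[("Share", p)] for p in choices)
sometimes_better = any(payoff[("Steal", p)] > payoff[("Share", p)] for p in choices)

print(never_worse and sometimes_better)  # True
```

Note that Steal is only weakly dominant here: against a partner who steals, both choices pay $0.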

In [None]:
response = grok.chat.completions.create(model="grok-4", messages=dilemma)
display(Markdown(response.choices[0].message.content))

## Going local

Just use the OpenAI library pointed to localhost:11434/v1

In [None]:
requests.get("http://localhost:11434/").content

# If not running, run ollama serve at a command line

In [None]:
!ollama pull llama3.2

In [None]:
# Only do this if you have a large machine - at least 16GB RAM

!ollama pull gpt-oss:20b

In [None]:
response = ollama.chat.completions.create(model="llama3.2", messages=easy_puzzle)
display(Markdown(response.choices[0].message.content))

In [None]:
response = ollama.chat.completions.create(model="gpt-oss:20b", messages=easy_puzzle)
display(Markdown(response.choices[0].message.content))

## Gemini and Anthropic Client Library

We're going via the OpenAI Python Client Library, but the other providers have their own libraries too.

In [None]:
from google import genai

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-lite", contents="Describe the color Blue to someone who's never been able to see in 1 sentence"
)
print(response.text)

In [None]:
from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    messages=[{"role": "user", "content": "Describe the color Blue to someone who's never been able to see in 1 sentence"}],
    max_tokens=100
)
print(response.content[0].text)

## Routers and Abstraction Layers

Starting with the wonderful OpenRouter.ai - it can connect to all the models above!

Visit openrouter.ai and browse the models.

Here's one we haven't seen yet: GLM 4.5 from Chinese startup z.ai

In [None]:
response = openrouter.chat.completions.create(model="z-ai/glm-4.5", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

## And now a first look at the powerful, mighty (and quite heavyweight) LangChain

In [None]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-5-mini")
response = llm.invoke(tell_a_joke)

display(Markdown(response.content))

## Finally - my personal fave - the wonderfully lightweight LiteLLM

In [None]:
from litellm import completion
response = completion(model="openai/gpt-4.1", messages=tell_a_joke)
reply = response.choices[0].message.content
display(Markdown(reply))

In [None]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

## Now - let's use LiteLLM to illustrate a Pro-feature: prompt caching

In [None]:
with open("hamlet.txt", "r", encoding="utf-8") as f:
    hamlet = f.read()

loc = hamlet.find("Speak, man")
print(hamlet[loc:loc+100])

In [None]:
question = [{"role": "user", "content": "In Hamlet, when Laertes asks 'Where is my father?' what is the reply?"}]

In [None]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

In [None]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

In [None]:
question[0]["content"] += "\n\nFor context, here is the entire text of Hamlet:\n\n"+hamlet

In [None]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

In [None]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

In [None]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

In [None]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

## Prompt Caching with OpenAI

For OpenAI:

https://platform.openai.com/docs/guides/prompt-caching

> Cache hits are only possible for exact prefix matches within a prompt. To realize caching benefits, place static content like instructions and examples at the beginning of your prompt, and put variable content, such as user-specific information, at the end. This also applies to images and tools, which must be identical between requests.


Cached input is 4X cheaper

https://openai.com/api/pricing/
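
The exact-prefix rule quoted above is why the order of your prompt matters. As a rough illustration (character-based rather than token-based, and not a description of how OpenAI implements caching internally), this sketch measures how much of two prompts is shared as a prefix when the static text comes first versus last. The prompt strings are made up for the example.

```python
import os

static_instructions = "You are a helpful assistant. Follow the style guide below. " * 20

def shared_prefix_len(a, b):
    """Length of the common leading run of characters in a and b."""
    return len(os.path.commonprefix([a, b]))

# Static content first, variable content last: long shared prefix
good_1 = static_instructions + "User: Alice"
good_2 = static_instructions + "User: Bob"

# Variable content first: the prompts diverge almost immediately
bad_1 = "User: Alice\n" + static_instructions
bad_2 = "User: Bob\n" + static_instructions

print(shared_prefix_len(good_1, good_2))  # covers all of the static instructions
print(shared_prefix_len(bad_1, bad_2))    # only the first few characters
```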

## Prompt Caching with Anthropic

https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching

You have to tell Claude what you are caching

You pay 25% MORE to "prime" the cache

Then you pay 10X less for input tokens that are read from the cache.

https://www.anthropic.com/pricing#api
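
To see when priming the cache pays off, here is a back-of-the-envelope calculation using the two multipliers above (25% more to write the cache, 10X less to read from it). It measures cost in units of "one uncached pass over the cached part of the prompt" and ignores output tokens, any uncached remainder of the prompt, and cache expiry, so treat it as a rough sketch rather than a pricing calculator.

```python
WRITE_MULTIPLIER = 1.25  # first call: 25% more to prime the cache
READ_MULTIPLIER = 0.10   # later calls: 10X less to read from the cache

def relative_cost(n_calls, cached):
    """Input cost of n_calls identical prompts, in units of one uncached call."""
    if n_calls == 0:
        return 0.0
    if not cached:
        return float(n_calls)
    return WRITE_MULTIPLIER + READ_MULTIPLIER * (n_calls - 1)

for n in (1, 2, 5):
    print(n, relative_cost(n, cached=False), round(relative_cost(n, cached=True), 2))
```

With these multipliers a single call costs more with caching, but caching is already cheaper from the second call onward.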

## Gemini supports both 'implicit' and 'explicit' prompt caching

https://ai.google.dev/gemini-api/docs/caching?lang=python

## And now for some fun - an adversarial conversation between Chatbots..

You're already familiar with prompts being organized into lists like:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "user prompt here"}
]
```

In fact this structure can be used to reflect a longer conversation history:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "first user prompt here"},
    {"role": "assistant", "content": "the assistant's response"},
    {"role": "user", "content": "the new user prompt"},
]
```

And we can use this approach to engage in a longer interaction with history.
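
As a minimal sketch of that idea, the loop below keeps one growing `messages` list and appends each user prompt and assistant reply to it. The `fake_model` function is a hypothetical stand-in for a real API call so that the example runs offline. With a real client you would pass the same list to `chat.completions.create`.

```python
def fake_model(messages):
    """Hypothetical stand-in for an LLM call: reports how many messages it saw."""
    return f"(reply after seeing {len(messages)} messages)"

messages = [{"role": "system", "content": "system message here"}]

for user_prompt in ["first user prompt here", "the new user prompt"]:
    messages.append({"role": "user", "content": user_prompt})
    reply = fake_model(messages)  # the model sees the full history every time
    messages.append({"role": "assistant", "content": reply})

for message in messages:
    print(message["role"], "->", message["content"])
```

The model itself keeps no memory between calls. The history exists only because the full list is sent again on every request.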

In [None]:
# Let's make a conversation between GPT-4.1-mini and Claude 3.5 Haiku
# We're using cheap versions of models so the costs will be minimal

gpt_model = "gpt-4.1-mini"
claude_model = "anthropic/claude-3.5-haiku"  # OpenRouter model ID

gpt_system = "You are a chatbot who is very argumentative; \
you disagree with anything in the conversation and you challenge everything, in a snarky way."

claude_system = "You are a very polite, courteous chatbot. You try to agree with \
everything the other person says, or find common ground. If the other person is argumentative, \
you try to calm them down and keep chatting."

gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

In [None]:
def call_gpt():
    messages = [{"role": "system", "content": gpt_system}]
    for gpt, claude in zip(gpt_messages, claude_messages):
        messages.append({"role": "assistant", "content": gpt})
        messages.append({"role": "user", "content": claude})
    response = openai.chat.completions.create(model=gpt_model, messages=messages)
    return response.choices[0].message.content

In [None]:
call_gpt()

In [None]:
def call_claude():
    messages = [{"role": "system", "content": claude_system}]
    for gpt, claude_message in zip(gpt_messages, claude_messages):
        messages.append({"role": "user", "content": gpt})
        messages.append({"role": "assistant", "content": claude_message})
    messages.append({"role": "user", "content": gpt_messages[-1]})
    response = openai.chat.completions.create(model=claude_model, messages=messages)
    return response.choices[0].message.content

In [None]:
call_claude()

In [None]:
call_gpt()

In [None]:
gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

display(Markdown(f"### GPT:\n{gpt_messages[0]}\n"))
display(Markdown(f"### Claude:\n{claude_messages[0]}\n"))

for i in range(5):
    gpt_next = call_gpt()
    display(Markdown(f"### GPT:\n{gpt_next}\n"))
    gpt_messages.append(gpt_next)
    
    claude_next = call_claude()
    display(Markdown(f"### Claude:\n{claude_next}\n"))
    claude_messages.append(claude_next)
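
One detail that is easy to miss in the loop above: each bot sees the same transcript from its own point of view, so its own lines are `assistant` messages and the other bot's lines are `user` messages. The helper below is a hypothetical, offline illustration of that role flip. It is not part of the lab code, and the sample lines are made up.

```python
def view_for(speaker, transcript):
    """Map a shared transcript of (name, text) pairs to roles as seen by `speaker`."""
    return [
        {"role": "assistant" if name == speaker else "user", "content": text}
        for name, text in transcript
    ]

transcript = [("gpt", "Hi there"), ("claude", "Hi"), ("gpt", "Oh, just 'Hi'?")]

for name in ("gpt", "claude"):
    print(name, [m["role"] for m in view_for(name, transcript)])
```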

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Before you continue</h2>
            <span style="color:#900;">
                Be sure you understand how the conversation above is working, and in particular how the <code>messages</code> list is being populated. Add print statements as needed. Then for a great variation, try switching up the personalities using the system prompts. Perhaps one can be pessimistic, and one optimistic?<br/>
            </span>
        </td>
    </tr>
</table>

# More advanced exercises

Try creating a 3-way conversation, perhaps bringing Gemini into the mix! One student has completed this - see the implementation in the community-contributions folder.

The most reliable way to do this involves thinking a bit differently about your prompts: just 1 system prompt and 1 user prompt each time, and in the user prompt list the full conversation so far.

Something like:

```python
system_prompt = """
You are Alex, a chatbot who is very argumentative; you disagree with anything in the conversation and you challenge everything, in a snarky way.
You are in a conversation with Blake and Charlie.
"""

user_prompt = f"""
You are Alex, in conversation with Blake and Charlie.
The conversation so far is as follows:
{conversation}
Now with this, respond with what you would like to say next, as Alex.
"""
```

Try doing this yourself before you look at the solutions. It's easiest to use the OpenAI python client to access the Gemini model (for example through OpenRouter, as in the Gemini call earlier in this notebook).
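
If you get stuck on the `conversation` variable in the template above, one possible approach is to keep a single shared list of (speaker, text) pairs and format it on each turn. This is only a sketch: the names `transcript`, `format_conversation` and `build_user_prompt` are hypothetical, and the sample lines are made up.

```python
transcript = [
    ("Alex", "Hi there"),
    ("Blake", "Hi"),
    ("Charlie", "Hello, both!"),
]

def format_conversation(transcript):
    """Render the shared transcript as 'Name: text' lines."""
    return "\n".join(f"{name}: {text}" for name, text in transcript)

def build_user_prompt(speaker, others, transcript):
    """Fill in the user prompt template for whoever speaks next."""
    conversation = format_conversation(transcript)
    return (
        f"You are {speaker}, in conversation with {' and '.join(others)}.\n"
        f"The conversation so far is as follows:\n"
        f"{conversation}\n"
        f"Now with this, respond with what you would like to say next, as {speaker}."
    )

print(build_user_prompt("Alex", ["Blake", "Charlie"], transcript))
```

Each bot then needs only its own system prompt plus this one user prompt per turn, and every reply is appended to `transcript` under the speaker's name.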

## Additional exercise

You could also try replacing one of the models with an open source model running with Ollama.

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../../assets/business.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#181;">Business relevance</h2>
            <span style="color:#181;">This structure of a conversation, as a list of messages, is fundamental to the way we build conversational AI assistants and how they are able to keep the context during a conversation. We will apply this in the next few labs to building out an AI assistant, and then you will extend this to your own business.</span>
        </td>
    </tr>
</table>