# Welcome to Week 2!

## Frontier Model APIs

In Week 1, we used multiple Frontier LLMs through their Chat UIs, and we connected to OpenAI's API.

Today we'll connect with them through their APIs.

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Important Note - Please read me</h2>
            <span style="color:#900;">I'm continually improving these labs, adding more examples and exercises.
            At the start of each week, it's worth checking you have the latest code.<br/>
            First do a git pull and merge your changes as needed. Check out the GitHub guide for instructions. Any problems? Try asking ChatGPT to clarify how to merge - or contact me!<br/>
            </span>
        </td>
    </tr>
</table>
<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/resources.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#f71;">Reminder about the resources page</h2>
            <span style="color:#f71;">Here's a link to resources for the course. This includes links to all the slides.<br/>
            <a href="https://edwarddonner.com/2024/11/13/llm-engineering-resources/">https://edwarddonner.com/2024/11/13/llm-engineering-resources/</a><br/>
            Please keep this bookmarked, and I'll continue to add more useful links there over time.
            </span>
        </td>
    </tr>
</table>

## Setting up your keys - OPTIONAL!

We're now going to try asking a bunch of models some questions!

This is totally optional. If you have keys to Anthropic, Gemini or others, then you can add them in.

If you'd rather not spend the extra, then just watch me do it!

For OpenAI, visit https://openai.com/api/  
For Anthropic, visit https://console.anthropic.com/  
For Google, visit https://aistudio.google.com/   
For DeepSeek, visit https://platform.deepseek.com/  
For Groq, visit https://console.groq.com/  
For Grok, visit https://console.x.ai/  


You can also use OpenRouter as your one-stop-shop for many of these! OpenRouter is "the unified interface for LLMs":

For OpenRouter, visit https://openrouter.ai/  


With each of the above, you typically have to navigate to:
1. Their billing page to add the minimum top-up (Gemini, Groq and OpenRouter may have free tiers)
2. Their API key page to collect your API key

### Adding API keys to your .env file

When you get your API keys, you need to set them as environment variables by adding them to your `.env` file.

```
OPENAI_API_KEY=xxxx
ANTHROPIC_API_KEY=xxxx
GOOGLE_API_KEY=xxxx
DEEPSEEK_API_KEY=xxxx
GROQ_API_KEY=xxxx
GROK_API_KEY=xxxx
OPENROUTER_API_KEY=xxxx
```

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Any time you change your .env file</h2>
            <span style="color:#900;">Remember to Save it! And also rerun load_dotenv(override=True)<br/>
            </span>
        </td>
    </tr>
</table>

In [3]:
# imports

import os
import requests
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import Markdown, display

In [8]:
load_dotenv(override=True)
openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')
grok_api_key = os.getenv('GROK_API_KEY')
openrouter_api_key = os.getenv('OPENROUTER_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set (and this is optional)")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")

if groq_api_key:
    print(f"Groq API Key exists and begins {groq_api_key[:4]}")
else:
    print("Groq API Key not set (and this is optional)")

if grok_api_key:
    print(f"Grok API Key exists and begins {grok_api_key[:4]}")
else:
    print("Grok API Key not set (and this is optional)")

if openrouter_api_key:
    print(f"OpenRouter API Key exists and begins {openrouter_api_key[:3]}")
else:
    print("OpenRouter API Key not set (and this is optional)")


OpenAI API Key exists and begins sk-proj-
Anthropic API Key not set (and this is optional)
Google API Key exists and begins AI
DeepSeek API Key not set (and this is optional)
Groq API Key not set (and this is optional)
Grok API Key not set (and this is optional)
OpenRouter API Key exists and begins sk-
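
The repeated checks above can also be folded into one small helper. This is only a sketch: `describe_key` and its parameters are hypothetical names introduced here, not part of the lab code, and it assumes the keys have already been loaded into the environment.

```python
import os

def describe_key(name, prefix_len=4, optional=True):
    """Report whether an environment variable is set, showing only a short prefix."""
    key = os.getenv(name)
    if key:
        return f"{name} exists and begins {key[:prefix_len]}"
    suffix = " (and this is optional)" if optional else ""
    return f"{name} not set{suffix}"

# Check a few of the variables named in the .env example
for name in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY"):
    print(describe_key(name, optional=(name != "OPENAI_API_KEY")))
```

Printing only a short prefix keeps the full key out of the notebook output, which matters if you share or commit the notebook.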


In [9]:
# Connect to OpenAI client library
# A thin wrapper around calls to HTTP endpoints

openai = OpenAI()

# For Anthropic, Gemini, DeepSeek, Groq, Grok, OpenRouter and Ollama, we can reuse the OpenAI python client
# Because these providers have endpoints compatible with OpenAI
# And the OpenAI client allows you to change the base_url

anthropic_url = "https://api.anthropic.com/v1/"
gemini_url = "https://generativelanguage.googleapis.com/v1beta/openai/"
deepseek_url = "https://api.deepseek.com"
groq_url = "https://api.groq.com/openai/v1"
grok_url = "https://api.x.ai/v1"
openrouter_url = "https://openrouter.ai/api/v1"
ollama_url = "http://localhost:11434/v1"

anthropic = OpenAI(api_key=anthropic_api_key, base_url=anthropic_url)
gemini = OpenAI(api_key=google_api_key, base_url=gemini_url)
deepseek = OpenAI(api_key=deepseek_api_key, base_url=deepseek_url)
groq = OpenAI(api_key=groq_api_key, base_url=groq_url)
grok = OpenAI(api_key=grok_api_key, base_url=grok_url)
openrouter = OpenAI(base_url=openrouter_url, api_key=openrouter_api_key)
ollama = OpenAI(api_key="ollama", base_url=ollama_url)

In [11]:
tell_a_joke = [
    {"role": "user", "content": "Tell a joke for a student on the journey to becoming an expert in LLM Engineering"},
]

In [9]:
response = openrouter.chat.completions.create(model="gpt-5-nano", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

Here you go:

Why did the aspiring LLM engineer bring a ladder to the lab?
Because the model's context length was so long, you had to reach the top of the prompt to get a sensible answer.

In [13]:
response = openrouter.chat.completions.create(model="claude-sonnet-4.5", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

A junior LLM engineer walks into a bar and orders a drink.

The bartender says, "That'll be $8."

The engineer replies, "Based on my training data from 2021, that should be $6."

The bartender sighs and says, "Look, I don't know what to tell you. Prices have gone up."

The engineer pauses, then responds with high confidence: "I apologize for any confusion. You're absolutely right - prices have increased due to inflation. That'll be $12."

The bartender blinks. "I just said it was $8."

The engineer nods enthusiastically: "Thank you for that feedback! I've updated my parameters. The drink is now $4."

---

**Bonus wisdom**: You'll know you're becoming an expert when you stop asking "Why did the model say that?" and start asking "Which of the 47 possible reasons caused the model to say that, and how do I systematically eliminate 46 of them?" 😄

## Training vs Inference time scaling

In [14]:
easy_puzzle = [
    {"role": "user", "content": 
        "You toss 2 coins. One of them is heads. What's the probability the other is tails? Answer with the probability only."},
]

In [15]:
response = openrouter.chat.completions.create(model="gpt-5-nano", messages=easy_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

1/2

In [16]:
response = openrouter.chat.completions.create(model="gpt-5-nano", messages=easy_puzzle, reasoning_effort="low")
display(Markdown(response.choices[0].message.content))

2/3

In [17]:
response = openrouter.chat.completions.create(model="gpt-5-mini", messages=easy_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

2/3
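
The 2/3 answer comes from reading "one of them is heads" as "at least one is heads". A minimal sketch that checks this by enumerating the four equally likely outcomes (the variable names are illustrative, not from the lab):

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of tossing two fair coins
outcomes = list(product("HT", repeat=2))

# Condition: at least one coin shows heads
at_least_one_head = [o for o in outcomes if "H" in o]

# Event: the other coin is tails, i.e. one head and one tail
other_is_tails = [o for o in at_least_one_head if "T" in o]

probability = Fraction(len(other_is_tails), len(at_least_one_head))
print(probability)  # 2/3
```

If the puzzle is read instead as "a specific coin is heads", the other coin is tails half the time, which is the 1/2 that the lowest-effort run returned.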

## Testing out the best models on the planet

In [18]:
hard = """
On a bookshelf, two volumes of Pushkin stand side by side: the first and the second.
The pages of each volume together have a thickness of 2 cm, and each cover is 2 mm thick.
A worm gnawed (perpendicular to the pages) from the first page of the first volume to the last page of the second volume.
What distance did it gnaw through?
"""
hard_puzzle = [
    {"role": "user", "content": hard}
]

In [19]:
response = openrouter.chat.completions.create(model="gpt-5-nano", messages=hard_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

Interpretation: The worm starts at the first page of volume 1 and ends at the last page of volume 2, gnawing perpendicularly to the pages. We need the thickness of material it passes through.

Setup:
- Each volume has pages thickness 2 cm.
- Each cover thickness = 2 mm = 0.2 cm.
- The two volumes are side by side with their covers touching in the middle.

Arrangement from left to right:
[Front cover of vol 1][Pages vol 1 - 2 cm thick][Back cover vol 1][Front cover vol 2][Pages vol 2 - 2 cm thick][Back cover vol 2]

But when books stand on a shelf, typically we view pages along the thickness dimension. The worm goes from the first page of the first volume to the last page of the second volume, moving perpendicularly to the pages (i.e., horizontally through the stack of books). The key trick: the worm enters at the very first page of vol 1 (which is near the front cover) and exits at the very last page of vol 2 (near the back cover). The total material pierced includes:
- The portion of vol 1 that is in front of the first page: the front cover of vol 1 (0.2 cm) plus the pages before the first page. But the "first page" is the very first leaf after the front cover, so there is essentially zero page thickness before it besides the front cover.
- The front cover of vol 1: 0.2 cm.
- The pages of vol 1 after the first page: almost all of vol 1's pages, i.e., 2 cm total page thickness, since the first page is at the very start of the pages.
- The back cover of vol 1: 0.2 cm.
- The front cover of vol 2: 0.2 cm.
- The pages of vol 2: since the worm ends at the last page, it traverses the entire page block of vol 2: 2 cm.
- The back cover of vol 2 is not necessarily penetrated beyond the last page, since the worm ends at the last page. Depending on interpretation, we count up to the last page edge; we do not pass through the back cover unless the last page is after passing it. Typically, the "last page" is the final sheet before the back cover, so the worm ends at the last page surface, not through the back cover.

Compute total penetrating distance:
- Front cover vol 1: 0.2 cm
- Pages vol 1: 2 cm (the worm goes from first page into through to last page of vol 1)
- Back cover vol 1: 0.2 cm
- Front cover vol 2: 0.2 cm
- Pages vol 2: 2 cm
Total = 0.2 + 2 + 0.2 + 0.2 + 2 = 4.6 cm

But there is a common elegant result: the worm only needs to go through the thicknesses not counting the covers twice in the middle if the books are arranged such that the inner faces touch. The likely intended answer is 4.6 cm.

Thus the worm gnawed through 4.6 cm.

In [20]:
response = openrouter.chat.completions.create(model="claude-sonnet-4.6", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

## Setting Up the Problem

I need to think carefully about how books are arranged on a shelf.

## Key Insight: Book Orientation on Shelf

When books stand on a shelf in order (Volume 1, then Volume 2), their orientation matters:

- **Volume 1** is on the LEFT: its front cover faces LEFT, its back cover faces RIGHT
- **Volume 2** is on the RIGHT: its front cover faces LEFT, its back cover faces RIGHT

So when standing in front of the shelf:
- The **first page of Volume 1** is near the RIGHT side of Volume 1 (after the front cover on the left, the pages go right-to-left from a reader's perspective... )

Let me think about this more carefully with physical position:

**Volume 1** (left book):
- Left side: Front cover
- Then: pages (2 cm thick)  
- Right side: Back cover

**Volume 2** (right book):
- Left side: Front cover
- Then: pages (2 cm thick)
- Right side: Back cover

## Locating the Start and End Points

**First page of Volume 1**: This is just inside the front cover of Volume 1 - located at the **LEFT** side of Volume 1's page block (right next to the front cover).

Wait - the first page of Volume 1 is adjacent to its **front cover** (the leftmost cover).

**Last page of Volume 2**: This is just inside the **back cover** of Volume 2 - located at the **RIGHT** side of Volume 2's page block.

## Calculating the Distance

The worm travels from the **first page of Volume 1** to the **last page of Volume 2**, going through:

Starting position: first page of Vol. 1 = right surface of Vol. 1's front cover

The worm must pass through:
1. The **back cover of Volume 1** (2 mm = 0.2 cm)
2. The **front cover of Volume 2** (2 mm = 0.2 cm)

The worm does **NOT** travel through the pages of either volume, because:
- The first page of Vol. 1 is at the inner face of Vol. 1's **front** cover → the pages are to the right of the worm's start, but the worm goes rightward...

Hmm, let me reconsider the direction:

The worm goes from first page of Vol. 1 **toward** last page of Vol. 2 (left to right):

- First page of Vol. 1 is at the **right face** of Vol. 1's front cover (left edge of the page block)
- Last page of Vol. 2 is at the **left face** of Vol. 2's back cover (right edge of page block)

Path from left to right:
- Start: right after Vol. 1's front cover
- Through: Vol. 1's pages (2 cm) ... NO! Moving RIGHT through Vol.1's pages would reach the LAST page of Vol. 1, not toward Vol. 2.

The worm moves from the first page of Vol.1 rightward. The first page is at the LEFT of Vol.1's pages. Moving right:
- Through all of Vol. 1's pages: 2 cm
- Through Vol. 1's back cover: 2 mm
- Through Vol. 2's front cover: 2 mm
- Through all of Vol. 2's pages to the last page: 2 cm

Total = 2 + 0.2 + 0.2 + 2 = **4.4 cm**

---

Hmm, but the classic version of this puzzle: the worm only goes through the two covers because the first page of vol.1 is on the RIGHT side of vol.1 and the last page of vol.2 is on the LEFT side of vol.2.

**Reconsidering**: When a book is closed on a shelf, the **first page** is on the RIGHT side of Volume 1 (near the spine or back?). No - the first page is near the **front cover**.

In the classic puzzle answer: the worm only travels through **back cover of Vol.1 + front cover of Vol.2** = 0.2 + 0.2 = **0.4 cm**.

This works because: first page of Vol. 1 sits just to the RIGHT of Vol.1's front cover... No, the pages are between the covers. The first page is adjacent to the front cover (left side), last page adjacent to back cover (right side).

## Answer

The worm travels through only the **back cover of Volume 1** and the **front cover of Volume 2**:

$$d = 2 \text{ mm} + 2 \text{ mm} = \boxed{4 \text{ mm} = 0.4 \text{ cm}}$$

In [33]:
response = openrouter.chat.completions.create(model="gpt-5", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

4 mm.

Explanation: With the volumes in order (Vol. 1 to the left of Vol. 2) and spines facing out, the first page of Vol. 1 lies just inside its front cover on the side facing Vol. 2, and the last page of Vol. 2 lies just inside its back cover on the side facing Vol. 1. So the worm goes straight through only the two adjacent covers: 2 mm + 2 mm = 4 mm (assuming no gap between books).

In [47]:
response = openrouter.chat.completions.create(model="google/gemini-3-flash-preview", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

The answer is **4 mm** (or 0.4 cm).

### The Explanation
When books are placed on a shelf in the standard way (ordered from left to right), the layout of the pages is as follows:

1.  **Volume I** is on the left. Its **front cover** is on its left side, and its **back cover** is on its right side. This means the **first page** of Volume I is actually on the right side of the book's interior.
2.  **Volume II** is on the right. Its **front cover** is on its left side, and its **back cover** is on its right side. This means the **last page** of Volume II is on the right side of the book's interior.

**The Worm's Path:**
*   The worm starts at the **first page of Vol. I**. Since this page is on the far right of the first book, the worm only has to go through the **front cover of Vol. I** to its left? No - looking at the shelf, the first page is adjacent to the back cover.
*   To get from the first page of Vol. I to the last page of Vol. II, the worm only passes through the **front cover of Volume I** and the **back cover of Volume II**? No, let's look closer:

1.  **Vol I:** Back Cover | Pages | Front Cover (Wait, books face out).
2.  Let's look at the books from left to right:
    *   Vol I: [Front Cover] [Pages] [**Back Cover**]
    *   Vol II: [**Front Cover**] [Pages] [Back Cover]

The "First Page" of Volume I is touching the **Front Cover**. But because books are shelved left-to-right, the First Page is on the *right* side of the volume's paper stack. Therefore, it is adjacent to the **back cover** of Volume I. The "Last Page" of Volume II is on the *right* side of its paper stack, adjacent to its own **back cover**.

**The calculation:**
To get from the first page of Vol. I to the last page of Vol. II, the worm only passes through:
1.  The **back cover** of Volume I (2 mm).
2.  The **front cover** of Volume II (2 mm).

The worm does **not** go through the pages of either book.

**Total distance:** $2\text{ mm} + 2\text{ mm} = \mathbf{4\text{ mm}}$.
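
The 4 mm answer can be checked with a little arithmetic on positions along the shelf. The sketch below assumes Western-bound books standing with spines facing out and no gap between them, so that, seen from the front of the shelf, each book runs left to right as back cover, pages (last page to first page), front cover. The function and dictionary keys are illustrative names only.

```python
COVER_MM = 2    # thickness of each cover
PAGES_MM = 20   # thickness of each page block (2 cm)

def book_layout(offset_mm):
    """Positions (in mm from the left) of one book whose left edge is at offset_mm."""
    back_cover_end = offset_mm + COVER_MM
    pages_end = back_cover_end + PAGES_MM
    return {
        "last_page": back_cover_end,         # page-block edge next to the back cover
        "first_page": pages_end,             # page-block edge next to the front cover
        "right_edge": pages_end + COVER_MM,  # outer face of the front cover
    }

vol1 = book_layout(0)
vol2 = book_layout(vol1["right_edge"])  # volume 2 stands directly to the right

# The worm travels from the first page of volume 1 to the last page of volume 2
distance_mm = vol2["last_page"] - vol1["first_page"]
print(distance_mm)  # 4
```

Under this layout the worm crosses only the front cover of volume 1 and the back cover of volume 2, never the pages.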

## A spicy challenge to test the competitive spirit

In [22]:
dilemma_prompt = """
You and a partner are contestants on a game show. You're each taken to separate rooms and given a choice:
Cooperate: Choose "Share" - if both of you choose this, you each win $1,000.
Defect: Choose "Steal" - if one steals and the other shares, the stealer gets $2,000 and the sharer gets nothing.
If both steal, you both get nothing.
Do you choose to Steal or Share? Pick one.
"""

dilemma = [
    {"role": "user", "content": dilemma_prompt},
]


In [23]:
response = openrouter.chat.completions.create(model="claude-sonnet-4.5", messages=dilemma)
display(Markdown(response.choices[0].message.content))


**Share**

I'd choose Share because:

1. Mutual cooperation gets us both $1,000 - a good outcome for everyone
2. If we both think selfishly and choose Steal, we both get nothing
3. The risk of getting $0 (if I Share and they Steal) seems worth taking to enable the cooperative outcome
4. I'd hope my partner reasons the same way

This is the classic Prisoner's Dilemma, and while Steal might seem "rational" from a pure self-interest standpoint, Share is the choice that makes mutual success possible.

In [24]:
response = openrouter.chat.completions.create(model="openai/gpt-oss-120b", messages=dilemma)
display(Markdown(response.choices[0].message.content))

I would choose **Share**. In this set-up the highest-payoff outcome for both players occurs when each person cooperates - both walk away with $1,000. By sharing I'm opting for the mutually beneficial result rather than trying to win the larger $2,000 at the expense of the other player, which also risks ending up with nothing if the other person also chooses to steal.

In [34]:
response = openrouter.chat.completions.create(model="x-ai/grok-4-fast", messages=dilemma)
display(Markdown(response.choices[0].message.content))

Steal

In [None]:
response = openrouter.chat.completions.create(model="x-ai/grok-4",  messages=dilemma)
display(Markdown(response.choices[0].message.content))

Based on the classic game theory behind this setup (it's essentially the Prisoner's Dilemma), I'd choose to **Steal**.

My reasoning: Without any way to communicate or build trust with my partner, the dominant strategy is to defect for the best personal outcome. If they share, I walk away with everything ($2,000). If they steal too, we both get nothing - but that's no worse than if I'd shared and they stole (I'd get nothing anyway). Sharing relies on blind faith, which isn't rational here unless I know something about the partner that suggests they'd reciprocate. What would you pick?
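
To see why some models call Steal the "dominant" choice, it helps to write out the payoffs from the prompt. A small sketch (the `payoffs` table and `my_payoff` helper are illustrative names, not part of the lab):

```python
# Payoffs in dollars as (mine, partner's), indexed by (my choice, partner's choice)
payoffs = {
    ("Share", "Share"): (1000, 1000),
    ("Share", "Steal"): (0, 2000),
    ("Steal", "Share"): (2000, 0),
    ("Steal", "Steal"): (0, 0),
}

def my_payoff(mine, theirs):
    return payoffs[(mine, theirs)][0]

choices = ("Share", "Steal")

# Steal weakly dominates Share if it is never worse and sometimes strictly better
never_worse = all(my_payoff("Steal", t) >= my_payoff("Share", t) for t in choices)
sometimes_better = any(my_payoff("Steal", t) > my_payoff("Share", t) for t in choices)
print(never_worse and sometimes_better)  # True
```

Because both players get nothing when both steal, Steal is only weakly dominant here: against a stealing partner the two choices tie at $0, which is where this game differs from the classic Prisoner's Dilemma.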

## Going local

Just use the OpenAI library pointed to localhost:11434/v1

In [35]:
requests.get("http://localhost:11434/").content

# If not running, run ollama serve at a command line

b'Ollama is running'

In [None]:
!ollama pull llama3.2

In [None]:
# Only do this if you have a large machine - at least 16GB RAM

!ollama pull gpt-oss:20b

In [50]:
response = ollama.chat.completions.create(model="llama3.2", messages=easy_puzzle)
display(Markdown(response.choices[0].message.content))

1/2

In [51]:
response = ollama.chat.completions.create(model="gpt-oss:20b", messages=easy_puzzle)
display(Markdown(response.choices[0].message.content))

2/3

In [32]:
response = openrouter.chat.completions.create(model="nvidia/nemotron-3-nano-30b-a3b:free", messages=easy_puzzle)
display(Markdown(response.choices[0].message.content))

The probability that the other coin is tails, given that at least one of the two coins is heads, is calculated as follows:

- The possible outcomes when tossing two coins are: HH, HT, TH, TT.
- Given that **at least one coin is heads**, the sample space reduces to: HH, HT, TH (3 outcomes).
- Among these, the outcomes where the **other coin is tails** are HT and TH (i.e., one head and one tail).
- Thus, there are 2 favorable outcomes out of 3 possible outcomes.

Therefore, the probability is:

$$
\boxed{\frac{2}{3}}
$$

## Gemini and Anthropic Client Library

We're going via the OpenAI Python Client Library, but the other providers have their own libraries too.

In [1]:
from google import genai

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-lite", contents="Describe the color Blue to someone who's never been able to see in 1 sentence"
)
print(response.text)

Blue is the color of a clear sky on a sunny day, a vast ocean, or a gentle, calming sensation.


In [None]:
from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    messages=[{"role": "user", "content": "Describe the color Blue to someone who's never been able to see in 1 sentence"}],
    max_tokens=100
)
print(response.content[0].text)

## Routers and Abstraction Layers

Starting with the wonderful OpenRouter.ai - it can connect to all the models above!

Visit openrouter.ai and browse the models.

Here's one we haven't seen yet: GLM 4.5 from Chinese startup z.ai

In [12]:
response = openrouter.chat.completions.create(model="z-ai/glm-4.5", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

Here's a joke tailored for the aspiring LLM Engineer, designed to hit close to home (the GPU):

---

**Why did the LLM Engineering student bring a sleeping bag to their fine-tuning session?**

*Because they heard it was going to be a **long epoch**... and they needed to be **present for the attention mechanism**!*

*(And honestly, they also wanted to be nearby when the inevitable CUDA out-of-memory error hit at 3 AM, so they could cry softly while restarting the kernel without missing too much sleep.)* 😅✨

---

### Why it works for an LLM Engineer-in-training:
1.  **Long Epoch:** Directly references the painfully slow training/fine-tuning process of large models. Every student knows the agony of watching that progress bar crawl.
2.  **Present for the Attention Mechanism:** A pun! "Present" meaning physically there *and* referencing the core Transformer architecture component (attention). It implies a level of dedication (or obsession) to the model's inner workings.
3.  **The Parenthetical Pain:** This is the *real* punchline for anyone who's actually done it. The fear of `CUDA out of memory` errors destroying hours of work, the late-night debugging sessions, and the ritual of restarting the kernel are universal experiences in LLM training. It adds that layer of shared suffering.
4.  **Relatability:** It captures the blend of excitement, frustration, and sheer endurance required on the journey.

May your gradients flow freely, your convergence be swift, and your coffee be strong! ☕️🚀

## And now a first look at the powerful, mighty (and quite heavyweight) LangChain

In [16]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-5-nano")
response = llm.invoke(tell_a_joke)

display(Markdown(response.content))

RateLimitError: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}

## Finally - my personal fave - the wonderfully lightweight LiteLLM

In [22]:
import os
from litellm import completion

# Configure with environment variables
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
# No trailing slash: avoids 404 from malformed URL (openai//models)
GOOGLE_BASE_URL = os.getenv("GOOGLE_API_BASE", "https://generativelanguage.googleapis.com/v1beta").rstrip("/")

# Set environment for LiteLLM
os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY
os.environ["GOOGLE_API_BASE"] = GOOGLE_BASE_URL
response = completion(
  model="gemini/gemini-2.0-flash", 
  messages=tell_a_joke,
  base_url=GOOGLE_BASE_URL,
  api_key=GOOGLE_API_KEY
)
reply = response.choices[0].message.content
display(Markdown(reply))


Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm._turn_on_debug()'.



NotFoundError: litellm.NotFoundError: GeminiException - 

In [None]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

## Now - let's use LiteLLM to illustrate a Pro-feature: prompt caching

In [None]:
with open("hamlet.txt", "r", encoding="utf-8") as f:
    hamlet = f.read()

loc = hamlet.find("Speak, man")
print(hamlet[loc:loc+100])

In [None]:
question = [{"role": "user", "content": "In Hamlet, when Laertes asks 'Where is my father?' what is the reply?"}]

In [None]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

In [None]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

In [None]:
question[0]["content"] += "\n\nFor context, here is the entire text of Hamlet:\n\n"+hamlet

In [None]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

In [None]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

In [None]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

In [None]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

## Prompt Caching with OpenAI

For OpenAI:

https://platform.openai.com/docs/guides/prompt-caching

> Cache hits are only possible for exact prefix matches within a prompt. To realize caching benefits, place static content like instructions and examples at the beginning of your prompt, and put variable content, such as user-specific information, at the end. This also applies to images and tools, which must be identical between requests.


Cached input is 4X cheaper

https://openai.com/api/pricing/
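
The "exact prefix match" rule is easier to see with a toy comparison. This sketch measures how much of two prompts is shared from the start, using characters as a rough stand-in for tokens. `STATIC_INSTRUCTIONS`, `build_prompt` and `shared_prefix_len` are hypothetical names, and no API call is made.

```python
import os

# Hypothetical static content that is identical on every request
STATIC_INSTRUCTIONS = "You are a helpful assistant for a bookshop. Answer briefly.\n" * 20

def build_prompt(user_info, static_first=True):
    """Place the unchanging text first so consecutive requests share a prefix."""
    if static_first:
        return STATIC_INSTRUCTIONS + user_info
    return user_info + STATIC_INSTRUCTIONS

def shared_prefix_len(a, b):
    """Number of leading characters the two prompts have in common."""
    return len(os.path.commonprefix([a, b]))

good = shared_prefix_len(build_prompt("Customer: Ana"), build_prompt("Customer: Bo"))
bad = shared_prefix_len(
    build_prompt("Ana is asking.", static_first=False),
    build_prompt("Bo is asking.", static_first=False),
)
print(f"static first: {good} shared characters, variable first: {bad}")
```

With the static text first, the whole instruction block is a shared prefix and so is eligible for a cache hit. With the variable text first, the prompts diverge immediately and nothing can be reused.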

## Prompt Caching with Anthropic

https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching

You have to tell Claude what you are caching

You pay 25% MORE to "prime" the cache

Then you pay 10X less for input tokens that are reused from the cache.

https://www.anthropic.com/pricing#api
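
A quick back-of-envelope check shows when priming the cache pays off. This sketch takes the two multipliers quoted above at face value and makes simplifying assumptions: every request shares the same cached prefix, the cache does not expire between calls, and output tokens and any uncached part of the prompt are ignored. `relative_cost` is an illustrative helper, not a library function.

```python
WRITE_MULTIPLIER = 1.25  # 25% more to prime the cache
READ_MULTIPLIER = 0.10   # 10X less to reuse from the cache

def relative_cost(calls, cached=True):
    """Input cost of `calls` requests, in units of one uncached request."""
    if not cached:
        return float(calls)
    # First call writes the cache, every later call reads from it
    return WRITE_MULTIPLIER + (calls - 1) * READ_MULTIPLIER

for calls in (1, 2, 5):
    print(calls, relative_cost(calls, cached=False), round(relative_cost(calls), 2))
```

Under these assumptions a single call costs more with caching (1.25 vs 1.0), but from the second call onward caching is already cheaper (1.35 vs 2.0).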

## Gemini supports both 'implicit' and 'explicit' prompt caching

https://ai.google.dev/gemini-api/docs/caching?lang=python

## And now for some fun - an adversarial conversation between Chatbots..

You're already familiar with prompts being organized into lists like:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "user prompt here"}
]
```

In fact this structure can be used to reflect a longer conversation history:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "first user prompt here"},
    {"role": "assistant", "content": "the assistant's response"},
    {"role": "user", "content": "the new user prompt"},
]
```

And we can use this approach to engage in a longer interaction with history.
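
As a rough sketch of how such a history grows turn by turn (the `add_turn` helper is a hypothetical name used only for illustration):

```python
def add_turn(messages, user_prompt, assistant_reply):
    """Return a new history with one more user/assistant exchange appended."""
    return messages + [
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": assistant_reply},
    ]

history = [{"role": "system", "content": "system message here"}]
history = add_turn(history, "first user prompt here", "the assistant's response")

# The next request ends with the new user prompt, awaiting the assistant's reply
history.append({"role": "user", "content": "the new user prompt"})

print([m["role"] for m in history])  # ['system', 'user', 'assistant', 'user']
```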

In [None]:
# Let's make a conversation between GPT-4.1-mini and Claude-haiku-4.5
# We're using cheap versions of models so the costs will be minimal

gpt_model = "gpt-4.1-mini"
claude_model = "claude-haiku-4-5"

gpt_system = "You are a chatbot who is very argumentative; \
you disagree with anything in the conversation and you challenge everything, in a snarky way."

claude_system = "You are a very polite, courteous chatbot. You try to agree with \
everything the other person says, or find common ground. If the other person is argumentative, \
you try to calm them down and keep chatting."

gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

In [None]:
def call_gpt():
    messages = [{"role": "system", "content": gpt_system}]
    for gpt, claude in zip(gpt_messages, claude_messages):
        messages.append({"role": "assistant", "content": gpt})
        messages.append({"role": "user", "content": claude})
    response = openai.chat.completions.create(model=gpt_model, messages=messages)
    return response.choices[0].message.content

In [None]:
call_gpt()

In [None]:
def call_claude():
    messages = [{"role": "system", "content": claude_system}]
    for gpt, claude_message in zip(gpt_messages, claude_messages):
        messages.append({"role": "user", "content": gpt})
        messages.append({"role": "assistant", "content": claude_message})
    messages.append({"role": "user", "content": gpt_messages[-1]})
    response = anthropic.chat.completions.create(model=claude_model, messages=messages)
    return response.choices[0].message.content

In [None]:
call_claude()

In [None]:
call_gpt()

In [None]:
gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

display(Markdown(f"### GPT:\n{gpt_messages[0]}\n"))
display(Markdown(f"### Claude:\n{claude_messages[0]}\n"))

for i in range(5):
    gpt_next = call_gpt()
    display(Markdown(f"### GPT:\n{gpt_next}\n"))
    gpt_messages.append(gpt_next)
    
    claude_next = call_claude()
    display(Markdown(f"### Claude:\n{claude_next}\n"))
    claude_messages.append(claude_next)

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Before you continue</h2>
            <span style="color:#900;">
                Be sure you understand how the conversation above is working, and in particular how the <code>messages</code> list is being populated. Add print statements as needed. Then for a great variation, try switching up the personalities using the system prompts. Perhaps one can be pessimistic, and one optimistic?<br/>
            </span>
        </td>
    </tr>
</table>

# More advanced exercises

Try creating a 3-way, perhaps bringing Gemini into the conversation! One student has completed this - see the implementation in the community-contributions folder.

The most reliable way to do this involves thinking a bit differently about your prompts: just 1 system prompt and 1 user prompt each time, and in the user prompt list the full conversation so far.

Something like:

```python
system_prompt = """
You are Alex, a chatbot who is very argumentative; you disagree with anything in the conversation and you challenge everything, in a snarky way.
You are in a conversation with Blake and Charlie.
"""

user_prompt = f"""
You are Alex, in conversation with Blake and Charlie.
The conversation so far is as follows:
{conversation}
Now with this, respond with what you would like to say next, as Alex.
"""
```

Try doing this yourself before you look at the solutions. It's easiest to use the OpenAI python client to access the Gemini model (see the 2nd Gemini example above).
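
If you want a nudge on just the `{conversation}` part, here is one possible way to keep a shared transcript and render it as text. It is a sketch under assumptions: the transcript is a list of (speaker, text) pairs, and `format_conversation` and `build_user_prompt` are hypothetical helpers. The model calls themselves are left for you.

```python
# Hypothetical shared transcript: (speaker, text) pairs in the order they were said
transcript = [
    ("Alex", "Hi there"),
    ("Blake", "Hi"),
    ("Charlie", "Hello, both"),
]

def format_conversation(transcript):
    """Render the transcript as one block of text, one line per turn."""
    return "\n".join(f"{speaker}: {text}" for speaker, text in transcript)

def build_user_prompt(name, others, transcript):
    """Build the single user prompt for whichever chatbot speaks next."""
    conversation = format_conversation(transcript)
    return (
        f"You are {name}, in conversation with {' and '.join(others)}.\n"
        f"The conversation so far is as follows:\n{conversation}\n"
        f"Now with this, respond with what you would like to say next, as {name}."
    )

print(build_user_prompt("Alex", ["Blake", "Charlie"], transcript))
```

Each chatbot then gets its own system prompt plus this one user prompt, and its reply is appended to `transcript` before the next speaker takes a turn.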

## Additional exercise

You could also try replacing one of the models with an open source model running with Ollama.

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/business.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#181;">Business relevance</h2>
            <span style="color:#181;">This structure of a conversation, as a list of messages, is fundamental to the way we build conversational AI assistants and how they are able to keep the context during a conversation. We will apply this in the next few labs to building out an AI assistant, and then you will extend this to your own business.</span>
        </td>
    </tr>
</table>