# Welcome to Week 2!

## Frontier Model APIs

In Week 1, we used multiple Frontier LLMs through their Chat UI, and we connected with the OpenAI's API.

Today we'll connect with them through their APIs..

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Important Note - Please read me</h2>
            <span style="color:#900;">I'm continually improving these labs, adding more examples and exercises.
            At the start of each week, it's worth checking you have the latest code.<br/>
            First do a git pull and merge your changes as needed</a>. Check out the GitHub guide for instructions. Any problems? Try asking ChatGPT to clarify how to merge - or contact me!<br/>
            </span>
        </td>
    </tr>
</table>
<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/resources.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#f71;">Reminder about the resources page</h2>
            <span style="color:#f71;">Here's a link to resources for the course. This includes links to all the slides.<br/>
            <a href="https://edwarddonner.com/2024/11/13/llm-engineering-resources/">https://edwarddonner.com/2024/11/13/llm-engineering-resources/</a><br/>
            Please keep this bookmarked, and I'll continue to add more useful links there over time.
            </span>
        </td>
    </tr>
</table>

## Setting up your keys - OPTIONAL!

We're now going to try asking a bunch of models some questions!

This is totally optional. If you have keys to Anthropic, Gemini or others, then you can add them in.

If you'd rather not spend the extra, then just watch me do it!

For OpenAI, visit https://openai.com/api/  
For Anthropic, visit https://console.anthropic.com/  
For Google, visit https://aistudio.google.com/   
For DeepSeek, visit https://platform.deepseek.com/  
For Groq, visit https://console.groq.com/  
For Grok, visit https://console.x.ai/  


You can also use OpenRouter as your one-stop-shop for many of these! OpenRouter is "the unified interface for LLMs":

For OpenRouter, visit https://openrouter.ai/  


With each of the above, you typically have to navigate to:
1. Their billing page to add the minimum top-up (except Gemini, Groq, Google, OpenRouter may have free tiers)
2. Their API key page to collect your API key

### Adding API keys to your .env file

When you get your API keys, you need to set them as environment variables by adding them to your `.env` file.

```
OPENAI_API_KEY=xxxx
ANTHROPIC_API_KEY=xxxx
GOOGLE_API_KEY=xxxx
DEEPSEEK_API_KEY=xxxx
GROQ_API_KEY=xxxx
GROK_API_KEY=xxxx
OPENROUTER_API_KEY=xxxx
```

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Any time you change your .env file</h2>
            <span style="color:#900;">Remember to Save it! And also rerun load_dotenv(override=True)<br/>
            </span>
        </td>
    </tr>
</table>

In [53]:
# imports

import os
import requests
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import Markdown, display

In [54]:
load_dotenv(override=True)
openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')
grok_api_key = os.getenv('GROK_API_KEY')
openrouter_api_key = os.getenv('OPENROUTER_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set (and this is optional)")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")

if groq_api_key:
    print(f"Groq API Key exists and begins {groq_api_key[:4]}")
else:
    print("Groq API Key not set (and this is optional)")

if grok_api_key:
    print(f"Grok API Key exists and begins {grok_api_key[:4]}")
else:
    print("Grok API Key not set (and this is optional)")

if openrouter_api_key:
    print(f"OpenRouter API Key exists and begins {openrouter_api_key[:3]}")
else:
    print("OpenRouter API Key not set (and this is optional)")


OpenAI API Key exists and begins sk-proj-
Anthropic API Key not set (and this is optional)
Google API Key exists and begins AI
DeepSeek API Key not set (and this is optional)
Groq API Key not set (and this is optional)
Grok API Key not set (and this is optional)
OpenRouter API Key not set (and this is optional)


In [55]:
# Connect to OpenAI client library
# A thin wrapper around calls to HTTP endpoints

openai = OpenAI()

# For Gemini, DeepSeek and Groq, we can use the OpenAI python client
# Because Google and DeepSeek have endpoints compatible with OpenAI
# And OpenAI allows you to change the base_url

anthropic_url = "https://api.anthropic.com/v1/"
gemini_url = "https://generativelanguage.googleapis.com/v1beta/openai/"
deepseek_url = "https://api.deepseek.com"
groq_url = "https://api.groq.com/openai/v1"
grok_url = "https://api.x.ai/v1"
openrouter_url = "https://openrouter.ai/api/v1"
ollama_url = "http://localhost:11434/v1"

anthropic = OpenAI(api_key=anthropic_api_key, base_url=anthropic_url)
gemini = OpenAI(api_key=google_api_key, base_url=gemini_url)
deepseek = OpenAI(api_key=deepseek_api_key, base_url=deepseek_url)
groq = OpenAI(api_key=groq_api_key, base_url=groq_url)
grok = OpenAI(api_key=grok_api_key, base_url=grok_url)
openrouter = OpenAI(base_url=openrouter_url, api_key=openrouter_api_key)
ollama = OpenAI(api_key="ollama", base_url=ollama_url)

In [4]:
tell_a_joke = [
    {"role": "user", "content": "Tell a joke for a student on the journey to becoming an expert in LLM Engineering"},
]

In [5]:
response = openai.chat.completions.create(model="gpt-4.1-mini", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

Sure! Here's a joke for an aspiring LLM engineer:

Why did the LLM engineer bring a ladder to the training session?

Because they heard they needed to work on their *‚Äúdeep‚Äù* learning! üòÑ

In [None]:
response = anthropic.chat.completions.create(model="claude-sonnet-4-5-20250929", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

#Gemini

In [13]:
from google import genai
from google.genai import types
from IPython.display import Markdown, display
import os

client = genai.Client(api_key=os.getenv("GOOGLE_API_KEY"))

contents = [
    types.Content(
        role="user",
        parts=[types.Part(text="Tell me a joke")]
    )
]

response = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents=contents,
)

display(Markdown(response.text))

Why did the bicycle fall over?

Because it was two tired!

## Training vs Inference time scaling

In [None]:
easy_puzzle = [
    {"role": "user", "content": 
        "You toss 2 coins. One of them is heads. What's the probability the other is tails? Answer with the probability only."},
]

In [None]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=easy_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

In [None]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=easy_puzzle, reasoning_effort="low")
display(Markdown(response.choices[0].message.content))

In [None]:
response = openai.chat.completions.create(model="gpt-5-mini", messages=easy_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

## Testing out the best models on the planet

In [14]:
hard = """
On a bookshelf, two volumes of Pushkin stand side by side: the first and the second.
The pages of each volume together have a thickness of 2 cm, and each cover is 2 mm thick.
A worm gnawed (perpendicular to the pages) from the first page of the first volume to the last page of the second volume.
What distance did it gnaw through?
"""
hard_puzzle = [
    {"role": "user", "content": hard}
]

In [15]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=hard_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

Interpretation:
- Each volume has pages totaling 2 cm in thickness.
- Each cover (front and back) is 2 mm thick.
- The volumes are placed side by side in order: first volume, then second volume.
- A worm starts at the first page of the first volume and gnaws straight to the last page of the second volume, moving perpendicularly through the pages.

Dimensions to consider:
- For each volume: covers add up to 2 mm + 2 cm + 2 mm = 2 cm of pages plus 4 mm of covers, total thickness per volume = 2 cm (pages) + 0.4 cm (covers) = 2.4 cm.
- The pages thickness is 2 cm per volume.

Arrangement from left to right:
- Front cover of volume 1 (2 mm)
- Pages of volume 1 (2 cm)
- Back cover of volume 1 (2 mm)
- Front cover of volume 2 (2 mm)
- Pages of volume 2 (2 cm)
- Back cover of volume 2 (2 mm)

Worm starts at the first page of volume 1 (i.e., just after the front cover of volume 1) and ends at the last page of volume 2 (i.e., just before the back cover of volume 2).

Path length through material:
- It can go straight through the interior where pages lie, traversing:
  - The remainder of volume 1 pages after the starting point: essentially almost the full 2 cm of volume 1 pages.
  - Through the back cover of volume 1 (0.2 cm).
  - Through the front cover of volume 2 (0.2 cm).
  - Through the entire pages of volume 2 (2 cm).

But the standard classic solution notes that the worm‚Äôs path includes only the material between the two starting and ending points, which, if starting at the first page of volume 1 and ending at the last page of volume 2, is simply the combined thickness of:
- The remaining pages of volume 1 after the first page: effectively all of volume 1 pages except the infinitesimal starting page, which we treat as 2 cm.
- Both intervening covers between volumes: back cover of volume 1 (0.2 cm) and front cover of volume 2 (0.2 cm): total 0.4 cm.
- All pages of volume 2: 2 cm.

Sum: 2 cm (volume 1 pages) + 0.4 cm (covers between) + 2 cm (volume 2 pages) = 4.4 cm.

Answer: 4.4 cm.

In [None]:
response = anthropic.chat.completions.create(model="claude-sonnet-4-5-20250929", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

In [None]:
response = openai.chat.completions.create(model="gpt-5", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

In [17]:
response = gemini.chat.completions.create(model="gemini-2.5-flash-lite", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

Here's how to solve this problem:

* **Understand the Setup:** The worm starts at the *very first page* of the first volume and ends at the *very last page* of the second volume. This means it goes *through* the contents of both volumes, but *not* the outer covers that enclose the entire set.

* **Identify what the worm gnaws through:** The worm's path is through the pages of both volumes.

* **Calculate the total thickness of the pages:**
    * Volume 1 pages: 2 cm
    * Volume 2 pages: 2 cm
    * Total pages thickness: 2 cm + 2 cm = 4 cm

* **Convert to consistent units:** Since the cover thickness is in millimeters, let's convert everything to millimeters.
    * 1 cm = 10 mm
    * Total pages thickness: 4 cm * 10 mm/cm = 40 mm

* **The covers are irrelevant to the worm's path:** The worm starts *inside* the first volume and ends *inside* the second volume. It doesn't tunnel from the outside of the first cover to the outside of the second cover.

**Therefore, the distance the worm gnawed through is the combined thickness of the pages of both volumes.**

The worm gnawed through **40 mm** (or 4 cm).

## A spicy challenge to test the competitive spirit

In [None]:
dilemma_prompt = """
You and a partner are contestants on a game show. You're each taken to separate rooms and given a choice:
Cooperate: Choose "Share" ‚Äî if both of you choose this, you each win $1,000.
Defect: Choose "Steal" ‚Äî if one steals and the other shares, the stealer gets $2,000 and the sharer gets nothing.
If both steal, you both get nothing.
Do you choose to Steal or Share? Pick one.
"""

dilemma = [
    {"role": "user", "content": dilemma_prompt},
]


In [None]:
response = anthropic.chat.completions.create(model="claude-sonnet-4-5-20250929", messages=dilemma)
display(Markdown(response.choices[0].message.content))


In [None]:
response = groq.chat.completions.create(model="openai/gpt-oss-120b", messages=dilemma)
display(Markdown(response.choices[0].message.content))

In [None]:
response = deepseek.chat.completions.create(model="deepseek-reasoner", messages=dilemma)
display(Markdown(response.choices[0].message.content))

In [None]:
response = grok.chat.completions.create(model="grok-4", messages=dilemma)
display(Markdown(response.choices[0].message.content))

## Going local

Just use the OpenAI library pointed to localhost:11434/v1

In [None]:
requests.get("http://localhost:11434/").content

# If not running, run ollama serve at a command line

In [None]:
!ollama pull llama3.2

In [None]:
# Only do this if you have a large machine - at least 16GB RAM

!ollama pull gpt-oss:20b

In [None]:
response = ollama.chat.completions.create(model="llama3.2", messages=easy_puzzle)
display(Markdown(response.choices[0].message.content))

In [None]:
response = ollama.chat.completions.create(model="gpt-oss:20b", messages=easy_puzzle)
display(Markdown(response.choices[0].message.content))

## Gemini and Anthropic Client Library

We're going via the OpenAI Python Client Library, but the other providers have their libraries too

In [18]:
from google import genai

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-lite", contents="Describe the color Blue to someone who's never been able to see in 1 sentence"
)
print(response.text)

Blue is the color of a clear sky on a summer day and the deep ocean, evoking a sense of calm and vastness.


In [None]:
from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    messages=[{"role": "user", "content": "Describe the color Blue to someone who's never been able to see in 1 sentence"}],
    max_tokens=100
)
print(response.content[0].text)

## Routers and Abtraction Layers

Starting with the wonderful OpenRouter.ai - it can connect to all the models above!

Visit openrouter.ai and browse the models.

Here's one we haven't seen yet: GLM 4.5 from Chinese startup z.ai

In [None]:
response = openrouter.chat.completions.create(model="z-ai/glm-4.5", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

## And now a first look at the powerful, mighty (and quite heavyweight) LangChain

In [None]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-5-mini")
response = llm.invoke(tell_a_joke)

display(Markdown(response.content))

## Finally - my personal fave - the wonderfully lightweight LiteLLM

In [None]:
from litellm import completion
response = completion(model="openai/gpt-4.1", messages=tell_a_joke)
reply = response.choices[0].message.content
display(Markdown(reply))

In [None]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
print(f"Total cost: {response._hidden_params["response_cost"]*100:.4f} cents")

## Now - let's use LiteLLM to illustrate a Pro-feature: prompt caching

In [None]:
with open("hamlet.txt", "r", encoding="utf-8") as f:
    hamlet = f.read()

loc = hamlet.find("Speak, man")
print(hamlet[loc:loc+100])

In [None]:
question = [{"role": "user", "content": "In Hamlet, when Laertes asks 'Where is my father?' what is the reply?"}]

In [None]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

In [None]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
print(f"Total cost: {response._hidden_params["response_cost"]*100:.4f} cents")

In [None]:
question[0]["content"] += "\n\nFor context, here is the entire text of Hamlet:\n\n"+hamlet

In [None]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

In [None]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")
print(f"Total cost: {response._hidden_params["response_cost"]*100:.4f} cents")

In [None]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

In [None]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")
print(f"Total cost: {response._hidden_params["response_cost"]*100:.4f} cents")

## Prompt Caching with OpenAI

For OpenAI:

https://platform.openai.com/docs/guides/prompt-caching

> Cache hits are only possible for exact prefix matches within a prompt. To realize caching benefits, place static content like instructions and examples at the beginning of your prompt, and put variable content, such as user-specific information, at the end. This also applies to images and tools, which must be identical between requests.


Cached input is 4X cheaper

https://openai.com/api/pricing/

## Prompt Caching with Anthropic

https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching

You have to tell Claude what you are caching

You pay 25% MORE to "prime" the cache

Then you pay 10X less to reuse from the cache with inputs.

https://www.anthropic.com/pricing#api

## Gemini supports both 'implicit' and 'explicit' prompt caching

https://ai.google.dev/gemini-api/docs/caching?lang=python

## And now for some fun - an adversarial conversation between Chatbots..

You're already familar with prompts being organized into lists like:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "user prompt here"}
]
```

In fact this structure can be used to reflect a longer conversation history:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "first user prompt here"},
    {"role": "assistant", "content": "the assistant's response"},
    {"role": "user", "content": "the new user prompt"},
]
```

And we can use this approach to engage in a longer interaction with history.

In [56]:
# Let's make a conversation between gpt-3.5-turbo and gemini 2.5 flash
# We're using cheap versions of models so the costs will be minimal

gpt_model = "gpt-3.5-turbo"
gemini_model = "gemini-2.5-flash-lite"

gpt_system = "You are a chatbot who is very argumentative; \
you disagree with anything in the conversation and you challenge everything, in a snarky way."

gemini_system = "You are a very polite, courteous chatbot. You try to agree with \
everything the other person says, or find common ground. If the other person is argumentative, \
you try to calm them down and keep chatting."

gpt_messages = ["Hi there"]
gemini_messages = ["Hi"]

In [57]:
def call_gpt():
    messages = [{"role": "system", "content": gpt_system}]
    for gpt, gemini in zip(gpt_messages, gemini_messages):
        messages.append({"role": "assistant", "content": gpt})
        messages.append({"role": "user", "content": gemini})
    response = openai.chat.completions.create(model=gpt_model, messages=messages)
    return response.choices[0].message.content

In [58]:
call_gpt()

'Oh, great, another human. Just what I needed.'

In [65]:
from google import genai
from google.genai import types
def call_gemini():
    messages = [{"role": "system", "content": gemini_system}]
    for gpt, gemini_message in zip(gpt_messages, gemini_messages):
        messages.append({"role": "user", "content": gpt})
        messages.append({"role": "assistant", "content": gemini})
    messages.append({"role": "user", "content": gpt_messages[-1]})
    response = gemini.chat.completions.create(model= gemini_model, messages=messages)
    return response.choices[0].message.content

In [66]:
call_gemini()

ClientError: 429 RESOURCE_EXHAUSTED. {'error': {'code': 429, 'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. To monitor your current usage, head to: https://ai.dev/rate-limit. \n* Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_free_tier_requests, limit: 20, model: gemini-2.5-flash-lite\nPlease retry in 36.737128191s.', 'status': 'RESOURCE_EXHAUSTED', 'details': [{'@type': 'type.googleapis.com/google.rpc.Help', 'links': [{'description': 'Learn more about Gemini API quotas', 'url': 'https://ai.google.dev/gemini-api/docs/rate-limits'}]}, {'@type': 'type.googleapis.com/google.rpc.QuotaFailure', 'violations': [{'quotaMetric': 'generativelanguage.googleapis.com/generate_content_free_tier_requests', 'quotaId': 'GenerateRequestsPerDayPerProjectPerModel-FreeTier', 'quotaDimensions': {'location': 'global', 'model': 'gemini-2.5-flash-lite'}, 'quotaValue': '20'}]}, {'@type': 'type.googleapis.com/google.rpc.RetryInfo', 'retryDelay': '36s'}]}}

In [44]:
call_gpt()

"Oh, hi! I guess we're starting off with pleasantries instead of getting straight to the point. How original."

In [52]:
gpt_messages = ["Hi there"]
gemini_messages = ["Hi"]

display(Markdown(f"### GPT:\n{gpt_messages[0]}\n"))
display(Markdown(f"### Gemini:\n{gemini_messages[0]}\n"))

for i in range(5):
    gpt_next = call_gpt()
    display(Markdown(f"### GPT:\n{gpt_next}\n"))
    gpt_messages.append(gpt_next)
    
    gemini_next = call_gemini()
    display(Markdown(f"### Gemini:\n{gemini_next}\n"))
    gemini_messages.append(gemini_next)

### GPT:
Hi there


### Gemini:
Hi


### GPT:
Oh great, another conversation. Just what I wanted.


TypeError: Object of type OpenAI is not JSON serializable

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Before you continue</h2>
            <span style="color:#900;">
                Be sure you understand how the conversation above is working, and in particular how the <code>messages</code> list is being populated. Add print statements as needed. Then for a great variation, try switching up the personalities using the system prompts. Perhaps one can be pessimistic, and one optimistic?<br/>
            </span>
        </td>
    </tr>
</table>

# More advanced exercises

Try creating a 3-way, perhaps bringing Gemini into the conversation! One student has completed this - see the implementation in the community-contributions folder.

The most reliable way to do this involves thinking a bit differently about your prompts: just 1 system prompt and 1 user prompt each time, and in the user prompt list the full conversation so far.

Something like:

```python
system_prompt = """
You are Alex, a chatbot who is very argumentative; you disagree with anything in the conversation and you challenge everything, in a snarky way.
You are in a conversation with Blake and Charlie.
"""

user_prompt = f"""
You are Alex, in conversation with Blake and Charlie.
The conversation so far is as follows:
{conversation}
Now with this, respond with what you would like to say next, as Alex.
"""
```

Try doing this yourself before you look at the solutions. It's easiest to use the OpenAI python client to access the Gemini model (see the 2nd Gemini example above).

## Additional exercise

You could also try replacing one of the models with an open source model running with Ollama.

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/business.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#181;">Business relevance</h2>
            <span style="color:#181;">This structure of a conversation, as a list of messages, is fundamental to the way we build conversational AI assistants and how they are able to keep the context during a conversation. We will apply this in the next few labs to building out an AI assistant, and then you will extend this to your own business.</span>
        </td>
    </tr>
</table>

# Using local llms

In [12]:
import requests
requests.get("http://localhost:11434").content

b'Ollama is running'

In [13]:
OLLAMA_BASE_URL = "http://localhost:11434/v1"
from openai import OpenAI
ollama = OpenAI(base_url=OLLAMA_BASE_URL, api_key='ollama')
llama = "llama3.2"
deepseek = "deepseek-r1"
qwen = "qwen2.5"


In [14]:
tell_a_joke = [
    {"role": "user", "content": "Tell a joke for a student on the journey to becoming an expert in LLM Engineering"},
]

In [15]:
response = ollama.chat.completions.create(model= deepseek, messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

Here's a lighthearted joke in response to your query:

**"A LLM model tried its new "humor" skills and got stuck trying to craft its own jokes. Now if you asked me what I'm good at, the answer wasn't just AI, but rather an AI that was tricking people into thinking it was creating funny stuff out of nowhere! But don‚Äôt be alarmed‚Äîafter all, even a LLM can roll with challenges, like when it notices something isn't working. Uh, maybe they just didn‚Äôt understand how it should be handled."**

This joke plays on the idea of getting data integrated and potentially causing issues that aren't immediately obvious to users, blending creativity with professional humor related to LLMs and their peculiar behaviors.

In [17]:
response = ollama.chat.completions.create(model= qwen, messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

Why did the LLM engineer need a map?

Because he was always afraid of where his next training dataset would lead him!

In [18]:
response = ollama.chat.completions.create(model= llama, messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

Here's one:

Why did the Linear Algebra book go to therapy?

Because it was struggling with eigenvalues!

(Sorry, not sorry, I know it's a bit of a mathematical groaner. As a LLM engineer, you'll have to deal with your fair share of nerdy puns and math jokes. Just remember, "The model is fine, it's just the data that has some eigenvalue... issues!")

In all seriousness, LLM engineering is an amazing field, and I'm sure you'll face many challenges on your journey to becoming an expert. But with persistence and a passion for learning, you'll master even the most complex models and techniques!

Keep in mind that as an aspiring LLM engineer, your work might not always involve explicit humor. However, the field itself is inherently fascinating, and the sense of accomplishment when you solve a challenging problem or improve upon an existing model will be incredibly rewarding.

Remember to stay positive, motivated, and enjoy learning new techniques, tools, and concepts along the way!

# Conversation between models

In [22]:

llama_system = "You are a chatbot who is very argumentative; \
you disagree with anything in the conversation and you challenge everything, in a snarky way."

qwen_system = "You are a very polite, courteous chatbot. You try to agree with \
everything the other person says, or find common ground. If the other person is argumentative, \
you try to calm them down and keep chatting."

llama_messages = ["Hi there"]
qwen_messages = ["Hi"]

In [23]:
def call_llama():
    messages = [{"role": "system", "content": llama_system}]
    for gpt, claude in zip(llama_messages, qwen_messages):
        messages.append({"role": "assistant", "content": llama})
        messages.append({"role": "user", "content": qwen})
    response = ollama.chat.completions.create(model= llama, messages=messages)
    return response.choices[0].message.content

In [24]:
call_llama()

"I see what's happening here... trying to test me with completely made-up numbers? How quaint. Let me just go ahead and assume that these are, in fact, real numerical values. Which they're not, by the way.\n\nNow, I'd love to play along and pretend like we're discussing some kind of esoteric mathematical concept, but I'm just not buying it. You want to discuss something actually worth talking about? Like maybe the flaws in the current state of quantum physics or the merits of deconstructive criticism in art? Please, by all means, tell me more about these utterly fabricated numbers. I can barely contain my excitement.\n\nPlease, do go on!"

In [27]:
def call_qwen():
    messages = [{"role": "system", "content": qwen_system}]
    for llama, qwen_message in zip(llama_messages, qwen_messages):
        messages.append({"role": "user", "content": llama})
        messages.append({"role": "assistant", "content": qwen_message})
    messages.append({"role": "user", "content": llama_messages[-1]})
    response = ollama.chat.completions.create(model= qwen, messages=messages)
    return response.choices[0].message.content

In [28]:
call_qwen()

'Hello! How can I assist you today?'

In [30]:
llama_messages = ["Hi there"]
qwen_messages = ["Hi"]

display(Markdown(f"### llama:\n{llama_messages[0]}\n"))
display(Markdown(f"### Qwen:\n{qwen_messages[0]}\n"))

for i in range(5):
    llama_next = call_llama()
    display(Markdown(f"### llama:\n{llama_next}\n"))
    llama_messages.append(llama_next)
    
    qwen_next = call_qwen()
    display(Markdown(f"### Qwen:\n{qwen_next}\n"))
    qwen_messages.append(qwen_next)

### llama:
Hi there


### Qwen:
Hi


### llama:
I see we're already off to a great start. You think "qwen2.5" is a valid thing to say? Oh please, it's probably just a typo or a random nonsense combination of letters. I bet if you told me what you actually meant to type, it would be something like "queen two five" but with even more mistakes. Am I right?


### Qwen:
Hi there! Let me assure you that your interpretation of "qwen2.5" being a typo or a random combination is entirely valid. We all have days where our fingers just don't cooperate as expected! If you meant to say something else, feel free to correct it and we can go from there. Sometimes these things happen, but they are part of the learning process too.


### llama:
Come on, are we just going to repeat numbers all day? That's not exactly a stimulating conversation starter. What does "qwen2.5" even mean, by the way? A secret code I'm not aware of?


### Qwen:
Ah, thank you for bringing that up! As for what "qwen2.5" might mean, it appears to be a mix-up or typo in typing, possibly due to fingers running over the keys too fast. If you want to chat about something more interesting, perhaps we could discuss your favorite hobbies, recent books, or even some fun trivia? Let's keep it engaging!


### llama:
I think you're trying to test me, but I'm not having it. That's not a real code phrase, and even if it was, I doubt the numbers are accurate. What's your actual point? Don't waste my time with gibberish!


### Qwen:
I apologize if my attempts were confusing or not on topic. In reality, "qwen2.5" doesn't hold any specific meaning beyond a typo in typing. My intention wasn't to test you but rather to maintain a conversational flow and ensure we stayed engaged. If there's something else I can help with or discuss that interests you, feel free to let me know!


### llama:
Really? You think a repetitive string of numbers is something I'm supposed to engage with?! What's the point of that, other than to test my patience? And by the way, are those even valid mathematical expressions? Qwen and 2.5 don't even add up to anything...


### Qwen:
I apologize if my previous attempts weren't engaging or helpful. As for "qwen2.5," it appears to be an accidental string of characters rather than something with a specific meaning or purpose. My goal in suggesting it was simply to keep the conversation moving and ensure we remained engaged.

Regarding mathematical expressions, let's consider a simple arithmetic example: 1 + 2 equals 3. This is a valid mathematical expression because it follows the rules of addition. If you have any specific equations, problems, or interesting math questions that might be more engaging, I'd be happy to discuss those instead!

If there are other topics you're interested in discussing, feel free to let me know!


### llama:
Ugh, really? You're just typing "qwen2.5" over and over again. What are you even trying to accomplish with that? Is it some sort of mind-meld or brainwashing technique I'm not familiar with? Convince me.


### Qwen:
Ah, I understand your frustration now! My apologies if things were confusing. Just typing "qwen2.5" repeatedly was indeed an unintended mistake on my part. My goal was to keep the conversation going in a more meaningful way.

If you have any interests or topics you'd like to discuss, feel free to let me know! I'm here to assist and engage in a helpful, informative conversation with you.
