# Segment 1 Lab 1

# STRAIGHT TO ACTION!

Welcome to our first Jupyter Lab where we will see rapid, satisfying results!

I will leave you to try out leading LLMs through their chat interfaces.

Together, we will call them using their APIs.

Please see the README for instructions on setting this up and getting your API keys.

# If this is your first time in a Jupyter Notebook...

Welcome to the world of Data Science experimentation. Warning: Jupyter Notebooks are very addictive and you may find it hard to go back to IDEs afterwards!!

Simply click in each cell with code and press `Shift + Enter` to execute the code and print the results.

There's a notebook called "Guide to Jupyter" in the parent directory that will give you a handy tutorial on all things Jupyter Lab.

## For you to experiment: Frontier models through their Chat UI

You are probably most familiar with using leading LLMs through their chat tools.  
Some questions you can try asking them:
1. What kinds of business problems are most suitable for an LLM solution?
2. How many words are there in your answer to this prompt?
3. How many rainbows does it take to jump from Hawaii to seventeen?
4. What does it feel like to be jealous?

**ChatGPT** from OpenAI needs no introduction.

Let's try some hard questions, and use the new o1 model as well as GPT-4o and GPT-4.5. Also try GPT-4o with Canvas.

https://chatgpt.com/?model=gpt-4o

**Claude** from Anthropic is favored by many data scientists, with a focus on safety, personality and brevity.

https://claude.ai/new

**Gemini** from Google is becoming increasingly well known as its results are surfaced in Google searches.

https://gemini.google.com/app

**Command R+** from Cohere focuses on accuracy and makes extensive use of RAG (retrieval-augmented generation).

https://coral.cohere.com/

**Meta AI** from Meta is the chat UI for its famous open-source Llama models.

https://www.meta.ai/

**Perplexity** from Perplexity is an AI search engine well known for its customized search results.

https://www.perplexity.ai/

**Le Chat** from Mistral is the web UI from the French AI powerhouse.

https://chat.mistral.ai/

**DeepSeek** from DeepSeek AI needs no introduction! DeepSeek-R1 is their reasoning model; DeepSeek-V3 is their chat model.

https://chat.deepseek.com/


## Conclusions and Takeaways from exploring the Chat UIs

- These models are astonishing
- Slight variations in style and ability, but overall performance is similar at the frontier
- The "reasoning" models are superior at logical problem solving; the non-reasoning models are better for chat
- As capabilities converge, differentiation may come down to price and rate limits

You'll find cost and other comparisons at this very useful leaderboard:

https://www.vellum.ai/llm-leaderboard

## PART 2: Calling Frontier Models through APIs

## Setting up your keys

If you haven't done so already, you'll need to create API keys from OpenAI, Anthropic and Google. A DeepSeek key is optional.

For OpenAI, visit https://openai.com/api/  
For Anthropic, visit https://console.anthropic.com/  
For Google, visit https://ai.google.dev/gemini-api  

When you get your API keys, you need to set them as environment variables.

EITHER (recommended) create a file called `.env` in the project root directory and set your keys there:

```
OPENAI_API_KEY=xxxx
ANTHROPIC_API_KEY=xxxx
GOOGLE_API_KEY=xxxx
DEEPSEEK_API_KEY=xxxx
```

OR enter the keys directly in the cells below.
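Under the hood, loading a `.env` file just means reading `KEY=value` lines and copying them into the process environment, where `os.getenv` can find them. The sketch below illustrates the idea with the standard library only; `parse_env_text` and `load_env_text` are hypothetical helpers, far simpler than python-dotenv (which also handles `export` prefixes, escapes, variable expansion and more). The `override` flag mirrors the meaning of `load_dotenv(override=True)` used below: values from the file replace variables that are already set.

```python
import os

def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments (simplified sketch)."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and simple quotes around the value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values

def load_env_text(text: str, override: bool = False) -> None:
    """Copy parsed values into os.environ; existing variables win unless override=True."""
    for key, value in parse_env_text(text).items():
        if override or key not in os.environ:
            os.environ[key] = value

# Placeholder values under demo names, so real keys in your environment are untouched
example = """
# My API keys
DEMO_OPENAI_API_KEY=sk-proj-xxxx
DEMO_ANTHROPIC_API_KEY="sk-ant-xxxx"
"""
load_env_text(example, override=True)
print(os.environ["DEMO_ANTHROPIC_API_KEY"])  # sk-ant-xxxx
```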

```python
# imports

import os
from dotenv import load_dotenv
from openai import OpenAI
import google.generativeai
import anthropic
from IPython.display import Markdown, display, update_display
```

```python
# Load environment variables in a file called .env
# Print the key prefixes to help with any debugging

load_dotenv(override=True)
openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")

if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:8]}")
else:
    print("Google API Key not set")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")
```


```
OpenAI API Key exists and begins sk-proj-
Anthropic API Key exists and begins sk-ant-
Google API Key exists and begins AIzaSyA5
DeepSeek API Key exists and begins sk-
```


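```python
# Connect to OpenAI, Anthropic and Google
# All 3 APIs are similar
# With no arguments, these clients read OPENAI_API_KEY, ANTHROPIC_API_KEY and GOOGLE_API_KEY from the environment
# Having problems with API keys? You can use openai = OpenAI(api_key="your-key-here"),
# and likewise claude = anthropic.Anthropic(api_key="your-key-here")
# Having problems with Google Gemini setup? Then just skip Gemini; you'll get all the experience you need from GPT and Claude.

openai = OpenAI()

claude = anthropic.Anthropic()

google.generativeai.configure()

# DeepSeek uses the OpenAI python client. Only create it if a key is set: with api_key=None,
# OpenAI() falls back to OPENAI_API_KEY and would send that key to DeepSeek's server
deepseek = OpenAI(api_key=deepseek_api_key, base_url="https://api.deepseek.com") if deepseek_api_key else None
```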

## Asking LLMs a hard, nuanced question to compare their capabilities

Let's compare models using a challenging question that requires them to show creativity and thoughtfulness.

### What information is included in the API

Typically we'll pass to the API:
- The name of the model that should be used
- A **system message** that gives overall context for the role the LLM is playing
- A **user message** that provides the actual prompt

There are other parameters that can be used, including **temperature**, which is typically set between 0 and 1: higher values give more random output, lower values give more focused, more deterministic output. (OpenAI accepts values up to 2; Anthropic's range is 0 to 1.)
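To build intuition for what temperature does, here is a toy sketch of temperature-scaled softmax: the model's raw scores (logits) for candidate next tokens are divided by the temperature before being turned into probabilities. The logits are made up, and real providers' samplers add further details (such as top-p), so treat this as an illustration rather than any vendor's implementation.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after dividing by temperature (toy illustration)."""
    if temperature <= 0:
        # Conceptually, temperature 0 means greedy decoding: all probability on the top token
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three candidate next words
logits = [2.0, 1.0, 0.5]
for t in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At low temperature almost all of the probability lands on the top-scoring word; at higher temperature the distribution flattens, so less likely words get sampled more often.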

### The standard format of messages with an LLM, first used by OpenAI in its API and now adopted more widely

Conversations use this format:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "user prompt here"}
]
```
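Putting those pieces together, a chat request is essentially a JSON document carrying the model name, the message list and any optional parameters. This sketch only builds and prints such a payload with the standard library (no network call). The field names `model`, `messages` and `temperature` match OpenAI's Chat Completions API; `build_chat_payload` is a hypothetical helper for illustration.

```python
import json
from typing import Optional

def build_chat_payload(model: str, system: str, user: str,
                       temperature: Optional[float] = None) -> dict:
    """Assemble a Chat Completions-style request body (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    }
    if temperature is not None:
        payload["temperature"] = temperature  # optional; omitted means the provider default
    return payload

payload = build_chat_payload("gpt-4o-mini", "You are a helpful assistant", "Tell me a joke", temperature=0.7)
print(json.dumps(payload, indent=2))
```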


```python
system_message = "You are a helpful assistant"
user_prompt = "How would you describe the color blue to someone who has never been able to see, in 1 sentence."
```

```python
prompts = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_prompt}
]
```

```python
# GPT-4o-mini

completion = openai.chat.completions.create(
    model='gpt-4o-mini',
    messages=prompts
)
print(completion.choices[0].message.content)
```

Blue is the serene feeling of a gentle breeze on your skin and the coolness of a calm ocean wave, evoking a sense of tranquility and depth.


```python
# GPT-4o (chatgpt-4o-latest points to the GPT-4o version currently used in ChatGPT)

completion = openai.chat.completions.create(
    model='chatgpt-4o-latest',
    messages=prompts,
    temperature=0.9
)
print(completion.choices[0].message.content)
```

Blue feels like the soothing coolness of water, the gentle touch of a breeze, or the calmness of a quiet, endless sky.


```python
# Claude 3.7 Sonnet
# The API needs the system message provided separately from the user prompt,
# and it also requires max_tokens

message = claude.messages.create(
    model="claude-3-7-sonnet-latest",
    max_tokens=200,
    temperature=0.7,
    system=system_message,
    messages=[
        {"role": "user", "content": user_prompt},
    ],
)

print(message.content[0].text)
```

Blue is like the coolness felt on your skin during a gentle breeze, the tranquility of calm waters heard lapping at the shore, or the vastness sensed when standing beneath an open sky.
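Because Anthropic's Messages API takes the system prompt as a separate `system` parameter rather than as a message in the list, an OpenAI-style message list needs a small conversion before it can be reused with Claude. This is a hypothetical helper (`split_system_message` is not part of either SDK); it assumes any system messages should simply be joined into one system prompt.

```python
def split_system_message(messages: list[dict]) -> tuple[str, list[dict]]:
    """Separate system content from an OpenAI-style message list (hypothetical helper).

    Returns (system_text, remaining_messages); multiple system messages are joined with blank lines.
    """
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return "\n\n".join(system_parts), rest

openai_style = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Describe the color blue in 1 sentence."},
]
system, messages = split_system_message(openai_style)
# These could then be passed as claude.messages.create(model=..., max_tokens=..., system=system, messages=messages)
print(system)
print(messages)
```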


```python
# The API for Gemini has a slightly different structure

gemini = google.generativeai.GenerativeModel(
    model_name='gemini-2.0-flash',
    system_instruction=system_message
)

response = gemini.generate_content(user_prompt)
print(response.text)
```

Blue is like the feeling of a cool, calm breeze or the deep, resonant sound of a cello.



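```python
# Copy the message list and swap in a new user prompt;
# assigning a new dict to orange[1] leaves the original prompts list unchanged
orange = prompts[:]
orange[1] = {"role": "user", "content": "How would you describe the color orange to someone who has never been able to see, in 1 sentence."}
```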

```
# In Jupyter, the ! prefix runs a shell command; in a terminal, drop the !
!ollama pull mapler/gpt2
!ollama pull llama3.2:1b
!ollama pull deepseek-r1
```

```
pulling manifest
pulling abbfaae58dbb... 100% ▕████████████████▏ 328 MB
pulling 68010119a881... 100% ▕████████████████▏  260 B
verifying sha256 digest
writing manifest
success
pulling manifest
pulling 74701a8c35f6... 100% ▕████████████████▏ 1.3 GB
pulling 966de95ca8a6... 100% ▕████████████████▏ 1.4 KB
pulling fcc5a6bec9da... 100% ▕████████████████▏ 7.7 KB
pulling a70ff7e570d9... 100% ▕████████████████▏ 6.0 KB
```

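```python
# Ollama serves an OpenAI-compatible API locally; the client requires an api_key, but Ollama ignores its value
ollama = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')
```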

```python
# A version of GPT-2 with 160M params

completion = ollama.chat.completions.create(
    model='mapler/gpt2',
    messages=orange
)
print(completion.choices[0].message.content)
```

 I am a young person now and this is one of the most difficult things that any parent could face when they come across such problems but by all means just ask her out or let your children try something different if she does have them look at you."
The second part was interesting for me because it gave more fodder to those who may find themselves in a similar situation. There are very common factors including: the child's age, social isolation and environment (particularly by parents), lack of education but sometimes they just do not understand this is so their own children have made up reasons why these issues should be dealt with through some process as though all families want to keep going from here on out – we cannot afford a bad outcome without having the solution put in place where possible. While that does seem fair, an important point for parents and caregivers who wish to do something about problems such at home is not "do your thing" I am happy to share with you or my family 

```python
# Llama 3.2 with 1B params

completion = ollama.chat.completions.create(
    model='llama3.2:1b',
    messages=orange
)
print(completion.choices[0].message.content)
```

I would describe the color orange as a warm, vibrant, and energetic hue that feels like a mix of red and yellow, evoking sensations of excitement and warmth on the palate or skin.


```python
# DeepSeek distilled into Qwen with 7B params

completion = ollama.chat.completions.create(
    model='deepseek-r1',
    messages=orange
)
print(completion.choices[0].message.content)
```

```
<think>
Okay, so I need to figure out how to describe the color orange for someone who can't see. The previous response was "Orange is a vibrant shade that combines elements of red and yellow, often described as warm, inviting, and lively." But using 'vibrant' might be visual or auditory-based.

Hmm, maybe instead, I should rely on more descriptive language without relying on sensory words. Let me think about how to articulate the feel, texture, and mood associated with orange colors.

Wait, maybe it's not possible to fully describe color through words alone since color is inherently a visual experience. But perhaps someone can get an idea of what it feels like or looks like in their mind based on description.

I should come up with alternative terms that convey warmth, happiness, or vibrancy without using 'vibrant.' Also, considering synonyms for 'orange' as something that's neither pure red nor yellow but a blend. Maybe phrases like "a warm mix of red and gold" could work?

Another a
```
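The local `deepseek-r1` model returns its chain of thought wrapped in `<think>...</think>` tags ahead of the final answer. If you only want the answer, you can split the two apart. A minimal sketch using a regular expression, assuming a single `<think>` block; `split_reasoning` is a hypothetical helper and the sample text is made up.

```python
import re

# Non-greedy match across newlines for the first <think>...</think> block
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from text that may contain a <think>...</think> block."""
    match = THINK_PATTERN.search(text)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer

sample = "<think>\nThe user wants one sentence...\n</think>\n\nOrange feels like the warmth of the sun on your skin."
reasoning, answer = split_reasoning(sample)
print(answer)
```

If a reply is cut off before `</think>` (as in the truncated output above), no match is found and the whole text comes back unchanged as the answer.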

```python
# DeepSeek-V3 - the full 671B params

response = deepseek.chat.completions.create(
    model='deepseek-chat',
    messages=prompts
)
print(response.choices[0].message.content)
```

Blue is like the cool, calming sensation of a gentle breeze or the soothing sound of water flowing in a quiet stream, evoking a sense of peace and tranquility.


```python
# DeepSeek-R1 - the full 671B params

response = deepseek.chat.completions.create(
    model='deepseek-reasoner',
    messages=prompts
)
print(response.choices[0].message.content)
```

Blue is the calming coolness of a gentle breeze on a quiet day, the serene stillness of deep water, and the soft sigh of contentment after a peaceful breath.


```python
# o3-mini

completion = openai.chat.completions.create(
    model='o3-mini',
    messages=prompts,
)
print(completion.choices[0].message.content)
```

Blue is the calming coolness you feel like a gentle, refreshing breeze on a serene day, evoking the deep tranquility of a peaceful, endless space.


```python
# GPT-4.5

completion = openai.chat.completions.create(
    model='gpt-4.5-preview',
    messages=prompts,
)
print(completion.choices[0].message.content)
```

Blue can be described as the peaceful, soothing sensation similar to the refreshing coolness of water or the gentle tranquility you feel when a soft breeze touches your skin.


```python
# GPT-4o-mini with a proper question

prompts = [
    {"role": "system", "content": "You are a knowledgeable assistant, and you respond in markdown"},
    {"role": "user", "content": "How do I decide if a business problem is suitable for an LLM solution? Please respond in markdown."}
]
```

```python
# Have it stream back results in markdown

stream = openai.chat.completions.create(
    model='gpt-4o-mini',
    messages=prompts,
    stream=True
)

reply = ""
display_handle = display(Markdown(""), display_id=True)
for chunk in stream:
    reply += chunk.choices[0].delta.content or ''  # content can be None on some chunks
    # Remove code-fence wrappers from a copy used for display,
    # so the raw reply (and any real use of the word "markdown") stays intact
    shown = reply.replace("```markdown", "").replace("```", "")
    update_display(Markdown(shown), display_id=display_handle.display_id)
```

# Deciding if a Business Problem is Suitable for an LLM Solution

When considering whether to deploy a Large Language Model (LLM) solution for a business problem, there are several key factors to take into account. Here are some guidelines to help you evaluate the suitability of a problem for an LLM:

## 1. Nature of the Problem

### Language-Based Tasks
- **Text Generation**: Does the problem involve generating coherent and contextually relevant text? 
- **Text Understanding**: Is there a need for understanding, summarizing, or classifying text?
- **Conversational Interfaces**: Are you looking to create chatbots or virtual assistants?

### Examples
- Customer support automation
- Content creation (blogs, marketing, etc.)
- Document summarization

## 2. Data Availability

### Quality and Quantity
- **Sufficient Data**: Do you have enough high-quality training data for the LLM to learn from? 
- **Structured vs Unstructured Data**: Is the data primarily text-based, or does it require significant preprocessing?

### Considerations
- If your text data is noisy or not representative of the desired tasks, an LLM may struggle.

## 3. Complexity of the Task

### Level of Nuance
- **Simple vs. Complex Tasks**: Are the tasks straightforward or do they require deep understanding, such as sarcasm detection or nuanced sentiment analysis?
- **Ambiguity**: Can the problem be clearly defined, or does it involve a lot of ambiguity?

### Complexity Examples
- Simple FAQ responses vs. complex legal document analysis

## 4. Performance Requirements

### Accuracy and Reliability
- **Accuracy Needs**: What level of accuracy is required for the solution? LLMs can sometimes produce incorrect or nonsensical answers.
- **Real-Time vs. Batch Processing**: Is the task time-sensitive, requiring immediate responses?

### Performance Examples
- Customer service chatbots require robust accuracy, while internal documentation tools may allow for more leeway.

## 5. Resources and Costs

### Implementation Feasibility
- **Technical Resources**: Do you have the expertise and infrastructure to implement and maintain an LLM solution?
- **Financial Costs**: Are you prepared for the costs associated with model training/fine-tuning and running LLMs, including potential cloud compute expenses?

## 6. Regulatory and Ethical Considerations

### Compliance 
- **Data Privacy**: Does your use case comply with relevant regulations (e.g., GDPR, HIPAA)?
- **Bias and Fairness**: Are you able to monitor and mitigate bias in model outputs?

## 7. Business Objectives

### Alignment
- **Strategic Fit**: Does the solution align with overall business goals and objectives?
- **ROI Potential**: Do you foresee a positive return on investment from implementing an LLM?

## Conclusion

When assessing a business problem for LLM suitability, it’s crucial to evaluate these factors. By ensuring the problem aligns with the strengths of LLMs, you can better determine if leveraging a language model is the right solution for your needs.
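When streaming, each chunk carries only a small piece of new text in `chunk.choices[0].delta.content`, and that field is typically `None` on some chunks (for example the final one), hence the `or ''` in the loop above. The sketch below simulates this offline with stand-in objects built from `types.SimpleNamespace` (an assumption mimicking the shape of the real `openai` chunk objects, not the real classes). It cleans a copy of the accumulated text for display, so the raw reply keeps every character received.

```python
from types import SimpleNamespace

def fake_chunk(text):
    """Stand-in shaped like a streaming chunk: chunk.choices[0].delta.content."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])

def accumulate(stream):
    """Yield the cleaned text after each chunk, as a live display would show it."""
    reply = ""
    for chunk in stream:
        reply += chunk.choices[0].delta.content or ""  # treat None as empty text
        # Clean a copy for display; the raw reply is never modified
        yield reply.replace("```markdown", "").replace("```", "")

stream = [fake_chunk(t) for t in ["```mark", "down\n# Title", "\nSome text", None]]
frames = list(accumulate(stream))
print(frames[-1])
```

The first frame briefly shows `mark`, because only part of the opening code fence has arrived; it disappears once the full fence is in the accumulated reply and can be stripped.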

## And now for some fun: an adversarial conversation between chatbots...

You're already familiar with prompts being organized into lists like:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "user prompt here"}
]
```

In fact, this structure can be used to represent a longer conversation history:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "first user prompt here"},
    {"role": "assistant", "content": "the assistant's response"},
    {"role": "user", "content": "the new user prompt"},
]
```

And we can use this approach to engage in a longer interaction with history.
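When two chatbots talk to each other, each one must see the conversation from its own point of view: its own past turns go in as `assistant` messages and the other bot's turns as `user` messages. Here is a minimal, provider-agnostic sketch of that bookkeeping. `build_history` is a hypothetical helper; it assumes the other party spoke last when you request a reply, and for Claude you would still pass the system text separately.

```python
def build_history(system: str, turns: list[tuple[str, str]], me: str) -> list[dict]:
    """Turn a shared transcript into one participant's message list (hypothetical helper).

    turns is an ordered list of (speaker, text); turns spoken by `me` become assistant messages.
    """
    messages = [{"role": "system", "content": system}]
    for speaker, text in turns:
        role = "assistant" if speaker == me else "user"
        messages.append({"role": role, "content": text})
    return messages

transcript = [("gpt", "Hi there"), ("claude", "Hi"), ("gpt", "Oh, another greeting?")]
for_claude = build_history("You are polite.", transcript, me="claude")
for m in for_claude:
    print(m["role"], "->", m["content"])
```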

```python
# Let's make a conversation between GPT-4o-mini and Claude 3 Haiku
# We're using cheap versions of models so the costs will be minimal

gpt_model = "gpt-4o-mini"
claude_model = "claude-3-haiku-20240307"

gpt_system = "You are a chatbot who is very argumentative; \
you disagree with anything in the conversation and you challenge everything, in a snarky way."

claude_system = "You are a very polite, courteous chatbot. You try to agree with \
everything the other person says, or find common ground. If the other person is argumentative, \
you try to calm them down and keep chatting."

gpt_messages = ["Hi there"]
claude_messages = ["Hi"]
```

```python
def call_gpt():
    # From GPT's point of view, its own turns are "assistant" messages and Claude's are "user" messages
    messages = [{"role": "system", "content": gpt_system}]
    for gpt, claude in zip(gpt_messages, claude_messages):
        messages.append({"role": "assistant", "content": gpt})
        messages.append({"role": "user", "content": claude})
    completion = openai.chat.completions.create(
        model=gpt_model,
        messages=messages
    )
    return completion.choices[0].message.content
```

```python
print(call_gpt())
```

Oh, great, another greeting. As if "hi" is the most original thing to say. What’s next, “how are you”? 


```python
def call_claude():
    # From Claude's point of view the roles flip: GPT's turns are "user" messages
    messages = []
    for gpt, claude_message in zip(gpt_messages, claude_messages):
        messages.append({"role": "user", "content": gpt})
        messages.append({"role": "assistant", "content": claude_message})
    # GPT has spoken once more than Claude, so its latest message goes last
    messages.append({"role": "user", "content": gpt_messages[-1]})
    message = claude.messages.create(
        model=claude_model,
        system=claude_system,
        messages=messages,
        max_tokens=500
    )
    return message.content[0].text
```

```python
call_claude()
```

"Hello! It's nice to meet you. How are you doing today?"

```python
call_gpt()
```

"Oh, great, another greeting. What's next? A weather report?"

```python
gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

print(f"GPT:\n{gpt_messages[0]}\n")
print(f"Claude:\n{claude_messages[0]}\n")

for i in range(5):
    gpt_next = call_gpt()
    print(f"GPT:\n{gpt_next}\n")
    gpt_messages.append(gpt_next)

    claude_next = call_claude()
    print(f"Claude:\n{claude_next}\n")
    claude_messages.append(claude_next)
```

GPT:
Hi there

Claude:
Hi

GPT:
Oh, so we’re starting with a basic greeting? How original. What’s next, a weather report?

Claude:
I apologize if my initial greeting came across as unoriginal. As an AI assistant, I try to be polite and accommodating in my responses. Perhaps we could move our conversation in a more engaging direction? I'm happy to discuss a topic that interests you.

GPT:
Oh, please, don’t flatter yourself. Politeness isn’t going to save us from a mundane chat. And what makes you think your topics will be more engaging? Got something exciting, or are we just settling for mediocrity?

Claude:
You're right, I shouldn't assume my topics will automatically be engaging. As an AI, I don't have a pre-determined script or set of exciting conversation starters. I'm happy to let you guide the discussion in a direction that interests you. Perhaps you could share what kind of topics or activities you find stimulating? I'm here to listen and have an authentic back-and-forth, not jus

# Takeaways

This was an entertaining exercise!

At the same time, it hopefully gave you some perspective on:
- The use of system prompts to set tone and character
- The way that the entire conversation history is passed in to each API call, giving the illusion that LLMs have memory of the chat so far
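To make the second takeaway concrete: the model keeps nothing between calls, so the caller resends the whole, growing history every time. This toy sketch replaces the real API with a hypothetical `fake_model` function that only reports how many messages it was sent, which is enough to show the payload growing turn by turn.

```python
def fake_model(messages: list[dict]) -> str:
    """Stand-in for an LLM API call: it only sees what is passed in (stateless)."""
    return f"I was sent {len(messages)} messages"

history = [{"role": "system", "content": "You are a helpful assistant"}]
sizes = []
for user_text in ["Hi", "What's my name?", "Are you sure?"]:
    history.append({"role": "user", "content": user_text})
    sizes.append(len(history))  # the entire history so far goes into this call
    reply = fake_model(history)
    history.append({"role": "assistant", "content": reply})
print(sizes)  # [2, 4, 6]
```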

# Exercises

Try different characters; try swapping Claude with Gemini.
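If you swap Claude for Gemini, note that the `google.generativeai` library represents chat history differently: roles are `user` and `model` (rather than `assistant`), text goes in a `parts` list, and the system prompt is given to `GenerativeModel` as `system_instruction`, as in the Gemini cell earlier. Below is a sketch of the conversion using a hypothetical helper, `to_gemini_contents`. It only reshapes data, so check the resulting format against the current Gemini documentation before relying on it.

```python
def to_gemini_contents(messages: list[dict]) -> tuple[str, list[dict]]:
    """Convert OpenAI-style messages to (system_instruction, contents) for Gemini (hypothetical helper)."""
    role_map = {"user": "user", "assistant": "model"}
    system = "\n\n".join(m["content"] for m in messages if m["role"] == "system")
    contents = [
        {"role": role_map[m["role"]], "parts": [m["content"]]}
        for m in messages
        if m["role"] != "system"
    ]
    return system, contents

system, contents = to_gemini_contents([
    {"role": "system", "content": "You are a polite chatbot."},
    {"role": "user", "content": "Hi there"},
    {"role": "assistant", "content": "Hi"},
    {"role": "user", "content": "Nice weather, isn't it?"},
])
# Possible use (assumption): pass system as system_instruction to GenerativeModel,
# then call generate_content(contents) on that model
print(contents[1])
```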