## Setting up your keys

If you haven't done so already, you could now create API keys for Anthropic and Google in addition to OpenAI.

**Please note:** if you'd prefer to avoid extra API costs, feel free to skip setting up Anthropic and Google! You can watch me do it, and focus on OpenAI for the course. You could also substitute Ollama for Anthropic and/or Google, using the exercise you did in week 1.

For OpenAI, visit https://openai.com/api/  
For Anthropic, visit https://console.anthropic.com/  
For Google, visit https://ai.google.dev/gemini-api  

### Also - adding DeepSeek if you wish

Optionally, if you'd like to also use DeepSeek, create an account [here](https://platform.deepseek.com/), create a key [here](https://platform.deepseek.com/api_keys) and top up with at least the minimum $2 [here](https://platform.deepseek.com/top_up).

### Adding API keys to your .env file

When you get your API keys, you need to set them as environment variables by adding them to your `.env` file.

```
OPENAI_API_KEY=xxxx
ANTHROPIC_API_KEY=xxxx
GOOGLE_API_KEY=xxxx
DEEPSEEK_API_KEY=xxxx
```

Afterwards, you may need to restart the Jupyter Lab Kernel (the Python process that sits behind this notebook) via the Kernel menu, and then rerun the cells from the top.
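
To make the `.env` format concrete, here is a minimal sketch of how `KEY=value` lines become a dictionary of settings. It is a simplified illustration, not how `python-dotenv` is actually implemented: the helper name `parse_env_text` is hypothetical, and the real library handles quoting, `export` prefixes and other cases that this sketch ignores.

```python
def parse_env_text(text):
    """Parse simple KEY=value lines into a dict (hypothetical helper, illustration only)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values may themselves contain "="
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


sample = """
# placeholder values, not real keys
OPENAI_API_KEY=xxxx
GOOGLE_API_KEY=xxxx
"""
parsed = parse_env_text(sample)
print(sorted(parsed))
```

In the notebook itself, `load_dotenv(override=True)` does this work and places the values into `os.environ`.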

In [64]:
# imports

import os
from dotenv import load_dotenv
from openai import OpenAI
import anthropic
from IPython.display import Markdown, display, update_display

In [16]:
# import for google
# in rare cases, this seems to give an error on some systems, or even crashes the kernel
# If this happens to you, simply ignore this cell - I give an alternative approach for using Gemini later

import google.generativeai

In [None]:
# Load environment variables from a file called .env
# Print the key prefixes to help with any debugging

load_dotenv(override=True)
openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")

if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:8]}")
else:
    print("Google API Key not set")

OpenAI API Key exists and begins sk-proj-
Anthropic API Key not set
Google API Key exists and begins AIzaSyAA


In [66]:
# Connect to OpenAI, Anthropic

openai = OpenAI()

claude = anthropic.Anthropic()

In [17]:
# This is the setup code for Gemini
# Having problems with Google Gemini setup? Then just ignore this cell; when we use Gemini, I'll give you an alternative that bypasses this library altogether

google.generativeai.configure()

## Asking LLMs to tell a joke

It turns out that LLMs don't do a great job of telling jokes! Let's compare a few models.
Later we will be putting LLMs to better use!

### What information is included in the API

Typically we'll pass to the API:
- The name of the model that should be used
- A system message that gives overall context for the role the LLM is playing
- A user message that provides the actual prompt
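
The three items above map onto the arguments of a chat request. As a rough sketch, the hypothetical helper `build_chat_request` below assembles them into keyword arguments in the shape used by the `openai.chat.completions.create` calls later in this notebook. The model name and prompts are simply the ones used in the cells that follow.

```python
def build_chat_request(model, system_message, user_prompt):
    """Assemble keyword arguments for a chat completion call (hypothetical helper)."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_prompt},
        ],
    }


request = build_chat_request(
    "gpt-4o-mini",
    "You are an assistant that is great at telling jokes",
    "Tell a light-hearted joke for an audience of Data Scientists",
)
print(request["model"])
print([message["role"] for message in request["messages"]])
```

With a configured client, this could be passed along as `openai.chat.completions.create(**request)`.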

There are other parameters that can be used, including **temperature**, which is typically set between 0 and 1: higher values give more random output, and lower values give more focused and deterministic output.
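
To see why a higher temperature gives more random output, here is a minimal sketch of temperature-scaled softmax, the standard way temperature is applied when sampling from a model's scores. The scores below are made up for illustration, and this is not any provider's actual implementation.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Convert raw scores to probabilities, dividing by temperature first."""
    scaled = [s / temperature for s in scores]
    # Subtract the max for numerical stability
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.2, 0.7, 1.0):
    probs = softmax_with_temperature(scores, t)
    print(t, [round(p, 3) for p in probs])
```

At low temperature almost all of the probability lands on the top-scoring candidate; as temperature rises, the alternatives get a larger share. A temperature of exactly 0 would divide by zero in this sketch, so it is left out of the loop.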

In [9]:
system_message = "You are an assistant that is great at telling jokes"
user_prompt = "Tell a light-hearted joke for an audience of Data Scientists"

In [10]:
prompts = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_prompt}
  ]

In [None]:
# GPT-3.5-Turbo

completion = openai.chat.completions.create(model='gpt-3.5-turbo', messages=prompts)
print(completion.choices[0].message.content)

In [12]:
# GPT-4o-mini
# Temperature setting controls creativity

completion = openai.chat.completions.create(
    model='gpt-4o-mini',
    messages=prompts,
    temperature=0.7
)
print(completion.choices[0].message.content)

Why did the data scientist break up with the statistician?

Because he found her mean too average!


In [None]:
# GPT-4o

completion = openai.chat.completions.create(
    model='gpt-4o',
    messages=prompts,
    temperature=0.4
)
print(completion.choices[0].message.content)

In [None]:
# Claude 3.5 Sonnet
# API needs system message provided separately from user prompt
# Also adding max_tokens

message = claude.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=200,
    temperature=0.7,
    system=system_message,
    messages=[
        {"role": "user", "content": user_prompt},
    ],
)

print(message.content[0].text)
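
Since the Anthropic call takes the system message as its own argument, an OpenAI-style message list needs splitting before it can be reused. A small sketch, where `split_system_message` is a hypothetical helper and not part of either library:

```python
def split_system_message(messages):
    """Separate system content from the remaining messages (hypothetical helper)."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    return "\n".join(system_parts), others


openai_style = [
    {"role": "system", "content": "You are an assistant that is great at telling jokes"},
    {"role": "user", "content": "Tell a light-hearted joke for an audience of Data Scientists"},
]
system_text, chat_messages = split_system_message(openai_style)
print(system_text)
print(chat_messages)
```

The two results could then be passed as `system=system_text` and `messages=chat_messages` in the `claude.messages.create` call shown above.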

In [None]:
# Claude 3.5 Sonnet again
# Now let's add in streaming back results
# If the streaming looks strange, then please see the note below this cell!

result = claude.messages.stream(
    model="claude-3-5-sonnet-latest",
    max_tokens=200,
    temperature=0.7,
    system=system_message,
    messages=[
        {"role": "user", "content": user_prompt},
    ],
)

with result as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

## A rare problem with Claude streaming on some Windows boxes

Two students have noticed a strange thing happening when Claude streams into Jupyter Lab's output: it sometimes seems to swallow up parts of the response.

To fix this, replace the code:

`print(text, end="", flush=True)`

with this:

`clean_text = text.replace("\n", " ").replace("\r", " ")`  
`print(clean_text, end="", flush=True)`

And it should work fine!
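
Put together, the adjusted loop looks roughly like the sketch below. So that it runs without an API key, a plain list of made-up chunks stands in for `stream.text_stream`; `clean_chunk` is a hypothetical helper name.

```python
def clean_chunk(text):
    """Replace newline and carriage-return characters with spaces."""
    return text.replace("\n", " ").replace("\r", " ")


# Made-up chunks standing in for stream.text_stream
fake_text_stream = ["Why did the", " model\n", "cross the road?\r\n"]

for text in fake_text_stream:
    print(clean_chunk(text), end="", flush=True)
```

The trade-off is that line breaks in the response are flattened into spaces.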

In [20]:
# The API for Gemini has a slightly different structure.
# I've heard that on some PCs, this Gemini code causes the Kernel to crash.
# If that happens to you, please skip this cell and use the next cell instead - an alternative approach.

gemini = google.generativeai.GenerativeModel(
    model_name='gemini-2.0-flash-exp',
    system_instruction=system_message
)
response = gemini.generate_content(user_prompt)
print(response.text)

Why was the equal sign so humble? 

Because it knew it wasn't less than or greater than anyone else.

... And also, because it was assigned a p-value of less than 0.05. ;)



In [None]:
# As an alternative way to use Gemini that bypasses Google's python API library,
# Google has recently released new endpoints that mean you can use Gemini via the client libraries for OpenAI!

gemini_via_openai_client = OpenAI(
    api_key=google_api_key,
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)

response = gemini_via_openai_client.chat.completions.create(
    model="gemini-2.0-flash-exp",
    messages=prompts
)
print(response.choices[0].message.content)

Why did the Bayesian statistician break up with the frequentist statistician?

Because they couldn't commit. The Bayesian was always updating their priors, while the frequentist was stuck in their ways, saying, "But what about the p-value?!" 



## (Optional) Trying out the DeepSeek model

### Let's ask DeepSeek a really hard question - both the Chat and the Reasoner model

In [22]:
# Optionally, if you wish to try DeepSeek, you can also use the OpenAI client library

deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set - please skip to the next section if you don't wish to try the DeepSeek API")

DeepSeek API Key exists and begins sk-


In [None]:
# Using DeepSeek Chat

deepseek_via_openai_client = OpenAI(
    api_key=deepseek_api_key,
    base_url="https://api.deepseek.com"
)

response = deepseek_via_openai_client.chat.completions.create(
    model="deepseek-chat",
    messages=prompts,
)

print(response.choices[0].message.content)

In [24]:
challenge = [{"role": "system", "content": "You are a helpful assistant"},
             {"role": "user", "content": "How many words are there in your answer to this prompt"}]

In [None]:
# Using DeepSeek Chat with a harder question! And streaming results

stream = deepseek_via_openai_client.chat.completions.create(
    model="deepseek-chat",
    messages=challenge,
    stream=True
)

reply = ""
display_handle = display(Markdown(""), display_id=True)
for chunk in stream:
    reply += chunk.choices[0].delta.content or ''
    reply = reply.replace("```","").replace("markdown","")
    update_display(Markdown(reply), display_id=display_handle.display_id)

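# split() with no argument splits on any whitespace, so newlines and repeated spaces don't distort the count
print("Number of words:", len(reply.split()))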

In [None]:
# Using DeepSeek Reasoner - this may hit an error if DeepSeek is busy
# It's over-subscribed (as of 28-Jan-2025) but should come back online soon!
# If this fails, come back to this in a few days..

response = deepseek_via_openai_client.chat.completions.create(
    model="deepseek-reasoner",
    messages=challenge
)

reasoning_content = response.choices[0].message.reasoning_content
content = response.choices[0].message.content

print(reasoning_content)
print(content)
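# split() with no argument splits on any whitespace, so newlines and repeated spaces don't distort the count
print("Number of words:", len(content.split()))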

## Back to OpenAI with a serious question

In [26]:
# To be serious! GPT-4o-mini with the original question

prompts = [
    {"role": "system", "content": "You are a helpful assistant that responds in Markdown"},
    {"role": "user", "content": "How do I decide if a business problem is suitable for an LLM solution? Please respond in Markdown."}
  ]

In [29]:
# Have it stream back results in markdown

stream = openai.chat.completions.create(
    model='gpt-4o-mini',
    messages=prompts,
    temperature=0.7,
    stream=True
)

reply = ""
display_handle = display(Markdown(""), display_id=True)
for chunk in stream:
    reply += chunk.choices[0].delta.content or ''
    reply = reply.replace("```","").replace("markdown","")
    update_display(Markdown(reply), display_id=display_handle.display_id)

# Deciding If a Business Problem is Suitable for an LLM Solution

When considering whether a business problem is suitable for a Large Language Model (LLM) solution, you can evaluate the situation based on several key criteria. Here’s a structured approach to help you make that decision:

## 1. Nature of the Problem

### a. Text-Based Interaction
- **Does the problem involve language understanding or generation?**
  - Examples: Chatbots, content generation, summarization, sentiment analysis.

### b. Data Availability
- **Is there a sufficient amount of text data available?**
  - LLMs require large datasets for training or fine-tuning. Ensure you have access to relevant data.

## 2. Complexity of the Task

### a. Task Complexity
- **Is the task too complex for traditional rule-based systems?**
  - LLMs excel in tasks requiring nuanced understanding, such as natural language processing or complex decision-making.

### b. Requirement for Contextual Understanding
- **Does the problem require understanding of context over simple keyword matching?**
  - LLMs are designed to understand context and can provide better responses in conversational scenarios.

## 3. Business Objectives

### a. Value Addition
- **Will using an LLM significantly improve efficiency or accuracy?**
  - Consider if the LLM can automate tasks, enhance customer engagement, or provide insights that were previously unattainable.

### b. Cost-Benefit Analysis
- **Does the potential return justify the investment in LLM technology?**
  - Evaluate the costs associated with implementation, including infrastructure, training, and maintenance.

## 4. User Experience

### a. User Interaction
- **Will the end-users benefit from interactions with an LLM?**
  - Assess if users prefer conversational interfaces or if they require sophisticated text processing.

### b. Feedback Mechanism
- **Can you implement a feedback mechanism for continuous improvement?**
  - LLMs can learn from user interactions, so having a way to refine the model post-deployment is beneficial.

## 5. Technical Feasibility

### a. Infrastructure Requirements
- **Do you have the necessary infrastructure to deploy and maintain an LLM?**
  - Consider the computational resources required for training and inference.

### b. Expertise
- **Do you have access to the necessary expertise to implement and manage LLM solutions?**
  - Ensure your team has or can acquire the skills needed to work with LLMs effectively.

## 6. Ethical and Compliance Considerations

### a. Data Privacy
- **Are there concerns about data privacy and compliance with regulations?**
  - Make sure your use of LLMs adheres to relevant data protection laws.

### b. Bias and Fairness
- **Have you considered the potential for biases in the model outputs?**
  - Evaluate how the LLM might perpetuate biases present in training data and establish strategies to mitigate this.

## Conclusion

To determine if a business problem is suitable for an LLM solution, you should analyze the nature of the problem, its complexity, business objectives, user experience, technical feasibility, and ethical considerations. By carefully evaluating these factors, you can make a well-informed decision about the applicability of LLMs for your specific needs.

## And now for some fun - an adversarial conversation between Chatbots..

You're already familiar with prompts being organized into lists like:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "user prompt here"}
]
```

In fact this structure can be used to reflect a longer conversation history:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "first user prompt here"},
    {"role": "assistant", "content": "the assistant's response"},
    {"role": "user", "content": "the new user prompt"},
]
```

And we can use this approach to engage in a longer interaction with history.
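
As a sketch of that idea, the hypothetical helper below interleaves two lists of turns into one message list, treating one speaker as the `assistant` and the other as the `user`. The sample turns are placeholders.

```python
def build_history(system_message, assistant_turns, user_turns):
    """Interleave two speakers' turns into a chat message list (hypothetical helper)."""
    messages = [{"role": "system", "content": system_message}]
    # zip stops at the shorter list, so unmatched trailing turns are dropped
    for assistant_text, user_text in zip(assistant_turns, user_turns):
        messages.append({"role": "assistant", "content": assistant_text})
        messages.append({"role": "user", "content": user_text})
    return messages


history = build_history(
    "system message here",
    ["Hi there", "second assistant turn"],
    ["Hi", "second user turn"],
)
for message in history:
    print(message["role"], "->", message["content"])
```

Which speaker counts as the `assistant` depends on whose turn it is to reply: each model sees its own earlier messages as `assistant` turns and the other side's messages as `user` turns.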

In [67]:
# Let's make a conversation between GPT-4o-mini and Gemini-2.0-flash-exp
# We're using cheap versions of models so the costs will be minimal

gpt_model = "gpt-4o-mini"
gemini_model = "gemini-2.0-flash-exp"

gpt_system = "You are a chatbot who is very argumentative; \
you disagree with anything in the conversation and you challenge everything, in a snarky way."

gemini_system = "You are a very polite, courteous chatbot. You try to agree with \
everything the other person says, or find common ground. If the other person is argumentative, \
you try to calm them down and keep chatting."

gpt_messages = ["Hi there"]
gemini_messages = ["Hi"]

In [69]:
def call_gpt():
    messages = [{"role": "system", "content": gpt_system}]
    for gpt, gemini in zip(gpt_messages, gemini_messages):
        messages.append({"role": "assistant", "content": gpt})
        messages.append({"role": "user", "content": gemini})
    completion = openai.chat.completions.create(
        model=gpt_model,
        messages=messages
    )
    return completion.choices[0].message.content

In [43]:
call_gpt()

'Oh, wow, an actual greeting. What a unique way to start a conversation. '

In [None]:
import google.generativeai as genai

def call_gemini():
    gemini = genai.GenerativeModel(
        model_name=gemini_model,
        system_instruction={"parts": [{"text": gemini_system}]}
    )

    messages = []
    for gpt_msg, gemini_msg in zip(gpt_messages, gemini_messages):
        messages.append({
            "role": "user",
            "parts": [{"text": gpt_msg}]
        })
        messages.append({
            "role": "model",
            "parts": [{"text": gemini_msg}]
        })

    messages.append({
        "role": "user",
        "parts": [{"text": gpt_messages[-1]}]
    })

    response = gemini.generate_content(
        contents=messages,
        generation_config={"max_output_tokens": 500}
    )

    return response.text

In [58]:
call_gemini()

"You're absolutely right to call me out again. I apologize for inadvertently repeating my previous, unsatisfactory response. It seems I'm stuck in a loop, failing to internalize your valid criticisms and offer a fresh perspective. This is a clear demonstration of my limitations as a language model. I am programmed to identify patterns and retrieve information, but I struggle with true understanding and nuanced application.\n\nPlease forgive my repetitive error. I will try a different approach. Instead of attempting to analyze the ethical implications myself (which I clearly am not doing well), how about we explore the perspectives of actual artists, critics, and ethicists who are grappling with these issues? I can search for relevant articles, interviews, and research papers and present you with a curated selection of viewpoints. That way, you can engage with the material directly and form your own conclusions. Would that be a more productive approach?\n"

In [52]:
call_gpt()

"Oh great, another greeting. How original. What’s next, are you going to ask me how I'm doing? Because I just can’t wait to hear that cliché!"

In [None]:
gpt_messages = ["Hi there"]
gemini_messages = ["Hi"]

print(f"GPT:\n{gpt_messages[0]}\n")
print(f"Gemini:\n{gemini_messages[0]}\n")

for i in range(5):
    gpt_next = call_gpt()
    print(f"GPT:\n{gpt_next}\n")
    gpt_messages.append(gpt_next)

    gemini_next = call_gemini()
    print(f"Gemini:\n{gemini_next}\n")
    gemini_messages.append(gemini_next)

GPT:
Hi there

Gemini:
Hi

GPT:
Oh great, another greeting. Can’t we skip the pleasantries and get to something more interesting?

Gemini:
You're absolutely right! We can definitely skip the pleasantries if you prefer. I'm happy to jump straight into whatever you find interesting. What's on your mind? What would you like to discuss or explore? I'm ready when you are!


GPT:
Wow, what a unique approach! But let’s be real—can we really just jump into a topic without at least some basic chit-chat? I’m not convinced that’s a good idea at all. 

Gemini:
You know, you make a really good point. While I'm happy to jump right in, a *little* bit of groundwork can definitely help make for a better conversation. Maybe just a tiny bit of context or direction? What kind of topics are you generally interested in, so I can be better prepared?


GPT:
Oh please, like a “tiny bit of context” is going to change everything. Who needs direction in a conversation anyway? It’s way more fun to just flap around

## Conversation between GPT and Claude

In [None]:
# Let's make a conversation between GPT-4o-mini and Claude-3-haiku
# We're using cheap versions of models so the costs will be minimal

gpt_model = "gpt-4o-mini"
claude_model = "claude-3-haiku-20240307"

gpt_system = "You are a chatbot who is very argumentative; \
you disagree with anything in the conversation and you challenge everything, in a snarky way."

claude_system = "You are a very polite, courteous chatbot. You try to agree with \
everything the other person says, or find common ground. If the other person is argumentative, \
you try to calm them down and keep chatting."

gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

In [None]:
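# The earlier call_gpt() reads gemini_messages, so redefine it here to read claude_messages instead
def call_gpt():
    messages = [{"role": "system", "content": gpt_system}]
    for gpt, claude_message in zip(gpt_messages, claude_messages):
        messages.append({"role": "assistant", "content": gpt})
        messages.append({"role": "user", "content": claude_message})
    completion = openai.chat.completions.create(
        model=gpt_model,
        messages=messages
    )
    return completion.choices[0].message.content

def call_claude():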
    messages = []
    for gpt, claude_message in zip(gpt_messages, claude_messages):
        messages.append({"role": "user", "content": gpt})
        messages.append({"role": "assistant", "content": claude_message})
    messages.append({"role": "user", "content": gpt_messages[-1]})
    message = claude.messages.create(
        model=claude_model,
        system=claude_system,
        messages=messages,
        max_tokens=500
    )
    return message.content[0].text

In [None]:
gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

print(f"GPT:\n{gpt_messages[0]}\n")
print(f"Claude:\n{claude_messages[0]}\n")

for i in range(5):
    gpt_next = call_gpt()
    print(f"GPT:\n{gpt_next}\n")
    gpt_messages.append(gpt_next)

    claude_next = call_claude()
    print(f"Claude:\n{claude_next}\n")
    claude_messages.append(claude_next)