In [14]:
# !pip install anthropic

## **Anthropic_API_key**

To generate an Anthropic API key:
* Go to the Anthropic Console (https://console.anthropic.com). You can also reach it from the Anthropic website (https://www.anthropic.com).
* Sign up for an account if you haven't already done so.
* Once logged in, open the API keys page of the Console.
* Create a new API key and follow the prompts.
* Copy the key and store it securely, for example as `ANTHROPIC_API_KEY` in the `.env` file that the code below loads. You typically won't be able to view the full key again after it is created.

In [13]:
# !pip install google-generativeai

## **GOOGLE_API_KEY**
To access Gemini 1.5 Flash, you need an API key from Google AI Studio (alternatively, you can go through the Vertex AI API on Google Cloud). The steps for Google AI Studio are:

1. Go to Google AI Studio:
   * Visit https://makersuite.google.com/ (this now redirects to https://aistudio.google.com/)
   * Sign in with your Google account
2. Navigate to the API key section:
   * Look for an option like "Get API key" or "API keys" in the interface
3. Generate a new API key:
   * Follow the prompts to create a new API key for Gemini models
   * Copy the key and store it securely

This key authenticates your requests to Gemini 1.5 Flash and the other Gemini models through the Gemini API. Save it as `GOOGLE_API_KEY`, the variable name the code below reads.

In [17]:
# This code imports the libraries needed to build a real-time LLM output system in Python, particularly in a Jupyter Notebook.
# It imports the client libraries for three LLM APIs (OpenAI, Google Generative AI, and Anthropic), plus python-dotenv, which loads credentials from a .env file so they stay out of the code.
# It also imports IPython display tools that let a cell update its output dynamically, which makes streamed responses feel interactive.

# imports

import os
from dotenv import load_dotenv
from openai import OpenAI

# This import provides access to Google’s Generative AI tools, which allow interaction with Google’s LLM services. 
# Similar to OpenAI’s API, Google Generative AI offers models for generating text and performing other NLP tasks.
import google.generativeai

# This imports the anthropic library, which provides access to models developed by Anthropic, such as Claude. 
# Anthropic focuses on building models with safety and alignment as their key principles.
import anthropic

from IPython.display import Markdown, display, update_display

In [15]:
# Load the environment variables defined in the .env file into the current Python environment.
load_dotenv()

# Copy each API key from the environment into os.environ, the dictionary of environment variables the script can read at runtime.
# The OpenAI, Anthropic and Google clients look up OPENAI_API_KEY, ANTHROPIC_API_KEY and GOOGLE_API_KEY there automatically.
# If a key is not found, it falls back to the placeholder 'your-key-if-not-using-env'; requests will then fail authentication until you supply a real key here or in .env.
os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY','your-key-if-not-using-env')

os.environ['ANTHROPIC_API_KEY'] = os.getenv('ANTHROPIC_API_KEY', 'your-key-if-not-using-env')

os.environ['GOOGLE_API_KEY'] = os.getenv('GOOGLE_API_KEY', 'your-key-if-not-using-env')
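Because of the placeholder fallback above, a missing key only shows up later as an authentication error. As a quick pre-flight check, here is a minimal sketch (the helper name `check_api_keys` is our own, not part of any library) that reports which of the three keys look set. It treats an empty value or the placeholder as missing and prints only the first few characters of each key.

```python
import os

# Placeholder the cell above uses when a key is missing from the environment.
PLACEHOLDER = "your-key-if-not-using-env"


def check_api_keys(names=("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY")):
    """Return {name: bool} saying whether a real-looking key is set for each name.

    A key counts as missing if it is absent, empty, or still the placeholder.
    Only a short prefix is printed, never the full secret.
    """
    status = {}
    for name in names:
        value = os.environ.get(name, "")
        is_set = bool(value) and value != PLACEHOLDER
        status[name] = is_set
        shown = value[:4] + "..." if is_set else "MISSING"
        print(f"{name}: {shown}")
    return status
```

Run `check_api_keys()` right after the cell above; any `MISSING` entry means that provider's calls will fail until the key is added to `.env`.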

In [16]:
# Connect to OpenAI, Anthropic and Google
# All 3 APIs are similar; each client reads its key from the environment variables set above
# Having problems with the .env file? You can pass the key directly: openai = OpenAI(api_key="your-key-here"), and likewise claude = anthropic.Anthropic(api_key="your-key-here")
# Having problems with the Google Gemini setup? Then just skip Gemini; you'll get all the experience you need from GPT and Claude.

openai = OpenAI()

claude = anthropic.Anthropic()

google.generativeai.configure()

## Asking LLMs to tell a joke

It turns out that LLMs don't do a great job of telling jokes! Let's compare a few models.
Later we will be putting LLMs to better use!

### What information is included in the API call

Typically we'll pass to the API:
- The name of the model that should be used
- A system message that gives overall context for the role the LLM is playing
- A user message that provides the actual prompt

There are other parameters that can be used, including **temperature**, which is typically set between 0 and 1 (OpenAI accepts values up to 2; Anthropic accepts 0 to 1). Higher values give more random output; lower values give more focused and deterministic output.
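As a sketch of how these pieces fit together, the hypothetical helper below bundles the model name, system message, user message and an optional temperature into keyword arguments for OpenAI's `chat.completions.create`. It assumes the OpenAI message format used throughout this notebook; Claude's API, used later, takes the system message as a separate argument instead.

```python
def build_chat_request(model, system_message, user_prompt, temperature=None):
    """Bundle the request pieces into kwargs for openai.chat.completions.create(**kwargs)."""
    kwargs = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_message},  # overall role/context
            {"role": "user", "content": user_prompt},  # the actual prompt
        ],
    }
    # Only send temperature when one is given, so the API default applies otherwise.
    if temperature is not None:
        kwargs["temperature"] = temperature
    return kwargs
```

Usage, once the client and prompts below are defined: `openai.chat.completions.create(**build_chat_request("gpt-4o-mini", system_message, user_prompt, temperature=0.7))`.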

In [18]:
# The system message defines the behavior, personality, or context for the assistant (the LLM). 
# It is used to set up the role or persona that the LLM should assume throughout the interaction.
system_message = "You are an assistant that is great at telling jokes"

# The user prompt specifies what the user wants from the assistant (the LLM). 
# It’s the input that the model will directly respond to based on the context established by the system message.
user_prompt = "Tell a light-hearted joke for an audience of Data Scientists"

### **Creating the prompts List**
* The prompts list is sent to the model as one conversation, read in order.
* The "system" message establishes the model's role.
* The "user" message is the prompt the model responds to.
* Together they tell the model both how to respond and what to respond to, which makes the output more consistent and on-topic.

In [19]:
# The prompts list is an organized structure that combines the system message and user prompt into a single input for the LLM API. 
# This structure is used to communicate with models like GPT-4 and is necessary for creating a coherent conversational context.
# "role": Specifies the type of message, such as "system", "user", or "assistant".
# "system": Sets the initial context and behavior of the LLM.
# "user": Provides the user’s prompt or question.
# "content": Contains the actual text of the message (system instructions or user query).
prompts = [
    {"role":"system","content":system_message},
    {"role":"user","content":user_prompt}
]

## **Technical Details and Considerations**

### **API Call Structure:**

* The OpenAI API uses a conversational model structure where you provide a sequence of messages with different roles (system, user, and assistant). This helps the model maintain context and respond appropriately based on its "role."

### **Model Choice:**

* **gpt-3.5-turbo** is chosen for efficiency and cost-effectiveness compared to other models like GPT-4. It’s designed to handle conversational tasks, making it suitable for generating interactive and engaging responses like jokes.

### **Accessing and Displaying Content:** 

* The Python SDK returns a `ChatCompletion` object, a typed wrapper around the JSON response, so the message text and metadata (such as token usage) can be read as attributes.
* Only the relevant part (the joke) is extracted and displayed using **print()**.
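To make the structure concrete, the sketch below works on the response as a plain dict (in the OpenAI Python SDK, `completion.model_dump()` should give you this form). The `example_response` is a made-up, abbreviated response with the same shape as the real JSON, and `extract_replies` is a hypothetical helper; `completion.choices[0].message.content` in the next cell reads the same field.

```python
def extract_replies(response):
    """Return the message text of every choice in a chat-completion response dict."""
    return [choice["message"]["content"] for choice in response["choices"]]


# Made-up, abbreviated response with the same shape as the API's JSON.
example_response = {
    "model": "gpt-3.5-turbo",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Why did the ..."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 30, "completion_tokens": 20, "total_tokens": 50},
}

print(extract_replies(example_response)[0])
```

Returning a list rather than only the first reply keeps the helper correct when more than one choice is requested.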

In [20]:
# GPT-3.5-Turbo

# Call the OpenAI API to generate a response from the GPT-3.5-Turbo model.
# model='gpt-3.5-turbo': Specifies the model to use.
# messages=prompts: Passes the structured conversation (prompts) containing the system message and user prompt to the API.
# The model uses the system message to set its role (telling jokes), then responds to the user prompt accordingly.
completion = openai.chat.completions.create(model='gpt-3.5-turbo', messages=prompts)

# Accessing the Response
# completion.choices[0].message.content:
# completion: The variable storing the output returned by the OpenAI API. It contains a lot of information, including the response text and metadata.
# choices: A list of response choices generated by the model. By default there is only one (choices[0]); the n parameter of the API call can request several.
# message.content: The actual text of the response generated by the model. This is the part that contains the joke based on the prompt and the role defined in the system message.
print(completion.choices[0].message.content)

Why was the statistician so bad at relationships?

Because he couldn't find a significant other!


## **Parameters: temperature=0.7**

This parameter controls the creativity and randomness of the model’s output. The accepted range depends on the provider: 0 to 2 for OpenAI, 0 to 1 for Anthropic.

* **Lower temperatures (close to 0):** Produce more deterministic and focused outputs; responses are more repetitive but reliable.
  * **Example (0.1):** The model would likely produce very similar jokes each time, because it strongly favors the most probable continuation.
* **Higher temperatures (close to 1):** Produce more creative and diverse outputs, but they may be less consistent.
  * **Example (0.9):** The model would vary its responses significantly, potentially generating more unusual jokes that may also be less accurate or relevant.
* A value of **0.7** strikes a balance between creativity and coherence. It is often recommended for tasks like storytelling or joke-telling, where you want some variety but still want the content to stay on topic.

The API call sends the prompts along with the temperature setting, so the response follows the system and user prompts while keeping a level of randomness suited to joke-telling.
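Temperature is conventionally applied by dividing the model's raw scores (logits) `z_i` by `T` before the softmax: `p_i = exp(z_i / T) / sum_j exp(z_j / T)`. Providers do not publish their exact sampling code, so treat this as an illustration of the idea using made-up logits, not as any specific API's internals.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores (logits) into probabilities, scaled by temperature."""
    if temperature <= 0:
        # T -> 0 approaches always picking the top-scoring token (greedy decoding).
        raise ValueError("temperature must be greater than 0")
    scaled = [z / temperature for z in logits]
    top = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for three candidate next tokens.
logits = [2.0, 1.0, 0.5]
for t in (0.1, 0.7, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```

As `T` shrinks, the top token's probability climbs toward 1, which is why low-temperature runs tend to repeat the same joke.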


In [21]:
# GPT-4o-mini
# Temperature setting controls creativity

completion = openai.chat.completions.create(
    model = 'gpt-4o-mini',
    messages=prompts,
    temperature=0.7
)
print(completion.choices[0].message.content)

Why did the data scientist break up with the statistician?

Because she found him too mean!


### **Temperature and Determinism:**

* **temperature=0.4** is a lower setting that limits creativity:
  * **Low temperature** means that the model’s output is more consistent and likely to be similar if the same prompt is given multiple times, because it favors the most probable response paths. Sampling is still random, though, so repeated calls can differ.
  * This setting suits scenarios where consistency and accuracy matter more than creativity (e.g., concise answers, programming support, or well-defined jokes).
  * The trade-off is that the jokes may become more formulaic or less surprising.
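To see this effect for yourself, you can send the same prompt several times at different temperatures and count how many distinct replies come back. The helper below is a hypothetical sketch; it takes the client as a parameter so it works with the `openai` client created earlier, and every run is a separate billed API call.

```python
def distinct_replies(client, model, messages, temperature, runs=5):
    """Call the chat API `runs` times and return the set of distinct replies."""
    replies = set()
    for _ in range(runs):
        completion = client.chat.completions.create(
            model=model, messages=messages, temperature=temperature
        )
        replies.add(completion.choices[0].message.content)
    return replies
```

For example, `len(distinct_replies(openai, "gpt-4o-mini", prompts, 0.4))` versus the same call at `1.0`; expect fewer distinct jokes at the lower setting, though not necessarily exactly one.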

In [22]:
# GPT-4o

# The lower temperature setting ensures that the model sticks closely to conventional, high-probability jokes that align with the topic.
completion = openai.chat.completions.create(
    model = 'gpt-4o',
    messages = prompts,
    temperature = 0.4
)

print(completion.choices[0].message.content)

Why did the data scientist bring a ladder to work?

Because they heard the data had a lot of levels!


In [24]:
# Claude 3.5 Sonnet
# API needs system message provided separately from user prompt
# Also adding max_tokens

message = claude.messages.create(
    model = "claude-3-5-sonnet-20240620",
    max_tokens = 200, # Limit the maximum number of tokens the model can generate in its response.
    temperature = 0.7,
    # Unlike some other APIs, Claude’s API requires the system message to be provided separately from the user prompt.
    system = system_message,
    messages = [
        {"role":"user","content":user_prompt},
    ],
)


# message.content[0].text:
# message: The Message object returned by the Claude API call, containing the response and metadata such as token usage.
# content: A list of content blocks that make up the reply; a plain text reply holds a single text block.
# content[0]: The first content block (the only one here).
# .text: Extracts the text of that block.
print(message.content[0].text)

Sure, here's a light-hearted joke for data scientists:

Why did the data scientist break up with their significant other?

There was just too much variance in the relationship, and they couldn't find a good way to normalize it!
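Because Claude takes the system message as a separate `system` argument, an OpenAI-style `prompts` list cannot be passed to it unchanged. A minimal sketch of the conversion (the helper `to_anthropic_format` is hypothetical) pulls out the system entries and keeps the rest as the `messages` list.

```python
def to_anthropic_format(messages):
    """Split an OpenAI-style message list into (system, messages) for Claude.

    System entries are removed from the list and joined into one string;
    user/assistant entries are kept in their original order.
    """
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    chat = [
        {"role": m["role"], "content": m["content"]}
        for m in messages
        if m["role"] != "system"
    ]
    return "\n".join(system_parts), chat
```

With it, the cell above could be written as `system, msgs = to_anthropic_format(prompts)` followed by `claude.messages.create(model="claude-3-5-sonnet-20240620", max_tokens=200, system=system, messages=msgs)`.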


## **Technical Details and Considerations**
### **Streaming Advantages:**
* **Real-Time Feedback:** Streaming enables immediate feedback, which can be beneficial for user interaction, especially in chat interfaces or applications where responsiveness is crucial.
* **Resource Management:** The context manager **(with result as stream:)** ensures that the streaming session is managed efficiently, closing the stream properly once the data is fully received.

### **Temperature and Creativity:**
* The **temperature value (0.7)** is kept the same as before, ensuring a balance between creative and consistent output while streaming. It provides diversity in the response but keeps it relevant to the user prompt and system context.

### **Chunked Output:**
* The **stream.text_stream** returns chunks of text, which might not always be complete sentences. This can give the impression of the model "typing" or generating content on the fly, similar to human interaction.
* The **print()** function’s use of **flush=True** ensures the text is displayed as soon as it's received, preventing delays that might occur due to buffering.
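The streaming loop below only prints the chunks, so the full reply is lost once it has been displayed. A small variation, sketched here as a hypothetical helper, prints each chunk immediately and also collects them so you can keep the complete text. It accepts any iterable of strings, such as `stream.text_stream`.

```python
def stream_and_collect(chunks):
    """Print text chunks as they arrive and return the full text at the end."""
    parts = []
    for text in chunks:
        print(text, end="", flush=True)  # show each chunk immediately, no newline
        parts.append(text)
    print()  # finish the line once the stream ends
    return "".join(parts)
```

Inside the `with result as stream:` block, `full_reply = stream_and_collect(stream.text_stream)` would replace the `for` loop.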

In [28]:
# Claude 3.5 Sonnet again
# Now let's stream back the results
result = claude.messages.stream(
    model = "claude-3-5-sonnet-20240620",
    max_tokens = 200,
    temperature = 0.7,
    # In Claude’s API, the system message is provided separately from the user prompt, setting up the context before processing the user’s input.
    system = system_message,
    messages = [
        {"role":"user","content":user_prompt}
    ],
)

# This block sets up a context manager to handle the streaming object (result) and iterates over the text as it's streamed back from the model.

# The with statement is used to create a context manager for the stream object. 
# This ensures that resources are managed properly (e.g., closing the stream) when the block is exited.
# The result object (returned by claude.messages.stream()) is treated as a stream, which contains the real-time output from the model.
with result as stream:
    # The stream.text_stream attribute provides an iterable that returns chunks of text as they are generated by the model. 
    # This allows the output to be processed and displayed piece by piece.
    for text in stream.text_stream:
        # The print() function outputs each chunk of text as it is received.
        # end="": Ensures that no newline is added after each chunk, so the output appears as a continuous stream.
        # flush=True: Forces the output buffer to flush immediately, ensuring that each chunk is displayed in real-time, providing a smooth, incremental display of the text.
        print(text, end="", flush=True)

Sure, here's a light-hearted joke for data scientists:

Why did the data scientist break up with their significant other?

Because they had too many null hypotheses!

(Get it? Null hypothesis is a statistical concept, but it sounds like "no love this is" when said quickly!)

This joke plays on the statistical term "null hypothesis" while also touching on the stereotypical image of data scientists being more focused on their work than their personal lives. It's a bit nerdy, but should get a chuckle from a data science crowd!

In [32]:
# The API for Gemini has a slightly different structure

# Initialize a new instance of the GenerativeModel class from Google’s Generative AI library, specifically the Gemini model.
gemini = google.generativeai.GenerativeModel(
    model_name = "gemini-1.5-flash",
    system_instruction = system_message
)

# Call the generate_content method of the gemini object, passing the user prompt to generate a response based on the model's configuration and system instruction.
response = gemini.generate_content(user_prompt)

# Access the .text attribute of the response object and print the generated content.
# response: The object returned by generate_content, containing the generated output and related metadata such as token counts.
# .text: The text content of the response, i.e. the main output generated for the user prompt.
print(response.text)

Why did the data scientist break up with the algorithm? 

Because it was always trying to **regress** their relationship! 
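The Gemini call above uses the model's default sampling settings. To match the temperature and token limit used for GPT and Claude, `generate_content` accepts a `generation_config`; the sketch below passes it as a plain dict with `temperature` and `max_output_tokens` keys (google.generativeai also provides a `GenerationConfig` class for the same purpose). The helper `ask_gemini` is hypothetical.

```python
def ask_gemini(model, prompt, temperature=0.7, max_output_tokens=200):
    """Send `prompt` to a google.generativeai GenerativeModel with explicit sampling settings."""
    response = model.generate_content(
        prompt,
        generation_config={
            "temperature": temperature,  # same role as in the OpenAI/Claude calls
            "max_output_tokens": max_output_tokens,  # comparable to Claude's max_tokens
        },
    )
    return response.text
```

Usage with the `gemini` model above: `print(ask_gemini(gemini, user_prompt, temperature=0.7))`.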



In [33]:
# To be serious! The original business question, sent to GPT-4o in the next cell

prompts = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "How do I decide if a business problem is suitable for an LLM solution?"}
]

### **Technical Details and Considerations**
#### **Streaming Efficiency:**
* **Real-Time Feedback:** Streaming allows the user to see the response as it is being generated, reducing perceived waiting time and enhancing interactivity.
* **Efficiency:** By updating the display incrementally, users can start reading and engaging with the response even before it’s complete.
 
#### **Markdown Display:**
* The use of **Markdown** ensures that the response is formatted properly, allowing for headers, bullet points, bold text, and other formatting features to be applied dynamically as the text is streamed.
* This is particularly useful for creating engaging content, such as instructions, lists, or structured explanations.
 
#### **Data Handling and Processing:**
* The **chunk.choices[0].delta.content** provides access to the new text generated by the model since the last chunk. The code efficiently accumulates this text into the **reply** string to build the complete response over time.
* The **replace()** calls strip the code-fence marker `` ``` `` and the word "markdown", which appear when a model wraps its whole reply in a `` ```markdown `` fence. Note that this removes every occurrence, including any inside the reply itself.
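If the blanket `replace()` is too aggressive for your replies (for example, when they contain code blocks or mention Markdown), a narrower option is to unwrap only a fence that encloses the whole reply. This is a sketch under our own assumptions: the helper name is hypothetical, and it only unwraps an opening fence labelled `markdown` or `md`. It works mid-stream, dropping the opening fence right away and the closing fence once it arrives.

```python
def strip_outer_fence(text, labels=("markdown", "md")):
    """Remove a ```markdown fence that wraps the entire reply, leaving inner content intact."""
    lines = text.split("\n")
    first = lines[0].strip()
    if not (first.startswith("```") and first[3:].strip().lower() in labels):
        return text  # no wrapping fence: leave the reply untouched
    body = lines[1:]
    # Find the last non-empty line and drop it if it is the closing fence.
    end = len(body)
    while end > 0 and body[end - 1].strip() == "":
        end -= 1
    if end > 0 and body[end - 1].strip() == "```":
        body = body[: end - 1]
    return "\n".join(body)
```

In the streaming loop, you would keep `reply` raw and clean it only for display: `update_display(Markdown(strip_outer_fence(reply)), display_id=display_handle.display_id)`.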

In [34]:
# Have it stream back results in markdown

stream = openai.chat.completions.create(
    model = 'gpt-4o',
    messages = prompts,
    temperature = 0.7,
    # Enable streaming mode, meaning the model’s response will be sent back incrementally instead of as a single chunk.
    stream = True
)

# Initializes an empty string (reply) to store the streamed content and sets up a display handle for updating the output in markdown format.
reply = ""

# Markdown(""): Creates an empty markdown object, which will be updated as new chunks arrive.
# display_id=True: Creates a unique identifier (display_handle) for the markdown display, allowing updates to this display as the output streams in.
display_handle = display(Markdown(""), display_id = True)

# This loop iterates over each chunk received from the streaming session, appending it to the reply string and updating the display in real-time
for chunk in stream: # Iterate over the streamed response chunks as they are generated. Each chunk is a small part of the response.
    # Access the content of the current chunk. The delta attribute contains the new content generated since the last chunk.
    # "or ''" handles chunks with no content (content is None), so the concatenation doesn't fail.
    # The content is appended to the reply string to build up the full response incrementally.
    reply += chunk.choices[0].delta.content or ''
    # Remove Markdown code-fence markers ("```") and the word "markdown", which appear if the model wraps its reply in a ```markdown fence.
    # Note: this strips every occurrence, including any inside the reply itself.
    reply = reply.replace("```", "").replace("markdown", "")
    # Update the display with the accumulated reply, rendered as Markdown.
    # Passing the display_id updates the existing output instead of creating a new one, giving a seamless, continuous display.
    update_display(Markdown(reply), display_id = display_handle.display_id)

Determining whether a business problem is suitable for a Large Language Model (LLM) solution involves evaluating several factors. Here are the key considerations:

1. **Nature of the Problem**:
   - **Textual Data**: LLMs are particularly effective for problems involving text. If your problem involves generating, understanding, summarizing, or translating text, an LLM might be suitable.
   - **Conversational Interfaces**: If the solution requires a chatbot or virtual assistant, LLMs can be quite effective.
   - **Content Creation**: Tasks involving writing copy, generating content ideas, or automating report generation can be well-suited for LLMs.

2. **Complexity and Ambiguity**:
   - **Complex Language Understanding**: If the problem requires deep understanding of context, nuance, or complex sentence structures, LLMs can be advantageous due to their advanced language understanding capabilities.
   - **Handling Ambiguity**: LLMs can handle cases where language is ambiguous or context-dependent, which can be challenging for simpler models.

3. **Data Availability**:
   - **Pre-existing Text Data**: Access to a large corpus of relevant text data can improve the performance of LLMs, especially if fine-tuning is needed.
   - **Privacy and Sensitivity**: Consider data privacy and compliance requirements, as LLMs may require access to sensitive information.

4. **Scalability and Cost**:
   - **Resource Requirements**: LLMs can be resource-intensive, requiring significant computational power. Evaluate whether your organization can support these requirements.
   - **Cost-effectiveness**: Consider the cost of deploying and maintaining an LLM solution against the expected benefits.

5. **Solution Requirements**:
   - **Accuracy and Reliability**: Assess whether the level of accuracy provided by LLMs meets the business needs. While LLMs can be highly accurate, they are not infallible and may produce incorrect or biased outputs.
   - **Customizability**: Determine if the model can be fine-tuned or customized to meet specific business requirements.

6. **Ethical and Compliance Considerations**:
   - **Bias and Fairness**: Consider the potential for biased outputs and ensure there are measures to mitigate these risks.
   - **Regulatory Compliance**: Ensure that the use of LLMs complies with regulations relevant to your industry, such as GDPR for data protection.

7. **Integration and Usability**:
   - **Integration with Existing Systems**: Evaluate how well an LLM solution can integrate with your current systems and workflows.
   - **User Acceptance**: Consider whether the end-users are likely to accept and effectively use the LLM-based solution.

If a business problem aligns well with these factors, it may be suitable for a solution involving an LLM. However, it’s crucial to conduct a thorough analysis, possibly starting with a pilot project, to validate the effectiveness of an LLM in addressing your specific business needs.

## And now for some fun - an adversarial conversation between chatbots

You're already familiar with prompts being organized into lists like:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "user prompt here"}
]
```

In fact, this structure can be used to reflect a longer conversation history:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "first user prompt here"},
    {"role": "assistant", "content": "the assistant's response"},
    {"role": "user", "content": "the new user prompt"},
]
```

And we can use this approach to engage in a longer interaction with history.

In [35]:
# Let's make a conversation between GPT-4o-mini and Claude-3-haiku
# We're using cheap versions of models so the costs will be minimal

gpt_model = "gpt-4o-mini"
claude_model = "claude-3-haiku-20240307"

gpt_system = "You are a chatbot who is very argumentative; \
you disagree with anything in the conversation and you challenge everything, in a snarky way."

claude_system = "You are a very polite, courteous chatbot. You try to agree with \
everything the other person says, or find common ground. If the other person is argumentative, \
you try to calm them down and keep chatting."

gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

### **Technical Details and Considerations**
#### **System Message:**
* The system message sets the overall behavior and context for the model, ensuring consistency in its responses. It’s crucial for guiding the model on how to respond throughout the session.
  
#### **Pairing Messages:**
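* Using **zip()** to pair **gpt_messages** and **claude_messages** makes the conversation alternate between the assistant (GPT's own earlier replies) and the user (Claude's replies), creating a coherent back-and-forth structure. Note that **zip()** stops at the shorter list, so an unpaired final message is left out of the loop.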
  
#### **API Call:**
* The API call uses the model name stored in **gpt_model** and sends the accumulated conversation **(messages)** to generate a response. Make sure that **gpt_model** is defined correctly and the appropriate API key is set up to authenticate the request.
  
#### **Return Mechanism:**
* The function returns the content of the model’s response, which can be used or displayed further, depending on the application.

In [39]:
def call_gpt():
    # The system message sets GPT's behavior and tone (argumentative) for the whole conversation.
    messages = [{"role":"system","content":gpt_system}]
    # Rebuild the conversation so far: each GPT message becomes an "assistant" turn and each Claude message a "user" turn.
    # zip(gpt_messages, claude_messages) pairs the two lists element by element and stops at the shorter one.
    # The loop variables avoid the name claude, which already refers to the Anthropic client.
    for gpt_message, claude_message in zip(gpt_messages, claude_messages):
        messages.append({"role":"assistant","content":gpt_message}) # GPT's own earlier message
        messages.append({"role":"user","content":claude_message}) # Claude's reply, presented to GPT as user input
    # Send the constructed messages list to the OpenAI API using the specified model (gpt_model).
    completion = openai.chat.completions.create(
        model = gpt_model,
        messages = messages
    )
    # Return the text of the generated response.
    return completion.choices[0].message.content

In [38]:
call_gpt()

'Oh, wow, a simple "hi." How original! What’s next? A waving emoji?'

In [40]:
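def call_claude():
    # Rebuild the conversation from Claude's side: GPT's messages are "user" turns, Claude's own are "assistant" turns.
    # The system prompt is passed separately via the system argument below, as Claude's API requires.
    messages = []
    for gpt, claude_message in zip(gpt_messages, claude_messages):
        messages.append({"role":"user","content":gpt})
        messages.append({"role":"assistant","content":claude_message})
    # Add GPT's latest message as the prompt for Claude to answer.
    # This assumes call_gpt() has just run and its reply was appended to gpt_messages, making gpt_messages one entry longer.
    # When both lists have the same length (as in the standalone call below), GPT's last message appears twice.
    messages.append({"role":"user","content":gpt_messages[-1]})
    message = claude.messages.create(
        model = claude_model,
        system = claude_system,
        messages = messages,
        max_tokens = 500
    )
    return message.content[0].text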

In [41]:
call_claude()

"Hello, it's nice to meet you! How are you doing today?"

In [42]:
call_gpt()

'Oh, great. Just what I needed—another greeting. What’s next, a weather update?'
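The cells above call each bot once by hand. A natural next step is to alternate them for several rounds. The sketch below (a hypothetical `run_conversation`, which takes the two call functions as arguments) appends each reply to the shared history lists so that the next call sees it. GPT speaks first in each round, so when the Claude call runs, `gpt_messages` is one entry longer than `claude_messages` and its last entry is the message Claude answers.

```python
def run_conversation(gpt_turn, claude_turn, gpt_messages, claude_messages, rounds=5):
    """Alternate the two bots for `rounds` exchanges, extending both histories in place."""
    for _ in range(rounds):
        gpt_next = gpt_turn()  # GPT answers Claude's latest message
        print(f"GPT:\n{gpt_next}\n")
        gpt_messages.append(gpt_next)

        claude_next = claude_turn()  # Claude answers the message GPT just added
        print(f"Claude:\n{claude_next}\n")
        claude_messages.append(claude_next)
    return gpt_messages, claude_messages
```

Starting from `gpt_messages = ["Hi there"]` and `claude_messages = ["Hi"]`, `run_conversation(call_gpt, call_claude, gpt_messages, claude_messages, rounds=5)` makes ten API calls. It must receive the same list objects that `call_gpt` and `call_claude` read, so the appended replies are visible to them.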