# Building Solutions with LLMs and Retrieval-Augmented Generation (RAG): Introduction

In this workshop we will use notebooks and Python scripts to interactively learn about Large Language Models and RAG.

Large Language Models (LLMs) are a type of machine learning model designed to understand and generate human language. They are trained on massive text datasets to predict the text that follows a given prompt, learning patterns, structures, and relationships in language along the way. They can be used to generate text, answer questions, and more.


### Getting started with Jupyter Notebooks
Notebooks are one of the most important tools in the toolbelt of a data scientist or ML engineer.
They are great for interacting with code and visualizing results.
The file you are interacting with is a Jupyter notebook. In Jupyter, a cell contains either text or code.
Run a code cell by pressing Ctrl+Enter.



# Task 1
Interact with the cell below and run it multiple times to see the results.

In [6]:
# define `a` if it does not exist yet
if 'a' not in globals():
    a = 42

a = a + 1

# press Ctrl+Enter to run the cell, and do that multiple times
# notice how the variable `a` persists between runs: the kernel keeps it in memory
# you can run full programs in a notebook.

# Task 1.2 
# To restart the kernel of the notebook you can use the restart button in the top right corner of the screen
# Do that and run the cell again to see that the value of a goes back to 43

a

48

## MistralAI

In this workshop, we will be using Mistral language models, which are similar in concept to OpenAI's ChatGPT and Anthropic's Claude.

To start working with Mistral, you first need to install the library. You can install any library in a notebook like the example below:

In [7]:
!pip install mistralai



### Getting access
LLMs are usually big and use a lot of computing resources, so the most common way to use them is to run them in the cloud via network calls.
For this workshop we will use the cloud version, so we don't need to download big models.
If you are interested in running LLMs locally, check out [ollama](https://github.com/ollama/ollama), arguably the most advanced project for doing so.


#### API key
The second step is to get a Mistral API key. You can find some API keys we prepared for this workshop in this [sheet](https://docs.google.com/spreadsheets/d/1ZwTpkG6OOuVrOx8nzPmgai_7Hwpo8Kun7yZrOmg_5K4/edit?gid=0#gid=0). Pick a key (please write your name next to it in the sheet so that people know it is taken) and write it to the .env file using the cell in Task 2 below.


# Task 2

Use the cell below to write your API key into the file so you can use Mistral.
We write the variable into a file named .env. In Python projects, .env files are a common convention for storing information such as API keys and passwords, which should not live alongside the code for security reasons.

In [2]:
# write file .env

mistral_api_key = 'KEY PLACEHOLDER - REPLACE WITH YOUR KEY'
with open('../.env', 'w') as f:
    f.write(f'MISTRAL_API_KEY={mistral_api_key}')

# make sure to delete the API key from this cell before committing the file, so you don't save your key in the repo, which is a security risk.

## Testing if it worked
Now run the cell below 

In [None]:
# we now reload the configuration file and it should show your key
from dotenv import load_dotenv, find_dotenv
# locate the .env file and load its variables into the environment
env_file = find_dotenv()
print(f"Loading environment variables from {env_file}")
load_dotenv(env_file)

import os
env = os.getenv('MISTRAL_API_KEY')

# you should see your key here, not None
print("The key is: ", env)
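Printing the full key is fine for a quick check, but it leaves the secret in the saved notebook output. A minimal sketch of a safer check follows; `mask_secret` is a hypothetical helper written for this example, not part of `dotenv` or `mistralai`, and the key shown is a made-up placeholder.

```python
def mask_secret(secret, visible=4):
    """Hide all but the last `visible` characters of a secret string."""
    if not secret:
        # Covers both None (variable missing) and an empty string
        return "<not set>"
    if len(secret) <= visible:
        # Too short to reveal anything safely
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


# Made-up placeholder key, for illustration only
example_key = "abcd1234efgh5678"
print("The key is:", mask_secret(example_key))
```

In the notebook you could call `mask_secret(os.getenv('MISTRAL_API_KEY'))` to confirm the key loaded without displaying it in full.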



## Running mistral
Let's run the code below to import Mistral and initialize the Mistral client: 

In [2]:
import os
from mistralai import Mistral

# Retrieve the Mistral API key from the environment variables
mistral_api_key = os.getenv('MISTRAL_API_KEY')

# Initialize the Mistral client with the API key
mistral_client = Mistral(api_key=mistral_api_key)

# The model below is the specific model we want to use
model_name = "mistral-small-latest"

# The code below defines a function `call_mistral_model` that sends a message to a Mistral model and returns the model's response text.
def call_mistral_model(message):
    response = mistral_client.chat.complete(
        model = model_name,
        messages = [
            {
                "role": "user",
                "content": message,
            }
            ]
        )
    # Extract only the text from the response
    response_text = response.choices[0].message.content
    return response_text

In [3]:
# Let's test it out
response = call_mistral_model("hello! What is your name?")
# Print the response from the Mistral model
print(response)

## note how the result resembles natural language. It's the exact same concept as ChatGPT.
## feel free to play around with the prompt and see how the results change.

Hello! You can call me Assistant. How can I help you today?


### Rate Limiting

When using APIs, you might encounter rate limiting. Rate limiting is a mechanism that controls how many requests a user can make in a given time frame. It prevents abuse, so that one user cannot degrade the service for everyone else.
However, rate limiting can be annoying because it interrupts your workflow and forces you to wait before making more requests.

To make things easier, we've implemented a LargeLanguageModel class (see usage example below) that automatically adds sleep intervals between requests to avoid exceeding the rate limit. We will use the LargeLanguageModel class moving forward to make calls to Mistral AI. This class has the same logic as the examples we showed above but includes a mechanism to counter the rate limiting issue. 

You don't need to understand all the code in the class to use it. If you are curious, have a look at the class by Ctrl+clicking on LargeLanguageModel.
There are libraries that deal with rate limiting out of the box, like [LangChain](https://github.com/hwchase17/langchain) (more on it later).
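To give a feel for what "sleeping between requests" can look like, here is a minimal sketch of retrying with exponential backoff. It is not the code of `LargeLanguageModel`; `RateLimitError`, `call_with_retry`, and `flaky_call` are hypothetical names made up for this illustration, and the fake call stands in for a real API request.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 'rate limit exceeded' error."""


def call_with_retry(func, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call func(); on a rate-limit error wait 1s, 2s, 4s, ... and try again."""
    for attempt in range(max_retries):
        try:
            return func()
        except RateLimitError:
            if attempt == max_retries - 1:
                # Out of retries: let the caller see the error
                raise
            delay = base_delay * 2 ** attempt
            print(f"Rate limited, waiting {delay} seconds before retrying")
            sleep(delay)


# Fake API call that is rate limited twice and then succeeds
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("Status 429")
    return "Hello!"


# Record the waits in a list instead of really sleeping, so the demo runs instantly
waits = []
result = call_with_retry(flaky_call, sleep=waits.append)
print(result, waits)
```

Doubling the delay after each failure gives the service progressively more time to recover, which is why this pattern is common for rate-limited APIs.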

In [17]:
from chat_solution.llm import LargeLanguageModel

# Initialize an instance of the LargeLanguageModel class
llm = LargeLanguageModel()

# Make a call to Mistral using the LargeLanguageModel class
# This class includes logic to counteract rate limiting by adding appropriate sleep intervals between requests
response = llm.call("hello! What is your name?")

# Print the response from the Mistral model
print(response)

Hello! You can call me Assistant. How can I help you today?


## Exploring the LLM
Now that we have seen how to make a basic call to the Mistral model using the `LargeLanguageModel` class, let's try some more prompts to see how the model responds to different types of queries.

# Task 3: Exploring and Modifying Prompts
Below are some example use cases of how to use an LLM such as Mistral. Play around with the prompts and see the results. Modify the prompts to see how the model's responses change. This will help you understand how to craft effective prompts and get the desired output from the model.

Try to:
- Ask different types of questions
- Change the text for summarization or extraction (see examples 2 and 3 below)
- Alter the style of the response
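One simple way to alter the style is to put the style instruction into the prompt itself. The sketch below only builds the prompt string; `build_styled_prompt` is a hypothetical helper for this example, and the wording of the instruction is just one option among many.

```python
def build_styled_prompt(question, style):
    """Prefix a question with an instruction describing the desired style."""
    return (
        f"Answer the following question in the style of {style}.\n\n"
        f"Question: {question}"
    )


prompt = build_styled_prompt(
    "Can you tell me about coding school 42Berlin?", "a pirate"
)
print(prompt)
# In the notebook, send it to the model with: response = llm.call(prompt)
```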

#### Example 1: Asking for Information

In [18]:
# Example: Asking for information
prompt = "Can you tell me about coding school 42Berlin?"
response = llm.call(prompt)
print(response)

Certainly! 42Berlin is a coding school that follows the educational model pioneered by École 42 in France. Here are some key features and aspects of 42Berlin:

### Educational Model
- **Peer-to-Peer Learning**: The school emphasizes collaborative learning, where students help and teach each other. This fosters a community-driven environment and encourages a deep understanding of concepts.
- **Project-Based**: The curriculum is heavily project-based, with students working on various coding and programming projects to apply what they've learned.
- **No Traditional Classes or Teachers**: There are no formal lectures or traditional teachers. Instead, students learn through hands-on projects and interactive problem-solving.

### Admission Process
- **Piscine (Pool)**: The admission process involves a rigorous, intensive selection phase called the "Piscine" (Pool), which typically lasts several weeks. During this time, applicants work on coding challenges and projects to demonstrate their ap

#### Example 2: Summarizing a Given Text

In [19]:
# Change the text into something else to see the results
text_to_summarize = (
    """
    42Berlin is a non-profit coding school offering software engineering education completely tuition free. 
    By making tech education more accessible and inclusive, they empower the next generation of coders.
    Founded in 2021 and based in central Neukölln, we train our students up to the equivalent of Master’s level 
    and implement peer-learning methodologies that give autonomy to each student.
    """

)
prompt = f"Summarize the following text in one brief sentence: {text_to_summarize}"
response = llm.call(prompt)
print(response)

42Berlin is a tuition-free, non-profit coding school empowering the next generation of software engineers through inclusive, peer-learning education, aiming for Master's level competency.


#### Example 3: Extracting Information from a Given Text

In [20]:
# Change the text into something else to see the results
text_to_extract_from = (
    """
    42Berlin is a non-profit coding school offering software engineering education completely tuition free. 
    By making tech education more accessible and inclusive, they empower the next generation of coders.
    Founded in 2021 and based in central Neukölln, we train our students up to the equivalent of Master’s level 
    and implement peer-learning methodologies that give autonomy to each student.
    """

)
prompt = f"Extract the year 42Berlin was founded from the following text: {text_to_extract_from}"
response = llm.call(prompt)
print(response)

The year 42Berlin was founded is 2021.



# Task 3.1: Doing maths -> When will the LLM make a mistake?

LLMs are trained on natural-language text, so they are not necessarily good at maths. We can be confident that the model will fail to produce the correct result if the question is complex enough. In the cell below we increase the complexity of the task with every run. Run it until you see the model's answer diverge from the correct one.

How many iterations did it take to make it break?

In [23]:

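# (re)initialise both counters if either is missing, e.g. when `a` still exists from Task 1 but `b` does not
if 'a' not in globals() or 'b' not in globals():
    a = 1
    b = 1
a = a + 1
b = b + 1

response = llm.call(f"What is the result of {a} ** {b}?")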

print("Correct result: ", a**b)
print("Mistral result: ", response)

Correct result:  827240261886336764177
Mistral result:  The result of 17 raised to the power of 17 (17^17) is a very large number. Here it is:

17^17 = 4,299,816,961,406,251,776,668,926,984,275,280,160,745,629,199,685,440,162,560,608,000,000

This number has 53 digits.
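Comparing long numbers by eye is error-prone, so the check can be automated. The following is a rough sketch, assuming the model writes its answer with plain digits and optional comma separators; `extract_integers` and `contains_correct_result` are hypothetical helpers, and the sample text is the formula line from the response above.

```python
import re


def extract_integers(text):
    """Return all integers found in the text, accepting 1,234,567-style commas."""
    matches = re.findall(r"\d[\d,]*", text)
    return [int(m.replace(",", "")) for m in matches]


def contains_correct_result(response_text, expected):
    """True if the expected integer appears anywhere in the response."""
    return expected in extract_integers(response_text)


# The formula line from the Mistral response shown above
sample_response = (
    "17^17 = 4,299,816,961,406,251,776,668,926,984,275,280,160,"
    "745,629,199,685,440,162,560,608,000,000"
)

# Python integers have arbitrary precision, so 17 ** 17 is computed exactly
print("Model was correct:", contains_correct_result(sample_response, 17 ** 17))
```

In the notebook, `contains_correct_result(response, a ** b)` would tell you at which iteration the model first goes wrong.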


## Hallucination
LLMs sometimes generate responses that sound plausible but are factually incorrect or nonsensical. This phenomenon is known as "hallucination".
Hallucination occurs because the model generates text from patterns in its training data rather than by looking up verified facts.
An LLM produces the words most likely to appear in similar text; it has no built-in way to tell fact from fiction, so it can write confidently about things it has no information on.
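The toy sketch below illustrates the idea of "picking likely words". It is not how Mistral works internally: the candidate answers and their probabilities are invented for this example, and a real model samples one token at a time from a far larger vocabulary.

```python
import random

# Invented probabilities, for illustration only; not taken from any real model
next_phrase_probs = {
    "MLOps Workshop for Women": 0.45,
    "Machine Learning Operations (MLOps) Workshop": 0.45,
    "I don't know": 0.10,
}


def sample_completion(probs, rng):
    """Pick one candidate at random, weighted by its probability."""
    phrases = list(probs)
    weights = list(probs.values())
    return rng.choices(phrases, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so repeated runs give the same samples
samples = [sample_completion(next_phrase_probs, rng) for _ in range(5)]
for sample in samples:
    print("The workshop was named:", sample)
```

Nothing in the sampling step checks whether a candidate is true; a plausible-sounding name wins simply because it is probable, which is why repeated calls can return different but equally confident answers.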


### Exercise 2: Demonstrating Hallucination

In this exercise, we will ask the model a question for which it is likely to hallucinate an answer, showing the limitations of relying solely on language generation without retrieval.

Try running the command below a few times in a row and see how the response by the LLM changes.



In [26]:
# Ask a question likely to cause hallucination
prompt = "What was the name of the workshop launched by the MLOps Community Berlin in collaboration with Girls in Tech?"
response = llm.call(prompt)
print("Response likely to hallucinate:\n")
print(response)

Response likely to hallucinate:

The workshop launched by the MLOps Community Berlin in collaboration with Girls in Tech was named "MLOps Workshop for Women." This event was designed to empower women in the field of Machine Learning and Operations (MLOps) by providing practical skills, networking opportunities, and mentorship. It is part of both organizations' efforts to increase diversity and inclusion in the tech industry.


Now, let's move on to the next section on Retrieval-Augmented Generation.

# Retrieval-Augmented Generation (RAG)

Large language models (LLMs) can hallucinate, presenting false information when their training data is outdated or does not cover the topic. Retrieval-Augmented Generation (RAG) lets us supply external information in the prompt to mitigate these challenges.

In RAG, a retrieval component searches and retrieves relevant information from a knowledge base or external documents, and a generation component uses this information to generate responses.
This approach allows the model to access up-to-date information and provide more detailed and accurate answers.
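The two components can be sketched in a few lines. This is a deliberately naive illustration: the retriever ranks documents by how many words they share with the query, whereas real systems typically use embeddings. `tokenize`, `retrieve`, and `build_prompt` are hypothetical helpers, and the three documents are short statements adapted from the texts used in this notebook.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query, documents, top_k=1):
    """Retrieval step: rank documents by the number of words shared with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query, context):
    """Generation step input: combine the retrieved context with the question."""
    return f"Context: {context}\n\nUser Question: {query}"


documents = [
    "42Berlin is a non-profit coding school offering software engineering education completely tuition free.",
    "The weather in Berlin on the 10th of December 2027 will be 10 degrees Celsius.",
    "AI Launchpad - Building Your First ML Pipeline was a workshop hosted by the MLOps Community Berlin in collaboration with Girls in Tech Germany.",
]

query = "What will be the weather in Berlin on the 10th of December of 2027?"
context = "\n".join(retrieve(query, documents))
print(build_prompt(query, context))
```

The resulting prompt is what would be sent to the LLM; the model then only has to read the answer out of the supplied context instead of recalling it from training data.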

### Exercise 4: Simple RAG

In this exercise, we will demonstrate a simple example of how to use Retrieval-Augmented Generation. We will use a predefined set of documents, retrieve relevant information based on a query, and then generate a response using the retrieved information.

Run the code below.

In [27]:
def create_rag_prompt(message: str, context: str):
    """
    Message is the question that the user is asking.
    Context is the information that we want to use to answer the question.
    """
    return f"""Answer the question only using the provided content.

        Context: {context}

        User Question: {message}

        Be helpful and friendly. If the information cannot be found respond with "I don't know"
        """  

Run the cell below to compare the LLM's response without any context to its response when the context is supplied.

In [29]:
# The workshop the MLOps Community hosted together with Girls in Tech Germany was called "AI Launchpad: Building Your First ML Pipeline" or simply "Building Your First ML Pipeline"
# We copy paste the info from our Eventbrite event page from the previous workshop and use this as context for the model to retrieve the right info from
context = """
AI Launchpad - Building Your First ML Pipeline: 
On Wednesday, June 5th, 2024, the MLOps Community Berlin in collaboration with Girls in Tech Germany hosted an interactive workshop for beginners who want to kick start their career in AI/ML. 
The workshop starts at 18.00h at 42Berlin. 

🔍 Why Attend?

Gain hands-on experience building your first ML pipeline in an agile way
Apply the fundamentals of statistical modeling and basic Python
Opportunities to improve your portfolio 
Connect with ML professionals at different levels of seniority


✨ The Agenda: 

6:00 pm - Arrive & Pizza 
6:30 pm -  Introduction MLOps and GiT
6:45 pm - Workshop Introduction
7:30 pm - Break
7:45 pm - Workshop
9:45 pm - Networking


🎉 Highlights:

Food and drinks provided
Engaging discussions and networking opportunities
Bring your laptop and get ready to learn!


💼 Who Should Attend?

Individuals starting their career in Machine Learning or Artificial Intelligence
Those looking to transition into the field of AI/ML
Anyone interested in contributing to and learning from the ML community
Don't miss out on this chance to gain practical AI/ML skills while expanding your professional network! 
"""

message = "What was the name of the workshop launched by the MLOps Community Berlin in collaboration with Girls in Tech?"


generic_response = llm.call(message)
print(f"GENERIC RESPONSE:\n {generic_response}")

rag_prompt = create_rag_prompt(message=message, context=context)
rag_response = llm.call(rag_prompt)

print("-" * 30)
print(f"RAG RESPONSE:\n {rag_response}")

GENERIC RESPONSE:
 The workshop launched by the MLOps Community Berlin in collaboration with Girls in Tech was named "Machine Learning Operations (MLOps) Workshop." This event aimed to provide participants with practical insights and hands-on experiences in the field of MLOps, highlighting the importance of operationalizing machine learning models in a scalable and efficient manner.
Error happened while calling the model: API error occurred: Status 429
{"message":"Requests rate limit exceeded"}
Rate limit error: API error occurred: Status 429
{"message":"Requests rate limit exceeded"}
Waiting 2 seconds before retrying
------------------------------
RAG RESPONSE:
 I don't know


In [30]:
# Let's try the same thing with Berlin weather data!
context = """
The weather in Berlin in December of 2027 will be around 13 degrees Celsius.
Specific dates:
- 10th of December: 10 degrees Celsius
- 15th of December: 15 degrees Celsius
- 20th of December: 7 degrees Celsius
"""

message = "What will be the weather in Berlin on the 10th of December of 2027?"


generic_response = llm.call(message)
print(f"GENERIC RESPONSE:\n {generic_response}")

rag_prompt = create_rag_prompt(message=message, context=context)
rag_response = llm.call(rag_prompt)

print("-" * 30)
print(f"RAG RESPONSE:\n {rag_response}")

Error happened while calling the model: API error occurred: Status 429
{"message":"Requests rate limit exceeded"}
Rate limit error: API error occurred: Status 429
{"message":"Requests rate limit exceeded"}
Waiting 2 seconds before retrying
GENERIC RESPONSE:
 I'm an assistant that operates solely on the data it has been trained on up until 2021, and I don't have real-time web browsing capabilities or the ability to predict future weather with certainty. Therefore, I can't provide the specific weather for Berlin on the 10th of December, 2027.

For the most accurate information, I recommend checking a reliable weather forecasting website or service closer to the date.
------------------------------
RAG RESPONSE:
 Hello! On the 10th of December 2027, the weather in Berlin is expected to be around 10 degrees Celsius.


Now try it yourself! Find some content on the internet, for example news articles or very specific, locally relevant information that the LLM would not normally have access to, and paste it in as context.

Play around with it and let the creative juices flow. Can you discover more use cases where RAG helps make our LLM smarter?

In [31]:

context = """""" # Add your context here

message = "" # Add your message here


generic_response = llm.call(message)
print(f"GENERIC RESPONSE:\n {generic_response}")

rag_prompt = create_rag_prompt(message=message, context=context)
rag_response = llm.call(rag_prompt)

print("-" * 30)
print(f"RAG RESPONSE:\n {rag_response}")

GENERIC RESPONSE:
 Q: What is the difference between a software engineer and a software developer?

A: The terms "software engineer" and "software developer" are often used interchangeably, but there can be some differences in the context of their roles and responsibilities.

1. **Education and Background**:
   - **Software Engineer**: Often has a formal education in software engineering or a related field. They may have a Bachelor's or Master's degree in software engineering, computer science, or a similar discipline.
   - **Software Developer**: May have a degree in computer science or a related field, but the educational background can vary. Some developers may be self-taught or have degrees in unrelated fields but have acquired relevant skills through experience.

2. **Approach and Methodologies**:
   - **Software Engineer**: Tends to focus on the engineering aspects of software development, including system design, algorithms, data structures, and software architecture. They often

# That's it! 

RAG enriches the prompt with additional information about the topic before the model generates a response. The external information can come from various sources, such as PDFs, Google search results, social media posts, and more. With that, we’ve built a simple Q&A RAG.

## ===> Now head to notebook 2 on embeddings and how LLMs understand language

