# Building Solutions with LLMs and RAG: Introduction

In this workshop we will use notebooks and Python scripts to interactively learn about Large Language Models and RAG.

Large Language Models (LLMs) are a type of machine learning model designed to understand and generate human language. They are trained on massive text datasets to predict the next piece of text given a prompt, learning the patterns, structures, and relationships in language well enough to produce human-like responses. They can be used to generate text, answer questions, and more.

Retrieval-Augmented Generation (RAG) combines language generation with data retrieval at query time, allowing models to draw on external sources or databases to provide more accurate, contextually relevant answers.

RAG combines two main components:
- Retrieval: This component searches and retrieves relevant information from external databases or documents.
- Generation: This component uses the retrieved information to generate more accurate and contextually relevant responses.
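
The two components above can be sketched in a few lines. This is a toy illustration, not how the workshop implements RAG: the retrieval step here simply counts shared words, the generation step is reduced to building the prompt that would be sent to an LLM, and the names `retrieve` and `build_prompt` are made up for this example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question, documents, top_k=1):
    """Retrieval: rank documents by how many words they share with the question."""
    question_words = tokenize(question)
    scored = [(len(question_words & tokenize(doc)), doc) for doc in documents]
    # Highest overlap first; ties keep their original order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def build_prompt(question, retrieved):
    """Generation input: combine the retrieved text and the question into one prompt."""
    context = "\n".join(retrieved)
    return f"Context:\n{context}\n\nQuestion: {question}"


documents = [
    "Berlin is the capital of Germany.",
    "Paris is the capital of France.",
]
question = "What is the capital of France?"
prompt = build_prompt(question, retrieve(question, documents))
print(prompt)
```

Real systems usually replace the word-overlap score with a more robust similarity measure, but the overall shape (retrieve first, then generate from the enriched prompt) stays the same.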



## Getting started with Jupyter notebook
First of all, let's make sure you understand the Jupyter notebook interface.
In Jupyter, cells contain either text or code.
You can type any Python code in a cell and press Shift + Enter to run it.

Interact with the cell below and run it multiple times to see the results.

In [20]:
a = globals().get("a")  # None on the first run, when `a` does not exist yet
a = 42 if a is None else a
a = a + 2
print(a)

64


You can also import libraries and use them in the cells:

In [1]:

from datetime import datetime
datetime.now().strftime('%Y-%m-%d %H:%M:%S')

'2024-11-17 09:44:00'

## Mistral AI

In this workshop, we will be using Mistral AI's LLMs, which are similar in concept to the models behind OpenAI's ChatGPT and Anthropic's Claude.

To start working with Mistral, you first need to install the library. We did that for you already.

```bash
pip install mistralai
```


The second step is to get a Mistral API key. You can find some API keys we prepared for this workshop in this [doc](https://docs.google.com/document/d/1IjC8TbvxWCoZ0dAKyaTKSdnoQg12kn0D-OWkm1f6i4Y/edit?tab=t.0). Copy a key and write it to the `.env` file using the command:

```
echo 'MISTRAL_API_KEY="your_api_key_here"' >> .env
```

You can run LLMs on your local machine or use a cloud-hosted version.
In this workshop we will use the cloud version, so we don't need to download large models.

Let's run the code below to import Mistral and initialize the Mistral client: <br> _(Note: If you run into a `ModuleNotFoundError` when running the code below, run `pip install -r requirements.txt` in your terminal and try again once it finishes.)_

In [8]:
import os
from mistralai import Mistral

# Retrieve the Mistral API key from the environment variables
mistral_api_key = os.getenv('MISTRAL_API_KEY')

# Initialize the Mistral client with the API key
mistral_client = Mistral(api_key=mistral_api_key)

# The model below is the specific model we want to use
model_name = "mistral-small-latest"
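
If `os.getenv('MISTRAL_API_KEY')` returns `None`, the key has not reached the environment of the notebook process. Whether a `.env` file is read automatically depends on how the notebook environment is set up, and the workshop setup may already handle this. As a fallback, the sketch below looks for the key in the environment first and then in the `.env` file. The helpers `read_env_file` and `find_api_key` are hypothetical names for this illustration and only handle simple `KEY="value"` lines.

```python
import os
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY="value" lines from a .env file into a dict (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text().splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def find_api_key(name="MISTRAL_API_KEY", path=".env"):
    """Return the key from the environment, falling back to the .env file."""
    return os.environ.get(name) or read_env_file(path).get(name)


key = find_api_key()
print("Key found" if key else "Key missing: check your .env file")
```

The result of `find_api_key()` could then be passed as `api_key` when creating the client.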

The code below defines a function `call_mistral_model` that sends a message to a Mistral model and returns the model's response text.

In [9]:
def call_mistral_model(message):
    response = mistral_client.chat.complete(
        model = model_name,
        messages = [
            {
                "role": "user",
                "content": message,
            }
            ]
        )
    # Extract only the text from the response
    response_text = response.choices[0].message.content
    return response_text

In [10]:
# Let's test it out
response = call_mistral_model("hello! What is your name?")

# Print the response from the Mistral model
print(response)

Hello! You can call me Assistant. How can I help you today?


## Understanding Rate Limiting

When using APIs, you might encounter rate limiting. Rate limiting is a mechanism implemented by APIs to control the number of requests a user can make in a given time frame. This is done to prevent abuse and ensure fair use of the API resources. 

However, rate limiting can be annoying because it can interrupt your workflow and force you to wait before making more requests.

To make things easier, we've implemented a `LargeLanguageModel` class (see usage example below) that automatically adds sleep intervals between requests to avoid exceeding the rate limit. We will use the `LargeLanguageModel` class moving forward to make calls to Mistral AI. This class has the same logic as the examples we showed above but includes a mechanism to work around rate limiting.

You don't need to understand all the code in the class, but feel free to have a look in `chat_solution.llm` if you're curious.
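
To give a feel for the idea of sleeping between requests, here is a minimal sketch of a throttling wrapper. It is not the code of the workshop's `LargeLanguageModel` class: `ThrottledCaller` and its parameters are hypothetical names for this illustration, and the wrapped function is a stand-in rather than a real API call.

```python
import time


class ThrottledCaller:
    """Wrap a function so consecutive calls are at least `min_interval` seconds apart.

    Hypothetical helper for illustration; not the workshop's LargeLanguageModel class.
    """

    def __init__(self, func, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.func = func
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call = None  # Time of the previous call; None before the first one

    def __call__(self, *args, **kwargs):
        if self._last_call is not None:
            elapsed = self._clock() - self._last_call
            if elapsed < self.min_interval:
                # Wait out the remainder of the interval before calling again
                self._sleep(self.min_interval - elapsed)
        self._last_call = self._clock()
        return self.func(*args, **kwargs)


# Example with a stand-in function instead of a real API call
throttled_echo = ThrottledCaller(lambda text: text.upper(), min_interval=0.2)
print(throttled_echo("first"))   # runs immediately
print(throttled_echo("second"))  # waits until 0.2 s have passed since the first call
```

A fixed interval is the simplest approach. It trades some speed for predictability, which is usually acceptable in an interactive workshop setting.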

In [5]:
from chat_solution.llm import LargeLanguageModel

# Initialize an instance of the LargeLanguageModel class
llm = LargeLanguageModel()

# Make a call to Mistral using the LargeLanguageModel class
# This class includes logic to counteract rate limiting by adding appropriate sleep intervals between requests
response = llm.call("hello! What is your name?")

# Print the response from the Mistral model
print(response)

Hello! You can call me Assistant. How can I help you today?


# Retrieval-Augmented Generation (RAG)

Large language models (LLMs) can sometimes hallucinate, presenting false information, for example when their training data is outdated or does not cover the topic. Retrieval-Augmented Generation (RAG) lets us incorporate external information to mitigate this. In this task, we will create a simple Q&A RAG that uses knowledge from a PDF to enrich its answers.

Once we have the text we want the model to use, we can begin enriching the prompt to make our LLM even smarter!

In [54]:
def create_rag_prompt(message: str, context: str) -> str:
    """
    Build a prompt that asks the model to answer using only the given context.

    message: the question that the user is asking.
    context: the information that we want to use to answer the question.
    """
    return f"""Answer the question only using the provided context.

        Context: {context}

        User Question: {message}

        Be helpful and friendly. If the information cannot be found, respond with "I don't know".
        """

Below you can compare how the LLM's answer changes depending on whether it is given the context:

In [4]:
context = """
The weather in Berlin in December of 2027 will be around 13 degrees Celsius.
Specific dates:
- 10th of December: 10 degrees Celsius
- 15th of December: 15 degrees Celsius
- 20th of December: 7 degrees Celsius
"""
message = "What will be the weather in Berlin on the 10th of December of 2027?"


generic_response = call_mistral_model(message)
print(f"GENERIC RESPONSE:\n {generic_response}")

rag_prompt = create_rag_prompt(message=message, context=context)
rag_response = call_mistral_model(rag_prompt)

print("-" * 30)
print(f"RAG RESPONSE:\n {rag_response}")

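_(Note: `NameError: name 'call_mistral_model' is not defined` means the cell that defines `call_mistral_model` has not been run in this session. Run that cell first, then run this one again.)_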

# That's it! 

RAG enriches the prompt with additional information about the topic before the model generates a response. The external information can come from various sources, not just PDFs: Google search results, social media posts, and more. With that, we’ve built a simple Q&A RAG. In the next chapter, we will scale it up to include even more context.
In the next notebook you will find out how to prepare data for RAG.
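
As a small illustration of combining several sources, the sketch below joins snippets into one context string and labels each with its origin. The function name `build_context`, the source labels, and the sample snippets are all made up for this example. The resulting string could be passed as the `context` of a RAG prompt.

```python
def build_context(snippets):
    """Join (source, text) pairs into one context string, labelling each source."""
    parts = []
    for source, text in snippets:
        text = text.strip()
        if text:  # Skip empty snippets so they don't add noise to the prompt
            parts.append(f"[{source}] {text}")
    return "\n".join(parts)


# Made-up sample snippets, one per kind of source
snippets = [
    ("pdf", "The office is open Monday to Friday."),
    ("web search", "  The office is closed on public holidays.  "),
    ("social media", ""),
]
context = build_context(snippets)
print(context)
```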