# Basic RAG (Retrieval Augmented Generation)
Retrieval Augmented Generation (RAG) is an AI framework that combines the capabilities of LLMs and information retrieval systems. It is useful for answering questions or generating content that draws on external knowledge.
Why do we need RAG? LLMs face several challenges: they don’t have access to our internal documents, they don’t have the most up-to-date information, and they can hallucinate. RAG is one potential solution to these problems.
When a user asks a question about an internal document or a knowledge base, we first retrieve relevant information from the knowledge base, whose text embeddings are stored in a vector store. This step is called retrieval. We then include both the user query and the retrieved information in a prompt, so that the model can generate output grounded in the relevant context. This step is called generation.  
<img src="RAG.png" width="600" height="400"><br>

In [1]:
# ! pip install faiss-cpu "mistralai>=0.1.2"

In [2]:
from helper import load_mistral_api_key
api_key, dlai_endpoint = load_mistral_api_key(ret_key=True)

### Parse the article with BeautifulSoup 

In [3]:
import requests
from bs4 import BeautifulSoup
import re

response = requests.get(
    "https://www.deeplearning.ai/the-batch/a-roadmap-explores-how-ai-can-detect-and-mitigate-greenhouse-gases/"
)
html_doc = response.text
soup = BeautifulSoup(html_doc, "html.parser")
tag = soup.find("div", re.compile("^prose--styled"))
text = tag.text
print(text)

How can AI help to fight climate change? A new report evaluates progress so far and explores options for the future.What’s new: The Innovation for Cool Earth Forum, a conference of climate researchers hosted by Japan, published a roadmap for the use of data science, computer vision, and AI-driven simulation to reduce greenhouse gas emissions. The roadmap evaluates existing approaches and suggests ways to scale them up.How it works: The roadmap identifies 6 “high-potential opportunities”: activities in which AI systems can make a significant difference based on the size of the opportunity, real-world results, and validated research. The authors emphasize the need for data, technical and scientific talent, computing power, funding, and leadership to take advantage of these opportunities.Monitoring emissions. AI systems analyze data from satellites, drones, and ground sensors to measure greenhouse gas emissions. The European Union uses them to measure methane emissions, environmental orga

In [4]:
file_name = "AI_greenhouse_gas.txt"
with open(file_name, "w", encoding="utf-8") as file:
    file.write(text)

### Chunking
Let's split the document into chunks. Chunking is crucial in a RAG system because it lets us identify and retrieve the most relevant pieces of information more effectively. Here we will simply split the text by character, combining 512 characters into each chunk. Depending on the use case, it may be necessary to customize or experiment with different chunk sizes. There are also various options for how to split the text: by tokens, sentences, HTML headers, and others, depending on the application. After this, we have to create embeddings for each of these chunks.

In [5]:
chunk_size = 512
chunks = [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]

In [6]:
len(chunks)

8
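
As a minimal sketch of two of the alternatives mentioned above, using only the standard library: `chunk_by_chars` is the same character split with an optional overlap between neighbouring chunks, and `chunk_by_sentences` packs whole sentences into chunks up to a size limit. Both helper names and the simple regex sentence splitter are assumptions made for illustration; they are not part of the original notebook.

```python
import re


def chunk_by_chars(text, chunk_size=512, overlap=0):
    """Split text into fixed-size character chunks that may overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]


def chunk_by_sentences(text, max_chars=512):
    """Pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept as one oversized chunk.
    """
    # Naive splitter: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


print(chunk_by_chars("abcdefghij", chunk_size=4, overlap=2))
print(chunk_by_sentences("A b. C d. E f.", max_chars=9))
```

Overlap keeps a sentence that straddles a boundary visible in both chunks, at the cost of storing and embedding more text.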

### Get embeddings of the chunks
We define the get_text_embedding() function, which uses the Mistral embeddings API endpoint to get the embedding of a single text chunk.

In [7]:
import os
from mistralai.client import MistralClient


def get_text_embedding(txt):
    client = MistralClient(api_key=api_key, endpoint=dlai_endpoint)
    embeddings_batch_response = client.embeddings(model="mistral-embed", input=txt)
    return embeddings_batch_response.data[0].embedding

In [8]:
import numpy as np

text_embeddings = np.array([get_text_embedding(chunk) for chunk in chunks])

In [9]:
text_embeddings

array([[-0.03274536,  0.04751587,  0.04489136, ..., -0.03289795,
         0.02278137, -0.01459503],
       [-0.03631592,  0.05548096,  0.03271484, ..., -0.03125   ,
         0.01594543, -0.01722717],
       [-0.04876709,  0.04779053,  0.05670166, ...,  0.0046463 ,
         0.0184021 , -0.01251984],
       ...,
       [-0.02597046,  0.04049683,  0.03543091, ..., -0.01013184,
        -0.00962067, -0.00917053],
       [-0.03025818,  0.0541687 ,  0.06280518, ..., -0.00900269,
        -0.00782776, -0.00432587],
       [-0.02456665,  0.05093384,  0.04879761, ..., -0.0064888 ,
         0.02600098, -0.01386261]])

In [10]:
len(text_embeddings[0])

1024
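
The list comprehension in cell In [8] makes one API request per chunk. If the embeddings endpoint accepts a list of texts in a single request (an assumption to verify against the client version in use), the requests can be grouped. The sketch below shows only the batching logic: `embed_in_batches` and the `embed_batch` callable are hypothetical, and `fake_embed_batch` is a stand-in so the example runs without an API key.

```python
def embed_in_batches(texts, embed_batch, batch_size=16):
    """Embed texts by calling embed_batch on slices of at most batch_size items.

    embed_batch is any callable that takes a list of strings and returns
    one vector per string, in the same order.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start : start + batch_size]
        vectors.extend(embed_batch(batch))
    return vectors


# Stand-in embedder for demonstration only: (length, number of spaces).
def fake_embed_batch(batch):
    return [[float(len(t)), float(t.count(" "))] for t in batch]


demo_vectors = embed_in_batches(["a b", "cd", "e f g"], fake_embed_batch, batch_size=2)
print(demo_vectors)  # [[3.0, 1.0], [2.0, 0.0], [5.0, 2.0]]
```

With a real client, `embed_batch` would wrap the embeddings call and return the `embedding` field of each item in the response.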

### Store in a vector database
In the previous step we used a list comprehension to get the text embeddings for all text chunks; each embedding has 1024 dimensions. To store them, we use [Faiss](https://engineering.fb.com/2017/03/29/data-infrastructure/faiss-a-library-for-efficient-similarity-search/), a library for efficient similarity search.

In [11]:
import faiss

d = text_embeddings.shape[1]
index = faiss.IndexFlatL2(d)
index.add(text_embeddings)

### Embed the user query
In the previous step we stored the embeddings in a vector database using the Faiss library, a common choice for efficient processing and retrieval of embeddings. With Faiss, we create an instance of an index class with the embedding dimension as the argument, then add the text embeddings to the index.

In [12]:
question = "What are the ways that AI can reduce emissions in Agriculture?"
question_embeddings = np.array([get_text_embedding(question)])

In [13]:
question_embeddings

array([[-0.00073624,  0.04116821,  0.04318237, ..., -0.02453613,
         0.01029968,  0.00930023]])

### Search for chunks that are similar to the query
When a user asks a question, we also need to create an embedding for the question using the same embedding model as before. We can then retrieve the text chunks from the vector database that are most similar to the question.

In [14]:
D, I = index.search(question_embeddings, k=2)
print(I)

[[4 5]]


We can perform a search on the vector database with index.search. This function returns the distances and the indices of the k most similar vectors to the question vector in the vector database.
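
To make the two return values concrete, here is a NumPy sketch of what an exhaustive (flat) L2 search computes. It is an illustration only, not how Faiss is implemented, and `flat_l2_search` is a hypothetical helper. Like `IndexFlatL2`, it reports squared Euclidean distances, sorted from nearest to farthest.

```python
import numpy as np


def flat_l2_search(stored, queries, k):
    """Return (distances, indices) of the k nearest stored vectors per query.

    Distances are squared Euclidean distances in ascending order.
    """
    stored = np.asarray(stored, dtype=np.float32)
    queries = np.asarray(queries, dtype=np.float32)
    # Pairwise squared distances, shape (n_queries, n_stored).
    diff = queries[:, None, :] - stored[None, :, :]
    dists = np.sum(diff * diff, axis=2)
    # Positions of the k smallest distances for each query.
    idx = np.argsort(dists, axis=1, kind="stable")[:, :k]
    return np.take_along_axis(dists, idx, axis=1), idx


demo_stored = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
demo_query = np.array([[0.9, 0.1]])
D_demo, I_demo = flat_l2_search(demo_stored, demo_query, k=2)
print(I_demo)  # [[1 0]]
```

Both arrays have one row per query and `k` columns, which is why the notebook reads the first row to get the indices for its single question.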

In [15]:
retrieved_chunk = [chunks[i] for i in I.tolist()[0]]
print(retrieved_chunk)

['data to help factories use more recycled materials, cut waste, minimize energy use, and reduce downtime. Similarly, they can optimize supply chains to reduce emissions contributed by logistics.\xa0Agriculture.\xa0Farmers use AI-equipped sensors to simulate different crop rotations and weather events to forecast crop yield or loss. Armed with this data, food producers can cut waste and reduce carbon footprints. The authors cite lack of food-related datasets and investment in adapting farming practices as primary b', 'arriers to taking full advantage of AI in the food industry.Transportation.\xa0AI systems can reduce greenhouse-gas emissions by improving traffic flow, ameliorating congestion, and optimizing public transportation. Moreover, reinforcement learning can reduce the impact of electric vehicles on the power grid by optimizing their charging. More data, uniform standards, and AI talent are needed to realize this potential.Materials.\xa0Materials scientists use AI models to stu

Then, based on the returned indices, we can retrieve the actual text chunks that correspond to those indices. We get two text chunks in the response because we set k=2 to retrieve the two most similar vectors in the vector database.
There are many different retrieval strategies. Here we used a simple similarity search with embeddings. Depending on the use case, we might want to perform metadata filtering first, assign weights to the retrieved documents, or retrieve the larger parent chunk that the originally retrieved chunks belong to.
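
As one example of such a strategy, the sketch below filters on metadata first and then ranks only the remaining chunks by distance. The `filtered_search` helper and the `section` labels are hypothetical: the chunks in this notebook carry no metadata, so the labels here are invented purely for the demonstration.

```python
import numpy as np


def filtered_search(embeddings, metadata, query, k, **required):
    """Rank only the chunks whose metadata matches every key=value in required."""
    embeddings = np.asarray(embeddings, dtype=np.float32)
    query = np.asarray(query, dtype=np.float32)
    # Step 1: metadata filter.
    candidates = [
        i
        for i, meta in enumerate(metadata)
        if all(meta.get(key) == value for key, value in required.items())
    ]
    if not candidates:
        return []
    # Step 2: squared L2 distance over the surviving candidates only.
    dists = np.sum((embeddings[candidates] - query) ** 2, axis=1)
    order = np.argsort(dists, kind="stable")[:k]
    return [candidates[j] for j in order]


demo_embeddings = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
demo_metadata = [
    {"section": "Agriculture"},
    {"section": "Transportation"},
    {"section": "Agriculture"},
    {"section": "Transportation"},
]
print(filtered_search(demo_embeddings, demo_metadata, [1.0, 1.0], k=1, section="Agriculture"))  # [2]
```

Without the filter, the nearest chunk to the query would be index 3; with it, the search is restricted to the two "Agriculture" chunks and returns index 2.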

In [16]:
prompt = f"""
Context information is below.
---------------------
{retrieved_chunk}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {question}
Answer:
"""

Finally, we can provide the retrieved text chunks as context information within the prompt. The prompt template includes both the retrieved text chunks and the user question.
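
Note that the f-string above interpolates a Python list, so the prompt contains the list's printed form, including brackets, quotes, and escapes such as `\xa0`. One possible alternative, sketched here with a hypothetical `build_prompt` helper, is to join the chunks into plain text before inserting them. The wording of the template is the same as in the notebook.

```python
def build_prompt(retrieved_chunks, question):
    """Join chunks into plain text so the prompt has no list brackets or escapes."""
    context = "\n\n".join(chunk.strip() for chunk in retrieved_chunks)
    return (
        "Context information is below.\n"
        "---------------------\n"
        f"{context}\n"
        "---------------------\n"
        "Given the context information and not prior knowledge, answer the query.\n"
        f"Query: {question.strip()}\n"
        "Answer:"
    )


demo_prompt = build_prompt(["First chunk.", " Second chunk. "], "What is RAG?")
print(demo_prompt)
```

Whether this changes the quality of the answer depends on the model; the sketch only makes the context easier to read and inspect.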

In [17]:
from mistralai.models.chat_completion import ChatMessage


def mistral(user_message, model="mistral-small-latest", is_json=False):
    client = MistralClient(api_key=api_key, endpoint=dlai_endpoint)
    messages = [ChatMessage(role="user", content=user_message)]

    if is_json:
        chat_response = client.chat(
            model=model, messages=messages, response_format={"type": "json_object"}
        )
    else:
        chat_response = client.chat(model=model, messages=messages)

    return chat_response.choices[0].message.content

In [18]:
response = mistral(prompt)
print(response)

In the context provided, AI can reduce emissions in agriculture by helping farmers use AI-equipped sensors to simulate different crop rotations and weather events. This allows them to forecast crop yield or loss, which in turn enables food producers to cut waste and reduce their carbon footprints.


With the prompt we get a response. This is how RAG works from scratch.

If we are developing a complex application where RAG is one of the tools we can call, or where we have multiple RAG pipelines as separate tools, then we may consider exposing RAG through function calling.

## RAG + Function calling

In [19]:
def qa_with_context(text, question, chunk_size=512):
    # split document into chunks
    chunks = [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]
    # load into a vector database
    text_embeddings = np.array([get_text_embedding(chunk) for chunk in chunks])
    d = text_embeddings.shape[1]
    index = faiss.IndexFlatL2(d)
    index.add(text_embeddings)
    # create embeddings for a question
    question_embeddings = np.array([get_text_embedding(question)])
    # retrieve similar chunks from the vector database
    D, I = index.search(question_embeddings, k=2)
    retrieved_chunk = [chunks[i] for i in I.tolist()[0]]
    # generate response based on the retrieved relevant text chunks

    prompt = f"""
    Context information is below.
    ---------------------
    {retrieved_chunk}
    ---------------------
    Given the context information and not prior knowledge, answer the query.
    Query: {question}
    Answer:
    """
    response = mistral(prompt)
    return response

In [20]:
I.tolist()

[[4, 5]]

In [21]:
I.tolist()[0]

[4, 5]

The function above wraps up the RAG steps we covered into a single call.

In [22]:
import functools

names_to_functions = {"qa_with_context": functools.partial(qa_with_context, text=text)}

Then we organize this function into a dictionary. This might not look that useful with just one function, but with multiple tools or functions it is very useful to organize them in one dictionary.
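
To illustrate why the dictionary pays off with more than one tool, here is a small sketch with two toy functions. `functools.partial` binds the document once, so a caller only supplies the remaining arguments and can look each tool up by name. The functions `word_count` and `contains_term` and the sample sentence are invented for this example.

```python
import functools


def word_count(text):
    """Toy tool: count the words in the bound document."""
    return len(text.split())


def contains_term(text, term):
    """Toy tool: report whether the bound document mentions term."""
    return term.lower() in text.lower()


demo_document = "AI systems can reduce greenhouse-gas emissions by improving traffic flow."

# Bind the document once; callers then pass only the remaining arguments.
toy_tools = {
    "word_count": functools.partial(word_count, text=demo_document),
    "contains_term": functools.partial(contains_term, text=demo_document),
}

print(toy_tools["word_count"]())  # 10
print(toy_tools["contains_term"](term="traffic"))  # True
```

Because every entry is called the same way, by name with keyword arguments, adding a tool means adding one dictionary entry rather than another branch of dispatch logic.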

In [23]:
tools = [
    {
        "type": "function",
        "function": {
            "name": "qa_with_context",
            "description": "Answer user question by retrieving relevant context",
            "parameters": {
                "type": "object",
                "properties": {
                    "question": {
                        "type": "string",
                        "description": "user question",
                    }
                },
                "required": ["question"],
            },
        },
    },
]

We can outline the function specs with a JSON schema to tell the model what this function is about.

In [24]:
question = """
What are the ways AI can mitigate climate change in transportation?
"""

client = MistralClient(api_key=api_key, endpoint=dlai_endpoint)

response = client.chat(
    model="mistral-large-latest",
    messages=[ChatMessage(role="user", content=question)],
    tools=tools,
    tool_choice="any",
)

response

ChatCompletionResponse(id='64d6c8310b164881bad2e670dd01138b', object='chat.completion', created=1715972374, model='mistral-large-latest', choices=[ChatCompletionResponseChoice(index=0, message=ChatMessage(role='assistant', content='', name=None, tool_calls=[ToolCall(id='Ceah0Qorr', type=<ToolType.function: 'function'>, function=FunctionCall(name='qa_with_context', arguments='{"question": "What are the ways AI can mitigate climate change in transportation?"}'))]), finish_reason=<FinishReason.tool_calls: 'tool_calls'>)], usage=UsageInfo(prompt_tokens=92, total_tokens=126, completion_tokens=34))

In [25]:
tool_function = response.choices[0].message.tool_calls[0].function
tool_function

FunctionCall(name='qa_with_context', arguments='{"question": "What are the ways AI can mitigate climate change in transportation?"}')

Now we pass the user question and the tool to the model. We get a tool call result with the function name and the arguments derived from our user question.

In [26]:
tool_function.name

'qa_with_context'

In [27]:
import json

args = json.loads(tool_function.arguments)
args

{'question': 'What are the ways AI can mitigate climate change in transportation?'}

In [28]:
function_result = names_to_functions[tool_function.name](**args)
function_result

'The context information does not provide specific details on how AI can mitigate climate change in transportation. However, it does mention that AI has a role to play in reducing greenhouse gas emissions in various sectors including manufacturing, food production, and transportation. According to the roadmap published by the Innovation for Cool Earth Forum, there are six high-potential opportunities for AI to help reduce greenhouse gas emissions, but it does not specify which of these opportunities pertain specifically to the transportation sector.'
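
The cells above trust the model's tool call as-is. In a larger application we may want to validate it before dispatching. The sketch below is one way to do that, under stated assumptions: `run_tool_call` is a hypothetical helper, the tool specs follow the same JSON layout as `tools` above, and a `SimpleNamespace` stands in for the function call object so the example runs without calling the API.

```python
import json
from types import SimpleNamespace


def run_tool_call(function_call, registry, tool_specs):
    """Validate a model-produced function call, then dispatch it."""
    name = function_call.name
    if name not in registry:
        raise KeyError(f"unknown tool: {name}")
    try:
        args = json.loads(function_call.arguments)
    except json.JSONDecodeError as exc:
        raise ValueError(f"arguments are not valid JSON: {exc}") from exc
    if not isinstance(args, dict):
        raise ValueError("arguments must decode to a JSON object")
    # Check the "required" list from the matching spec, if one exists.
    spec = next((s["function"] for s in tool_specs if s["function"]["name"] == name), None)
    required = spec["parameters"].get("required", []) if spec else []
    missing = [param for param in required if param not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return registry[name](**args)


# Stand-ins for demonstration: a toy tool, its spec, and a fake function call.
demo_registry = {"echo": lambda question: f"You asked: {question}"}
demo_specs = [
    {
        "type": "function",
        "function": {
            "name": "echo",
            "parameters": {
                "type": "object",
                "properties": {"question": {"type": "string"}},
                "required": ["question"],
            },
        },
    }
]
demo_call = SimpleNamespace(name="echo", arguments='{"question": "hi"}')
demo_result = run_tool_call(demo_call, demo_registry, demo_specs)
print(demo_result)  # You asked: hi
```

In the notebook's setting, the same helper could be called with `tool_function`, `names_to_functions`, and `tools`.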

## More about RAG
To learn about more advanced chunking and retrieval methods, you can check out:
- [Advanced Retrieval for AI with Chroma](https://learn.deeplearning.ai/courses/advanced-retrieval-for-ai/lesson/1/introduction)
  - Sentence window retrieval
  - Auto-merge retrieval
- [Building and Evaluating Advanced RAG Applications](https://learn.deeplearning.ai/courses/building-evaluating-advanced-rag)
  - Query Expansion
  - Cross-encoder reranking
  - Training and utilizing Embedding Adapters
