## Building a Retrieval-Augmented Generation (RAG) System with LangChain

### Introduction

In this notebook, we will learn how to build a Retrieval-Augmented Generation (RAG) system using LangChain in Python. RAG systems combine information retrieval and natural language generation to produce answers that are grounded in external knowledge bases. This approach is particularly useful when dealing with large documents or datasets where direct querying isn’t efficient or possible.

### Objectives

- Understand the concept of Retrieval-Augmented Generation (RAG).
- Learn how to use LangChain to implement a RAG system.
- Implement the system step by step with guided TODO tasks.
- Test your implementation at each step.
- Provide helpful explanations and definitions.

### Methods Used:

- LangChain: A library for building language model applications.
- VectorStore (FAISS): A tool for efficient similarity search and clustering of dense vectors.
- Sentence-Transformers: Used to create embeddings from text without relying on OpenAI.
- RetrievalQA Chain: Combines retrieval and question-answering over documents.

### Data Used

- I extracted several chapters of the Gen AI course into a .txt file.
- The goal of this notebook is to build a RAG system that can answer questions based on the content of these chapters.

## Step 1: Set Up Your Environment

We first need to import all the necessary modules and set up the environment. The GOOGLE_API_KEY environment variable is loaded from the .env file so that the notebook can call Google’s Gemini model.

In [1]:
# Import necessary libraries

import os
from dotenv import load_dotenv
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.prompts import ChatPromptTemplate
from langchain.chains import RetrievalQA
from langchain.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import TextLoader
from langchain_core.documents import Document
from typing import List


In [2]:
# Load environment variables from the .env file
load_dotenv()

# Read the Google API key
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")

# Initialize Gemini through LangChain
llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-pro",  # Use the appropriate Gemini model
    temperature=0.0,
    max_tokens=150,
    api_key=GOOGLE_API_KEY
)
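
If the .env file is missing or the key is misspelled, `os.getenv` silently returns `None` and the error only surfaces at the first model call. A minimal sketch of a fail-fast check follows; the helper name `require_env` is hypothetical and not part of LangChain or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty.

    Hypothetical helper for illustration; call it after load_dotenv().
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set. "
            "Check that your .env file exists and defines it."
        )
    return value


# Usage in the notebook (left as a comment so this sketch runs without a key):
# GOOGLE_API_KEY = require_env("GOOGLE_API_KEY")
```

With this check in place, a missing key stops the notebook at the setup cell instead of at the first question.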

## Step 2: Load and Split Documents

Load the document you want to use and split it into manageable chunks.

Here, we load the course chapters from a .txt file and split the text into smaller chunks using RecursiveCharacterTextSplitter. This helps us manage large documents efficiently for later retrieval.

In [3]:
# TODO: Load your document and split it into chunks
# Hint: Use TextLoader and RecursiveCharacterTextSplitter

# Load and split the document
filename = "../data/gen_ai_course.txt"  # Replace with the path to your file
loader = TextLoader(filename)
documents = loader.load()

# Split the document into smaller chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = text_splitter.split_documents(documents)
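
To see what `chunk_size` and `chunk_overlap` mean, here is a simplified sketch that cuts a string into fixed windows. It is not the algorithm RecursiveCharacterTextSplitter uses (that splitter prefers to break on separators such as paragraph breaks, newlines, and spaces); the function `sliding_window_chunks` is a hypothetical stand-in that only illustrates the two parameters.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


chunks = sliding_window_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
print(chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk repeats the last three characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.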

## Step 3: Create Embeddings and Build the VectorStore

Generate embeddings for each chunk and store them in a vector store for efficient retrieval.

We now generate embeddings for each chunk of text using Sentence-Transformers, which is an effective and efficient way to create vector representations of text. The embeddings are then stored in a FAISS index for fast retrieval.

In [4]:
# TODO: Create embeddings and store them in a VectorStore
# Hint: Use Sentence-Transformers and FAISS

# 3. Create the embeddings with Sentence-Transformers
# Use the `all-MiniLM-L6-v2` model, which is lightweight and effective for document embeddings
model = SentenceTransformer('all-MiniLM-L6-v2')

# Generate an embedding for each chunk
embeddings = model.encode([doc.page_content for doc in docs])

# Convert the embeddings to a float32 NumPy array, as FAISS expects
embeddings_np = np.array(embeddings).astype('float32')

# Create a FAISS index
index = faiss.IndexFlatL2(embeddings_np.shape[1])  # L2 (Euclidean) distance
index.add(embeddings_np)
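
`IndexFlatL2` performs an exhaustive search and reports squared Euclidean distances. The sketch below reproduces that behavior with plain NumPy on toy 2-dimensional vectors, to show what `index.search` computes; the function `flat_l2_search` and the sample vectors are made up for illustration and are not part of FAISS.

```python
import numpy as np


def flat_l2_search(index_vectors: np.ndarray, queries: np.ndarray, top_k: int):
    """Brute-force nearest-neighbor search using squared L2 distance.

    Returns (distances, indices), each of shape (n_queries, top_k),
    ordered from nearest to farthest.
    """
    # Pairwise differences: shape (n_queries, n_vectors, dim)
    diffs = queries[:, None, :] - index_vectors[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    # Indices of the top_k smallest distances for each query
    order = np.argsort(dists, axis=1, kind="stable")[:, :top_k]
    return np.take_along_axis(dists, order, axis=1), order


vectors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]], dtype="float32")
query = np.array([[0.9, 0.1]], dtype="float32")
D, I = flat_l2_search(vectors, query, top_k=2)
print(I)  # nearest is row 1, then row 0
```

A flat index compares the query with every stored vector, which is exact and fast enough for a few thousand chunks; FAISS offers approximate index types for larger collections.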

## Step 4: Set Up the QA Chain

Create a chain that can retrieve relevant chunks.

The next step combines retrieval with the language model. The TODO below refers to a RetrievalQA chain; the implementation performs the same two steps by hand in a single function: it searches the FAISS index for the chunks closest to the question and passes them to Gemini as context for the answer.

In [5]:
# TODO: Create a RetrievalQA chain

# 4. Function that queries Gemini with the retrieved documents
# Note: format_docs is defined in the next cell; run that cell before calling this function
def query_to_gemini(query: str, top_k: int = 3):
    # Embed the query with the same model used for the document chunks
    query_embedding = model.encode([query]).astype('float32')

    # Search the FAISS index for the nearest chunks (D: distances, I: indices)
    D, I = index.search(query_embedding, top_k)

    # Collect the nearest chunks; FAISS pads with -1 when fewer than top_k vectors exist
    relevant_docs = [docs[i] for i in I[0] if i != -1]

    # Build a structured prompt; {context} and {question} are template variables,
    # so curly braces inside the documents are not parsed as placeholders
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a helpful assistant that answers questions based on the provided documents."),
        ("user", "Here are some documents: {context}\n\nQuestion: {question}")
    ])

    # Fill in the template with the retrieved text and the question
    formatted_prompt = prompt.format_messages(context=format_docs(relevant_docs), question=query)

    # Ask Gemini through LangChain for an answer
    ai_msg = llm.invoke(formatted_prompt)

    # Return the text of the reply if the message has any
    if hasattr(ai_msg, 'content'):
        return ai_msg.content
    else:
        return "Error: the generated response does not contain valid text."
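
The retrieve-then-generate flow of this function can also be sketched without any external service. In the example below, a toy word-count embedding and an echo function stand in for Sentence-Transformers, FAISS, and Gemini; every name (`embed`, `cosine`, `retrieve`, `answer`, `echo_llm`) and the sample chunks are hypothetical and serve only to show how the pieces fit together.

```python
import math
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: List[str], top_k: int = 2) -> List[str]:
    """Return the top_k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:top_k]


def answer(query: str, chunks: List[str], generate: Callable[[str], str], top_k: int = 2) -> str:
    """Retrieve context, build the prompt, and hand it to the generator."""
    context = "\n\n".join(retrieve(query, chunks, top_k))
    prompt = f"Here are some documents: {context}\n\nQuestion: {query}"
    return generate(prompt)


def echo_llm(prompt: str) -> str:
    """Fake language model that returns the prompt unchanged."""
    return prompt


sample_chunks = [
    "Self-attention uses query, key and value matrices.",
    "FAISS performs similarity search over dense vectors.",
    "Chunk size controls how much text each passage holds.",
]
output = answer("What does FAISS search over?", sample_chunks, echo_llm, top_k=1)
print(output)
```

Because the generator is passed in as a function, the same structure works with a real model: replace `echo_llm` with a function that calls the LLM and returns its text.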

## Step 5: Ask Questions and Get Answers

Test the system by asking a question.

Once the system is set up, we can test it by asking questions. For each question, the system generates an embedding of the query, retrieves the most similar chunks from the FAISS index, and passes them to Gemini together with the question to generate a response.

In [6]:
# TODO: Ask a question to the QA function
# Replace the example question below with your own and call query_to_gemini on it

# 5. Helper that formats the retrieved documents for the Gemini prompt
def format_docs(docs: List[Document]):
    return "\n\n".join(doc.page_content for doc in docs)

# Answer:
query = "What is the main topic discussed in the document?"
result = query_to_gemini(query)
print(f"result : {result}")


result : The main topic appears to be RAG (Retrieval Augmented Generation) techniques, specifically focusing on different aspects like hierarchical indices, hypothetical question answering (HyDE), and choosing chunk size for document processing.  It also mentions tools and libraries related to document loading and parsing within the context of RAG.



## Step 6: Test Your Implementation with Different Questions

Try out different questions to see how the system performs.

You can experiment with various questions to evaluate the system's performance and how well it retrieves relevant information.

In [7]:
# Replace the example question below with your own and call query_to_gemini on it

query = "Can you summarize the key points mentioned?"
result = query_to_gemini(query)
print(f"result : {result}")

result : This document describes the self-attention mechanism, a key component in models like Transformers.  Here's a breakdown:

* **Self-Attention Components:**  Self-attention involves three main elements:
    * **Key (K):** Represents what information a word/token *has*.  Generated by multiplying the embedding (E) by a key matrix (WK).
    * **Query (Q):** Represents what information a word/token is *looking for*. Generated by multiplying the embedding (E) by a query matrix (WQ).
    * **Value (V):** Represents the information a word/token *reveals* to other words/tokens. Generated by multiplying the embedding (E) by a value matrix


## Step 7: Improve the System

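To improve the system, experiment with the parameters used above: chunk_size and chunk_overlap in the text splitter, top_k in query_to_gemini, max_tokens and the model name in the Gemini configuration, or the Sentence-Transformers embedding model.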
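
When comparing chunk sizes, it helps to know roughly how many chunks (and therefore embeddings) each setting produces. The sketch below gives an estimate for fixed-size windows; the function `estimate_chunk_count` and the 100,000-character document length are assumptions for illustration, and the real splitter will differ somewhat because it breaks on separators.

```python
import math


def estimate_chunk_count(text_length: int, chunk_size: int, chunk_overlap: int) -> int:
    """Estimate the number of fixed-size chunks for a text of the given length."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if text_length <= 0:
        return 0
    if text_length <= chunk_size:
        return 1
    step = chunk_size - chunk_overlap  # new characters contributed by each extra chunk
    return math.ceil((text_length - chunk_overlap) / step)


# Hypothetical 100,000-character document with the notebook's overlap of 50
for size in (250, 500, 1000):
    print(size, estimate_chunk_count(100_000, size, chunk_overlap=50))
```

Smaller chunks give more precise matches but more vectors to store and less context per retrieved passage; larger chunks do the opposite.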

## Conclusion

Congratulations! You’ve built a simple Retrieval-Augmented Generation system using LangChain. This system can retrieve relevant information from documents and generate answers to user queries.

### Help

- TextLoader: Loads text data from files.
- RecursiveCharacterTextSplitter: Splits text into smaller chunks for better processing.
- FAISS: A library for efficient similarity search of embeddings.
- RetrievalQA Chain: A chain that retrieves relevant documents and answers questions based on them.
- Sentence-Transformers: Generates embeddings that capture the semantic meaning of text.