## Building a Retrieval-Augmented Generation (RAG) System with LangChain

### Introduction

In this notebook, we will learn how to build a Retrieval-Augmented Generation (RAG) system using LangChain in Python. RAG systems combine information retrieval and natural language generation to produce answers that are grounded in external knowledge bases. This approach is particularly useful when dealing with large documents or datasets where direct querying isn’t efficient or possible.

### Objectives

- Understand the concept of Retrieval-Augmented Generation (RAG).
- Learn how to use LangChain to implement a RAG system.
- Implement the system step by step with guided TODO tasks.
- Test your implementation at each step.
- Provide helpful explanations and definitions.


### Methods Used:

- LangChain: A library for building language model applications.
- VectorStore (FAISS): A tool for efficient similarity search and clustering of dense vectors.
- RetrievalQA Chain: Combines retrieval and question-answering over documents.

### Data Used

- I extracted some chapters of the Gen AI course as a txt file. 
- The goal of this notebook is to build a RAG system that can answer questions based on the content of these chapters.

## Step 1: Set Up Your Environment

We first need to import all the necessary modules and set up the environment. The environment variable for the GOOGLE_API_KEY is loaded from the .env file to interact with Google’s Gemini model.

In [1]:
# Import necessary libraries
import sys
from dotenv import load_dotenv
from langchain import hub
from langchain_google_genai.chat_models import ChatGoogleGenerativeAI
from langchain_google_genai.embeddings import GoogleGenerativeAIEmbeddings
from langchain_google_genai.llms import GoogleGenerativeAI
from langchain.chains import RetrievalQA
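# FAISS and TextLoader are imported from langchain_community, where recent LangChain releases keep them
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader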
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_core.documents.base import Document
from langchain_core.prompts import ChatPromptTemplate
from typing import List
import os

In [2]:
load_dotenv()
sys.path.append("../")

# Load the Google API key
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")

llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-pro",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
)


## Step 2: Load and Split Documents

Load the document you want to use and split it into manageable chunks.

In [3]:
# TODO: Load your document and split it into chunks
# Hint: Use TextLoader and RecursiveCharacterTextSplitter

filename = "../data/gen_ai_course.txt"
# Answer:
loader = TextLoader(filename, encoding="utf-8")
documents = loader.load()

# Answer
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = text_splitter.split_documents(documents)
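
To make the chunk_size and chunk_overlap parameters concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification and not how RecursiveCharacterTextSplitter works internally, since the real splitter also tries to break on separators such as paragraph and line boundaries. The function name `split_with_overlap` and the sample text are hypothetical.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each chunk starts with the last three characters of the previous one, which is what keeps a sentence that straddles a boundary visible in both chunks.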

## Step 3: Create Embeddings and Build the VectorStore

Generate embeddings for each chunk and store them in a vector store for efficient retrieval.

We now generate embeddings for each chunk of text using GoogleGenerativeAIEmbeddings, which maps each chunk to a dense vector so that chunks with similar meaning end up close together. The embeddings are then stored in a FAISS index for fast retrieval.

In [4]:
# Hint: Use GoogleGenerativeAIEmbeddings and FAISS
# Use GoogleGenerativeAIEmbeddings to create the document embeddings
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

# Build the vector store with FAISS
vectorstore = FAISS.from_documents(docs, embeddings)
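
The idea behind a vector store can be sketched without any external service. The example below is an illustration under loud assumptions: `embed` is a toy word-count embedding standing in for a real embedding model, cosine similarity is used for ranking (the notebook's FAISS store may rank by a different metric), and all names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, float]:
    """Toy embedding: a sparse word-count vector (stand-in for a real model)."""
    return dict(Counter(text.lower().split()))


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(value * b.get(word, 0.0) for word, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(query: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Embed every chunk, score it against the query, and return the k best."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(chunk)), chunk) for chunk in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


toy_chunks = [
    "transformers use attention layers",
    "retrieval augmented generation uses a vector store",
    "tf-idf is a classic text search method",
]
results = retrieve_top_k("how does retrieval augmented generation work", toy_chunks, k=2)
for score, chunk in results:
    print(f"{score:.2f}  {chunk}")
```

A real vector store does the same two things at scale: it embeds every chunk once up front, then embeds each query and returns the nearest stored chunks.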

## Step 4: Set Up the QA Chain

Create a chain that can retrieve relevant chunks and generate answers based on them.

The next step is to set up a question-answering function that combines retrieval with the language model. It searches the vector store for the most relevant chunks and passes them, together with the question, to Gemini. Note that the code below wires these steps together in a plain Python function; it does not use LCEL's pipe syntax or the RetrievalQA class imported earlier.

In [5]:
# Build the structured prompt with ChatPromptTemplate
prompt: ChatPromptTemplate = ChatPromptTemplate.from_messages([
    ("system", "You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise."),
    ("human", "Question: {question}"),
    ("human", "Context: {context}"),
    ("ai", "Answer:")
])

# Format the retrieved documents as plain text
def format_docs(docs: List[Document]):
    return "\n\n".join(doc.page_content for doc in docs)

# Query Gemini with the documents retrieved from FAISS
def query_to_gemini(query: str, top_k: int = 3) -> str:
    # Find the top_k chunks closest to the query in the FAISS vector store.
    # similarity_search returns a list of documents (not (doc, score) pairs).
    relevant_docs = vectorstore.similarity_search(query, k=top_k)

    # Join the retrieved chunks into a single context string
    formatted_docs = format_docs(relevant_docs)

    # Fill the prompt template with the question and the retrieved context
    formatted_prompt = prompt.format_messages(question=query, context=formatted_docs)

    # Ask Gemini, through LangChain, for an answer
    ai_msg = llm.invoke(formatted_prompt)

    # Return the answer text if the message carries any
    if hasattr(ai_msg, 'content'):
        response = ai_msg.content
        response = response.replace(". ", ".\n")  # Put each sentence on its own line
        return response
    else:
        return "Error: the generated response does not contain valid text."
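
To see what the model actually receives, the prompt assembly can be reproduced without LangChain. This is only a sketch: `FakeDocument` is a hypothetical stand-in for a LangChain Document, `join_chunks` and `build_messages` are hypothetical names, the system text is shortened, and the messages are plain (role, text) tuples rather than real message objects.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a LangChain Document; only page_content is modelled."""
    page_content: str


def join_chunks(docs: List[FakeDocument]) -> str:
    # Same joining rule as the notebook's format_docs: a blank line between chunks
    return "\n\n".join(doc.page_content for doc in docs)


def build_messages(question: str, docs: List[FakeDocument]) -> List[Tuple[str, str]]:
    """Return (role, text) pairs that mirror the layout of the notebook's prompt."""
    return [
        ("system", "Use the retrieved context to answer the question."),
        ("human", f"Question: {question}"),
        ("human", f"Context: {join_chunks(docs)}"),
    ]


retrieved = [FakeDocument("RAG retrieves chunks."), FakeDocument("Then the model answers.")]
messages = build_messages("What does RAG do?", retrieved)
for role, text in messages:
    print(f"[{role}] {text}")
```

Printing the assembled messages like this is a cheap way to check that the retrieved chunks, and not an empty string, reach the model.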




## Step 5: Ask Questions and Get Answers

Test the system by asking a question.

In [6]:
# Answer:
query = "What is the main topic discussed in the document?"
result = query_to_gemini(query)
print(f"Answer: {result}")

Answer: The main topic is Retrieval Augmented Generation (RAG), a technique that uses retrieved information to improve model responses.
 The document discusses RAG's architecture, including information retrieval methods and filtering techniques, to enhance context and coherence.
 It also briefly touches upon TF-IDF and Elasticsearch for text search within the RAG framework.



## Step 6: Test Your Implementation with Different Questions

Try out different questions to see how the system performs.

In [7]:
query = "Can you summarize the key points mentioned?"
result = query_to_gemini(query)
print(f"Answer: {result}")

Answer: Retrieval Augmented Generation (RAG) uses a vector store of document content to improve model responses.
 Key improvements to RAG include context enrichment and multi-faceted filtering.
 Transformers use self/cross attention, multi-head attention, residual connection, layer normalization, a feed forward layer, softmax layer, and positional embeddings.



## Step 7: Improve the System

You can experiment with different parameters, like adjusting the chunk size or using a different language model.

#### Using Sentence-Transformers

In this part we will use Sentence-Transformers to create embeddings from text.

In [21]:
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

# Use the `all-MiniLM-L6-v2` model, which is lightweight and effective for document embeddings
model1 = SentenceTransformer('all-MiniLM-L6-v2')

# Generate an embedding for each chunk
embeddings1 = model1.encode([doc.page_content for doc in docs])

# Convert the embeddings to a float32 NumPy array for FAISS
embeddings_np = np.array(embeddings1).astype('float32')

# Create a FAISS index
index = faiss.IndexFlatL2(embeddings_np.shape[1])  # L2 (Euclidean) distance
index.add(embeddings_np)


# Query Gemini with the documents retrieved from the FAISS index
def query_to_gemini1(query: str, top_k: int = 3) -> str:
    # Embed the query
    query_embedding = model1.encode([query]).astype('float32')

    # Search the FAISS index for the nearest chunks
    # (D holds the distances, I holds the matching row indices)
    D, I = index.search(query_embedding, top_k)

    # Map the returned indices back to the chunks
    relevant_docs = [docs[i] for i in I[0]]

    # Build a structured prompt. The context and question are template variables,
    # so braces inside the documents are not parsed as placeholders.
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a helpful assistant that answers questions based on the provided documents."),
        ("user", "Here are some documents: {context}\n\nQuestion: {question}")
    ])

    # Fill in the prompt
    formatted_prompt = prompt.format_messages(context=format_docs(relevant_docs), question=query)

    # Ask Gemini, through LangChain, for an answer
    ai_msg = llm.invoke(formatted_prompt)

    # Return the answer text if the message carries any
    if hasattr(ai_msg, 'content'):
        return ai_msg.content
    else:
        return "Error: the generated response does not contain valid text."

# Format the retrieved documents as plain text for the Gemini prompt
def format_docs(docs: List[Document]):
    return "\n\n".join(doc.page_content for doc in docs)

# Answer:
query = "What is the main topic discussed in the document?"
result = query_to_gemini1(query)
print(f"result : {result}")


In [None]:
query = "Can you summarize the key points mentioned?"
result = query_to_gemini1(query)
print(f"Answer: {result}")
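
For intuition about the Sentence-Transformers variant, the exhaustive search behind a flat L2 index can be sketched in plain Python. This is an illustration, not FAISS code: `flat_l2_search` is a hypothetical name, it handles a single query, and it returns squared Euclidean distances together with the row indices, loosely mirroring the `D, I` pair used above.

```python
from typing import List, Sequence, Tuple


def flat_l2_search(
    vectors: Sequence[Sequence[float]],
    query: Sequence[float],
    k: int,
) -> Tuple[List[float], List[int]]:
    """Compare the query with every stored vector and return the k nearest.

    Returns (distances, indices), ordered from nearest to farthest, using
    squared Euclidean distance.
    """
    scored = []
    for idx, vec in enumerate(vectors):
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist, idx))
    scored.sort()  # ascending distance; ties go to the lower index
    best = scored[:k]
    return [d for d, _ in best], [i for _, i in best]


stored = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]]
distances, indices = flat_l2_search(stored, query=[1.0, 2.0], k=2)
print(distances, indices)
```

The returned indices are positions in the list of stored vectors, which is why the notebook can turn them back into chunks with `docs[i]`. This only works as long as the vectors were added in the same order as `docs`.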

## Conclusion

Congratulations! You’ve built a simple Retrieval-Augmented Generation system using LangChain. This system can retrieve relevant information from documents and generate answers to user queries.

### Help

- TextLoader: Loads text data from files.
- RecursiveCharacterTextSplitter: Splits text into smaller chunks for better processing.
- FAISS: A library for efficient similarity search of embeddings.
- RetrievalQA Chain: A chain that retrieves relevant documents and answers questions based on them.

## Help

In [17]:
from langchain_core.prompts import ChatPromptTemplate

template = ChatPromptTemplate([
    ("system", "You are a helpful AI bot. Your name is {name}."),
    ("human", "Hello, how are you doing?"),
    ("ai", "I'm doing well, thanks!"),
    ("human", "{user_input}"),
])

prompt_value = template.invoke(
    {
        "name": "Bob",
        "user_input": "What is your name?"
    }
)

# Output:
# ChatPromptValue(
#    messages=[
#        SystemMessage(content='You are a helpful AI bot. Your name is Bob.'),
#        HumanMessage(content='Hello, how are you doing?'),
#        AIMessage(content="I'm doing well, thanks!"),
#        HumanMessage(content='What is your name?')
#    ]
#)