## Building a Retrieval-Augmented Generation (RAG) System with LangChain

### Introduction

In this notebook, we will learn how to build a Retrieval-Augmented Generation (RAG) system using LangChain in Python. RAG systems combine information retrieval and natural language generation to produce answers that are grounded in external knowledge bases. This approach is particularly useful when dealing with large documents or datasets where direct querying isn’t efficient or possible.

### Objectives

- Understand the concept of Retrieval-Augmented Generation (RAG).
- Learn how to use LangChain to implement a RAG system.
- Implement the system step by step with guided TODO tasks.
- Test your implementation at each step.
- Provide helpful explanations and definitions.

Help

### Methods Used:

- LangChain: A library for building language model applications.
- VectorStore (FAISS): A tool for efficient similarity search and clustering of dense vectors.
- Embeddings: Vector representations of text that capture semantic meaning (this notebook uses Google Generative AI embeddings rather than OpenAI embeddings).
- RetrievalQA Chain: Combines retrieval and question-answering over documents.

### Data Used

- I extracted some chapters of the Gen AI course as a txt file.
- The goal of this notebook is to build a RAG system that can answer questions based on the content of these chapters.

## Step 1: Set Up Your Environment

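We need to import the required modules and load the API key from a `.env` file. The cells below use Google Generative AI models, so the key is read from `GOOGLE_API_KEY`.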

In [1]:
# Import necessary libraries
# (loaders, vector stores and splitters now live in langchain_community / langchain_text_splitters)
import sys
from typing import List

from dotenv import load_dotenv
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_text_splitters import RecursiveCharacterTextSplitter



In [21]:
import os
from dotenv import load_dotenv

load_dotenv()  # Load the variables from .env
# print(os.getenv("GOOGLE_API_KEY"))  # Check that the key is loaded
sys.path.append("../")


## Step 2: Load and Split Documents

Load the document you want to use and split it into manageable chunks.
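
To make the two splitter parameters used in this step concrete, here is a minimal plain-Python sketch of fixed-size chunking with overlap. It is a simplification, not LangChain's implementation: `RecursiveCharacterTextSplitter` also tries to cut on separators such as paragraph and line breaks, which this sketch ignores. The function name `split_with_overlap` and the sample string are hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk starts with the last three characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.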

In [23]:
# Import necessary classes (TextLoader is provided by langchain_community)
from langchain_community.document_loaders import TextLoader

# Load the document
filename = "../data/gen_ai_course.txt"
loader = TextLoader(filename, encoding="utf-8")
documents = loader.load()

# Split the document into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = text_splitter.split_documents(documents)

# Optionally, print the first chunk to verify
print(docs[0].page_content[:500])  # Show the first 500 characters of the first chunk



Generative AI with LLM
Florian Bastin
👨🏼‍🎓 Master MASH - Université PSL
👨🏼‍💻 LLM Engineer @OctoTechnology
Le Monde, Casino, Channel, Club Med, Pernod Ricard, Suez
1

2
Module Overview:

Building Large Language Models
Transformers
Retrieval Augmented Generation
Tools and Agents
Fine tuning and optimization techniques
Generative AI in vision




3
Transformers
A. Before Transformers 
N grams
Embeddings
RNN 
LSTM


## Step 3: Create Embeddings and Build the VectorStore

Generate embeddings for each chunk and store them in a vector store for efficient retrieval.
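
As a rough picture of what a vector store does at query time, the sketch below ranks a few stored vectors by similarity to a query vector. The three-dimensional vectors and chunk labels are hypothetical, and cosine similarity is used only because it is easy to read; FAISS supports several distance measures and uses optimized index structures instead of a linear scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "embeddings"; real embeddings have hundreds of dimensions.
toy_index = {
    "chunk about attention": [0.9, 0.1, 0.0],
    "chunk about RNNs": [0.1, 0.9, 0.0],
    "chunk about fine tuning": [0.0, 0.2, 0.9],
}


def top_k(query_vector: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the labels of the k stored vectors most similar to the query."""
    ranked = sorted(
        index,
        key=lambda label: cosine_similarity(query_vector, index[label]),
        reverse=True,
    )
    return ranked[:k]


query_vector = [0.8, 0.3, 0.0]  # hypothetical embedding of a question about attention
nearest = top_k(query_vector, toy_index, k=2)
print(nearest)
# ['chunk about attention', 'chunk about RNNs']
```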

In [24]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings

# Embedding model used to turn each chunk into a vector
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

# Build the FAISS vector store. from_documents embeds every chunk itself,
# so a separate embed_documents call is not needed.
vectorstore = FAISS.from_documents(docs, embeddings)

## Step 4: Set Up the QA Chain using LCEL 

Create an LCEL chain that fills a prompt with document context and generates an answer from it. The cell below uses every chunk, joined by `format_docs`, as the context.

In [25]:
# TODO: Create a RetrievalQA chain
# Hint: Use ChatOpenAI, create a prompt, and use StrOutputParser
# Hint: The chain should be an LCEL chain https://python.langchain.com/v0.1/docs/expression_language/get_started/

from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-pro",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
)


prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are a question answering chatbot. 
            You'll say if you don't know. 
            You'll find the relevant information in {formatted_docs}. 
            Answer in at most 5 sentences.""",
        ),
        ("human", "{query}"),
    ]
)


def format_docs(docs: List[Document]) -> str:
    """Join the text of the given chunks, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


question = "What is Generative AI"

# Context passed to the prompt: every chunk of the document, joined together
formatted_docs = format_docs(docs)

# LCEL chain: the filled prompt is piped into the chat model
qa_chain = prompt | llm
answer = qa_chain.invoke(
    {
        "query": question,
        "formatted_docs": formatted_docs,
    }
).content

print(f"Question: {question}\nAnswer: {answer}")

Question: What is Generative AI
Answer: Generative AI refers to a category of artificial intelligence algorithms that can create various types of content, including text, images, audio, and synthetic data.  These models learn the underlying patterns and structure of their input training data and then generate new data that has similar characteristics.  Generative AI is used in a wide range of applications, from creating art and music to writing articles and generating code.  It relies heavily on techniques like deep learning, particularly generative adversarial networks (GANs) and transformer models.
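
Because `formatted_docs` is built from all of `docs`, the model receives the whole document as context. A retrieval step would instead keep only the chunks closest to the question. The sketch below illustrates that idea in plain Python, with word overlap standing in for embedding similarity; the sample chunks and the helpers `retrieve` and `format_chunks` are hypothetical and are not part of LangChain.

```python
import re
from collections import Counter

# Hypothetical chunks standing in for the notebook's `docs`.
toy_chunks = [
    "Attention lets a transformer weigh every token in the input sequence.",
    "An RNN processes a sequence one token at a time.",
    "Fine tuning adapts a pretrained model to a new task.",
    "N-grams count short runs of consecutive words.",
]


def words(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric words."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks sharing the most words with the question."""
    q = words(question)
    ranked = sorted(chunks, key=lambda c: sum((words(c) & q).values()), reverse=True)
    return ranked[:k]


def format_chunks(chunks: list[str]) -> str:
    """Join chunks with blank lines, as format_docs does for Document objects."""
    return "\n\n".join(chunks)


question = "How does attention work in a transformer?"
top_chunks = retrieve(question, toy_chunks, k=2)
context = format_chunks(top_chunks)       # context from retrieved chunks only
all_context = format_chunks(toy_chunks)   # context from every chunk
print(len(context), "characters instead of", len(all_context))
```

With the objects built in this notebook, the analogous lookup would be `vectorstore.similarity_search(query, k=4)`, or a retriever obtained from `vectorstore.as_retriever()`, whose results could be passed to `format_docs` in place of `docs`.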



## Step 5: Ask Questions and Get Answers

Test the system by asking a question.

In [26]:

# TODO: Ask a question to the QA chain
# Replace 'Your question here' with an actual question and run the qa_chain for this question

# Answer:
query = "What is Attention Mechanism"
result = qa_chain.invoke(
    {
        "query": query,
        "formatted_docs": formatted_docs,
    }
).content
print(f"Question: {query}\nAnswer: {result}")

Question: What is Attention Mechanism
Answer: The attention mechanism in Transformers allows the model to focus on different parts of the input sequence when generating each word of the output sequence.  It does this by calculating a weighted sum of the input values, where the weights are determined by the relevance of each input word to the current output word being generated.  This relevance is determined by comparing a query vector (representing the current output word) with key vectors (representing each input word), and then using a softmax function to normalize the resulting scores into weights.  This allows the model to prioritize the most relevant input words when generating the output.



## Step 6: Test Your Implementation with Different Questions

Try out different questions to see how the system performs.

In [27]:
# Replace 'Another question here' with your own question and run the qa_chain for this question

query = "Can you explain the difference between Self Attention and Multi Head Attention"
result = qa_chain.invoke(
    {
        "query": query,
        "formatted_docs": formatted_docs,
    }
)
print(result.content)

Self-attention calculates relationships between different parts of a single input sequence to better understand the overall context.  Multi-head attention runs self-attention multiple times in parallel with different learned linear projections (heads).  Each head focuses on different aspects of the input, allowing the model to capture a richer representation. The results from each head are then concatenated and linearly transformed to produce the final output.  This allows the model to jointly attend to information from different representation subspaces at different positions.  In summary, multi-head attention enhances self-attention by allowing the model to learn diverse relationships within the input sequence.



## Step 7: Improve the System

You can experiment with different parameters, like adjusting the chunk size or using a different language model.
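
One way to reason about chunk size before re-running the notebook is to estimate how many chunks, and therefore how many embedded vectors, a given setting produces. The sketch below assumes fixed-size windows and a hypothetical document of 10,000 characters; the real splitter cuts on separators, so its counts will differ somewhat.

```python
import math

TEXT_LENGTH = 10_000  # hypothetical document length in characters


def count_chunks(text_length: int, chunk_size: int, chunk_overlap: int) -> int:
    """Number of fixed-size windows needed to cover text_length characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    if text_length <= 0:
        return 0
    if text_length <= chunk_size:
        return 1
    step = chunk_size - chunk_overlap  # characters gained by each extra window
    return 1 + math.ceil((text_length - chunk_size) / step)


for size in (250, 500, 1000):
    overlap = size // 10  # same 10% ratio as chunk_size=500, chunk_overlap=50
    print(size, overlap, count_chunks(TEXT_LENGTH, size, overlap))
# 250 25 45
# 500 50 23
# 1000 100 11
```

Smaller chunks give more precise matches but more vectors to embed and store; larger chunks carry more surrounding context per match.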

## Conclusion

Congratulations! You’ve built a simple Retrieval-Augmented Generation system using LangChain. This system can retrieve relevant information from documents and generate answers to user queries.

Help

- TextLoader: Loads text data from files.
- RecursiveCharacterTextSplitter: Splits text into smaller chunks for better processing.
- FAISS: A library for efficient similarity search of embeddings.
- RetrievalQA Chain: A chain that retrieves relevant documents and answers questions based on them.
- OpenAIEmbeddings: Generates embeddings that capture the semantic meaning of text.

## Help

In [28]:
from langchain_core.prompts import ChatPromptTemplate

template = ChatPromptTemplate([
    ("system", "You are a helpful AI bot. Your name is {name}."),
    ("human", "Hello, how are you doing?"),
    ("ai", "I'm doing well, thanks!"),
    ("human", "{user_input}"),
])

prompt_value = template.invoke(
    {
        "name": "Bob",
        "user_input": "What is your name?"
    }
)

# Output:
# ChatPromptValue(
#    messages=[
#        SystemMessage(content='You are a helpful AI bot. Your name is Bob.'),
#        HumanMessage(content='Hello, how are you doing?'),
#        AIMessage(content="I'm doing well, thanks!"),
#        HumanMessage(content='What is your name?')
#    ]
#)