## Building a Retrieval-Augmented Generation (RAG) System with LangChain

### Introduction

In this notebook, we will build a Retrieval-Augmented Generation (RAG) system using LangChain in Python. RAG systems combine information retrieval with natural language generation to produce answers grounded in an external knowledge base. This approach is particularly useful for large documents or datasets that cannot be passed to the model in full or queried directly.

### Objectives

- Understand the concept of Retrieval-Augmented Generation (RAG).
- Learn how to use LangChain to implement a RAG system.
- Implement the system step by step with guided TODO tasks.
- Test your implementation at each step.
- Provide helpful explanations and definitions.


### Methods Used:

- LangChain: A library for building language model applications.
- VectorStore (FAISS): A tool for efficient similarity search and clustering of dense vectors.
- Google Generative AI Embeddings: Vector representations of text that capture semantic meaning.
- LCEL QA chain: Combines a prompt and a chat model to answer questions over documents.

### Data Used

- The data is a text file containing a few chapters extracted from the Gen AI course.
- The goal of this notebook is to build a RAG system that can answer questions based on the content of these chapters.

## Step 1: Set Up Your Environment

We need to import the required modules and load the API key (a Google API key in this notebook) from a `.env` file.

In [9]:
# Import necessary libraries
import os
import sys
from typing import List

from dotenv import load_dotenv
from langchain.text_splitter import RecursiveCharacterTextSplitter
# FAISS and TextLoader are provided by langchain_community in recent LangChain releases
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings


In [2]:
load_dotenv(dotenv_path='C:/IAGen/exercices/.env')

sys.path.append("../")

In [3]:
import os
google_api_key = os.getenv("GOOGLE_API_KEY")
db_password = os.getenv("DB_PASSWORD")

In [4]:
if google_api_key is None or db_password is None:
    print("Error: One or more environment variables are not loaded.")
else:
    print("Environment variables loaded successfully.")

Environment variables loaded successfully.
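
The check above prints a message but lets the notebook continue even when a variable is missing. A minimal sketch of a stricter variant is shown here, assuming a hypothetical helper named `require_env` (it is not part of the original notebook) that stops immediately and names the missing variable. The `DEMO_API_KEY` variable is a throwaway value so the sketch runs without real credentials.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`.

    Hypothetical helper for illustration: raises if the variable is unset or empty.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; check your .env file."
        )
    return value


# Throwaway variable so the example does not depend on real credentials.
os.environ["DEMO_API_KEY"] = "dummy-value"
demo_key = require_env("DEMO_API_KEY")
print("Loaded DEMO_API_KEY")
```

In the notebook itself, the same helper could be called with `"GOOGLE_API_KEY"` after `load_dotenv(...)`.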


## Step 2: Load and Split Documents

Load the document you want to use and split it into manageable chunks.

In [6]:
# TODO: Load your document and split it into chunks
# Hint: Use TextLoader and RecursiveCharacterTextSplitter

filename = r"C:\IAGen\data\gen_ai_course.txt"
# Answer:
loader = TextLoader(filename, encoding='utf-8')
documents = loader.load()


# Answer
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = text_splitter.split_documents(documents)
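
To see what `chunk_size` and `chunk_overlap` mean, here is a simplified sketch that cuts a string into fixed-width character windows. It is an illustration only: `sliding_window_chunks` is a hypothetical function, and `RecursiveCharacterTextSplitter` behaves differently because it prefers to split on separators such as paragraph breaks and spaces before falling back to raw character positions.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut `text` into windows of `chunk_size` characters.

    Each window starts `chunk_size - chunk_overlap` characters after the
    previous one, so consecutive windows share `chunk_overlap` characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of storing some text twice.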

## Step 3: Create Embeddings and Build the VectorStore

Generate embeddings for each chunk and store them in a vector store for efficient retrieval.

In [8]:
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
vectorstore = FAISS.from_documents(docs, embeddings)
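
Conceptually, a vector store answers the question "which stored vectors are closest to the query vector?". The sketch below shows that idea with a plain Python loop and cosine similarity. The three-dimensional vectors and the names `toy_store`, `toy_query`, and `top_k` are made up for illustration; real embeddings have many more dimensions, and FAISS uses optimized index structures rather than a loop like this.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    stored: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k stored texts most similar to the query, best first."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in stored]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hand-made vectors standing in for real embeddings.
toy_store = [
    ("Transformers use self-attention.", [0.9, 0.1, 0.0]),
    ("RLHF aligns models with human preferences.", [0.1, 0.9, 0.1]),
    ("FAISS indexes dense vectors.", [0.0, 0.2, 0.9]),
]
toy_query = [0.8, 0.2, 0.1]

hits = top_k(toy_query, toy_store, k=2)
for text, score in hits:
    print(f"{score:.3f}  {text}")
```

With the notebook's real store, `vectorstore.similarity_search(query, k=4)` plays the role of `top_k` and returns the closest chunks as `Document` objects.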

## Step 4: Set Up the QA Chain using LCEL 

Create a chain that can retrieve relevant chunks and generate answers based on them.

In [11]:
# TODO: Create a question-answering chain
# Hint: Use a chat model (ChatGoogleGenerativeAI here), create a prompt, and use StrOutputParser
# Hint: The chain should be an LCEL chain https://python.langchain.com/v0.1/docs/expression_language/get_started/

llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-pro",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
)



prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are a question answering chatbot. 
            You'll say if you don't know. 
            You'll find the relevant information in {formatted_docs}. 
            Answer in at most 5 sentences.""",
        ),
        ("human", "{query}"),
    ]
)


def format_docs(docs: List[Document]):
    return "\n\n".join(doc.page_content for doc in docs)


question = "How to train an LLM ?"


formatted_docs = format_docs(docs)

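# Answer:
# Note: formatted_docs holds every chunk, so the whole document is sent to the
# model and the vector store from Step 3 is not queried here.
qa_chain = prompt | llm
answer = qa_chain.invoke(
    {
        # Only the variables used by the prompt are needed.
        "query": question,
        "formatted_docs": formatted_docs,
    }
).content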

print(f"Question: {question}\nAnswer: {answer}")

Question: How to train an LLM ?
Answer: LLM training involves two phases: pre-training and post-training. Pre-training uses a massive text dataset and cross-entropy loss to predict the next word in a sequence, teaching the model language structure and semantics. Post-training refines the model's ability to follow instructions and align with human preferences.  Supervised fine-tuning (SFT) uses human-written examples to guide the model's responses. Reinforcement learning from human feedback (RLHF) further improves alignment by training a reward model based on human preferences and using it to optimize the LLM's output. Direct preference optimization (DPO) is a newer, simpler alternative to RLHF that directly optimizes the model based on human preferences.
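
The cell above fills `{formatted_docs}` with every chunk, so the vector store built in Step 3 is never queried. A minimal sketch of adding the retrieval step is shown here, assuming a hypothetical helper named `answer_with_retrieval`. It only relies on objects that expose `.invoke()`, which LangChain retrievers and chains do. `FakeRetriever` and `EchoChain` are stand-ins invented for this example so that it runs without an API key.

```python
from types import SimpleNamespace


def format_docs(docs):
    """Join the text of the given chunks, as in the notebook's format_docs."""
    return "\n\n".join(doc.page_content for doc in docs)


def answer_with_retrieval(question, retriever, chain):
    """Fetch the chunks relevant to `question`, then ask `chain` using only those."""
    relevant_docs = retriever.invoke(question)
    response = chain.invoke(
        {"query": question, "formatted_docs": format_docs(relevant_docs)}
    )
    return response.content


# Stand-ins for illustration only: they mimic the .invoke() interface
# without calling any external service.
class FakeRetriever:
    def __init__(self, docs):
        self.docs = docs

    def invoke(self, question):
        # Naive keyword match instead of a real similarity search.
        words = question.lower().split()
        return [d for d in self.docs if any(w in d.page_content.lower() for w in words)]


class EchoChain:
    def invoke(self, inputs):
        # Echo the context it received instead of calling a language model.
        return SimpleNamespace(content=f"Context used: {inputs['formatted_docs']}")


toy_docs = [
    SimpleNamespace(page_content="An encoder maps tokens to contextual vectors."),
    SimpleNamespace(page_content="Pre-training predicts the next word."),
]
toy_answer = answer_with_retrieval("encoder", FakeRetriever(toy_docs), EchoChain())
print(toy_answer)
```

In the notebook, the same function could be called as `answer_with_retrieval(question, vectorstore.as_retriever(search_kwargs={"k": 4}), qa_chain)`, so that only the four closest chunks reach the model.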



## Step 5: Ask Questions and Get Answers

Test the system by asking a question.

In [13]:
# TODO: Ask a question to the QA chain
# Replace 'Your question here' with an actual question and run the qa_chain for this question

# Answer:
query = "What's an encoder? "
result = qa_chain.invoke(
    {
        "query": query,
        "formatted_docs": formatted_docs,
    }
).content
print(f"Question: {query}\nAnswer: {result}")

Question: What's an encoder 
Answer: An encoder is a component of a transformer model that processes input sequences, such as text or images, and transforms them into a contextualized representation. It uses self-attention mechanisms to capture relationships between different parts of the input sequence, allowing the model to understand the overall meaning and context.  The encoder's output is a sequence of vectors, where each vector represents a part of the input and its relationship to other parts. In tasks like machine translation, the encoder's output is then passed to a decoder, which generates the translated sequence.  In other tasks, like text classification, the encoder's output can be directly used for prediction.

## Step 6: Test Your Implementation with Different Questions

Try out different questions to see how the system performs.

In [16]:
# Replace 'Another question here' with your own question and run the qa_chain for this question

query = "who is Florian Bastin ?"
result = qa_chain.invoke(
    {
        "query": query,
        "formatted_docs": formatted_docs,
    }
)
print(result.content)

Florian Bastin holds a Master MASH degree from Université PSL and works as an LLM Engineer at OctoTechnology.  He has collaborated with prominent organizations such as Le Monde, Casino, Channel, Club Med, Pernod Ricard, and Suez.  His presentation, "Generative AI with LLM," covers topics including building and fine-tuning large language models, transformers, retrieval augmented generation, and generative AI in vision.  He discusses pre-training and post-training phases, including supervised fine-tuning and RLHF, along with various optimization techniques.



## Step 7: Improve the System

You can experiment with different parameters, like adjusting the chunk size or using a different language model.
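
Before re-running the pipeline with new settings, it can help to estimate how many chunks a given configuration will produce, since every chunk must be embedded. The sketch below assumes fixed-width windows, so it is only a rough guide: the real splitter prefers natural separators and its counts will differ. The function name `estimated_chunk_count` and the document length of 100,000 characters are hypothetical.

```python
import math


def estimated_chunk_count(text_length: int, chunk_size: int, chunk_overlap: int) -> int:
    """Estimate how many fixed-width windows are needed to cover the text."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if text_length <= 0:
        return 0
    if text_length <= chunk_size:
        return 1
    step = chunk_size - chunk_overlap
    # One window for the first chunk_size characters, then one per step.
    return 1 + math.ceil((text_length - chunk_size) / step)


hypothetical_length = 100_000  # assumed document size in characters
for size, overlap in [(250, 25), (500, 50), (1000, 100)]:
    count = estimated_chunk_count(hypothetical_length, size, overlap)
    print(f"chunk_size={size}, chunk_overlap={overlap}: about {count} chunks")
```

Smaller chunks give the retriever finer-grained pieces to choose from but increase the number of embeddings to compute and store.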

## Conclusion

Congratulations! You’ve built a simple Retrieval-Augmented Generation system using LangChain. This system can retrieve relevant information from documents and generate answers to user queries.

## Help

- TextLoader: Loads text data from files.
- RecursiveCharacterTextSplitter: Splits text into smaller chunks for better processing.
- FAISS: A library for efficient similarity search of embeddings.
- RetrievalQA Chain: A chain that retrieves relevant documents and answers questions based on them.
- OpenAIEmbeddings / GoogleGenerativeAIEmbeddings: Generate embeddings that capture the semantic meaning of text.

The following cell shows how ChatPromptTemplate fills in placeholders such as {name} and {user_input}:

In [9]:
from langchain_core.prompts import ChatPromptTemplate

template = ChatPromptTemplate([
    ("system", "You are a helpful AI bot. Your name is {name}."),
    ("human", "Hello, how are you doing?"),
    ("ai", "I'm doing well, thanks!"),
    ("human", "{user_input}"),
])

prompt_value = template.invoke(
    {
        "name": "Bob",
        "user_input": "What is your name?"
    }
)

# Output:
# ChatPromptValue(
#    messages=[
#        SystemMessage(content='You are a helpful AI bot. Your name is Bob.'),
#        HumanMessage(content='Hello, how are you doing?'),
#        AIMessage(content="I'm doing well, thanks!"),
#        HumanMessage(content='What is your name?')
#    ]
#)

messages=[SystemMessage(content='You are a helpful AI bot. Your name is Bob.', additional_kwargs={}, response_metadata={}), HumanMessage(content='Hello, how are you doing?', additional_kwargs={}, response_metadata={}), AIMessage(content="I'm doing well, thanks!", additional_kwargs={}, response_metadata={}), HumanMessage(content='What is your name?', additional_kwargs={}, response_metadata={})]
