## Building a Retrieval-Augmented Generation (RAG) System with LangChain

### Introduction

In this notebook, we will learn how to build a Retrieval-Augmented Generation (RAG) system using LangChain in Python. RAG systems combine information retrieval and natural language generation to produce answers that are grounded in external knowledge bases. This approach is particularly useful when dealing with large documents or datasets where direct querying isn’t efficient or possible.

### Objectives

- Understand the concept of Retrieval-Augmented Generation (RAG).
- Learn how to use LangChain to implement a RAG system.
- Implement the system step by step with guided TODO tasks.
- Test your implementation at each step.
- Provide helpful explanations and definitions.

Help

### Methods Used:

- LangChain: A library for building language model applications.
- VectorStore (FAISS): A tool for efficient similarity search and clustering of dense vectors.
- OpenAI Embeddings: Representations of text that can capture semantic meaning.
- RetrievalQA Chain: Combines retrieval and question-answering over documents.

### Data Used

- I extracted some chapters of the Gen AI course as a txt file. 
- The goal how this notebook is to build a RAG system that can answer questions based on the content of these chapters.

## Step 1: Set Up Your Environment

We need to import the required modules and set up the OpenAI API key.

In [1]:
# Import necessary libraries
import sys
from dotenv import load_dotenv
from langchain import hub
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import TextLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_core.documents.base import Document
from langchain_core.prompts import ChatPromptTemplate
from typing import List

  from .autonotebook import tqdm as notebook_tqdm


In [2]:
load_dotenv()
sys.path.append("../")

## Step 2: Load and Split Documents

Load the document you want to use and split it into manageable chunks.

In [3]:


filename = "../data/gen_ai_course.txt"
loader = TextLoader(filename)
documents = loader.load()

# Answer
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = text_splitter.split_documents(documents)

## Step 3: Create Embeddings and Build the VectorStore

Generate embeddings for each chunk and store them in a vector store for efficient retrieval.

In [4]:
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
vectorstore = FAISS.from_documents(docs, embeddings)

## Step 4: Set Up the QA Chain using LCEL 

Create a chain that can retrieve relevant chunks and generate answers based on them.

In [None]:
# TODO: Create a RetrievalQA chain
# Hint: Use ChatOpenAI, create a prompt, and use StrOutputParser
# Hint: The chain should be an LCEL chain https://python.langchain.com/v0.1/docs/expression_language/get_started/

llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-pro",
    max_tokens=None,
    timeout=20,
    max_retries=2,
)

def format_docs(docs: List[Document]):
    return "\n\n".join(doc.page_content for doc in docs)

prompt = ChatPromptTemplate.from_messages(
    messages=[
        ("system", f"You are a question-answering chatbot. You must provide the answer in {{language}}."),
        ("human", f"The question is: {{question}}\n\nRelevant Information:\n{{formatted_docs}}")
    ]
)

formatted_docs = format_docs(docs)
qa_chain = prompt | llm


## Step 5: Ask Questions and Get Answers

Test the system by asking a question.

In [6]:
def get_answer(question: str, language: str = "English") -> str:
    result = qa_chain.invoke({
        "language": language,
        "question": question,
        "formatted_docs": formatted_docs
    }).content
    return result

In [7]:
query_1 = "What is the main topic discussed in the document?"
result_1 = get_answer(query_1)
print(f"Answer to query 1: {result_1}")


Answer to query 1: This document provides a comprehensive overview of Large Language Models (LLMs), focusing heavily on the **architecture and training of transformer-based models**.  It covers topics such as pre-training, fine-tuning techniques (including Supervised Fine Tuning and RLHF), tokenization, and evaluation methods.  Additionally, it discusses Retrieval Augmented Generation (RAG) and its variations, along with the use of tools and agents with LLMs.  While other topics like model optimization and a brief history of pre-transformer architectures are touched upon, the core subject remains the workings and training of transformer models within the broader context of generative AI.



## Step 6: Test Your Implementation with Different Questions

Try out different questions to see how the system performs.

In [8]:
query_2 = "Can you summarize the key points mentioned?"
result_2 = get_answer(query_2)
print(f"Answer to query 2: {result_2}")


Answer to query 2: The key points discussed include:

**Large Language Models (LLMs):**

* **Building LLMs:**  Involves pre-training (using cross-entropy loss, tokenization, data preprocessing, scaling laws) and post-training (fine-tuning with supervised fine-tuning, RLHF using reward models and PPO/DPO).  Cost and optimization are important considerations during training.
* **Evaluation:**  Uses datasets like IFEval, BBH, MMLU-Pro, and Math, covering diverse fields.  Contamination of training data is a concern.
* **Supervised Fine-Tuning (SFT):** Aligns LLMs to follow instructions and human preferences, addressing limitations of pre-training alone. Data collection for SFT can be scaled using LLMs.
* **Reinforcement Learning from Human Feedback (RLHF):** Improves alignment further by training a reward model based on human preferences for different generated answers.  PPO and DPO are used for training the RL model.  RLHF faces challenges like answer length inflation and human preference

## Step 7: Improve the System

You can experiment with different parameters, like adjusting the chunk size or using a different language model.

Conclusion

Congratulations! You’ve built a simple Retrieval-Augmented Generation system using LangChain. This system can retrieve relevant information from documents and generate answers to user queries.

Help

- TextLoader: Loads text data from files.
- RecursiveCharacterTextSplitter: Splits text into smaller chunks for better processing.
- FAISS: A library for efficient similarity search of embeddings.
- RetrievalQA Chain: A chain that retrieves relevant documents and answers questions based on them.
- OpenAIEmbeddings: Generates embeddings that capture the semantic meaning of text.

## Help

In [9]:
from langchain_core.prompts import ChatPromptTemplate

template = ChatPromptTemplate([
    ("system", "You are a helpful AI bot. Your name is {name}."),
    ("human", "Hello, how are you doing?"),
    ("ai", "I'm doing well, thanks!"),
    ("human", "{user_input}"),
])

prompt_value = template.invoke(
    {
        "name": "Bob",
        "user_input": "What is your name?"
    }
)

# Output:
# ChatPromptValue(
#    messages=[
#        SystemMessage(content='You are a helpful AI bot. Your name is Bob.'),
#        HumanMessage(content='Hello, how are you doing?'),
#        AIMessage(content="I'm doing well, thanks!"),
#        HumanMessage(content='What is your name?')
#    ]
#)

messages=[SystemMessage(content='You are a helpful AI bot. Your name is Bob.', additional_kwargs={}, response_metadata={}), HumanMessage(content='Hello, how are you doing?', additional_kwargs={}, response_metadata={}), AIMessage(content="I'm doing well, thanks!", additional_kwargs={}, response_metadata={}), HumanMessage(content='What is your name?', additional_kwargs={}, response_metadata={})]
