## Building a Retrieval-Augmented Generation (RAG) System with LangChain

### Introduction

In this notebook, we will learn how to build a Retrieval-Augmented Generation (RAG) system using LangChain in Python. RAG systems combine information retrieval with natural language generation to produce answers grounded in an external knowledge base. The approach is most useful for large documents or datasets that are too big, or too costly, to pass to the model in full with every question. In that case, only the chunks most relevant to the question are retrieved and given to the model as context.

### Objectives

- Understand the concept of Retrieval-Augmented Generation (RAG).
- Learn how to use LangChain to implement a RAG system.
- Implement the system step by step with guided TODO tasks.
- Test your implementation at each step.
- Provide helpful explanations and definitions.


### Methods Used

- LangChain: A library for building language model applications.
- VectorStore (FAISS): A library for efficient similarity search and clustering of dense vectors.
- Google Generative AI Embeddings (`GoogleGenerativeAIEmbeddings`): Vector representations of text that capture semantic meaning.
- Gemini chat model (`ChatGoogleGenerativeAI`): Generates the answers.
- QA chain (`LLMChain`): Combines a prompt template with the chat model to answer questions over the documents.

### Data Used

- I extracted some chapters of the Gen AI course into a plain-text file (`../data/gen_ai_course.txt`).
- The goal of this notebook is to build a RAG system that can answer questions based on the content of these chapters.

## Step 1: Set Up Your Environment

We import the required modules and load the Google API key (`GOOGLE_API_KEY`) from a `.env` file with `python-dotenv`.

In [3]:
# Import necessary libraries
import sys
from typing import List

from dotenv import load_dotenv
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chains import LLMChain  # deprecated, but used by the QA chain in Step 4
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate

In [4]:
load_dotenv()
sys.path.append("../")

## Step 2: Load and Split Documents

Load the course text and split it into overlapping chunks that are small enough to embed and retrieve individually.

In [5]:
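# TODO: Load your document and split it into chunks
# Hint: Use TextLoader and RecursiveCharacterTextSplitter
filename = "../data/gen_ai_course.txt"

# Answer: load the whole file as a single Document
loader = TextLoader(filename, encoding="utf-8")
documents = loader.load()

# Answer: split into chunks of up to ~500 characters, with 50 characters of overlap
# between neighbouring chunks so text cut at a boundary keeps some context
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = text_splitter.split_documents(documents)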
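To make `chunk_size` and `chunk_overlap` concrete, here is a minimal plain-Python sketch of fixed-size character chunking with overlap. It is a simplification, not how `RecursiveCharacterTextSplitter` works internally: the real splitter first tries to break on paragraph, line, and word boundaries, whereas the hypothetical `sliding_window_chunks` helper below cuts at fixed character offsets.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration only: each new chunk starts
    chunk_size - chunk_overlap characters after the previous one.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(text, chunk_size=10, chunk_overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk repeats the last 3 characters of the previous one. With the notebook's settings (500 and 50), each window would start 450 characters after the previous one.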


## Step 3: Create Embeddings and Build the VectorStore

Generate embeddings for each chunk and store them in a vector store for efficient retrieval.

In [6]:
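import os

# load_dotenv() already ran in Step 1. langchain_google_genai reads GOOGLE_API_KEY
# from the environment on its own, so this cell only checks that the key is set.
api_key = os.getenv("GOOGLE_API_KEY")
assert api_key, "GOOGLE_API_KEY is not set; add it to your .env file"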


In [8]:
# TODO: Create embeddings and store them in a VectorStore
# Hint: FAISS
# Hint: Use GoogleGenerativeAIEmbeddings(model=...)
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
# Embed every chunk and index the vectors in an in-memory FAISS index
vectorstore = FAISS.from_documents(documents=docs, embedding=embeddings)
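Conceptually, a similarity search over the vector store embeds the query and returns the chunks whose vectors are closest to it. The sketch below does this by brute force in plain Python, using made-up 3-dimensional vectors (real embeddings from `models/embedding-001` have far more dimensions). It ranks by Euclidean distance, which to my knowledge is the default for LangChain's `FAISS.from_documents` (a flat L2 index). The `top_k` and `l2_distance` helpers and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec: Sequence[float],
          index: Dict[str, Sequence[float]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Return the k chunk ids closest to query_vec, nearest first (exact search)."""
    scored = [(chunk_id, l2_distance(query_vec, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Toy "vector store": chunk id -> embedding (values invented for illustration)
index = {
    "chunk-transformers": [0.9, 0.1, 0.0],
    "chunk-pretraining": [0.1, 0.9, 0.1],
    "chunk-rlhf": [0.0, 0.8, 0.6],
}
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of a question about attention

for chunk_id, dist in top_k(query_vec, index, k=2):
    print(f"{chunk_id}: {dist:.3f}")
```

FAISS performs the same job much faster over many vectors. In this notebook, the equivalent query would be `vectorstore.similarity_search(query, k=4)`.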


## Step 4: Set Up the QA Chain

Create a chain that fills a prompt template with the document text and the user's question, then sends it to the Gemini chat model to generate an answer.

In [20]:
# Initialize the Gemini chat model (the API key is read from GOOGLE_API_KEY)
llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro")

# Create a function to format documents for the prompt
def format_docs(docs: List[Document]) -> str:
    # Join the page content of the documents into a single string
    return "\n".join(doc.page_content for doc in docs)

# Define the prompt template with system and human messages (see Help below)
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an AI assistant that provides accurate answers based on these documents {documents}."),
    ("human", "Question: {question}"),
])

# Format the documents using the function above.
# Note: this joins *every* chunk, so the full text is sent with each question
# and the FAISS vector store from Step 3 is never queried.
formatted_docs = format_docs(docs)

# Create the QA chain by combining the prompt and model.
# LLMChain and its .run() method are deprecated in recent LangChain releases;
# the LCEL equivalent of this chain is `prompt | llm | StrOutputParser()`.
qa_chain = LLMChain(prompt=prompt, llm=llm)
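The chain above never consults the vector store, because `formatted_docs` contains every chunk. A retrieval step would instead pick the chunks most relevant to each question and format only those into `{documents}`. The plain-Python sketch below shows that flow end to end. To keep it self-contained, it replaces the embedding model with a hypothetical word-overlap score and stops at the filled-in prompt instead of calling Gemini. `retrieve`, `score`, `format_chunks`, and `build_prompt` are illustrative names, not LangChain APIs. In LangChain itself, the same wiring is usually written in LCEL, roughly `{"documents": vectorstore.as_retriever() | format_docs, "question": RunnablePassthrough()} | prompt | llm | StrOutputParser()`. That form would use the `RunnablePassthrough` and `StrOutputParser` imported in Step 1.

```python
import re
from typing import List

# Hypothetical stand-ins for the notebook's chunks (Step 2 output)
CHUNKS = [
    "Self-attention lets each token attend to every other token in the same sequence.",
    "Pre-training teaches the model to predict the next token on a large corpus.",
    "RLHF aligns the model with human preferences using a reward model.",
]

SYSTEM_TEMPLATE = ("You are an AI assistant that provides accurate answers "
                   "based on these documents {documents}.")
HUMAN_TEMPLATE = "Question: {question}"


def tokens(text: str) -> set:
    """Lower-cased word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z]+", text.lower()))


def score(question: str, chunk: str) -> int:
    """Number of words shared by question and chunk (higher = more relevant)."""
    return len(tokens(question) & tokens(chunk))


def retrieve(question: str, chunks: List[str], k: int = 1) -> List[str]:
    """Return the k highest-scoring chunks, playing the role of a retriever."""
    return sorted(chunks, key=lambda c: score(question, c), reverse=True)[:k]


def format_chunks(chunks: List[str]) -> str:
    """Same idea as the notebook's format_docs, but over plain strings."""
    return "\n".join(chunks)


def build_prompt(question: str, k: int = 1) -> List[tuple]:
    """Retrieve context for the question, then fill the two-message prompt."""
    context = format_chunks(retrieve(question, CHUNKS, k))
    return [
        ("system", SYSTEM_TEMPLATE.format(documents=context)),
        ("human", HUMAN_TEMPLATE.format(question=question)),
    ]


for role, content in build_prompt("What does self-attention do in a transformer?"):
    print(f"{role}: {content}")
```

Only the self-attention chunk ends up in the system message. With `k` chunks retrieved per question, the prompt stays small no matter how large the corpus grows.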


## Step 5: Ask Questions and Get Answers

Test the system by asking a question.

In [13]:
# TODO: Ask a question to the QA chain
# Replace 'Your question here' with an actual question and run the qa_chain for this question

# Answer:
query = "What is the main topic discussed in the document?"
result = qa_chain.run(documents=formatted_docs, question=query)
print(result)


The document provides a comprehensive overview of Generative AI with Large Language Models (LLMs). It covers various aspects, including the architecture of transformers, training processes for LLMs (pretraining and fine-tuning), Retrieval Augmented Generation (RAG), and the use of tools and agents with LLMs.



In [14]:
query = "What's a llm?"
result = qa_chain.run(documents=formatted_docs, question=query)
print(result)

The document defines Language Models (LMs) as probability distributions over a sequence of words, denoted as p(x₁, …, xₙ).  They assign probabilities to sequences of words, reflecting syntactic and semantic knowledge.  Large Language Models (LLMs) are a subset of LMs, typically larger and more powerful, trained on massive datasets, and capable of generating human-quality text, translating languages, writing different kinds of creative content, and answering your questions in an informative way.  They function as generative models, predicting the next word in a sequence based on the preceding words (autoregressive).



## Step 6: Test Your Implementation with Different Questions

Try out different questions to see how the system performs.

In [16]:
# Replace 'Another question here' with your own question and run the qa_chain for this question

query = "Can you summarize the key points ?"
result = qa_chain.run(documents=formatted_docs, question=query)
print(result)

The document provides a comprehensive overview of Generative AI with Large Language Models (LLMs), covering the following key areas:

* **Building LLMs:** This involves two phases: pre-training (training on a massive dataset to predict the next word in a sequence, using cross-entropy loss and techniques like byte-pair encoding for tokenization) and post-training (fine-tuning the model for specific tasks using supervised fine-tuning or reinforcement learning from human feedback (RLHF)).  Scaling laws dictate the optimal model and data size based on available compute resources.  Cost and optimization strategies are crucial for training large models.  RLHF, including reward models and algorithms like Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO), aligns the model with human preferences.  Evaluation is done using datasets like IFEval, BBH, MMLU-Pro, and others.

* **Transformers:** The core architecture of LLMs, transformers rely on self-attention and cross-at

In [17]:
query = "What are the two main phases of building LLMs ?"
result = qa_chain.run(documents=formatted_docs, question=query)
print(result)

The two main phases of building LLMs like GPT-3 are **pre-training** and **post-training**.



In [18]:
query = "What is the role of self-attention and cross-attention in transformers?"
result = qa_chain.run(documents=formatted_docs, question=query)
print(result)

Self-attention allows each word in a sequence to consider its relationship with every other word in the *same* sequence, helping the model understand the context and relationships within the input.  Cross-attention, used in encoder-decoder models like those for translation, allows each word in the decoder sequence to attend to every word in the *encoder* sequence, helping the model understand how the input and output sequences relate to each other.  In simpler terms, self-attention looks within a single sequence, while cross-attention looks at two different sequences.



## Step 7: Improve the System

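You can experiment with different parameters, such as the chunk size and overlap or the chat model. The cell below re-splits the text into 300-character chunks with a 100-character overlap and rebuilds the vector store. Keep in mind that the QA chain from Step 4 receives every chunk rather than querying `vectorstore`. Here, the new chunking therefore changes only how the text is divided, not which content the model sees. Chunk size starts to matter for answer quality once the chain retrieves just the top-matching chunks.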

In [19]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=100)
docs = text_splitter.split_documents(documents) 
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
vectorstore = FAISS.from_documents(documents=docs, embedding=embeddings)
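Here is a rough way to see what the new settings change. With a simple fixed-size sliding window, each chunk starts `chunk_size - chunk_overlap` characters after the previous one. Moving from (500, 50) to (300, 100) therefore shrinks that stride from 450 to 200 characters and roughly doubles the number of chunks. The hypothetical `approx_chunk_count` helper below computes this. `RecursiveCharacterTextSplitter` breaks on separators, so its real counts will differ somewhat, and the 100,000-character length is an invented example.

```python
import math


def approx_chunk_count(n_chars: int, chunk_size: int, chunk_overlap: int) -> int:
    """Number of windows a fixed-size sliding-window splitter would produce.

    Hypothetical helper: consecutive chunks start (chunk_size - chunk_overlap)
    characters apart, and the last window may be shorter than chunk_size.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    if n_chars <= 0:
        return 0
    if n_chars <= chunk_size:
        return 1
    stride = chunk_size - chunk_overlap
    return 1 + math.ceil((n_chars - chunk_size) / stride)


n = 100_000  # invented document length, for illustration only
before = approx_chunk_count(n, chunk_size=500, chunk_overlap=50)
after = approx_chunk_count(n, chunk_size=300, chunk_overlap=100)
print(before, after)  # 223 500
```

More, smaller chunks give a retriever finer-grained pieces to choose from. The cost is more embedding calls and less context per chunk, and the larger overlap also means more repeated text.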


In [21]:

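# Re-format the new, smaller chunks; otherwise formatted_docs still holds the Step 4 chunks
formatted_docs = format_docs(docs)
query = "Can you summarize the key points ?"
result = qa_chain.run(documents=formatted_docs, question=query)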
print(result)

The document covers a wide range of topics related to Large Language Models (LLMs), from their underlying architecture and training process to advanced techniques like Retrieval Augmented Generation (RAG) and the use of tools and agents.  Here's a summary of the key points:

**Building LLMs:**

* **Architecture:**  Modern LLMs are primarily based on the Transformer architecture, which utilizes self-attention mechanisms to process sequential data efficiently, overcoming limitations of previous recurrent models like RNNs and LSTMs.  Key components include multi-head attention, residual connections, layer normalization, feed-forward layers, a softmax layer, and positional embeddings.
* **Training:**  Training involves a pre-training phase focused on language modeling (predicting the next word in a sequence) using a massive dataset, followed by a post-training phase to align the model with human preferences.  Techniques like Supervised Fine-Tuning (SFT), Reinforcement Learning from Human F

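## Conclusion

Congratulations! You've built the components of a simple Retrieval-Augmented Generation system with LangChain: a loader and text splitter, a FAISS vector store of Google Generative AI embeddings, and a Gemini-backed QA chain that answers questions from the document text. As written, the chain receives every chunk. Connecting the vector store to it (for example through `vectorstore.as_retriever()`) is the step that makes it retrieve only the chunks relevant to each question.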

## Help

- TextLoader: Loads text data from files.
- RecursiveCharacterTextSplitter: Splits text into chunks of up to `chunk_size` characters, trying paragraph, line, and word boundaries in turn.
- FAISS: A library for efficient similarity search over embeddings.
- GoogleGenerativeAIEmbeddings: Generates embeddings that capture the semantic meaning of text.
- ChatGoogleGenerativeAI: LangChain's wrapper for Google's Gemini chat models.
- LLMChain: A (deprecated) chain that formats a prompt template and passes it to a language model.


In [14]:
from langchain_core.prompts import ChatPromptTemplate

template = ChatPromptTemplate([
    ("system", "You are a helpful AI bot. Your name is {name}."),
    ("human", "Hello, how are you doing?"),
    ("ai", "I'm doing well, thanks!"),
    ("human", "{user_input}"),
])

prompt_value = template.invoke(
    {
        "name": "Bob",
        "user_input": "What is your name?"
    }
)

# Output:
# ChatPromptValue(
#    messages=[
#        SystemMessage(content='You are a helpful AI bot. Your name is Bob.'),
#        HumanMessage(content='Hello, how are you doing?'),
#        AIMessage(content="I'm doing well, thanks!"),
#        HumanMessage(content='What is your name?')
#    ]
#)
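With LangChain's default f-string template format, invoking the template substitutes each `{placeholder}` in the message templates with the matching input value. The plain-Python approximation below reproduces the result above as a list of `(role, content)` pairs. `fill_messages` is a hypothetical helper, not LangChain's implementation, and it ignores details such as message classes and input validation.

```python
from typing import Dict, List, Tuple

MESSAGES = [
    ("system", "You are a helpful AI bot. Your name is {name}."),
    ("human", "Hello, how are you doing?"),
    ("ai", "I'm doing well, thanks!"),
    ("human", "{user_input}"),
]


def fill_messages(templates: List[Tuple[str, str]],
                  values: Dict[str, str]) -> List[Tuple[str, str]]:
    """Fill {placeholders} in each message with str.format, keeping the roles."""
    return [(role, text.format(**values)) for role, text in templates]


filled = fill_messages(MESSAGES, {"name": "Bob", "user_input": "What is your name?"})
for role, content in filled:
    print(f"{role}: {content}")
```

Messages without placeholders pass through unchanged. A missing value raises a `KeyError` here, so every variable in the templates must be supplied.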