# Building a RAG (Retrieval-Augmented Generation) System

This notebook demonstrates how to build a RAG system with LangChain, using Hugging Face embeddings, a Chroma vector store, and a local Ollama model. We'll break it down into the following steps:

1. Setting up dependencies
2. Loading and processing documents
3. Creating embeddings
4. Setting up the vector store
5. Configuring the LLM
6. Creating the RAG chain
7. Asking questions

Let's get started!

## 1. Setting Up Dependencies

First, let's install the required packages:

In [1]:
%pip install langchain langchain-community langchain-huggingface langchain-ollama chromadb sentence-transformers pypdf

Note: you may need to restart the kernel to use updated packages.


In [2]:
import os
from typing import List, Dict

from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_ollama import ChatOllama
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

## 2. Loading and Processing Documents

We'll define a function that loads a PDF, splits it into overlapping chunks, and tags each chunk with its source filename:

In [4]:
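from langchain_core.documents import Document


def process_pdf(file_path: str) -> List[Document]:
    """Load a PDF, split it into overlapping chunks, and tag each chunk with its filename"""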
    # Load PDF
    loader = PyPDFLoader(file_path)
    pages = loader.load()
    
    # Split into chunks
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
        length_function=len,
    )
    
    splits = text_splitter.split_documents(pages)
    
    # Add source filename to metadata
    for split in splits:
        split.metadata["source"] = os.path.basename(file_path)
        
    return splits

# Example usage:
pdf_path = "/Users/shaonsikder/Downloads/AI Session/agent_learning/data/Meta-12.31.2022-Exhibit-99.1-FINAL.pdf"  # Replace with your PDF path
documents = process_pdf(pdf_path)
print(f"Processed {len(documents)} chunks from the PDF")

Processed 30 chunks from the PDF
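
To build some intuition for `chunk_size` and `chunk_overlap`, here is a minimal sketch of a fixed-size sliding window. It is a simplification, not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs and sentences). The helper name `sliding_window_chunks` is hypothetical and only used for illustration.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size windows; neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

Each chunk starts with the last four characters of the previous one, so a sentence cut at a chunk boundary still appears whole in at least one chunk when it is shorter than the overlap.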


## 3. Creating Embeddings

Now let's set up the embedding model:

In [5]:
# Initialize embedding model
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={'device': 'cpu'},
    encode_kwargs={'normalize_embeddings': True}
)
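
The `normalize_embeddings=True` option scales every embedding to unit length. A small sketch with toy two-dimensional vectors (not real model output) shows why that is convenient: once vectors are normalized, a plain dot product gives the same value as cosine similarity. The helper names below are hypothetical.

```python
import math
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(vector, vector))
    if norm == 0:
        return list(vector)
    return [x / norm for x in vector]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors standing in for embeddings
a = [3.0, 4.0]
b = [4.0, 3.0]

a_unit = l2_normalize(a)
b_unit = l2_normalize(b)

print("dot product of normalized vectors:", dot(a_unit, b_unit))
print("cosine similarity of raw vectors: ", cosine_similarity(a, b))
```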


## 4. Setting Up the Vector Store

Create a vector store from our documents:

In [7]:
# Create vector store
vector_store = Chroma.from_documents(
    documents=documents,
    embedding=embeddings
)

In [8]:
# Test a simple similarity search
query = "What is this document about?"  # Replace with your test query
docs = vector_store.similarity_search(query)
print(f"Found {len(docs)} relevant chunks")

Found 4 relevant chunks
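
We got four chunks back because `similarity_search` returns `k=4` results by default when no `k` is passed. Conceptually, the search embeds the query and returns the stored chunks whose vectors are closest to it. The sketch below does this by brute force on toy unit vectors; it is only an illustration, since a real vector store like Chroma uses an index rather than scanning every vector. The function name `top_k` is hypothetical.

```python
from typing import List, Tuple


def top_k(query_vec: List[float], doc_vecs: List[List[float]], k: int) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k vectors with the highest dot product."""
    scored = [
        (index, sum(q * d for q, d in zip(query_vec, vec)))
        for index, vec in enumerate(doc_vecs)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # best match first
    return scored[:k]


# Toy unit vectors standing in for chunk embeddings
toy_vectors = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.6, 0.8],
    [0.8, 0.6],
]
query_vector = [1.0, 0.0]

results = top_k(query_vector, toy_vectors, k=2)
for index, score in results:
    print(f"chunk {index}: score {score:.2f}")
```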


## 5. Configuring the LLM

Set up the language model. This example uses Ollama, so it assumes a local Ollama server is running and the `llama3.1` model has already been pulled:

In [9]:
# Initialize Ollama
llm = ChatOllama(
    model="llama3.1",  # or your preferred model
    temperature=0
)

## 6. Creating the RAG Chain

Now let's create our RAG chain with a custom prompt:

In [10]:
# Create prompt template
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know.

Context: {context}

Question: {question}

Answer:"""

prompt = PromptTemplate(
    template=template,
    input_variables=["context", "question"]
)

# Create the RAG chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vector_store.as_retriever(search_kwargs={"k": 3}),
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt}
)
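
The `"stuff"` chain type simply "stuffs" all retrieved chunks into the `{context}` slot of the prompt and sends the result to the LLM in a single call. Here is a rough sketch of that idea using plain string formatting. The chunk texts are made up for illustration, and `stuff_prompt` and `demo_template` are hypothetical names, not part of LangChain.

```python
from typing import List

demo_template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know.

Context: {context}

Question: {question}

Answer:"""


def stuff_prompt(chunks: List[str], question: str) -> str:
    """Join the retrieved chunks and place them in the prompt's context slot."""
    context = "\n\n".join(chunks)  # blank line between chunks
    return demo_template.format(context=context, question=question)


# Made-up chunks standing in for retrieved document text
example_chunks = [
    "First retrieved chunk of text.",
    "Second retrieved chunk of text.",
]
final_prompt = stuff_prompt(example_chunks, "What is this document about?")
print(final_prompt)
```

Because every chunk ends up in one prompt, `k` and `chunk_size` together determine how much of the model's context window the retrieval step consumes.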

## 7. Asking Questions

Finally, let's test our RAG system by asking questions:

In [11]:
def ask_question(question: str):
    """Ask a question and display the answer with sources"""
    # Get response
    response = qa_chain.invoke({"query": question})
    
    print("Answer:")
    print(response["result"])
    print("\nSources:")
    for doc in response["source_documents"]:
        print(f"\nFrom {doc.metadata['source']} (Page {doc.metadata['page'] + 1}):")
        print(doc.page_content)

# Example usage
question = "What is this document about?"  # Replace with your question
ask_question(question)

Answer:
This document appears to be a press release from a company, likely a technology or media company, that contains forward-looking statements and financial information. It mentions non-GAAP financial measures and a reconciliation table, which suggests that the document is discussing the company's financial performance and future business plans.

Sources:

From Meta-12.31.2022-Exhibit-99.1-FINAL.pdf (Page 5):
intended to represent our residual cash flow available for discretionary expenditures. 
For more information on our non-GAAP financial measures and a reconciliation of GAAP to non-GAAP measures, please see the 
"Reconciliation of GAAP to Non-GAAP Results" table in this press release.
5

From Meta-12.31.2022-Exhibit-99.1-FINAL.pdf (Page 5):
intended to represent our residual cash flow available for discretionary expenditures. 
For more information on our non-GAAP financial measures and a reconciliation of GAAP to non-GAAP measures, please see the 
"Reconciliation of GAAP to Non
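
Notice that both sources shown above come from the same page and begin with the same text. If repeated chunks like this clutter your output, one option is to filter them before printing. The sketch below uses plain dictionaries as stand-ins for the retrieved documents; `deduplicate_sources` and the sample data are hypothetical.

```python
from typing import Dict, List


def deduplicate_sources(sources: List[Dict]) -> List[Dict]:
    """Keep the first occurrence of each (source, page, text) combination."""
    seen = set()
    unique = []
    for item in sources:
        key = (item["source"], item["page"], item["text"].strip())
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


# Made-up retrieval results with one exact repeat
retrieved = [
    {"source": "report.pdf", "page": 4, "text": "Free cash flow is a non-GAAP measure."},
    {"source": "report.pdf", "page": 4, "text": "Free cash flow is a non-GAAP measure."},
    {"source": "report.pdf", "page": 7, "text": "Revenue increased year over year."},
]

unique_sources = deduplicate_sources(retrieved)
print(f"{len(retrieved)} retrieved, {len(unique_sources)} unique")
```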

## Conclusion

You've now built a complete RAG system! Here's what we covered:

1. Loading and processing PDF documents
2. Creating embeddings using Hugging Face
3. Setting up a vector store with Chroma
4. Configuring an LLM with Ollama
5. Creating a RAG chain with LangChain
6. Asking questions and getting answers with sources

You can extend this system by:
- Adding support for different document types
- Using different embedding models
- Trying different LLMs
- Customizing the prompt template
- Adding error handling and logging
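
As a starting point for the error handling and logging idea, here is a minimal sketch that validates a PDF path before it is handed to a loader. The function `validate_pdf_path`, the logger name, and the file name are all hypothetical; adapt them to your own setup.

```python
import logging
import os

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("rag_notebook")


def validate_pdf_path(file_path: str) -> str:
    """Return file_path if it points to an existing .pdf file, otherwise raise."""
    if not file_path.lower().endswith(".pdf"):
        raise ValueError(f"Expected a .pdf file, got: {file_path}")
    if not os.path.isfile(file_path):
        raise FileNotFoundError(f"No such file: {file_path}")
    logger.info("Found PDF: %s", os.path.basename(file_path))
    return file_path


# Example usage with a placeholder file name
try:
    validate_pdf_path("missing-report.pdf")
except (ValueError, FileNotFoundError) as exc:
    logger.error("Could not load document: %s", exc)
```

Calling a check like this at the top of `process_pdf` would surface a clear message instead of a loader traceback when the path is wrong.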