# Simple RAG

## Overview
This notebook demonstrates a basic Retrieval-Augmented Generation (RAG) system for processing and querying PDF documents.
The system encodes the document content into a vector store, which can be queried to retrieve relevant information.

## Key Components
1. PDF processing and text extraction
2. Text chunking for manageable processing
3. Vector store creation using FAISS and OpenAI embeddings
4. Retriever setup for querying the processed documents
5. Evaluation of the RAG system

### Import libraries and environment variables

In [1]:
import os 
import sys
from dotenv import load_dotenv, find_dotenv
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), ".")))   # Add the current working directory to the import path
from pydantic import SecretStr

# Load environment variables from .env file
load_dotenv(find_dotenv())

DEEPSEEK_API_KEY: str = os.getenv("DEEPSEEK_API_KEY") or ""
OPENAI_API_KEY: str = os.getenv("OPENAI_API_KEY") or ""
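
The `os.getenv(...) or ""` pattern above falls back to an empty string when a key is missing, so a misconfigured `.env` file only surfaces later as an authentication error. As a minimal sketch (the helper name `require_env` is hypothetical and not part of the notebook), a stricter lookup could fail early:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    # Hypothetical helper: not part of the original notebook.
    value = os.getenv(name, "")
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

Under this assumption, `OPENAI_API_KEY = require_env("OPENAI_API_KEY")` would stop the notebook at the first cell instead of at the embedding call.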


In [2]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_deepseek import ChatDeepSeek
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

from langchain_core.vectorstores import VectorStore
from langchain_core.documents.base import Document
from langchain_core.vectorstores.base import VectorStoreRetriever


### Feature Pipeline

In [3]:
file_path: str = "../data_samples/Understanding_Climate_Change.pdf"


In [4]:
def feature_pipeline(file_path: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> VectorStore:
    # Load the PDF document
    loader = PyPDFLoader(file_path)
    documents: list[Document] = loader.load()

    # Split documents into chunks
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    chunks: list[Document] = text_splitter.split_documents(documents)
    print(f"Chunk length: {len(chunks)}")

    # Create embeddings
    embeddings = OpenAIEmbeddings(model="text-embedding-3-small", api_key=SecretStr(OPENAI_API_KEY))

    # Create vector store
    vector_store: VectorStore = FAISS.from_documents(chunks, embeddings)

    return vector_store
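
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified, dependency-free sketch. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter tries to break on separators such as paragraphs and sentences first); it is a plain fixed-width sliding window, and the function name `sliding_window_chunks` is an illustrative assumption.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    # Each new window starts (chunk_size - chunk_overlap) characters after the last one.
    step = chunk_size - chunk_overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap means that the end of one chunk is repeated at the start of the next, which reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary.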


In [5]:
vector_store: VectorStore = feature_pipeline(file_path)
print(f"Vector store created: {vector_store}")


Chunk length: 97
Vector store created: <langchain_community.vectorstores.faiss.FAISS object at 0x10efea480>


### Create Retriever

In [6]:
retriever: VectorStoreRetriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 2})
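
Conceptually, a similarity retriever with `k=2` embeds the query and returns the two stored chunks whose vectors are closest to it. The toy sketch below illustrates that idea with cosine similarity on plain Python lists. It is an assumption-laden illustration: the FAISS index does this search internally, and the distance metric it actually uses may differ. The names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], vectors: list[list[float]], k: int = 2) -> list[int]:
    """Return the indices of the k vectors most similar to the query, best first."""
    scores = [cosine_similarity(query, v) for v in vectors]
    ranked = sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Three toy "chunk embeddings"; the query points in the same direction as the first.
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
print(top_k([1.0, 0.0], toy_vectors, k=2))
# [0, 2]
```

A larger `k` returns more context for the generator at the cost of including less relevant chunks.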


### Test Retriever

In [None]:
test_query: str = "What is the main cause of climate change?"
retrieved_docs: list[Document] = retriever.invoke(test_query)
for i, doc in enumerate(retrieved_docs):
    print(f"Document {i+1}:")
    print(doc.page_content)
    print("------------\n")


Document 1:
Chapter 2: Causes of Climate Change 
Greenhouse Gases 
The primary cause of recent climate change is the increase in greenhouse gases in the 
atmosphere. Greenhouse gases, such as carbon dioxide (CO2), methane (CH4), and nitrous 
oxide (N2O), trap heat from the sun, creating a "greenhouse effect." This effect is essential 
for life on Earth, as it keeps the planet warm enough to support life. However, human 
activities have intensified this natural process, leading to a warmer climate. 
Fossil Fuels 
Burning fossil fuels for energy releases large amounts of CO2. This includes coal, oil, and 
natural gas used for electricity, heating, and transportation. The industrial revolution marked 
the beginning of a significant increase in fossil fuel consumption, which continues to rise 
today. 
Coal
------------

Document 2:
Most of these climate changes are attributed to very small variations in Earth's orbit that 
change the amount of solar energy our planet receives. During the H

### Evaluation

In [7]:
from utils.evaluation import rag_evaluation

rag_evaluation(retriever=retriever, num_questions=1, eval_topic="climate change")


Evaluated question 1/1:
Question: Sure! Here’s a diverse test question about climate change:
Context: Understanding Climate Change 
Chapter 1: Introduction to Climate Change 
Climate change refers to significant, long-term changes in the global climate. The term 
"global climate" encompasses the planet's overall weather patterns, including temperature, 
precipitation, and wind patterns, over an extended period. Over the past century, human 
activities, particularly the burning of fossil fuels and deforestation, have significantly 
contributed to climate change. 
Historical Context 
The Earth's climate has changed throughout history. Over the past 650,000 years, there have 
been seven cycles of glacial advance and retreat, with the abrupt end of the last ice age about 
11,700 years ago marking the beginning of the modern climate era and human civilization. 
Most of these climate changes are attributed to very small variations in Earth's orbit that 
change the amount of solar energy our 