# 🧠 Simplest RAG: Concept Notebook

This notebook explains the minimal steps involved in building a Retrieval-Augmented Generation (RAG) system.

---

## 1. High-Level Idea

RAG enhances an LLM by giving it **external knowledge** retrieved from your documents.

A minimal RAG workflow has three steps:

1. **Create a vector database** from your documents (PDFs, text, etc.)
2. **Retrieve relevant chunks** using a retriever
3. **Feed retrieved chunks + query** into the LLM to generate the final answer

---

## 2. RAG Flow Summary

**Step 1: Build the Vectorstore**  
- Load documents  
- Split into chunks  
- Convert each chunk into embeddings  
- Store embeddings in a vector database  

**Step 2: Retrieve**  
- Convert user query into an embedding  
- Search the vector database  
- Retrieve top-k relevant chunks  

**Step 3: Generate**  
- Combine: *original query + retrieved chunks*  
- Pass to LLM  
- Produce final grounded answer  
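
The three steps above can be sketched end to end without any external library. This is a toy illustration, not the pipeline built later in this notebook: the word-count `embed` function stands in for a real embedding model, the `store` list stands in for a vector database, and the sample chunks are made up for the demo.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Step 1: build the "vector store" as a list of (chunk, embedding) pairs
chunks = [
    "climate change refers to long-term changes in the global climate",
    "fossil fuels release greenhouse gases when burned",
    "adaptation planning involves assessing climate risks",
]
store = [(chunk, embed(chunk)) for chunk in chunks]

# Step 2: embed the query and return the k most similar chunks
def retrieve(query: str, k: int = 2) -> list[str]:
    query_vec = embed(query)
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# Step 3: combine the query and the retrieved chunks into one prompt for the LLM
def build_prompt(query: str, context_chunks: list[str]) -> str:
    context = "\n".join(context_chunks)
    return f"Use the context below to answer:\n\nContext:\n{context}\n\nquery: {query}"

query = "what is climate change"
top_chunks = retrieve(query, k=2)
prompt_text = build_prompt(query, top_chunks)
print(prompt_text)
```

In a real system the final string would be sent to the LLM; here it is only printed so the data flow is visible.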

---

In [39]:
import os 
from dotenv import load_dotenv

load_dotenv()

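# Load the Hugging Face token (read from .env by load_dotenv)
hf_token = os.getenv("HF_TOKEN")
if hf_token is None:
    raise RuntimeError("HF_TOKEN is not set; add it to your .env file")
os.environ["HF_TOKEN"] = hf_token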

---
#### 🔹 Using an Open-Source LLM (Llama 3.2 with Ollama)

In this notebook, we will use the open-weight model **`llama3.2`**, served locally through **Ollama**.

##### **1. Pull the Model**
Before running the notebook, pull the model in your terminal:

```bash
ollama pull llama3.2
```


In [8]:
from langchain_ollama import ChatOllama

llm = ChatOllama(
    model="llama3.2",
    temperature=0,
    verbose=True
)

In [9]:
llm.invoke(input="Hey Hii").content

'Hi! How can I assist you today?'

---

## Step 1: Load PDF data, split into chunks, build a vector store

In [36]:
# LangChain ships document loaders; PyPDFLoader returns one Document per PDF page
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("../data/Understanding_Climate_Change.pdf")
documents = loader.load()

print(f"Total docs we get : {len(documents)}")

Total docs we get : 33


In [35]:
# Now we'll create a text_splitter that splits these documents into chunks
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
# get chunks 
chunks = text_splitter.split_documents(documents=documents)

print(f"Total Chunks we have are : {len(chunks)}")
print(f" ")
print(f"Chunk 1 : {chunks[0]}")

Total Chunks we have are : 170
 
Chunk 1 : page_content='Understanding Climate Change 
Chapter 1: Introduction to Climate Change 
Climate change refers to significant, long-term changes in the global climate. The term 
"global climate" encompasses the planet's overall weather patterns, including temperature, 
precipitation, and wind patterns, over an extended period. Over the past century, human 
activities, particularly the burning of fossil fuels and deforestation, have significantly 
contributed to climate change. 
Historical Context' metadata={'producer': 'Microsoft® Word 2021', 'creator': 'Microsoft® Word 2021', 'creationdate': '2024-07-13T20:17:34+03:00', 'author': 'Nir', 'moddate': '2024-07-13T20:17:34+03:00', 'source': '../data/Understanding_Climate_Change.pdf', 'total_pages': 33, 'page': 0, 'page_label': '1'}
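
To see what `chunk_size` and `chunk_overlap` mean, here is a simplified sketch. It is a plain fixed-width sliding window over characters and is not how `RecursiveCharacterTextSplitter` works internally (that splitter also tries to break on separators such as paragraphs and sentences). The helper name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return pieces

sample = "abcdefghijklmnopqrstuvwxyz"
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
for piece in pieces:
    print(piece)
```

Each piece starts with the last three characters of the previous one, so a sentence that straddles a boundary is still seen whole in at least one chunk.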


#### **Embeddings model**

Hugging Face sentence-transformers is a Python framework for state-of-the-art sentence, text, and image embeddings. You can use these embedding models through the `HuggingFaceEmbeddings` class.

In [43]:
# The embedding model we'll use is all-MiniLM-L6-v2 (sentence-transformers)
from langchain_huggingface import HuggingFaceEmbeddings
from huggingface_hub import login
import time 

# login huggingface
login(token=hf_token)

embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

text = "This is a test document."

start = time.time()
query_result = embedding_model.embed_query(text)
total_time = time.time() - start
print(f"Length of text embedding : {len(query_result)}")
print(f"Time taken to convert text to embedding : {total_time:.2f} sec")
# show only the first 100 characters of the stringified vector
print(str(query_result)[:100] + "...")

Note: Environment variable`HF_TOKEN` is set and is the current active token independently from the token you've just configured.


Length of text embedding : 384
Time taken to convert text to embedding : 0.13 sec
[-0.0383385606110096, 0.1234646886587143, -0.02864295430481434, 0.05365273356437683, 0.0088453618809...
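
Embeddings are useful because texts with similar meaning tend to map to vectors that point in similar directions. A common way to measure this is cosine similarity. The sketch below uses made-up 3-dimensional vectors purely for illustration; real vectors from the embedding model are much longer.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity is undefined for a zero vector; treat it as unrelated
    return dot / (norm_a * norm_b)

# Made-up vectors (assumptions for the demo, not real model output)
v_query = [1.0, 0.0, 1.0]
v_similar = [2.0, 0.0, 2.0]    # same direction as the query, different length
v_unrelated = [0.0, 1.0, 0.0]  # orthogonal to the query

sim_close = cosine_similarity(v_query, v_similar)
sim_far = cosine_similarity(v_query, v_unrelated)
print(f"similar: {sim_close:.2f}, unrelated: {sim_far:.2f}")
```

Cosine similarity ignores vector length, which is why `v_similar` scores as a perfect match even though its values are doubled.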


In [44]:
# Now we'll create a vector store backed by a FAISS index
import faiss
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_community.vectorstores import FAISS

embedding_dim = len(embedding_model.embed_query("hello world"))
index = faiss.IndexFlatL2(embedding_dim)

vector_store = FAISS(
    embedding_function=embedding_model,
    index=index,
    docstore=InMemoryDocstore(),
    index_to_docstore_id={},
)
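
`faiss.IndexFlatL2` performs exact (brute-force) nearest-neighbour search using squared Euclidean distance. The pure-Python sketch below mimics that idea on tiny made-up vectors. The class `ToyFlatL2Index` is hypothetical and only illustrates the concept; it is not FAISS's implementation or API.

```python
def l2_squared(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

class ToyFlatL2Index:
    """Stores every vector and compares the query against all of them."""

    def __init__(self, dim: int):
        self.dim = dim
        self.vectors: list[list[float]] = []

    def add(self, vector: list[float]) -> None:
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")
        self.vectors.append(vector)

    def search(self, query: list[float], k: int) -> list[tuple[float, int]]:
        """Return the k closest vectors as (distance, position) pairs, nearest first."""
        scored = sorted((l2_squared(query, v), i) for i, v in enumerate(self.vectors))
        return scored[:k]

toy_index = ToyFlatL2Index(dim=2)
for vec in ([0.0, 0.0], [1.0, 1.0], [5.0, 5.0]):
    toy_index.add(vec)

hits = toy_index.search([0.9, 1.2], k=2)
print([position for _, position in hits])
```

A flat index scans every stored vector, which is simple and exact and is a reasonable fit for a few hundred chunks like the ones in this notebook.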

In [45]:
# adding text chunks to vector store 
vector_store.add_documents(documents=chunks)

['bc60dc51-a70d-4b02-83c4-528acece89a7',
 'dd6ffd5d-2eb9-4ad7-ac20-07f5ed3549e5',
 '5f908421-b4e0-4df9-92db-933ac6a4a95e',
 'f81f03e2-bc1f-49de-b83f-73cc5420e805',
 '58acb45c-295c-44dd-9eaf-f9c6c5473967',
 'e903131b-bf4f-4c3e-b77a-ee8276bbb53d',
 '4e2b5c8e-b771-4d8e-bde2-631f6619f5f6',
 '97ee8c2d-a44e-4199-98fe-aca7343c69a8',
 '07c2c036-1dc4-4843-a1d3-f9131b77a271',
 '0911c49b-ff9b-4ab8-ae40-1ea65f3d1a81',
 'ed0b915a-0151-49c2-b7bc-d5d658135b32',
 'a6e2458f-c6c3-429d-beb8-bd28fe2499a8',
 '1acbf71f-8bc3-4f1c-bf08-fba9f8085e88',
 '6ba5b4f7-cc89-48f2-b787-dd219204b732',
 '3e9459df-8b1b-4891-aa67-98a43ede78c3',
 '84d371a6-b12b-4e2b-a0ab-1dd9f0791afb',
 '67de9d8d-0142-4d31-b437-b5a0c9ea56f6',
 '0e9da279-7985-4e37-99e6-63b4b8413a25',
 '5f8ee286-0a75-4321-adfb-2b9800cf1e1c',
 '675fbcb4-c275-45e4-b57b-4736b5740768',
 '7ae11321-38e2-4884-96b2-558024956cd5',
 '6b4dd00f-746c-45f3-acf3-de6ffb0fa3c8',
 'b381d5ab-995e-44c6-aa1f-4942b512d43b',
 '18fa7ed1-b522-40fc-8458-3c3f7c561ad9',
 'c52f1ba0-8ca6-
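
The list above holds the IDs assigned to the added chunks. The store was constructed with an `index`, a `docstore`, and an `index_to_docstore_id` mapping, and the sketch below shows one plausible way those three pieces could fit together. It is a toy model under that assumption, not LangChain's actual code; `ToyVectorStore` and its methods are hypothetical.

```python
import uuid

class ToyVectorStore:
    """Keeps vectors by position and documents by ID, linked through a mapping."""

    def __init__(self, embed):
        self.embed = embed
        self.vectors = []               # position i holds the i-th vector (the "index")
        self.docstore = {}              # document ID -> original text
        self.index_to_docstore_id = {}  # vector position -> document ID

    def add_texts(self, texts):
        """Embed and store each text; return the generated IDs in insertion order."""
        ids = []
        for text in texts:
            doc_id = str(uuid.uuid4())
            position = len(self.vectors)
            self.vectors.append(self.embed(text))
            self.docstore[doc_id] = text
            self.index_to_docstore_id[position] = doc_id
            ids.append(doc_id)
        return ids

    def get_by_position(self, position):
        """Resolve a vector position (as a search would return) back to its text."""
        return self.docstore[self.index_to_docstore_id[position]]

# The embedding here is a placeholder (text length), chosen only to keep the demo small
toy_store = ToyVectorStore(embed=lambda text: [float(len(text))])
toy_ids = toy_store.add_texts(["first chunk", "second chunk"])
print(toy_ids)
print(toy_store.get_by_position(1))
```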

## Step 2: Make a retriever

In [46]:
retriever = vector_store.as_retriever(search_type="mmr", search_kwargs={"k": 2})

context = retriever.invoke("What is Climate Change?")

In [51]:
for i, doc in enumerate(context):
    print(f"Doc : {i+1}")
    print(f"Content : {doc}")
    print("-"*89)

Doc : 1
Content : page_content='Understanding Climate Change 
Chapter 1: Introduction to Climate Change 
Climate change refers to significant, long-term changes in the global climate. The term 
"global climate" encompasses the planet's overall weather patterns, including temperature, 
precipitation, and wind patterns, over an extended period. Over the past century, human 
activities, particularly the burning of fossil fuels and deforestation, have significantly 
contributed to climate change. 
Historical Context' metadata={'producer': 'Microsoft® Word 2021', 'creator': 'Microsoft® Word 2021', 'creationdate': '2024-07-13T20:17:34+03:00', 'author': 'Nir', 'moddate': '2024-07-13T20:17:34+03:00', 'source': '../data/Understanding_Climate_Change.pdf', 'total_pages': 33, 'page': 0, 'page_label': '1'}
-----------------------------------------------------------------------------------------
Doc : 2
Content : page_content='Adaptation Planning 
Adaptation planning involves assessing climate ris
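
The retriever above uses `search_type="mmr"` (maximal marginal relevance), which balances relevance to the query against redundancy among the chunks already chosen. Here is a minimal greedy sketch of that idea on made-up 2-dimensional vectors. The function `mmr_select` and the trade-off weight `lambda_mult` are illustrative assumptions, not the library's code.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr_select(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedily pick k documents, rewarding relevance and penalising redundancy."""
    selected: list[int] = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(query_vec, doc_vecs[i])
            # Similarity to the closest already-selected document (0 if none yet)
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Made-up vectors: docs 0 and 1 are near-duplicates, doc 2 is relevant but different
query_vec = [1.0, 0.0]
doc_vecs = [[1.0, 0.5], [1.0, 0.52], [1.0, -0.6]]

mmr_picks = mmr_select(query_vec, doc_vecs, k=2)
top_k_picks = sorted(range(len(doc_vecs)), key=lambda i: cosine(query_vec, doc_vecs[i]), reverse=True)[:2]
print("MMR:", mmr_picks, "| plain top-k:", top_k_picks)
```

Plain similarity ranking returns the two near-duplicates, while the MMR sketch swaps the second one for the document that adds new information.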

## Step 3: Use the LLM to augment and generate a response from the query and context

In [80]:
from langchain_ollama import ChatOllama

# this is LLM we'll use
llm = ChatOllama(
    model="llama3.2",
    temperature=0,
    verbose=True
)

# We need a prompt that tells the LLM to answer the query using the retrieved context
from langchain_core.prompts import PromptTemplate

# Set up the prompt template
prompt = PromptTemplate.from_template("""
Use the context below to answer:

Context:
{context}

query: {query}
""")
# Get the retrieved context as a single string
def get_context(retriever, query):
    """Retrieve documents for the query and join their text with newlines."""
    docs = retriever.invoke(query)
    return "\n".join(doc.page_content for doc in docs)

chain = prompt | llm
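
The expression `prompt | llm` chains two steps so that the output of the first becomes the input of the second. The toy analogy below shows how such piping can be built with Python's `__or__` operator. The `Step` class and the fake LLM are hypothetical stand-ins and do not reflect how LangChain implements its runnables.

```python
class Step:
    """Wraps a function so steps can be chained with the | operator."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # The combined step runs self first, then feeds its result to other
        return Step(lambda value: other.invoke(self.invoke(value)))

TEMPLATE = "Use the context below to answer:\n\nContext:\n{context}\n\nquery: {query}"

# First step: fill the template from a dict of inputs
format_prompt = Step(lambda inputs: TEMPLATE.format(**inputs))
# Second step: a fake "LLM" that only reports how much text it received
fake_llm = Step(lambda prompt_text: f"[fake answer based on {len(prompt_text)} prompt characters]")

toy_chain = format_prompt | fake_llm
inputs = {"query": "What is climate change?", "context": "Climate change refers to ..."}
result = toy_chain.invoke(inputs)
print(result)
```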

In [82]:
from pprint import pprint

if __name__ == "__main__":
    query = input("Write your query : ")
    context = get_context(retriever, query)
    print(f"Query : {query}")
    response = chain.invoke({'query' : query, 'context' : context})
    pprint(f"Response : {response.content}")

Query : What is climate change?
('Response : Climate change refers to significant, long-term changes in the '
 "global climate, encompassing the planet's overall weather patterns, "
 'including temperature, precipitation, and wind patterns, over an extended '
 'period.')
