<a href="https://colab.research.google.com/github/MumbuaFaithK/ai-and-data-projects/blob/main/Retrieval_Augmented_Generation_(RAG)_Pipeline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 1. Install & Import Libraries

In [3]:
# Install necessary libraries
!pip install -U langchain langchain-community transformers sentence-transformers faiss-cpu pypdf

# Imports
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

Collecting transformers
  Downloading transformers-4.54.1-py3-none-any.whl.metadata (41 kB)
Collecting sentence-transformers
  Downloading sentence_transformers-5.0.0-py3-none-any.whl.metadata (16 kB)
Collecting faiss-cpu
  Downloading faiss_cpu-1.11.0.post1-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (5.0 kB)
Collecting pypdf
  Downloading pypdf-5.9.0-py3-none-any.whl.metadata (7.1 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch>=1.11.0->sentence-transformers)
 

# 2. Load and Preview the PDF

In [7]:
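from google.colab import files

# Upload the document manually; files.upload() returns {filename: file bytes}
uploaded = files.upload()

# Write the uploaded bytes to 'document.pdf' so later cells do not depend on
# the original filename (Colab saves re-uploads under a suffixed name, e.g. "(2)")
_, pdf_bytes = next(iter(uploaded.items()))
with open("document.pdf", "wb") as f:
    f.write(pdf_bytes)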


## Load the Document
loader = PyPDFLoader("document.pdf")
docs = loader.load()
print(f"Loaded {len(docs)} pages.")
print(docs[0].page_content[:500])


Saving The four pillars of effective communication design [Slides].pdf to The four pillars of effective communication design [Slides] (2).pdf
Loaded 15 pages.
Please do not copy without permission. © ExploreAI 2023.
The four pillars of effective communication design
Design for impactful communication


# 3. Split the Document into Chunks

In [8]:
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)
print(f"Total Chunks Created: {len(chunks)}")
print(chunks[0].page_content[:300])

Total Chunks Created: 22
Please do not copy without permission. © ExploreAI 2023.
The four pillars of effective communication design
Design for impactful communication
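
To make `chunk_size` and `chunk_overlap` concrete, the sketch below uses a plain fixed-width sliding window. This is a simplification and not how `RecursiveCharacterTextSplitter` works internally (the real splitter prefers to break on separators such as paragraphs and newlines); `sliding_chunks` and the sample text are hypothetical names introduced only for illustration.

```python
def sliding_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Cut text into fixed-width windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

# Synthetic 1200-character text, used only for the demonstration
sample = "".join(str(i % 10) for i in range(1200))
demo_chunks = sliding_chunks(sample)

print([len(c) for c in demo_chunks])
print(demo_chunks[0][-50:] == demo_chunks[1][:50])
```

The overlap means the tail of one chunk is repeated at the head of the next, so a sentence that straddles a boundary is still fully contained in at least one chunk.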


# 4. Create Embeddings & Vector Store

In [9]:
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(chunks, embeddings)
retriever = vectorstore.as_retriever()
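
Conceptually, the retriever embeds the question and returns the chunks whose vectors lie closest to it. The sketch below shows that idea with toy 2-D vectors and Euclidean distance. It rests on two assumptions worth verifying against the library documentation: that the LangChain FAISS wrapper uses L2 distance by default, and that `as_retriever()` returns four documents unless told otherwise. `top_k_by_l2` and the toy data are hypothetical.

```python
import math

def top_k_by_l2(query: list[float], vectors: list[list[float]], k: int = 4) -> list[int]:
    """Return the indices of the k vectors closest to query (Euclidean distance)."""
    scored = [(math.dist(query, vec), idx) for idx, vec in enumerate(vectors)]
    scored.sort()  # smallest distance first
    return [idx for _, idx in scored[:k]]

# Toy "embeddings"; real all-MiniLM-L6-v2 vectors have 384 dimensions
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0], [0.5, 0.5]]
toy_query = [0.4, 0.4]

nearest = top_k_by_l2(toy_query, toy_vectors, k=2)
print(nearest)
```

A flat index of this kind compares the query against every stored vector, which is perfectly adequate for the 22 chunks created here.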





# 5. Load the LLM and Define RAG Query Function

In [11]:
model_name = "google/flan-t5-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
flan_pipeline = pipeline("text2text-generation", model=model, tokenizer=tokenizer)

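def query_rag(question):
    # Retrieve the chunks most similar to the question
    # (invoke() replaces the deprecated get_relevant_documents())
    relevant_docs = retriever.invoke(question)
    # Join the retrieved chunks into a single context string
    context = "\n".join(doc.page_content for doc in relevant_docs)
    prompt = f"Answer the question using only the context:\n\nContext:\n{context}\n\nQuestion: {question}\n\nAnswer:"

    # Sampling is enabled, so the answer can differ between runs
    response = flan_pipeline(
        prompt,
        max_new_tokens=200,
        temperature=0.9,
        top_k=50,
        top_p=0.9,
        do_sample=True
    )
    return response[0]['generated_text']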

Device set to use cpu
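
The `temperature`, `top_k` and `top_p` arguments passed to the pipeline shape the distribution from which each next token is sampled. The sketch below is a simplified, pure-Python illustration of those three steps on made-up logits; it is not the `transformers` implementation, and `filtered_distribution` is a hypothetical helper.

```python
import math

def filtered_distribution(logits, temperature=0.9, top_k=50, top_p=0.9):
    """Return {token_index: probability} after temperature, top-k and top-p filtering."""
    # 1. Temperature: values below 1 sharpen the distribution, above 1 flatten it
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]  # subtract peak for stability

    # 2. Top-k: keep only the k highest-weighted tokens, then renormalise
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:top_k]
    total = sum(weights[i] for i in ranked)
    probs = {i: weights[i] / total for i in ranked}

    # 3. Top-p: keep the smallest prefix whose cumulative probability reaches top_p
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    norm = sum(probs[i] for i in kept)
    return {i: probs[i] / norm for i in kept}

# Made-up logits for a four-token vocabulary
dist = filtered_distribution([2.0, 1.0, 0.0, -1.0], temperature=1.0, top_k=3, top_p=0.9)
print(dist)
```

For question answering over a document, it may be worth comparing against `do_sample=False`, which makes generation deterministic so repeated runs are directly comparable.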


# 6. Sample Queries and Comparison

In [12]:
# Document-based answer
print("🔹 Answer from RAG:")
print(query_rag("Summarize the key points of this document in a paragraph of 200 words."))

# Generic answer without context (optional comparison)
no_context_prompt = "Summarize the key points of a document in a paragraph of 200 words."
print("\n🔸 Answer without document context:")
print(flan_pipeline(no_context_prompt, max_new_tokens=200)[0]['generated_text'])

🔹 Answer from RAG:


The four pillars checklist is a checklist to ensure the visualisation is successful.

🔸 Answer without document context:
The following is a summary of the key points of the document.
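
One rough way to quantify the difference between the two answers is to measure how many of an answer's content words also appear in the source text. The sketch below is a crude heuristic, not an established evaluation metric. It uses only the page-one preview printed earlier as a stand-in for the retrieved context, and `content_words` and `overlap_score` are hypothetical helpers.

```python
import re

def content_words(text: str) -> set[str]:
    """Lowercase alphabetic tokens longer than three letters (a crude stopword filter)."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3}

def overlap_score(answer: str, context: str) -> float:
    """Fraction of the answer's content words that also occur in the context."""
    answer_words = content_words(answer)
    if not answer_words:
        return 0.0
    return len(answer_words & content_words(context)) / len(answer_words)

# Text copied from the outputs shown in this notebook
preview = (
    "Please do not copy without permission. © ExploreAI 2023. "
    "The four pillars of effective communication design "
    "Design for impactful communication"
)
rag_answer = "The four pillars checklist is a checklist to ensure the visualisation is successful."
generic_answer = "The following is a summary of the key points of the document."

rag_score = overlap_score(rag_answer, preview)
generic_score = overlap_score(generic_answer, preview)
print(f"RAG answer overlap: {rag_score:.2f}")
print(f"No-context answer overlap: {generic_score:.2f}")
```

A higher score only indicates shared vocabulary with the source; it says nothing about whether the answer is correct or complete.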
