# RAG Mini Project

## Problem Statement

Build a Retrieval-Augmented Generation (RAG) system that answers user questions from a given knowledge source (e.g., PDF/TXT document) instead of relying only on the base language model.


## Dataset / Knowledge Source

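- Type of data: PDF (a TXT loader is also provided).
- Data source: the Harvard Business Review article "How Apple Is Organized for Innovation" by Joel M. Podolny and Morten T. Hansen (November–December 2020, reprint R2006F), loaded from `HBR_How_Apple_Is_Organized_For_Innovation-4.pdf`.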


## RAG Architecture

The RAG pipeline consists of:
1. Data loading (read PDF/TXT and extract raw text).
2. Text chunking (split document into overlapping chunks).
3. Embedding generation (convert chunks into dense vectors).
4. Vector store creation (index embeddings in a vector database).
5. Retrieval (find top-k similar chunks for a query).
6. Generation (LLM uses retrieved context to generate the final answer).

Block diagram (query time):

[User Query] → [Retriever (Vector DB)] → [Top-k Chunks] → [LLM] → [Answer]

Offline pipeline (indexing):

[Documents] → [Chunking] → [Embeddings] → [Vector Store]
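
The split between the offline and query-time paths can be sketched as two small functions with pluggable stages. This is only an illustrative skeleton: `build_corpus`, `answer_query`, `split_sentences`, `keyword_retriever`, and `echo_generator` are hypothetical names, the sample sentence is made up, and the stand-in retriever ranks by keyword overlap rather than by embeddings so that the example runs without any model.

```python
from typing import Callable, List

Chunker = Callable[[str], List[str]]
Retriever = Callable[[str, int], List[str]]
Generator = Callable[[str, List[str]], str]

def build_corpus(documents: List[str], chunker: Chunker) -> List[str]:
    """Offline stage: split every document into chunks once, before any query arrives."""
    return [chunk for doc in documents for chunk in chunker(doc)]

def answer_query(query: str, retriever: Retriever, generator: Generator, top_k: int = 3) -> str:
    """Query-time stage: fetch context for one query, then generate from it."""
    return generator(query, retriever(query, top_k))

# --- Stand-in stages (hypothetical, for illustration only) ---
def split_sentences(doc: str) -> List[str]:
    return [s.strip() for s in doc.split(".") if s.strip()]

corpus = build_corpus(["Apple is organized by function. Experts lead experts."], split_sentences)

def keyword_retriever(query: str, top_k: int) -> List[str]:
    # Rank chunks by how many query words they share (not an embedding search).
    words = set(query.lower().split())
    ranked = sorted(corpus, key=lambda c: len(words & set(c.lower().split())), reverse=True)
    return ranked[:top_k]

def echo_generator(query: str, context: List[str]) -> str:
    # Placeholder for an LLM call: just report the best-matching chunk.
    return f"Q: {query} | context: {context[0]}"

result = answer_query("who lead experts", keyword_retriever, echo_generator, top_k=1)
print(result)
```

Keeping the two paths separate means the corpus is chunked and indexed once, while each query only pays for retrieval and generation.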


In [1]:
# Install dependencies (notebook shell command)
!pip install --quiet pdfplumber sentence-transformers faiss-cpu

# Standard library
import textwrap
from typing import List

# PDF text extraction
import pdfplumber

# Embeddings
from sentence_transformers import SentenceTransformer

# Vector store (FAISS) and array handling
import faiss
import numpy as np



## Data Loading


In [2]:
# Set your document path here
DOC_PATH = "/content/HBR_How_Apple_Is_Organized_For_Innovation-4.pdf"

def load_pdf(path: str) -> str:
    text = ""
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            page_text = page.extract_text() or ""
            text += page_text + "\n"
    return text

# If you use TXT instead of PDF:
def load_txt(path: str) -> str:
    with open(path, "r", encoding="utf-8") as f:
        return f.read()

# Choose loader according to your data type
if DOC_PATH.lower().endswith(".pdf"):
    raw_text = load_pdf(DOC_PATH)
else:
    raw_text = load_txt(DOC_PATH)

print("Characters in document:", len(raw_text))
print(raw_text[:1000])  # preview


Characters in document: 36089
REPRINT R2006F
PUBLISHED IN HBR
NOVEMBER–DECEMBER 2020
ARTICLE
ORGANIZATIONAL CULTURE
How Apple Is
Organized
for Innovation
It’s about experts leading experts.
by Joel M. Podolny and Morten T. Hansen
This article is made available to you with compliments of Apple Inc for your personal use. Further posting, copying or distribution is not permitted.
2 Harvard Business Review
November–December 2020
This article is made available to you with compliments of Apple Inc for your personal use. Further posting, copying or distribution is not permitted.
FOR ARTICLE REPRINTS CALL 800-988-0886 OR 617-783-7500, OR VISIT HBR.ORG
ORGANIZATIONAL
CULTURE
How Apple Is
Organized
for Innovation
It’s about experts
leading experts.
AUTHORS
Joel M. Morten T.
Podolny Hansen
Dean, Apple Faculty, Apple
University University
PHOTOGRAPHER MIKAEL JANSSON
Harvard Business Review 3
November–December 2020
This article is made available to you with compliments of Apple Inc for your persona

## Text Chunking Strategy

- Chunk size: 800 characters (chosen from a typical range of 500–1000).
- Chunk overlap: 200 characters, so consecutive chunks start 600 characters apart.
- Reason: 800 characters is long enough to hold meaningful context but short enough for efficient retrieval and model input. The 200-character overlap keeps text that straddles a chunk boundary intact in at least one chunk.
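
As a quick check on these settings, the sketch below computes where each chunk starts for a document of 36,089 characters (the length reported in the data-loading output). The helper name `chunk_starts` is hypothetical and only mirrors the sliding-window arithmetic, not the notebook's own function.

```python
import math

def chunk_starts(text_length: int, chunk_size: int = 800, overlap: int = 200) -> list:
    """Start offsets of a sliding window that advances by chunk_size - overlap."""
    stride = chunk_size - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than chunk_size")
    return list(range(0, text_length, stride))

starts = chunk_starts(36089)
print(len(starts))              # number of chunks: 61
print(starts[1] - starts[0])    # stride between chunk starts: 600
print(36089 - starts[-1])       # length of the final, shorter chunk: 89
```

The chunk count is `ceil(text_length / stride)`, and the last chunk is usually shorter than the rest because the window runs off the end of the text.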


In [3]:
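CHUNK_SIZE = 800
CHUNK_OVERLAP = 200

def chunk_text(text: str, chunk_size: int = CHUNK_SIZE, overlap: int = CHUNK_OVERLAP) -> List[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        # A non-positive stride would make the loop below run forever.
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    stride = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += stride

    return chunks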

chunks = chunk_text(raw_text)
print("Number of chunks:", len(chunks))
print("First chunk:\n", chunks[0][:500])


Number of chunks: 61
First chunk:
 REPRINT R2006F
PUBLISHED IN HBR
NOVEMBER–DECEMBER 2020
ARTICLE
ORGANIZATIONAL CULTURE
How Apple Is
Organized
for Innovation
It’s about experts leading experts.
by Joel M. Podolny and Morten T. Hansen
This article is made available to you with compliments of Apple Inc for your personal use. Further posting, copying or distribution is not permitted.
2 Harvard Business Review
November–December 2020
This article is made available to you with compliments of Apple Inc for your personal use. Further po


## Embedding Details

- Embedding model used: `sentence-transformers/all-MiniLM-L6-v2`.
- Reason: Lightweight and fast, with 384-dimensional output and good semantic-similarity quality for its size. It is open-source and easy to use through the SentenceTransformers library.
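
To put the embedding size in perspective, the arithmetic below estimates the memory taken by the full embedding matrix for the 61 chunks produced above, each a 384-dimensional vector. The helper `embedding_matrix_bytes` is a hypothetical name, and the 4 bytes per value is an assumption that the vectors are stored as float32.

```python
def embedding_matrix_bytes(n_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Size in bytes of a dense n_vectors x dim matrix (4 bytes per value assumes float32)."""
    return n_vectors * dim * bytes_per_value

size = embedding_matrix_bytes(n_vectors=61, dim=384)
print(size, "bytes")                 # 93696 bytes
print(round(size / 1024, 1), "KiB")  # 91.5 KiB
```

At this scale the whole matrix fits comfortably in memory, which is why an exact, uncompressed index is a reasonable choice for a single document.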


In [4]:
# Load embedding model
embed_model_name = "sentence-transformers/all-MiniLM-L6-v2"
embedder = SentenceTransformer(embed_model_name)

def embed_texts(texts: List[str]) -> np.ndarray:
    embeddings = embedder.encode(texts, convert_to_numpy=True, show_progress_bar=True)
    return embeddings

chunk_embeddings = embed_texts(chunks)
print("Embeddings shape:", chunk_embeddings.shape)


Embeddings shape: (61, 384)


## Vector Database

- Vector store used: FAISS (`IndexFlatL2`, an exact index that compares the query against every stored vector by L2 distance).
- Reason: Simple, fast in-memory similarity search suitable for local experiments.
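
Conceptually, a flat L2 index performs an exhaustive search: it scores every stored vector against the query and keeps the closest ones. The pure-Python sketch below imitates that behaviour on made-up 2-D vectors. `flat_l2_search` and `squared_l2` are hypothetical helpers, not FAISS functions, and the real library does the same work in optimized native code. Like FAISS's flat L2 index, the sketch reports squared Euclidean distances.

```python
from typing import List, Sequence, Tuple

def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def flat_l2_search(
    vectors: Sequence[Sequence[float]], query: Sequence[float], top_k: int
) -> Tuple[List[float], List[int]]:
    """Exhaustive search: score every stored vector, return the k closest."""
    scored = sorted((squared_l2(v, query), i) for i, v in enumerate(vectors))
    top = scored[:top_k]
    return [d for d, _ in top], [i for _, i in top]

# Made-up vectors for illustration.
stored = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [3.0, 3.0]]
distances, indices = flat_l2_search(stored, query=[1.0, 1.0], top_k=2)
print(indices, distances)  # [1, 0] [1.0, 2.0]
```

The cost grows linearly with the number of stored vectors, which is acceptable for tens or thousands of chunks but is the reason approximate indexes exist for much larger collections.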


In [5]:
# Create FAISS index
embedding_dim = chunk_embeddings.shape[1]
index = faiss.IndexFlatL2(embedding_dim)
index.add(chunk_embeddings)

print("Number of vectors in FAISS index:", index.ntotal)


Number of vectors in FAISS index: 61


## Retrieval

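We retrieve the top-k most similar chunks for a user query by L2 (Euclidean) distance in the FAISS index. The index is an `IndexFlatL2`, which reports squared L2 distances, so a smaller value means a closer match.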
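
L2 distance and cosine similarity give the same ranking when every vector has unit length, because then ‖a − b‖² = 2 − 2·cos(a, b). The sketch below checks this on made-up vectors. Whether the embeddings in this notebook are unit-normalized is an assumption here; SentenceTransformers' `encode` accepts `normalize_embeddings=True` if that needs to be guaranteed.

```python
import math
from typing import List, Sequence

def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; equals cosine similarity when both inputs are unit vectors."""
    return sum(x * y for x, y in zip(a, b))

def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Made-up vectors for illustration.
query = normalize([1.0, 2.0, 0.5])
docs = [normalize(v) for v in ([1.0, 1.9, 0.4], [0.0, 1.0, 0.0], [-1.0, 0.2, 3.0])]

# Identity for unit vectors: squared L2 distance = 2 - 2 * cosine similarity.
identity_holds = all(
    math.isclose(squared_l2(query, d), 2 - 2 * dot(query, d), abs_tol=1e-12) for d in docs
)

by_l2 = sorted(range(len(docs)), key=lambda i: squared_l2(query, docs[i]))   # ascending distance
by_cos = sorted(range(len(docs)), key=lambda i: -dot(query, docs[i]))        # descending similarity
print(identity_holds, by_l2 == by_cos)
```

If the vectors are not normalized, the two metrics can disagree, because L2 distance is also sensitive to vector length.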


In [6]:
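def retrieve(query: str, top_k: int = 5):
    # Embed the query with the same model used for the chunks
    query_emb = embedder.encode([query], convert_to_numpy=True)
    # Never ask for more neighbours than the index holds: FAISS pads missing
    # results with index -1, which would silently select chunks[-1]
    k = min(top_k, index.ntotal)
    distances, indices = index.search(query_emb, k)
    retrieved_chunks = [chunks[i] for i in indices[0]]
    return retrieved_chunks, distances[0]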

# Test retrieval only
test_query = "Write your sample query about the document."
retrieved, dists = retrieve(test_query, top_k=3)
for i, (chunk, dist) in enumerate(zip(retrieved, dists)):
    print(f"\n=== Retrieved Chunk {i+1} (distance={dist:.4f}) ===\n")
    print(textwrap.shorten(chunk.replace("\n", " "), width=400))



=== Retrieved Chunk 1 (distance=1.4565) ===

le Inc for your personal use. Further posting, copying or distribution is not permitted.

=== Retrieved Chunk 2 (distance=1.7309) ===

. Apple’s way of organizing has led to tremendous innovation Another issue that emerged was the ability to preview a and success over the past two decades. Yet it has not been portrait photo with a blurred background. The camera team without challenges, especially with revenues and head count had designed the feature so that users could see its effect on having exploded since 2008. Harvard [...]

=== Retrieved Chunk 3 (distance=1.7996) ===

often passionate critiques of his team’s to innovate and prosper by being organized this way. work. (Clearly, general managers without his core expertise would find it difficult to teach what they don’t know.) APPLE’S FUNCTIONAL ORGANIZATION is rare, if not unique, The second challenge for Rosner involved the addition among very large companies. It flies in the face of pr

## Generation (RAG)

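The generation step combines the user query with the retrieved chunks and passes both to an LLM, which produces the final answer. This notebook uses a placeholder in place of a real LLM call, so its outputs simply echo the retrieved context.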
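
One common way to combine the query and the retrieved chunks is a single prompt that numbers the passages and instructs the model to stay within them. The sketch below is a minimal illustration of that idea. `build_prompt`, the instruction wording, and the sample passages are all hypothetical and not tied to any particular LLM API.

```python
from typing import List

def build_prompt(query: str, context_chunks: List[str], max_chars: int = 3000) -> str:
    """Assemble a grounded prompt: numbered context passages followed by the question."""
    passages = []
    used = 0
    for number, chunk in enumerate(context_chunks, start=1):
        passage = f"[{number}] {chunk.strip()}"
        if used + len(passage) > max_chars:
            break  # stop before the context budget is exceeded
        passages.append(passage)
        used += len(passage)
    context = "\n\n".join(passages)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

# Made-up passages for illustration.
prompt = build_prompt(
    "How is Apple organized?",
    ["Apple is organized by function.", "Experts lead experts."],
)
print(prompt)
```

Numbering the passages makes it possible to ask the model to cite which passage supports its answer, and the character budget keeps the prompt within the model's input limit.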


In [7]:
# Example: a simple placeholder "generation" step that needs no external API.
# For your assignment, replace it with a real LLM call if that is allowed.

def build_context(chunks: List[str], max_chars: int = 3000) -> str:
    context = ""
    for ch in chunks:
        if len(context) + len(ch) > max_chars:
            break
        context += ch + "\n\n"
    return context

def llm_generate_stub(query: str, context: str) -> str:
    # This is a placeholder. Replace with your LLM call if required.
    # For demonstration, we just echo context + query.
    answer = f"This is a stub answer.\n\nQuery: {query}\n\nRelevant context:\n{context[:1000]}"
    return answer

def rag_answer(query: str, top_k: int = 5) -> str:
    retrieved_chunks, _ = retrieve(query, top_k=top_k)
    context = build_context(retrieved_chunks)
    answer = llm_generate_stub(query, context)
    return answer


## Test Queries and Outputs


In [8]:
test_queries = [
    "Question 1 about the document content.",
    "Question 2 about a specific topic in the document.",
    "Question 3 asking for summary or explanation."
]

for i, q in enumerate(test_queries, start=1):
    print(f"\n============================")
    print(f"Test Query {i}: {q}")
    print(f"============================\n")
    ans = rag_answer(q, top_k=5)
    print(ans)
    print("\n" + "="*60 + "\n")



Test Query 1: Question 1 about the document content.

This is a stub answer.

Query: Question 1 about the document content.

Relevant context:
 often passionate critiques of his team’s to innovate and prosper by being organized this way.
work. (Clearly, general managers without his core expertise
would find it difficult to teach what they don’t know.) APPLE’S FUNCTIONAL ORGANIZATION is rare, if not unique,
The second challenge for Rosner involved the addition among very large companies. It flies in the face of prevailing
of activities beyond his original expertise. Six years ago he management theory that companies should be reorganized
was given responsibility for the engineering and design of into divisions and business units as they become large. But
News. Consequently, he had to learn about publishing news something vital gets lost in a shift to business units: the
content via an app—to understand news publications, digital al

0 years he has assumed responsibility for new
we have 