In [1]:
import sys, os

# Make sure we're in app/ so local imports work
if os.path.basename(os.getcwd()) != "app":
    os.chdir(os.path.join(os.getcwd(), "app"))

sys.path.insert(0, os.getcwd())

from config import DEVICE
print(f"Device: {DEVICE}")

Device: cuda
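
The `DEVICE` value comes from `config.py`, which is not shown here. A minimal sketch of how such a setting is commonly derived, assuming PyTorch is the backend (this stands in for the project's file and is not its actual contents):

```python
# Hypothetical stand-in for config.py; the real file is not shown in this notebook.
try:
    import torch  # optional dependency in this sketch

    # Use the GPU only when PyTorch can actually see one.
    DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
    # Without PyTorch installed, fall back to the CPU.
    DEVICE = "cpu"

print(f"Device: {DEVICE}")
```

Falling back to `"cpu"` keeps the rest of the pipeline runnable on machines without a GPU.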


# DocuMind — RAG Pipeline

This notebook runs the full **Retrieval-Augmented Generation** pipeline:
1. Load a document from disk
2. Chunk the text into overlapping segments
3. Embed chunks into vectors (GPU-accelerated)
4. Build a FAISS similarity index (GPU-accelerated when available; this run reports CPU)
5. Retrieve relevant chunks for a query
6. Build a grounded prompt
7. Generate an answer with an LLM

## Step 1 — Load Document

In [2]:
from ingestion import load_text_file

document = load_text_file(os.path.join("..", "data", "bangladesh.txt"))

Loaded file: ..\data\bangladesh.txt (2,124 characters)


## Step 2 — Chunk Document

In [3]:
from chunking import split_document

chunks = split_document(document)

print(f"\nPreview of first chunk:\n{chunks[0][:200]}...")

Total chunks created: 5

Preview of first chunk:
Bangladesh is a South Asian country located on the Bay of Bengal. It shares its borders with India on the west, north, and east, and Myanmar on the southeast. The capital city of Bangladesh is Dhaka, ...
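
The internals of `split_document` are not shown. As a rough illustration of what "overlapping segments" means, here is a character-window sketch; the function name `chunk_text` and its `chunk_size` and `overlap` parameters are assumptions for this example, and the project's splitter may well split on paragraphs or sentences instead:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into character windows that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = chunk_text(sample, chunk_size=10, overlap=3)
print(pieces)
```

Each piece repeats the last three characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.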


## Step 3 — Embed Chunks (GPU-Accelerated)

In [4]:
from embedding import EmbeddingModel

embedder = EmbeddingModel()
chunk_embeddings = embedder.embed_documents(chunks)

print(f"Embedding shape: {chunk_embeddings.shape}")

Loading embedding model on [CUDA]...


Loading weights: 100%|██████████| 103/103 [00:00<00:00, 829.32it/s, Materializing param=pooler.dense.weight]
BertModel LOAD REPORT from: sentence-transformers/all-MiniLM-L6-v2
Key                     | Status
------------------------+-----------
embeddings.position_ids | UNEXPECTED

Notes:
- UNEXPECTED: can be ignored when loading from different task/architecture; not ok if you expect identical arch.

Embedding model 'all-MiniLM-L6-v2' loaded successfully.


Batches: 100%|██████████| 1/1 [00:00<00:00,  6.82it/s]

Embedding shape: (5, 384)





## Step 4 — Build FAISS Index

In [5]:
from vector_store import VectorStore

vector_db = VectorStore()
vector_db.build_index(chunk_embeddings, chunks)

FAISS index built on [CPU] with 5 vectors (dim=384).
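
The notebook does not show which FAISS index type `VectorStore` uses. Assuming it is a flat (exhaustive) index, the search amounts to scoring the query against every stored vector and keeping the best matches. The pure-Python sketch below illustrates that idea with cosine similarity; `normalize` and `search` are hypothetical helpers, not part of FAISS or this project:

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def search(index_vectors, query, k=3):
    """Return (position, score) pairs for the k most similar stored vectors."""
    q = normalize(query)
    scores = []
    for pos, vec in enumerate(index_vectors):
        v = normalize(vec)
        # On unit vectors the dot product equals cosine similarity.
        scores.append((pos, sum(a * b for a, b in zip(q, v))))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hits = search(vectors, [2.0, 0.1], k=2)
print(hits)
```

With only 5 vectors of dimension 384, an exhaustive scan like this is cheap, which is consistent with the index being built on the CPU in this run.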


## Step 5 — Retrieve Relevant Chunks

In [6]:
from retriever import Retriever

retriever = Retriever(embedder, vector_db)

query = "What is the capital of Bangladesh?"
relevant_chunks = retriever.retrieve(query)

print(f"Retrieved {len(relevant_chunks)} chunks for: '{query}'\n")
for i, chunk in enumerate(relevant_chunks, 1):
    print(f"--- Chunk {i} ---\n{chunk}\n")

Retrieved 3 chunks for: 'What is the capital of Bangladesh?'

--- Chunk 1 ---
Bangladesh is a South Asian country located on the Bay of Bengal. It shares its borders with India on the west, north, and east, and Myanmar on the southeast. The capital city of Bangladesh is Dhaka, which is also the largest city in the country.

Bangladesh gained independence from Pakistan on December 16, 1971, after a nine-month Liberation War. The Father of the Nation is Bangabandhu Sheikh Mujibur Rahman. The country celebrates Victory Day every year on December 16.

--- Chunk 2 ---
Geographically, Bangladesh is mostly flat and consists of fertile plains. It is formed by the delta of three major rivers: the Ganges, the Brahmaputra, and the Meghna. Because of this delta formation, the country is highly prone to floods and cyclones.

Bangladesh is famous for the Sundarbans, which is the largest mangrove forest in the world and home to the Royal Bengal Tiger. Another important tourist attraction is Cox’s Baz
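
The `Retriever` class itself is not shown. Conceptually, retrieval embeds the query with the same model used for the chunks, scores it against the stored vectors, and maps the best positions back to chunk text. The sketch below shows that flow with a deliberately simple keyword-count embedder; `ToyEmbedder` and `ToyRetriever` are hypothetical names for illustration only:

```python
class ToyEmbedder:
    """Hypothetical embedder: counts occurrences of a few keywords."""

    VOCAB = ("capital", "river", "forest")

    def embed(self, text):
        lowered = text.lower()
        return [float(lowered.count(word)) for word in self.VOCAB]


class ToyRetriever:
    """Embeds chunks once, then ranks them against each query."""

    def __init__(self, embedder, chunks):
        self.embedder = embedder
        self.chunks = chunks
        self.vectors = [embedder.embed(c) for c in chunks]

    def retrieve(self, query, top_k=3):
        q = self.embedder.embed(query)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), i)
            for i, vec in enumerate(self.vectors)
        ]
        # Highest dot product first; ties keep the original chunk order.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [self.chunks[i] for _, i in scored[:top_k]]


docs = [
    "The delta is formed by more than one river.",
    "The capital city is Dhaka.",
    "The mangrove forest is home to tigers.",
]
toy_retriever = ToyRetriever(ToyEmbedder(), docs)
top = toy_retriever.retrieve("What is the capital?", top_k=1)
print(top)
```

A real embedding model replaces the keyword counts with dense vectors, but the ranking step has the same shape.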

## Steps 6 and 7 — Build Prompt & Generate Answer

In [7]:
from prompt import build_prompt
from llm import LLM

prompt = build_prompt(query, relevant_chunks)

llm = LLM()
answer = llm.generate(prompt)

print("\n" + "=" * 50)
print("QUESTION:", query)
print("=" * 50)
print("\nANSWER:\n")
print(answer)

Loading LLM on [CUDA]...


To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
Loading weights: 100%|██████████| 201/201 [00:01<00:00, 134.07it/s, Materializing param=model.norm.weight]                              


LLM 'TinyLlama/TinyLlama-1.1B-Chat-v1.0' loaded successfully.

QUESTION: What is the capital of Bangladesh?

ANSWER:

The capital of Bangladesh is Dhaka.
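
The body of `build_prompt` is not shown. A grounded prompt typically places the retrieved chunks ahead of the question and instructs the model to answer only from them. The sketch below is one possible shape; the function name `build_grounded_prompt` and the exact wording of the instruction are assumptions, not the project's template:

```python
def build_grounded_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble a prompt that asks the model to answer only from the given context."""
    # Number each chunk so the context boundaries stay visible to the model.
    numbered = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(context_chunks, 1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


example = build_grounded_prompt(
    "What is the capital of Bangladesh?",
    ["The capital city of Bangladesh is Dhaka."],
)
print(example)
```

Ending the prompt with `Answer:` nudges a small model such as TinyLlama to produce the answer directly, though chat-tuned models may also expect their own chat template around this text.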
