# Advanced RAG System Demo

### Objectives:
1. **System Resource Monitoring**: Track process memory (RSS) after each heavy pipeline stage.
2. **Pipeline Execution**: Load data, chunk, index, retrieve, and generate.
3. **Evaluation**: Compare generated answers against reference answers using **BLEU-4** and **ROUGE-L** metrics.
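As a reference point for the evaluation objective, here is a minimal sketch of ROUGE-L computed as an F1 score over the longest common subsequence (LCS) of whitespace tokens. It is not the `Evaluator` used later in this notebook; the function names `lcs_length` and `rouge_l_f1` and the lowercase/whitespace tokenization are assumptions made for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 on lowercased whitespace tokens (illustrative tokenization)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of the candidate covered by the LCS
    recall = lcs / len(ref)      # share of the reference covered by the LCS
    return 2 * precision * recall / (precision + recall)
```

Because precision is divided by the candidate length, a long generated sentence scored against a two- or three-word reference gets a low ROUGE-L even when it contains the right answer.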

In [1]:
# Setup Environment & Utils
!pip install -r requirements.txt

import sys
import os
import psutil
from dotenv import load_dotenv

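# Ensure the current working directory (project root) is on the Python path
# so that local packages can be imported
sys.path.append(os.getcwd())

def print_system_usage(stage=""):
    """Print the resident set size (RSS) of the current process in MB."""
    process = psutil.Process(os.getpid())
    mem_info = process.memory_info()
    mem_mb = mem_info.rss / 1024 / 1024
    print(f"[{stage}] Memory: {mem_mb:.2f} MB")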

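The helper above reports memory only. If CPU cost per stage is also of interest, a standard-library sketch along the following lines could complement it. The name `track_stage` and its `report` parameter are hypothetical, and the numbers are process CPU time and wall-clock time, not a CPU percentage.

```python
import time
from contextlib import contextmanager


@contextmanager
def track_stage(stage, report=print):
    """Measure CPU and wall-clock seconds spent inside the `with` block."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    stats = {}
    try:
        yield stats
    finally:
        # Filled in on exit so the caller can inspect the values afterwards
        stats["wall_s"] = time.perf_counter() - wall_start
        stats["cpu_s"] = time.process_time() - cpu_start
        report(f"[{stage}] CPU: {stats['cpu_s']:.2f} s | Wall: {stats['wall_s']:.2f} s")
```

A stage would then be wrapped as `with track_stage("Indexing"): ...`, followed by the usual `print_system_usage("Indexing")` call for memory.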


In [2]:
# Load Configuration & Modules
from rag.config import RAGConfig
from rag.data_loader import DataLoader
from rag.chunking import HierarchicalChunker
from rag.vector_db import VectorDBHandler
from rag.retriever import HierarchicalRetriever
from rag.generator import RAGGenerator
from rag.evaluator import Evaluator

config = RAGConfig()
print("Configuration Loaded.")
print_system_usage("Init")

Configuration Loaded.
[Init] Memory: 521.23 MB


In [3]:
# Authenticate with Hugging Face (Required for Gemma Model)
from huggingface_hub import login

# Load existing .env file
load_dotenv()
hf_token = os.getenv("HF_TOKEN")

if hf_token and hf_token != "your_huggingface_token_here":
    print("Logging in with token from .env...")
    login(token=hf_token)
else:
    print("Please Paste Token manually or update .env file.")
    print("Get token: https://huggingface.co/settings/tokens")
    login()

Logging in with token from .env...


Note: Environment variable`HF_TOKEN` is set and is the current active token independently from the token you've just configured.


In [4]:
# Data Loading
loader = DataLoader(config)

# Download Book
book_text = loader.download_book()
print(f"Book loaded. Length: {len(book_text)} chars")

# Load QA Pairs
qa_pairs = loader.load_qa_pairs()
print(f"Loaded {len(qa_pairs)} QA pairs for testing.")
print_system_usage("Data Loading")

`trust_remote_code` is not supported anymore.
Please check that the Hugging Face dataset 'narrativeqa' isn't based on a loading script and remove `trust_remote_code`.
If the dataset is based on a loading script, please ask the dataset author to remove it and convert it to a standard format like Parquet.


Book already exists at /Users/gizemcidal/Desktop/rag_and_finetuning_task_vdf/data/zuleika_dobson.txt
Book loaded. Length: 467598 chars
Loading NarrativeQA test split for ID 1845...


Resolving data files:   0%|          | 0/24 [00:00<?, ?it/s]

Found 40 QA pairs for Book ID 1845.
Loaded 40 QA pairs for testing.
[Data Loading] Memory: 572.02 MB


In [5]:
# Visualize 'Dirty' Patterns: Hard Wraps, Extra Spaces, Hyphenation
import re

print("--- DIAGNOSTICS: Text Hygiene Check ---")

# 1. Hard Wraps (Lines split by single \n)
# Finding snippets where a line break occurs mid-sentence (surrounded by lowercase/words)
hard_wrap_pattern = r'([a-z,]+)\n([a-z]+)'
matches_hw = re.findall(hard_wrap_pattern, book_text)
print(f"\n[Hard Wraps] Potential artificial line breaks mid-sentence: {len(matches_hw)}")
if matches_hw:
    print("Examples (WordEnd - Newline - WordStart):")
    # Let's verify by checking contexts in the actual text
    # Displaying a few snippets
    snippet_indices = [m.start() for m in re.finditer(hard_wrap_pattern, book_text)][:3]
    for idx in snippet_indices:
        safe_snippet = book_text[idx:idx+30].replace(chr(10), '[\\n]')
        print(f"  '...{safe_snippet}...'")

# 2. Excessive Whitespace
matches_ws = re.findall(r'[ ]{2,}', book_text)
print(f"\n[Whitespace] Sequences of multiple spaces found: {len(matches_ws)}")
if matches_ws:
    # Find a context example
    ws_obj = re.search(r'[ ]{2,}', book_text)
    if ws_obj:
        start = max(0, ws_obj.start() - 10)
        end = min(len(book_text), ws_obj.end() + 10)
        print(f"  Example: '...{book_text[start:end]}...'")

# 3. Hyphenation at Line End
matches_hyphen = re.findall(r'(\w+-\n\w+)', book_text)
print(f"\n[Hyphenation] Words split by hyphen+newline: {len(matches_hyphen)}")
if matches_hyphen:
    print("Examples:", matches_hyphen[:3])

--- DIAGNOSTICS: Text Hygiene Check ---

[Hard Wraps] Potential artificial line breaks mid-sentence: 4942
Examples (WordEnd - Newline - WordStart):
  '...with[\n]almost no restrictions wh...'
  '...or[\n]re-use it under the terms o...'
  '...included[\n]with this eBook or on...'

[Whitespace] Sequences of multiple spaces found: 156
  Example: '...hatsoever.  You may co...'

[Hyphenation] Words split by hyphen+newline: 0
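The diagnostics above find thousands of hard wraps and some double spaces, but no hyphen-plus-newline splits. One possible cleanup step, sketched here under the assumption that blank lines mark paragraph boundaries, unwraps single newlines and collapses repeated spaces. `normalize_text` is a hypothetical helper, not part of the `rag` package.

```python
import re


def normalize_text(text):
    """Unwrap hard-wrapped lines and collapse runs of spaces, keeping paragraphs."""
    # Blank lines (possibly containing whitespace) separate paragraphs
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = []
    for paragraph in paragraphs:
        # A single newline inside a paragraph is treated as a hard wrap
        paragraph = paragraph.replace("\n", " ")
        # Collapse runs of spaces or tabs left by wrapping and indentation
        paragraph = re.sub(r"[ \t]{2,}", " ", paragraph).strip()
        if paragraph:
            cleaned.append(paragraph)
    return "\n\n".join(cleaned)
```

Running the same diagnostics on `normalize_text(book_text)` should then report zero hard wraps and zero multi-space sequences. Verse or other intentionally line-broken passages would be flattened too, so this may be too aggressive for some books.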


In [13]:
print(f"Total Character Count: {len(book_text)}")
print("--- First 2500 Characters (Should be CLEAN text, no Gutenberg headers) ---")
print(book_text[:2500])
print("\n--- Last 2000 Characters (Should be CLEAN text, no legalese) ---")
print(book_text[-2000:])

Total Character Count: 467598
--- First 2500 Characters (Should be CLEAN text, no Gutenberg headers) ---
The Project Gutenberg EBook of Zuleika Dobson, by Max Beerbohm

This eBook is for the use of anyone anywhere at no cost and with
almost no restrictions whatsoever.  You may copy it, give it away or
re-use it under the terms of the Project Gutenberg License included
with this eBook or online at www.gutenberg.org


Title: Zuleika Dobson
       or, An Oxford Love Story

Author: Max Beerbohm

Posting Date: November 25, 2008 [EBook #1845]
Release Date: August, 1999
Last Updated: October 18, 2016

Language: English

Character set encoding: UTF-8

*** START OF THIS PROJECT GUTENBERG EBOOK ZULEIKA DOBSON ***




Produced by Judy Boss





ZULEIKA DOBSON

or, AN OXFORD LOVE STORY

By Max Beerbohm





         NOTE to the 1922 edition

         I was in Italy when this book was first published.
         A year later (1912) I visited London, and I found
         that most of my friends and acq
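The output above still begins with the Project Gutenberg header, so the text is not yet clean. A minimal sketch of one way to strip it keeps only what lies between the `*** START OF ... ***` line seen above and a matching `*** END OF ... ***` line. The exact wording of the END marker is an assumption (it is not visible in this output), and `strip_gutenberg_boilerplate` is a hypothetical helper.

```python
import re

# START marker as it appears in the output above; END marker wording is assumed
START_RE = re.compile(
    r"^\*\*\* ?START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)
END_RE = re.compile(
    r"^\*\*\* ?END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)


def strip_gutenberg_boilerplate(text):
    """Return the text between the START and END marker lines, if present."""
    start = START_RE.search(text)
    body_start = start.end() if start else 0
    end = END_RE.search(text, body_start)
    body_end = end.start() if end else len(text)
    return text[body_start:body_end].strip()
```

If neither marker is found the text is returned unchanged apart from surrounding whitespace, so the function is safe to apply to an already clean file.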

In [6]:
import random
print(f"Total QA Pairs available for testing: {len(qa_pairs)}")
print("--- Random Sample of 3 Questions ---")
for i, item in enumerate(random.sample(qa_pairs, 3)):
    print(f"Q{i+1}: {item['question']}")
    print(f"A{i+1}: {item['answer1']}")
    print("-" * 30)

Total QA Pairs available for testing: 40
--- Random Sample of 3 Questions ---
Q1: Where does Zulika go when she leaves Oxford?
A1: Cambridge.
------------------------------
Q2: Who does Zuleika fall in with love while at school?
A2: The Duke of Dorset.
------------------------------
Q3: Who is the first person Zuleika falls in love with?
A3: The Duke of Dorset
------------------------------


In [7]:
# Hierarchical Chunking
chunker = HierarchicalChunker(
    parent_chunk_size=config.PARENT_CHUNK_SIZE,
    child_chunk_size=config.CHILD_CHUNK_SIZE,
    overlap=config.CHUNK_OVERLAP
)

chunks = chunker.chunk_data(book_text)
print(f"Created {len(chunks['parents'])} parent chunks and {len(chunks['children'])} child chunks.")

parents = chunks['parents']
children = chunks['children']
print_system_usage("Chunking")

Created 346 parent chunks and 30866 child chunks.
[Chunking] Memory: 580.94 MB
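To make the parent/child structure concrete, here is a minimal character-based sketch that produces the same shape of output used in this notebook: a `parents` dict keyed by ID and a `children` list of dicts with `child_id`, `parent_id`, and `text`. It is not the `HierarchicalChunker` implementation; the sliding-window strategy, the UUID identifiers, and the function names are assumptions.

```python
import uuid


def sliding_windows(text, size, overlap):
    """Split text into windows of `size` characters that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    windows = []
    for start in range(0, len(text), step):
        windows.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return windows


def hierarchical_chunks(text, parent_size, child_size, overlap):
    """Build parent chunks, then split each parent into child chunks."""
    parents, children = {}, []
    for parent_text in sliding_windows(text, parent_size, overlap):
        parent_id = str(uuid.uuid4())
        parents[parent_id] = parent_text
        for child_text in sliding_windows(parent_text, child_size, overlap):
            children.append({
                "child_id": str(uuid.uuid4()),
                "parent_id": parent_id,
                "text": child_text,
            })
    return {"parents": parents, "children": children}
```

With this scheme the number of children per parent is governed by `child_size - overlap`: a 2000-character parent with 500-character children and an overlap of 50 yields 5 children. If a real chunker reports a very different ratio, the unit of the overlap setting (characters vs tokens) is worth checking.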


In [None]:
parent_ids = list(parents.keys())
example_parent_id = parent_ids[5]
parent_text = parents[example_parent_id]

print(f"*** PARENT CHUNK (ID: {example_parent_id}) ***")
print(f"Length: {len(parent_text)} chars")
print(f"Content: {parent_text[:300]}... [truncated]")

print("\n    || Converted into CHILDREN ||")
print("    \\/")

related_children = [c for c in children if c['parent_id'] == example_parent_id]
for i, child in enumerate(related_children):
    print(f"    > Child {i+1} (ID: {child['child_id']}): {child['text'][:100]}... (Len: {len(child['text'])})")

*** PARENT CHUNK (ID: acb8ef14-b8bd-4eb4-938d-15c5361fbe13) ***
Length: 1998 chars
Content: single process. She was one of those who are born to make chaos cosmic.

Insomuch that ere the loud chapel-clock tolled another hour all the
trunks had been sent empty away. The carpet was unflecked by any scrap
of silver-paper. From the mantelpiece, photographs of Zuleika surveyed
the room with a p... [truncated]

    || Converted into CHILDREN ||
    \/
    > Child 1 (ID: 543d7785-14b9-499b-8b35-96f48be44e2a): single process. She was one of those who are born to make chaos cosmic.

Insomuch that ere the loud ... (Len: 497)
    > Child 2 (ID: 3ed86c8a-e3cf-44a4-b913-66ac556df3f8): able, and round it stood
a multitude of multiform glass vessels, domed, all of them, with dull
gold,... (Len: 495)
    > Child 3 (ID: fd3ef0e1-5f07-43da-b657-f7160ff011ca): ack of the other, A.B.C. GUIDE, in amethysts,
beryls, chrysoprases, and garnets. And Zuleika’s great... (Len: 493)
    > Child 4 (ID: a02991d7-94e

In [None]:
# Vector DB Setup & Indexing
import gc
from sentence_transformers import SentenceTransformer

# Close and release any handler left over from a previous run of this cell
try:
    if 'vdb' in locals():
        print("Cleaning up previous DB instance...")
        if hasattr(vdb, 'close'):
            vdb.close()
        del vdb
        gc.collect()
except Exception as e:
    print(f"Cleanup warning: {e}")

vdb = VectorDBHandler(config)

embedding_model = SentenceTransformer(config.EMBEDDING_MODEL_NAME)
vdb.create_collection()

print("Indexing chunks... (this creates embeddings using CPU/GPU)")
vdb.index_chunks(chunks, embedding_model)
print_system_usage("Indexing")

Initializing Qdrant at /Users/gizemcidal/Desktop/rag_and_finetuning_task_vdf/data/qdrant_db


  self.client = QdrantClient(path=self.config.QDRANT_PATH)


Collection gutenberg_1845_children already exists.
Indexing chunks... (this creates embeddings using CPU/GPU)
Generating embeddings for 30866 chunks...


Batches:   0%|          | 0/965 [00:00<?, ?it/s]

  self.client.upsert(


Upserted 30866 points.
[Indexing] Memory: 2011.97 MB


In [7]:
# Initialize Components
retriever = HierarchicalRetriever(config, vdb, parents, embedding_model)
generator = RAGGenerator(config)
evaluator = Evaluator()
print("RAG Components Ready.")
print_system_usage("Model Load")

Loading embedding model: sentence-transformers/all-MiniLM-L6-v2
Loading Reranker model: cross-encoder/ms-marco-MiniLM-L-6-v2


Loading LLM: google/gemma-3-1b-it


`torch_dtype` is deprecated! Use `dtype` instead!
Device set to use mps


RAG Components Ready.
[Model Load] Memory: 1234.86 MB
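The idea behind hierarchical retrieval is to match the query against small child chunks and then hand the larger parent chunks to the generator. A minimal sketch of that mapping follows. It is not the `HierarchicalRetriever` code: the token-overlap score is a stand-in for the embedding search and cross-encoder reranking loaded above, and both function names are hypothetical.

```python
def token_overlap(query, text):
    """Jaccard overlap of lowercased whitespace tokens (stand-in for a real scorer)."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q | t) if q and t else 0.0


def retrieve_parent_context(query, children, parents, top_k=3, score_fn=token_overlap):
    """Rank children by score, then return the distinct parents of the best ones."""
    ranked = sorted(children, key=lambda c: score_fn(query, c["text"]), reverse=True)
    seen, context = set(), []
    for child in ranked:
        parent_id = child["parent_id"]
        if parent_id in seen or parent_id not in parents:
            continue  # skip duplicate parents and orphaned children
        seen.add(parent_id)
        context.append(parents[parent_id])
        if len(context) == top_k:
            break
    return context
```

Deduplicating on `parent_id` matters because several high-scoring children often share a parent, and repeating the same parent text would waste the generator's context window.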


In [8]:
# Run RAG Loop & Evaluation
import pandas as pd

# Run on a subset or all pairs
test_pairs = qa_pairs[:5] # Testing on first 5 pairs for demo speed
results = []

print(f"Running RAG on {len(test_pairs)} queries...")

for i, qa in enumerate(test_pairs):
    question = qa['question']
    reference = qa['answer1']
    
    # 1. Retrieve
    context = retriever.retrieve_context(question, top_k=config.TOP_K)
    
    # 2. Generate
    generated_answer = generator.generate_answer(question, context, do_sample=False)
    
    # 3. Evaluate
    scores = evaluator.evaluate(generated_answer, reference)
    
    results.append({
        "Question": question,
        "Generated Answer": generated_answer,
        "Reference Answer": reference,
        "BLEU-4": scores['bleu'],
        "ROUGE-L": scores['rouge']
    })
    print(".", end="", flush=True)  # Progress indicator

print("\nDone!")
print_system_usage("Inference Complete")

Running RAG on 5 queries...
.....
Done!
[Inference Complete] Memory: 3512.67 MB
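When reading the BLEU-4 numbers from this loop, it helps to remember that the reference answers sampled earlier are very short ("Cambridge.", "The Duke of Dorset"). The sketch below is an unsmoothed, single-reference BLEU-4 on whitespace tokens, written only to illustrate that effect. It is not the `Evaluator` implementation, and `bleu4` and `ngrams` are hypothetical names.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list (empty if the list is shorter than n)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, reference):
    """Unsmoothed BLEU-4 against a single reference, with brevity penalty."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    log_precisions = []
    for n in range(1, 5):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference
        clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, one empty precision zeroes the score
        log_precisions.append(math.log(clipped / total))
    brevity = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / 4)
```

A reference with fewer than four tokens has no 4-grams at all, so any candidate scores 0 against it under this definition. If an evaluator reports small non-zero values for such pairs, it probably applies some form of smoothing; that is a guess, not something this notebook confirms.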


In [None]:
# Results Analysis
df_results = pd.DataFrame(results)

# Calculate Averages
avg_bleu = df_results['BLEU-4'].mean()
avg_rouge = df_results['ROUGE-L'].mean()

print("--- Evaluation Summary ---")
print(f"Average BLEU-4: {avg_bleu:.4f}")
print(f"Average ROUGE-L: {avg_rouge:.4f}")

# Display Table
df_results[['Question', 'Generated Answer', 'BLEU-4', 'ROUGE-L']]

--- Evaluation Summary ---
Average BLEU-4: 0.0039
Average ROUGE-L: 0.1023


| | Question | Generated Answer | BLEU-4 | ROUGE-L |
|---|---|---|---|---|
| 0 | Who are Zuleika's most prominent suitors? | The text does not mention who Zuleika’s most p... | 0.010331 | 0.2 |
| 1 | Why does Zuleika reject the Duke? | Please provide me with the context! I need the... | 0.009134 | 0.060606 |
| 2 | Who is the first person Zuleika falls in love ... | According to the text, Zuleika falls in love w... | 0.0 | 0.117647 |
| 3 | Where do Zuleika and her suitors meet? | According to the text, Zuleika and her suitors... | 0.0 | 0.133333 |
| 4 | How does Zuleika stop the Duke's first suicide... | Please provide me with the context! I need the... | 0.0 | 0.0 |
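Two of the five generated answers begin with "Please provide me with the context!" and one says the text does not mention the answer, which may indicate that little or no usable context reached the prompt for those questions. A small sketch for flagging such rows before averaging the metrics is shown below. The marker phrases are taken from the outputs above; `flag_ungrounded` and `UNGROUNDED_MARKERS` are hypothetical names.

```python
# Phrases observed in the generated answers of this run (lowercased)
UNGROUNDED_MARKERS = (
    "please provide me with the context",
    "the text does not mention",
)


def flag_ungrounded(results, markers=UNGROUNDED_MARKERS):
    """Return indices of result rows whose answer contains a marker phrase."""
    flagged = []
    for idx, row in enumerate(results):
        answer = row["Generated Answer"].lower()
        if any(marker in answer for marker in markers):
            flagged.append(idx)
    return flagged
```

Calling `flag_ungrounded(results)` on the list built in the RAG loop gives the row indices to inspect. Printing the retrieved `context` for those questions would show whether retrieval or prompt construction is the weak point.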

