# RAG vs No-RAG Comparison Evaluation

This notebook compares model performance **with and without RAG** on a **fixed test set**, so that every configuration is scored on the same samples.

## Problem
When each evaluation draws its own random test split, different models are scored on different samples, so their metrics are not directly comparable.

## Solution
1. Create a **fixed test set** (same samples for all evaluations)
2. Evaluate each model **twice** on the same test set:
   - Once **without RAG**
   - Once **with RAG**
3. Compare metrics side-by-side
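
The three steps above can be sketched as a small harness. This is only an illustration under stated assumptions: `compare_rag_vs_no_rag`, `generate_fn`, `score_fn`, `exact_match`, and the `"source"`/`"target"` sample keys are hypothetical names, not taken from this notebook's later cells, and exact match stands in for whatever metrics the notebook actually reports.

```python
def exact_match(predictions, references):
    """Toy metric (assumption): fraction of predictions equal to the reference."""
    return sum(p == r for p, r in zip(predictions, references)) / len(references)


def compare_rag_vs_no_rag(model_names, test_set, generate_fn, score_fn):
    """Score every model twice on the same samples: without RAG, then with RAG.

    generate_fn(model_name, source_text, use_rag) -> str is a hypothetical
    callable that wraps model inference (and retrieval when use_rag is True).
    """
    # References come from the fixed test set, so they are identical for every run.
    references = [sample["target"] for sample in test_set]
    results = {}
    for name in model_names:
        results[name] = {}
        for use_rag in (False, True):
            predictions = [
                generate_fn(name, sample["source"], use_rag) for sample in test_set
            ]
            key = "with_rag" if use_rag else "without_rag"
            results[name][key] = score_fn(predictions, references)
    return results
```

Because both runs iterate over the same `test_set`, any difference between `without_rag` and `with_rag` for a given model can be attributed to retrieval rather than to a different sample draw.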


In [None]:
# Install dependencies
%pip install -q transformers peft torch evaluate bert-score faiss-cpu pandas pyarrow google-generativeai numpy tqdm


In [None]:
import json
import torch
import numpy as np
from pathlib import Path
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import PeftModel
import faiss
import pandas as pd
import google.generativeai as genai
from evaluate import load
from tqdm import tqdm

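# Set paths
# Use /content when running on Colab (cwd is under /content); otherwise use the current directory.
# Checking the path prefix avoids false matches on any directory whose name merely contains "content".
WORKDIR = Path("/content") if Path.cwd().as_posix().startswith("/content") else Path.cwd()
# Prefer a models/ directory under WORKDIR; fall back to ./models relative to the current directory.
MODELS_DIR = WORKDIR / "models" if (WORKDIR / "models").exists() else Path("./models")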
BASE_MODEL_NAME = "masakhane/afri-byt5-base"
BASE_MODEL_PATH = MODELS_DIR / "afri-byt5-base"

# RAG paths
RAG_INDEX_PATH = WORKDIR / "rag_pipeline" / "4_vector_db" / "faiss_index.bin"
RAG_METADATA_PATH = WORKDIR / "rag_pipeline" / "4_vector_db" / "metadata.parquet"

# Fixed test set path
FIXED_TEST_SET_PATH = WORKDIR / "fixed_test_set.json"

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
print(f"Working directory: {WORKDIR}")


## Step 1: Create Fixed Test Set

Create the test set once, save it to `FIXED_TEST_SET_PATH`, and reuse it for every evaluation, so that all models and both settings (with and without RAG) are scored on identical samples.
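
One way to make the test set fixed is to sample it once with a seeded random generator and cache it on disk. The following is a minimal sketch, not necessarily how this notebook does it: `load_or_create_fixed_test_set`, its `size` and `seed` parameters, and the shape of `samples` (a list of JSON-serializable records) are assumptions made for illustration.

```python
import json
import random
from pathlib import Path


def load_or_create_fixed_test_set(samples, path, size, seed=42):
    """Return the cached test set if it exists; otherwise sample and save it.

    All names and defaults here are illustrative assumptions.
    """
    path = Path(path)
    if path.exists():
        # Reuse the saved samples so every evaluation sees the same data.
        with path.open(encoding="utf-8") as f:
            return json.load(f)

    # A dedicated, seeded generator keeps the draw reproducible and
    # independent of any other use of the global random state.
    rng = random.Random(seed)
    fixed = rng.sample(samples, min(size, len(samples)))

    with path.open("w", encoding="utf-8") as f:
        json.dump(fixed, f, ensure_ascii=False, indent=2)
    return fixed
```

Caching the file matters more than the seed itself: once the JSON file exists, later runs load it unchanged even if the underlying dataset or its ordering changes.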
