## 1. Load the RAG Mini-Wikipedia Dataset

Load the dataset, which contains both questions and Wikipedia passages. It lets us evaluate how well the retriever works by matching each question to its relevant Wikipedia passage.


### Why this data?
The dataset is open source and small enough that indexing is quick.

In [28]:
from datasets import load_dataset

questions_dataset = load_dataset("rag-datasets/rag-mini-wikipedia", "question-answer",  split="test")
passages_dataset = load_dataset("rag-datasets/rag-mini-wikipedia", "text-corpus", split="passages")

print(f"Loaded {len(questions_dataset)} questions and {len(passages_dataset)} passages.")
sample = questions_dataset[0]
print("Example question:", sample["question"])
print("Answer:", sample["answer"])
print("Ground-truth passage ID:", sample["id"])


Loaded 918 questions and 3200 passages.
Example question: Was Abraham Lincoln the sixteenth President of the United States?
Answer: yes
Ground-truth passage ID: 0


## 2. Configure Retriever API

In [29]:
import requests
import time
import numpy as np
import math

INDEX_ENDPOINT = "http://localhost:3003/memoire/ingest/raw"
SEARCH_ENDPOINT = "http://localhost:3003/memoire/search"
API_KEY = "abc123"

K = 5  # for Precision@K, Recall@K, etc.


## 3. Index the Passage Corpus

In [20]:

documents = []
if 'id' in passages_dataset.column_names:
    doc_ids = passages_dataset['id']
else:
    doc_ids = list(range(len(passages_dataset)))
text_field = 'passage'
doc_texts = passages_dataset[text_field]

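# Upload in batches of 32 to avoid a "content too long" error
BATCH_SIZE = 32
for doc_id, text in zip(doc_ids, doc_texts):
    documents.append({"documentID": str(doc_id), "content": text})
    if len(documents) == BATCH_SIZE:
        requests.post(INDEX_ENDPOINT, headers={"Authorization": f"Bearer {API_KEY}"}, json={"documents": documents})
        documents = []

# Flush the final partial batch, which the loop above leaves unsent
if documents:
    requests.post(INDEX_ENDPOINT, headers={"Authorization": f"Bearer {API_KEY}"}, json={"documents": documents})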
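
Batching by item count rather than by document ID ensures every document is sent, including a final partial batch. The sketch below illustrates the idea with a hypothetical helper, `iter_batches`, which is not part of the notebook or of any library used here.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most batch_size items (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Stepping by batch_size covers the tail even when len(items) is not a multiple of it
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


batches = list(iter_batches(list(range(70)), 32))
print([len(b) for b in batches])  # [32, 32, 6]
```

Each yielded chunk could then be posted to the index endpoint as one `{"documents": [...]}` payload.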

## 4. Retrieve Documents for Each Question

In [21]:
%%time
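found_flags = []   # 1 if the ground-truth ID is in the top K, else 0
ranks = []         # 1-based rank of the ground-truth ID, or None if not retrieved
latencies = []     # per-query wall-clock time in seconds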

for item in questions_dataset:
    query = item["question"]
    true_id = item["id"]
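    start_time = time.perf_counter()  # monotonic clock, better suited to timing than time.time()
    response = requests.post(SEARCH_ENDPOINT, headers = {"Authorization": f"Bearer {API_KEY}"}, json={"query": query, "maxResults": K})
    elapsed = time.perf_counter() - start_time
    latencies.append(elapsed)
    response.raise_for_status()  # fail on HTTP errors instead of parsing an error body
    results = response.json()
    # Accept either a {"results": [...]} wrapper or a bare list of hits
    top_docs = results["results"] if "results" in results else results
    retrieved_ids = [int(doc["documentID"]) for doc in top_docs]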

    if true_id in retrieved_ids:
        rank = retrieved_ids.index(true_id) + 1  
        found_flags.append(1)
        ranks.append(rank)
    else:
        found_flags.append(0)
        ranks.append(None)

total_queries = len(found_flags)
print(f"Completed retrieval for {total_queries} queries.")


Completed retrieval for 918 queries.
CPU times: user 2.26 s, sys: 204 ms, total: 2.46 s
Wall time: 1min 37s


## 5. Compute Precision@K and Recall@K
Precision@K is the fraction of the top K results that are relevant, while Recall@K is the fraction of all relevant documents that appear in the top K. Here each question has a single relevant document, so a query counts as found if that document appears in the top K. As a consequence, Recall@K equals the hit rate and Precision@K can be at most 1/K.

In [22]:
%%time

total_found = sum(found_flags)  
N = len(found_flags)

precision_at_k = total_found / (N * K)
recall_at_k    = total_found / N

print(f"Precision@{K}: {precision_at_k:.3f}")
print(f"Recall@{K}: {recall_at_k:.3f}")


Precision@5: 0.000
Recall@5: 0.001
CPU times: user 380 μs, sys: 0 ns, total: 380 μs
Wall time: 356 μs
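
To make the two formulas concrete, here is a small worked example on made-up hit flags. The function name `precision_recall_at_k` and the sample data are illustrative assumptions, and the sketch only covers the single-relevant-document case used in this notebook.

```python
from typing import List, Tuple


def precision_recall_at_k(found_flags: List[int], k: int) -> Tuple[float, float]:
    """Precision@K and Recall@K when each query has exactly one relevant document."""
    n = len(found_flags)
    if n == 0 or k <= 0:
        raise ValueError("need at least one query and a positive k")
    hits = sum(found_flags)
    precision = hits / (n * k)  # hits out of all n * k retrieved slots
    recall = hits / n           # hits out of n relevant documents in total
    return precision, recall


# Four hypothetical queries, two of which retrieved the relevant document
p, r = precision_recall_at_k([1, 0, 1, 0], k=5)
print(f"Precision@5: {p:.3f}, Recall@5: {r:.3f}")  # Precision@5: 0.100, Recall@5: 0.500
```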


## 6. Compute Mean Reciprocal Rank (MRR)
Mean Reciprocal Rank (MRR) measures how early the relevant document appears in the results. Each query scores 1/rank: 1 if the relevant document is ranked first, 1/2 if it is second, and so on, with 0 if it is missing from the top K. MRR is the mean of these scores over all queries.

In [25]:
%time
# Calculate reciprocal ranks for each query (0 if not found in top K)
mrr_sum = 0.0
for r in ranks:
    if r is not None:
        mrr_sum += 1.0 / r
mrr = mrr_sum / N

print(f"Mean Reciprocal Rank (MRR): {mrr:.3f}")


CPU times: user 14 μs, sys: 0 ns, total: 14 μs
Wall time: 25.5 μs
Mean Reciprocal Rank (MRR): 0.001
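
As a quick sanity check of the calculation, the sketch below computes MRR on a handful of made-up ranks. The helper name `mean_reciprocal_rank` and the sample ranks are assumptions for illustration only.

```python
from typing import List, Optional


def mean_reciprocal_rank(ranks: List[Optional[int]]) -> float:
    """Mean of 1/rank over all queries, counting a miss (None) as 0."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    total = sum(1.0 / r for r in ranks if r is not None)
    return total / len(ranks)


# Hypothetical ranks: found at positions 1, 2 and 4, plus one miss
print(mean_reciprocal_rank([1, 2, None, 4]))  # (1 + 0.5 + 0 + 0.25) / 4 = 0.4375
```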


## 7. Compute Normalized Discounted Cumulative Gain (nDCG) (similar to MRR)
nDCG measures how well the retriever ranks the relevant document, discounting a hit logarithmically by its position. With a single relevant document per question the ideal DCG is 1, so each query scores 1/log2(rank + 1) if the document is found and 0 otherwise.

In [26]:
ndcg_sum = 0.0
for r in ranks:
    if r is not None:
        ndcg_sum += 1.0 / math.log2(r + 1)
avg_ndcg = ndcg_sum / N

print(f"nDCG@{K}: {avg_ndcg:.3f}")


nDCG@5: 0.001
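
The per-query score can be checked by hand on a few made-up ranks. This is a minimal sketch under the single-relevant-document assumption; the name `mean_ndcg_single_relevant` and the sample ranks are illustrative.

```python
import math
from typing import List, Optional


def mean_ndcg_single_relevant(ranks: List[Optional[int]]) -> float:
    """Average nDCG when each query has one relevant document (ideal DCG = 1)."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    # A hit at 1-based rank r contributes 1 / log2(r + 1); a miss contributes 0
    total = sum(1.0 / math.log2(r + 1) for r in ranks if r is not None)
    return total / len(ranks)


# Rank 1 scores 1 / log2(2) = 1.0, rank 3 scores 1 / log2(4) = 0.5, a miss scores 0
print(mean_ndcg_single_relevant([1, 3, None]))  # (1.0 + 0.5 + 0) / 3 = 0.5
```

Compared with MRR, the logarithmic discount penalizes lower ranks more gently: a hit at rank 3 scores 0.5 here but only 1/3 under reciprocal rank.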


## 8. Analyze Query Latency

In [27]:
latencies_ms = np.array(latencies) * 1000.0

avg_latency    = np.mean(latencies_ms)
median_latency = np.median(latencies_ms)
p95_latency    = np.percentile(latencies_ms, 95)
max_latency    = np.max(latencies_ms)

print(f"Average latency per query: {avg_latency:.2f} ms")
print(f"Median latency: {median_latency:.2f} ms")
print(f"95th percentile latency: {p95_latency:.2f} ms")
print(f"Max latency: {max_latency:.2f} ms")


Average latency per query: 106.11 ms
Median latency: 104.61 ms
95th percentile latency: 122.53 ms
Max latency: 301.70 ms
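
The per-query timing could also be factored into a small wrapper so that the measurement is identical for every call. The sketch below assumes a hypothetical helper, `timed_call`, and uses `time.perf_counter`, a monotonic clock intended for measuring short durations.

```python
import time
from typing import Any, Callable, Tuple


def timed_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run fn(*args, **kwargs) and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


# Stand-in workload; in the notebook this would wrap the search request instead
result, elapsed_ms = timed_call(sum, [1, 2, 3])
print(result, f"{elapsed_ms:.3f} ms")
```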
