# The goal here is to use the two documents from search-evaluation to evaluate retrieval quality

# RAG Evaluation Metrics: Hit Rate and MRR

RAG evaluation metrics like hit rate and MRR (Mean Reciprocal Rank) are essential for measuring retrieval quality in Retrieval-Augmented Generation systems.

## Hit Rate (Recall@k)

Hit rate measures the percentage of queries where at least one relevant document appears in the top-k retrieved results. It's calculated as:

$$\text{Hit Rate@k} = \frac{\text{Number of queries with} \geq 1 \text{ relevant doc in top-k}}{\text{Total number of queries}}$$

For example, if you retrieve the top 5 documents for 100 queries, and 85 of those queries have at least one relevant document in those 5 results, your Hit Rate@5 = 0.85 or 85%.
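The definition can be sketched in a few lines of Python. This is a minimal illustration, assuming each query is represented by a list of booleans in ranked order (`True` where the retrieved document is relevant). The name `hit_rate_at_k` and the sample data are hypothetical.

```python
def hit_rate_at_k(relevance_total, k):
    """Fraction of queries with at least one relevant result in the top k."""
    hits = sum(1 for relevance in relevance_total if any(relevance[:k]))
    return hits / len(relevance_total)


# Hypothetical relevance lists for four queries, three results each
sample = [
    [True, False, False],   # relevant document at rank 1
    [False, False, True],   # relevant document at rank 3
    [False, False, False],  # no relevant document retrieved
    [False, True, False],   # relevant document at rank 2
]

print(hit_rate_at_k(sample, k=1))  # 0.25
print(hit_rate_at_k(sample, k=3))  # 0.75
```

Truncating each list to its first `k` entries is what makes the metric depend on the retrieval depth.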

## Mean Reciprocal Rank (MRR)

MRR measures how highly ranked the first relevant document is, on average. It's calculated as:

$$\text{MRR} = \frac{1}{N} \times \sum_{i=1}^{N} \frac{1}{\text{rank}_i}$$

Where $\text{rank}_i$ is the position of the first relevant document for query $i$, and $N$ is the total number of queries.

For instance, if across 3 queries the first relevant documents appear at positions 1, 3, and 2 respectively:

$$\text{MRR} = \frac{1}{3} \times \left(\frac{1}{1} + \frac{1}{3} + \frac{1}{2}\right) \approx \frac{1}{3} \times (1 + 0.33 + 0.5) \approx 0.61$$
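The same arithmetic can be checked in Python. This is a small sketch, assuming ranks are 1-based and that a query with no relevant result is passed as `None` and contributes 0. The helper name `mrr_from_ranks` is hypothetical.

```python
def mrr_from_ranks(ranks):
    """Mean reciprocal rank from 1-based ranks; None means no relevant result."""
    scores = [0.0 if rank is None else 1.0 / rank for rank in ranks]
    return sum(scores) / len(scores)


# Ranks from the worked example: positions 1, 3 and 2
print(round(mrr_from_ranks([1, 3, 2]), 2))  # 0.61
```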

## Key Differences

- **Hit rate** only cares whether you found relevant content, not where it ranked
- **MRR** penalizes systems that place relevant documents lower in the ranking
- Both metrics range from 0 to 1. MRR never exceeds hit rate at the same k, and equals it only when every hit is at rank 1
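To make the contrast concrete, the sketch below compares two made-up systems that retrieve the relevant document for every query but at different ranks. The helper names and the data are assumptions for illustration only.

```python
def first_relevant_rank(relevance):
    """1-based rank of the first relevant result, or None if there is none."""
    return relevance.index(True) + 1 if True in relevance else None


def summarize(relevance_total):
    """Return (hit rate, MRR) for a list of per-query relevance lists."""
    ranks = [first_relevant_rank(relevance) for relevance in relevance_total]
    hit_rate = sum(rank is not None for rank in ranks) / len(ranks)
    mrr = sum(1 / rank for rank in ranks if rank is not None) / len(ranks)
    return hit_rate, mrr


# Hypothetical systems: A ranks the relevant document first, B ranks it third
system_a = [[True, False, False], [True, False, False]]
system_b = [[False, False, True], [False, False, True]]

print(summarize(system_a))  # hit rate 1.0, MRR 1.0
print(summarize(system_b))  # hit rate 1.0, MRR about 0.33
```

Both systems have the same hit rate, so only MRR reveals that system B ranks its relevant documents lower.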

## Practical Usage

- **Hit rate** is useful when you just need to ensure relevant information is retrieved
- **MRR** is better when ranking quality matters (e.g., when users typically only look at top results)
- Both are commonly evaluated at different k values (k=1, 5, 10, etc.) to understand performance across different retrieval depths

These metrics help optimize your retrieval component before generation, ensuring your RAG system has access to relevant context for producing accurate responses.

Opening the transformed documents, which now contain an ID

In [21]:
import json

with open('documents-with-ids.json', 'rt') as f_in:
    documents = json.load(f_in)
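Because the evaluation further down matches results to ground truth by `id`, it may be worth confirming that the IDs are unique before indexing. The following is a sketch only: `duplicate_ids` is a hypothetical helper and the sample records are invented, but the same function could be applied to the loaded `documents` list.

```python
from collections import Counter


def duplicate_ids(documents):
    """Return the sorted list of ids that occur more than once."""
    counts = Counter(doc["id"] for doc in documents)
    return sorted(doc_id for doc_id, n in counts.items() if n > 1)


# Invented records standing in for the real documents
sample_documents = [
    {"id": "a1", "question": "first question"},
    {"id": "b2", "question": "second question"},
    {"id": "a1", "question": "third question"},
]

print(duplicate_ids(sample_documents))  # ['a1']
```

An empty list means every document can be identified unambiguously by its `id`.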

Initialising Elasticsearch

In [3]:
from elasticsearch import Elasticsearch

es_client = Elasticsearch('http://localhost:9200') 

index_settings = {
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
    },
    "mappings": {
        "properties": {
            "text": {"type": "text"},
            "section": {"type": "text"},
            "question": {"type": "text"},
            "course": {"type": "keyword"},
            "id": {"type": "keyword"},
        }
    }
}

index_name = "course-questions"

es_client.indices.delete(index=index_name, ignore_unavailable=True)
es_client.indices.create(index=index_name, body=index_settings)


ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'course-questions'})

Indexing the documents into Elasticsearch

In [22]:
from tqdm.auto import tqdm

for doc in tqdm(documents):
    es_client.index(index=index_name, document=doc)

  0%|          | 0/948 [00:00<?, ?it/s]
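The loop above sends one request per document. As an alternative sketch (not what this notebook ran), the `bulk` helper from `elasticsearch.helpers` can send the documents in batches. Reusing each document's own `id` as the Elasticsearch `_id` is an assumption made here so that re-running the cell overwrites existing documents instead of adding second copies. The function names `bulk_actions` and `bulk_index` are hypothetical.

```python
def bulk_actions(documents, index_name):
    """Yield one bulk action per document, reusing the document's id as _id."""
    for doc in documents:
        yield {"_index": index_name, "_id": doc["id"], "_source": doc}


def bulk_index(es_client, documents, index_name):
    """Index all documents in batched requests; needs a reachable cluster."""
    # Imported here so bulk_actions can be inspected without the client installed
    from elasticsearch.helpers import bulk

    return bulk(es_client, bulk_actions(documents, index_name))


# Inspecting the generated actions for an invented document (no cluster needed)
sample = [{"id": "a1", "text": "hello", "course": "demo"}]
print(list(bulk_actions(sample, "course-questions")))
```

With a running cluster, `bulk_index(es_client, documents, index_name)` would replace the loop.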

In [5]:
def elastic_search(query, course):
    search_query = {
        "size": 5,
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": query,
                        "fields": ["question^3", "text", "section"],
                        "type": "best_fields"
                    }
                },
                "filter": {
                    "term": {
                        "course": course
                    }
                }
            }
        }
    }

    response = es_client.search(index=index_name, body=search_query)
    
    result_docs = []
    
    for hit in response['hits']['hits']:
        result_docs.append(hit['_source'])
    
    return result_docs

Testing Elasticsearch with a query

In [23]:
elastic_search(
    query="I just discovered the course. Can I still join?",
    course="data-engineering-zoomcamp"
)


[{'text': "Yes, even if you don't register, you're still eligible to submit the homeworks.\nBe aware, however, that there will be deadlines for turning in the final projects. So don't leave everything for the last minute.",
  'section': 'General course-related questions',
  'question': 'Course - Can I still join the course after the start date?',
  'course': 'data-engineering-zoomcamp',
  'id': '31b29e57'},
 {'text': "Yes, even if you don't register, you're still eligible to submit the homeworks.\nBe aware, however, that there will be deadlines for turning in the final projects. So don't leave everything for the last minute.",
  'section': 'General course-related questions',
  'question': 'Course - Can I still join the course after the start date?',
  'course': 'data-engineering-zoomcamp',
  'id': '7842b56a'},
 {'text': 'You can start by installing and setting up all the dependencies and requirements:\nGoogle cloud account\nGoogle Cloud SDK\nPython 3 (installed with Anaconda)\nTerrafor

Evaluation using `ground-truth-data.csv`, which contains 5 sample questions for each document

In [7]:
import pandas as pd

In [27]:
df_ground_truth = pd.read_csv('ground-truth-data.csv')

In [None]:
# convert to a list of dictionaries
ground_truth = df_ground_truth.to_dict(orient='records')

In [29]:
ground_truth[0]

{'question': 'When does the course begin?',
 'course': 'data-engineering-zoomcamp',
 'document': 'c02e79ef'}

Running the search for each document's sample questions

In [30]:
relevance_total = []

for q in tqdm(ground_truth):
    doc_id = q['document']
    results = elastic_search(query=q['question'], course=q['course'])
    relevance = [d['id'] == doc_id for d in results]
    relevance_total.append(relevance)

  0%|          | 0/4627 [00:00<?, ?it/s]
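To see what a single entry of `relevance_total` looks like, the sketch below runs the same comparison for one ground-truth record against a stubbed search function. `stub_search` and the IDs `x1` and `x2` are invented stand-ins for a real search backend, and `relevance_for` is a hypothetical helper.

```python
def relevance_for(query, search_function):
    """Return one boolean per result: True where the expected document was retrieved."""
    results = search_function(query)
    return [doc["id"] == query["document"] for doc in results]


def stub_search(query):
    """Invented stand-in for a search backend; always returns three fixed ids."""
    return [{"id": "x1"}, {"id": "c02e79ef"}, {"id": "x2"}]


q = {
    "question": "When does the course begin?",
    "course": "data-engineering-zoomcamp",
    "document": "c02e79ef",
}

print(relevance_for(q, stub_search))  # [False, True, False]
```

Here the expected document comes back at rank 2, so this query counts as a hit with a reciprocal rank of 0.5.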


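Using a small hand-made example (each row is the relevance list for one query; the trailing comment is its reciprocal rank)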

In [13]:
example = [
    [True, False, False, False, False], # 1, 
    [False, False, False, False, False], # 0
    [False, False, False, False, False], # 0 
    [False, False, False, False, False], # 0
    [False, False, False, False, False], # 0 
    [True, False, False, False, False], # 1
    [True, False, False, False, False], # 1
    [True, False, False, False, False], # 1
    [True, False, False, False, False], # 1
    [True, False, False, False, False], # 1 
    [False, False, True, False, False],  # 1/3
    [False, False, False, False, False], # 0
]

# 1 => 1
# 2 => 1 / 2 = 0.5
# 3 => 1 / 3 = 0.3333
# 4 => 0.25
# 5 => 0.2
# rank => 1 / rank
# none => 0

## Hit Rate (Recall@k)

As defined earlier, hit rate is the fraction of queries with at least one relevant document in the top-k results. In code, a query counts as a hit when its relevance list contains a `True`:

In [14]:
def hit_rate(relevance_total):
    cnt = 0

    for line in relevance_total:
        if True in line:
            cnt = cnt + 1

    return cnt / len(relevance_total)

## Mean Reciprocal Rank (MRR)

As defined earlier, MRR averages the reciprocal rank of the first relevant document over all queries, scoring 0 for a query with no relevant result:

In [15]:
def mrr(relevance_total):
    total_score = 0.0

    for line in relevance_total:
        for rank in range(len(line)):
            if line[rank]:
                # only the first relevant result counts towards MRR
                total_score = total_score + 1 / (rank + 1)
                break

    return total_score / len(relevance_total)

In [16]:
hit_rate(example)

0.5833333333333334

In [17]:
mrr(example)

0.5277777777777778
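Both values can be checked by hand from the example: 7 of the 12 rows contain a `True`, six of them at rank 1 and one at rank 3.

$$\text{Hit Rate} = \frac{7}{12} \approx 0.583, \qquad \text{MRR} = \frac{1}{12} \times \left(6 \times \frac{1}{1} + \frac{1}{3}\right) = \frac{19}{36} \approx 0.528$$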

Results for Elasticsearch on the full ground-truth set, as (hit rate, MRR):

In [32]:
hit_rate(relevance_total), mrr(relevance_total)

(0.6202723146747352, 0.28552697932425625)

In [33]:
import minsearch

index = minsearch.Index(
    text_fields=["question", "text", "section"],
    keyword_fields=["course", "id"]
)

index.fit(documents)

<minsearch.minsearch.Index at 0x7fcbf57acda0>

In [34]:
def minsearch_search(query, course):
    boost = {'question': 3.0, 'section': 0.5}

    results = index.search(
        query=query,
        filter_dict={'course': course},
        boost_dict=boost,
        num_results=5
    )

    return results

In [35]:
relevance_total = []

for q in tqdm(ground_truth):
    doc_id = q['document']
    results = minsearch_search(query=q['question'], course=q['course'])
    relevance = [d['id'] == doc_id for d in results]
    relevance_total.append(relevance)

  0%|          | 0/4627 [00:00<?, ?it/s]

In [36]:
hit_rate(relevance_total), mrr(relevance_total)

(0.7722066133563864, 0.661454506159499)

In [37]:
def evaluate(ground_truth, search_function):
    relevance_total = []

    for q in tqdm(ground_truth):
        doc_id = q['document']
        results = search_function(q)
        relevance = [d['id'] == doc_id for d in results]
        relevance_total.append(relevance)

    return {
        'hit_rate': hit_rate(relevance_total),
        'mrr': mrr(relevance_total),
    }

In [38]:
evaluate(ground_truth, lambda q: elastic_search(q['question'], q['course']))

  0%|          | 0/4627 [00:00<?, ?it/s]


{'hit_rate': 0.6202723146747352, 'mrr': 0.28552697932425625}

In [39]:
evaluate(ground_truth, lambda q: minsearch_search(q['question'], q['course']))

  0%|          | 0/4627 [00:00<?, ?it/s]

{'hit_rate': 0.7722066133563864, 'mrr': 0.661454506159499}
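When MRR is well below hit rate, as in the Elasticsearch numbers above, the expected document is often retrieved but not in first position. One way to look into this is to count how often the first relevant result lands at each rank. The sketch below assumes the same list-of-booleans format as `relevance_total`. `rank_distribution` is a hypothetical helper and the sample data is invented.

```python
from collections import Counter


def rank_distribution(relevance_total):
    """Count queries by the 1-based rank of the first relevant result (None = miss)."""
    ranks = [
        relevance.index(True) + 1 if True in relevance else None
        for relevance in relevance_total
    ]
    return Counter(ranks)


# Invented relevance lists for four queries
sample = [
    [True, False, False],
    [False, True, False],
    [False, True, False],
    [False, False, False],
]

dist = rank_distribution(sample)
print(dist[1], dist[2], dist[None])  # 1 2 1
```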