## Homework: Search Evaluation

In this homework, we will evaluate the results of vector
search.

> It's possible that your answers won't match exactly. If that's the case, select the closest one.


## Required libraries

We will use minsearch and Qdrant. Make sure you have the most up-to-date versions:

```bash
pip install -U minsearch qdrant_client
``` 

minsearch should be at least 0.0.4.



## Evaluation data

For this homework, we will use the same dataset we generated
in the videos.

Let's load it:

```python
import requests
import pandas as pd

url_prefix = 'https://raw.githubusercontent.com/DataTalksClub/llm-zoomcamp/main/03-evaluation/'
docs_url = url_prefix + 'search_evaluation/documents-with-ids.json'
documents = requests.get(docs_url).json()

ground_truth_url = url_prefix + 'search_evaluation/ground-truth-data.csv'
df_ground_truth = pd.read_csv(ground_truth_url)
ground_truth = df_ground_truth.to_dict(orient='records')
```

Here, `documents` contains the documents from the FAQ database
with unique IDs, and `ground_truth` contains generated
question-answer pairs. 
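Before evaluating, it can help to confirm that every ground-truth record points at a document that actually exists. The sketch below uses toy stand-in records with made-up values (`toy_documents`, `toy_ground_truth` are hypothetical names); only the field names `id`, `course`, `question`, and `document` are taken from the code in this homework. With the real data, the same check would run on `documents` and `ground_truth`.

```python
# Toy stand-ins (hypothetical values) mirroring the fields the homework code relies on.
toy_documents = [
    {"id": "doc-1", "course": "course-a", "section": "General",
     "question": "When does the course start?", "text": "Placeholder answer."},
    {"id": "doc-2", "course": "course-b", "section": "Setup",
     "question": "How do I install the tools?", "text": "Placeholder answer."},
]
toy_ground_truth = [
    {"question": "What is the start date?", "course": "course-a", "document": "doc-1"},
    {"question": "Which tools do I need to install?", "course": "course-b", "document": "doc-2"},
]

# Collect all known document ids, then list ground-truth records that reference an unknown id.
doc_ids = {d["id"] for d in toy_documents}
missing = [q["document"] for q in toy_ground_truth if q["document"] not in doc_ids]

print(f"{len(toy_ground_truth)} questions, {len(missing)} with unknown document ids")
```

If `missing` is non-empty on the real data, the hit rate would be capped below 1.0 regardless of search quality.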

Also, we will need the code for evaluating retrieval:

```python
from tqdm.auto import tqdm

def hit_rate(relevance_total):
    cnt = 0

    for line in relevance_total:
        if True in line:
            cnt = cnt + 1

    return cnt / len(relevance_total)

def mrr(relevance_total):
    total_score = 0.0

    for line in relevance_total:
        for rank in range(len(line)):
            if line[rank] == True:
                total_score = total_score + 1 / (rank + 1)
                break  # only the first relevant result counts toward MRR

    return total_score / len(relevance_total)

def evaluate(ground_truth, search_function):
    relevance_total = []

    for q in tqdm(ground_truth):
        doc_id = q['document']
        results = search_function(q)
        relevance = [d['id'] == doc_id for d in results]
        relevance_total.append(relevance)

    return {
        'hit_rate': hit_rate(relevance_total),
        'mrr': mrr(relevance_total),
    }
```
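To make the two metrics concrete, here is a small worked example on a hand-made relevance matrix. It is a sketch, not part of the homework: the functions are compact re-implementations of the same ideas so that the block runs on its own, and the toy flags are invented for illustration.

```python
# Toy relevance matrix: one row per query, one flag per returned result (in rank order).
relevance_total = [
    [False, True, False],   # relevant doc at rank 2 -> reciprocal rank 1/2
    [True, False, False],   # relevant doc at rank 1 -> reciprocal rank 1
    [False, False, False],  # relevant doc not retrieved -> contributes 0
    [False, False, True],   # relevant doc at rank 3 -> reciprocal rank 1/3
]

def hit_rate(relevance_total):
    # Fraction of queries with at least one relevant result.
    return sum(any(row) for row in relevance_total) / len(relevance_total)

def mrr(relevance_total):
    # Average of 1 / rank of the first relevant result (0 if there is none).
    total = 0.0
    for row in relevance_total:
        for rank, is_relevant in enumerate(row, start=1):
            if is_relevant:
                total += 1 / rank
                break
    return total / len(relevance_total)

print(hit_rate(relevance_total))  # 3 of 4 queries have a hit
print(mrr(relevance_total))       # (1/2 + 1 + 0 + 1/3) / 4
```

Three of the four queries retrieve the right document, so the hit rate is 0.75, while MRR is lower (11/24) because it also penalizes hits that appear further down the list.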



## Q1. Minsearch text

Now let's evaluate our usual minsearch approach, indexing documents with:
```python
text_fields=["question", "section", "text"],
keyword_fields=["course", "id"]
```
but tweak the search parameters. Let's use the following boosting parameters:

```python
boost = {'question': 1.5, 'section': 0.1}
```
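As a rough intuition for what these weights do, the sketch below assumes that each text field produces its own similarity score, that a boost multiplies that score, and that fields without a boost keep a weight of 1.0. The scores are made-up numbers and `boosted_score` is a hypothetical helper, so treat this as a conceptual illustration rather than minsearch's actual implementation.

```python
# Hypothetical per-field similarity scores for two documents (made-up numbers).
field_scores = {
    "doc_a": {"question": 0.2, "section": 0.9, "text": 0.3},
    "doc_b": {"question": 0.6, "section": 0.1, "text": 0.3},
}

boost = {"question": 1.5, "section": 0.1}

def boosted_score(scores, boost):
    # Assumption: fields missing from `boost` keep a weight of 1.0.
    return sum(score * boost.get(field, 1.0) for field, score in scores.items())

# Rank documents from highest to lowest combined score.
ranking = sorted(
    field_scores,
    key=lambda doc: boosted_score(field_scores[doc], boost),
    reverse=True,
)
unboosted_ranking = sorted(
    field_scores,
    key=lambda doc: boosted_score(field_scores[doc], {}),
    reverse=True,
)

print(unboosted_ranking)
print(ranking)
```

With these numbers, the strong `section` match puts `doc_a` first when all fields weigh the same, while the boosts above favor the `question` match and move `doc_b` to the top.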

What's the hit rate for this approach?

* 0.64
* 0.74
* 0.84
* 0.94

```bash
pip install -U minsearch qdrant_client ipywidgets jupyter
```

```python
import requests  # for downloading datasets
import pandas as pd  # for loading and handling tabular data
from minsearch import Index  # the MinSearch class for text-based search
from tqdm.auto import tqdm  # for progress bars during evaluation
```

```python
# Define base URL to GitHub raw data
url_prefix = 'https://raw.githubusercontent.com/DataTalksClub/llm-zoomcamp/main/03-evaluation/'

# URL for the documents JSON file
docs_url = url_prefix + 'search_evaluation/documents-with-ids.json'

# Download and parse the documents
documents = requests.get(docs_url).json()

# URL for the ground truth CSV file
ground_truth_url = url_prefix + 'search_evaluation/ground-truth-data.csv'

# Load ground truth into a DataFrame
df_ground_truth = pd.read_csv(ground_truth_url)

# Convert DataFrame to list of dictionaries for easier processing
ground_truth = df_ground_truth.to_dict(orient='records')
```


```python
# Compute Hit Rate: % of queries for which the correct document was retrieved
def hit_rate(relevance_total):
    cnt = 0
    for line in relevance_total:
        if True in line:  # If any retrieved doc matches the correct ID
            cnt = cnt + 1
    return cnt / len(relevance_total)

# Compute Mean Reciprocal Rank (MRR)
def mrr(relevance_total):
    total_score = 0.0
    for line in relevance_total:
        for rank in range(len(line)):
            if line[rank] == True:  # True means relevant doc found at this rank
                total_score = total_score + 1 / (rank + 1)
                break  # only the first correct hit counts for MRR
    return total_score / len(relevance_total)

# Main evaluation loop
def evaluate(ground_truth, search_function):
    relevance_total = []

    for q in tqdm(ground_truth):  # iterate over each query
        doc_id = q['document']  # correct document id
        results = search_function(q)  # run search
        relevance = [d['id'] == doc_id for d in results]  # check if results match the true id
        relevance_total.append(relevance)  # collect all relevance flags

    return {
        'hit_rate': hit_rate(relevance_total),
        'mrr': mrr(relevance_total),
    }
```


```python
# Create a MinSearch index with specified text and keyword fields
index = Index(
    text_fields=["question", "section", "text"],  # full-text searchable fields
    keyword_fields=["course", "id"]  # fields for exact matching
)

# Fit the index to our document list
index.fit(documents)
```



```python
def search_function(q):
    return index.search(
        q["question"],  # use the question as the search query
        filter_dict={"course": q["course"]},  # filter by course to narrow down results
        boost_dict={"question": 1.5, "section": 0.1},  # boost weights for fields
        num_results=5  # how many top results to return
    )
```
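The evaluation loop only requires that the search function accept one ground-truth record and return a list of dictionaries that each carry an `id`. The sketch below checks that contract with a stub instead of a real index. Everything in it is hypothetical: `evaluate_relevance` is a simplified variant without the progress bar that returns the raw relevance flags, and the toy records are invented.

```python
def evaluate_relevance(ground_truth, search_function):
    # Simplified loop: returns the raw relevance flags instead of aggregated metrics.
    relevance_total = []
    for q in ground_truth:
        results = search_function(q)
        relevance_total.append([d["id"] == q["document"] for d in results])
    return relevance_total

# Toy data (made-up ids and courses).
toy_docs = [
    {"id": "a1", "course": "c1"},
    {"id": "b2", "course": "c1"},
    {"id": "c3", "course": "c2"},
]
toy_ground_truth = [
    {"question": "q1", "course": "c1", "document": "b2"},
    {"question": "q2", "course": "c2", "document": "a1"},
]

def stub_search(q):
    # Stand-in for a real search: return every document from the query's course.
    return [d for d in toy_docs if d["course"] == q["course"]]

flags = evaluate_relevance(toy_ground_truth, stub_search)
print(flags)
```

The first query finds its document at rank 2. The second one misses because the expected document belongs to a different course than the query, which shows how a course filter can exclude the correct answer when the ground truth and the filter disagree.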


```python
# Evaluate using the ground truth and the defined search function
results = evaluate(ground_truth, search_function)

# Print final evaluation metrics
print(results)
```


Output:

```
{'hit_rate': 0.848714069591528, 'mrr': 0.7283553058137033}
```

The hit rate is about 0.85, so the closest option is 0.84.
