# Context

In `make_synthetic_questions.ipynb`, we generated synthetic questions to bootstrap evaluation of retrieval in our hardware store's Q&A system.

This notebook shows the first step in calculating precision and recall with different retrieval parameters. We will run more advanced experiments in future notebooks after we have these baseline scores.

## Data

Here is a brief review of the data.

In [1]:
import json
import lancedb
import pandas as pd
from typing import List, Dict
from concurrent.futures import ThreadPoolExecutor
from scoring_utils import EvalQuestion, score, score_reranked_search

pd.set_option("display.max_colwidth", 160)

db = lancedb.connect("./lancedb")
reviews_table = db.open_table("reviews")
reviews_table.to_pandas().head()



index,id,product_title,product_description,review,vector
0,0,Hammer,This 16 oz claw hammer is perfect for general carpentry and DIY projects. It features a comfortable grip and a durable steel head.,"I've been using this hammer for a few months now, and it's become my go-to tool for all my DIY projects. The 16 oz weight is perfect for driving nails witho...","[0.026041072, 0.04662072, 0.003556133, -0.014435542, 0.029466875, -0.014013522, -0.021647107, 0.005734497, 0.015900197, -0.013504617, 0.021088552, -0.021051..."
1,1,Hammer,This 16 oz claw hammer is perfect for general carpentry and DIY projects. It features a comfortable grip and a durable steel head.,"This hammer is a solid addition to my toolbox. The balance between the handle and the head makes it easy to control, and the 16 oz weight is just right for ...","[0.026080444, 0.04409138, 0.008676617, 0.010105856, 0.017947696, 0.0021928695, -0.037514355, 0.0035130181, 0.024208521, -0.020034637, 0.020540563, -0.048670..."
2,2,Hammer,This 16 oz claw hammer is perfect for general carpentry and DIY projects. It features a comfortable grip and a durable steel head.,"I purchased this hammer for some home renovation work, and it has exceeded my expectations. The steel head is tough and has withstood a lot of heavy use wit...","[0.03338692, 0.02774543, 0.0019985342, 0.0033709116, -0.005106521, -0.029180119, -0.030395957, 0.009209975, 0.05053024, -0.03496751, 0.05111384, -0.01512502..."
3,3,Hammer,This 16 oz claw hammer is perfect for general carpentry and DIY projects. It features a comfortable grip and a durable steel head.,"As a professional carpenter, I rely on my tools daily, and this hammer has not disappointed. The 16 oz weight is perfect for driving nails quickly and effic...","[0.02476854, 0.05620057, 0.022624861, -0.0050912397, 0.020209994, -0.014205107, -0.030089, 0.01576767, 0.015677273, -0.020804025, 0.02534966, -0.02673143, -..."
4,4,Hammer,This 16 oz claw hammer is perfect for general carpentry and DIY projects. It features a comfortable grip and a durable steel head.,"This hammer is a great value for the price. The 16 oz weight is perfect for general carpentry and DIY projects. The grip is comfortable and doesn't slip, ev...","[0.028411018, 0.0551858, -0.0011977376, -0.008559253, 0.033493266, 0.0071027544, -0.03272473, 0.025956662, 0.021209097, -0.035823666, 0.033493266, -0.019560..."


In [2]:
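# Load the synthetic questions generated in make_synthetic_questions.ipynb
with open("synthetic_eval_dataset.json", "r") as f:
    synthetic_questions = json.load(f)
synthetic_questions[:5]  # not displayed: only a cell's last expression is echoed
# Wrap each raw record in an EvalQuestion for structured access
eval_questions = [EvalQuestion(**question) for question in synthetic_questions]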

## Set Up Evaluation

The cell above loaded the evaluation questions into a structured format (a list of `EvalQuestion` objects).
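To make the structured format concrete, here is a minimal sketch of what such a record might look like. `EvalQuestionSketch` is a hypothetical stand-in, not the real `scoring_utils.EvalQuestion`; it assumes only the two fields this notebook actually reads (`question` and `chunk_id`), and the sample record is made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class EvalQuestionSketch:
    """Hypothetical stand-in for scoring_utils.EvalQuestion; the real class may differ."""

    question: str  # the synthetic question text sent to the search
    chunk_id: str  # id of the review the search is expected to return


# Made-up record, shaped like one entry of the JSON file (assumption).
raw = [{"question": "Does the 16 oz hammer have a comfortable grip?", "chunk_id": "0"}]

# Same unpacking pattern as the notebook cell: one object per raw dict.
parsed = [EvalQuestionSketch(**item) for item in raw]
```

Unpacking with `**item` only works when the dictionary keys match the field names exactly, so a record with extra keys would raise a `TypeError` in this sketch.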

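Next, build a simple search function. For each question it returns one boolean per retrieved result, indicating whether that result's `id` matches the question's `chunk_id`.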

In [3]:
def run_simple_request(q: EvalQuestion, n_return_vals=5):
    results = (
        reviews_table.search(q.question).select(["id"]).limit(n_return_vals).to_list()
    )
    return [str(q.chunk_id) == str(r["id"]) for r in results]

Now run the benchmark. For simplicity, the next cell only compares retrieval sizes (the number of results returned per query) using plain semantic search.

In [4]:
def score_simple_search(n_to_retrieve: int) -> Dict[str, float]:
    # parallelize to speed this up 5-10X
    with ThreadPoolExecutor() as executor:
        hits = list(
            executor.map(lambda q: run_simple_request(q, n_to_retrieve), eval_questions)
        )
    return score(hits)

k_to_retrieve = [5, 10, 20]
scores = pd.DataFrame([score_simple_search(n) for n in k_to_retrieve])
scores["n_retrieved"] = k_to_retrieve
scores

   precision    recall  n_retrieved
0   0.083000  0.415000            5
1   0.063500  0.635000           10
2   0.041972  0.839444           20
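The sketch below shows one way precision and recall could be computed from the per-question hit lists. `score_hits` is a hypothetical stand-in for `scoring_utils.score`, whose actual code is not shown in this notebook, and it assumes each question has exactly one relevant chunk.

```python
from typing import Dict, List


def score_hits(hits: List[List[bool]]) -> Dict[str, float]:
    """Hypothetical stand-in for scoring_utils.score (assumption, not the real code)."""
    n_retrieved = sum(len(h) for h in hits)  # total results returned across questions
    n_correct = sum(sum(h) for h in hits)    # results that matched the target chunk
    n_questions = len(hits)                  # one relevant chunk per question (assumed)
    return {
        "precision": n_correct / n_retrieved if n_retrieved else 0.0,
        "recall": n_correct / n_questions if n_questions else 0.0,
    }


# Three toy questions with k = 5: two find their chunk, one misses.
toy_hits = [
    [False, True, False, False, False],
    [False, False, False, False, False],
    [True, False, False, False, False],
]
toy_scores = score_hits(toy_hits)
```

Under this assumption precision equals recall divided by k, which is consistent with the table above: 0.415 / 5 = 0.083 and 0.635 / 10 = 0.0635. Precision therefore falls as k grows even while recall improves.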


If you have Cohere set up, you can check whether a reranker improves results (we'll talk more about rerankers in the coming weeks).

In [8]:
k_to_retrieve = [5, 10, 20]
reranked_scores = pd.DataFrame([score_reranked_search(eval_questions, reviews_table, n) for n in k_to_retrieve])
reranked_scores["n_retrieved"] = k_to_retrieve
print(reranked_scores)

   precision    recall  n_retrieved
0   0.107667  0.538333            5
1   0.076500  0.765000           10
2   0.045667  0.913333           20
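To compare the two runs directly, the recall values reported in this notebook can be subtracted for each retrieval size. This is a small arithmetic sketch; the numbers are copied by hand from the two result tables, and the variable names are illustrative.

```python
# Recall by number of results retrieved, copied from the tables in this notebook.
baseline_recall = {5: 0.415, 10: 0.635, 20: 0.839444}
reranked_recall = {5: 0.538333, 10: 0.765, 20: 0.913333}

# Absolute recall gain from reranking at each retrieval size.
recall_gain = {
    k: round(reranked_recall[k] - baseline_recall[k], 3) for k in baseline_recall
}

for k, gain in recall_gain.items():
    print(f"k={k}: recall gain {gain:+.3f}")
```

In these results the reranker adds roughly 0.12 to 0.13 recall at 5 and 10 results, and about 0.07 at 20 results, where baseline recall is already high.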
