# Context

In `make_synthetic_questions.ipynb`, we generated synthetic questions to bootstrap evaluation of the retrieval step in our hardware store's Q&A system.

This notebook shows the first step in calculating precision and recall with different retrieval parameters. We will run more advanced experiments in future notebooks after we have these baseline scores.

## Data

Here is a brief review of the data.

In [1]:
import json
import lancedb
import pandas as pd
from typing import List, Dict

pd.set_option("display.max_colwidth", 160)

db = lancedb.connect("./lancedb")
reviews_table = db.open_table("reviews")
reviews_table.to_pandas().head()

Unnamed: 0,id,product_id,review,vector
0,0,0,This hammer has been a game-changer for my general carpentry work. The ergonomic grip is fantastic; it doesn't slip even when my hands are sweaty. The balan...,"[0.028299237, 0.040523008, -0.015594003, 0.018121675, 0.024046326, 0.023604987, -0.015166037, 0.023404378, 0.033060357, -0.021211054, 0.02139829, -0.0227490..."
1,1,0,"I've used many hammers over the years, but this one is easily the most comfortable. The handle has a great ergonomic design that reduces hand fatigue, even ...","[0.040150184, 0.049218856, 0.0037333986, 0.02435106, 0.02382141, -0.0025449118, -0.015101533, 0.0005264221, 0.04064108, 0.0016567764, 0.018369872, -0.028627..."
2,2,0,"This versatile claw hammer is a must-have for any DIY enthusiast. The grip is soft yet firm, making it easier to handle during extended projects. The balanc...","[0.02847098, 0.04930216, -0.0132550495, 0.014579281, 0.030482793, -0.0014380328, -0.021404168, 0.025045805, 0.0264337, -0.019099494, 0.023301383, -0.0213277..."
3,3,0,"I've been using this hammer for home repairs, and it has exceeded my expectations. The ergonomic grip is cushioned and prevents blisters, which is a big plu...","[0.032133453, 0.04300249, 0.015580612, 0.027521593, 0.019880861, -0.014733027, -0.018484838, 0.015954547, 0.05053104, -0.02026726, 0.034925498, -0.010520029..."
4,4,0,"This hammer ticks all the boxes for what I need in my toolkit. The ergonomic grip fits perfectly in my hand and reduces vibrations, which is essential for m...","[0.0028318465, 0.059558474, -0.010654663, 0.01611974, 0.04999939, 0.01956665, -0.0065574544, 0.021219628, 0.0003812101, -0.0067977128, 0.039620224, -0.02911..."


In [2]:
products_table = db.open_table("products")
products_table.to_pandas().head()

Unnamed: 0,id,title,description,vector
0,0,Hammer,A versatile claw hammer for general carpentry and home repair. Features an ergonomic grip and balanced weight for efficient and comfortable use.,"[0.034557875, 0.0626897, 0.014434814, 0.007270958, 0.014863218, 0.0004704256, -0.037223496, 0.020837065, 0.025942206, -0.0033439265, 0.010721985, -0.0361048..."
1,1,Cordless Drill,A powerful cordless drill with a rechargeable lithium-ion battery. Includes multiple drill bits and adjustable speed settings for various tasks.,"[0.0028601808, -0.008256353, -0.0016653886, -0.032711685, -0.00029657382, 0.012476035, -0.03971834, 0.06906264, 0.022567715, 0.010666853, 0.02626974, 0.0372..."
2,2,Adjustable Wrench,A durable adjustable wrench with a non-slip handle. Perfect for tightening or loosening nuts and bolts of different sizes.,"[0.026549695, 0.027072059, -0.017692227, -0.024937183, -0.03470311, -0.013490607, 0.002430126, -0.02614089, -0.026095467, -0.013365693, 0.024006013, -0.0016..."
3,3,Screwdriver Set,"A comprehensive screwdriver set that includes flathead, Phillips, and Torx drivers. Made from high-quality steel for long-lasting performance.","[0.036627404, -0.014450144, -0.028289104, -0.033025783, 0.025975335, -0.045664202, 0.0013253696, 0.00069917855, 0.004864918, -0.02460017, 0.058761008, -0.00..."
4,4,Tape Measure,A 25-foot tape measure with a lockable blade and easy-to-read markings. Ideal for accurate measurements in construction and DIY projects.,"[0.048835468, 0.074131005, 0.0055233533, -0.030364534, -0.034419734, 0.019200375, -0.0253079, -0.059294913, 0.007121324, -7.2152085e-05, 0.011306338, -0.039..."


In [3]:
with open("synthetic_eval_dataset.json", "r") as f:
    synthetic_questions = json.load(f)
synthetic_questions

[{'question': 'What feature of the Sawzall PX-1000 contributes to a secure hold during use?',
  'answer': 'The ergonomic grip is designed to prevent slipping even when hands are sweaty.',
  'chunk_id': '0'},
 {'question': 'How has the Sawzall PX-1000 performed in terms of maintaining its condition over time?',
  'answer': 'The product has remained in perfect condition after multiple heavy-duty projects over a period of six months.',
  'chunk_id': '0'},
 {'question': 'What feature of the Sawzall PX-1000 contributes to comfort during extended use?',
  'answer': 'The ergonomic design of the handle reduces hand fatigue, even after hours of use.',
  'chunk_id': '1'},
 {'question': 'How does the claw feature of the Sawzall PX-1000 perform when removing nails?',
  'answer': 'The claw is sturdy and removes nails without bending them.',
  'chunk_id': '1'},
 {'question': "What is a notable feature of the Sawzall PX-1000's grip?",
  'answer': 'The grip is soft yet firm, making it easier to handle

## Set Up Evaluation

Load the evaluation questions into a structured format.

In [4]:
from pydantic import BaseModel


class EvalQuestion(BaseModel):
    question: str
    answer: str
    chunk_id: str


eval_questions = [EvalQuestion(**question) for question in synthetic_questions]

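Build a simple search function. It returns one boolean per retrieved review, marking whether that review is the chunk the question was generated from.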

In [5]:
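def run_simple_request(q: EvalQuestion, n_return_vals: int = 5) -> List[bool]:
    """Search reviews for the question; flag which returned ids match its source chunk."""
    results = reviews_table.search(q.question).limit(n_return_vals).to_list()
    returned_ids = [r["id"] for r in results]
    # Compare as strings so the check works whether ids are stored as ints or strings.
    return [str(r) == q.chunk_id for r in returned_ids]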
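
To make the return value concrete, here is a minimal sketch of the same matching logic run against hand-written results instead of a LanceDB query. The `fake_results` list and the `hit_flags` helper are hypothetical stand-ins, not part of the notebook; the only assumption carried over is that each search result is a dict with an `id` key.

```python
from typing import Dict, List


def hit_flags(chunk_id: str, results: List[Dict]) -> List[bool]:
    """Return one boolean per retrieved result: True where the id matches chunk_id."""
    # Cast both sides to str so the comparison works whether ids are ints or strings.
    return [str(r["id"]) == str(chunk_id) for r in results]


# Hypothetical search results, ordered by rank.
fake_results = [{"id": 7}, {"id": 0}, {"id": 3}]
flags = hit_flags("0", fake_results)
print(flags)  # [False, True, False]
```

The list keeps rank order, so the position of the `True` entry shows where the source chunk landed in the results.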

Now run the benchmark. For simplicity, we only vary the number of results retrieved per question for now.

In [6]:
def score(hits):
    # This implementation assumes each question has exactly one relevant chunk,
    # so recall is the fraction of questions whose chunk was retrieved.
    n_retrieval_requests = len(hits)
    total_retrievals = sum(len(sublist) for sublist in hits)
    true_positives = sum(sum(sublist) for sublist in hits)
    precision = true_positives / total_retrievals
    recall = true_positives / n_retrieval_requests
    return {"precision": precision, "recall": recall}


def score_simple_search(n_to_retrieve: int) -> Dict[str, float]:
    in_raw_dataset = [run_simple_request(i, n_to_retrieve) for i in eval_questions]
    return score(in_raw_dataset)


k_to_retrieve = [5, 10, 20]
scores = pd.DataFrame([score_simple_search(n) for n in k_to_retrieve])
scores["n_retrieved"] = k_to_retrieve
scores

Unnamed: 0,precision,recall,n_retrieved
0,0.051429,0.257143,5
1,0.034805,0.348052,10
2,0.021494,0.42987,20
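
The numbers above follow a pattern: each precision value equals the recall divided by `n_retrieved` (for example, 0.257143 / 5 ≈ 0.051429). That is what we would expect if every question has a single relevant chunk and every search returns the full number of requested results. The sketch below reproduces the relationship on made-up hit lists; `toy_hits` is hypothetical data and `score_hits` is a standalone copy of the scoring logic, included only so the block runs on its own.

```python
from typing import Dict, List


def score_hits(hits: List[List[bool]]) -> Dict[str, float]:
    """Precision and recall, assuming exactly one relevant chunk per question."""
    n_questions = len(hits)
    total_retrieved = sum(len(flags) for flags in hits)
    true_positives = sum(sum(flags) for flags in hits)
    return {
        "precision": true_positives / total_retrieved,
        "recall": true_positives / n_questions,
    }


# Hypothetical results for 4 questions at k = 5: two questions find their chunk.
k = 5
toy_hits = [
    [False, True, False, False, False],
    [False] * k,
    [True, False, False, False, False],
    [False] * k,
]
result = score_hits(toy_hits)
print(result)  # {'precision': 0.1, 'recall': 0.5}
```

Under this assumption precision can never exceed 1/k, so recall is likely the more informative number when comparing retrieval sizes.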
