# Context

In `make_synthetic_questions.ipynb`, we generated synthetic questions to bootstrap evaluation of the retrieval step in our hardware store's Q&A system.

This notebook takes the first step: calculating baseline precision and recall for a few retrieval settings. We will run more advanced experiments in future notebooks once we have these baseline scores.
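For reference, here is a minimal sketch of precision@k and recall@k for a single query. The function name `precision_recall_at_k` is hypothetical. It allows a query to have several relevant ids, whereas in this notebook each synthetic question has exactly one (its `chunk_id`). Precision divides by the number of results actually returned, which matches how the `score` function below computes it.

```python
from typing import Sequence, Set, Tuple


def precision_recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> Tuple[float, float]:
    """Precision@k and recall@k for one query (hypothetical helper)."""
    top_k = list(retrieved)[:k]
    # Count retrieved ids that belong to the relevant set
    true_positives = sum(1 for doc_id in top_k if doc_id in relevant)
    # Precision: share of returned results that are relevant
    precision = true_positives / len(top_k) if top_k else 0.0
    # Recall: share of relevant ids that were returned
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# One relevant review ("0") found at rank 2 of 5 retrieved
p, r = precision_recall_at_k(["3", "0", "7", "1", "9"], {"0"}, k=5)
print(p, r)  # 0.2 1.0
```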

## Data

Here is a brief review of the data.

In [1]:
import json
import lancedb
import pandas as pd
from typing import List, Dict

pd.set_option("display.max_colwidth", 160)

db = lancedb.connect("./lancedb")
reviews_table = db.open_table("reviews")
reviews_table.to_pandas().head()

| | id | product_id | review | vector |
|---|---|---|---|---|
| 0 | 0 | 0 | I recently purchased this 16-ounce claw hammer for some home renovation projects, and I have to say, it exceeded my expectations. The hammer's weight is per... | [0.038354803, 0.04700025, 0.015779192, 0.008614214, 0.028684895, -0.0061779968, -0.022600599, 0.015216988, 0.022338238, -0.0005504914, 0.03358232, -0.030633... |
| 1 | 1 | 0 | I bought this 16-ounce claw hammer a few months ago for general household tasks and it has quickly become my go-to tool for almost everything. The hammer st... | [0.021753523, 0.06019874, 0.00015461344, 0.015222528, 0.056939416, -0.005882834, -0.032173485, 0.016913919, 0.023346147, -0.0023503557, 0.020346086, -0.0305... |
| 2 | 2 | 1 | I've had this 18V Cordless Drill for about six months now, and it has become an indispensable tool in my DIY kit. The drill offers incredible power, easily ... | [-0.00833237, 0.023142243, -0.019665092, -0.019304585, 0.035004094, -0.008117229, -0.024467979, 0.07805564, 0.04560999, 0.0064077266, 0.041190866, -0.015978... |
| 3 | 3 | 1 | This 18V Cordless Drill has exceeded my expectations in every way. I've used it for various home improvement projects, from assembling furniture to fixing u... | [-0.005477526, -0.005700413, -0.0036303112, -0.013446502, 0.019308737, 0.008976548, -0.01956521, 0.0635564, 0.041353185, -0.0035326073, 0.033707853, 0.01461... |
| 4 | 4 | 2 | I purchased this circular saw for a home renovation project, and it has exceeded my expectations. The 7-1/4 inch blade size is perfect for cutting through a... | [0.0047313096, 0.04547153, 0.022017673, -0.00079554913, 0.0035151835, 0.0201182, -0.037410352, 0.060783137, 0.02023402, 0.014570348, 0.046073806, -0.0418347... |
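Each row's `vector` column stores an embedding of its text. Conceptually, a vector search ranks rows by how close their vectors are to the query's embedding. The sketch below shows that idea as an exhaustive NumPy search with L2 distance, assumed here to match LanceDB's default metric. It is not LanceDB's implementation (which can use an approximate index), and the name `brute_force_top_k` is hypothetical.

```python
import numpy as np


def brute_force_top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int) -> np.ndarray:
    """Row indices of the k document vectors closest to query_vec (L2 distance)."""
    # Euclidean distance from the query to every document vector
    distances = np.linalg.norm(doc_vecs - query_vec, axis=1)
    # argsort is ascending, so the closest rows come first
    return np.argsort(distances)[:k]


# Toy 2-D "embeddings" for three reviews
docs = np.array([[0.0, 1.0], [1.0, 0.0], [0.9, 0.1]])
query = np.array([1.0, 0.05])
print(brute_force_top_k(query, docs, k=2))  # [1 2]
```

Real embeddings have hundreds of dimensions, but the ranking logic is the same.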


In [2]:
products_table = db.open_table("products")
products_table.to_pandas().head()

| | id | title | description | vector |
|---|---|---|---|---|
| 0 | 0 | Hammer | A versatile 16-ounce claw hammer ideal for driving nails, removing nails, and other household tasks. This durable hammer features a comfortable, ergonomic h... | [0.022986636, 0.043873817, 0.023010492, 0.0018892104, 0.027817765, -0.01138596, -0.037503883, 0.021030325, 0.014791608, -0.011457532, 0.028032482, -0.006608... |
| 1 | 1 | Cordless Drill | This 18V cordless drill offers the power and versatility needed for various tasks. It comes with a rechargeable battery, multiple speed settings, and a set ... | [-0.02154981, 0.0036577967, 0.01569321, -0.036480013, -0.00079587364, 0.010104695, -0.040171318, 0.06920685, 0.033448603, -0.001461572, 0.031860724, 0.01800... |
| 2 | 2 | Circular Saw | A robust 7-1/4 inch circular saw that is perfect for cutting through wood, plastic, and metal. The saw features an adjustable bevel, comfortable grip, and a... | [0.008456307, 0.036645822, 0.008084602, 0.002283531, 0.009609689, 0.030632934, -0.03209789, 0.07298554, 0.019973723, 0.007887817, 0.054312784, -0.046878666,... |
| 3 | 3 | Adjustable Wrench | A durable 10-inch adjustable wrench that is perfect for gripping and turning nuts, bolts, and pipes. The ergonomic handle ensures comfort during prolonged use. | [0.04762361, 0.04297113, 0.016914915, -0.011292196, -0.021579083, 0.00028877074, -0.004474211, -0.0067157005, -0.00034027823, -0.01869174, 0.043462098, -0.0... |
| 4 | 4 | Ladder | A sturdy 6-foot aluminum ladder ideal for home and professional use. It includes non-slip steps, a top tray for tools, and safety locks for extra stability. | [0.010086252, 0.0013882515, 0.014944766, -0.014897881, 0.022118254, -0.039383855, -0.026162133, 0.009658421, -0.0096877245, 0.019809142, -0.05752856, -0.031... |


In [3]:
with open("synthetic_eval_dataset.json", "r") as f:
    synthetic_questions = json.load(f)
synthetic_questions

[{'question': 'What is the weight of the Sawzall PX-1000 and how does it affect its use?',
  'answer': 'The Sawzall PX-1000 weighs 16 ounces, which makes it perfectly balanced and easy to drive nails with precision and minimal effort.',
  'chunk_id': '0'},
 {'question': 'What features contribute to the comfort of using the Sawzall PX-1000?',
  'answer': 'The Sawzall PX-1000 features an ergonomic handle with a rubber grip that is comfortable to hold and reduces hand fatigue, along with a non-slip texture for a secure grip.',
  'chunk_id': '0'},
 {'question': 'What is the weight of the Sawzall PX-1000?',
  'answer': 'The Sawzall PX-1000 is a 16-ounce claw hammer.',
  'chunk_id': '1'},
 {'question': 'What material is the handle of the Sawzall PX-1000 made from?',
  'answer': 'The handle of the Sawzall PX-1000 is made of durable rubber material.',
  'chunk_id': '1'},
 {'question': 'What type of materials can the Sawzall PX-1000 drill into?',
  'answer': 'The Sawzall PX-1000 can drill into 

## Set Up Evaluation

Load the evaluation questions into a structured format.

In [4]:
from pydantic import BaseModel


class EvalQuestion(BaseModel):
    question: str
    answer: str
    chunk_id: str


eval_questions = [EvalQuestion(**question) for question in synthetic_questions]

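Next, build a simple search function. For each question, it retrieves the top `n_return_vals` reviews and records, for each retrieved review, whether it is the review the question was generated from.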

In [5]:
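def run_simple_request(q: EvalQuestion, n_return_vals: int = 5) -> List[bool]:
    # Search the reviews table with the question text and keep the top results
    results = reviews_table.search(q.question).limit(n_return_vals).to_list()
    # One boolean per retrieved review: True if it is the question's source chunk.
    # Cast the row id to str so it still matches the str chunk_id if `id` is numeric.
    return [str(r["id"]) == q.chunk_id for r in results]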
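To make the output of `run_simple_request` concrete without a database, the hypothetical helper `hits_for_ids` below applies the same comparison to a plain list of ids and returns one boolean per retrieved row. Converting both sides to strings guards against a type mismatch: the JSON file stores `chunk_id` as a string, while the table's `id` column may be numeric, and `"0" == 0` is `False` in Python.

```python
from typing import Iterable, List, Union


def hits_for_ids(returned_ids: Iterable[Union[int, str]], chunk_id: Union[int, str]) -> List[bool]:
    """Mark each retrieved id True if it is the question's source chunk (hypothetical helper)."""
    # Normalize both sides to str so 0 and "0" compare equal
    target = str(chunk_id)
    return [str(doc_id) == target for doc_id in returned_ids]


print(hits_for_ids([3, 0, 7], "0"))  # [False, True, False]
```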

Now run the benchmark. For simplicity, the only parameter we vary for now is the number of reviews retrieved per question (k).

In [6]:
def score(hits: List[List[bool]]) -> Dict[str, float]:
    # This implementation assumes each question has exactly one relevant chunk
    # (its chunk_id), so recall is the fraction of questions whose chunk was retrieved.
    n_retrieval_requests = len(hits)
    total_retrievals = sum(len(request_hits) for request_hits in hits)
    true_positives = sum(sum(request_hits) for request_hits in hits)
    precision = true_positives / total_retrievals
    recall = true_positives / n_retrieval_requests
    return {"precision": precision, "recall": recall}


def score_simple_search(n_to_retrieve: int) -> Dict[str, float]:
    # One list of hit booleans per evaluation question
    hits = [run_simple_request(q, n_to_retrieve) for q in eval_questions]
    return score(hits)


k_to_retrieve = [5, 10, 20]
scores = pd.DataFrame([score_simple_search(n) for n in k_to_retrieve])
scores["n_retrieved"] = k_to_retrieve
scores

| | precision | recall | n_retrieved |
|---|---|---|---|
| 0 | 0.085417 | 0.427083 | 5 |
| 1 | 0.055208 | 0.552083 | 10 |
| 2 | 0.036198 | 0.723958 | 20 |
