## Step 3: Vector Index

3a - Baseline TF-IDF

3b - Dense model (all-MiniLM-L6-v2)

3c - Two tower dense model

3d - Hybrid model


Reference links:

Uber blog on two tower arch: https://www.uber.com/blog/innovative-recommendation-applications-using-two-tower-embeddings/


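Kaggle notebook (Amazon product search, input embeddings): https://www.kaggle.com/code/abhishekmungoli/amazonproductsearch-minidataset-input-embeddings

Kaggle notebook (two-tower retrieval model training): https://www.kaggle.com/code/abhishekmungoli/two-tower-retrieval-recommendation-model-training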

In [14]:
#!pip install chromadb
#!pip install sentence_transformers
#!pip install scikit-learn  # provides TfidfVectorizer and NearestNeighbors used in 3a
#!pip install tqdm
#!pip install pandas
#!pip install numpy

In [None]:
import pandas as pd
import numpy as np
import os
import warnings
import chromadb
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors
import shutil
from tqdm import tqdm
from sentence_transformers import SentenceTransformer
from sentence_transformers import CrossEncoder

In [16]:
# Load the saved sample_2c DataFrame from the CSV file
df_sample_2c = pd.read_csv('/Users/ashrithgrandi/Desktop/Grainger/sample_2c_full_data.csv')

df_sample_2c.head()

Unnamed: 0,example_id,query,query_id,product_id,product_locale,esci_label,small_version,large_version,split,product_title,product_description,product_bullet_point,product_brand,product_color
0,118060,6 dining chairs,4845,B08CZ6TC2L,us,E,1,1,train,Yaheetech Dining Chairs Velvet Armchairs for C...,Set of 6 Kitchen Dining Chairs for Counter Lou...,STRONG METAL LEGS: To enhance the weight capac...,Yaheetech,Grey
1,118064,6 dining chairs,4845,B08HQG1MFS,us,E,1,1,train,CozyCasa Dining Chairs Modern Style Dining Cha...,<b>If you are in search of some quality-reliab...,Dining Chairs set of 6 -- White PP backrest an...,CozyCasa,White
2,118065,6 dining chairs,4845,B08K2K3J4C,us,E,1,1,train,Yaheetech Dining Chairs with Waterproof leathe...,Make every long-time sitting comfortable. The ...,MULTIPLE USE: Sold in a set of 6 chairs. Desig...,Yaheetech,Brown
3,118066,6 dining chairs,4845,B08K2V66N8,us,E,1,1,train,Yaheetech Dining Chairs Dining Room Chairs Liv...,Make every dinner time comfortable. Constructe...,MULTIPLE USE: Sold in a set of 6 chairs. This ...,Yaheetech,Khaki
4,118067,6 dining chairs,4845,B08K8VDTW8,us,E,1,1,train,Modern Dining Chairs Set of 6 - Faux Leather D...,<b>Modern Dining Chairs Set of 6 - Faux Leathe...,Comfortable Dining Chairs Set of 6 - The dinin...,WENYU,Grey


In [None]:
# 2a. Create the Product Corpus (de-duplicated products)
print("Processing data into a unique Product Corpus...")
product_columns = [
    'product_id', 
    'product_title', 
    'product_description', 
    'product_bullet_point', 
    'product_brand', 
    'product_color'
]
product_corpus_df = df_sample_2c[product_columns].drop_duplicates(subset=['product_id']).reset_index(drop=True)

# Fill NaNs with empty strings
text_cols_to_fill = product_columns[1:] # All except product_id
for col in text_cols_to_fill:
    product_corpus_df[col] = product_corpus_df[col].fillna('')

# Combine all text fields into a single 'product_text' for embedding
product_corpus_df['product_text'] = (
    product_corpus_df['product_title'] + ' ' +
    product_corpus_df['product_brand'] + ' ' +
    product_corpus_df['product_color'] + ' ' +
    product_corpus_df['product_description'] + ' ' +
    product_corpus_df['product_bullet_point']
)
# Collapse repeated whitespace into single spaces and trim the ends
product_corpus_df['product_text'] = product_corpus_df['product_text'].str.replace(r'\s+', ' ', regex=True).str.strip()
print(f"Created a corpus of {len(product_corpus_df)} unique products.")

# 2b. Create Query Evaluation Set (query-to-product pairs)
query_eval_set = df_sample_2c[['query', 'query_id', 'product_id', 'esci_label']].copy()
print(f"Created an evaluation set of {len(query_eval_set)} query-product pairs.")

Processing data into a unique Product Corpus...
Created a corpus of 485 unique products.
Created an evaluation set of 485 query-product pairs.


## 3a: TF-IDF baseline

In [None]:
documents = product_corpus_df['product_text'].tolist()

print(f"Creating TF-IDF embeddings (sparse vectors) for {len(documents)} documents...")

# Remove English stop words and keep only the 5,000 most frequent terms in the corpus
tfidf_vectorizer = TfidfVectorizer(stop_words='english', max_features=5000)

# Create the sparse TF-IDF matrix
tfidf_matrix = tfidf_vectorizer.fit_transform(documents)
print(f"TF-IDF matrix created with shape: {tfidf_matrix.shape}")

# Use cosine distance with an exact, brute-force search (fine for a corpus this small)
print("Building in-memory sparse index with scikit-learn (NearestNeighbors)...")
n_neighbors = 10
nn_index = NearestNeighbors(n_neighbors=n_neighbors, metric='cosine', algorithm='brute')
nn_index.fit(tfidf_matrix)
print("In-memory sparse index built successfully.")

Creating TF-IDF embeddings (sparse vectors) for 485 documents...
TF-IDF matrix created with shape: (485, 5000)
Building in-memory sparse index with scikit-learn (NearestNeighbors)...
In-memory sparse index built successfully.
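
The indexing step above can be illustrated on a three-document toy corpus without scikit-learn. This is a minimal sketch, not the notebook's code: it assumes whitespace tokenization and the smoothed idf `ln((1 + n) / (1 + df)) + 1` followed by L2 normalization, which is scikit-learn's default weighting. The names `docs`, `normalise`, `tfidf_vectors`, and `cosine_distance` are hypothetical.

```python
import math
from collections import Counter

def normalise(vec):
    """Scale a sparse vector (dict of term -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec

def tfidf_vectors(docs):
    """Return unit-length TF-IDF vectors and the idf table (smoothed idf assumed)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vecs = [normalise({t: tf * idf[t] for t, tf in Counter(toks).items()})
            for toks in tokenized]
    return vecs, idf

def cosine_distance(a, b):
    """1 - cosine similarity; the dot product suffices for unit vectors."""
    return 1.0 - sum(w * b.get(t, 0.0) for t, w in a.items())

docs = ["bench drill press", "dining chairs set", "arbor press"]
vecs, idf = tfidf_vectors(docs)

# Embed the query with the idf table fitted on the corpus (unknown terms are dropped).
query_counts = Counter("drill press".split())
query = normalise({t: c * idf[t] for t, c in query_counts.items() if t in idf})

distances = [cosine_distance(query, v) for v in vecs]
ranking = sorted(range(len(docs)), key=lambda i: distances[i])
print(ranking)  # [0, 2, 1]
```

A document that shares no terms with the query sits at distance 1.0, which is why the TF-IDF distances reported in this section cluster just below 1 for short queries against long product texts.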


In [7]:
# 1. Pick a test query
test_query = "carpenter bench press"

# 2. Find the ground truth for this query from our in-memory DataFrame
ground_truth_df = query_eval_set[query_eval_set['query'] == test_query]
ground_truth_ids = ground_truth_df['product_id'].tolist()

print(f"Test Query: '{test_query}'")
print(f"Ground Truth 'Exact' Product IDs ({len(ground_truth_ids)}): {ground_truth_ids}")

# 3. Embed the test query *using the same vectorizer*
# .transform returns a sparse 2D matrix
query_vector = tfidf_vectorizer.transform([test_query])

# 4. Search the index
print(f"\nSearching index for Top {n_neighbors} results...")
# .kneighbors returns (distances, indices)
distances, indices = nn_index.kneighbors(query_vector)

# Flatten the results from 2D to 1D
result_indices = indices[0]
result_distances = distances[0]


Test Query: 'carpenter bench press'
Ground Truth 'Exact' Product IDs (8): ['B088R8VC7V', 'B07N4QN64D', 'B01M4F8JZJ', 'B01LZV1QW1', 'B00HQONFVE', 'B00BHSPJC8', 'B00068U7XQ', 'B00SIQ1DLS']

Searching index for Top 10 results...


In [8]:
# 5. Display results
print("Search Results (lower distance is better):")

# Get the full product info from the original corpus_df using the indices
results_df = product_corpus_df.iloc[result_indices].copy()

# Add the distance and a ground_truth check
results_df['_distance'] = result_distances
results_df['is_ground_truth'] = results_df['product_id'].isin(ground_truth_ids)

# Display the relevant columns
print(results_df[['product_id', '_distance', 'is_ground_truth', 'product_title']])

# Calculate a simple metric for this query
matches_in_top_10 = results_df['is_ground_truth'].sum()
print(f"\nQuery-specific Metric: Found {matches_in_top_10} out of {len(ground_truth_ids)} ground truth items in the Top 10 results.")


Search Results (lower distance is better):
     product_id  _distance  is_ground_truth  \
73   B01LZV1QW1   0.813588             True   
70   B088R8VC7V   0.868601             True   
76   B00068U7XQ   0.883066             True   
431  B06XD1DVBG   0.885864            False   
71   B07N4QN64D   0.893176             True   
427  B06XD5G2XB   0.911316            False   
77   B00SIQ1DLS   0.912220             True   
74   B00HQONFVE   0.916837             True   
129  B07G15QZNQ   0.924860            False   
342  B0015TX0MU   0.925376            False   

                                         product_title  
73   Genesis GDP805P 5-Speed 2.6 Amp 8" Drill Press...  
70   Workbench Mounted Drilling Machine, 350W 5 Spe...  
76              Palmgren Ratcheting arbor press, 3 ton  
431  Creative Teaching Press Safari Friends Calenda...  
71   Drill Stand for Hand Drill, Electric Bench Cla...  
427  CTP Woodland Friends Calendar Set Bulletin Boa...  
77   Yost M7WW Rapid Acting Wood Working

## 3b: Dense model (all-MiniLM-L6-v2)


In [9]:
# Step 1: Setup, Constants, and Model Loading
MODEL_NAME = 'all-MiniLM-L6-v2'
DB_PATH = "./chroma_data"
COLLECTION_NAME = "product_embeddings"

model = SentenceTransformer(MODEL_NAME)
print("Model loaded")

Model loaded


In [10]:
# --- Step 3: Embed Documents for Indexing ---
documents = product_corpus_df['product_text'].tolist()
product_ids = product_corpus_df['product_id'].tolist()

print(f"Embedding {len(documents)}")
embeddings = model.encode(documents, show_progress_bar=True)
print("Embeddings generated.")

# Prepare data for ChromaDB
str_product_ids = [str(pid) for pid in product_ids]
metadatas = [{"product_id": pid, "text": doc} for pid, doc in zip(product_ids, documents)]

# --- Step 4: Create the ChromaDB Persistent Index ---
# Clean up old data directory if it exists, for a clean run
if os.path.exists(DB_PATH):
    shutil.rmtree(DB_PATH)
    
# Initialize a persistent ChromaDB client
client = chromadb.PersistentClient(path=DB_PATH)

# Create a collection
print(f"Creating ChromaDB collection '{COLLECTION_NAME}' at '{DB_PATH}'...")
collection = client.get_or_create_collection(
    name=COLLECTION_NAME, 
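    # Chroma's 'l2' space is squared Euclidean distance. all-MiniLM-L6-v2 is tuned for
    # cosine similarity and emits unit-length vectors, so 'l2' ranks results the same
    # way 'cosine' would; only the distance values differ.
    metadata={"hnsw:space": "l2"}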
)

# Add the data to the collection
print(f"Adding {len(str_product_ids)} vectors to collection...")
collection.add(
    embeddings=embeddings.tolist(),
    documents=documents,
    metadatas=metadatas,
    ids=str_product_ids # Chroma requires a list of unique string IDs
)

print(f"Successfully created collection with {collection.count()} vectors.")


Embedding 485


Batches: 100%|██████████| 16/16 [00:09<00:00,  1.61it/s]


Embeddings generated.
Creating ChromaDB collection 'product_embeddings' at './chroma_data'...
Adding 485 vectors to collection...
Successfully created collection with 485 vectors.
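
The collection is created with the `l2` space, which Chroma documents as squared Euclidean distance. For unit-length vectors, squared L2 distance and cosine similarity are tied by `d² = 2 − 2·cos`, so both orderings agree. The sketch below checks this on made-up three-dimensional vectors. It assumes the embeddings are normalized, as all-MiniLM-L6-v2 outputs are to my knowledge. The helper names and the vectors are hypothetical.

```python
import math

def unit(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine_similarity(a, b):
    """Dot product; equals cosine similarity because inputs are unit vectors."""
    return sum(x * y for x, y in zip(a, b))

def squared_l2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

q = unit([1.0, 2.0, 0.5])
docs = {
    "a": unit([1.0, 1.8, 0.4]),
    "b": unit([0.2, 0.1, 1.0]),
    "c": unit([-1.0, 0.5, 0.0]),
}

# The identity d^2 = 2 - 2*cos holds for every unit-vector pair.
for d in docs.values():
    assert math.isclose(squared_l2(q, d), 2 - 2 * cosine_similarity(q, d), abs_tol=1e-12)

by_l2 = sorted(docs, key=lambda k: squared_l2(q, docs[k]))
by_cos = sorted(docs, key=lambda k: -cosine_similarity(q, docs[k]))
print(by_l2 == by_cos)  # True
```

Under the same assumption, a reported `_distance` can be converted back to a cosine similarity with `1 - distance / 2`.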


In [11]:
# Step 5: Run a Test Query 

test_query = "carpenter bench press"

# 2. Find the ground truth for this query from our in-memory DataFrame
ground_truth_df = query_eval_set[query_eval_set['query'] == test_query]
ground_truth_ids = ground_truth_df['product_id'].tolist()

print(f"Test Query: '{test_query}'")
print(f"Ground Truth 'Exact' Product IDs ({len(ground_truth_ids)}): {ground_truth_ids}")

# 3. Embed the test query
query_vector = model.encode([test_query]).tolist()

# 4. Search the index
n_neighbors = 10
print(f"\nSearching index for Top {n_neighbors} results...")
search_results = collection.query(
    query_embeddings=query_vector,
    n_results=n_neighbors,
)

# 5. Display results
print("Search Results (lower distance is better):")

# Process Chroma's output format
result_metadatas = search_results['metadatas'][0]
result_distances = search_results['distances'][0]

# Create a DataFrame for easy viewing
results_df = pd.DataFrame({
    'product_id': [meta['product_id'] for meta in result_metadatas],
    '_distance': result_distances
})

# Add a column to show if the result is a "Ground Truth" match
results_df['is_ground_truth'] = results_df['product_id'].isin(ground_truth_ids)

# Join with corpus_df to get the title
results_df = results_df.merge(product_corpus_df[['product_id', 'product_title']], on='product_id', how='left')

# Print the relevant columns
print(results_df[['product_id', '_distance', 'is_ground_truth', 'product_title']])

# Calculate a simple metric for this query
matches_in_top_10 = results_df['is_ground_truth'].sum()
print(f"\nQuery-specific Metric: Found {matches_in_top_10} out of {len(ground_truth_ids)} ground truth items in the Top 10 results.")

print("\n--- Compare this to our TF-IDF result ---")
# Hard-coded from the TF-IDF run in 3a (6 of the Top 10 rows had is_ground_truth == True).
print("TF-IDF Metric: Found 6 out of 8 ground truth items in the Top 10 results.")

Test Query: 'carpenter bench press'
Ground Truth 'Exact' Product IDs (8): ['B088R8VC7V', 'B07N4QN64D', 'B01M4F8JZJ', 'B01LZV1QW1', 'B00HQONFVE', 'B00BHSPJC8', 'B00068U7XQ', 'B00SIQ1DLS']

Searching index for Top 10 results...
Search Results (lower distance is better):
   product_id  _distance  is_ground_truth  \
0  B07N4QN64D   0.949832             True   
1  B088R8VC7V   1.055872             True   
2  B00HQONFVE   1.058271             True   
3  B00068U7XQ   1.136232             True   
4  B01LZV1QW1   1.194515             True   
5  B00SIQ1DLS   1.298030             True   
6  B07HH3K4SK   1.401191            False   
7  B01GOM6OUM   1.409457            False   
8  B00BHSPJC8   1.445602             True   
9  B078WZLFHG   1.452312            False   

                                       product_title  
0  Drill Stand for Hand Drill, Electric Bench Cla...  
1  Workbench Mounted Drilling Machine, 350W 5 Spe...  
2  WEN 4208T 2.3-Amp 8-Inch 5-Speed Benchtop Dril...  
3             P

## Eval

In [26]:
# Calculate average metrics
def print_avg_metrics(results, model_name):
    avg_hits = {k: np.mean([r['hits'][k] for r in results]) for k in results[0]['hits'].keys()}
    avg_mrr = np.mean([r['mrr'] for r in results])
    
    print(f"\n{model_name} Average Metrics:")
    for k, v in avg_hits.items():
        print(f"{k.upper()}: {v:.3f}")
    print(f"MRR: {avg_mrr:.3f}")
    
    return avg_hits, avg_mrr

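# Store metrics for both models.
# tfidf_results and dense_results are lists of per-query dicts
# ({'query', 'hits', 'mrr'}) built by an evaluation cell that is not shown in this section.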
tfidf_hits, tfidf_mrr = print_avg_metrics(tfidf_results, "TF-IDF")
dense_hits, dense_mrr = print_avg_metrics(dense_results, "Dense Model")

# Store individual metric components for easier access
tfidf_mrrs = [r['mrr'] for r in tfidf_results]
tfidf_recalls_at_1 = [r['hits']['hits@1'] for r in tfidf_results]
tfidf_recalls_at_5 = [r['hits']['hits@5'] for r in tfidf_results]
tfidf_recalls_at_10 = [r['hits']['hits@10'] for r in tfidf_results]

dense_mrrs = [r['mrr'] for r in dense_results]
dense_recalls_at_1 = [r['hits']['hits@1'] for r in dense_results]
dense_recalls_at_5 = [r['hits']['hits@5'] for r in dense_results]
dense_recalls_at_10 = [r['hits']['hits@10'] for r in dense_results]

print("\nPer-Query Results:")
for i, query in enumerate(unique_queries):
    print(f"\nQuery: {query}")
    print("TF-IDF:", tfidf_results[i])
    print("Dense:", dense_results[i])


TF-IDF Average Metrics:
HITS@1: 0.130
HITS@5: 0.503
HITS@10: 0.689
MRR: 0.284

Dense Model Average Metrics:
HITS@1: 0.137
HITS@5: 0.554
HITS@10: 0.748
MRR: 0.308

Per-Query Results:

Query: 6 dining chairs
TF-IDF: {'query': '6 dining chairs', 'hits': {'hits@1': 0.045454545454545456, 'hits@5': 0.22727272727272727, 'hits@10': 0.45454545454545453}, 'mrr': 0.13313492063492063}
Dense: {'query': '6 dining chairs', 'hits': {'hits@1': 0.045454545454545456, 'hits@5': 0.22727272727272727, 'hits@10': 0.4090909090909091}, 'mrr': 0.12555916305916306}

Query: a intex pool pump
TF-IDF: {'query': 'a intex pool pump', 'hits': {'hits@1': 0.16666666666666666, 'hits@5': 0.8333333333333334, 'hits@10': 1.0}, 'mrr': 0.40833333333333327}
Dense: {'query': 'a intex pool pump', 'hits': {'hits@1': 0.16666666666666666, 'hits@5': 0.8333333333333334, 'hits@10': 1.0}, 'mrr': 0.4083333333333334}

Query: activated carbon mask
TF-IDF: {'query': 'activated carbon mask', 'hits': {'hits@1': 0.1, 'hits@5': 0.5, 'hits@10': 

## Re-ranking



In [21]:

# ---
# Part 1: Define Evaluation Metrics (Copied from previous script)
# ---
print("\n--- Iteration 2: Re-ranking with a Cross-Encoder ---")
print("Defining evaluation metrics...")

def calculate_reciprocal_rank(retrieved_ids, ground_truth_ids):
    """Returns 1/rank of the first relevant item for a single query (0.0 if none is retrieved)."""
    ground_truth_set = set(ground_truth_ids)
    for i, p_id in enumerate(retrieved_ids):
        if p_id in ground_truth_set:
            return 1.0 / (i + 1) # Rank is i+1
    return 0.0

def calculate_recall_at_k(retrieved_ids, ground_truth_ids, k):
    """Calculates HITS@k (Recall@k) for a single query."""
    ground_truth_set = set(ground_truth_ids)
    retrieved_at_k = set(retrieved_ids[:k])
    
    hits = len(ground_truth_set.intersection(retrieved_at_k))
    
    if not ground_truth_set:
        return 0.0
        
    return hits / len(ground_truth_set)

print("Metric functions (MRR, HITS@k) defined.")

# ---
# Part 2: Load Re-ranking Model and Re-init Clients
# ---
print("\n--- Part 2: Loading Models and Clients ---")

# Load the new re-ranking model
RE_RANKER_MODEL = 'cross-encoder/ms-marco-MiniLM-L-6-v2'
print(f"Loading re-ranking model: {RE_RANKER_MODEL}...")
cross_encoder_model = CrossEncoder(RE_RANKER_MODEL)
print("Re-ranker model loaded.")



# We also still need the original dense model to create the query vectors
# (We assume MODEL_NAME is in memory from cell "3b")
print(f"Loading retrieval model: {MODEL_NAME}...")
retrieval_model = SentenceTransformer(MODEL_NAME)
print("Retrieval model loaded.")


# ---
# Part 3: Run Full Evaluation (Retrieve + Re-rank)
# ---
print("\n--- Part 3: Running Full Evaluation (Retrieve + Re-rank) ---")

# Retrieve more candidates than we finally report so the re-ranker has room to reorder them.
N_RETRIEVE = 50
N_NEIGHBORS_FINAL = 10 # Final HITS@k is still reported for the Top 10

# Build the ground-truth map (query -> list of relevant product IDs) and the list of
# unique queries. query_eval_set comes from the corpus-preparation cell above.
print("Creating ground truth map...")
ground_truth_map = query_eval_set.groupby('query')['product_id'].apply(list).to_dict()
unique_queries = list(ground_truth_map.keys())

print(f"Found {len(unique_queries)} unique queries to evaluate.")

# Lists to store scores for the re-ranked model
rerank_mrrs = []
rerank_recalls_at_1 = []
rerank_recalls_at_5 = []
rerank_recalls_at_10 = []

# Loop through all unique queries with a progress bar
for query in tqdm(unique_queries, desc="Evaluating (Retrieve + Re-rank)"):
    ground_truth_ids = ground_truth_map[query]
    
    # --- 1. Stage 1: Retrieve (from ChromaDB) ---
    query_vector_dense = retrieval_model.encode([query]).tolist()
    
    search_results = collection.query( 
        query_embeddings=query_vector_dense,
        n_results=N_RETRIEVE, # Get 50 candidates
    )
    
    # Get the product_ids and text from the metadata
    retrieved_metadatas = search_results['metadatas'][0]
    candidate_ids = [meta['product_id'] for meta in retrieved_metadatas]
    candidate_texts = [meta['text'] for meta in retrieved_metadatas]

    # --- 2. Stage 2: Re-rank (with Cross-Encoder) ---
    
    # Create (query, document) pairs for the re-ranker
    rerank_pairs = [(query, doc_text) for doc_text in candidate_texts]
    
    # Score every (query, document) pair with the cross-encoder; a higher score means more relevant.
    rerank_scores = cross_encoder_model.predict(rerank_pairs)
    
    # Combine the product IDs with their new scores
    reranked_results = list(zip(candidate_ids, rerank_scores))
    
    # Sort the results by the new score (highest first)
    reranked_results.sort(key=lambda x: x[1], reverse=True)
    
    # Get the new, re-sorted list of product IDs
    retrieved_ids_reranked = [p_id for p_id, score in reranked_results]

    # --- 3. Evaluate the re-ranked list ---
    # HITS@k only looks at the first k IDs. The reciprocal rank scans the full
    # re-ranked list of N_RETRIEVE candidates, not just the Top 10.
    
    # Calculate and store scores
    rerank_mrrs.append(calculate_reciprocal_rank(retrieved_ids_reranked, ground_truth_ids))
    rerank_recalls_at_1.append(calculate_recall_at_k(retrieved_ids_reranked, ground_truth_ids, k=1))
    rerank_recalls_at_5.append(calculate_recall_at_k(retrieved_ids_reranked, ground_truth_ids, k=5))
    rerank_recalls_at_10.append(calculate_recall_at_k(retrieved_ids_reranked, ground_truth_ids, k=N_NEIGHBORS_FINAL))

print("Re-ranking complete.")



--- Iteration 2: Re-ranking with a Cross-Encoder ---
Defining evaluation metrics...
Metric functions (MRR, HITS@k) defined.

--- Part 2: Loading Models and Clients ---
Loading re-ranking model: cross-encoder/ms-marco-MiniLM-L-6-v2...
Re-ranker model loaded.
Loading retrieval model: all-MiniLM-L6-v2...
Retrieval model loaded.

--- Part 3: Running Full Evaluation (Retrieve + Re-rank) ---
Creating ground truth map...
Found 50 unique queries to evaluate.


Evaluating (Retrieve + Re-rank): 100%|██████████| 50/50 [02:17<00:00,  2.75s/it]

Re-ranking evaluation complete.

--- FINAL QUANTIFIED METRICS (Iteration 2) ---

--- TF-IDF (Baseline) ---
  (Previous 'tfidf_mrrs' not found in memory)

--- Dense Model (Baseline) ---
  (Previous 'dense_mrrs' not found in memory)

--- Re-ranked Dense Model (Iteration 2) ---
  Mean Reciprocal Rank (MRR): 0.9519
  HITS@1 (Recall@1):          0.1381
  HITS@5 (Recall@5):          0.5641
  HITS@10 (Recall@10):        0.7858

--- Analysis ---
We are comparing the 'Re-ranked' model to the 'Dense Model (Baseline)'.
If the new metrics are higher, our secondary ranking logic was a success.
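
The retrieve-then-re-rank flow used in this cell can be sketched without any model downloads. In this toy version both stages are stand-ins. `retrieve` ranks by token overlap in place of the ChromaDB vector search, and `phrase_score` rewards an exact phrase match in place of `CrossEncoder.predict`. Every name and the four-item corpus are hypothetical and only illustrate the control flow.

```python
def retrieve(query, corpus, n_retrieve):
    """Stage 1 stand-in: rank products by the number of tokens shared with the query."""
    q_tokens = set(query.lower().split())
    ranked = sorted(
        corpus.items(),
        key=lambda item: -len(q_tokens & set(item[1].lower().split())),
    )
    return ranked[:n_retrieve]

def phrase_score(query, text):
    """Stage 2 stand-in: token-overlap fraction, plus a bonus for an exact phrase match."""
    q_tokens = set(query.lower().split())
    overlap = len(q_tokens & set(text.lower().split())) / len(q_tokens)
    bonus = 1.0 if query.lower() in text.lower() else 0.0
    return overlap + bonus

def rerank(query, candidates, score_fn, n_final):
    """Re-score the candidates, sort highest first, and keep the top n_final IDs."""
    scored = [(pid, score_fn(query, text)) for pid, text in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [pid for pid, _ in scored[:n_final]]

corpus = {
    "p1": "press release about a fire drill",
    "p2": "benchtop drill press 5 speed",
    "p3": "arbor press",
    "p4": "dining chairs",
}

query = "drill press"
candidates = retrieve(query, corpus, n_retrieve=3)
candidate_ids = [pid for pid, _ in candidates]
final_ids = rerank(query, candidates, phrase_score, n_final=2)

print(candidate_ids)  # ['p1', 'p2', 'p3']
print(final_ids)      # ['p2', 'p1']
```

The first stage cannot tell `p1` from `p2` because both share two tokens with the query. The second stage reads the query and the text together and promotes `p2`. A re-ranker can only reorder what stage one returned, so recall is capped by `n_retrieve`.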





In [24]:
print("\n--- TF-IDF (Baseline) ---")
print(f"  Mean Reciprocal Rank (MRR): {np.mean(tfidf_mrrs):.4f}") 
print(f"  HITS@10 (Recall@10):        {np.mean(tfidf_recalls_at_10):.4f}")

print("\n--- Dense Model (Baseline) ---")
print(f"  Mean Reciprocal Rank (MRR): {np.mean(dense_mrrs):.4f}") 
print(f"  HITS@10 (Recall@10):        {np.mean(dense_recalls_at_10):.4f}")

print("\n--- Re-ranked Dense Model (Iteration 2) ---")
print(f"  Mean Reciprocal Rank (MRR): {np.mean(rerank_mrrs):.4f}")
print(f"  HITS@1 (Recall@1):          {np.mean(rerank_recalls_at_1):.4f}")
print(f"  HITS@5 (Recall@5):          {np.mean(rerank_recalls_at_5):.4f}")
print(f"  HITS@10 (Recall@10):        {np.mean(rerank_recalls_at_10):.4f}")

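print("\n--- Analysis ---")
print("We are comparing the 'Re-ranked' model to the 'Dense Model (Baseline)'.")
# Caveat: the MRR figures are not like-for-like. rerank_mrrs uses
# calculate_reciprocal_rank (1/rank of the first hit), while the baseline per-query
# 'mrr' values printed in the Eval section (e.g. 0.1331 alongside HITS@10 = 10/22)
# match a different formula: the sum of 1/rank over all hits divided by the
# ground-truth size. Compare HITS@k, or recompute the baselines with
# calculate_reciprocal_rank, before drawing conclusions from MRR.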



--- TF-IDF (Baseline) ---
  Mean Reciprocal Rank (MRR): 0.2843
  HITS@10 (Recall@10):        0.6891

--- Dense Model (Baseline) ---
  Mean Reciprocal Rank (MRR): 0.3076
  HITS@10 (Recall@10):        0.7483

--- Re-ranked Dense Model (Iteration 2) ---
  Mean Reciprocal Rank (MRR): 0.9519
  HITS@1 (Recall@1):          0.1381
  HITS@5 (Recall@5):          0.5641
  HITS@10 (Recall@10):        0.7858

--- Analysis ---
We are comparing the 'Re-ranked' model to the 'Dense Model (Baseline)'.
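
To put the comparison in numbers, the relative change in HITS@10 can be computed directly from the values printed above. HITS@10 is used because its per-query values match the recall definition in `calculate_recall_at_k` (for example 10/22 = 0.4545). This is plain arithmetic on the reported averages. With only 50 queries the differences carry no significance estimate. `relative_gain` is a hypothetical helper.

```python
# Average HITS@10 values copied from the output above.
hits_at_10 = {"tfidf": 0.6891, "dense": 0.7483, "reranked": 0.7858}

def relative_gain(new, old):
    """Fractional improvement of `new` over `old`."""
    return (new - old) / old

dense_vs_tfidf = relative_gain(hits_at_10["dense"], hits_at_10["tfidf"])
rerank_vs_dense = relative_gain(hits_at_10["reranked"], hits_at_10["dense"])

print(f"Dense vs TF-IDF:   {dense_vs_tfidf:.1%}")   # 8.6%
print(f"Re-ranked vs dense: {rerank_vs_dense:.1%}")  # 5.0%
```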
