# Embedding Model Evaluation for Activity Similarity Search

This notebook evaluates the performance of an embedding model for semantic similarity search of activity descriptions. The evaluation uses a Qdrant vector database and measures how well the model can match paraphrased activities to their original versions.

## Evaluation Metrics
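- **MRR (Mean Reciprocal Rank)**: Average over all queries of 1/rank, where rank is the position of the correct match in the results; a query whose correct match is not returned contributes 0
- **Hits@K**: Proportion of queries where the correct answer is among the first K results (reported here for K = 1, 2, 3, 5, 10)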
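
To make the two metrics concrete, here is a minimal sketch that computes them from a list of ranks. The function names `mean_reciprocal_rank` and `hit_rate_at_k` and the example ranks are illustrative assumptions, not part of the notebook; `None` stands for a query whose correct answer was not returned.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries; a missing match (None) counts as 0."""
    if not ranks:
        return 0.0
    return sum(1.0 / r if r is not None else 0.0 for r in ranks) / len(ranks)


def hit_rate_at_k(ranks, k):
    """Share of queries whose correct answer appears within the first k results."""
    if not ranks:
        return 0.0
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)


# Illustrative ranks for four queries (1-based); None means "not in the result list".
example_ranks = [1, 3, None, 2]

print(f"MRR:    {mean_reciprocal_rank(example_ranks):.4f}")
print(f"Hits@1: {hit_rate_at_k(example_ranks, 1):.2f}")
print(f"Hits@3: {hit_rate_at_k(example_ranks, 3):.2f}")
```

For these four ranks the reciprocal ranks are 1, 1/3, 0 and 1/2, so the MRR is their mean, while Hits@1 counts only the first query and Hits@3 counts three of the four.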

## Configuration
The notebook supports several noise levels that simulate spelling errors in the query text. The level is set with `noise_error_rate`, which selects the matching input file.
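
The notebook does not show how the noisy input files were produced. As a rough illustration of what a character-level error rate can mean, the sketch below replaces each letter with a random lowercase letter with a given probability. The function `add_spelling_noise` and its parameters are hypothetical, and the real data may have been generated differently.

```python
import random
import string


def add_spelling_noise(text, error_rate, seed=0):
    """Replace each letter by a random lowercase letter with probability error_rate.

    Hypothetical helper for illustration only; not the notebook's own noise generator.
    """
    rng = random.Random(seed)  # fixed seed keeps the output reproducible
    noisy_chars = []
    for ch in text:
        if ch.isalpha() and rng.random() < error_rate:
            noisy_chars.append(rng.choice(string.ascii_lowercase))
        else:
            noisy_chars.append(ch)  # spaces, digits and punctuation stay untouched
    return "".join(noisy_chars)


print(add_spelling_noise("Check the invoice", error_rate=0.0))
print(add_spelling_noise("Check the invoice", error_rate=0.2))
```

With an error rate of 0.0 the text is returned unchanged; higher rates corrupt more letters while keeping the length and word boundaries intact.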


## 1. Vector Database Initialization

This section sets up the components required for vector search:

- **Embedding Model**: Converts text into numerical vectors. The embedding model (EM) is loaded through Hugging Face’s Sentence Transformers library, so you can switch between pre-trained models by changing the model name.
- **Qdrant Client**: Establishes a connection to the Qdrant vector database, which efficiently stores and retrieves vector embeddings.
- **Vector Store**: Provides an interface for similarity searches, wrapping the Qdrant client to handle vector insertion, querying, and database management.
- **Noise Configuration**: Adjust the `noise_error_rate` variable to choose which noise level of the input data is evaluated. The noise is applied to the text (spelling errors), not to the embeddings.
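
Because the setup below passes `normalize_embeddings` to the encoder, it helps to see what normalization changes. The following sketch, using plain Python lists instead of real embeddings, shows that the dot product of L2-normalized vectors equals the cosine similarity of the original vectors. The helper names and the example vectors are illustrative assumptions.

```python
import math


def dot(a, b):
    """Dot product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm else list(v)


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either vector is zero."""
    denom = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / denom if denom else 0.0


a = [3.0, 4.0]
b = [4.0, 3.0]

print(f"cosine(a, b):             {cosine_similarity(a, b):.4f}")
print(f"dot(a, b) unnormalized:   {dot(a, b):.4f}")
print(f"dot of normalized a, b:   {dot(l2_normalize(a), l2_normalize(b)):.4f}")
```

Cosine similarity ignores vector length, so normalization does not change it. A dot-product metric, in contrast, is only comparable to cosine similarity when the vectors are normalized first.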

In [None]:
from langchain_huggingface import HuggingFaceEmbeddings
from qdrant_client import QdrantClient
from langchain_qdrant import QdrantVectorStore
import json

noise_error_rate = 0.0

# Embedding model for semantic similarity search
embedding_model_used = "all-MiniLM-L6-v2"

# Create collection name based on model name
collection_name = embedding_model_used.replace("/", "-")

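# Embedding Model Setup
# trust_remote_code lets the model repository run custom code during loading;
# standard Sentence Transformers models such as all-MiniLM-L6-v2 do not need it.
model_kwargs = {"trust_remote_code": True}
# Embeddings are left unnormalized (no L2 scaling to unit length).
encode_kwargs = {
    "normalize_embeddings": False,
}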

# Initialize the embedding model
embedding_model = HuggingFaceEmbeddings(
    model_name=embedding_model_used,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
)

# Qdrant Vector Database Setup
qdrant = QdrantClient(host="localhost", port=6333)
vector_store = QdrantVectorStore(
    client=qdrant,
    collection_name=collection_name,
    embedding=embedding_model
)

# Determine vector dimensions
sample_embedding = embedding_model.embed_query("Test")
vector_size = len(sample_embedding)

print(f"Vector Dimensions: {vector_size}")
print(f"Model Used: {embedding_model_used}")
print(f"Collection Name: {collection_name}")
print(f"Noise Level: {noise_error_rate}")


## 2. Similarity Search Evaluation

This section performs the main evaluation:

1. **Data Loading**: Loads ground truth data with original activities and their paraphrases
2. **Similarity Search**: Performs similarity search for each paraphrase
3. **Ranking Assessment**: Determines the position of the correct answer in search results
4. **Metrics Calculation**: Calculates MRR (Mean Reciprocal Rank) and Hits@K values

### Evaluation Logic:
- Each paraphrase is used as a query
- The vector DB returns the 10 most similar activities
- We record the position (rank) at which the original activity appears in the results
- Rank 1 is the best possible result; if the original activity is not among the 10 results, the query counts as a miss (reciprocal rank 0)
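
The rank lookup described above can be sketched in isolation, without a vector database. The helper `find_rank` and the example strings are illustrative assumptions; in the notebook the retrieved texts come from the similarity search.

```python
def find_rank(retrieved_texts, target):
    """Return the 1-based position of target in retrieved_texts, or None if absent."""
    for position, text in enumerate(retrieved_texts, start=1):
        if text == target:
            return position
    return None


# Made-up search results, ordered from most to least similar.
retrieved = ["Approve invoice", "Check invoice", "Send invoice"]

rank = find_rank(retrieved, "Check invoice")
reciprocal_rank = 1 / rank if rank is not None else 0.0
print(rank, reciprocal_rank)

missing = find_rank(retrieved, "Archive order")
print(missing)
```

The comparison is an exact string match, so any difference in whitespace or casing between the stored activity and the ground truth counts as a miss.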


In [None]:
import csv
import os
from tqdm import tqdm

with open(f"../utils/activities_with_synonyms_merged_noise_{noise_error_rate}.json", "r", encoding="utf-8") as f:
    merged_data = json.load(f)

print(f"Loaded {len(merged_data)} activity groups")

total_queries = 0
reciprocal_ranks = []
hits_at_k = {1: 0, 2: 0, 3: 0, 5: 0, 10: 0}

results = []

print(f"\nStarting evaluation for model '{embedding_model_used}'...")

for entry in tqdm(merged_data, desc="Evaluating activity synonyms"):
    original = entry["original_activity"]
    paraphrases = entry["paraphrases"]

    for synonym in paraphrases:
        search_results = vector_store.similarity_search_with_score(synonym, k=10)
        matched = False
        rank = None

        # Search through results for the original activity
        for i, (doc, distance) in enumerate(search_results):
            if doc.page_content == original:
                rank = i + 1
                matched = True
                break

        total_queries += 1

        # Calculate reciprocal rank
        reciprocal_ranks.append(1 / rank if matched else 0)

        # Update Hits@K counters
        for k in hits_at_k:
            if matched and rank <= k:
                hits_at_k[k] += 1

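        # Record the top-ranked hit; an empty result list would otherwise raise IndexError
        if search_results:
            top_match_text = search_results[0][0].page_content
            # NOTE: this treats the returned score as a distance. For Qdrant collections
            # configured with cosine similarity the score is already a similarity
            # (higher is better); in that case use the score directly.
            similarity = 1 - search_results[0][1]
        else:
            top_match_text = ""
            similarity = float("nan")

        results.append({
            "Original Activity": original,
            "Paraphrased Activity": synonym,
            "Found Activity": top_match_text,
            "Similarity": round(similarity, 4),
            "Exact_Match": "true" if top_match_text == original else "false"
        })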

# Save results
csv_filename = f"evaluation_results/noise_{noise_error_rate}/{collection_name}.csv"
print(f"\nSaving detailed results to: {csv_filename}")

# Save all detailed results to CSV file
os.makedirs(os.path.dirname(csv_filename), exist_ok=True)
with open(csv_filename, "w", newline="", encoding="utf-8") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=[
        "Paraphrased Activity", "Found Activity", "Original Activity", "Similarity", "Exact_Match"
    ])
    writer.writeheader()
    writer.writerows(results)

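# Without any queries the metrics below are undefined (division by zero)
if total_queries == 0:
    raise ValueError("No paraphrases found in the input data; nothing to evaluate.")

mrr = sum(reciprocal_ranks) / total_queries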

print(f"\nEvaluation Results for '{embedding_model_used}':")
print(f"Total Queries: {total_queries}")
print(f"MRR (Mean Reciprocal Rank): {mrr:.4f}")
print("\nHits@K Metrics:")
for k in sorted(hits_at_k):
    accuracy = hits_at_k[k] / total_queries
    print(f"   Hits@{k}: {hits_at_k[k]}/{total_queries} = {accuracy:.4f} ({accuracy*100:.2f}%)")

print(f"\nDetailed results saved: {csv_filename}")


## 3. Export Summary Statistics

This final section exports a compact summary of all important metrics to a CSV file. This enables easy comparison between different models and configurations.

### Update Logic:
- If results for the current model already exist, they are updated
- New models are added to the existing summary
- The CSV file serves as a central overview of all evaluations
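
The update-or-append behaviour can be sketched as a small helper that works on a list of row dictionaries. The function name `upsert_summary_row` and the example rows are illustrative assumptions; the key column `Model Name` matches the one used in this notebook.

```python
def upsert_summary_row(rows, new_row, key="Model Name"):
    """Replace the row with the same key value, or append new_row if none exists.

    Returns True if an existing row was replaced, False if new_row was appended.
    """
    for index, row in enumerate(rows):
        if row.get(key) == new_row[key]:
            rows[index] = new_row
            return True
    rows.append(new_row)
    return False


# Made-up rows for illustration only.
summary = [{"Model Name": "model-a", "MRR": "0.8000"}]

print(upsert_summary_row(summary, {"Model Name": "model-a", "MRR": 0.8100}))  # replaces
print(upsert_summary_row(summary, {"Model Name": "model-b", "MRR": 0.7900}))  # appends
print(len(summary))
```

Keying on the model name alone means one row per model per summary file, which fits the setup here because each noise level writes to its own file.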

### Exported Metrics:
- Model name and vector dimensions
- Total number of tested paraphrases
- MRR and all Hits@K values
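
Once several models have been evaluated, the summary can be read back and sorted for comparison. The sketch below uses an in-memory CSV with placeholder values (not real results) and a hypothetical helper `rank_models`; note that `csv.DictReader` returns every value as a string, so metrics must be converted to `float` before sorting.

```python
import csv
import io


def rank_models(csv_text, metric="MRR"):
    """Parse summary CSV text and return rows sorted by metric, best first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return sorted(rows, key=lambda row: float(row[metric]), reverse=True)


# Placeholder content for illustration only; these are not measured results.
sample_csv = """Model Name,Dimension,Total Synonyms,MRR,Hits@1
model-a,384,100,0.8123,0.75
model-b,768,100,0.8456,0.79
"""

for row in rank_models(sample_csv):
    print(row["Model Name"], row["MRR"], row["Hits@1"])
```

To apply this to a real summary, read the file contents and pass them to the helper in place of `sample_csv`.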


In [None]:
import csv
import os

summary_csv_path = f"evaluation_results_noise_{noise_error_rate}.csv"
print(f"Summary file: {summary_csv_path}")

# Create dictionary with all important metrics for current model
summary_row = {
    "Model Name": embedding_model_used,
    "Dimension": vector_size,
    "Total Synonyms": total_queries,
    "MRR": round(mrr, 4),
    "Hits@1": round(hits_at_k[1] / total_queries, 4),
    "Hits@2": round(hits_at_k[2] / total_queries, 4),
    "Hits@3": round(hits_at_k[3] / total_queries, 4),
    "Hits@5": round(hits_at_k[5] / total_queries, 4),
    "Hits@10": round(hits_at_k[10] / total_queries, 4),
}

summary_fieldnames = list(summary_row.keys())
summary_data = []
if os.path.exists(summary_csv_path):
    print("Loading existing summary...")
    # newline="" is the recommended way to open files for the csv module
    with open(summary_csv_path, "r", newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        summary_data = list(reader)
    print(f"Found {len(summary_data)} existing entries")
else:
    print("Creating new summary file...")

# Update or add model entry
updated = False
for i, row in enumerate(summary_data):
    # Search for existing entry for current model
    if row["Model Name"] == embedding_model_used:
        print(f"Updating existing entry for '{embedding_model_used}'")
        summary_data[i] = summary_row  # Replace existing entry
        updated = True
        break

# If no existing entry found, add new one
if not updated:
    print(f"Adding new entry for '{embedding_model_used}'")
    summary_data.append(summary_row)

# Save updated summary
print("Saving updated summary...")
with open(summary_csv_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=summary_fieldnames)
    writer.writeheader()
    writer.writerows(summary_data)

print(f"Summary successfully updated: {summary_csv_path}")
print(f"Total models in summary: {len(summary_data)}")

# Final summary
print(f"Evaluation for '{embedding_model_used}' completed!")
print(f"Top-1 accuracy (Hits@1): {hits_at_k[1] / total_queries:.2%}")
print(f"MRR Score: {mrr:.4f}")