# Semantic Search Vector Database Testing

This notebook evaluates how well our movie data index performs.

## What We Will Test
- How do we measure search quality?
- How quickly do users find what they expect?
- Are the best results at the top?

### 1. Load environment variables and configure setup

In [1]:
import os
import openai
import logging
import numpy as np
from pinecone import Pinecone, ServerlessSpec
from dotenv import load_dotenv
from setup_pinecone_index import get_embedding, test_index

# Load environment variables from a .env file
load_dotenv()

# OpenAI configuration
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
ENC_MODEL = "text-embedding-3-small"  # OpenAI embedding model
EMB_DIM = 512  # Embedding dimensions (512 for efficiency, 1536 for max quality)

# Pinecone configuration
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
INDEX_NAME = 'semantic-search-movie-demo'  # Name of your Pinecone index
CLOUD_PROVIDER = 'aws'  # Cloud provider (aws, gcp, or azure)
REGION = 'us-east-1'  # Region for serverless deployment

# Processing configuration
BATCH_SIZE = 100  # Number of vectors to upsert at once
MAX_MOVIES = 1000  # Limit dataset to first N movies (None for all ~1M movies)

# Initialize OpenAI
openai.api_key = OPENAI_API_KEY

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
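
Both API keys are read from the environment, so a missing `.env` entry would only surface later as an authentication error. A minimal sketch of a fail-fast check follows; `require_env` is a hypothetical helper written for illustration and is not part of this notebook or of `python-dotenv`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper for illustration only.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Example usage (commented out so the sketch runs without real keys):
# OPENAI_API_KEY = require_env("OPENAI_API_KEY")
# PINECONE_API_KEY = require_env("PINECONE_API_KEY")
```

Failing at load time keeps the error close to its cause instead of inside the first embedding or index call.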



### 2. Initialize Pinecone client and connect to index

In [2]:
pc = Pinecone(api_key=PINECONE_API_KEY)

# Check if index already exists
existing_indexes = [p['name'] for p in pc.list_indexes()]

logging.info("\n\t1.Setting up Pinecone index...")
if INDEX_NAME in existing_indexes:
    logging.info(f"\n\t2.Index '{INDEX_NAME}' already exists. Loading index.")
else:

    logging.info(f"\n\t2.Creating new index '{INDEX_NAME}'...")

    # Configure serverless spec
    spec = ServerlessSpec(cloud=CLOUD_PROVIDER, region=REGION)

    # Create index
    pc.create_index(
        name=INDEX_NAME,
        dimension=EMB_DIM,
        metric='cosine',
        spec=spec
    )

    logging.info(f"\n\t\tIndex '{INDEX_NAME}' created successfully!")

# Connect to index
logging.info("\n\t3.Connecting to index...")
index = pc.Index(INDEX_NAME)

# Display current index stats
stats = index.describe_index_stats()
logging.info(f"\nCurrent index stats: \n{stats}")

2025-11-14 15:58:53,524 - INFO - 
	1.Setting up Pinecone index...
2025-11-14 15:58:53,525 - INFO - 
	2.Index 'semantic-search-movie-demo' already exists. Loading index.
2025-11-14 15:58:53,526 - INFO - 
	3.Connecting to index...
2025-11-14 15:58:55,641 - INFO - 
Current index stats: 
{'dimension': 512,
 'index_fullness': 0.0,
 'metric': 'cosine',
 'namespaces': {'': {'vector_count': 530}},
 'total_vector_count': 530,
 'vector_type': 'dense'}
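
The stats above are worth a quick sanity check before evaluating anything: the dimension should match `EMB_DIM`, and the vector count (530) is below `MAX_MOVIES = 1000`, so fewer movies may be searchable than the configuration suggests. The sketch below assumes the stats are available as a plain dictionary with the keys shown in the output; `check_index_stats` is a hypothetical helper, not part of the Pinecone client.

```python
def check_index_stats(stats: dict, expected_dim: int, expected_min_vectors: int) -> list:
    """Return a list of human-readable warnings for an index stats dictionary."""
    warnings = []
    if stats.get("dimension") != expected_dim:
        warnings.append(f"dimension is {stats.get('dimension')}, expected {expected_dim}")
    count = stats.get("total_vector_count", 0)
    if count < expected_min_vectors:
        warnings.append(f"only {count} vectors indexed, expected at least {expected_min_vectors}")
    return warnings


# Values copied from the stats printed above.
example_stats = {"dimension": 512, "metric": "cosine", "total_vector_count": 530}
issues = check_index_stats(example_stats, expected_dim=512, expected_min_vectors=1000)
print(issues)
```

A small index limits every metric computed later, because a title that was never indexed cannot be retrieved.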


### 3. Test Index with a few sample queries

In [3]:
test_queries = [
    "What was the movie titled Whiplash's rating?",
    "What are the best sci-fi movies",
    "Show me movies like The Wailing",
    "What horror movies do you recommend?",
    "What year was 22 Jump Street released?"
]

test_index(index, test_queries=test_queries, top_k=5)

2025-11-14 15:58:55,651 - INFO - 
2025-11-14 15:58:55,652 - INFO - TESTING INDEX
2025-11-14 15:58:55,656 - INFO - 
Query: What was the movie titled Whiplash's rating?
2025-11-14 15:58:55,658 - INFO - --------------------------------------------------------------------------------
2025-11-14 15:58:56,400 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-11-14 15:58:56,487 - INFO - Top 5 results:
2025-11-14 15:58:56,488 - INFO -   2. Whiplash (2014) (Rating: 5.0, Similarity: 0.787)
2025-11-14 15:58:56,488 - INFO -   3. Momentum (2015) (Rating: 2.0, Similarity: 0.576)
2025-11-14 15:58:56,489 - INFO -   4. The Wailing (2016) (Rating: 3.0, Similarity: 0.576)
2025-11-14 15:58:56,490 - INFO -   5. Good Kill (2014) (Rating: 3.5, Similarity: 0.572)
2025-11-14 15:58:56,490 - INFO -   6. 23 Blast (2014) (Rating: 3.0, Similarity: 0.569)
2025-11-14 15:58:56,491 - INFO - 
Filtered results (rating >= 4.0):
2025-11-14 15:58:56,524 - INFO -   1. Whiplash (2014) (Ra




2025-11-14 15:58:56,808 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-11-14 15:58:56,875 - INFO - Top 5 results:
2025-11-14 15:58:56,876 - INFO -   2. Eurocrime! The Italian Cop and Gangster Films That Ruled the '70s (2012) (Rating: 5.0, Similarity: 0.348)
2025-11-14 15:58:56,877 - INFO -   3. Kung Fury (2015) (Rating: 5.0, Similarity: 0.304)
2025-11-14 15:58:56,877 - INFO -   4. Absolutely Fabulous: The Movie (2016) (Rating: 3.5, Similarity: 0.302)
2025-11-14 15:58:56,878 - INFO -   5. Power Rangers (2017) (Rating: 3.0, Similarity: 0.288)
2025-11-14 15:58:56,879 - INFO -   6. Pacific Rim: Uprising (2018) (Rating: 3.5, Similarity: 0.280)
2025-11-14 15:58:56,879 - INFO - 
Filtered results (rating >= 4.0):
2025-11-14 15:58:56,916 - INFO -   1. Eurocrime! The Italian Cop and Gangster Films That Ruled the '70s (2012) (Rating: 5.0, Similarity: 0.348)
2025-11-14 15:58:56,917 - INFO -   2. Kung Fury (2015) (Rating: 5.0, Similarity: 0.304)
2025-11-14 1




2025-11-14 15:58:57,721 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-11-14 15:58:57,754 - INFO - Top 5 results:
2025-11-14 15:58:57,755 - INFO -   2. Star Trek Beyond (2016) (Rating: 1.5, Similarity: 0.527)
2025-11-14 15:58:57,755 - INFO -   3. Leviathan (2014) (Rating: 4.5, Similarity: 0.519)
2025-11-14 15:58:57,756 - INFO -   4. The Cloverfield Paradox (2018) (Rating: 3.5, Similarity: 0.514)
2025-11-14 15:58:57,757 - INFO -   5. Alien: Covenant (2017) (Rating: 4.5, Similarity: 0.507)
2025-11-14 15:58:57,758 - INFO -   6. Hidden Figures (2016) (Rating: 3.5, Similarity: 0.506)
2025-11-14 15:58:57,758 - INFO - 
Filtered results (rating >= 4.0):
2025-11-14 15:58:57,791 - INFO -   1. Leviathan (2014) (Rating: 4.5, Similarity: 0.519)
2025-11-14 15:58:57,792 - INFO -   2. Alien: Covenant (2017) (Rating: 4.5, Similarity: 0.507)
2025-11-14 15:58:57,793 - INFO -   3. Guardians of the Galaxy (2014) (Rating: 5.0, Similarity: 0.500)
2025-11-14 15:58:57,7
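
The `Similarity` values in the log are presumably cosine similarities, since the index was created with `metric='cosine'`, and the second list for each query keeps only results rated 4.0 or higher. Below is a minimal local sketch of both ideas using toy 3-dimensional vectors. The vectors and helper names are invented for illustration; the real index performs these steps server-side.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vec, movies, top_k=5, min_rating=None):
    """Rank movies by cosine similarity, optionally keeping only rating >= min_rating."""
    candidates = [m for m in movies if min_rating is None or m["rating"] >= min_rating]
    scored = [(cosine_similarity(query_vec, m["vector"]), m["title"]) for m in candidates]
    scored.sort(reverse=True)
    return scored[:top_k]


# Titles and ratings come from the log above; the vectors are invented.
movies = [
    {"title": "Whiplash", "rating": 5.0, "vector": [1.0, 0.0, 0.0]},
    {"title": "Momentum", "rating": 2.0, "vector": [0.6, 0.8, 0.0]},
    {"title": "Good Kill", "rating": 3.5, "vector": [0.0, 1.0, 0.0]},
]
query = [1.0, 0.0, 0.0]

print(search(query, movies, top_k=2))
print(search(query, movies, top_k=2, min_rating=4.0))
```

With the rating filter applied, only Whiplash survives, which mirrors how the filtered lists in the log can be shorter than the unfiltered ones.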




### 4. Evaluate Performance

#### a. Manually Labeled Evaluation

In [4]:
# ============================================
# QUERY TEST SET
# ============================================

test_queries = [
    {
        "query": "Find science fiction movies with a high rating",
        "min_rating": 4.0,
        # Expected movies
        "relevant_titles": [
            "The Matrix",
            "Inception",
            "Interstellar"
        ],
        
        # How relevant is each movie?
        "relevance_scores": {
            "The Matrix": 5,
            "Inception": 5,
            "Interstellar": 5,
            "Star Wars": 4,
            "Blade Runner": 4
        }
    },
    {
        "query": "Show me classic horror movies",
        "min_rating": 0.0,
        
        "relevant_titles": [
            "The Exorcist",
            "Psycho",
            "The Shining",
            "The Conjuring"
        ],
        
        "relevance_scores": {
            "The Exorcist": 5,
            "Psycho": 5,
            "The Shining": 5,
            "Halloween": 4,
            "The Conjuring": 4
        }
    },
]
print(f"{len(test_queries)} test queries defined")

2 test queries defined


##### Metrics Explained

##### 1. Reciprocal Rank (RR)
**Question:** How quickly do users find what they need?

**Example:**
```
Query: "Best sci-fi movies"
Results:
  1. Romance Movie ❌
  2. Drama Movie ❌
  3. The Matrix ✅ ← First relevant at position 3

RR = 1/3 = 0.333
```

**Interpretation:**
- 1.0 = Perfect (found at position 1)
- 0.5 = Good (found at position 2)
- 0.1 = Poor (found at position 10)
- 0.0 = Not found
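
The definition above can be sketched in a few lines. This version assumes exact title matching against a set of relevant titles; the matching rule used by `evaluate_query` in `utils` is not shown in this notebook and may differ.

```python
def reciprocal_rank(ranked_titles, relevant_titles):
    """Return 1 / position of the first relevant result (1-based), or 0.0 if none."""
    for position, title in enumerate(ranked_titles, start=1):
        if title in relevant_titles:
            return 1.0 / position
    return 0.0


ranked = ["Romance Movie", "Drama Movie", "The Matrix"]
print(round(reciprocal_rank(ranked, {"The Matrix", "Inception"}), 3))  # 0.333
```

Averaging this value over all test queries gives the Mean Reciprocal Rank (MRR).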

###### *Note: low values might mean we need more movies, better embeddings, or more metadata*
---

##### 2. NDCG (Normalized Discounted Cumulative Gain)
**Question:** Are the BEST results at the TOP?

**Example:**
```
Your ranking:
  1. Good Movie (relevance: 3)
  2. Bad Movie (relevance: 0)
  3. GREAT Movie (relevance: 5) ← Should be #1!

NDCG ≈ 0.64 with gain 2^rel - 1 (decent but not perfect)

Perfect ranking:
  1. GREAT Movie (5)
  2. Good Movie (3)
  3. Bad Movie (0)

NDCG = 1.0 (perfect!)
```

**Interpretation:**
- 0.8-1.0 = Excellent
- 0.6-0.8 = Good
- 0.4-0.6 = Fair
- <0.4 = Needs work
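
A minimal sketch of NDCG@k, assuming the exponential gain `2**rel - 1` with a `log2` discount. This choice reproduces the NDCG@5 values printed later in this notebook, but the actual implementation in `utils` is not shown, so treat the formula as an assumption. Under this gain, the ranking `[3, 0, 5]` scores about 0.64; a linear gain would score the same ranking at about 0.80.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with exponential gain 2**rel - 1."""
    return sum((2 ** rel - 1) / math.log2(pos + 1) for pos, rel in enumerate(relevances, start=1))


def ndcg_at_k(relevances, k, ideal_pool=None):
    """NDCG@k; the ideal ranking is built from `ideal_pool` (defaults to `relevances`)."""
    pool = relevances if ideal_pool is None else ideal_pool
    ideal_dcg = dcg(sorted(pool, reverse=True)[:k])
    if ideal_dcg == 0:
        return 0.0
    return dcg(relevances[:k]) / ideal_dcg


print(round(ndcg_at_k([3, 0, 5], k=3), 3))  # the "Your ranking" example
print(round(ndcg_at_k([5, 3, 0], k=3), 3))  # the "Perfect ranking" example
```

The optional `ideal_pool` argument lets the ideal ranking come from a full set of labeled scores rather than only from the retrieved results, which matters when relevant items are missing from the top k.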

###### *Note: low values might mean we need to improve our embeddings or that the rating filter is too strict*

##### Run Evaluation

In [5]:
from utils import evaluate_query, save_results

# Run all queries
all_results = []
for query_data in test_queries:
    result = evaluate_query(index, query_data, k=10)
    all_results.append(result)

save_results(all_results, "human_judge_results.json")

2025-11-14 15:58:57,952 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"



QUERY: Find science fiction movies with a high rating

Top 10 Results:
--------------------------------------------------------------------------------
  2. ✗ Guardians of the Galaxy (2014) (5.0)
  3. ✗ Arrival (2016) (5.0)
  4. ✗ Alien: Covenant (2017) (4.5)
  5. ✗ Leviathan (2014) (4.5)
  6. ✗ Spectre (2015) (4.5)
  7. ✗ Kung Fury (2015) (5.0)
  8. ✗ Kubo and the Two Strings (2016) (5.0)
  9. ✗ Predestination (2014) (4.5)
  10. ✗ Sicario (2015) (5.0)
  11. ✗ Bird Box (2018) (4.0)

METRICS:
--------------------------------------------------------------------------------
  Reciprocal Rank: 0.000

  NDCG@10: 0.000
    Needs improvement

  Hit Rate@10: 0.0

QUERY: Show me classic horror movies


2025-11-14 15:58:58,101 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"



Top 10 Results:
--------------------------------------------------------------------------------
  2. ✗ Fright Night 2: New Blood (2013) (2.0)
  3. ✗ Scare Campaign (2016) (3.0)
  4. ✗ Fear, Inc. (2016) (1.5)
  5. ✗ The Fear of 13 (2015) (3.0)
  6. ✗ The Terror Live (2013) (3.5)
  7. ✗ V/H/S (2012) (2.0)
  8. ✗ The Witch (2015) (4.0)
  9. ✗ Paranormal Activity 4 (2012) (5.0)
  10. ✓ The Conjuring 2 (2016) (3.5, Rel:4/5)
  11. ✗ It Follows (2014) (3.0)

METRICS:
--------------------------------------------------------------------------------
  Reciprocal Rank: 0.111
    → First relevant at position 9

  NDCG@10: 0.058
    Needs improvement

  Hit Rate@10: 1.0

Results saved to human_judge_results.json
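
Hit Rate@10 is reported above without a definition; it is commonly 1.0 when at least one relevant result appears in the top k and 0.0 otherwise. The sketch below recomputes the three metrics for the horror query from the printed results. It rests on assumptions that are not confirmed by the notebook: an exponential gain of `2**rel - 1`, an ideal ranking built from all five labeled `relevance_scores`, and The Conjuring 2 counted as a match with relevance 4.

```python
import math

# Relevance of the 10 results for "Show me classic horror movies", in ranked
# order: only the 9th result (The Conjuring 2) was marked relevant (4/5).
result_relevances = [0, 0, 0, 0, 0, 0, 0, 0, 4, 0]
# All labeled relevance scores for this query, used to build the ideal ranking.
labeled_scores = [5, 5, 5, 4, 4]


def dcg(relevances):
    return sum((2 ** rel - 1) / math.log2(pos + 1) for pos, rel in enumerate(relevances, start=1))


def hit_rate(relevances):
    """1.0 if any result is relevant, else 0.0."""
    return 1.0 if any(rel > 0 for rel in relevances) else 0.0


def reciprocal_rank(relevances):
    for pos, rel in enumerate(relevances, start=1):
        if rel > 0:
            return 1.0 / pos
    return 0.0


k = 10
ideal = sorted(labeled_scores, reverse=True)[:k]
ndcg = dcg(result_relevances[:k]) / dcg(ideal)

print(f"Reciprocal Rank: {reciprocal_rank(result_relevances):.3f}")
print(f"NDCG@{k}: {ndcg:.3f}")
print(f"Hit Rate@{k}: {hit_rate(result_relevances)}")
```

Note that the printed result lists are numbered 2 through 11, so the hit labeled `10.` is the 9th result, which agrees with "First relevant at position 9". The single hit is also a sequel rather than one of the labeled titles, so the hit rate of 1.0 for this query should be read with caution.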


#### b. LLM Labeled Evaluation

Q: What if we don't have ground-truth relevance labels?
- ANS: Use an LLM as the evaluator (LLM-as-judge)
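
The idea can be sketched as two pure functions: one builds a grading prompt for a single result and one parses the model's reply into a score. Everything here is hypothetical: the prompt wording, the JSON reply format, and the function names are illustrative and are not taken from `evaluate_with_llm_judge`. The threshold of 4.0 is also an assumption; it is consistent with the output in this notebook (scores of 4 and 5 are marked ✓, scores of 1 and 2 are marked ✗), but no score of 3 appears, so the real cutoff cannot be determined.

```python
import json


def build_judge_prompt(query, title, user_rating):
    """Build a grading prompt; the wording is illustrative only."""
    return (
        "You are grading search results for a movie search engine.\n"
        f"Query: {query}\n"
        f"Result: {title} (user rating: {user_rating}/5.0)\n"
        'Reply with JSON only: {"score": <1-5>, "reasoning": "<one sentence>"}'
    )


def parse_judgment(raw_reply, relevant_threshold=4.0):
    """Parse the judge's JSON reply and derive a binary relevance flag."""
    data = json.loads(raw_reply)
    score = float(data["score"])
    if not 1.0 <= score <= 5.0:
        raise ValueError(f"score out of range: {score}")
    return {
        "score": score,
        "is_relevant": score >= relevant_threshold,
        "reasoning": data.get("reasoning", ""),
    }


# A hand-written reply standing in for a real model response.
fake_reply = '{"score": 2, "reasoning": "Science fiction, but the rating is not high."}'
print(parse_judgment(fake_reply))
```

Keeping prompt construction and parsing separate from the API call makes both easy to test without spending tokens.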

In [7]:
from utils import evaluate_with_llm_judge

# Define your test queries
test_queries = [
    "Find science fiction movies with a high rating",
    "Best comedies released after 2010",
    "Classic horror movies",
    "Action movies with car chases",
    "Romantic movies set in Paris"
]
# Cheaper model, still good quality
model = "gpt-4o-mini"  

# Run the evaluation on the top 5 results per query
results = evaluate_with_llm_judge(
    index=index,
    queries=test_queries,
    get_embedding_func=get_embedding,
    k=5,  # Evaluate the top 5 results
    model=model,
    add_delay=True
)

# Optionally save results
save_results(results, "llm_judge_results.json")


LLM-as-Judge Evaluation (Model: gpt-4o-mini)



Evaluating Queries:   0%|          | 0/5 [00:00<?, ?it/s]2025-11-14 15:59:38,578 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-11-14 15:59:40,416 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


  1. ✗ [1.0/5] Flatliners (2017)
      Rating: 2.0 | Similarity: 0.514
      The movie "Flatliners" (2017) is a science fiction film, but it has a low user rating of 2.0/5.0, making it barely relevant to the query for high-rated science fiction movies.


2025-11-14 15:59:42,156 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


  2. ✗ [2.0/5] The Cloverfield Paradox (2018)
      Rating: 3.5 | Similarity: 0.507
      While "The Cloverfield Paradox" is a science fiction movie, its user rating of 3.5/5.0 does not qualify as a "high rating," making it only slightly relevant to the query.


2025-11-14 15:59:43,894 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


  3. ✗ [1.0/5] Star Trek Beyond (2016)
      Rating: 1.5 | Similarity: 0.480
      The movie "Star Trek Beyond" is a science fiction film, but its low user rating of 1.5/5.0 does not meet the query's requirement for a high rating.


2025-11-14 15:59:45,618 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


  4. ✓ [5.0/5] Guardians of the Galaxy (2014)
      Rating: 5.0 | Similarity: 0.479
      "Guardians of the Galaxy" is a science fiction movie with a perfect user rating of 5.0/5.0, making it highly relevant to the query for high-rated science fiction films.


2025-11-14 15:59:48,200 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Evaluating Queries:  20%|██        | 1/5 [00:11<00:44, 11.03s/it]

  5. ✗ [1.0/5] Annihilation (2018)
      Rating: 2.0 | Similarity: 0.479
      The movie "Annihilation" is a science fiction film, but its low user rating of 2.0/5.0 does not meet the query's requirement for a high rating.

  Query Metrics:
    Relevant Results: 1/5
    NDCG@5: 0.501
    Reciprocal Rank: 0.250


2025-11-14 15:59:48,920 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-11-14 15:59:50,969 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


  1. ✓ [4.0/5] 22 Jump Street (2014)
      Rating: 4.5 | Similarity: 0.370
      "22 Jump Street" is a comedy released after 2010 and has a high user rating, making it a very relevant match for the query.


2025-11-14 15:59:52,360 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


  2. ✓ [4.0/5] Ted 2 (2015)
      Rating: 4.0 | Similarity: 0.363
      Ted 2 is a comedy released after 2010 and has a good user rating, making it a very relevant match for the query.


2025-11-14 15:59:54,345 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


  3. ✗ [2.0/5] Dumb and Dumber To (2014)
      Rating: 1.5 | Similarity: 0.363
      While "Dumb and Dumber To" is a comedy released after 2010, its low user rating suggests it may not be considered one of the best comedies, making it only slightly relevant to the query.


2025-11-14 15:59:56,290 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


  4. ✓ [4.0/5] Barbershop: The Next Cut (2016)
      Rating: 4.0 | Similarity: 0.358
      "Barbershop: The Next Cut" is a comedy released after 2010 and has a good user rating, making it a very relevant match for the query about the best comedies from that period.


2025-11-14 15:59:58,020 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Evaluating Queries:  40%|████      | 2/5 [00:20<00:30, 10.32s/it]

  5. ✓ [4.0/5] Mike & Dave Need Wedding Dates (2016)
      Rating: 3.0 | Similarity: 0.352
      The movie "Mike & Dave Need Wedding Dates" is a comedy released after 2010, making it a good match for the query about the best comedies from that time period, despite its average user rating.

  Query Metrics:
    Relevant Results: 4/5
    NDCG@5: 0.966
    Reciprocal Rank: 1.000


2025-11-14 15:59:58,612 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-11-14 15:59:59,878 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


  1. ✗ [2.0/5] Scare Campaign (2016)
      Rating: 3.0 | Similarity: 0.440
      "Scare Campaign" is a horror movie, but it is not considered a classic, making its relevance to the query weak.


2025-11-14 16:00:02,030 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


  2. ✗ [1.0/5] Fright Night 2: New Blood (2013)
      Rating: 2.0 | Similarity: 0.435
      "Fright Night 2: New Blood" is a modern film and does not fit the classic horror movie category, making it barely relevant to the search query.


2025-11-14 16:00:03,772 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


  3. ✗ [1.0/5] Fear, Inc. (2016)
      Rating: 1.5 | Similarity: 0.416
      "Fear, Inc." is a modern horror film that does not fit the classic horror genre, making it only minimally relevant to the search query.


2025-11-14 16:00:05,816 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


  4. ✗ [1.0/5] The Terror Live (2013)
      Rating: 3.5 | Similarity: 0.409
      "The Terror Live" is not a classic horror movie; it is a contemporary thriller, making it only minimally relevant to the search query.


2025-11-14 16:00:07,456 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Evaluating Queries:  60%|██████    | 3/5 [00:30<00:19,  9.92s/it]

  5. ✗ [1.0/5] Death House (2018)
      Rating: 0.5 | Similarity: 0.403
      "Death House" is a modern film with a low user rating and does not fit the classic horror genre typically associated with the search query.

  Query Metrics:
    Relevant Results: 0/5
    NDCG@5: 1.000
    Reciprocal Rank: 0.000


2025-11-14 16:00:10,476 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-11-14 16:00:12,371 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


  1. ✓ [5.0/5] Furious 7 (2015)
      Rating: 5.0 | Similarity: 0.430
      Furious 7 is an action movie that prominently features car chases, making it a perfect match for the search query.


2025-11-14 16:00:14,110 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


  2. ✗ [2.0/5] Sicario (2015)
      Rating: 5.0 | Similarity: 0.380
      While "Sicario" contains action elements and some intense sequences, it is not primarily focused on car chases, making it only slightly relevant to the query.


2025-11-14 16:00:16,158 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


  3. ✗ [2.0/5] Sicario: Day of the Soldado (2018)
      Rating: 3.0 | Similarity: 0.378
      While "Sicario: Day of the Soldado" is an action movie, it is not primarily focused on car chases, making its connection to the query weak.


2025-11-14 16:00:17,726 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


  4. ✓ [4.0/5] Taken 3 (2015)
      Rating: 5.0 | Similarity: 0.376
      "Taken 3" features action sequences and car chases, making it a good match for the query about action movies with car chases.


2025-11-14 16:00:19,742 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Evaluating Queries:  80%|████████  | 4/5 [00:42<00:10, 10.85s/it]

  5. ✓ [4.0/5] The Transporter Refuelled (2015)
      Rating: 3.5 | Similarity: 0.369
      The Transporter Refuelled features significant car chases and action sequences, making it a very relevant match for the query about action movies with car chases.

  Query Metrics:
    Relevant Results: 3/5
    NDCG@5: 0.925
    Reciprocal Rank: 1.000


2025-11-14 16:00:20,051 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-11-14 16:00:21,656 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


  1. ✗ [2.0/5] Montparnasse Bienvenüe (2017)
      Rating: 3.5 | Similarity: 0.394
      While "Montparnasse Bienvenüe" is set in Paris, it is not primarily a romantic movie, which weakens its relevance to the search query.


2025-11-14 16:00:23,204 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


  2. ✗ [2.0/5] Past, The (Le passé) (2013)
      Rating: 4.0 | Similarity: 0.376
      While "Past, The (Le passé)" is set in Paris, it is not primarily a romantic movie, which weakens its connection to the search query.


2025-11-14 16:00:25,186 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


  3. ✗ [1.0/5] Two Days, One Night (Deux jours, une nuit) (2014)
      Rating: 3.5 | Similarity: 0.371
      "Two Days, One Night" is not a romantic movie and does not focus on a romantic storyline set in Paris, making it barely relevant to the query.


2025-11-14 16:00:25,975 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 502 Bad Gateway"
2025-11-14 16:00:25,977 - INFO - Retrying request to /chat/completions in 0.392828 seconds
2025-11-14 16:00:31,157 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


  4. ✗ [2.0/5] Frances Ha (2012)
      Rating: 3.5 | Similarity: 0.343
      While "Frances Ha" features a character living in Paris, it is not primarily a romantic movie, making its connection to the query weak.


2025-11-14 16:00:33,444 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Evaluating Queries: 100%|██████████| 5/5 [00:56<00:00, 11.26s/it]

  5. ✗ [2.0/5] Before Midnight (2013)
      Rating: 1.0 | Similarity: 0.340
      While "Before Midnight" is part of a romantic trilogy and has elements of romance, it is not set in Paris, which significantly weakens its relevance to the search query.

  Query Metrics:
    Relevant Results: 0/5
    NDCG@5: 0.972
    Reciprocal Rank: 0.000

AGGREGATE METRICS (Across All Queries)
Mean Reciprocal Rank (MRR): 0.450
Average NDCG@5: 0.873

Results saved to llm_judge_results.json
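
The aggregate figures are plain means of the per-query values. The sketch below recomputes them from the numbers printed above and also illustrates why the "Classic horror movies" query reports NDCG@5 = 1.000 with 0/5 relevant results. It assumes a gain of `2**rel - 1` and an ideal ranking built only from the judged results; under those assumptions, a list of uniformly weak scores that happens to be sorted counts as perfectly ordered.

```python
import math
from statistics import mean

# Per-query metrics copied from the output above.
reciprocal_ranks = [0.250, 1.000, 0.000, 1.000, 0.000]
ndcg_at_5 = [0.501, 0.966, 1.000, 0.925, 0.972]

print(f"MRR: {mean(reciprocal_ranks):.3f}")
print(f"Average NDCG@5: {mean(ndcg_at_5):.3f}")


def ndcg(relevances):
    """NDCG with gain 2**rel - 1, normalized by the best ordering of the same scores."""
    def dcg(rels):
        return sum((2 ** r - 1) / math.log2(i + 1) for i, r in enumerate(rels, start=1))
    return dcg(relevances) / dcg(sorted(relevances, reverse=True))


# Judge scores for "Classic horror movies": none reach the relevant range,
# yet they are already in descending order, so NDCG is 1.0.
print(f"NDCG@5 for [2, 1, 1, 1, 1]: {ndcg([2, 1, 1, 1, 1]):.3f}")
```

Because NDCG normalized this way measures ordering rather than absolute quality, the average of 0.873 is best read alongside the MRR of 0.450 and the per-query counts of relevant results.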



