# Movie Semantic Search Assignment

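This notebook implements a semantic search engine for movie plots. Each plot is embedded with the SentenceTransformers model `all-MiniLM-L6-v2`, and a natural-language query is answered by ranking the plots by the cosine similarity between their embeddings and the query's embedding.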

In [20]:
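# Install dependencies into the running kernel's environment
# (%pip targets the kernel's Python, unlike !pip, which uses whatever pip is on PATH)
%pip install -r requirements.txt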


In [13]:
import pandas as pd
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

In [14]:
# Loading the Dataset
df = pd.read_csv("movies.csv")

In [15]:
df

| Unnamed: 0 | title | plot |
|---|---|---|
| 0 | Spy Movie | A spy navigates intrigue in Paris to stop a te... |
| 1 | Romance in Paris | A couple falls in love in Paris under romantic... |
| 2 | Action Flick | A high-octane chase through New York with expl... |
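The `Unnamed: 0` column is what pandas produces when the first header cell of a CSV is empty, which usually means the file was written by `DataFrame.to_csv` with its default `index=True`. The minimal sketch below uses a small in-memory CSV that only imitates that layout (the plot text is placeholder data, not the real `movies.csv`) to show that `index_col=0` reads the column back as the row index instead.

```python
import io

import pandas as pd

# Stand-in for movies.csv: an empty first header cell, as DataFrame.to_csv()
# writes by default (assumed layout; plot text is placeholder data).
csv_text = """,title,plot
0,Spy Movie,placeholder plot A
1,Romance in Paris,placeholder plot B
"""

# Default read: the unnamed first column becomes a regular column "Unnamed: 0"
raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))  # ['Unnamed: 0', 'title', 'plot']

# index_col=0: the first column is used as the row index instead
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(clean.columns))  # ['title', 'plot']
```

Adopting `index_col=0` in the loading cell would remove the `Unnamed: 0` column from `df` and, consequently, from the search results.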


In [16]:
# Creating Embeddings

# Loading SentenceTransformer model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Encoding all movie plots
embeddings = model.encode(df['plot'].tolist(), convert_to_tensor=False)

print("Embeddings created for", len(embeddings), "movies.")

Embeddings created for 3 movies.
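With `convert_to_tensor=False`, `model.encode` returns one NumPy vector per plot. Cosine similarity depends only on vector direction, so once every vector is scaled to unit length a plain dot product gives the same scores; `SentenceTransformer.encode` accepts `normalize_embeddings=True` for that purpose. The sketch below checks the equivalence on random NumPy vectors, which are an assumption standing in for real embeddings (384 dimensions, the output size of `all-MiniLM-L6-v2`).

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins for embeddings: 3 "plots" and 1 "query", 384 dimensions each
doc_vecs = rng.normal(size=(3, 384))
query_vec = rng.normal(size=(1, 384))


def l2_normalize(x):
    # Divide each row by its Euclidean norm so every row has length 1
    return x / np.linalg.norm(x, axis=1, keepdims=True)


# Cosine similarity from its definition: dot product over the product of norms
cos_direct = (query_vec @ doc_vecs.T) / (
    np.linalg.norm(query_vec, axis=1, keepdims=True)
    * np.linalg.norm(doc_vecs, axis=1)
)

# After normalization, a plain dot product yields the same (1, 3) score matrix
cos_dot = l2_normalize(query_vec) @ l2_normalize(doc_vecs).T

print(np.allclose(cos_direct, cos_dot))  # True
```

Pre-normalizing the stored plot embeddings once means each query costs a single matrix-vector product.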


In [17]:
# Defining the Search Function

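def search_movies(query, top_n=5):
    """
    Search for the most relevant movies given a natural language query.

    Returns a DataFrame of the top_n movies ranked by the cosine similarity
    between the query embedding and each plot embedding.
    Uses the notebook globals `model`, `embeddings` and `df`.
    """
    # Encode the query as a batch of one so it has shape (1, dim)
    query_embedding = model.encode([query], convert_to_tensor=False)
    # Row 0 holds the similarity of the query to every movie plot
    similarities = cosine_similarity(query_embedding, embeddings)[0]

    # Work on a copy so the global DataFrame is not modified
    results = df.copy()
    results['similarity'] = similarities
    # Highest similarity first, keep only the top_n rows
    results = results.sort_values(by='similarity', ascending=False).head(top_n)
    return results.reset_index(drop=True)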
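`search_movies` copies and sorts the whole DataFrame on every call. One possible alternative, sketched here rather than taken from the notebook, is to rank the similarity array with NumPy first and only then fetch the matching rows. `top_n_indices` is a hypothetical helper, and the scores in the example are toy values, not model output.

```python
import numpy as np


def top_n_indices(similarities, top_n=5):
    """Return (indices, scores) of the top_n highest similarities, best first.

    Hypothetical helper for illustration: it ranks a 1-D array of cosine
    scores without copying or sorting a DataFrame.
    """
    if top_n < 0:
        raise ValueError("top_n must be non-negative")
    scores = np.asarray(similarities, dtype=float)
    # Stable sort on negated scores: highest first, ties keep input order.
    # Slicing past the end is safe, so top_n > len(scores) returns every row.
    order = np.argsort(-scores, kind="stable")[:top_n]
    return order, scores[order]


# Toy scores with a tie between positions 1 and 3
idx, top = top_n_indices([0.2, 0.9, 0.5, 0.9], top_n=3)
print(idx.tolist())  # [1, 3, 2]
print(top.tolist())  # [0.9, 0.9, 0.5]
```

In the notebook, the rows could then be fetched with something like `df.iloc[idx].assign(similarity=top).reset_index(drop=True)`, which touches only `top_n` rows.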

In [18]:
# Example query
search_movies("spy thriller in Paris", top_n=3)

| Unnamed: 0 | title | plot | similarity |
|---|---|---|---|
| 0 | Spy Movie | A spy navigates intrigue in Paris to stop a te... | 0.769684 |
| 1 | Romance in Paris | A couple falls in love in Paris under romantic... | 0.38803 |
| 2 | Action Flick | A high-octane chase through New York with expl... | 0.256777 |


In [19]:
# Running unit tests
!python -m unittest tests.test_movie_search -v

test_search_movies_output_format (tests.test_movie_search.TestMovieSearch.test_search_movies_output_format)
Test if search_movies returns a DataFrame with correct columns. ... ok
test_search_movies_relevance (tests.test_movie_search.TestMovieSearch.test_search_movies_relevance)
Test if returned movies are relevant to the query. ... ok
test_search_movies_similarity_range (tests.test_movie_search.TestMovieSearch.test_search_movies_similarity_range)
Test if similarity scores are between 0 and 1. ... ok
test_search_movies_top_n (tests.test_movie_search.TestMovieSearch.test_search_movies_top_n)
Test if search_movies returns the correct number of results. ... ok

----------------------------------------------------------------------
Ran 4 tests in 4.300s

OK
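The similarity-range test passes on this dataset, but cosine similarity is defined on [-1, 1], so the metric itself does not guarantee non-negative scores: vectors pointing in opposite directions score -1. The small NumPy check below uses hand-made vectors rather than model embeddings to illustrate the bounds. Because the contents of `tests.test_movie_search` are not shown here, the suggestion that its range check could be loosened to `-1 <= score <= 1` is an assumption.

```python
import numpy as np


def cosine(a, b):
    # Cosine similarity of two 1-D vectors
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


same = cosine([1, 2, 3], [2, 4, 6])          # same direction
orthogonal = cosine([1, 0], [0, 1])          # perpendicular vectors
opposite = cosine([1, 2, 3], [-1, -2, -3])   # opposite directions

print(round(same, 6), round(orthogonal, 6), round(opposite, 6))  # 1.0 0.0 -1.0
```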
