# Movie Semantic Search — Solution Notebook

Follow the numbered sections as per the assignment.

## 1) Install & Import Libraries

In [1]:
# If running locally and packages are missing, uncomment the following lines:
# !pip install -q -U sentence-transformers pandas scikit-learn numpy

import pandas as pd
import numpy as np
from sentence_transformers import SentenceTransformer

from movie_search import load_movies, build_encoder, compute_embeddings, search_movies



## 2) Load `movies.csv` into a DataFrame

In [2]:
# Replace with your actual path if needed
CSV_PATH = "movies.csv"

df = load_movies(CSV_PATH)
df.head()


|   | title | plot |
|---|---|---|
| 0 | Spy Movie | A spy navigates intrigue in Paris to stop a te... |
| 1 | Romance in Paris | A couple falls in love in Paris under romantic... |
| 2 | Action Flick | A high-octane chase through New York with expl... |

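A minimal sketch of what `load_movies` might do, assuming it reads the CSV with pandas and renames an `overview` column to `plot` (the Notes section states it does this). Dropping rows with an empty plot is an extra assumption, and `load_movies_sketch` is a hypothetical name; the real function in `movie_search.py` may differ.

```python
import io

import pandas as pd


def load_movies_sketch(csv_path_or_buffer) -> pd.DataFrame:
    """Hypothetical stand-in for movie_search.load_movies."""
    df = pd.read_csv(csv_path_or_buffer)
    # Accept 'overview' as a synonym for 'plot'.
    if "plot" not in df.columns and "overview" in df.columns:
        df = df.rename(columns={"overview": "plot"})
    missing = {"title", "plot"} - set(df.columns)
    if missing:
        raise ValueError(f"CSV is missing required columns: {sorted(missing)}")
    # Assumption: rows without a plot are dropped, since there is nothing to embed.
    df = df.dropna(subset=["plot"]).reset_index(drop=True)
    df["plot"] = df["plot"].astype(str)
    return df


# Toy CSV with an 'overview' column and one empty plot (read by pandas as NaN).
csv_text = "title,overview\nSpy Movie,A spy navigates intrigue in Paris.\nNo Plot,\n"
demo = load_movies_sketch(io.StringIO(csv_text))
print(demo)
```

Accepting either a path or a file-like buffer lets the same function be tested without touching the disk.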

## 3) Create embeddings with `all-MiniLM-L6-v2`

In [3]:
model = build_encoder("sentence-transformers/all-MiniLM-L6-v2")
embeddings = compute_embeddings(model, df["plot"], batch_size=64, normalize=True)
embeddings.shape


(3, 384)
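The call passes `normalize=True`. Assuming `compute_embeddings` forwards this to `SentenceTransformer.encode(..., normalize_embeddings=True)`, every row of `embeddings` has unit length, so cosine similarity reduces to a plain dot product. The sketch below checks that identity with random 384-dimensional vectors standing in for the real embeddings (matching the `(3, 384)` shape above); `l2_normalize` is a hypothetical helper, not part of `movie_search.py`.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each vector (last axis) to unit length; eps guards against division by zero."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


rng = np.random.default_rng(0)
plots = rng.normal(size=(3, 384)).astype(np.float32)  # stand-ins for 3 plot embeddings
query = rng.normal(size=384).astype(np.float32)       # stand-in for one query embedding

# Cosine similarity computed the long way: dot product divided by both norms.
cosine = (plots @ query) / (np.linalg.norm(plots, axis=1) * np.linalg.norm(query))
# After normalizing both sides, a plain dot product gives the same scores.
dot_after_norm = l2_normalize(plots) @ l2_normalize(query)

print(np.allclose(cosine, dot_after_norm, atol=1e-5))
```

With unit-length vectors, scoring every plot against a query is a single matrix-vector product, `embeddings @ query_vec`.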

## 4) Implement `search_movies(query, top_n)`

In [4]:
# The function is implemented in movie_search.py and imported above.
# You can open that file to review the code and its docstrings.
help(search_movies)


Help on function search_movies in module movie_search:

search_movies(query: 'str', top_n: 'int', model: 'SentenceTransformer', df: 'pd.DataFrame', embeddings: 'np.ndarray', return_cols: 'Sequence[str]' = ('title', 'plot')) -> 'pd.DataFrame'
    Return a DataFrame of the top_n movies most similar to the query.
    The output includes the requested columns plus a 'similarity' column (float, descending).
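One plausible implementation matching the signature and docstring above, assuming the stored embeddings are L2-normalized. It encodes the query the same way, scores every plot by dot product, and keeps the `top_n` best rows, capping `top_n` at the number of movies. `search_movies_sketch` and the toy `KeywordEncoder` (a bag-of-words stand-in so the example runs offline; it has no semantic understanding) are hypothetical, and the assignment's version in `movie_search.py` may differ.

```python
from typing import Sequence

import numpy as np
import pandas as pd


def search_movies_sketch(query: str, top_n: int, model, df: pd.DataFrame,
                         embeddings: np.ndarray,
                         return_cols: Sequence[str] = ("title", "plot")) -> pd.DataFrame:
    """Return the top_n rows of df most similar to query, plus a 'similarity' column."""
    # Encode the query exactly like the plots: a single unit-length vector.
    q = np.asarray(model.encode([query], normalize_embeddings=True))[0]
    # Rows of `embeddings` are assumed unit-length, so dot product == cosine similarity.
    sims = embeddings @ q
    # Never ask for more rows than exist (top_n=5 on a 3-movie dataset returns 3 rows).
    k = max(0, min(int(top_n), len(df)))
    # Negate for descending order; a stable sort keeps ties in their original row order.
    order = np.argsort(-sims, kind="stable")[:k]
    out = df.iloc[order][list(return_cols)].reset_index(drop=True).copy()
    out["similarity"] = sims[order].astype(float)
    return out


class KeywordEncoder:
    """Toy offline encoder: bag-of-words over a tiny fixed vocabulary (NOT semantic)."""

    vocab = ["spy", "paris", "love", "chase", "york"]

    def encode(self, sentences, normalize_embeddings=False, **kwargs):
        token_sets = [set(s.lower().replace(".", " ").split()) for s in sentences]
        vecs = np.array([[float(w in toks) for w in self.vocab] for toks in token_sets])
        if normalize_embeddings:
            norms = np.linalg.norm(vecs, axis=1, keepdims=True)
            vecs = vecs / np.maximum(norms, 1e-12)
        return vecs


# Toy data loosely based on the three movies shown earlier.
movies = pd.DataFrame({
    "title": ["Spy Movie", "Romance in Paris", "Action Flick"],
    "plot": ["A spy navigates intrigue in Paris",
             "A couple falls in love in Paris",
             "A high-octane chase through New York"],
})
enc = KeywordEncoder()
emb = enc.encode(list(movies["plot"]), normalize_embeddings=True)
res = search_movies_sketch("spy thriller in Paris", 5, enc, movies, emb)
print(res)
```

In the notebook you would pass the real `model`, `df`, and `embeddings` instead of the toy objects. The stable sort makes the order of equal scores reproducible.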



## 5) Test with query `'spy thriller in Paris'`

In [5]:
query = "spy thriller in Paris"
top_n = 5
results = search_movies(query, top_n, model, df, embeddings, return_cols=("title", "plot"))
results


|   | title | plot | similarity |
|---|---|---|---|
| 0 | Spy Movie | A spy navigates intrigue in Paris to stop a te... | 0.769684 |
| 1 | Romance in Paris | A couple falls in love in Paris under romantic... | 0.38803 |
| 2 | Action Flick | A high-octane chase through New York with expl... | 0.256777 |

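Although `top_n = 5`, only three rows come back: the dataset has three movies (see the `(3, 384)` embedding shape), so the results appear to be capped at the number of rows available. A small, hypothetical checker, `check_search_results` (not the assignment's unit tests), can assert the properties visible here: the expected columns, `min(top_n, n_movies)` rows, cosine scores within [-1, 1], and descending order.

```python
import numpy as np
import pandas as pd


def check_search_results(results: pd.DataFrame, top_n: int, n_movies: int,
                         required_cols=("title", "plot", "similarity")) -> None:
    """Hypothetical sanity checks on a search result; raises AssertionError on failure."""
    missing = [c for c in required_cols if c not in results.columns]
    assert not missing, f"missing columns: {missing}"
    assert len(results) == min(top_n, n_movies), "unexpected number of rows"
    sims = results["similarity"].to_numpy(dtype=float)
    # Cosine similarity of real-valued vectors lies in [-1, 1] (small float tolerance).
    assert np.all((sims >= -1 - 1e-6) & (sims <= 1 + 1e-6)), "similarity out of range"
    # Scores must run from most to least similar.
    assert np.all(np.diff(sims) <= 1e-12), "similarity not in descending order"


# The result table shown above, re-entered by hand.
shown = pd.DataFrame({
    "title": ["Spy Movie", "Romance in Paris", "Action Flick"],
    "plot": ["A spy navigates intrigue in Paris to stop a te...",
             "A couple falls in love in Paris under romantic...",
             "A high-octane chase through New York with expl..."],
    "similarity": [0.769684, 0.38803, 0.256777],
})
check_search_results(shown, top_n=5, n_movies=3)
```

In the notebook itself: `check_search_results(results, top_n, len(df))`.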

## Notes & Troubleshooting
- If the unit tests expect specific column names, open `tests/test_movie_search.py` and confirm the expected output schema.
- If your dataset uses `overview` instead of `plot`, `load_movies` auto-detects it and renames it to `plot`.
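- After editing `movie_search.py`, restart the kernel and re-run all cells (Kernel → Restart & Run All). Python caches imported modules, so re-running only the import cell does not pick up your changes.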


In [8]:
!C:\ai\1\movie-search-assignment\venv\Scripts\python.exe -m pytest -v tests/test_movie_search.py


C:\ai\1\movie-search-assignment\venv\Scripts\python.exe: No module named pytest


In [9]:
!C:\ai\1\movie-search-assignment\venv\Scripts\python.exe -m pip install pytest


Collecting pytest
  Downloading pytest-8.4.1-py3-none-any.whl (365 kB)
     -------------------------------------- 365.5/365.5 kB 2.5 MB/s eta 0:00:00
Collecting iniconfig>=1
  Downloading iniconfig-2.1.0-py3-none-any.whl (6.0 kB)
Collecting pluggy<2,>=1.5
  Downloading pluggy-1.6.0-py3-none-any.whl (20 kB)
Installing collected packages: pluggy, iniconfig, pytest
Successfully installed iniconfig-2.1.0 pluggy-1.6.0 pytest-8.4.1



[notice] A new release of pip available: 22.3 -> 25.2
[notice] To update, run: python.exe -m pip install --upgrade pip
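The two cells above call an absolute venv path, which only works on this machine. Inside Jupyter, the `%pip install pytest` magic installs into the kernel's own environment; from plain Python, the same effect comes from running pip through `sys.executable`. The helpers below (`pip_install_command` and `ensure_installed`, hypothetical names) sketch that approach and default to a dry run that only returns the command.

```python
from __future__ import annotations

import importlib.util
import subprocess
import sys


def pip_install_command(package: str) -> list[str]:
    """Build a pip command that targets the interpreter running this kernel."""
    return [sys.executable, "-m", "pip", "install", package]


def ensure_installed(package: str, module: str | None = None,
                     dry_run: bool = True) -> list[str] | None:
    """Install `package` if `module` (default: `package`) is not importable.

    Returns None when the module is already importable; otherwise returns the
    pip command, running it only when dry_run is False.
    """
    if importlib.util.find_spec(module or package) is not None:
        return None
    cmd = pip_install_command(package)
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError if pip fails
    return cmd


# Dry run: prints the install command if pytest is missing, otherwise None.
print(ensure_installed("pytest"))
```

Pass `dry_run=False` to actually install; the package lands in whichever environment runs the kernel, so a later `-m pytest` through the same interpreter can find it.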


In [11]:

!pytest -v tests/test_movie_search.py


platform win32 -- Python 3.11.0, pytest-8.4.1, pluggy-1.6.0 -- C:\ai\1\movie-search-assignment\venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: C:\ai\1\movie-search-assignment
[1mcollecting ... [0mcollected 4 items

tests/test_movie_search.py::TestMovieSearch::test_search_movies_output_format [32mPASSED[0m[32m [ 25%][0m
tests/test_movie_search.py::TestMovieSearch::test_search_movies_relevance [32mPASSED[0m[32m [ 50%][0m
tests/test_movie_search.py::TestMovieSearch::test_search_movies_similarity_range [32mPASSED[0m[32m [ 75%][0m
tests/test_movie_search.py::TestMovieSearch::test_search_movies_top_n [32mPASSED[0m[32m [100%][0m

