# Movie Semantic Search Assignment

This notebook is my solution to the movie semantic search assignment. I work through it step by step: load the data, embed the movie plots, and rank the movies by their similarity to a query.

## (a) Install and Import Libraries

In [None]:
# Install once if needed: pip install pandas scikit-learn sentence-transformers
# I need pandas for handling data, scikit-learn for cosine similarity, and sentence-transformers for embeddings
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer

## (b) Load movies.csv into a pandas DataFrame

In [None]:
# Load the movie dataset (the 'plot' column is what gets embedded later)
df = pd.read_csv("movies.csv")
df.head()

## (c) Create embeddings using all-MiniLM-L6-v2

In [None]:
# I load the model, which converts text into fixed-length vectors (embeddings)
model = SentenceTransformer("all-MiniLM-L6-v2")

# Now I encode all the movie plots into embeddings, one vector per row of df
movie_embeddings = model.encode(df['plot'].tolist(), convert_to_tensor=True)
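
To make the idea of comparing embeddings more concrete, here is a minimal sketch of cosine similarity on made-up 3-dimensional vectors. The real vectors from all-MiniLM-L6-v2 have 384 dimensions, and the notebook uses scikit-learn's `cosine_similarity` rather than this hand-written helper; the `cosine` function and the toy vectors are only illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (illustrative helper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy "embeddings", made up for illustration only
query_vec = [1.0, 0.0, 1.0]
same_direction = [2.0, 0.0, 2.0]   # points the same way as query_vec, just longer
orthogonal = [0.0, 1.0, 0.0]       # shares no direction with query_vec

print(cosine(query_vec, same_direction))  # close to 1: length does not matter
print(cosine(query_vec, orthogonal))      # 0: unrelated directions
```

Because cosine similarity ignores vector length and only looks at direction, a long plot and a short query can still score as close matches.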

## (d) Implement search_movies function

In [None]:
# Function that returns the top_n movies whose plots are most similar to a query
def search_movies(query, top_n=5):
    # encode the query into embedding
    query_embedding = model.encode([query], convert_to_tensor=True)
    
    # calculate cosine similarity between query and all movies
    similarity_scores = cosine_similarity(query_embedding.cpu(), movie_embeddings.cpu())[0]
    
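    # get the indices of the top_n highest scores, best match first
    # (sorting descending and slicing from the front also handles top_n=0 correctly)
    top_indices = similarity_scores.argsort()[::-1][:top_n]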
    
    # return DataFrame with results
    results = df.iloc[top_indices].copy()
    results["similarity"] = similarity_scores[top_indices]
    return results
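
The ranking step in `search_movies` can be easier to follow on a small example. The sketch below uses plain Python lists instead of NumPy arrays, and the helper name `top_n_indices` and the scores are hypothetical; it only illustrates the idea of sorting positions by score and keeping the best few.

```python
def top_n_indices(scores, top_n=5):
    """Return positions of the top_n highest scores, best first (illustrative helper)."""
    # Sort the positions 0..len-1 by their score, highest score first
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Keep only the first top_n positions
    return order[:top_n]

# Made-up similarity scores for five movies
toy_scores = [0.12, 0.87, 0.45, 0.91, 0.30]

best = top_n_indices(toy_scores, top_n=3)
print(best)  # [3, 1, 2]
```

Slicing the sorted order from the front with `[:top_n]` returns an empty list when `top_n` is 0. A slice taken from the end, such as `[-top_n:]`, would instead select every element in that case.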

## (e) Test with query 'spy thriller in Paris'

In [None]:
# Let's test my function
search_movies("spy thriller in Paris", top_n=3)