## **Implementation**

In [None]:
# Import all necessary libraries
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [None]:
# Dataset: The 500 best movies - IMDB
# Link: https://www.kaggle.com/datasets/moazeldsokyx/the-500-best-movies-imdb
#
# Description: the dataset contains information on IMDb's top movies, including
# titles, ratings, runtimes, genres, Metascores, plots, directors, stars, votes,
# gross earnings, and IMDb links.

df = pd.read_csv("Top_Movies.csv")
df.head()

In [None]:
# Remove duplicate movies based on 'Movie Name' to ensure unique recommendations
df = df.drop_duplicates(subset="Movie Name", keep="first").reset_index(drop=True)
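
To illustrate what the deduplication step does, here is a minimal sketch on a hypothetical toy frame. The titles and genres are placeholders, not rows from the dataset. It assumes only that duplicates share the exact same `Movie Name` string.

```python
import pandas as pd

# Hypothetical toy data; the titles are placeholders, not rows from the dataset.
toy = pd.DataFrame({
    "Movie Name": ["Movie A", "Movie B", "Movie A"],
    "Genre": ["Action", "Drama", "Action, Thriller"],
})

# keep="first" retains the earliest row for each title and drops later repeats.
# reset_index(drop=True) renumbers the remaining rows from 0.
deduped = toy.drop_duplicates(subset="Movie Name", keep="first").reset_index(drop=True)

print(deduped)
```

Note that titles differing only in case or whitespace would not be treated as duplicates by this call.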

In [None]:
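# Combine 'Genre' and 'Plot' into a single text feature
# Repeating 'Genre' twice gives its terms more weight in the TF-IDF vectors
# Missing values are replaced with empty strings so the vectorizer does not fail on NaN
genre_text = df["Genre"].fillna("")
plot_text = df["Plot"].fillna("")
df["combined_text"] = (genre_text + " | ") * 2 + plot_text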

# Initialize a TF-IDF Vectorizer with stop words removed  
vectorizer = TfidfVectorizer(stop_words="english")  

# Transform the combined text into TF-IDF vectors for numerical representation  
tfidf_matrix = vectorizer.fit_transform(df["combined_text"])  
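
The effect of repeating the genre can be checked on a small example. The sketch below uses a hypothetical three-document corpus (not taken from the dataset) and a separate `toy_vectorizer`, and compares the TF-IDF weight of the term `action` when the genre appears once versus twice.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus used only to fit the vocabulary and IDF weights.
corpus = [
    "action robot rebels galaxy",
    "romance wedding paris",
    "crime detective murder",
]
toy_vectorizer = TfidfVectorizer(stop_words="english")
toy_vectorizer.fit(corpus)

plot = "robot rebels galaxy"
once = toy_vectorizer.transform(["action | " + plot])       # genre stated once
twice = toy_vectorizer.transform(["action | " * 2 + plot])  # genre stated twice

# Column index of the term "action" in the fitted vocabulary.
col = toy_vectorizer.vocabulary_["action"]
weight_once = once[0, col]
weight_twice = twice[0, col]

print(f"'action' weight, genre once:  {weight_once:.3f}")
print(f"'action' weight, genre twice: {weight_twice:.3f}")
```

Because each row is L2-normalized by default, doubling the term frequency raises the genre term's share of the vector rather than doubling its weight outright.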

In [None]:
def recommend_movies(user_query, top_N):
    # Transform user query into TF-IDF vector
    user_vector = vectorizer.transform([user_query])

    # Compute cosine similarity between user input and dataset
    similarity_scores = cosine_similarity(user_vector, tfidf_matrix).flatten()

    # Get indices of top N matches
    top_indices = similarity_scores.argsort()[::-1][:top_N]

    # Select the columns to show for the top matches
    recommendation_df = df.iloc[top_indices][['Movie Name', 'Genre', 'Plot']]

    # Reset the index so the results are numbered 0..N-1 in rank order
    recommendation_df = recommendation_df.reset_index(drop=True)

    return recommendation_df
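
The ranking step inside `recommend_movies` can be seen in isolation with a few made-up scores. This is a minimal sketch: the `scores` array is hypothetical, and the zero-score filter at the end is an optional variation that the function above does not apply.

```python
import numpy as np

# Hypothetical similarity scores for five movies (position = row index).
scores = np.array([0.10, 0.70, 0.00, 0.40, 0.00])
top_n = 4

# argsort sorts ascending, so reverse it and keep the first top_n indices.
top_indices = scores.argsort()[::-1][:top_n]
print(top_indices)  # best match first; ties between equal scores may come in either order

# Optional variation (not in the original function): drop matches with zero similarity,
# which share no vocabulary terms with the query.
nonzero_top = top_indices[scores[top_indices] > 0]
print(nonzero_top)
```

With only three non-zero scores, asking for four results pulls in a movie that has nothing in common with the query, which is why a filter like this may be worth considering for short or unusual queries.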

In [None]:
# Generate movie recommendations based on the user's input  
user_input = "I love action-packed thrillers with a Sci-Fi theme."  
recommendation_df = recommend_movies(user_input, 5)  

In [None]:
# Display the recommended movies based on the user's input  
recommendation_df  

## **Salary Expectation**

For this entry-level internship position, my expected salary is \$1,000 - \$2,000 per month.
I am open to discussing compensation based on the scope of work, responsibilities, and learning opportunities.