## **Implementation**

In [1]:
# Import all necessary libraries
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [2]:
# Dataset: The 500 best movies - IMDB

# Link: https://www.kaggle.com/datasets/moazeldsokyx/the-500-best-movies-imdb

# Description:
# The dataset contains information on IMDb's top movies, including titles,
# ratings, runtimes, genres, Metascores, plots, directors, stars, votes, gross
# earnings, and IMDb links.

df = pd.read_csv("Top_Movies.csv")
df.head()

| | Movie Name | Rating | Runtime | Genre | Metascore | Plot | Directors | Stars | Votes | Gross | Link |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | The Godfather | 9.2 | 175 min | Crime, Drama | 100.0 | Don Vito Corleone, head of a mafia family, dec... | Francis Ford Coppola | Marlon Brando, Al Pacino, James Caan, Diane Ke... | 1914405 | 134966411 | https://www.imdb.com/title/tt0068646/ |
| 1 | The Shawshank Redemption | 9.3 | 142 min | Drama | 82.0 | Over the course of several years, two convicts... | Frank Darabont | Tim Robbins, Morgan Freeman, Bob Gunton, Willi... | 2751997 | 28341469 | https://www.imdb.com/title/tt0111161/ |
| 2 | Shichinin no samurai | 8.6 | 207 min | Action, Drama | 98.0 | Farmers from a village exploited by bandits hi... | Akira Kurosawa | Toshirô Mifune, Takashi Shimura, Keiko Tsushim... | 353392 | 269061 | https://www.imdb.com/title/tt0047478/ |
| 3 | Cidade de Deus | 8.6 | 130 min | Crime, Drama | 79.0 | In the slums of Rio, two kids' paths diverge a... | Fernando Meirelles | Kátia Lund, Alexandre Rodrigues, Leandro Firmi... | 772169 | 7563397 | https://www.imdb.com/title/tt0317248/ |
| 4 | The Godfather Part II | 9.0 | 202 min | Crime, Drama | 90.0 | The early life and career of Vito Corleone in ... | Francis Ford Coppola | Al Pacino, Robert De Niro, Robert Duvall, Dian... | 1303664 | 57300000 | https://www.imdb.com/title/tt0071562/ |


In [3]:
# Remove duplicate movies based on 'Movie Name' to ensure unique recommendations
df = df.drop_duplicates(subset="Movie Name", keep="first").reset_index(drop=True)

In [4]:
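# Combine 'Plot' and 'Genre' into a single text feature
# Repeating 'Genre' doubles its term counts, so genre words weigh more in the TF-IDF vectors
# Missing values are replaced with empty strings, since TfidfVectorizer rejects NaN documents
df["combined_text"] = (df["Genre"].fillna("") + " | ") * 2 + df["Plot"].fillna("")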

# Initialize a TF-IDF Vectorizer with stop words removed  
vectorizer = TfidfVectorizer(stop_words="english")  

# Transform the combined text into TF-IDF vectors for numerical representation  
tfidf_matrix = vectorizer.fit_transform(df["combined_text"])  
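
To see why repeating the genre shifts the weighting, here is a minimal standard-library sketch. It is not the scikit-learn implementation: it only counts tokens and L2-normalizes the counts, which is the term-frequency half of TF-IDF. The IDF factor is left out because it depends on how many documents contain a term, not on how often one document repeats it. The helper `normalized_tf` and the sample plot text are hypothetical.

```python
import math
import re
from collections import Counter

def normalized_tf(text):
    """Return L2-normalized token counts for text (hypothetical helper)."""
    # Lowercase and keep tokens of 2+ word characters, similar in spirit
    # to TfidfVectorizer's default tokenization
    tokens = re.findall(r"\b\w\w+\b", text.lower())
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()}

genre = "Crime"
plot = "mafia family feud"  # made-up plot text for illustration

once = normalized_tf((genre + " | ") * 1 + plot)
twice = normalized_tf((genre + " | ") * 2 + plot)

# Weight of the genre term with one repetition vs. two
print(round(once["crime"], 3), round(twice["crime"], 3))  # 0.5 0.756
```

In this toy case the genre term's share of the vector rises from 0.5 to about 0.756, while each plot term's share falls, so genre overlap contributes more to the cosine similarity.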

In [5]:
def recommend_movies(user_query, top_N):
    """Return the top_N movies whose combined text is most similar to user_query."""
    # Transform the user query into a TF-IDF vector using the fitted vocabulary
    user_vector = vectorizer.transform([user_query])

    # Compute cosine similarity between the query and every movie in the dataset
    similarity_scores = cosine_similarity(user_vector, tfidf_matrix).flatten()

    # Get the indices of the top N matches, highest similarity first
    top_indices = similarity_scores.argsort()[::-1][:top_N]

    # Select the columns to show for the matched movies
    recommendation_df = df.iloc[top_indices][['Movie Name', 'Genre', 'Plot']]

    # Reset the index so the results are numbered from 0 in rank order
    recommendation_df = recommendation_df.reset_index(drop=True)

    return recommendation_df
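
One edge case worth noting: if the query shares no vocabulary with any movie, every cosine similarity is 0, yet the `argsort` ranking still returns `top_N` rows that have nothing to do with the query. Below is a minimal sketch of one way to guard against that, using plain Python lists rather than the notebook's NumPy arrays. The names `top_matches` and `min_score` are hypothetical and not part of the original code.

```python
def top_matches(scores, top_n, min_score=0.0):
    """Return indices of the top_n scores that are strictly above min_score."""
    # Pair each score with its row index and sort by score, highest first
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    # Keep only rows that actually overlap with the query
    return [idx for idx, score in ranked if score > min_score][:top_n]

print(top_matches([0.0, 0.42, 0.0, 0.17], top_n=3))  # [1, 3]
print(top_matches([0.0, 0.0, 0.0], top_n=3))         # []
```

Inside `recommend_movies`, the same idea could be applied by dropping indices whose `similarity_scores` value is 0 before building the result, so an unmatched query yields an empty table instead of unrelated movies.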

In [6]:
# Generate movie recommendations based on the user's input  
user_input = "I love action-packed thrillers with a Sci-Fi theme."  
recommendation_df = recommend_movies(user_input, 5)  

In [7]:
# Display the recommended movies based on the user's input  
recommendation_df  

| | Movie Name | Genre | Plot |
|---|---|---|---|
| 0 | Jurassic Park | Action, Adventure, Sci-Fi | A pragmatic paleontologist touring an almost c... |
| 1 | Blade Runner | Action, Drama, Sci-Fi | A blade runner must pursue and terminate four ... |
| 2 | V for Vendetta | Action, Drama, Sci-Fi | In a future British dystopian society, a shado... |
| 3 | The Avengers | Action, Sci-Fi | Earth's mightiest heroes must come together an... |
| 4 | RoboCop | Action, Crime, Sci-Fi | In a dystopic and crime-ridden Detroit, a term... |
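
All five results share the Action and Sci-Fi genres, which is consistent with how the query is tokenized. The sketch below mimics `TfidfVectorizer`'s default behavior of lowercasing and keeping tokens of two or more word characters; it ignores stop-word removal, which does not affect the overlap shown here. The `tokenize` helper is hypothetical, and the `Thriller` genre label is an assumption, since it does not appear in the rows shown above.

```python
import re

def tokenize(text):
    """Lowercase and split into tokens of 2+ word characters (hypothetical helper)."""
    return re.findall(r"\b\w\w+\b", text.lower())

query_tokens = set(tokenize("I love action-packed thrillers with a Sci-Fi theme."))
genre_tokens = set(tokenize("Action, Sci-Fi, Thriller"))  # "Thriller" is an assumed label

# Tokens the query has in common with the genre text
print(sorted(query_tokens & genre_tokens))  # ['action', 'fi', 'sci']
```

Because there is no stemming, the plural `thrillers` in the query does not match a singular `thriller` token, so the match is driven by `action`, `sci`, and `fi`.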


## **Salary Expectation**

For this entry-level internship position, my expected salary is \$1,000 - \$2,000 per month.
I am open to discussing compensation based on the scope of work, responsibilities, and learning opportunities.