---
title: "Movie-Recommendation Model"
format:
    html: 
        code-fold: true
        code-tools: true
jupyter: python3
---

# Code

The code for this webpage is available [on GitHub](https://github.com/dcorc7/Movie-Recommendation-Model/recommender.ipynb).

In [25]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import MinMaxScaler
from tabulate import tabulate
from IPython.display import display, HTML
import ipywidgets as widgets
import panel as pn

## Load the Dataset

In [26]:
# Load the cleaned movies dataframe
movies_df = pd.read_csv("./data/processed-data/movies_cleaned.csv")

pd.set_option("display.max_columns", None)
movies_df.head(3)


Unnamed: 0,IMDB_ID,Title,Year,Release_Date,Release_Month,Age_Rating,Overview,Keywords,Genre,Director,Actors,Runtime,Metascore_Rating,IMDB_Rating,Rotten_Tomatoes_Rating,TMDB_Rating,Average_Rating,Won_Award,Oscar_Wins,Oscar_Nominations,Budget,Budget_Normalized,Revenue,Revenue_Normalized,Return_On_Investment,Popularity
0,tt0097499,henry v,1989,1989-10-05,October,pg-13,gritty adaption william shakespeares play engl...,['france kingdom theater play based on true st...,war,kenneth branagh,['kenneth branagh derek jacobi simon shepherd'],137,8.3,7.5,9.8,7.2,8.2,True,1,0,9000000,-0.873465,10200000,-0.801446,1.133333,18.771
1,tt1320253,the expendables,2010,2010-08-03,August,r,barney ross leads band highly skilled mercenar...,['rescue sniper island martial arts tattoo esc...,thriller,sylvester stallone,['sylvester stallone jason statham jet li'],103,4.5,6.4,4.2,6.2,5.325,False,0,0,80000000,0.317499,274470394,0.18825,3.43088,74.573
2,tt1025100,gemini man,2019,2019-10-02,October,pg-13,henry brogan elite 51 year assassin whos ready...,['hitman clone'],thriller,ang lee,['will smith mary elizabeth winstead clive owen'],117,3.8,5.7,2.7,6.3,4.625,False,0,0,140000000,1.323948,173469516,-0.189999,1.239068,27.266


## Clean the Dataset

In [27]:
# Remove brackets and apostrophes from the Actors and Keywords columns
movies_df["Actors"] = movies_df["Actors"].str.replace("[", "", regex = False).str.replace("]", "", regex = False).str.replace("'", "", regex = False)
movies_df["Keywords"] = movies_df["Keywords"].str.replace("[", "", regex = False).str.replace("]", "", regex = False).str.replace("'", "", regex = False)


# Drop columns that won't be included in the cosine similarity calculation
columns_to_drop = ["IMDB_ID", "Keywords", "Won_Award", "Release_Date", "Release_Month", "Age_Rating", "Budget", "Revenue"]
filtered_movies_df = movies_df.drop(columns = columns_to_drop)

# Preview the new dataframe
filtered_movies_df.head(3)

Unnamed: 0,Title,Year,Overview,Genre,Director,Actors,Runtime,Metascore_Rating,IMDB_Rating,Rotten_Tomatoes_Rating,TMDB_Rating,Average_Rating,Oscar_Wins,Oscar_Nominations,Budget_Normalized,Revenue_Normalized,Return_On_Investment,Popularity
0,henry v,1989,gritty adaption william shakespeares play engl...,war,kenneth branagh,kenneth branagh derek jacobi simon shepherd,137,8.3,7.5,9.8,7.2,8.2,1,0,-0.873465,-0.801446,1.133333,18.771
1,the expendables,2010,barney ross leads band highly skilled mercenar...,thriller,sylvester stallone,sylvester stallone jason statham jet li,103,4.5,6.4,4.2,6.2,5.325,0,0,0.317499,0.18825,3.43088,74.573
2,gemini man,2019,henry brogan elite 51 year assassin whos ready...,thriller,ang lee,will smith mary elizabeth winstead clive owen,117,3.8,5.7,2.7,6.3,4.625,0,0,1.323948,-0.189999,1.239068,27.266


## Compute TF-IDF and Cosine Similarity Scores for Text Data

In [28]:
# Combine all text features of each movie into one value of a new column
filtered_movies_df["combined_text_features"] = filtered_movies_df["Overview"] + " " + filtered_movies_df["Genre"] + " " + filtered_movies_df["Director"] + " " + filtered_movies_df["Actors"]

# Create a TF-IDF matrix to vectorize words for each movie's text features
vectorizer = TfidfVectorizer(max_features = 5000)
tfidf_matrix = vectorizer.fit_transform(filtered_movies_df["combined_text_features"])

# Calculate textual cosine similarity scores for each movie
text_cos_similarity = cosine_similarity(tfidf_matrix)
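
To make the TF-IDF step more concrete, the sketch below recomputes the weighting by hand for three toy documents. It assumes scikit-learn's documented defaults for `TfidfVectorizer` (smoothed IDF, `ln((1 + n) / (1 + df)) + 1`, followed by L2 normalization) and approximates tokenization with a plain whitespace split. The toy documents and the helper names `tfidf_vectors` and `cosine` are illustrative and not part of the notebook.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one L2-normalized TF-IDF vector (as a dict) per document."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Term count times smoothed IDF (assumed to mirror smooth_idf=True)
        weights = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({term: w / norm for term, w in weights.items()})
    return vectors

def cosine(u, v):
    """For unit-length sparse vectors, cosine similarity is the dot product."""
    return sum(w * v.get(term, 0.0) for term, w in u.items())

# Hypothetical combined text features for three movies
toy_docs = [
    "pirate captain ship adventure johnny depp",
    "pirate ship treasure adventure orlando bloom",
    "romantic comedy wedding new york",
]
vecs = tfidf_vectors(toy_docs)
sim_pirates = cosine(vecs[0], vecs[1])    # shares "pirate", "ship", "adventure"
sim_unrelated = cosine(vecs[0], vecs[2])  # no shared terms
print(round(sim_pirates, 3), round(sim_unrelated, 3))
```

Documents that share terms get a positive score, while documents with no terms in common score exactly zero. This is why the text similarity mostly rewards overlapping cast, director, genre, and overview words.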

## Compute Cosine Similarity Scores for Numerical Data

In [29]:
# Filter the df to only include numerical columns
numerical_features = ["Runtime", "Metascore_Rating", "IMDB_Rating", "Rotten_Tomatoes_Rating", "TMDB_Rating", "Average_Rating", 
                      "Oscar_Wins", "Return_On_Investment", "Budget_Normalized", "Revenue_Normalized", "Popularity"]

# Scale the values so that no single column dominates the cosine similarity scores
scaler = MinMaxScaler()
scaled_features = scaler.fit_transform(filtered_movies_df[numerical_features])

# Calculate numerical cosine similarity scores for each movie
numerical_cos_similarity = cosine_similarity(scaled_features)
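
The effect of scaling can be seen on a small example. The sketch below applies the min-max formula, `(x - min) / (max - min)`, column by column and compares cosine similarity before and after. The three rows are made-up values for `Runtime`, `Average_Rating`, and `Popularity`, and `min_max_scale` is a hypothetical pure-Python stand-in, not the scikit-learn implementation.

```python
import math

def min_max_scale(rows):
    """Rescale each column of a list of rows to the [0, 1] range."""
    columns = list(zip(*rows))
    scaled_columns = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column carries no information; map it to 0.0
        scaled_columns.append([(x - lo) / span if span else 0.0 for x in col])
    return [list(row) for row in zip(*scaled_columns)]

def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hypothetical rows: [Runtime, Average_Rating, Popularity]
toy_rows = [
    [137, 8.2, 18.8],
    [103, 5.3, 74.6],
    [117, 4.6, 27.3],
]
scaled = min_max_scale(toy_rows)

raw_sim = cosine(toy_rows[0], toy_rows[2])
scaled_sim = cosine(scaled[0], scaled[2])
print(round(raw_sim, 3), round(scaled_sim, 3))
```

On the raw values the large `Runtime` numbers dominate, so the first and third rows look almost identical even though their ratings differ widely. After scaling, every column contributes on the same 0 to 1 range and the similarity drops accordingly.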

## Determine Cosine Similarity Score Weights for Each Datatype

In [48]:
# Set weights for the two cosine similarity matrices to control whether text or numerical data has more say in the recommendations
text_weight = 0.25
numerical_weight = 0.75

# Create a combined cosine similarity score that uses both text and numerical features
combined_similarity = text_weight * text_cos_similarity + numerical_weight * numerical_cos_similarity

# Function to take in a movie and generate the top_n movies that are most similar to it
def recommend_movies(movie_title, top_n = 10):    
    # Obtain the index of the given movie
    selected_movie_index = filtered_movies_df[filtered_movies_df["Title"] == movie_title].index[0]

    # Obtain the similarity scores for the selected movie and place them in a list, along with each movie's index
    sim_scores = list(enumerate(combined_similarity[selected_movie_index]))

    # Sort movies based on similarity scores
    sim_scores = sorted(sim_scores, key = lambda x: x[1], reverse = True)

    # Keep the top_n movies with the highest similarity scores (skipping the first entry, which is the selected movie itself)
    sim_scores = sim_scores[1:top_n + 1]

    # Get indices of the top-n similar movies
    movie_indices = [i[0] for i in sim_scores]
    movie_scores = [i[1].round(4) for i in sim_scores]
    
    # Create a new recommended movie df with selected features of the top movies by matching the indices of the recommended movies
    columns_to_keep = ["IMDB_ID", "Title", "Year", "Age_Rating", "Keywords", "Director", "Actors", "Average_Rating", "Revenue", "Budget", "Oscar_Wins"]

    # Copy the selected rows so the new column is added to an independent dataframe
    recommendations_df = movies_df[columns_to_keep].iloc[movie_indices].copy()
    recommendations_df["Similarity_Score"] = movie_scores

    # Return the top-n similar movies
    return recommendations_df
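
The weighted blend and the ranking step can be checked by hand on a tiny case. The sketch below uses two made-up 3x3 similarity matrices and the same 0.25 / 0.75 weights. The helper names `blend` and `top_n_similar` are hypothetical and work on plain lists rather than NumPy arrays.

```python
def blend(text_sim, num_sim, text_weight=0.25, numerical_weight=0.75):
    """Element-wise weighted average of two square similarity matrices."""
    return [
        [text_weight * t + numerical_weight * n for t, n in zip(t_row, n_row)]
        for t_row, n_row in zip(text_sim, num_sim)
    ]

def top_n_similar(similarity, index, top_n=2):
    """Return (other_index, score) pairs, best first, skipping the item itself."""
    scores = [(j, s) for j, s in enumerate(similarity[index]) if j != index]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]

# Hypothetical similarity matrices for three movies
toy_text_sim = [[1.0, 0.4, 0.0], [0.4, 1.0, 0.1], [0.0, 0.1, 1.0]]
toy_num_sim = [[1.0, 0.8, 0.9], [0.8, 1.0, 0.5], [0.9, 0.5, 1.0]]

combined = blend(toy_text_sim, toy_num_sim)
# Movie 0 vs movie 1: 0.25 * 0.4 + 0.75 * 0.8 = 0.70
# Movie 0 vs movie 2: 0.25 * 0.0 + 0.75 * 0.9 = 0.675
ranking = top_n_similar(combined, index=0, top_n=2)
print(ranking)
```

Movie 2 has no text overlap with movie 0 at all, yet it still ranks a close second because the numerical weight is three times the text weight. Filtering out the selected movie by index, rather than dropping the first sorted entry, is one way to stay correct when another movie ties with it at the top.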

## Example: Pirates of the Caribbean: The Curse of the Black Pearl

In [52]:
selected_movie = "pirates of the caribbean the curse of the black pearl"
recommendations = recommend_movies(selected_movie)
display(HTML(f"<h1 style='color: black;'>Movie Recommendations For: {selected_movie.upper()}</h1>"))
display(recommendations)

Unnamed: 0,IMDB_ID,Title,Year,Age_Rating,Keywords,Director,Actors,Average_Rating,Revenue,Budget,Oscar_Wins,Similarity_Score
69,tt0449088,pirates of the caribbean at worlds end,2007,pg-13,east india company exotic island strong woman ...,gore verbinski,johnny depp orlando bloom keira knightley,5.925,961000000,300000000,0,0.7958
258,tt0114898,waterworld,1995,pg-13,sailboat tattoo based on novel or book diving ...,kevin reynolds,kevin costner jeanne tripplehorn dennis hopper,5.675,264218220,175000000,0,0.7655
12,tt2126355,san andreas,2015,pg-13,california san francisco california looting ea...,brad peyton,dwayne johnson carla gugino alexandra daddario,5.35,473990832,110000000,0,0.7654
1620,tt7975244,jumanji the next level,2019,pg-13,magic animal attack body swap adventurer super...,jake kasdan,dwayne johnson jack black kevin hart,6.65,801693929,125000000,0,0.7651
265,tt0332452,troy,2004,r,epic adultery sibling relationship hostility b...,wolfgang petersen,brad pitt eric bana orlando bloom,6.35,497400000,175000000,0,0.7644
176,tt2873282,red sparrow,2018,r,central intelligence agency cia sexual abuse b...,francis lawrence,jennifer lawrence joel edgerton matthias schoe...,5.725,151572634,69000000,0,0.7614
1508,tt0251075,evolution,2001,pg-13,governor fire engine giant monster grand canyo...,ivan reitman,david duchovny orlando jones julianne moore,5.15,98376292,80000000,0,0.7604
633,tt0236493,the mexican,2001,r,mexico kidnapping romantic comedy road trip th...,gore verbinski,brad pitt julia roberts james gandolfini,5.425,147845033,57000000,0,0.7595
250,tt0320661,kingdom of heaven,2005,r,epic blacksmith muslim crusade jerusalem catap...,ridley scott,orlando bloom eva green liam neeson,6.125,218100001,130000000,0,0.7594
1594,tt5220122,hotel transylvania 3 summer vacation,2018,pg,monster vampire vacation cruise ship summer va...,genndy tartakovsky,adam sandler andy samberg selena gomez,6.2,528600000,80000000,0,0.7592


## Recommended Movies For User-Inputted Titles

Enter any movie title that exists in the movie database to receive the 10 most similar movies. All titles in the database are stored in lowercase, so your input must match that format exactly: enter "django unchained" rather than "Django Unchained". Judging by the example above, stored titles also omit punctuation (for example, "pirates of the caribbean the curse of the black pearl"). Each recommendation includes the movie's title, release year, director, actors, average rating, and other features. If the title you enter does not exist in the database, an error message is displayed instead of recommendations, so double-check your input for accuracy.

**Example Movies:**

- django unchained
- inception
- titanic
- avatar
- interstellar
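
Because lookups depend on an exact match, it can help to normalize input before searching. The sketch below is one possible approach, not part of the notebook: `normalize_title` and `resolve_title` are hypothetical helpers, and the rule of lowercasing and dropping ASCII punctuation is an assumption inferred from the titles shown on this page rather than the dataset's documented cleaning steps.

```python
import difflib
import string

def normalize_title(raw_title):
    """Lowercase a title, drop ASCII punctuation, and collapse extra whitespace."""
    no_punct = raw_title.translate(str.maketrans("", "", string.punctuation))
    return " ".join(no_punct.lower().split())

def resolve_title(raw_title, known_titles):
    """Return the matching stored title, or raise with close-match suggestions."""
    candidate = normalize_title(raw_title)
    if candidate in known_titles:
        return candidate
    suggestions = difflib.get_close_matches(candidate, known_titles, n=3)
    raise KeyError(f"'{candidate}' not found; close matches: {suggestions}")

# Hypothetical subset of the Title column
known_titles = ["django unchained", "inception", "titanic", "avatar", "interstellar"]
print(resolve_title("Django Unchained", known_titles))
```

A helper like this could be called on the input before it is passed to `recommend_movies`, so that a capitalized or slightly misspelled title produces a useful suggestion instead of an index error.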

In [51]:
pn.extension()

# Panel-based user input
movie_input = pn.widgets.TextInput(name = "Enter Movie Title", placeholder = "Type a movie name...")
button = pn.widgets.Button(name = "Recommend Movies")
output = pn.pane.DataFrame(width = 800)

def on_button_click(event):
    movie_title = movie_input.value
    try:
        # Call recommend_movies function and get recommendations
        recommendations = recommend_movies(movie_title)
        
        # Set the output pane to display the recommendations DataFrame
        output.object = recommendations
    except Exception as e:
        # Handle errors by displaying them in the output
        output.object = f"Error: {e}"

button.on_click(on_button_click)

layout = pn.Column(
    "# Movie Recommendation System",
    movie_input,
    button,
    output
)

layout
