In [2]:
import sys
import pandas as pd

sys.path.append('../src')

In [3]:
file_path = '../data/cleaned_netflix_titles.csv'
title = 'NCIS'
no_of_recommendations = 10

# 1. CountVectorizer-based Recommender System

This system uses a bag-of-words (BoW) approach to convert text into feature vectors. It counts the occurrences of each word and represents each document as a vector of word counts. The similarity between two documents is then their cosine similarity, that is, the cosine of the angle between their vectors.
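
For reference, the cosine similarity of two count vectors $\mathbf{a}$ and $\mathbf{b}$ (symbols introduced here for illustration) is

$$
\cos\theta = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
= \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^2}\,\sqrt{\sum_i b_i^2}}
$$

As a worked example with made-up vectors $\mathbf{a} = (1, 1, 0)$ and $\mathbf{b} = (1, 0, 1)$: the dot product is $1$ and each norm is $\sqrt{2}$, so $\cos\theta = 1 / (\sqrt{2}\cdot\sqrt{2}) = 0.5$. Because word counts are non-negative, the score always lies between 0 and 1.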

**Steps:**
1. Combine relevant features (title, director, cast, country, listed_in, description, release_year, rating) into a single string for each title.
2. Use CountVectorizer to convert the combined feature strings into a matrix of token counts.
3. Compute the cosine similarity matrix from the count matrix.
4. Get recommendations based on the similarity to a given title.

**Advantages:**
- Simple and easy to implement.
- Fast computation.

**Disadvantages:**
- Does not capture the semantic meaning or context of words.
- Sensitive to vocabulary size and raw word frequency: common words can dominate the counts unless they are filtered or down-weighted.
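
The steps above can be sketched with scikit-learn as follows. This is a minimal illustration on a made-up three-row catalogue, not the code inside `count_vectorizer_recommender`; the toy titles, the subset of feature columns, the `stop_words="english"` setting, and the `recommend` helper are all assumptions made for the example.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy catalogue (hypothetical rows, not taken from the Netflix dataset).
toy_df = pd.DataFrame({
    "title": ["Harbor Unit", "Dockside Detectives", "Baking Duel"],
    "listed_in": ["Crime TV Shows, TV Dramas", "Crime TV Shows, TV Mysteries", "Reality TV"],
    "description": [
        "Naval agents investigate crimes",
        "Detectives investigate crimes at the port",
        "Bakers compete for a prize",
    ],
})

# Step 1: combine the feature columns into one string per title.
toy_df["combined_features"] = (
    toy_df["title"] + " " + toy_df["listed_in"] + " " + toy_df["description"]
)

# Step 2: token-count matrix (rows = titles, columns = vocabulary terms).
vectorizer = CountVectorizer(stop_words="english")
count_matrix = vectorizer.fit_transform(toy_df["combined_features"])

# Step 3: pairwise cosine similarity between all titles.
similarity = cosine_similarity(count_matrix)


def recommend(title, num_recommendations=2):
    """Return the titles most similar to `title`, best match first."""
    idx = toy_df.index[toy_df["title"] == title][0]
    scores = pd.Series(similarity[idx], index=toy_df["title"]).drop(title)
    return scores.sort_values(ascending=False).head(num_recommendations)


# Step 4: recommendations for one title.
top = recommend("Harbor Unit")
print(top)
```

The queried title is dropped from its own result list because its similarity to itself is always 1.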

In [4]:
from count_vectorizer_recommender import ContentBasedRecommendationSystem

count_vec_rec = ContentBasedRecommendationSystem(file_path)

In [5]:
count_vec_rec.get_movie_details(title)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,combined_features
4798,s4799,TV Show,NCIS,Unknown,"Mark Harmon, Michael Weatherly, Pauley Perrett...",United States,2018-07-01,2017,TV-14,15 Seasons,"Crime TV Shows, TV Dramas, TV Mysteries",Follow the quirky agents of the NCIS – the N...,"NCIS Unknown Mark Harmon, Michael Weatherly, P..."


In [6]:
count_vec_rec_results = count_vec_rec.get_recommendations(title, num_recommendations=no_of_recommendations)
count_vec_rec_results

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,combined_features,similarity_score
3581,s3582,TV Show,MINDHUNTER,Unknown,"Jonathan Groff, Holt McCallany, Anna Torv, Cot...",United States,2019-08-16,2019,TV-MA,2 Seasons,"Crime TV Shows, TV Dramas, TV Mysteries",In the late 1970s two FBI agents expand crimin...,"MINDHUNTER Unknown Jonathan Groff, Holt McCall...",0.421435
5038,s5039,TV Show,Re:Mind,Unknown,Keyakizaka46,Japan,2018-02-15,2017,TV-MA,1 Season,"International TV Shows, TV Dramas, TV Mysteries","Eleven high school classmates awaken, restrain...",Re:Mind Unknown Keyakizaka46 Japan Internation...,0.418763
6832,s6842,TV Show,Get Shorty,Unknown,"Ray Romano, Chris O'Dowd",United States,2018-11-01,2017,TV-MA,1 Season,"Crime TV Shows, TV Comedies, TV Dramas",Organized crime enforcer Miles Daly strives to...,"Get Shorty Unknown Ray Romano, Chris O'Dowd Un...",0.417029
5412,s5413,TV Show,Criminal Minds,Unknown,"Mandy Patinkin, Joe Mantegna, Thomas Gibson, S...","United States, Canada",2017-06-30,2017,TV-14,12 Seasons,"Crime TV Shows, TV Dramas, TV Mysteries",This intense police procedural follows a group...,"Criminal Minds Unknown Mandy Patinkin, Joe Man...",0.411809
4673,s4674,TV Show,Inside the Criminal Mind,Unknown,Unknown,"United States, Czech Republic",2018-08-31,2018,TV-MA,1 Season,"Crime TV Shows, Docuseries, International TV S...",Explore the psychological machinations and imm...,Inside the Criminal Mind Unknown Unknown Unite...,0.410112
657,s658,TV Show,The Mole,Unknown,Anderson Cooper,United States,2021-06-21,2001,TV-14,2 Seasons,"Reality TV, TV Action & Adventure, TV Mysteries","In this competition show, contestants try to e...",The Mole Unknown Anderson Cooper United States...,0.410112
4632,s4633,TV Show,The Good Cop,Unknown,"Tony Danza, Josh Groban, Monica Barbaro, Isiah...",United States,2018-09-21,2018,TV-14,1 Season,"Crime TV Shows, TV Comedies, TV Dramas","When he's not solving murders, a pathologicall...","The Good Cop Unknown Tony Danza, Josh Groban, ...",0.408248
6736,s6743,TV Show,Father Brown,Ian Barber,"Mark Williams, Sorcha Cusack, Nancy Carroll, A...",United Kingdom,2018-03-31,2017,TV-14,6 Seasons,"British TV Shows, Crime TV Shows, TV Dramas","A modest, compassionate priest doubles as an e...","Father Brown Ian Barber Mark Williams, Sorcha ...",0.407495
5157,s5158,TV Show,Argon,Unknown,"Woo-hee Chun, Joo-hyuk Kim, Won-sang Park",South Korea,2017-11-22,2017,TV-MA,1 Season,"International TV Shows, Korean TV Shows, TV Dr...",In a world filled with provocative (and often ...,"Argon Unknown Woo-hee Chun, Joo-hyuk Kim, Won-...",0.404085
1623,s1624,TV Show,The Bachelorette,Unknown,Unknown,United States,2020-12-01,2010,TV-14,1 Season,"Reality TV, Romantic TV Shows",Beloved “Bachelor” contestant Ali Fedotows...,The Bachelorette Unknown Unknown United States...,0.403891


---------------------

# 2. TF-IDF with SVD Recommender System

This system extends the CountVectorizer approach by weighting words by their importance. TF-IDF (Term Frequency-Inverse Document Frequency) weights each word by its frequency in a document and its rarity across the corpus, so words that appear in most documents contribute little. Truncated SVD (Singular Value Decomposition) is then applied to reduce dimensionality and capture latent semantic relationships.

**Steps:**
1. Combine relevant features into a single string for each title.
2. Use TF-IDF Vectorizer to convert the combined feature strings into a matrix of TF-IDF features.
3. Apply Truncated SVD to reduce the dimensionality of the TF-IDF matrix.
4. Compute the cosine similarity matrix from the reduced TF-IDF matrix.
5. Get recommendations based on the similarity to a given title.

**Advantages:**
- Captures the importance of words and reduces noise.
- Latent semantic relationships are identified through dimensionality reduction.

**Disadvantages:**
- The number of SVD components must be tuned: too few components discard useful signal, while too many retain noise.
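
A minimal sketch of steps 2 to 4 with scikit-learn, assuming a toy corpus of made-up combined-feature strings. The value `n_components=2` is chosen only because the corpus is tiny; it is not the setting used by `tfidf_svd_recommender`, which the notebook does not show.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical combined-feature strings (not rows from the Netflix dataset).
documents = [
    "naval agents investigate crimes crime drama",
    "detectives investigate crimes at the port crime mystery",
    "fbi profilers investigate serial crimes crime drama",
    "bakers compete for a prize reality competition",
    "singers compete for a record deal reality competition",
]

# Step 2: TF-IDF matrix (rows = documents, columns = vocabulary terms).
tfidf_matrix = TfidfVectorizer(stop_words="english").fit_transform(documents)

# Step 3: project the sparse TF-IDF matrix onto a few latent dimensions.
# n_components=2 is an arbitrary choice for this toy corpus.
svd = TruncatedSVD(n_components=2, random_state=42)
reduced = svd.fit_transform(tfidf_matrix)

# Step 4: cosine similarity in the reduced space.
similarity = cosine_similarity(reduced)

# Share of the TF-IDF variance kept by the retained components;
# one possible guide when choosing n_components.
explained = svd.explained_variance_ratio_.sum()
print(f"explained variance: {explained:.2f}")
print(similarity.round(2))
```

Unlike raw count vectors, the reduced vectors can have negative entries, so cosine similarities in the latent space can fall below zero.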



In [7]:
from tfidf_svd_recommender import ContentBasedRecommendationSystemWithSVD

tfidf_svd_rec = ContentBasedRecommendationSystemWithSVD(file_path)

In [8]:
tfidf_svd_rec.get_movie_details(title)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,combined_features
4798,s4799,TV Show,NCIS,Unknown,"Mark Harmon, Michael Weatherly, Pauley Perrett...",United States,2018-07-01,2017,TV-14,15 Seasons,"Crime TV Shows, TV Dramas, TV Mysteries",Follow the quirky agents of the NCIS – the N...,"NCIS Unknown Mark Harmon, Michael Weatherly, P..."


In [9]:
tfidf_svd_rec_results = tfidf_svd_rec.get_recommendations(title, num_recommendations=no_of_recommendations)
tfidf_svd_rec_results

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,combined_features,similarity_score
1585,s1586,TV Show,Manhunt: Deadly Games,Unknown,"Cameron Britton, Jack Huston, Judith Light, Ca...",Unknown,2020-12-07,2020,TV-14,1 Season,"Crime TV Shows, TV Dramas, TV Mysteries","Despite his heroics, security guard Richard Je...","Manhunt: Deadly Games Unknown Cameron Britton,...",0.614126
5412,s5413,TV Show,Criminal Minds,Unknown,"Mandy Patinkin, Joe Mantegna, Thomas Gibson, S...","United States, Canada",2017-06-30,2017,TV-14,12 Seasons,"Crime TV Shows, TV Dramas, TV Mysteries",This intense police procedural follows a group...,"Criminal Minds Unknown Mandy Patinkin, Joe Man...",0.612921
5641,s5643,Movie,Coin Heist,Emily Hagins,"Sasha Pieterse, Alexis G. Zall, Alex Saxon, Ja...",United States,2017-01-06,2017,TV-14,98 min,"Children & Family Movies, Dramas",When a crisis threatens to destroy their high ...,"Coin Heist Emily Hagins Sasha Pieterse, Alexis...",0.571949
4663,s4664,TV Show,Quantico,Unknown,"Priyanka Chopra, Josh Hopkins, Jake McLaughlin...",United States,2018-09-02,2018,TV-14,3 Seasons,"Crime TV Shows, TV Dramas, TV Mysteries",When evidence in a deadly terrorist attack imp...,"Quantico Unknown Priyanka Chopra, Josh Hopkins...",0.562055
3581,s3582,TV Show,MINDHUNTER,Unknown,"Jonathan Groff, Holt McCallany, Anna Torv, Cot...",United States,2019-08-16,2019,TV-MA,2 Seasons,"Crime TV Shows, TV Dramas, TV Mysteries",In the late 1970s two FBI agents expand crimin...,"MINDHUNTER Unknown Jonathan Groff, Holt McCall...",0.54672
4804,s4805,TV Show,Somewhere Between,Unknown,"Paula Patton, Devon Sawa, JR Bourne, Aria Birc...",United States,2018-06-30,2017,TV-14,1 Season,"Crime TV Shows, TV Dramas, TV Mysteries",A local news producer is given one chance to r...,"Somewhere Between Unknown Paula Patton, Devon ...",0.524554
8294,s8312,TV Show,The Fosters,Unknown,"Teri Polo, Sherri Saum, Jake T. Austin, Hayden...",United States,2017-10-05,2017,TV-14,5 Seasons,TV Dramas,This offbeat drama charts the ups and downs of...,"The Fosters Unknown Teri Polo, Sherri Saum, Ja...",0.511336
4410,s4411,TV Show,Damnation,Unknown,"Logan Marshall-Green, Killian Scott, Sarah Jon...",United States,2018-11-07,2017,TV-MA,1 Season,"Crime TV Shows, TV Dramas","During the Great Depression, a stranger with a...","Damnation Unknown Logan Marshall-Green, Killia...",0.510655
6258,s6265,TV Show,Beauty & the Beast,Unknown,"Kristin Kreuk, Jay Ryan, Max Brown, Austin Bas...",Canada,2016-09-19,2016,TV-14,4 Seasons,"Crime TV Shows, Romantic TV Shows, TV Dramas",A homicide detective and a veteran who has bee...,"Beauty & the Beast Unknown Kristin Kreuk, Jay ...",0.506418
3937,s3938,TV Show,Imposters,Unknown,"Inbar Lavi, Rob Heaps, Parker Young, Marianne ...",United States,2019-04-05,2017,TV-MA,2 Seasons,"Crime TV Shows, International TV Shows, TV Com...","Supported by a team of fellow thieves, a con a...","Imposters Unknown Inbar Lavi, Rob Heaps, Parke...",0.505053


---------------------

# 3. Weighted Autoencoder Recommender System

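This system weights the input features, vectorizes them, and then trains a neural network (an autoencoder) to learn a compact representation (encoding) of each title. The weights emphasize the features considered most important; in the output below, each title appears five times at the start of `combined_features`, which suggests that weighting is applied by repetition. The autoencoder compresses its input into a lower-dimensional space and then reconstructs it, and the encoded representations are used to compute similarity scores.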

**Steps:**
1. Combine relevant features into a single string for each title with specified weights.
2. Use TF-IDF Vectorizer to convert the combined feature strings into a matrix of TF-IDF features.
3. Apply Truncated SVD to reduce the dimensionality of the TF-IDF matrix.
4. Build and train an autoencoder to learn compressed representations.
5. Compute the cosine similarity matrix from the encoded representations.
6. Get recommendations based on the similarity to a given title.
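
One simple way to apply the weights in step 1 is to repeat each feature's text in proportion to its weight, so that it contributes more to the TF-IDF vector. The sketch below assumes this repetition scheme; the `FEATURE_WEIGHTS` values, the `build_weighted_features` helper, and the example row are hypothetical and are not taken from `autoencoder_recommender`.

```python
# Hypothetical weights: how many times each feature's text is repeated.
FEATURE_WEIGHTS = {"title": 5, "listed_in": 3, "description": 1}


def build_weighted_features(row, weights=FEATURE_WEIGHTS):
    """Build one lowercase string in which each feature is repeated `weight` times."""
    parts = []
    for column, weight in weights.items():
        value = str(row.get(column, "")).lower().strip()
        if value:
            parts.extend([value] * weight)
    return " ".join(parts)


# Made-up row for illustration (not from the Netflix dataset).
example_row = {
    "title": "Harbor Unit",
    "listed_in": "Crime TV Shows",
    "description": "Naval agents investigate crimes",
}
combined = build_weighted_features(example_row)
print(combined)
```

With a pandas DataFrame, the same helper could be applied row by row, for example with `df.apply(build_weighted_features, axis=1)`.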

**Advantages:**
- Can capture complex patterns and relationships in the data.
- The weighting mechanism allows for emphasizing important features.

**Disadvantages:**
- Requires more computational resources and time to train.
- Tuning the model parameters (e.g., encoding dimension, learning rate) can be challenging.
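
To make the roles of the encoding dimension and learning rate concrete, here is a deliberately small autoencoder written in plain NumPy. It is a stand-in for illustration only: the notebook's class appears to use Keras (it saves an `.h5` model), and its architecture is not shown. The function names, the single `tanh` hidden layer, the random input matrix, and all hyperparameter values are assumptions.

```python
import numpy as np


def train_autoencoder(X, encoding_dim=2, learning_rate=0.5, epochs=300, seed=0):
    """Train a one-hidden-layer autoencoder with full-batch gradient descent.

    Returns the encoder weights and the mean-squared reconstruction loss per epoch.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    W_enc = rng.normal(scale=0.1, size=(n_features, encoding_dim))
    W_dec = rng.normal(scale=0.1, size=(encoding_dim, n_features))
    losses = []
    for _ in range(epochs):
        Z = np.tanh(X @ W_enc)           # encode: compress to encoding_dim
        X_hat = Z @ W_dec                # decode: reconstruct the input
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean-squared error through both layers.
        grad_out = 2.0 * err / err.size
        grad_W_dec = Z.T @ grad_out
        grad_pre = (grad_out @ W_dec.T) * (1.0 - Z ** 2)   # tanh derivative
        grad_W_enc = X.T @ grad_pre
        W_enc -= learning_rate * grad_W_enc
        W_dec -= learning_rate * grad_W_dec
    return W_enc, losses


def cosine_similarity_matrix(Z):
    """Pairwise cosine similarity between the rows of Z."""
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    unit = Z / np.clip(norms, 1e-12, None)
    return unit @ unit.T


# Random stand-in for the SVD-reduced TF-IDF matrix (6 titles, 8 features).
X = np.random.default_rng(1).random((6, 8))
W_enc, losses = train_autoencoder(X)
encoded = np.tanh(X @ W_enc)             # compressed representation per title
similarity = cosine_similarity_matrix(encoded)
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

A smaller `encoding_dim` forces stronger compression, and `learning_rate` controls how far each update moves the weights; both would need tuning on real data.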

In [10]:
from autoencoder_recommender import WeightedAutoencoderRecommendationSystem

autoencoder_rec = WeightedAutoencoderRecommendationSystem(file_path)

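# Run one of the two options below before requesting recommendations.
# The epoch log below comes from a training run (Option B).

# Option A: uncomment to reload a previously trained model
# autoencoder_rec.load_model('./autoencoder_recommender_model.h5')

# Option B: uncomment to train a new model and save it
# history = autoencoder_rec.train_autoencoder(epochs=10)
# autoencoder_rec.save_model('./autoencoder_recommender_model.h5')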

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10



In [11]:
autoencoder_rec_results = autoencoder_rec.get_recommendations(title)
autoencoder_rec_results

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,combined_features,similarity_score
3964,s3965,TV Show,Osmosis,Unknown,"Agathe Bonitzer, Hugo Becker, Gaël Kamilindi,...",France,2019-03-29,2019,TV-MA,1 Season,"International TV Shows, TV Dramas, TV Mysteries","In near-future Paris, two brilliant siblings u...",osmosis osmosis osmosis osmosis osmosis agat...,0.990445
3577,s3578,TV Show,Better Than Us,Unknown,"Paulina Andreeva, Kirill Käro, Aleksandr Usty...",Russia,2019-08-16,2019,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Dramas",A family on the brink of splitting up become t...,better than us better than us better than us b...,0.988784
5412,s5413,TV Show,Criminal Minds,Unknown,"Mandy Patinkin, Joe Mantegna, Thomas Gibson, S...","United States, Canada",2017-06-30,2017,TV-14,12 Seasons,"Crime TV Shows, TV Dramas, TV Mysteries",This intense police procedural follows a group...,criminal minds criminal minds criminal minds c...,0.98878
8493,s8511,TV Show,The Sniffer,Unknown,"Kirill Käro, Ivan Oganesyan, Mariya Anikanova...",Ukraine,2018-06-01,2017,TV-MA,3 Seasons,"Crime TV Shows, International TV Shows, TV Dramas",An extraordinary sense of smell gives a crime ...,the sniffer the sniffer the sniffer the sniffe...,0.988687
3010,s3011,TV Show,Ares,Unknown,"Jade Olieberg, Tobias Kersloot, Lisa Smit, Fri...",Netherlands,2020-01-17,2020,TV-MA,1 Season,"International TV Shows, TV Dramas, TV Horror","Aiming to become part of Amsterdam's elite, an...","ares ares ares ares ares jade olieberg, tobi...",0.988574
4741,s4742,TV Show,Switched,Unknown,"Daiki Shigeoka, Tomohiro Kamiyama, Kaya Kiyoha...",Japan,2018-08-01,2018,TV-MA,1 Season,"International TV Shows, TV Dramas, TV Mysteries",High schooler Ayumi's perfect world evaporates...,switched switched switched switched switched ...,0.988431
5083,s5084,Movie,Milada,David Mrnka,"Ayelet Zurer, Robert Gant, Vica Kerekes, Aňa ...","Czech Republic, United States",2018-01-12,2017,TV-14,124 min,"Dramas, International Movies",Politician and human rights campaigner Milada ...,milada milada milada milada milada david mrnk...,0.988379
5721,s5723,TV Show,Case,Unknown,"Steinunn Ólína Þorsteinsdóttir, Magnús J...",Iceland,2016-11-09,2015,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Dramas",A smart lawyer whose drinking and recklessness...,case case case case case steinunn ólína ...,0.988163
2568,s2569,TV Show,Hangar 1: The UFO Files,Unknown,Unknown,Unknown,2020-05-02,2015,TV-PG,1 Season,Docuseries,Researchers add context and clarity to UFO mys...,hangar 1: the ufo files hangar 1: the ufo file...,0.987989
1880,s1881,TV Show,To the Lake,Pavel Kostomarov,"Viktoriya Isakova, Kirill Käro, Aleksandr Rob...",Russia,2020-10-07,2020,TV-MA,1 Season,"International TV Shows, TV Dramas, TV Mysteries",Facing the end of civilization when a terrifyi...,to the lake to the lake to the lake to the lak...,0.987916


# 4. Comparison of the Recommendation Systems

In [12]:
# Create DataFrames for each recommender system with rank
count_vec_rec_results['Rank'] = range(1, len(count_vec_rec_results) + 1)
tfidf_svd_rec_results['Rank'] = range(1, len(tfidf_svd_rec_results) + 1)
autoencoder_rec_results['Rank'] = range(1, len(autoencoder_rec_results) + 1)

count_vec_rec_results = count_vec_rec_results.rename(columns={'title': 'CountVectorizer'})
tfidf_svd_rec_results = tfidf_svd_rec_results.rename(columns={'title': 'TF-IDF with SVD'})
autoencoder_rec_results = autoencoder_rec_results.rename(columns={'title': 'Weighted Autoencoder'})

comparison_df = pd.merge(count_vec_rec_results[['CountVectorizer', 'Rank']],
                         tfidf_svd_rec_results[['TF-IDF with SVD', 'Rank']],
                         on='Rank', how='outer')
comparison_df = pd.merge(comparison_df,
                         autoencoder_rec_results[['Weighted Autoencoder', 'Rank']],
                         on='Rank', how='outer')

comparison_df = comparison_df.sort_values(by='Rank').reset_index(drop=True)
comparison_df = comparison_df[['Rank', 'CountVectorizer', 'TF-IDF with SVD', 'Weighted Autoencoder']]

print("Comparison of Recommendations:")
comparison_df

Comparison of Recommendations:


Unnamed: 0,Rank,CountVectorizer,TF-IDF with SVD,Weighted Autoencoder
0,1,MINDHUNTER,Manhunt: Deadly Games,Osmosis
1,2,Re:Mind,Criminal Minds,Better Than Us
2,3,Get Shorty,Coin Heist,Criminal Minds
3,4,Criminal Minds,Quantico,The Sniffer
4,5,Inside the Criminal Mind,MINDHUNTER,Ares
5,6,The Mole,Somewhere Between,Switched
6,7,The Good Cop,The Fosters,Milada
7,8,Father Brown,Damnation,Case
8,9,Argon,Beauty & the Beast,Hangar 1: The UFO Files
9,10,The Bachelorette,Imposters,To the Lake


In [13]:
count_vec_titles = count_vec_rec_results['CountVectorizer'].tolist()
tfidf_svd_titles = tfidf_svd_rec_results['TF-IDF with SVD'].tolist()
autoencoder_titles = autoencoder_rec_results['Weighted Autoencoder'].tolist()

count_vec_set = set(count_vec_titles)
tfidf_svd_set = set(tfidf_svd_titles)
autoencoder_set = set(autoencoder_titles)

# Titles recommended by all three systems
all_three = count_vec_set & tfidf_svd_set & autoencoder_set

# Titles recommended by exactly two systems.
# Parentheses make the grouping explicit: `-` binds tighter than `&` in Python.
count_vec_and_tfidf_svd = (count_vec_set & tfidf_svd_set) - autoencoder_set
count_vec_and_autoencoder = (count_vec_set & autoencoder_set) - tfidf_svd_set
tfidf_svd_and_autoencoder = (tfidf_svd_set & autoencoder_set) - count_vec_set
exactly_two = count_vec_and_tfidf_svd | count_vec_and_autoencoder | tfidf_svd_and_autoencoder

# Titles recommended by only one system
only_count_vec = count_vec_set - tfidf_svd_set - autoencoder_set
only_tfidf_svd = tfidf_svd_set - count_vec_set - autoencoder_set
only_autoencoder = autoencoder_set - count_vec_set - tfidf_svd_set
only_one = only_count_vec | only_tfidf_svd | only_autoencoder

all_three_df = pd.DataFrame(all_three, columns=['Titles'])
exactly_two_df = pd.DataFrame(exactly_two, columns=['Titles'])
only_one_df = pd.DataFrame(only_one, columns=['Titles'])

print(f"\nMovies recommended by all three systems ({len(all_three_df)}):")
all_three_df


Movies recommended by all three systems (1):


Unnamed: 0,Titles
0,Criminal Minds


In [14]:
print(f"\nMovies recommended by exactly two systems ({len(exactly_two_df)}):")
exactly_two_df


Movies recommended by exactly two systems (1):


Unnamed: 0,Titles
0,MINDHUNTER


In [15]:
print(f"\nMovies recommended by only one system ({len(only_one_df)}):")
only_one_df


Movies recommended by only one system (25):


Unnamed: 0,Titles
0,Coin Heist
1,Switched
2,Case
3,Beauty & the Beast
4,Quantico
5,Re:Mind
6,Better Than Us
7,Argon
8,To the Lake
9,Somewhere Between


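With all three recommender systems run on the same query (`NCIS`, top 10 each), we can compare their outputs by the diversity and overlap of the recommendations. Overlap measures how much the systems agree with each other; it does not by itself show which system recommends better titles.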

#### Summary

- **Diversity of Recommendations**: The 30 recommendations cover 27 distinct titles, 25 of which come from a single system. Each algorithm evidently weighs the features differently and surfaces different suggestions.
- **Overlap and Agreement**: Only *Criminal Minds* is recommended by all three systems, and only *MINDHUNTER* by exactly two (CountVectorizer and TF-IDF with SVD). The Weighted Autoencoder shares a single title with the other two, and its top-10 similarity scores fall in a narrow band (about 0.988 to 0.990), which suggests that its encoding separates candidates only weakly.
- **System Strengths**: Because the systems rarely agree, a hybrid approach that combines their scores could provide more comprehensive recommendations than any single system.
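
The overlap can also be quantified per pair of systems. The sketch below computes the Jaccard index (intersection size divided by union size) from the three top-10 lists in the comparison table above; the `jaccard` helper and the choice of this metric are additions for illustration, not part of the original analysis.

```python
from itertools import combinations

# Top-10 titles per system, copied from the comparison table.
recommendations = {
    "CountVectorizer": [
        "MINDHUNTER", "Re:Mind", "Get Shorty", "Criminal Minds",
        "Inside the Criminal Mind", "The Mole", "The Good Cop",
        "Father Brown", "Argon", "The Bachelorette",
    ],
    "TF-IDF with SVD": [
        "Manhunt: Deadly Games", "Criminal Minds", "Coin Heist", "Quantico",
        "MINDHUNTER", "Somewhere Between", "The Fosters", "Damnation",
        "Beauty & the Beast", "Imposters",
    ],
    "Weighted Autoencoder": [
        "Osmosis", "Better Than Us", "Criminal Minds", "The Sniffer", "Ares",
        "Switched", "Milada", "Case", "Hangar 1: The UFO Files", "To the Lake",
    ],
}


def jaccard(a, b):
    """Jaccard index of two title lists: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Jaccard index for every pair of systems.
pairwise = {
    (left, right): jaccard(recommendations[left], recommendations[right])
    for left, right in combinations(recommendations, 2)
}

for (left, right), score in pairwise.items():
    print(f"{left} vs {right}: {score:.3f}")
```

A value of 0 means two lists share no titles and 1 means they are identical, so low values here correspond to the limited agreement noted in the summary.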
