# StreamSense - Netflix Recommendation Engine using Content-Based Filtering

# 1. Introduction

The goal of this project is to develop a content-based recommendation engine for movies and TV shows on Netflix. I will compare two different methods:

1. Using *cast, director, country, rating and genres* as features.
2. Using the words in the movie/TV show *descriptions* as features.
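
Both methods rest on the same idea: each title is turned into a binary feature vector, and two titles are compared by the cosine of the angle between their vectors. The following is a minimal sketch on toy vectors; the feature values are made up for illustration and are not taken from the dataset.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two 1-D feature vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical binary features, e.g. [actor_x, actor_y, genre_horror, genre_drama]
title_a = [1, 1, 0, 1]
title_b = [1, 0, 0, 1]
title_c = [0, 0, 1, 0]

sim_ab = cosine_similarity(title_a, title_b)  # the two titles share two features
sim_ac = cosine_similarity(title_a, title_c)  # the two titles share no features
print(round(sim_ab, 4), sim_ac)
```

Titles that share more features score closer to 1, and titles with nothing in common score 0.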

# 2. Imports

In [1]:
import numpy as np
import pandas as pd
import re

import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')
from nltk.tokenize import word_tokenize

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


# 3. Loading data

In [2]:
data = pd.read_csv('/content/netflix_titles.csv')
data.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,TV Show,3%,,"João Miguel, Bianca Comparato, Michel Gomes, R...",Brazil,"August 14, 2020",2020,TV-MA,4 Seasons,"International TV Shows, TV Dramas, TV Sci-Fi &...",In a future where the elite inhabit an island ...
1,s2,Movie,7:19,Jorge Michel Grau,"Demián Bichir, Héctor Bonilla, Oscar Serrano, ...",Mexico,"December 23, 2016",2016,TV-MA,93 min,"Dramas, International Movies",After a devastating earthquake hits Mexico Cit...
2,s3,Movie,23:59,Gilbert Chan,"Tedd Chan, Stella Chung, Henley Hii, Lawrence ...",Singapore,"December 20, 2018",2011,R,78 min,"Horror Movies, International Movies","When an army recruit is found dead, his fellow..."
3,s4,Movie,9,Shane Acker,"Elijah Wood, John C. Reilly, Jennifer Connelly...",United States,"November 16, 2017",2009,PG-13,80 min,"Action & Adventure, Independent Movies, Sci-Fi...","In a postapocalyptic world, rag-doll robots hi..."
4,s5,Movie,21,Robert Luketic,"Jim Sturgess, Kevin Spacey, Kate Bosworth, Aar...",United States,"January 1, 2020",2008,PG-13,123 min,Dramas,A brilliant group of students become card-coun...


In [3]:
data.groupby('type').count()

type,show_id,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
Movie,5377,5377,5214,4951,5147,5377,5377,5372,5377,5377,5377
TV Show,2410,2410,184,2118,2133,2400,2410,2408,2410,2410,2410


In [4]:
data = data.dropna(subset=['cast', 'country', 'rating'])

# 4. Developing Recommendation Engine using *cast, director, country, rating and genres*

In [5]:
movies = data[data['type'] == 'Movie'].reset_index()
movies = movies.drop(['index', 'show_id', 'type', 'date_added', 'release_year', 'duration', 'description'], axis=1)
movies.head()

Unnamed: 0,title,director,cast,country,rating,listed_in
0,7:19,Jorge Michel Grau,"Demián Bichir, Héctor Bonilla, Oscar Serrano, ...",Mexico,TV-MA,"Dramas, International Movies"
1,23:59,Gilbert Chan,"Tedd Chan, Stella Chung, Henley Hii, Lawrence ...",Singapore,R,"Horror Movies, International Movies"
2,9,Shane Acker,"Elijah Wood, John C. Reilly, Jennifer Connelly...",United States,PG-13,"Action & Adventure, Independent Movies, Sci-Fi..."
3,21,Robert Luketic,"Jim Sturgess, Kevin Spacey, Kate Bosworth, Aar...",United States,PG-13,Dramas
4,122,Yasir Al Yasiri,"Amina Khalil, Ahmed Dawood, Tarek Lotfy, Ahmed...",Egypt,TV-MA,"Horror Movies, International Movies"


In [6]:
tv = data[data['type'] == 'TV Show'].reset_index()
tv = tv.drop(['index', 'show_id', 'type', 'date_added', 'release_year', 'duration', 'description'], axis=1)
tv.head()

Unnamed: 0,title,director,cast,country,rating,listed_in
0,3%,,"João Miguel, Bianca Comparato, Michel Gomes, R...",Brazil,TV-MA,"International TV Shows, TV Dramas, TV Sci-Fi &..."
1,46,Serdar Akar,"Erdal Beşikçioğlu, Yasemin Allen, Melis Birkan...",Turkey,TV-MA,"International TV Shows, TV Dramas, TV Mysteries"
2,1983,,"Robert Więckiewicz, Maciej Musiał, Michalina O...","Poland, United States",TV-MA,"Crime TV Shows, International TV Shows, TV Dramas"
3,​SAINT SEIYA: Knights of the Zodiac,,"Bryson Baugus, Emily Neves, Blake Shepard, Pat...",Japan,TV-14,"Anime Series, International TV Shows"
4,#blackAF,,"Kenya Barris, Rashida Jones, Iman Benson, Genn...",United States,TV-MA,TV Comedies


In [7]:
actors = []

for i in movies['cast']:
    actor = re.split(r', \s*', i)
    actors.append(actor)

flat_list = []
for sublist in actors:
    for item in sublist:
        flat_list.append(item)

actors_list = sorted(set(flat_list))

binary_actors = [[] for _ in range(len(actors_list))]

for i in movies['cast']:
    k = 0
    for j in actors_list:
        if j in i:
            binary_actors[k].append(1.0)
        else:
            binary_actors[k].append(0.0)
        k+=1

binary_actors = pd.DataFrame(binary_actors).transpose()

directors = []

for i in movies['director']:
    if pd.notna(i):
        director = re.split(r', \s*', i)
        directors.append(director)

flat_list2 = []
for sublist in directors:
    for item in sublist:
        flat_list2.append(item)

directors_list = sorted(set(flat_list2))

binary_directors = [[] for _ in range(len(directors_list))]

for i in movies['director']:
    k = 0
    for j in directors_list:
        if pd.isna(i):
            binary_directors[k].append(0.0)
        elif j in i:
            binary_directors[k].append(1.0)
        else:
            binary_directors[k].append(0.0)
        k+=1

binary_directors = pd.DataFrame(binary_directors).transpose()

countries = []

for i in movies['country']:
    country = re.split(r', \s*', i)
    countries.append(country)

flat_list3 = []
for sublist in countries:
    for item in sublist:
        flat_list3.append(item)

countries_list = sorted(set(flat_list3))

binary_countries = [[] for _ in range(len(countries_list))]

for i in movies['country']:
    k = 0
    for j in countries_list:
        if j in i:
            binary_countries[k].append(1.0)
        else:
            binary_countries[k].append(0.0)
        k+=1

binary_countries = pd.DataFrame(binary_countries).transpose()

genres = []

for i in movies['listed_in']:
    genre = re.split(r', \s*', i)
    genres.append(genre)

flat_list4 = []
for sublist in genres:
    for item in sublist:
        flat_list4.append(item)

genres_list = sorted(set(flat_list4))

binary_genres = [[] for _ in range(len(genres_list))]

for i in movies['listed_in']:
    k = 0
    for j in genres_list:
        if j in i:
            binary_genres[k].append(1.0)
        else:
            binary_genres[k].append(0.0)
        k+=1

binary_genres = pd.DataFrame(binary_genres).transpose()

ratings = []

for i in movies['rating']:
    ratings.append(i)

ratings_list = sorted(set(ratings))

binary_ratings = [[] for _ in range(len(ratings_list))]

for i in movies['rating']:
    k = 0
    for j in ratings_list:
        if j in i:
            binary_ratings[k].append(1.0)
        else:
            binary_ratings[k].append(0.0)
        k+=1

binary_ratings = pd.DataFrame(binary_ratings).transpose()
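
The loops above test membership with `j in i`, which is a substring check on the raw string, so a label can match an entry that merely contains it (for example "PG" inside "PG-13"). If exact matches are wanted instead, one possible variation splits each entry first and compares whole tokens. The helper name `multi_hot` and the sample values are assumptions for illustration and are not part of the notebook; swapping this in would likely change the similarity scores shown later.

```python
import re

import numpy as np
import pandas as pd

def multi_hot(series):
    """Encode a comma-separated string column as exact-match 0/1 columns."""
    # Missing values become an empty token list, i.e. an all-zero row.
    token_lists = [re.split(r',\s*', s) if isinstance(s, str) else [] for s in series]
    vocab = sorted({t for tokens in token_lists for t in tokens})
    column_of = {t: k for k, t in enumerate(vocab)}
    matrix = np.zeros((len(token_lists), len(vocab)))
    for row, tokens in enumerate(token_lists):
        for t in tokens:
            matrix[row, column_of[t]] = 1.0
    return pd.DataFrame(matrix, columns=vocab, index=series.index)

# Hypothetical sample column; None stands in for a missing value.
sample = pd.Series(["Dramas, International Movies", "TV Dramas", None])
encoded = multi_hot(sample)
print(encoded)
```

With whole-token matching, the second row is flagged only for "TV Dramas" and not for "Dramas".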

In [8]:
binary = pd.concat([binary_actors, binary_directors, binary_countries, binary_genres], axis=1, ignore_index=True)
binary

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,26570,26571,26572,26573,26574,26575,26576,26577,26578,26579
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4756,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
4757,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
4758,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
4759,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0


In [9]:
actors2 = []

for i in tv['cast']:
    actor2 = re.split(r', \s*', i)
    actors2.append(actor2)

flat_list5 = []
for sublist in actors2:
    for item in sublist:
        flat_list5.append(item)

actors_list2 = sorted(set(flat_list5))

binary_actors2 = [[] for _ in range(len(actors_list2))]

for i in tv['cast']:
    k = 0
    for j in actors_list2:
        if j in i:
            binary_actors2[k].append(1.0)
        else:
            binary_actors2[k].append(0.0)
        k+=1

binary_actors2 = pd.DataFrame(binary_actors2).transpose()


countries2 = []

for i in tv['country']:
    country2 = re.split(r', \s*', i)
    countries2.append(country2)

flat_list6 = []
for sublist in countries2:
    for item in sublist:
        flat_list6.append(item)

countries_list2 = sorted(set(flat_list6))

binary_countries2 = [[] for _ in range(len(countries_list2))]

for i in tv['country']:
    k = 0
    for j in countries_list2:
        if j in i:
            binary_countries2[k].append(1.0)
        else:
            binary_countries2[k].append(0.0)
        k+=1

binary_countries2 = pd.DataFrame(binary_countries2).transpose()

genres2 = []

for i in tv['listed_in']:
    genre2 = re.split(r', \s*', i)
    genres2.append(genre2)

flat_list7 = []
for sublist in genres2:
    for item in sublist:
        flat_list7.append(item)

genres_list2 = sorted(set(flat_list7))

binary_genres2 = [[] for _ in range(len(genres_list2))]

for i in tv['listed_in']:
    k = 0
    for j in genres_list2:
        if j in i:
            binary_genres2[k].append(1.0)
        else:
            binary_genres2[k].append(0.0)
        k+=1

binary_genres2 = pd.DataFrame(binary_genres2).transpose()

ratings2 = []

for i in tv['rating']:
    ratings2.append(i)

ratings_list2 = sorted(set(ratings2))

binary_ratings2 = [[] for _ in range(len(ratings_list2))]

for i in tv['rating']:
    k = 0
    for j in ratings_list2:
        if j in i:
            binary_ratings2[k].append(1.0)
        else:
            binary_ratings2[k].append(0.0)
        k+=1

binary_ratings2 = pd.DataFrame(binary_ratings2).transpose()

In [10]:
binary2 = pd.concat([binary_actors2, binary_countries2, binary_genres2], axis=1, ignore_index=True)
binary2

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,12741,12742,12743,12744,12745,12746,12747,12748,12749,12750
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1886,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1887,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0
1888,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0


In [11]:
def recommender(search):
    cs_list = []
    binary_list = []
    if search in movies['title'].values:
        idx = movies[movies['title'] == search].index.item()
        for i in binary.iloc[idx]:
            binary_list.append(i)
        point1 = np.array(binary_list).reshape(1, -1)
        point1 = [val for sublist in point1 for val in sublist]
        for j in range(len(movies)):
            binary_list2 = []
            for k in binary.iloc[j]:
                binary_list2.append(k)
            point2 = np.array(binary_list2).reshape(1, -1)
            point2 = [val for sublist in point2 for val in sublist]
            dot_product = np.dot(point1, point2)
            norm_1 = np.linalg.norm(point1)
            norm_2 = np.linalg.norm(point2)
            cos_sim = dot_product / (norm_1 * norm_2)
            cs_list.append(cos_sim)
        movies_copy = movies.copy()
        movies_copy['cos_sim'] = cs_list
        results = movies_copy.sort_values('cos_sim', ascending=False)
        results = results[results['title'] != search]
        top_results = results.head(5)
        return top_results
    elif search in tv['title'].values:
        idx = tv[tv['title'] == search].index.item()
        for i in binary2.iloc[idx]:
            binary_list.append(i)
        point1 = np.array(binary_list).reshape(1, -1)
        point1 = [val for sublist in point1 for val in sublist]
        for j in range(len(tv)):
            binary_list2 = []
            for k in binary2.iloc[j]:
                binary_list2.append(k)
            point2 = np.array(binary_list2).reshape(1, -1)
            point2 = [val for sublist in point2 for val in sublist]
            dot_product = np.dot(point1, point2)
            norm_1 = np.linalg.norm(point1)
            norm_2 = np.linalg.norm(point2)
            cos_sim = dot_product / (norm_1 * norm_2)
            cs_list.append(cos_sim)
        tv_copy = tv.copy()
        tv_copy['cos_sim'] = cs_list
        results = tv_copy.sort_values('cos_sim', ascending=False)
        results = results[results['title'] != search]
        top_results = results.head(5)
        return top_results
    else:
        return "Title not in dataset. Please check spelling."
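
The function above recomputes each row vector and its norm inside a Python loop. As a rough sketch of the same cosine-similarity ranking done in one matrix operation, the hypothetical helper `top_similar` below (not part of the notebook) takes a feature matrix and a row position. It drops the query row by index label rather than by title, which is an assumption that differs slightly from the original filtering.

```python
import numpy as np
import pandas as pd

def top_similar(features, idx, n=5):
    """Return the n rows most cosine-similar to the row at position idx."""
    matrix = features.to_numpy(dtype=float)
    norms = np.linalg.norm(matrix, axis=1)
    dots = matrix @ matrix[idx]
    # All-zero rows give 0/0 = NaN; sort_values places NaN last by default.
    with np.errstate(divide='ignore', invalid='ignore'):
        sims = dots / (norms * norms[idx])
    sims = pd.Series(sims, index=features.index).drop(features.index[idx])
    return sims.sort_values(ascending=False).head(n)

# Hypothetical 4-title, 3-feature matrix
toy = pd.DataFrame([[1, 1, 0], [1, 0, 0], [0, 0, 1], [1, 1, 0]])
ranked = top_similar(toy, idx=0, n=2)
print(ranked)
```

Under these assumptions, a call such as `top_similar(binary, idx)` would return similarity scores indexed like `movies`, which could then be joined back to the titles.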

## 4.1. Recommending Movies

In [12]:
recommender('The Conjuring')

Unnamed: 0,title,director,cast,country,rating,listed_in,cos_sim
1868,Insidious,James Wan,"Patrick Wilson, Rose Byrne, Lin Shaye, Ty Simp...","United States, Canada, United Kingdom",PG-13,"Horror Movies, Thrillers",0.388922
968,Creep,Patrick Brice,"Mark Duplass, Patrick Brice",United States,R,"Horror Movies, Independent Movies, Thrillers",0.377964
1844,In the Tall Grass,Vincenzo Natali,"Patrick Wilson, Laysla De Oliveira, Avery Whit...","Canada, United States",TV-MA,"Horror Movies, Thrillers",0.370625
969,Creep 2,Patrick Brice,"Mark Duplass, Desiree Akhavan, Karan Soni",United States,TV-MA,"Horror Movies, Independent Movies, Thrillers",0.356348
1077,Desolation,Sam Patton,"Jaimi Paige, Alyshia Ochse, Toby Nichols, Clau...",United States,TV-MA,"Horror Movies, Thrillers",0.356348


In [14]:
recommender("Child's Play")

Unnamed: 0,title,director,cast,country,rating,listed_in,cos_sim
975,Cult of Chucky,Don Mancini,"Fiona Dourif, Michael Therriault, Adam Hurtig,...",United States,R,Horror Movies,0.3125
4669,Wildling,Fritz Böhm,"Bel Powley, Brad Dourif, Liv Tyler, Collin Kel...",United States,R,"Horror Movies, Independent Movies, Sci-Fi & Fa...",0.301511
3592,Stephanie,Akiva Goldsman,"Shree Cooks, Frank Grillo, Anna Torv",United States,R,Horror Movies,0.283473
789,Candyman,Bernard Rose,"Virginia Madsen, Tony Todd, Xander Berkeley, K...","United States, United Kingdom",R,"Cult Movies, Horror Movies",0.267261
1316,Family Blood,Sonny Mallhi,"James Ransone, Vinessa Shaw, Ajiona Alexus, Co...",United States,TV-MA,Horror Movies,0.265165


## 4.2. Recommending TV shows

In [15]:
recommender('After Life')

Unnamed: 0,title,director,cast,country,rating,listed_in,cos_sim
468,Extras,,"Ricky Gervais, Stephen Merchant, Ashley Jensen...","United Kingdom, United States",TV-MA,"British TV Shows, TV Comedies",0.526235
51,Africa,,David Attenborough,United Kingdom,TV-PG,"British TV Shows, Docuseries, International TV...",0.452911
1488,The Code,,Marcus du Sautoy,United Kingdom,TV-PG,"British TV Shows, Docuseries, International TV...",0.452911
463,Everyday Miracles,,Mark Miodownik,United Kingdom,TV-PG,"British TV Shows, Docuseries, International TV...",0.452911
1468,The Blue Planet: A Natural History of the Oceans,Alastair Fothergill,David Attenborough,United Kingdom,TV-G,"British TV Shows, Docuseries, International TV...",0.452911


In [16]:
recommender('Anne with an E')

Unnamed: 0,title,director,cast,country,rating,listed_in,cos_sim
254,Can You Hear Me?,,"Mélissa Bédard, Ève Landry, Florence Longpré",Canada,TV-MA,"International TV Shows, TV Comedies, TV Dramas",0.46225
1229,Restaurants on the Edge,,"Nick Liberato, Karin Bohn, Dennis Prescott",Canada,TV-14,"International TV Shows, Reality TV",0.392232
185,Bitten,,"Laura Vandervoort, Greyston Holt, Greg Bryk, S...",Canada,TV-MA,"International TV Shows, TV Dramas, TV Horror",0.384615
645,Hip-Hop Evolution,,Shad Kabango,Canada,TV-MA,"Docuseries, International TV Shows",0.372104
622,Heavy Rescue: 401,,Dave Pettitt,Canada,TV-MA,"International TV Shows, Reality TV",0.372104


# 5. Developing Recommendation Engine using *Movie/TV show descriptions*

In [17]:
movies_des = data[data['type'] == 'Movie'].reset_index()
movies_des = movies_des[['title', 'description']]
movies_des.head()

Unnamed: 0,title,description
0,7:19,After a devastating earthquake hits Mexico Cit...
1,23:59,"When an army recruit is found dead, his fellow..."
2,9,"In a postapocalyptic world, rag-doll robots hi..."
3,21,A brilliant group of students become card-coun...
4,122,"After an awful accident, a couple admitted to ..."


In [18]:
tv_des = data[data['type'] == 'TV Show'].reset_index()
tv_des = tv_des[['title', 'description']]
tv_des.head()

Unnamed: 0,title,description
0,3%,In a future where the elite inhabit an island ...
1,46,A genetics professor experiments with a treatm...
2,1983,"In this dark alt-history thriller, a naïve law..."
3,​SAINT SEIYA: Knights of the Zodiac,Seiya and the Knights of the Zodiac rise again...
4,#blackAF,Kenya Barris and his family navigate relations...


In [24]:
import nltk
nltk.download('punkt')
nltk.download('punkt_tab')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


True

In [25]:
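filtered_movies = []
movies_words = []
# Build the stopword set once; stopwords.words() with no argument returns every language's list.
stop_words = set(stopwords.words())

for text in movies_des['description']:
    text_tokens = word_tokenize(text)
    # The stopword check runs before lowercasing, so capitalized stopwords (e.g. "After") are kept.
    tokens_without_sw = [word.lower() for word in text_tokens if word not in stop_words]
    movies_words.append(tokens_without_sw)
    filtered = " ".join(tokens_without_sw)
    filtered_movies.append(filtered)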

movies_words = [val for sublist in movies_words for val in sublist]
movies_words = sorted(set(movies_words))
movies_des['description_filtered'] = filtered_movies
movies_des.head()

Unnamed: 0,title,description,description_filtered
0,7:19,After a devastating earthquake hits Mexico Cit...,after devastating earthquake hits mexico city ...
1,23:59,"When an army recruit is found dead, his fellow...","when army recruit found dead , fellow soldiers..."
2,9,"In a postapocalyptic world, rag-doll robots hi...","in postapocalyptic world , rag-doll robots hid..."
3,21,A brilliant group of students become card-coun...,a brilliant group students card-counting exper...
4,122,"After an awful accident, a couple admitted to ...","after awful accident , couple admitted grisly ..."
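
In the output above, capitalized words such as "After" and "A" survive because the stopword test runs on the original-case token, and punctuation is kept as separate tokens. The following sketch shows one variation that lowercases first and keeps only word-like tokens. The small `STOP_WORDS` set, the regex tokenizer, and the sample sentence are stand-ins chosen so the example runs without NLTK data; they are not the notebook's method.

```python
import re

# Illustrative subset only; a real run would use a full stopword list.
STOP_WORDS = {"a", "an", "the", "after", "in", "his", "to", "is", "when"}

def filter_description(text, stop_words=STOP_WORDS):
    """Lowercase, tokenize on word characters, then drop stopwords."""
    # Keeps hyphenated and apostrophized words such as "rag-doll" as single tokens.
    tokens = re.findall(r"\w+(?:[-']\w+)*", text.lower())
    return [t for t in tokens if t not in stop_words]

# Hypothetical sample sentence
filtered = filter_description("After a devastating earthquake, rag-doll robots hide.")
print(" ".join(filtered))
```

Lowercasing before the membership test is what lets "After" match the lowercase entry in the stopword set.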


In [26]:
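filtered_tv = []
tv_words = []
# Build the stopword set once; stopwords.words() with no argument returns every language's list.
stop_words = set(stopwords.words())

for text in tv_des['description']:
    text_tokens = word_tokenize(text)
    # The stopword check runs before lowercasing, so capitalized stopwords (e.g. "In") are kept.
    tokens_without_sw = [word.lower() for word in text_tokens if word not in stop_words]
    tv_words.append(tokens_without_sw)
    filtered = " ".join(tokens_without_sw)
    filtered_tv.append(filtered)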

tv_words = [val for sublist in tv_words for val in sublist]
tv_words = sorted(set(tv_words))
tv_des['description_filtered'] = filtered_tv
tv_des.head()

Unnamed: 0,title,description,description_filtered
0,3%,In a future where the elite inhabit an island ...,in future elite inhabit island paradise crowde...
1,46,A genetics professor experiments with a treatm...,a genetics professor experiments treatment com...
2,1983,"In this dark alt-history thriller, a naïve law...","in dark alt-history thriller , naïve law stude..."
3,​SAINT SEIYA: Knights of the Zodiac,Seiya and the Knights of the Zodiac rise again...,seiya knights zodiac rise protect reincarnatio...
4,#blackAF,Kenya Barris and his family navigate relations...,"kenya barris family navigate relationships , r..."


In [27]:
movie_word_binary = [[] for _ in range(len(movies_words))]

for des in movies_des['description_filtered']:
    k = 0
    for word in movies_words:
        if word in des:
            movie_word_binary[k].append(1.0)
        else:
            movie_word_binary[k].append(0.0)
        k+=1

movie_word_binary = pd.DataFrame(movie_word_binary).transpose()

In [28]:
tv_word_binary = [[] for _ in range(len(tv_words))]

for des in tv_des['description_filtered']:
    k = 0
    for word in tv_words:
        if word in des:
            tv_word_binary[k].append(1.0)
        else:
            tv_word_binary[k].append(0.0)
        k+=1

tv_word_binary = pd.DataFrame(tv_word_binary).transpose()
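
The test `word in des` above checks whether `word` occurs anywhere inside the description string, so a short token such as "a" is flagged for any description containing that letter. A sketch of a whole-token alternative is given below; the helper name `token_presence_matrix` and the sample descriptions are assumptions for illustration, and using it would likely change the scores reported in the following sections.

```python
import pandas as pd

def token_presence_matrix(filtered_descriptions):
    """Build a 0/1 matrix where a column is set only if the exact token occurs."""
    token_sets = [set(des.split()) for des in filtered_descriptions]
    vocab = sorted(set().union(*token_sets)) if token_sets else []
    rows = [[1.0 if word in tokens else 0.0 for word in vocab] for tokens in token_sets]
    return pd.DataFrame(rows, columns=vocab)

# Hypothetical filtered descriptions
descriptions = ["a brilliant group", "army recruit found dead"]
presence = token_presence_matrix(descriptions)

substring_hit = "a" in descriptions[1]  # substring check: matches the "a" in "army"
token_hit = presence.loc[1, "a"]        # whole-token check
print(substring_hit, token_hit)
```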

In [29]:
def recommender2(search):
    cs_list = []
    binary_list = []
    if search in movies_des['title'].values:
        idx = movies_des[movies_des['title'] == search].index.item()
        for i in movie_word_binary.iloc[idx]:
            binary_list.append(i)
        point1 = np.array(binary_list).reshape(1, -1)
        point1 = [val for sublist in point1 for val in sublist]
        for j in range(len(movies_des)):
            binary_list2 = []
            for k in movie_word_binary.iloc[j]:
                binary_list2.append(k)
            point2 = np.array(binary_list2).reshape(1, -1)
            point2 = [val for sublist in point2 for val in sublist]
            dot_product = np.dot(point1, point2)
            norm_1 = np.linalg.norm(point1)
            norm_2 = np.linalg.norm(point2)
            cos_sim = dot_product / (norm_1 * norm_2)
            cs_list.append(cos_sim)
        movies_copy = movies_des.copy()
        movies_copy['cos_sim'] = cs_list
        results = movies_copy.sort_values('cos_sim', ascending=False)
        results = results[results['title'] != search]
        top_results = results.head(5)
        return top_results
    elif search in tv_des['title'].values:
        idx = tv_des[tv_des['title'] == search].index.item()
        for i in tv_word_binary.iloc[idx]:
            binary_list.append(i)
        point1 = np.array(binary_list).reshape(1, -1)
        point1 = [val for sublist in point1 for val in sublist]
        for j in range(len(tv_des)):
            binary_list2 = []
            for k in tv_word_binary.iloc[j]:
                binary_list2.append(k)
            point2 = np.array(binary_list2).reshape(1, -1)
            point2 = [val for sublist in point2 for val in sublist]
            dot_product = np.dot(point1, point2)
            norm_1 = np.linalg.norm(point1)
            norm_2 = np.linalg.norm(point2)
            cos_sim = dot_product / (norm_1 * norm_2)
            cs_list.append(cos_sim)
        tv_copy = tv_des.copy()
        tv_copy['cos_sim'] = cs_list
        results = tv_copy.sort_values('cos_sim', ascending=False)
        results = results[results['title'] != search]
        top_results = results.head(5)
        return top_results
    else:
        return "Title not in dataset. Please check spelling."

## 5.1. Recommending Movies

In [30]:
pd.options.display.max_colwidth = 300
recommender2('The Conjuring')

Unnamed: 0,title,description,description_filtered,cos_sim
1632,Hard Lessons,"This drama based on real-life events tells the story of George McKenna, the tough, determined new principal of a notorious Los Angeles high school.","this drama based real-life events tells story george mckenna , tough , determined principal notorious los angeles high school .",0.489767
227,Adrishya,A family’s harmonious existence is interrupted when the young son begins showing symptoms of anxiety that seem linked to disturbing events at home.,a family ’ harmonious existence interrupted young begins showing symptoms anxiety linked disturbing events home .,0.473757
3335,Sat Sri Akal,"Based on true events, this moving story centers on a Punjabi family whose celebration of their faith endures in the face of conflicting attitudes.","based events , moving story centers punjabi family celebration faith endures conflicting attitudes .",0.472805
765,Bye Bye London,"In this satirical play, a wealthy Kuwaiti businessman travels to London for pleasure and encounters a host of eccentric characters.","in satirical play , wealthy kuwaiti businessman travels london pleasure encounters host eccentric characters .",0.460195
3910,The Eyes of My Mother,"At the remote farmhouse where she once witnessed a traumatic childhood event, a young woman develops a grisly fascination with violence.","at remote farmhouse witnessed traumatic childhood event , young woman develops grisly fascination violence .",0.456211


In [31]:
recommender2("Child's Play")

Unnamed: 0,title,description,description_filtered,cos_sim
1550,Good People,A struggling couple can't believe their luck when they find a stash of money in the apartment of a neighbor who was recently murdered.,a struggling couple n't believe luck find stash money apartment neighbor recently murdered .,0.431373
3172,Rehmataan,"As unemployment, drug addiction and corruption plague a society, this drama depicts the people who believe there’s still good in the world.","as unemployment , addiction corruption plague society , drama depicts believe ’ world .",0.4222
4439,Trip to Bhangarh: Asia's Most Haunted Place,"To amuse themselves, six college friends decide to pay a visit to a fortress believed by some to be the most haunted place in Asia.","to amuse , college friends decide pay visit fortress believed haunted place asia .",0.420084
2467,Material,"A dutiful son must hide his pursuit of stand-up comedy from his staunch father, who expects him to inherit his store and uphold their Muslim beliefs.","a dutiful hide pursuit stand-up comedy staunch father , expects inherit store uphold muslim beliefs .",0.419219
1299,"Extremely Wicked, Shockingly Evil and Vile",Single mother Liz falls for Ted Bundy and refuses to believe the truth about his crimes for years. A drama based on a true story.,single mother liz falls ted bundy refuses believe truth crimes years . a drama based story .,0.411665


## 5.2. Recommending TV shows

In [32]:
recommender2('After Life')

Unnamed: 0,title,description,description_filtered,cos_sim
1628,The Paper,A construction magnate takes over a struggling newspaper and attempts to wield editorial influence for power and personal gain.,a construction magnate takes struggling newspaper attempts wield editorial influence power personal .,0.500568
858,Longmire,This contemporary crime thriller focuses on a Wyoming sheriff who's rebuilding his life and career following the death of his wife.,this contemporary crime thriller focuses wyoming sheriff 's rebuilding life career death wife .,0.466252
1821,Welcome to the Family,"When an evicted single mom's estranged father dies, she and his second wife cover up his death after learning they've been written out of his will.","when evicted single mom 's estranged father , wife cover death learning 've written .",0.45
1848,Winter Sun,"Years after ruthless businessmen kill his father and order the death of his twin brother, a modest fisherman adopts a new persona to exact revenge.","years ruthless businessmen kill father order death twin brother , modest fisherman adopts persona exact revenge .",0.442807
534,Gentlemen and Gangsters,"Now on the run, a writer relates his previous year's escapades when he got sucked into the thrilling, sordid orbit of boxer and jazz man Henry Morgan.","now run , writer relates previous year 's escapades sucked thrilling , sordid orbit boxer jazz henry morgan .",0.433614


In [33]:
recommender2('Anne with an E')

Unnamed: 0,title,description,description_filtered,cos_sim
147,Bat Pat,"A curious and talkative bat finds spooky fun on the streets of Fogville, a town that's not as quiet as it seems, with a plucky girl and her brothers.","a curious talkative finds spooky fun streets fogville , town 's quiet , plucky girl brothers .",0.447722
519,From Dusk Till Dawn,Bank-robbing brothers encounter vengeful lawmen and demons south of the border in this original series based on Robert Rodriguez' cult horror film.,bank-robbing brothers encounter vengeful lawmen demons south border original series based robert rodriguez ' cult horror film .,0.42339
461,Eugenie Nights,"In 1940s Port Said, Kariman finds comfort and solace in the arms of an unhappily married man, who also happens to be her abusive husband's brother.","in 1940s port said , kariman finds comfort solace arms unhappily married , abusive husband 's brother .",0.411844
1196,Queen Sono,South African spy Queen Sono finds herself in a nefarious web of business and politics as she seeks to uncover the truth behind her mother's death.,south african spy queen sono finds nefarious web business politics seeks uncover truth mother 's death .,0.40794
683,Ice Fantasy,"The Ice Tribe prince journeys to a sacred mountain to vanquish evil but soon finds himself at war with the Fire Tribe, led by his long-lost brother.","the ice tribe prince journeys sacred mountain vanquish evil finds fire tribe , led long-lost brother .",0.402492
