# <font color ='Firebrick'><center> Netflix Recommendation system </center>

__Netflix is a subscription-based streaming platform that lets users watch movies and TV shows without advertisements. One reason for Netflix's popularity is its recommendation system, which suggests movies and TV shows based on each user's interests. The main objective of this project is to build a simple content-based recommendation system.__



# <font color='dimgray'> <center> I. Read & Understand the Dataset

In [1]:
#importing dependencies

import pandas as pd #for dataframe operations
import numpy as np #for numerical operations

#for graphs
import matplotlib.pyplot as plt 
import seaborn as sns 
sns.set(style= "darkgrid")

#to handle the warnings
from warnings import filterwarnings
filterwarnings('ignore')


In [2]:
# Load dataset

df = pd.read_csv("netflix_titles.csv")

In [3]:
df.head(3)   #top 3 records

|  | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 81145628 | Movie | Norm of the North: King Sized Adventure | Richard Finn, Tim Maltby | Alan Marriott, Andrew Toth, Brian Dobson, Cole... | United States, India, South Korea, China | September 9, 2019 | 2019 | TV-PG | 90 min | Children & Family Movies, Comedies | Before planning an awesome wedding for his gra... |
| 1 | 80117401 | Movie | Jandino: Whatever it Takes |  | Jandino Asporaat | United Kingdom | September 9, 2016 | 2016 | TV-MA | 94 min | Stand-Up Comedy | Jandino Asporaat riffs on the challenges of ra... |
| 2 | 70234439 | TV Show | Transformers Prime |  | Peter Cullen, Sumalee Montano, Frank Welker, J... | United States | September 8, 2018 | 2013 | TV-Y7-FV | 1 Season | Kids' TV | With the help of three human allies, the Autob... |


In [4]:
df.tail(3)   #bottom 3 records

|  | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6231 | 80116008 | Movie | Little Baby Bum: Nursery Rhyme Friends |  |  |  |  | 2016 |  | 60 min | Movies | Nursery rhymes and original music for children... |
| 6232 | 70281022 | TV Show | A Young Doctor's Notebook and Other Stories |  | Daniel Radcliffe, Jon Hamm, Adam Godley, Chris... | United Kingdom |  | 2013 | TV-MA | 2 Seasons | British TV Shows, TV Comedies, TV Dramas | Set during the Russian Revolution, this comic ... |
| 6233 | 70153404 | TV Show | Friends |  | Jennifer Aniston, Courteney Cox, Lisa Kudrow, ... | United States |  | 2003 | TV-14 | 10 Seasons | Classic & Cult TV, TV Comedies | This hit sitcom follows the merry misadventure... |


In [5]:
def missing_values_analysis(df):
    """Return the count and percentage of missing values for every column that has any."""
    na_columns = [col for col in df.columns if df[col].isnull().sum() > 0]
    n_miss = df[na_columns].isnull().sum().sort_values(ascending=True)
    ratio = (df[na_columns].isnull().sum() / df.shape[0] * 100).sort_values(ascending=True)
    # concat aligns both Series on the column name, giving one row per column
    missing_df = pd.concat([n_miss, np.round(ratio, 2)], axis=1, keys=['Missing Values', 'Ratio'])
    return missing_df

def overview(df):
    """Print the shape, dtypes, missing values and unique-value counts of a DataFrame."""
    print('\033[1;36mINITIAL DATASET OVERVIEW\033[0m')
    print("\033[1;3mSHAPE\033[0m".center(82, '-'))
    print('\033[1;3mRows:\033[0m {}'.format(df.shape[0]))
    print('\033[1;3mcolumns:\033[0m {}'.format(df.shape[1]))
    print("\033[1;3mTYPES\033[0m".center(82, '-'))
    print(df.dtypes)
    print("\033[1;3mMissing Values\033[0m".center(82, '-'))
    print(missing_values_analysis(df))
    print("\033[1;3mUnique Values\033[0m".center(82, '-'))
    print(df.nunique())

In [6]:
overview(df)

[1;36mINITIAL DATASET OVERVIEW[0m
---------------------------------[1;3mSHAPE[0m----------------------------------
[1;3mRows:[0m 6234
[1;3mcolumns:[0m 12
---------------------------------[1;3mTYPES[0m----------------------------------
show_id          int64
type            object
title           object
director        object
cast            object
country         object
date_added      object
release_year     int64
rating          object
duration        object
listed_in       object
description     object
dtype: object
-----------------------------Missing Values-----------------------------
            Missing Values  Ratio
rating                  10   0.16
date_added              11   0.18
country                476   7.64
cast                   570   9.14
director              1969  31.58
-----------------------------Unique Values------------------------------
show_id         6234
type               2
title           6172
director        3301
cast       

In [8]:
col = ['type','rating']

for c in col:
    print('\033[1;3m',c,'\033[0m')
    print(df[c].unique())
    print("".center(82,'-' ))

[1;3m type [0m
['Movie' 'TV Show']
----------------------------------------------------------------------------------
[1;3m rating [0m
['TV-PG' 'TV-MA' 'TV-Y7-FV' 'TV-Y7' 'TV-14' 'R' 'TV-Y' 'NR' 'PG-13' 'TV-G'
 'PG' 'G' nan 'UR' 'NC-17']
----------------------------------------------------------------------------------


**The dataset contains both TV shows and movies.**

# <font color='grey'><center> II. DATA PREPROCESSING

In [9]:
df.columns

Index(['show_id', 'type', 'title', 'director', 'cast', 'country', 'date_added',
       'release_year', 'rating', 'duration', 'listed_in', 'description'],
      dtype='object')

__Let’s select the columns we can use to build the Netflix recommendation system:__</div>

## Feature selection

In [22]:
data = df[["title", "description", "type", "listed_in"]]
data.head() 

|  | title | description | type | listed_in |
|---|---|---|---|---|
| 0 | Norm of the North: King Sized Adventure | Before planning an awesome wedding for his gra... | Movie | Children & Family Movies, Comedies |
| 1 | Jandino: Whatever it Takes | Jandino Asporaat riffs on the challenges of ra... | Movie | Stand-Up Comedy |
| 2 | Transformers Prime | With the help of three human allies, the Autob... | TV Show | Kids' TV |
| 3 | Transformers: Robots in Disguise | When a prison ship crash unleashes hundreds of... | TV Show | Kids' TV |
| 4 | #realityhigh | When nerdy high schooler Dani finally attracts... | Movie | Comedies |


__As the column names indicate:__

- `title` holds the names of the Netflix movies and TV shows.
- `description` gives a short summary of each movie's or TV show's story.
- `type` tells us whether a title is a movie or a TV show.
- `listed_in` lists the genres of each movie and TV show.
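The `listed_in` column packs several genres into one comma-separated string, which is why it works as text input for a vectorizer. As a minimal sketch (assuming the `", "` separator seen in the sample rows), the individual genres can be split out and counted with pandas. The five strings are copied from the `data.head()` output above; `genres` and `genre_counts` are illustrative names, not part of the notebook.

```python
import pandas as pd

# The five `listed_in` values from the data.head() sample above
genres = pd.Series([
    "Children & Family Movies, Comedies",
    "Stand-Up Comedy",
    "Kids' TV",
    "Kids' TV",
    "Comedies",
])

# Split each string into a list of genres, give every genre its own row, then count them
genre_counts = genres.str.split(", ").explode().value_counts()
print(genre_counts)
```

On the full dataset the same line (`data["listed_in"].str.split(", ").explode().value_counts()`) would show which genres dominate the catalogue.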

__Now let’s check for missing values:__

In [11]:
data.isnull().sum() #checking null values

title          0
description    0
type           0
listed_in      0
dtype: int64

__None of the features we are going to use contain null values.__

Let's look at a random sample of titles:

In [37]:
print(data.title.sample(10)) 

3586    The Original Kings of Comedy
4109                         Shikari
3074                        Scream 3
1717                     It's Bruno!
4797      The Laws of Thermodynamics
1793                 The Jungle Book
1924                       Apollo 18
2670                   Savage Raghda
5358                        Lunatics
2317       The Spiderwick Chronicles
Name: title, dtype: object


# <font color='grey'><center> III. Building the Recommendation System

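Now I will use the genre column (`listed_in`) as the feature for recommending similar content to the user. Each title's genre string is converted into a TF-IDF vector, and cosine similarity measures how alike two of those vectors are.
Cosine similarity is widely used in machine learning applications such as recommender systems to compare two documents: it is the cosine of the angle between their vectors, so for non-negative TF-IDF vectors it ranges from 0 (no terms in common) to 1 (vectors pointing in the same direction).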
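For two vectors $a$ and $b$, cosine similarity is $\cos(a, b) = \dfrac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$. The sketch below computes it by hand with NumPy and checks it against scikit-learn's `cosine_similarity`; the two 3-term vectors are made-up term weights, not values from the dataset, and `cosine` is an illustrative helper.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def cosine(a, b):
    """Cosine of the angle between vectors a and b."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical term-weight vectors for two titles over a 3-term vocabulary
a = np.array([1.0, 1.0, 0.0])
b = np.array([1.0, 0.0, 0.0])

manual = cosine(a, b)  # 1 / (sqrt(2) * 1) ≈ 0.7071
# scikit-learn expects 2-D inputs (rows are samples) and returns a matrix
library = cosine_similarity(a.reshape(1, -1), b.reshape(1, -1))[0, 0]
print(round(manual, 4), round(library, 4))
```

Because only the angle matters, a title with many genres is not automatically "closer" to everything just because its vector is longer.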

# <font color ='Firebrick'>Recommendation based on similar Genre

In [38]:
from sklearn.feature_extraction import text
from sklearn.metrics.pairwise import cosine_similarity

In [39]:
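feature = data["listed_in"].tolist()
# Each genre string becomes a TF-IDF vector; English stop words are ignored.
# The documents go to fit_transform; `input` only accepts 'content', 'filename' or 'file'.
tfidf = text.TfidfVectorizer(stop_words="english")
tfidf_matrix = tfidf.fit_transform(feature)   # sparse matrix: titles x genre terms
# Cosine similarity between every pair of titles (6234 x 6234)
similarity = cosine_similarity(tfidf_matrix)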
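To see what the cell above does on a small scale, here is a sketch that runs the same vectorizer settings on the five `listed_in` values from the feature-selection sample (`sample_genres`, `matrix` and `sim` are illustrative names). Row *i*, column *j* of the result is the cosine similarity between title *i* and title *j*.

```python
from sklearn.feature_extraction import text
from sklearn.metrics.pairwise import cosine_similarity

# The five `listed_in` values from the data.head() sample
sample_genres = [
    "Children & Family Movies, Comedies",
    "Stand-Up Comedy",
    "Kids' TV",
    "Kids' TV",
    "Comedies",
]

tfidf = text.TfidfVectorizer(stop_words="english")
matrix = tfidf.fit_transform(sample_genres)   # sparse 5 x vocabulary matrix
sim = cosine_similarity(matrix)               # dense 5 x 5 similarity matrix

print(sorted(tfidf.vocabulary_))
print(sim.round(2))
```

The two `Kids' TV` rows score 1.0 against each other, while `Stand-Up Comedy` and `Comedies` score 0: the default tokenizer does no stemming, so `comedy` and `comedies` are treated as different terms.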

__Now let's map each title to its row index so that we can find similar content by passing the title of a movie or TV show as input:__



In [40]:
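# Map each title to its row position; titles that occur more than once keep their first row
indices = pd.Series(data.index, index=data['title'])
indices = indices[~indices.index.duplicated(keep='first')]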
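The dataset has 6234 rows but only 6172 unique titles, so some titles repeat. `Series.drop_duplicates()` removes repeated *values* (here the row numbers, which are always unique), not repeated index labels, so it leaves duplicate titles in place and a lookup then returns several rows instead of one. A small sketch with made-up titles (`titles`, `naive` and `deduped` are illustrative names):

```python
import pandas as pd

# Toy titles with one duplicate, standing in for data['title']
titles = pd.Series(["Narcos", "Ozark", "Narcos", "Gotham"])

# drop_duplicates() looks at the values (row numbers 0..3), so nothing is dropped
naive = pd.Series(titles.index, index=titles).drop_duplicates()

# Index.duplicated() looks at the labels (titles), so the second "Narcos" is dropped
deduped = pd.Series(titles.index, index=titles)
deduped = deduped[~deduped.index.duplicated(keep="first")]

print(len(naive), len(deduped))   # 4 3
print(naive["Narcos"])            # a Series holding two row numbers
print(deduped["Narcos"])          # 0
```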

__A function that recommends the 10 Netflix movies and TV shows whose genres are most similar to a given title:__

In [89]:
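def recommend(title, similarity=similarity, n=10):
    """Return the n titles whose genre vectors are most similar to `title`.

    The query title itself scores 1.0, so it can appear among the results.
    """
    if title not in indices:
        raise KeyError(title)
    index = indices[title]
    # pair every row position with its cosine similarity to the chosen title
    similarity_scores = list(enumerate(similarity[index]))
    # highest score first; sorted() is stable, so ties keep dataset order
    similarity_scores = sorted(similarity_scores, key=lambda x: x[1], reverse=True)[:n]
    movie_indices = [i for i, _ in similarity_scores]
    return data['title'].iloc[movie_indices].tolist()

In [98]:
def user_input():
    choice = input('Enter the Tv show or movie name: ')
    try:
        recommendations = recommend(choice)
    except KeyError:
        print(f"'{choice}' was not found in the dataset.")
        return
    print('\n \033[1;31mHere are the top 10 recommendations, If you watched this', choice, ':\033[0m\n')
    for m in recommendations:
        print('➤', m)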
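The `recommend` function sorts a Python list of every similarity score and keeps the query title among its results (Narcos appears in its own recommendation list in the output below). A minimal alternative sketch, assuming the same dense NumPy `similarity` matrix, does the ranking with `np.argsort` and masks out the query row first. `top_k_similar` is a hypothetical helper, not part of the notebook, and the 4 x 4 matrix is toy data.

```python
import numpy as np

def top_k_similar(similarity, index, k=10):
    """Return row indices of the k rows most similar to `index`, excluding `index` itself."""
    scores = similarity[index].copy()        # copy so the shared matrix is not modified
    scores[index] = -np.inf                  # never recommend the query title to itself
    # stable sort on negated scores: highest first, ties keep dataset order
    order = np.argsort(-scores, kind="stable")
    return order[:k].tolist()

# Toy symmetric similarity matrix with ones on the diagonal
sim = np.array([
    [1.0, 0.2, 0.9, 0.9],
    [0.2, 1.0, 0.1, 0.0],
    [0.9, 0.1, 1.0, 0.5],
    [0.9, 0.0, 0.5, 1.0],
])
print(top_k_similar(sim, 0, k=2))   # [2, 3]
```

In the notebook it could be used as `data['title'].iloc[top_k_similar(similarity, indices[title])]`; the stable sort keeps tie ordering identical to the `sorted()`-based version.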

In [99]:
user_input()

Enter the Tv show or movie name: Narcos

 Here are the top 10 recommendations, If you watched this Narcos :

➤ Narcos: Mexico
➤ Altered Carbon
➤ Marvel's The Defenders
➤ Marvel's Iron Fist
➤ Gotham
➤ Person of Interest
➤ Narcos
➤ Queen of the South
➤ Marvel's Luke Cage
➤ Shooter

 


In [100]:
user_input()

Enter the Tv show or movie name: Stranger Things

 Here are the top 10 recommendations, If you watched this Stranger Things :

➤ Helix
➤ Nightflyers
➤ Stranger Things
➤ Chilling Adventures of Sabrina
➤ The Messengers
➤ The Vampire Diaries
➤ The 4400
➤ Zoo
➤ The OA
➤ Sense8

 


In [101]:
user_input()

Enter the Tv show or movie name: Sanju

 Here are the top 10 recommendations, If you watched this Sanju :

➤ The Mayor
➤ TUNA GIRL
➤ 5CM
➤ King of Boys
➤ Sarah's Key
➤ Mad World
➤ Miss Julie
➤ Cardboard Gangsters
➤ Gie
➤ Maj Rati Keteki

 


In [102]:
user_input()

Enter the Tv show or movie name: Breaking Bad

 Here are the top 10 recommendations, If you watched this Breaking Bad :

➤ The Assassination of Gianni Versace
➤ The Lizzie Borden Chronicles
➤ The Blacklist
➤ Designated Survivor
➤ Ozark
➤ Breaking Bad
➤ Unbelievable
➤ Damnation
➤ When They See Us
➤ American Crime Story: The People v. O.J. Simpson

 


# <div class="alert alert-danger"><center>END</div>