Netflix's recommendation system suggests movies and TV shows that match your interests.
Because of its large user base, Netflix has a lot of data to learn from. Its recommendation system predicts a personalised catalogue for you based on factors like:
your viewing history,
the viewing history of other users with tastes and preferences similar to yours, and
the genres, categories, descriptions, and other information about the content you watched in the past.
This notebook builds a simple content-based recommender on a dataset of Netflix titles, using TF-IDF features and cosine similarity.

In [44]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Load the Netflix catalogue; adjust the path to wherever the CSV is stored.
data = pd.read_csv("C:/Users/admin/Downloads/archive/netflixData.csv")
data.head()

,Show Id,Title,Description,Director,Genres,Cast,Production Country,Release Date,Rating,Duration,Imdb Score,Content Type,Date Added
0,cc1b6ed9-cf9e-4057-8303-34577fb54477,(Un)Well,This docuseries takes a deep dive into the luc...,,Reality TV,,United States,2020.0,TV-MA,1 Season,6.6/10,TV Show,
1,e2ef4e91-fb25-42ab-b485-be8e3b23dedb,#Alive,"As a grisly virus rampages a city, a lone man ...",Cho Il,"Horror Movies, International Movies, Thrillers","Yoo Ah-in, Park Shin-hye",South Korea,2020.0,TV-MA,99 min,6.2/10,Movie,"September 8, 2020"
2,b01b73b7-81f6-47a7-86d8-acb63080d525,#AnneFrank - Parallel Stories,"Through her diary, Anne Frank's story is retol...","Sabina Fedeli, Anna Migotto","Documentaries, International Movies","Helen Mirren, Gengher Gatti",Italy,2019.0,TV-14,95 min,6.4/10,Movie,"July 1, 2020"
3,b6611af0-f53c-4a08-9ffa-9716dc57eb9c,#blackAF,Kenya Barris and his family navigate relations...,,TV Comedies,"Kenya Barris, Rashida Jones, Iman Benson, Genn...",United States,2020.0,TV-MA,1 Season,6.6/10,TV Show,
4,7f2d4170-bab8-4d75-adc2-197f7124c070,#cats_the_mewvie,This pawesome documentary explores how our fel...,Michael Margolis,"Documentaries, International Movies",,Canada,2020.0,TV-14,90 min,5.1/10,Movie,"February 5, 2020"


In [45]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5967 entries, 0 to 5966
Data columns (total 13 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Show Id             5967 non-null   object 
 1   Title               5967 non-null   object 
 2   Description         5967 non-null   object 
 3   Director            3903 non-null   object 
 4   Genres              5967 non-null   object 
 5   Cast                5437 non-null   object 
 6   Production Country  5408 non-null   object 
 7   Release Date        5964 non-null   float64
 8   Rating              5963 non-null   object 
 9   Duration            5964 non-null   object 
 10  Imdb Score          5359 non-null   object 
 11  Content Type        5967 non-null   object 
 12  Date Added          4632 non-null   object 
dtypes: float64(1), object(12)
memory usage: 606.2+ KB


In [46]:
data.isnull().sum()

Show Id                  0
Title                    0
Description              0
Director              2064
Genres                   0
Cast                   530
Production Country     559
Release Date             3
Rating                   4
Duration                 3
Imdb Score             608
Content Type             0
Date Added            1335
dtype: int64

In [47]:
data = data.dropna()
data.head(10)

,Show Id,Title,Description,Director,Genres,Cast,Production Country,Release Date,Rating,Duration,Imdb Score,Content Type,Date Added
1,e2ef4e91-fb25-42ab-b485-be8e3b23dedb,#Alive,"As a grisly virus rampages a city, a lone man ...",Cho Il,"Horror Movies, International Movies, Thrillers","Yoo Ah-in, Park Shin-hye",South Korea,2020.0,TV-MA,99 min,6.2/10,Movie,"September 8, 2020"
2,b01b73b7-81f6-47a7-86d8-acb63080d525,#AnneFrank - Parallel Stories,"Through her diary, Anne Frank's story is retol...","Sabina Fedeli, Anna Migotto","Documentaries, International Movies","Helen Mirren, Gengher Gatti",Italy,2019.0,TV-14,95 min,6.4/10,Movie,"July 1, 2020"
5,c293788a-41f7-49a3-a7fc-005ea33bce2b,#FriendButMarried,"Pining for his high school crush for years, a ...",Rako Prijanto,"Dramas, International Movies, Romantic Movies","Adipati Dolken, Vanesha Prescilla, Rendi Jhon,...",Indonesia,2018.0,TV-G,102 min,7.0/10,Movie,"May 21, 2020"
6,0555e67e-f624-4a05-93e4-55c117d0056d,#FriendButMarried 2,As Ayu and Ditto finally transition from best ...,Rako Prijanto,"Dramas, International Movies, Romantic Movies","Adipati Dolken, Mawar de Jongh, Sari Nila, Von...",Indonesia,2020.0,TV-G,104 min,7.0/10,Movie,"June 28, 2020"
7,c844460f-6178-4f87-929e-80816c74ca35,#realityhigh,When nerdy high schooler Dani finally attracts...,Fernando Lebrija,Comedies,"Nesta Cooper, Kate Walsh, John Michael Higgins...",United States,2017.0,TV-14,99 min,5.1/10,Movie,"September 8, 2017"
9,6da2fc83-1546-4e9d-bf2e-9b472a059c18,#Selfie,"Two days before their final exams, three teen ...",Cristina Jacob,"Comedies, Dramas, International Movies","Flavia Hojda, Crina Semciuc, Olimpia Melinte, ...",Romania,2014.0,TV-MA,125 min,5.8/10,Movie,"June 21, 2021"
10,2aa7f08b-0321-4398-baa9-4a138f8cb9e9,#Selfie 69,"After a painful breakup, a trio of party-lovin...",Cristina Jacob,"Comedies, Dramas, International Movies","Maia Morgenstern, Olimpia Melinte, Crina Semci...",Romania,2016.0,TV-MA,119 min,6.0/10,Movie,"June 21, 2021"
11,ea94d1e2-dfb5-4fb0-b941-ca4b1ade98c1,10 Days in Sun City,After his girlfriend wins the Miss Nigeria pag...,Adze Ugah,"Comedies, International Movies, Romantic Movies","Ayo Makun, Adesua Etomi, Richard Mofe-Damijo, ...","South Africa, Nigeria",2017.0,TV-14,87 min,5.1/10,Movie,"October 18, 2019"
12,eb2818f5-d01b-46fe-8797-931a767d5831,10 jours en or,When a carefree bachelor is unexpectedly left ...,Nicolas Brossette,"Comedies, Dramas, International Movies","Franck Dubosc, Claude Rich, Marie Kremer, Math...",France,2012.0,TV-14,97 min,6.1/10,Movie,"July 1, 2017"
15,94f39a7c-7d6e-47bc-9dc2-3db2c81c0e73,100 Meters,A man who is diagnosed with multiple sclerosis...,Marcel Barrena,"Dramas, International Movies, Sports Movies","Dani Rovira, Karra Elejalde, Alexandra Jiménez...","Portugal, Spain",2016.0,TV-MA,109 min,7.6/10,Movie,"March 10, 2017"
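
The missing-value counts show that Director is empty in 2064 rows and Date Added in 1335, so a bare dropna() discards every one of those rows even though neither column is used by the recommender. A minimal sketch of the difference, using three rows from the first table (the name `toy` and the choice of required columns are assumptions made for illustration):

```python
import numpy as np
import pandas as pd

# Three rows copied from the table above; two of them have no Director.
toy = pd.DataFrame(
    {
        "Title": ["(Un)Well", "#Alive", "#blackAF"],
        "Genres": [
            "Reality TV",
            "Horror Movies, International Movies, Thrillers",
            "TV Comedies",
        ],
        "Director": [np.nan, "Cho Il", np.nan],
    }
)

# dropna() with no arguments removes a row if ANY column is missing.
all_complete = toy.dropna()

# Restricting the check to the columns that are actually needed keeps more rows.
needed_only = toy.dropna(subset=["Title", "Genres"])

print(len(all_complete), len(needed_only))
```

Which columns count as required depends on the features the recommender is built on, so the `subset` list here is only an example.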


The Title column contains the titles of the movies and TV shows on Netflix.
The Description column describes the plot of each movie or TV show.
The Content Type column tells us whether a title is a movie or a TV show.
The Genres column lists all the genres of each movie or TV show.

In [48]:
import re
import string

def clean_text(text):
    # Drop bracketed text, URLs, and HTML tags.
    text = re.sub(r'\[.*?\]', '', text)
    text = re.sub(r'https?://\S+|www\.\S+', '', text)
    text = re.sub(r'<.*?>+', '', text)
    # Drop punctuation.
    text = re.sub(f'[{re.escape(string.punctuation)}]', '', text)
    # Blank out any word that contains a digit, then drop the empty
    # strings so the result has no stray leading or double spaces.
    words = [re.sub(r'\w*\d\w*', '', word) for word in text.split()]
    return " ".join(word for word in words if word)

data["Title"] = data["Title"].apply(clean_text)
data.Title.sample(10)

2682    LEGO Marvel Super Heroes Maximum Overload
850                                       BuyBust
1706                          Free State of Jones
4813                       The Figurine Araromire
4536                  Take the Ball Pass the Ball
651                                  Bibi Tina II
2106                                    Honeytrap
2728                                Lingua Franca
5888                                       Xtreme
5731                               We Are Legends
Name: Title, dtype: object
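
Because the cleaning step removes punctuation and every word that contains a digit, titles that differ only by a number can collapse into the same string, and a later lookup by title then finds only the first of them. The sketch below applies just those two steps to four titles from the table above. The helper name `strip_punct_and_digit_words` is hypothetical, and joining only the non-empty words is an assumption about how leftover blanks should be handled.

```python
import re
import string

def strip_punct_and_digit_words(title):
    """Hypothetical helper: only the punctuation and digit steps of the cleaning."""
    no_punct = re.sub(f"[{re.escape(string.punctuation)}]", "", title)
    # Words containing a digit become empty strings and are then dropped.
    words = [re.sub(r"\w*\d\w*", "", word) for word in no_punct.split()]
    return " ".join(word for word in words if word)

raw_titles = ["#FriendButMarried", "#FriendButMarried 2", "#Selfie", "#Selfie 69"]
cleaned = [strip_punct_and_digit_words(t) for t in raw_titles]

for raw, clean in zip(raw_titles, cleaned):
    print(f"{raw!r} -> {clean!r}")
```

A quick check such as data["Title"].duplicated().sum() after cleaning shows how many titles are affected in the full dataset.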

In [60]:
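# Feature used to compare titles: the production country of each row.
feature = data["Production Country"].tolist()

# Turn each feature string into a TF-IDF vector.
tfidf = TfidfVectorizer(stop_words="english")
tfidf_matrix = tfidf.fit_transform(feature)

# Pairwise cosine similarity; row i belongs to the i-th row of data by position.
similarity = cosine_similarity(tfidf_matrix)

def movie_recommendation(Title, similarity=similarity):
    # NOTE: data.index holds row labels, while similarity is indexed by position.
    # After dropna() the two differ, so idx may select another title's row.
    idx = data.index[data['Title'] == Title].tolist()[0]
    # Rank every position by its similarity score, highest first.
    sim_scores = list(enumerate(similarity[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    # Skip the query position and return the five best matches.
    movie_indices = [i[0] for i in sim_scores if i[0] != idx]
    return data['Title'].iloc[movie_indices[:5]]

print(movie_recommendation("Honeytrap"))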


18     Rupee Note
24         August
32         States
41          Kille
46         Idiots
Name: Title, dtype: object
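
The similarity matrix is indexed by row position, whereas data.index still carries the original labels after dropna(), so a lookup by label can read the wrong row. A minimal sketch of a positional lookup follows. It uses five rows from the table above with their original labels and uncleaned titles, and it takes Genres as the feature, which is an assumption; `toy`, `recommend_by_position`, and `top_n` are illustrative names, not part of the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Five rows from the table above, keeping the gaps that dropna() left in the index.
toy = pd.DataFrame(
    {
        "Title": ["#Alive", "#AnneFrank - Parallel Stories",
                  "#FriendButMarried", "#realityhigh", "#Selfie"],
        "Genres": [
            "Horror Movies, International Movies, Thrillers",
            "Documentaries, International Movies",
            "Dramas, International Movies, Romantic Movies",
            "Comedies",
            "Comedies, Dramas, International Movies",
        ],
    },
    index=[1, 2, 5, 7, 9],
)

tfidf_matrix = TfidfVectorizer(stop_words="english").fit_transform(toy["Genres"])
sim = cosine_similarity(tfidf_matrix)  # shape (5, 5), indexed by position

def recommend_by_position(title, frame=toy, similarity=sim, top_n=2):
    # Find the row POSITION of the title, not its index label.
    matches = np.flatnonzero((frame["Title"] == title).to_numpy())
    if matches.size == 0:
        raise KeyError(f"title not found: {title!r}")
    pos = matches[0]
    # Sort positions by descending similarity and drop the query itself.
    order = np.argsort(-similarity[pos], kind="stable")
    order = order[order != pos][:top_n]
    return frame["Title"].iloc[order]

result = recommend_by_position("#Selfie")
print(result)
```

In the toy frame the label of "#Selfie" is 9, which is not a valid row of the 5 x 5 matrix, while its position is 4. Calling reset_index(drop=True) on the frame right after dropna() is another way to keep labels and positions aligned.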
