# Netflix Originals Recommendation Engine
The aims of this notebook are to:
- Adequately pre-process the Netflix Originals data.
- Develop a recommendation engine for the titles within the dataset.
- Make interaction with the function user-friendly and lenient with user error.
- Implement the function in an interface widget (end of notebook).

In [2]:
import pandas as pd
import numpy as np
import os
import requests
pd.set_option('display.max_columns', None)

#### NOTE: Please save your unique OMDb API key as an environment variable named `API_KEY`.

In [2]:
os.chdir(r"C:\Users\oskar\Documents\Projects\Portfolio Projects\Netflix-Originals")
netflix = pd.read_csv("completed_netflix.csv") #Load in Dataset from previous notebook, titled "Dataset Creation"
netflix = netflix.drop("Unnamed: 0", axis=1) 

In [3]:
import json #needed for working with json data
#Using OMDb API to scrape the long version of the plot instead of the short version.

netflix['Plot'] = None      # Add a column to store the Plot in netflix dataframe.

api_key = os.getenv('API_KEY') #retrieve the api key from the environment variable.

for idx, row in netflix.iterrows():
    title = row['Title'] #Extracts the title for the given row and stores it in "title"
    response = requests.get("http://www.omdbapi.com/", params={"apikey": api_key, "t": title, "plot": "full"})
    #sends an HTTP GET request to retrieve movie information. requests URL-encodes the title (spaces, "&", accented characters) and plot is set to full.

    if response.status_code == 200: #indicates a successful connection
        try:
            movie_data = response.json() #Decode the json response and stores it in movie_data
            if 'Plot' in movie_data: #Checking to see if Plot even exists in the original json response.
                netflix.loc[idx, 'Plot'] = movie_data['Plot'] #If plot exists, the plot column is updated at the current index.
            else:
                print(f"Movie not found: {title}") 
        except json.JSONDecodeError: #If the response body is not valid JSON, the print line will let us know.
            print(f"Failed to decode JSON for {title}: {response.text}")
    else:
        print(f"Failed to get data for {title}: {response.status_code}") #if response is not 200, print <-

#Note: Can take upwards of 1 hour to complete.

Failed to decode JSON for 7 años: {"Title":"7 Años de Matrimonio","Year":"2013","Rated":"N/A","Released":"25 Jan 2013","Runtime":"100 min","Genre":"Comedy, Romance","Director":"Joel Núñez Arocha","Writer":"Natalia Armienta, Ragnar Conde","Actors":"Ximena Herrera, Víctor González, Roberto Palazuelos","Plot":"Conocer la pareja de tu vida es sólo obra del destino! Pero una cosa es casarse y otra, controlarse. Así que para sobrevivir la crisis del matrimonio, te diremos cómo lograrlo.Esta divertida comedia te enseñará que en el amor; TODO SE VALE!	Knowing the partner of your life is only the work of fate! But ... one thing is to get married and another controlled. So to survive the crisis of marriage, we will tell you how.This hilarious comedy will teach you that in love, EVERYTHING IS WORTH IT!","Language":"Spanish","Country":"Mexico","Awards":"1 win","Poster":"https://m.media-amazon.com/images/M/MV5BMjIxODAwOTkzMF5BMl5BanBnXkFtZTcwMDcyMjA1OQ@@._V1_SX300.jpg","Ratings":[{"Source":"Interne

In [4]:
netflix.to_csv("netflix_long_plot.csv") #saving here so the above code doesn't need to be rerun each time.


## All Data Imported

In [5]:
netflix = pd.read_csv("netflix_long_plot.csv") #read in the above csv to save time if code run before.

netflix = netflix.drop("Description", axis=1)
netflix = netflix.rename(columns={"Plot":"Description"}) #column naming clarity.

netflix.isna().sum() #There are 2 NA values in the description column which will need to be removed
netflix = netflix.dropna()
netflix.isna().sum() #checking no NA values remaining.

Unnamed: 0                       0
Title                            0
Genre                            0
Rated                            0
Running Time(mins)               0
Release Date                     0
Director                         0
Cast                             0
Primary Country                  0
International                    0
Lead Production Company          0
Multiple Production Companies    0
Wins                             0
Nominations                      0
IMDb Votes                       0
IMDb Score                       0
Description                      0
dtype: int64

## Feature Preparation (To-do List)
- Title : No Changes
- Description: NLP - TF-IDF
- Genre : One Hot Encoding
- Rated : One Hot Encoding
- Running Time : Normalise
- Release Date : Convert to Year
- Director : Sparse?
- Cast : Sparse?
- Primary Country: One Hot Encoding
- International : Convert numeric/boolean
- Lead Production Company : One Hot Encoding
- Multiple Production Companies : Boolean
- Wins: Normalise
- Nominations: Normalise
- IMDb Votes: Normalise
- IMDb Score: No Changes 

### Genre

In [6]:
from sklearn.preprocessing import MultiLabelBinarizer
print("Missing Values: ", netflix["Genre"].isna().sum())
netflix[["Genre"]].head() # currently genre is represented by a string, where genres are separated by commas

Missing Values:  0


Unnamed: 0,Genre
0,"Drama, War"
1,"Action, Adventure, Comedy"
2,"Comedy, Family"
3,Comedy
4,"Action, Adventure, Comedy"


In [7]:
netflix["Genre"] = netflix["Genre"].apply(lambda x: x.split(", ") if isinstance(x, str) else x) #convert string to a list, splitting at commas.

mlb = MultiLabelBinarizer() # Initiate MultiLabelBinarizer
netflix_genres = mlb.fit_transform(netflix["Genre"])
display(mlb.classes_)

netflix_genres = netflix_genres.astype(int) #ensures the 0/1 genre indicators are stored as plain integers.
netflix_genres = pd.DataFrame(netflix_genres, columns=mlb.classes_) #creating a new df containing only binarised genres.

# Join the new DataFrame with the original DataFrame
netflix.reset_index(drop=True, inplace=True)
netflix_genres.reset_index(drop=True, inplace=True) #resetting indexes to make sure they are aligned when they get joined together.

netflix = pd.concat([netflix, netflix_genres], axis=1)

netflix = netflix[['Title','Description', 'Genre', 'Rated', 'Running Time(mins)', 'Release Date',
       'Director', 'Cast', 'Primary Country',
       'Lead Production Company', 'Wins',
       'Nominations', 'IMDb Votes', 'IMDb Score','Multiple Production Companies','International' ,'Action',
       'Adventure', 'Animation', 'Biography', 'Comedy', 'Crime', 'Documentary',
       'Drama', 'Family', 'Fantasy', 'Film-Noir', 'History', 'Horror', 'Music',
       'Musical', 'Mystery', 'News', 'Reality-TV', 'Romance', 'Sci-Fi',
       'Short', 'Sport', 'Thriller', 'War', 'Western']] #Reorganising Column Order

cols_to_convert = netflix.columns[netflix.columns.get_loc("Multiple Production Companies"):] #boolean columns were showing up as float, needed to convert to int.
#get_loc() returns the integer position of a column label
netflix[cols_to_convert] = netflix[cols_to_convert].fillna(0).astype(int) #fill na values with 0. Columns with NA values cannot be classed as integers.
netflix.head(3)
netflix.isna().sum()

array(['Action', 'Adventure', 'Animation', 'Biography', 'Comedy', 'Crime',
       'Documentary', 'Drama', 'Family', 'Fantasy', 'Film-Noir',
       'History', 'Horror', 'Music', 'Musical', 'Mystery', 'News',
       'Reality-TV', 'Romance', 'Sci-Fi', 'Short', 'Sport', 'Thriller',
       'War', 'Western'], dtype=object)

Title                            0
Description                      0
Genre                            0
Rated                            0
Running Time(mins)               0
Release Date                     0
Director                         0
Cast                             0
Primary Country                  0
Lead Production Company          0
Wins                             0
Nominations                      0
IMDb Votes                       0
IMDb Score                       0
Multiple Production Companies    0
International                    0
Action                           0
Adventure                        0
Animation                        0
Biography                        0
Comedy                           0
Crime                            0
Documentary                      0
Drama                            0
Family                           0
Fantasy                          0
Film-Noir                        0
History                          0
Horror              

---
### Rated

In [8]:
netflix["Rated"] = pd.Categorical(netflix["Rated"]) #converts ratings to categorical dtype

netflix_ratings = pd.get_dummies(netflix["Rated"], prefix = "Rated")
netflix = pd.concat([netflix, netflix_ratings], axis=1) #Adds dummies to original dataframe.

---
### Director

In [9]:
vc_director = netflix["Director"].value_counts()
vc_director.head(10)

Kunle Afolayan    4
Chris Smith       4
Tyler Perry       3
Steve Brill       3
Tyler Spindel     3
Leigh Janiak      3
Vince Marcello    3
Bruno Garotti     3
McG               3
Mike Rohl         2
Name: Director, dtype: int64

- The `netflix_directors` list stores directors who have had more than one production associated with Netflix. The Director column is sparse (most directors have only one Netflix production), so flagging the recurring directors may be useful when computing similarity between films later on.

In [10]:

director_counts = netflix["Director"].value_counts() #number of Netflix productions per director.

netflix_directors = director_counts[director_counts > 1].index.tolist() #only directors who have had more than 1 Netflix production are kept.

netflix["Netflix_director"] = netflix["Director"].isin(netflix_directors).astype(int) #1 if the director is a recurring Netflix director, otherwise 0.
netflix.head(2)

Unnamed: 0,Title,Description,Genre,Rated,Running Time(mins),Release Date,Director,Cast,Primary Country,Lead Production Company,Wins,Nominations,IMDb Votes,IMDb Score,Multiple Production Companies,International,Action,Adventure,Animation,Biography,Comedy,Crime,Documentary,Drama,Family,Fantasy,Film-Noir,History,Horror,Music,Musical,Mystery,News,Reality-TV,Romance,Sci-Fi,Short,Sport,Thriller,War,Western,Rated_General Audiences,Rated_Mature Audiences,Rated_Not Rated,Rated_Parental Guidance,Netflix_director
0,Beasts of No Nation,"Follows the journey of a young boy, Agu, who i...","[Drama, War]",Mature Audiences,138,2015-09-03,Cary Joji Fukunaga,"Abraham Attah, Emmanuel Affadzi, Ricky Adelayitor",United States,Participant Media,31,59,84555.0,7.7,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0
1,The Ridiculous 6,"A white man, Tommy, raised by Indians is appro...","[Action, Adventure, Comedy]",Mature Audiences,120,2015-12-11,Frank Coraci,"Adam Sandler, Terry Crews, Jorge Garcia",United States,Happy Madison Productions,0,0,51701.0,4.8,0,0,1,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0
2,Pee-wee's Big Holiday,A fateful meeting with a mysterious stranger i...,"[Comedy, Family]",Parental Guidance,89,2016-03-17,John Lee,"Paul Reubens, Jordan Black, Doug Cox",United States,Pee-wee Pictures,0,0,8470.0,6.1,1,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
3,Special Correspondents,A New York radio reporter and his sound engine...,[Comedy],Mature Audiences,100,2016-04-22,Ricky Gervais,"Ricky Gervais, Eric Bana, Vera Farmiga",Canada,Bron Studios,0,0,25488.0,5.9,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0
4,The Do-Over,Max (Adam Sandler) and Charlie (David Spade) o...,"[Action, Adventure, Comedy]",Mature Audiences,108,2016-05-16,Steve Brill,"Adam Sandler, David Spade, Paula Patton",United States,Happy Madison Productions,0,0,49243.0,5.7,0,0,1,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1


---
### Cast
- This section will create Cast weights which will assign each actor a weight based on how often they appear in the dataset.
- Following this, each film will be assigned an overall total weight based on the actors that are part of the cast for that film.

In [11]:
from collections import Counter
netflix['Cast'] = netflix['Cast'].apply(lambda x: x.split(", ") if isinstance(x, str) else x) #Turns actor string into comma separated list.

netflix["Cast"].head()

0    [Abraham Attah, Emmanuel Affadzi, Ricky Adelay...
1            [Adam Sandler, Terry Crews, Jorge Garcia]
2               [Paul Reubens, Jordan Black, Doug Cox]
3             [Ricky Gervais, Eric Bana, Vera Farmiga]
4            [Adam Sandler, David Spade, Paula Patton]
Name: Cast, dtype: object

In [12]:
actor_list = [actor for sublist in netflix['Cast'] for actor in sublist] #separates each individual actor in cast for each film.
#for each list, extracts each actor name, and appends to actor_list
actor_freq = Counter(actor_list) #Counts actors in above list.
#print(actor_freq)
sorted_actor_freq = sorted(actor_freq.items(), key=lambda x: x[1], reverse=True) #Adam Sandler is the most popular actor in Netflix Original Productions, other than N/A's.
#sorted_actor_freq[:2]

In [13]:
total_movies=len(netflix) #Finds length of dataframe
actor_weights = {actor: count / total_movies for actor, count in actor_freq.items()}  #The number of movies an actor has taken part in divided by the total length of the dataset. Each actor allocated a weight.
#print(actor_weights)

In [14]:
def film_actor_weight(cast_list, actor_weights):         #This function will determine the total weight associated with actors for each given film.
    weight=0
    for actor in cast_list: #runs over each actor in cast_list, which is a list of actors for a given film.
        weight+= actor_weights.get(actor,0) #Fetches the actor's weight from the actor_weights dictionary (0 if the actor is missing) and adds it to weight.
    return weight #returns the total weight

In [15]:
netflix['Actor_Weight'] = netflix['Cast'].apply(lambda x: film_actor_weight(x, actor_weights) if isinstance(x, list) else 0)  #Creating a new column based on the function above.
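
As a quick check of the weighting logic above, here is a minimal worked example on made-up data. The cast lists and actor names are hypothetical and are not taken from the dataset; only the arithmetic mirrors the cells above.

```python
from collections import Counter

# Hypothetical toy data: three films with made-up cast lists.
casts = [
    ["Actor A", "Actor B"],
    ["Actor A", "Actor C"],
    ["Actor D"],
]

# Count appearances per actor, then divide by the number of films.
actor_freq = Counter(actor for cast in casts for actor in cast)
actor_weights = {actor: count / len(casts) for actor, count in actor_freq.items()}

# A film's weight is the sum of its cast members' weights.
film_weights = [sum(actor_weights.get(actor, 0) for actor in cast) for cast in casts]

print(actor_weights["Actor A"])  # "Actor A" appears in 2 of the 3 films
print(film_weights)
```

Films whose cast members recur across the dataset end up with a larger total, which is why the films starring a frequently appearing actor show a higher `Actor_Weight`.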

---
### Primary Country

In [16]:
netflix["Primary Country"] = pd.Categorical(netflix["Primary Country"])

netflix_primary_country = pd.get_dummies(netflix["Primary Country"], prefix = "Primary_Country")  #One Hot Encoding Primary Country
netflix = pd.concat([netflix, netflix_primary_country], axis=1)
netflix.head()

Unnamed: 0,Title,Description,Genre,Rated,Running Time(mins),Release Date,Director,Cast,Primary Country,Lead Production Company,Wins,Nominations,IMDb Votes,IMDb Score,Multiple Production Companies,International,Action,Adventure,Animation,Biography,Comedy,Crime,Documentary,Drama,Family,Fantasy,Film-Noir,History,Horror,Music,Musical,Mystery,News,Reality-TV,Romance,Sci-Fi,Short,Sport,Thriller,War,Western,Rated_General Audiences,Rated_Mature Audiences,Rated_Not Rated,Rated_Parental Guidance,Netflix_director,Actor_Weight,Primary_Country_Argentina,Primary_Country_Australia,Primary_Country_Austria,Primary_Country_Belgium,Primary_Country_Brazil,Primary_Country_Cambodia,Primary_Country_Canada,Primary_Country_Chile,Primary_Country_China,Primary_Country_Czech Republic,Primary_Country_Denmark,Primary_Country_France,Primary_Country_Georgia,Primary_Country_Germany,Primary_Country_Greece,Primary_Country_Hong Kong,Primary_Country_Hungary,Primary_Country_Iceland,Primary_Country_India,Primary_Country_Indonesia,Primary_Country_Ireland,Primary_Country_Israel,Primary_Country_Italy,Primary_Country_Japan,Primary_Country_Mexico,Primary_Country_Netherlands,Primary_Country_New Zealand,Primary_Country_Nigeria,Primary_Country_Norway,Primary_Country_Pakistan,Primary_Country_Peru,Primary_Country_Philippines,Primary_Country_Poland,Primary_Country_South Africa,Primary_Country_South Korea,Primary_Country_Spain,Primary_Country_Sweden,Primary_Country_Thailand,Primary_Country_Turkey,Primary_Country_United Arab Emirates,Primary_Country_United Kingdom,Primary_Country_United States
0,Beasts of No Nation,"Follows the journey of a young boy, Agu, who i...","[Drama, War]",Mature Audiences,138,2015-09-03,Cary Joji Fukunaga,"[Abraham Attah, Emmanuel Affadzi, Ricky Adelay...",United States,Participant Media,31,59,84555.0,7.7,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0.003484,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
1,The Ridiculous 6,"A white man, Tommy, raised by Indians is appro...","[Action, Adventure, Comedy]",Mature Audiences,120,2015-12-11,Frank Coraci,"[Adam Sandler, Terry Crews, Jorge Garcia]",United States,Happy Madison Productions,0,0,51701.0,4.8,0,0,1,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0.012776,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
2,Pee-wee's Big Holiday,A fateful meeting with a mysterious stranger i...,"[Comedy, Family]",Parental Guidance,89,2016-03-17,John Lee,"[Paul Reubens, Jordan Black, Doug Cox]",United States,Pee-wee Pictures,0,0,8470.0,6.1,1,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0.003484,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
3,Special Correspondents,A New York radio reporter and his sound engine...,[Comedy],Mature Audiences,100,2016-04-22,Ricky Gervais,"[Ricky Gervais, Eric Bana, Vera Farmiga]",Canada,Bron Studios,0,0,25488.0,5.9,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0.004646,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,The Do-Over,Max (Adam Sandler) and Charlie (David Spade) o...,"[Action, Adventure, Comedy]",Mature Audiences,108,2016-05-16,Steve Brill,"[Adam Sandler, David Spade, Paula Patton]",United States,Happy Madison Productions,0,0,49243.0,5.7,0,0,1,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0.013937,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1


---
### Lead Production Company

- Similar process to the one used for directors.
- Take the top production companies (those that have had more than 2 productions as the lead company).
- If a company is unlisted, or doesn't have more than 2 productions, it will not be appended to "top_companies".

In [17]:
vc_lpc =netflix["Lead Production Company"].value_counts()
vc_lpc.head()

No Production Company Listed    118
Netflix                          54
Happy Madison Productions        12
MPCA                              9
RSVP Movies                       7
Name: Lead Production Company, dtype: int64

In [18]:
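netflix_company = pd.DataFrame(netflix["Lead Production Company"].value_counts())     #value_counts() puts the company names in the index; reset_index() below moves them into a column named "index" (pandas < 2.0 naming).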
netflix_company.reset_index(inplace=True)
netflix_company = netflix_company[netflix_company["Lead Production Company"]>2] #this code selects those companies with more than 2 productions.
netflix_company.reset_index(inplace=True, drop =True)
netflix_company.head()


Unnamed: 0,index,Lead Production Company
0,No Production Company Listed,118
1,Netflix,54
2,Happy Madison Productions,12
3,MPCA,9
4,RSVP Movies,7


In [47]:
top_companies = []

for company in netflix_company["index"]:
    if company != "No Production Company Listed": #Do not want to include these renamed NA's as top companies
        top_companies.append(company)

top_companies[:5] #top 5 companies

['Netflix', 'Happy Madison Productions', 'MPCA', 'RSVP Movies', 'Likely Story']

In [20]:
netflix["Netflix_popular_company"] = netflix["Lead Production Company"].apply(lambda x: 1 if x in top_companies else 0) #Manually encoding the top production companies

---
### Date

- Creating a new column for Year based on Release Date. The year of release will be useful for similarity as different eras of film have thematic differences.

In [21]:
netflix["Release Date"] = pd.to_datetime(netflix["Release Date"]) #converting columns to datetime
netflix["Release Date"].dtype
netflix['Release Year'] = netflix['Release Date'].dt.year #extracting the year element from the datetime and storing in a new column.

---
# Numerical Columns
## Running Time, Wins, Nominations, IMDb Votes, IMDb Score
- It is important to scale the data, otherwise some variables will have significantly more influence than others. E.g. IMDb Score ranges from 1 to 10, whereas IMDb Votes ranges from 0 to 100,000+.
- Numerical columns will be rescaled to the range [0, 1] with MinMaxScaler.
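
To make the effect of min-max scaling concrete, the sketch below applies the formula (x - min) / (max - min) by hand. The helper `min_max_scale` and the sample values are hypothetical and only illustrate the arithmetic; the notebook itself relies on scikit-learn's `MinMaxScaler`.

```python
def min_max_scale(values):
    """Rescale a list of numbers to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: this sketch returns zeros to avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical values, chosen only to show the difference in raw scale.
imdb_votes = [1000, 51000, 101000]
imdb_score = [4.0, 6.0, 8.0]

print(min_max_scale(imdb_votes))  # [0.0, 0.5, 1.0]
print(min_max_scale(imdb_score))  # [0.0, 0.5, 1.0]
```

After scaling, both columns occupy the same range, so neither dominates a distance or similarity calculation purely because of its units.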


In [22]:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()

cols_to_scale = ["Running Time(mins)", "Wins", "Nominations","IMDb Votes", "IMDb Score", "Release Year"] #These are the numerical variables that need to be scaled.

for col in cols_to_scale:
    netflix[col + "_scaled"] = scaler.fit_transform(netflix[[col]]) #creating new scaled columns with "_scaled" as a suffix.

In [23]:
# All Features present
netflix.to_csv("netflix_recommendation_EDA.csv")  #Large number of columns here. Can be used for EDA + IMDb score prediction project in upcoming notebooks.

In [3]:
netflix = pd.read_csv("netflix_recommendation_EDA.csv", index_col=None)
netflix.drop("Unnamed: 0", axis=1, inplace=True)
netflix.dtypes.head()

Title                 object
Description           object
Genre                 object
Rated                 object
Running Time(mins)     int64
dtype: object

---
## Building the Recommendation System

In [4]:
netflix = netflix[['Title', 'Description', #re-organising column order
       'Rated_General Audiences', 'Rated_Mature Audiences', 'Rated_Not Rated',
       'Rated_Parental Guidance', 'Netflix_director', 'Actor_Weight',
       'Netflix_popular_company', 'Running Time(mins)_scaled', 'Wins_scaled',
       'Nominations_scaled', 'IMDb Votes_scaled', 'IMDb Score_scaled',"Release Year_scaled",
       'Multiple Production Companies', 'International',
       'Release Year','Action', 'Adventure', 'Animation', 'Biography', 'Comedy', 'Crime',
       'Documentary', 'Drama', 'Family', 'Fantasy', 'Film-Noir', 'History',
       'Horror', 'Music', 'Musical', 'Mystery', 'News', 'Reality-TV',
       'Romance', 'Sci-Fi', 'Short', 'Sport', 'Thriller', 'War', 'Western','Primary_Country_Argentina', 'Primary_Country_Australia',
       'Primary_Country_Austria', 'Primary_Country_Belgium',
       'Primary_Country_Brazil', 'Primary_Country_Cambodia',
       'Primary_Country_Canada', 'Primary_Country_Chile',
       'Primary_Country_China', 'Primary_Country_Czech Republic',
       'Primary_Country_Denmark', 'Primary_Country_France',
       'Primary_Country_Georgia', 'Primary_Country_Germany',
       'Primary_Country_Greece', 'Primary_Country_Hong Kong',
       'Primary_Country_Hungary', 'Primary_Country_Iceland',
       'Primary_Country_India', 'Primary_Country_Indonesia',
       'Primary_Country_Ireland', 'Primary_Country_Israel',
       'Primary_Country_Italy', 'Primary_Country_Japan',
       'Primary_Country_Mexico', 'Primary_Country_Netherlands',
       'Primary_Country_New Zealand', 'Primary_Country_Nigeria',
       'Primary_Country_Norway', 'Primary_Country_Pakistan',
       'Primary_Country_Peru', 'Primary_Country_Philippines',
       'Primary_Country_Poland', 'Primary_Country_South Africa',
       'Primary_Country_South Korea', 'Primary_Country_Spain',
       'Primary_Country_Sweden', 'Primary_Country_Thailand',
       'Primary_Country_Turkey', 'Primary_Country_United Arab Emirates',
       'Primary_Country_United Kingdom', 'Primary_Country_United States']]

In [5]:
netflix.head(3)

Unnamed: 0,Title,Description,Rated_General Audiences,Rated_Mature Audiences,Rated_Not Rated,Rated_Parental Guidance,Netflix_director,Actor_Weight,Netflix_popular_company,Running Time(mins)_scaled,Wins_scaled,Nominations_scaled,IMDb Votes_scaled,IMDb Score_scaled,Release Year_scaled,Multiple Production Companies,International,Release Year,Action,Adventure,Animation,Biography,Comedy,Crime,Documentary,Drama,Family,Fantasy,Film-Noir,History,Horror,Music,Musical,Mystery,News,Reality-TV,Romance,Sci-Fi,Short,Sport,Thriller,War,Western,Primary_Country_Argentina,Primary_Country_Australia,Primary_Country_Austria,Primary_Country_Belgium,Primary_Country_Brazil,Primary_Country_Cambodia,Primary_Country_Canada,Primary_Country_Chile,Primary_Country_China,Primary_Country_Czech Republic,Primary_Country_Denmark,Primary_Country_France,Primary_Country_Georgia,Primary_Country_Germany,Primary_Country_Greece,Primary_Country_Hong Kong,Primary_Country_Hungary,Primary_Country_Iceland,Primary_Country_India,Primary_Country_Indonesia,Primary_Country_Ireland,Primary_Country_Israel,Primary_Country_Italy,Primary_Country_Japan,Primary_Country_Mexico,Primary_Country_Netherlands,Primary_Country_New Zealand,Primary_Country_Nigeria,Primary_Country_Norway,Primary_Country_Pakistan,Primary_Country_Peru,Primary_Country_Philippines,Primary_Country_Poland,Primary_Country_South Africa,Primary_Country_South Korea,Primary_Country_Spain,Primary_Country_Sweden,Primary_Country_Thailand,Primary_Country_Turkey,Primary_Country_United Arab Emirates,Primary_Country_United Kingdom,Primary_Country_United States
0,Beasts of No Nation,"Follows the journey of a young boy, Agu, who i...",0,1,0,0,0,0.003484,0,0.645,0.112727,0.16573,0.114974,0.828571,0.918367,1,0,2015,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
1,The Ridiculous 6,"A white man, Tommy, raised by Indians is appro...",0,1,0,0,0,0.012776,1,0.555,0.0,0.0,0.070294,0.414286,0.918367,0,0,2015,1,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
2,Pee-wee's Big Holiday,A fateful meeting with a mysterious stranger i...,0,0,0,1,0,0.003484,0,0.4,0.0,0.0,0.011502,0.6,0.928571,1,0,2016,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1


- This will be a content-based recommendation system, NOT a collaborative filtering system (which relies on information collected from many users).
- Rather than looking at who else may be interested in a movie, the engine looks only at the attributes of the movie itself.

- The first engine that I will build will only incorporate 5 variables other than the title: Description, Genre, Release Year, Rating, and Primary Country.
- The weights will be assigned as follows:
    - Description: High weight
    - Genre: High weight
    - Release Year: Medium weight
    - Rating: Medium weight
    - Primary Country: Low weight
    
- Once the initial system is created, more features will be added to see if improvements can be made.
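
One possible way to apply such weights is to compute a similarity matrix per feature and take a weighted sum. The sketch below assumes that approach; the matrices, the weight values, and the variable names are all hypothetical, and the notebook may combine its features differently.

```python
import numpy as np

# Hypothetical per-feature similarity matrices for three films.
# Each matrix is symmetric with ones on the diagonal.
sim_description = np.array([[1.0, 0.2, 0.0],
                            [0.2, 1.0, 0.6],
                            [0.0, 0.6, 1.0]])
sim_genre = np.array([[1.0, 0.5, 0.0],
                      [0.5, 1.0, 1.0],
                      [0.0, 1.0, 1.0]])

# Illustrative weights only; these are not the notebook's tuned values.
weights = {"description": 0.6, "genre": 0.4}

combined = (weights["description"] * sim_description
            + weights["genre"] * sim_genre)

# Most similar film to film 0, ignoring film 0 itself.
scores = combined[0].copy()
scores[0] = -np.inf
best_match = int(np.argmax(scores))
print(best_match)
```

Keeping the weights in one dictionary makes it easy to rerun the comparison with different "high", "medium" and "low" settings.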

### Feature Weights
- The initial model was built with the 5 features holding equal weight; the weights were adjusted as the model was developed.
- The first step is to pre-process the Description column.

In [6]:
import re #necessary for working with regular expressions

# Remove special characters and convert to lower case
netflix['Cleaned_Description'] = netflix['Description'].apply(lambda x: re.sub(r'[^\w\s]', '', x.lower())) #removes every character that is neither a word character nor whitespace (pattern, replacement, string)

display(netflix["Cleaned_Description"])

netflix['Cleaned_Description'] = netflix['Cleaned_Description'].apply(lambda x: ' '.join(x.split()))# Remove extra white spaces

netflix["Cleaned_Description"].head()

0      follows the journey of a young boy agu who is ...
1      a white man tommy raised by indians is approac...
2      a fateful meeting with a mysterious stranger i...
3      a new york radio reporter and his sound engine...
4      max adam sandler and charlie david spade old s...
                             ...                        
856    when renowned crime novelist harlan thrombey c...
857    in the funeral of the famous british journalis...
858    the match is a contemporary romantic comedy se...
859    1930s pittsburgh a brother comes home to claim...
860    a mysterious place an indescribable prison a d...
Name: Cleaned_Description, Length: 861, dtype: object

0    follows the journey of a young boy agu who is ...
1    a white man tommy raised by indians is approac...
2    a fateful meeting with a mysterious stranger i...
3    a new york radio reporter and his sound engine...
4    max adam sandler and charlie david spade old s...
Name: Cleaned_Description, dtype: object

In [7]:
#tokenise
netflix['Token_Description'] = netflix['Cleaned_Description'].apply(lambda x: x.split())
netflix["Token_Description"].head()

0    [follows, the, journey, of, a, young, boy, agu...
1    [a, white, man, tommy, raised, by, indians, is...
2    [a, fateful, meeting, with, a, mysterious, str...
3    [a, new, york, radio, reporter, and, his, soun...
4    [max, adam, sandler, and, charlie, david, spad...
Name: Token_Description, dtype: object

In [8]:
#stem words 
from nltk.stem import PorterStemmer
stemmer=PorterStemmer() #the stemmer strips suffixes to reduce words to a common stem, e.g. "running" becomes "run". Irregular forms such as "ran" are left unchanged.

netflix["Stem_Description"] = netflix["Token_Description"].apply(lambda x: [stemmer.stem(word) for word in x])
netflix["Stem_Description"].head(8)

0    [follow, the, journey, of, a, young, boy, agu,...
1    [a, white, man, tommi, rais, by, indian, is, a...
2    [a, fate, meet, with, a, mysteri, stranger, in...
3    [a, new, york, radio, report, and, hi, sound, ...
4    [max, adam, sandler, and, charli, david, spade...
5    [a, writer, paul, rudd, retir, after, a, perso...
6    [thi, homag, to, 1980, teen, sex, comedi, foll...
7    [a, whitecollar, suburban, father, kyle, fran,...
Name: Stem_Description, dtype: object

- TF-IDF measures the importance of a word in a document relative to a whole collection of documents.
- It weighs how often a word occurs in one document against how many documents contain that word, so words that appear in almost every document receive little weight.
- TF-IDF represents each document as a vector in a high-dimensional space, with one dimension per term.
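
To show what these weights look like, here is a small from-scratch sketch on a made-up corpus. It uses the smoothed formula idf(t) = ln((1 + n) / (1 + df(t))) + 1 followed by L2 normalisation, which to my understanding matches the `TfidfVectorizer` defaults; tokenisation and stop-word handling are simplified, and the corpus and the helper `tfidf_vector` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus; each string stands in for one film description.
docs = [
    "soldier fights war",
    "soldier finds love",
    "comedy about love and war",
]
tokenised = [doc.split() for doc in docs]
n_docs = len(tokenised)

# Document frequency: number of documents containing each term.
doc_freq = Counter(term for tokens in tokenised for term in set(tokens))

def tfidf_vector(tokens):
    """Return {term: weight} using smoothed IDF and L2 normalisation."""
    term_freq = Counter(tokens)
    raw = {
        term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
        for term, count in term_freq.items()
    }
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {term: w / norm for term, w in raw.items()}

vec = tfidf_vector(tokenised[0])
# "fights" occurs in one document, "soldier" and "war" in two,
# so "fights" receives the largest weight in the first document.
print(vec)
```

Because every vector is scaled to unit length, a long description does not automatically look more similar to everything else than a short one.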

In [9]:
#TF-IDF
from sklearn.feature_extraction.text import TfidfVectorizer
netflix["Stem_Description_String"] = netflix["Stem_Description"].apply(lambda x: " ".join(x)) #converts each list into a string.

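tfidf = TfidfVectorizer(stop_words='english') #initialise vectoriser with English stop words. Note: the input is already stemmed, so stemmed forms such as "thi" or "hi" do not match the stop-word list and are kept.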

desc_tfidf_matrix = tfidf.fit_transform(netflix['Stem_Description_String']) #apply the vectoriser to our stemmed column of strings.
print(desc_tfidf_matrix.shape)

(861, 7558)


#### Similarity Calculations

In [10]:
from sklearn.metrics.pairwise import linear_kernel
cosine_similarity = linear_kernel(desc_tfidf_matrix, desc_tfidf_matrix) #linear_kernel computes plain dot products. TfidfVectorizer L2-normalises each row by default, so these dot products equal cosine similarities and are cheaper to compute than the cosine_similarity function.
print(cosine_similarity)
#the matrix is symmetric: entry [i, j] is the similarity between film i and film j.
indices = pd.Series(netflix.index, index = netflix["Title"]) #creating a Pandas Series where the index is a title
print(indices)
indices = indices.to_dict()
print(indices) #This step allows us to look up the index of a title.

[[1.         0.03898671 0.01518731 ... 0.02952931 0.01257144 0.        ]
 [0.03898671 1.         0.02202261 ... 0.05920771 0.02711961 0.        ]
 [0.01518731 0.02202261 1.         ... 0.02252238 0.03288291 0.02641262]
 ...
 [0.02952931 0.05920771 0.02252238 ... 1.         0.03484038 0.00392705]
 [0.01257144 0.02711961 0.03288291 ... 0.03484038 1.         0.00973903]
 [0.         0.         0.02641262 ... 0.00392705 0.00973903 1.        ]]
Title
Beasts of No Nation         0
The Ridiculous 6            1
Pee-wee's Big Holiday       2
Special Correspondents      3
The Do-Over                 4
                         ... 
Knives Out                856
Scoop                     857
The Match                 858
The Piano Lesson          859
The Platform              860
Length: 861, dtype: int64
{'Beasts of No Nation': 0, 'The Ridiculous 6': 1, "Pee-wee's Big Holiday": 2, 'Special Correspondents': 3, 'The Do-Over': 4, 'The Fundamentals of Caring': 5, 'Brahman Naman': 6, 'Rebirth': 7, 'T
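
The reason a plain dot product can stand in for cosine similarity is that the vectors are unit length. The sketch below checks this on two made-up vectors; `a` and `b` are hypothetical and unrelated to the dataset.

```python
import numpy as np

# Two hypothetical, unnormalised document vectors.
a = np.array([1.0, 2.0, 0.0])
b = np.array([2.0, 1.0, 1.0])

# Cosine similarity from its definition.
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# L2-normalise first; the plain dot product then gives the same number.
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
dot_of_units = a_unit @ b_unit

print(np.isclose(cosine, dot_of_units))  # True
```

If the TF-IDF matrix were built with normalisation switched off, the dot products would no longer be bounded by 1 and the shortcut would not apply.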

In [11]:
#Takes a film title, looks up its index, fetches that film's row of similarity scores, sorts it, and returns the titles of the 5 most similar films.
def similar_film(title):
    index = indices[title] #fetches the corresponding index from the indices dictionary.
    similarity_scores = cosine_similarity[index] #fetches the row of similarity scores for the given index
    sorted_similarity_scores = np.argsort(similarity_scores) #sorts the INDICES in ascending order of similarity
    top_5_score = sorted_similarity_scores[-6:-1][::-1] #takes the 5 indices just before the last one (the last is assumed to be the film itself, with score 1) and reverses them so the most similar film comes first.

    top_5_films=[]
    for idx in top_5_score: #Using indices as a lookup to find film titles.
        top_5_films.append(netflix.iloc[idx]["Title"]) #title lookup + appended to top_5_films list.
    return top_5_films

In [12]:
similar_film("Beasts of No Nation") #test 1

['War Machine',
 'Ali & Ratu Ratu Queens',
 'Jung_E',
 'Sand Castle',
 'The Siege of Jadotville']
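
The slice `[-6:-1]` relies on the query film sorting into the very last position. A variant that removes the query index explicitly, and takes the number of results as a parameter, could look like the sketch below. The helper name `top_n_excluding_self` and the toy scores are hypothetical and do not appear in the notebook.

```python
import numpy as np

def top_n_excluding_self(similarity_row, self_index, n=5):
    """Return the indices of the n highest scores, never including self_index."""
    order = np.argsort(similarity_row)[::-1]   # highest score first
    order = order[order != self_index]         # drop the query film explicitly
    return order[:n].tolist()

# Hypothetical scores for film 0 against four films; film 2 ties with film 0 itself.
row = np.array([1.0, 0.2, 1.0, 0.5])
print(top_n_excluding_self(row, self_index=0, n=2))
```

Because the query index is filtered out by value rather than by position, a tie with another film cannot cause the query itself to be returned as a recommendation.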

### Integrating other features
- Adding features not included in the first version of the recommendation engine.

In [13]:
feature_columns = ['Action', 'Adventure', 'Animation', 'Biography', 'Comedy', 'Crime',
       'Documentary', 'Drama', 'Family', 'Fantasy', 'Film-Noir', 'History','Horror', 'Music', 'Musical', 'Mystery', 'News', 'Reality-TV','Romance', 'Sci-Fi', 'Short', 'Sport', 'Thriller', 'War', 'Western',
       "Release Year_scaled",'Rated_General Audiences','Rated_Mature Audiences', 'Rated_Not Rated', 'Rated_Parental Guidance','Primary_Country_Argentina', 'Primary_Country_Australia',
       'Primary_Country_Austria', 'Primary_Country_Belgium',
       'Primary_Country_Brazil', 'Primary_Country_Cambodia',
       'Primary_Country_Canada', 'Primary_Country_Chile',
       'Primary_Country_China', 'Primary_Country_Czech Republic',
       'Primary_Country_Denmark', 'Primary_Country_France',
       'Primary_Country_Georgia', 'Primary_Country_Germany',
       'Primary_Country_Greece', 'Primary_Country_Hong Kong',
       'Primary_Country_Hungary', 'Primary_Country_Iceland',
       'Primary_Country_India', 'Primary_Country_Indonesia',
       'Primary_Country_Ireland', 'Primary_Country_Israel',
       'Primary_Country_Italy', 'Primary_Country_Japan',
       'Primary_Country_Mexico', 'Primary_Country_Netherlands',
       'Primary_Country_New Zealand', 'Primary_Country_Nigeria',
       'Primary_Country_Norway', 'Primary_Country_Pakistan',
       'Primary_Country_Peru', 'Primary_Country_Philippines',
       'Primary_Country_Poland', 'Primary_Country_South Africa',
       'Primary_Country_South Korea', 'Primary_Country_Spain',
       'Primary_Country_Sweden', 'Primary_Country_Thailand',
       'Primary_Country_Turkey', 'Primary_Country_United Arab Emirates',
       'Primary_Country_United Kingdom', 'Primary_Country_United States']

In [14]:
from sklearn.metrics.pairwise import linear_kernel
from scipy.sparse import hstack #used for stacking matrices horizontally (side by side, column-wise), so each film's row gains extra feature columns.

combined_features = hstack([desc_tfidf_matrix, netflix[feature_columns].values], format='csr') #combines the tfidf matrix with the additional features that we just added. csr = compressed sparse row format

linear_sim_combined = linear_kernel(combined_features, combined_features) #calculating new similarity matrix
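
One thing worth checking once extra columns are appended: the combined rows are no longer unit length, so the dot product is no longer a cosine and a film's score against itself is not guaranteed to be the largest in its row. The sketch below illustrates this with three invented feature rows (all values are hypothetical), and shows row normalisation as one possible remedy, which is not what the notebook does.

```python
import numpy as np

# Hypothetical rows: [genre flag, another genre flag, scaled release year].
features = np.array([
    [1.0, 0.0, 0.2],   # film A
    [1.0, 0.0, 0.9],   # film B
    [0.0, 1.0, 0.5],   # film C
])

raw_scores = features @ features.T    # the same dot products linear_kernel computes
print(raw_scores[0])                  # A vs A, A vs B, A vs C

# Dividing each row by its length turns the dot product into a cosine,
# so every film scores 1 against itself and at most 1 against any other film.
unit_rows = features / np.linalg.norm(features, axis=1, keepdims=True)
cosine_scores = unit_rows @ unit_rows.T
print(cosine_scores[0])
```

In the raw scores, film B scores higher against film A than film A does against itself, which matters for any function that assumes the last sorted index is the query film.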

### Updated Recommendation System (Ratings, Genre, Country, Year accounted for as well as Description)

- Same process as with V1

In [15]:
#Same steps as similar_film, but reads from the combined similarity matrix: look up the title's index, fetch its row of scores, sort, and return the titles of the 5 most similar films.
def similar_film_v2(title):
    index = indices[title]
    similarity_scores = linear_sim_combined[index]
    sorted_similarity_scores = np.argsort(similarity_scores) #sorts the indices by order of similarity
    top_5_score = sorted_similarity_scores[-6:-1][::-1]

    top_5_films=[]
    for idx in top_5_score:
        top_5_films.append(netflix.iloc[idx]["Title"])
    return top_5_films

In [16]:
similar_film_v2("Beasts of No Nation")

['War Machine',
 'Sand Castle',
 'Da 5 Bloods',
 'Father Soldier Son',
 'All Quiet on the Western Front']

### Integrating Weighting System
- In a recommendation engine, it is important to weight different variables according to their importance in determining similarity between films.
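
To see what a weight does to the similarity score, note that multiplying a block of columns by `w` multiplies that block's contribution to the dot product by `w**2`, since both films' rows are scaled. A minimal sketch with invented values (the feature vectors and weights below are illustrative, not taken from the dataset):

```python
import numpy as np

# Hypothetical two-block feature rows for two films.
genre_a, genre_b = np.array([1.0, 0.0, 1.0]), np.array([1.0, 1.0, 1.0])
country_a, country_b = np.array([0.0, 1.0]), np.array([0.0, 1.0])

w_genre, w_country = 3.0, 0.5   # illustrative weights

# Scale each block, then join the blocks into one row per film.
weighted_a = np.concatenate([w_genre * genre_a, w_country * country_a])
weighted_b = np.concatenate([w_genre * genre_b, w_country * country_b])

total = weighted_a @ weighted_b
by_block = w_genre**2 * (genre_a @ genre_b) + w_country**2 * (country_a @ country_b)

print(total, by_block)
```

So a weight of 3 versus 0.5 makes a shared genre count 36 times as much as a shared country in the final score, not 6 times.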

In [17]:
#Weights are based on what I thought would be the most important factors for finding similar films. Weightings were adjusted through trial and error until the recommendations looked sensible.

high_weight = ['Action', 'Adventure', 'Animation', 'Biography', 'Comedy', 'Crime',
       'Documentary', 'Drama', 'Family', 'Fantasy', 'Film-Noir', 'History','Horror', 'Music', 'Musical', 'Mystery', 'News', 'Reality-TV','Romance', 'Sci-Fi', 'Short', 'Sport', 'Thriller', 'War', 'Western']

medium_weight=["Release Year_scaled",'Rated_General Audiences','Rated_Mature Audiences', 'Rated_Not Rated', 'Rated_Parental Guidance']

low_weight=['Primary_Country_Argentina', 'Primary_Country_Australia',
       'Primary_Country_Austria', 'Primary_Country_Belgium',
       'Primary_Country_Brazil', 'Primary_Country_Cambodia',
       'Primary_Country_Canada', 'Primary_Country_Chile',
       'Primary_Country_China', 'Primary_Country_Czech Republic',
       'Primary_Country_Denmark', 'Primary_Country_France',
       'Primary_Country_Georgia', 'Primary_Country_Germany',
       'Primary_Country_Greece', 'Primary_Country_Hong Kong',
       'Primary_Country_Hungary', 'Primary_Country_Iceland',
       'Primary_Country_India', 'Primary_Country_Indonesia',
       'Primary_Country_Ireland', 'Primary_Country_Israel',
       'Primary_Country_Italy', 'Primary_Country_Japan',
       'Primary_Country_Mexico', 'Primary_Country_Netherlands',
       'Primary_Country_New Zealand', 'Primary_Country_Nigeria',
       'Primary_Country_Norway', 'Primary_Country_Pakistan',
       'Primary_Country_Peru', 'Primary_Country_Philippines',
       'Primary_Country_Poland', 'Primary_Country_South Africa',
       'Primary_Country_South Korea', 'Primary_Country_Spain',
       'Primary_Country_Sweden', 'Primary_Country_Thailand',
       'Primary_Country_Turkey', 'Primary_Country_United Arab Emirates',
       'Primary_Country_United Kingdom', 'Primary_Country_United States']


from sklearn.metrics.pairwise import linear_kernel
from scipy.sparse import hstack

weighted_high_features = 2.5 * netflix[high_weight].values   #Assigning weights
weighted_medium_features = 2 * netflix[medium_weight].values
weighted_low_features = 1 * netflix[low_weight].values

weighted_desc_tfidf_matrix = 2.5 * desc_tfidf_matrix

#combining all weighted features horizontally
combined_features = hstack([weighted_desc_tfidf_matrix,weighted_high_features,weighted_medium_features,weighted_low_features],format='csr')

# Compute similarity
linear_sim_combined = linear_kernel(combined_features, combined_features)

In [18]:
#Same steps as the earlier versions, but returns the 10 most similar films instead of 5.
def similar_film_v3(title):
    index = indices[title]
    similarity_scores = linear_sim_combined[index]
    sorted_similarity_scores = np.argsort(similarity_scores) #sorts the indices by order of similarity
    top_10_score = sorted_similarity_scores[-11:-1][::-1] #the 10 indices before the last one, most similar first

    top_10_films=[]
    for idx in top_10_score:
        top_10_films.append(netflix.iloc[idx]["Title"])
    return top_10_films

In [19]:
similar_film_v3("Blood Brothers: Malcolm X & Muhammad Ali")

['Circus of Books',
 'The Great Hack',
 'Descendant',
 'A Love Song for Latasha',
 'The Martha Mitchell Effect',
 'Zion',
 'Dick Johnson Is Dead',
 'Becoming',
 'Secrets of the Saqqara Tomb',
 'Schumacher']

## Final Recommendation System (Integrating all features + weighting + fine tuning)

In [20]:
high_weight = ['Action', 'Adventure', 'Animation', 'Biography', 'Comedy', 'Crime',
       'Documentary', 'Drama', 'Family', 'Fantasy', 'Film-Noir', 'History','Horror', 'Music', 'Musical', 'Mystery', 'News', 'Reality-TV','Romance', 'Sci-Fi', 'Short', 'Sport', 'Thriller', 'War', 'Western']

medium_weight=["Release Year_scaled",'Rated_General Audiences','Rated_Mature Audiences', 'Rated_Not Rated', 'Rated_Parental Guidance', 'Netflix_director',"Actor_Weight","Running Time(mins)_scaled"]

low_weight=['Primary_Country_Argentina', 'Primary_Country_Australia',
       'Primary_Country_Austria', 'Primary_Country_Belgium',
       'Primary_Country_Brazil', 'Primary_Country_Cambodia',
       'Primary_Country_Canada', 'Primary_Country_Chile',
       'Primary_Country_China', 'Primary_Country_Czech Republic',
       'Primary_Country_Denmark', 'Primary_Country_France',
       'Primary_Country_Georgia', 'Primary_Country_Germany',
       'Primary_Country_Greece', 'Primary_Country_Hong Kong',
       'Primary_Country_Hungary', 'Primary_Country_Iceland',
       'Primary_Country_India', 'Primary_Country_Indonesia',
       'Primary_Country_Ireland', 'Primary_Country_Israel',
       'Primary_Country_Italy', 'Primary_Country_Japan',
       'Primary_Country_Mexico', 'Primary_Country_Netherlands',
       'Primary_Country_New Zealand', 'Primary_Country_Nigeria',
       'Primary_Country_Norway', 'Primary_Country_Pakistan',
       'Primary_Country_Peru', 'Primary_Country_Philippines',
       'Primary_Country_Poland', 'Primary_Country_South Africa',
       'Primary_Country_South Korea', 'Primary_Country_Spain',
       'Primary_Country_Sweden', 'Primary_Country_Thailand',
       'Primary_Country_Turkey', 'Primary_Country_United Arab Emirates',
       'Primary_Country_United Kingdom', 'Primary_Country_United States',
       'Multiple Production Companies','IMDb Votes_scaled','Netflix_popular_company',
       'IMDb Score_scaled','Actor_Weight','Wins_scaled', 'Nominations_scaled', #note: 'Actor_Weight' is also listed in medium_weight, so it contributes under both weights.
       "International"]

from sklearn.metrics.pairwise import linear_kernel
from scipy.sparse import hstack

weighted_high_features = 3 * netflix[high_weight].values
weighted_medium_features = 2 * netflix[medium_weight].values
weighted_low_features = 0.5 * netflix[low_weight].values

#the description TF-IDF matrix gets the largest weight (x5)
weighted_desc_tfidf_matrix = 5 * desc_tfidf_matrix

#Combining all weighted features
combined_features = hstack([weighted_desc_tfidf_matrix,weighted_high_features,weighted_medium_features,weighted_low_features],format='csr')

#Computing similarity scores 
linear_sim_combined = linear_kernel(combined_features, combined_features)

In [21]:
#Same steps as the earlier versions, using the final weighted similarity matrix: look up the title's index, fetch its row of scores, sort, and return the titles of the 5 most similar films.
def similar_film_v4(title):
    index = indices[title]
    similarity_scores = linear_sim_combined[index]
    sorted_similarity_scores = np.argsort(similarity_scores) #sorts the indices by order of similarity
    top_5_score = sorted_similarity_scores[-6:-1][::-1]

    top_5_films=[]
    for idx in top_5_score:
        top_5_films.append(netflix.iloc[idx]["Title"])
    return top_5_films

In [22]:
similar_film_v4("Zion")

['A Love Song for Latasha',
 'What Would Sophia Loren Do?',
 'John Was Trying to Contact Aliens',
 'Little Miss Sumo',
 'Becoming']

## Optimising the Recommendation Function (User Input - fuzzywuzzy string matching)

- Fuzzywuzzy uses Levenshtein distance to score the similarity between two strings.
- Levenshtein distance is the minimum number of single-character edits (insertions, deletions or substitutions) needed to turn one string into another.
- Fuzzywuzzy makes the recommendation widget more forgiving: even if a user misspells a movie title, the input is mapped to the most similar title in the dataset.
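
The definition of Levenshtein distance above can be made concrete with a short dynamic-programming sketch. This is a plain-Python illustration of the distance itself, not fuzzywuzzy's internal code; fuzzywuzzy reports a 0 to 100 score rather than this raw edit count.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions or substitutions to turn a into b."""
    previous = list(range(len(b) + 1))      # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]                       # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,            # delete char_a
                current[j - 1] + 1,         # insert char_b
                previous[j - 1] + cost,     # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]

print(levenshtein("kitten", "sitting"))        # 3
print(levenshtein("Shumacher", "Schumacher"))  # 1
```

A misspelling such as "Shumacher" is a single insertion away from the real title, which is why it can still be matched.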

In [23]:
from fuzzywuzzy import process

def closest_match(query, choices): #takes a query (string) and a list of choices to find the most similar string.
    best_match, score = process.extractOne(query, choices) #extractOne returns the best match from the choices list together with its score (0-100).
    if score > 80:
        return best_match 
    else:
        return None #If no match scores above 80, return None so that a completely unrelated movie is not used as the query.

def similar_film_v5(title):
    choices = netflix["Title"].to_list() #converts all netflix titles to list format.
    best_match = closest_match(title, choices) #Uses the closest_match function to retrieve the closest matching title.

    if best_match is None:  
        return "No Match Found"

    index = indices[best_match]  #fetch the index of this best matching title.
    similarity_scores = linear_sim_combined[index] #retrieves similarity scores of the best matching title against all other titles.
    sorted_similarity_scores = np.argsort(similarity_scores)
    top_5_score = sorted_similarity_scores[-6:-1][::-1]
    
    top_5_films = []
    for idx in top_5_score:
        top_5_films.append(netflix.iloc[idx]["Title"])
        
    return top_5_films




In [24]:
similar_film_v5("Schumacher")

['A Life of Speed: The Juan Manuel Fangio Story',
 'Tony Parker: The Final Shot',
 'David Attenborough: A Life on Our Planet',
 'Dick Johnson Is Dead',
 'The Speed Cubers']
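
For readers without fuzzywuzzy installed, the same "best match or nothing" behaviour can be approximated with the standard library's `difflib`. This is a stand-in, not the notebook's method: `difflib` uses a different similarity ratio (0 to 1), so the `cutoff=0.8` below is an assumption and is not equivalent to the score threshold of 80 used above. The function name `closest_match_stdlib` and the short title list are hypothetical.

```python
import difflib

def closest_match_stdlib(query, choices, cutoff=0.8):
    """Return the closest title in choices, or None if nothing is close enough."""
    matches = difflib.get_close_matches(query, choices, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# A few titles for illustration; the notebook would pass netflix["Title"].to_list().
titles = ["Schumacher", "Zion", "Becoming"]

print(closest_match_stdlib("Shumacher", titles))  # misspelt title
print(closest_match_stdlib("xyzxyz", titles))     # unrelated input
```

The first call recovers "Schumacher" and the second returns `None`, mirroring the two branches of `closest_match`.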

## Embedded Recommendation Widget

- The cell below builds an interactive widget interface for the recommendation function using ipywidgets and IPython.

In [25]:
import ipywidgets as widgets #used for the widgets
from IPython.display import HTML, clear_output, display #HTML for the link at the bottom of the page, clear_output to reset the output area, display to show the widgets.

#Text Box - sets up where users can enter text.
text_box = widgets.Text(
    value='', #sets the initial value in the text box
    placeholder='Enter a Netflix Original Title :',
    description = "Movie",
    disabled = False) #active, not disabled text box

#Button
button = widgets.Button(
    description = "Find Similar Netflix Originals",
    disabled = False,
    button_style = 'success', #green button
    tooltip = '', #nothing shows when hovering over the button.
    layout=widgets.Layout(width='300px'))

#Output
output = widgets.Output() #This code creates an output area for the widget.

#Defining the result of the button click itself
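def button_clicked(button):
    with output: #the following code sends its output to the output widget
        clear_output() #clears existing output
        netflix_title = text_box.value.strip() #grabs whatever the user typed in the text box widget

        if not netflix_title: #nothing was entered
            print("Please enter a title.")
            return

        print(f"Searching for: {netflix_title}")
        similar_films = similar_film_v5(netflix_title) #links into our recommendation function.
        if isinstance(similar_films, str): #similar_film_v5 returns the string "No Match Found" when no title is close enough; looping over it would print one character per line.
            print("No Match Found :(")
        else:
            print("Recommended Movies:")
            for film in similar_films:
                print(film)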


            
#link the button to the above function
button.on_click(button_clicked) #When the button is clicked, the button_clicked function is executed.

#display the widget
display(text_box, button, output)
display(HTML("<a href='https://en.wikipedia.org/wiki/List_of_original_films_distributed_by_Netflix'>List of Netflix Originals on Wikipedia</a>"))


Text(value='', description='Movie', placeholder='Enter a Netflix Original Title :')

Button(button_style='success', description='Find Similar Netflix Originals', layout=Layout(width='300px'), sty…

Output()

---
## Conclusions
- In today's age of mass media, customers often face the 'problem' of having 'too much to choose from'.
- For this reason, online streaming platforms often use recommendation engines to steer users towards content they are likely to enjoy.
- This notebook attempted to recreate this process, limited to films branded as Netflix Originals.
- I believe this project was successful in emulating this process, as the function recommends titles that are, in my opinion, similar to the input (feedback would be appreciated and used to optimise the engine).

## Future Work + Drawbacks
- One drawback of this work is that it doesn't take a user's previous watch history or likes into consideration.
- The model here only takes one input film to make a recommendation.
- The ability to successfully recommend a film a user would like would be greatly improved if:
    - A user could input more than one film title
    - The algorithm had access to their watch history and thumbs up/thumbs down ratings
    - More general data from other users were available: what films did other users like if they watched a certain title?
- Perhaps in the future these issues could be addressed to build a more complete recommendation engine.
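
As a starting point for the multi-title idea, one simple approach is to average the similarity rows of all the films a user enters and rank the remaining films by that average. The sketch below is only an illustration under that assumption: the function name `recommend_for_many` and the 4x4 matrix are hypothetical, and in the notebook the matrix would be `linear_sim_combined`.

```python
import numpy as np

def recommend_for_many(query_indices, similarity_matrix, n=5):
    """Average the similarity rows of several films and rank all other films."""
    excluded = set(query_indices)
    mean_scores = similarity_matrix[list(query_indices)].mean(axis=0)
    order = np.argsort(mean_scores)[::-1]                  # best average score first
    ranked = [int(i) for i in order if int(i) not in excluded]
    return ranked[:n]

# Hypothetical 4x4 similarity matrix.
toy_similarity = np.array([
    [1.0, 0.8, 0.1, 0.3],
    [0.8, 1.0, 0.2, 0.6],
    [0.1, 0.2, 1.0, 0.4],
    [0.3, 0.6, 0.4, 1.0],
])

print(recommend_for_many([0, 1], toy_similarity, n=2))
```

Averaging treats every input film equally; weighting the rows by a thumbs up/thumbs down signal would be a natural extension once that data is available.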