<center><font size="5">**Exercise 10.2: Recommender System**</font></center>

# Description of the recommender system

This movie recommendation system is interactive. The user types the title of a movie they would like to watch, and the system immediately returns 10 other movies they might also want to watch. Recommendations are based on the cosine similarity between the searched title and the other titles in the dataset: the system returns the 10 movies with the highest cosine similarity to the searched movie. It performs the following steps to make a recommendation:

- Step 1: Accepts the movie title entered by the user.
- Step 2: Converts the entered title from text to a numerical vector so that the system can process it.
- Step 3: Calculates the similarity between the entered title and the other titles in the dataset using cosine similarity.
- Step 4: Identifies the 10 movies most similar to the entered title based on the calculated similarity.
- Step 5: Outputs these 10 movies as recommendations to the user.

The output of the system is the title of each recommended movie along with its genre.
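
To make the similarity measure in Step 3 concrete, here is a minimal pure-Python sketch of cosine similarity. The function name `cosine` and the three-word vocabulary are hypothetical and chosen only for illustration; the notebook itself relies on scikit-learn's `cosine_similarity`.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Convention chosen for this sketch: a zero vector is similar to nothing.
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical word counts over the vocabulary ["toy", "story", "jumanji"].
query = [1, 1, 0]      # "toy story"
title_a = [1, 1, 1]    # shares both query words
title_b = [0, 0, 1]    # shares no query words

print(cosine(query, title_a))
print(cosine(query, title_b))
```

A title that shares words with the query scores closer to 1, while a title with no words in common scores 0, which is why ranking by this score surfaces titles that resemble the search text.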

# Libraries

In [1]:
# Importing the necessary libraries
import re
import numpy as np
import pandas as pd
import ipywidgets as widgets
from IPython.display import HTML, display
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Loading data 

In [2]:
# Loading the data from the local drive
movies_df = pd.read_csv("D:/Training/Bellevue/Predictive Analytics/Data/movies.csv")

In [3]:
# Checking the shape of dataframe
movies_df.shape

(86537, 3)

The dataframe has 86537 rows and 3 columns.

In [4]:
# Checking the information
movies_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 86537 entries, 0 to 86536
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   movieId  86537 non-null  int64 
 1   title    86537 non-null  object
 2   genres   86537 non-null  object
dtypes: int64(1), object(2)
memory usage: 2.0+ MB


All three columns have 86537 non-null values, which matches the number of rows. Therefore, there are no missing values in the dataframe.
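
The same conclusion can be checked directly by counting missing values per column. The sketch below uses a small hypothetical dataframe (`toy_df`) with one deliberately missing title, since the real file is not available here.

```python
import pandas as pd

# Hypothetical three-row frame with one missing title, for illustration only.
toy_df = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Toy Story (1995)", None, "Grumpier Old Men (1995)"],
    "genres": ["Comedy", "Adventure", "Comedy|Romance"],
})

# Number of missing (NaN/None) entries in each column.
missing_per_column = toy_df.isna().sum()
print(missing_per_column)
```

Running `movies_df.isna().sum()` on the real data should report 0 for every column if the reading of `info()` above is right. Note that `isna()` only detects NaN/None, not empty strings or placeholder text.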

In [5]:
# Displaying first five observations
movies_df.head()

   movieId                               title                                       genres
0        1                    Toy Story (1995)  Adventure|Animation|Children|Comedy|Fantasy
1        2                      Jumanji (1995)                   Adventure|Children|Fantasy
2        3             Grumpier Old Men (1995)                               Comedy|Romance
3        4            Waiting to Exhale (1995)                         Comedy|Drama|Romance
4        5  Father of the Bride Part II (1995)                                       Comedy


The data is properly loaded.

# Data Cleaning

In [6]:
def clean_title(title):
    # Function to remove special characters.
    return re.sub("[^a-zA-Z0-9 ]", "", title)

In [7]:
# Cleaning the titles by removing special characters.
# This makes the movie search easier.
movies_df["title"] = movies_df["title"].apply(clean_title)
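
As a quick illustration of what the cleaning step does, the sketch below applies the same regular expression to two sample strings. The second title is a hypothetical input chosen to show a side effect: the pattern keeps only ASCII letters, digits and spaces, so accented letters are removed along with punctuation.

```python
import re

def clean_title(title):
    # Same pattern as above: keep ASCII letters, digits and spaces only.
    return re.sub("[^a-zA-Z0-9 ]", "", title)

cleaned_plain = clean_title("Toy Story (1995)")
cleaned_accented = clean_title("Léon: The Professional (1994)")

print(cleaned_plain)
print(cleaned_accented)
```

Because the same function is applied to both the dataset and the search text, the two stay consistent, but titles with accented characters lose those letters in the displayed results.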

# Vectorization

Vectorization converts the titles into numerical vectors, extracting distinct features from the text that the recommendation system can compare.

In [8]:
# Converting titles into numbers so that they can be
# used to build the recommendation system.
# ngram_range=(1,2) uses single words and two-word sequences as features.
vectorizer = TfidfVectorizer(ngram_range=(1,2)) # This will be used to convert
                                                # titles into numbers
tfidf = vectorizer.fit_transform(movies_df["title"])

# Building the recommendation System

In [9]:
def search(title):
    '''This function accepts user input (a movie title) and computes its
    similarity with the movie titles in the dataset. It outputs the 10 most
    similar movies as a recommendation for the user.
    '''
    # Cleaning the title being searched.
    title = clean_title(title)

    # Vectorizing the title being searched.
    query_vector = vectorizer.transform([title])

    # Calculating the similarity between the searched
    # title and the titles in the dataset.
    similarity = cosine_similarity(query_vector, tfidf).flatten()
    # Identifying the 11 most similar titles: the closest match plus 10 others.
    # argpartition does not return them in sorted order, so they are
    # sorted explicitly, most similar first.
    indices = np.argpartition(similarity, -11)[-11:]
    indices = indices[np.argsort(similarity[indices])[::-1]]
    # Extracting the movies which are most similar to the searched movie.
    results = movies_df.iloc[indices]
    # Removing the movieId column.
    results = results.drop(['movieId'], axis=1)
    # Renaming the columns.
    results.columns = ["Movie Title", "Genre"]

    # Dropping the first movie, as it is the closest match to the title
    # being searched. This leaves the other 10 that are similar to it.
    results = results.iloc[1:, :]

    # Rendering the results as HTML without the dataframe index.
    results = HTML(results.to_html(index=False))
    return results
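
One detail of the top-k selection is worth illustrating: `np.argpartition` only guarantees that the k largest scores end up in the last k positions, not that those k are ordered among themselves. The sketch below, with a hypothetical helper `top_k_indices` and made-up scores, shows one way to obtain the indices ranked from most to least similar.

```python
import numpy as np

def top_k_indices(scores, k):
    """Return the indices of the k largest scores, highest score first."""
    k = min(k, len(scores))
    if k <= 0:
        return np.array([], dtype=int)
    # Cheap selection of the k largest entries (unordered among themselves).
    top = np.argpartition(scores, -k)[-k:]
    # Sort only those k entries by score, descending.
    return top[np.argsort(scores[top])[::-1]]

# Made-up similarity scores for six titles.
scores = np.array([0.10, 0.90, 0.30, 0.75, 0.05, 0.60])
best = top_k_indices(scores, 3)
print(best)
```

Selecting first and sorting afterwards keeps the cost low on a large dataset, since only the k selected entries are sorted rather than all scores.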

# Interactive Recommendation system

We will now make the recommendation system interactive using widgets.

In [10]:
# Input widget. This allows users to enter the movie they want to search
movie_input = widgets.Text(
                      value = 'Toy Story',# This is the default movie
                      description = "Input Movie:",
                      disabled = False)

# Creating the output widget
movie_list = widgets.Output()

# This function runs whenever the input changes and calls the search function we created above
def on_type(data):
    with movie_list:
        movie_list.clear_output()
        title = data["new"]
        # Searching only once more than 5 characters have been typed
        if len(title)>5:
            display(search(title))


# Registering on_type to run whenever the input value changes
movie_input.observe(on_type, names='value')

# Displaying the movies
display(movie_input,movie_list)


Text(value='Toy Story', description='Input Movie:')

Output()

References:
1. https://www.geeksforgeeks.org/recommendation-system-in-python/abs