# Chapter 6 - Recommender System

Have you ever wondered how different Netflix subscribers almost always find the movie or series that most interests them? Netflix’s recommender system is one of the most sophisticated recommender systems; it aims at finding personalized show recommendations with the minimum possible effort for the user.
Netflix is not the only one to use recommender systems. Facebook, Amazon, YouTube, and even some online retailers are using highly complicated recommender systems in order to increase their sales by targeting each user with the product recommendation that most interests that user.
Netflix’s recommender system estimates the probability that a user will like a particular show based on three main factors:

The user’s viewing history.

What similar users have watched.

Information about the movies themselves including title, genre, categories, actors, etc.

## Types of recommender systems

Recommender systems can be classified into three main types:

### Simple recommendation: a generalized approach to non-personalized recommendation.

Simple recommender systems are non-personalized recommender systems that do not use any information about the user, the user’s behavioral patterns, or the user’s purchase history. These systems are mostly popularity-based: the top recommendations are simply the highest-rated or the most-purchased items in the dataset. This is why the simple recommender system is also known as a popularity recommender system.
One simple way to implement a popularity-based recommender system is to use maximum counts of ratings or purchases and always recommend the items with the highest count.

Using a movie recommender system as an example, a popularity-based recommender system would simply recommend the highest-rated movies.
For an online store recommender system, a popularity-based recommender system would recommend the most purchased items.
In this section, you do not need your computer. You will read and follow the example unless you want to run the code yourself step by step.
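
As a rough illustration of the count-based approach described above, here is a minimal sketch. The purchase log, its column names, and the `most_popular` helper are all made up for this example and are not part of the Netflix dataset.

```python
import pandas as pd

# Hypothetical toy purchase log; the data and column names are assumptions.
purchases = pd.DataFrame({
    "user": ["u1", "u2", "u3", "u1", "u2", "u4"],
    "item": ["book", "book", "book", "lamp", "lamp", "mug"],
})

def most_popular(log, n=2):
    """Return the n items with the highest purchase count."""
    # value_counts() counts each item and sorts from most to least frequent
    counts = log["item"].value_counts()
    return counts.head(n).index.tolist()

top_items = most_popular(purchases)
print(top_items)
```

Every user receives the same list, which is what makes this kind of recommender non-personalized.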

### Content-based recommendation: recommendation based on the characteristics of items.

Content-based filtering recommenders recommend items based on their attributes and characteristics.
Each item’s characteristics are described by a set of words, and the same words may occur in the descriptions of many items. If the user’s profile is built from the same vocabulary, a similarity measure can be computed between the profile and each item, or between pairs of items.
The recommender system then chooses the items with the highest similarity.

**Term frequency**

The number of times a word occurs in a single document. A word gets a higher weight the more often it occurs, and the count is divided by the document length to normalize it.

**Inverse document frequency**

The logarithm of the total number of documents divided by the number of documents that contain the word. This gives a higher weight to rare words.
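
To make the two definitions concrete, the following sketch computes them by hand on three tiny, made-up documents. The helper names `tf`, `idf`, and `tf_idf` are chosen for this example only. Scikit-Learn's `TfidfVectorizer` uses a smoothed IDF and normalizes each row by default, so its numbers will differ from this textbook version.

```python
import math

# Three tiny made-up documents (assumed data, not from the Netflix dataset).
docs = [
    "robots fight robots".split(),
    "robots save the city".split(),
    "a comedy about the city".split(),
]

def tf(term, doc):
    # Occurrences of the term in this one document, divided by document length
    return doc.count(term) / len(doc)

def idf(term, docs):
    # log(total documents / documents containing the term).
    # Assumes the term appears in at least one document.
    containing = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / containing)

def tf_idf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

# "comedy" occurs in one document, "robots" in two, so "comedy" is rarer
print(idf("comedy", docs), idf("robots", docs))
print(tf_idf("robots", docs[0], docs))
```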

### Building a content-based recommender system

The content of the movies in our dataset is written in the description attribute:



In [38]:
import pandas as pd

df = pd.read_csv("c://Users//cbeer//Desktop//data-science-learning/python-for-machine-learning//dat//netflix_titles.csv")

# Display the DataFrame
df

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,81145628,Movie,Norm of the North: King Sized Adventure,"Richard Finn, Tim Maltby","Alan Marriott, Andrew Toth, Brian Dobson, Cole...","United States, India, South Korea, China","September 9, 2019",2019,TV-PG,90 min,"Children & Family Movies, Comedies",Before planning an awesome wedding for his gra...
1,80117401,Movie,Jandino: Whatever it Takes,,Jandino Asporaat,United Kingdom,"September 9, 2016",2016,TV-MA,94 min,Stand-Up Comedy,Jandino Asporaat riffs on the challenges of ra...
2,70234439,TV Show,Transformers Prime,,"Peter Cullen, Sumalee Montano, Frank Welker, J...",United States,"September 8, 2018",2013,TV-Y7-FV,1 Season,Kids' TV,"With the help of three human allies, the Autob..."
3,80058654,TV Show,Transformers: Robots in Disguise,,"Will Friedle, Darren Criss, Constance Zimmer, ...",United States,"September 8, 2018",2016,TV-Y7,1 Season,Kids' TV,When a prison ship crash unleashes hundreds of...
4,80125979,Movie,#realityhigh,Fernando Lebrija,"Nesta Cooper, Kate Walsh, John Michael Higgins...",United States,"September 8, 2017",2017,TV-14,99 min,Comedies,When nerdy high schooler Dani finally attracts...
...,...,...,...,...,...,...,...,...,...,...,...,...
6229,80000063,TV Show,Red vs. Blue,,"Burnie Burns, Jason Saldaña, Gustavo Sorola, G...",United States,,2015,NR,13 Seasons,"TV Action & Adventure, TV Comedies, TV Sci-Fi ...","This parody of first-person shooter games, mil..."
6230,70286564,TV Show,Maron,,"Marc Maron, Judd Hirsch, Josh Brener, Nora Zeh...",United States,,2016,TV-MA,4 Seasons,TV Comedies,"Marc Maron stars as Marc Maron, who interviews..."
6231,80116008,Movie,Little Baby Bum: Nursery Rhyme Friends,,,,,2016,,60 min,Movies,Nursery rhymes and original music for children...
6232,70281022,TV Show,A Young Doctor's Notebook and Other Stories,,"Daniel Radcliffe, Jon Hamm, Adam Godley, Chris...",United Kingdom,,2013,TV-MA,2 Seasons,"British TV Shows, TV Comedies, TV Dramas","Set during the Russian Revolution, this comic ..."


Scikit-Learn gives you a built-in TfidfVectorizer class that produces the TF-IDF matrix using the following lines of code:

In [40]:
from sklearn.feature_extraction.text import TfidfVectorizer

# Instantiate a new vectorizer that ignores common English stop words
tfidf = TfidfVectorizer(stop_words='english')

# Replace NaN with an empty string
df['cast'] = df['cast'].fillna('')

# Transform a text column into the TF-IDF matrix. This cell vectorizes the
# cast column; pass df['description'].fillna('') instead to compare plots.
tfidf_matrix = tfidf.fit_transform(df['cast'])
tfidf_matrix


<6234x25076 sparse matrix of type '<class 'numpy.float64'>'
	with 89928 stored elements in Compressed Sparse Row format>
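
Once a TF-IDF matrix exists, item-to-item similarity can be read from it. The sketch below uses three invented descriptions rather than the Netflix file. Because `TfidfVectorizer` L2-normalizes each row by default, the dot product returned by `linear_kernel` equals the cosine similarity.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Made-up descriptions for illustration only
descriptions = [
    "giant robots defend the city",
    "robots protect the city from invaders",
    "a stand-up comedian jokes about family",
]

tfidf_matrix = TfidfVectorizer(stop_words="english").fit_transform(descriptions)

# similarity[i, j] is the cosine similarity between descriptions i and j
similarity = linear_kernel(tfidf_matrix, tfidf_matrix)

# Rank the other descriptions against the first one; position 0 of the
# ranking is the description itself, so take position 1.
most_similar_to_first = similarity[0].argsort()[::-1][1]
print(most_similar_to_first)
```

The two robot descriptions share the words "robots" and "city", while the comedy description shares no non-stop-word with the first one, so its similarity to it is zero.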

### Collaborative filtering recommendation: recommendation based on user ratings, prediction based on history.

### Popularity recommender system

In this mission, you will work with the Netflix dataset to practice implementing a non-personalized, popularity-based recommender system.


To complete this mission, perform the following task in the provided editor:

Create a popularity recommender system that recommends the oldest movies.
Your code should find the ten oldest movie releases.
Your code should return the recommendations in a data frame that shows the title and release_year, in that order.

In [14]:
import pandas as pd

def main():
    dat = pd.read_csv("c://Users//cbeer//Desktop//data-science-learning/python-for-machine-learning//dat//netflix_titles.csv")

    # Sort every title by release year, oldest first. Note that this does not
    # filter on the type column, so TV shows are ranked alongside movies.
    recommendations = dat.sort_values('release_year', ascending=True).reset_index(drop=True).loc[:, ['title', 'release_year']]
    print(recommendations.head(10))
    return recommendations.head(10)

main()




                                              title  release_year
0                 Pioneers: First Women Filmmakers*          1925
1                                    Prelude to War          1942
2                              The Battle of Midway          1942
3     Undercover: How to Operate Behind Enemy Lines          1943
4                Why We Fight: The Battle of Russia          1943
5                   WWII: Report from the Aleutians          1943
6  The Memphis Belle: A Story of a\nFlying Fortress          1944
7                                 The Negro Soldier          1944
8                                  Tunisian Victory          1944
9                                        San Pietro          1945


Unnamed: 0,title,release_year
0,Pioneers: First Women Filmmakers*,1925
1,Prelude to War,1942
2,The Battle of Midway,1942
3,Undercover: How to Operate Behind Enemy Lines,1943
4,Why We Fight: The Battle of Russia,1943
5,WWII: Report from the Aleutians,1943
6,The Memphis Belle: A Story of a\nFlying Fortress,1944
7,The Negro Soldier,1944
8,Tunisian Victory,1944
9,San Pietro,1945


In [19]:
dat = pd.read_csv("c://Users//cbeer//Desktop//data-science-learning/python-for-machine-learning//dat//netflix_titles.csv")

dat.sort_values('release_year').head(10)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
4292,81030762,TV Show,Pioneers: First Women Filmmakers*,,,,"December 30, 2018",1925,TV-PG,1 Season,TV Shows,This collection restores films from women who ...
2011,60027945,Movie,Prelude to War,Frank Capra,,United States,"March 31, 2017",1942,TV-PG,52 min,"Classic Movies, Documentaries",Frank Capra's documentary chronicles the rise ...
2013,60027942,Movie,The Battle of Midway,John Ford,"Henry Fonda, Jane Darwell",United States,"March 31, 2017",1942,TV-G,18 min,"Classic Movies, Documentaries",Director John Ford captures combat footage of ...
2022,80119186,Movie,Undercover: How to Operate Behind Enemy Lines,John Ford,,United States,"March 31, 2017",1943,TV-PG,61 min,"Classic Movies, Documentaries",This World War II-era training film dramatizes...
2023,70013050,Movie,Why We Fight: The Battle of Russia,"Frank Capra, Anatole Litvak",,United States,"March 31, 2017",1943,TV-14,82 min,Documentaries,This installment of Frank Capra's acclaimed do...
2026,70022548,Movie,WWII: Report from the Aleutians,John Huston,,United States,"March 31, 2017",1943,NR,45 min,Documentaries,Filmmaker John Huston narrates this Oscar-nomi...
2017,80119194,Movie,The Memphis Belle: A Story of a\nFlying Fortress,William Wyler,,United States,"March 31, 2017",1944,TV-PG,40 min,"Classic Movies, Documentaries",This documentary centers on the crew of the B-...
2019,80119191,Movie,The Negro Soldier,Stuart Heisler,,United States,"March 31, 2017",1944,TV-14,40 min,"Classic Movies, Documentaries",This documentary urged African Americans to en...
2021,80119189,Movie,Tunisian Victory,"Frank Capra, John Huston, Hugh Stewart, Roy Bo...",Burgess Meredith,"United States, United Kingdom","March 31, 2017",1944,TV-PG,76 min,"Classic Movies, Documentaries",British and American troops join forces to lib...
2012,80119188,Movie,San Pietro,John Huston,,United States,"March 31, 2017",1945,TV-14,32 min,"Classic Movies, Documentaries","After the Allies invade Italy, the Liri Valley..."
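
The first row of the output above is a TV show, because the cell sorts every title regardless of its type. If the recommendations should contain movies only, one possible variant filters on the type column first. The sketch below runs on a made-up miniature catalog that reuses the column names of netflix_titles.csv; `oldest_movies` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical miniature catalog with the same column names as the dataset
catalog = pd.DataFrame({
    "type": ["TV Show", "Movie", "Movie", "Movie"],
    "title": ["Old Series", "Old Film", "Newer Film", "Newest Film"],
    "release_year": [1925, 1942, 1990, 2019],
})

def oldest_movies(df, n=10):
    """Return the n oldest rows whose type is 'Movie'."""
    movies = df[df["type"] == "Movie"]
    oldest = movies.sort_values("release_year", ascending=True).head(n)
    return oldest[["title", "release_year"]].reset_index(drop=True)

result = oldest_movies(catalog, n=2)
print(result)
```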


### Content-based recommender

In this mission, you will work with the Netflix dataset to practice implementing a personalized, content-based recommender system using Python, Pandas, and the TF-IDF vectorizer in Scikit-Learn.


To complete this mission, perform the following task in the provided editor:

Create a content-based recommender system that recommends 5 movies.
Write the code so that the TF-IDF vectorizer is fitted on the title. (This means you will get movies with names close to the input. If you fit the vectorizer on the description instead, you will get movies with similar plots.)

In [41]:
from sklearn.metrics.pairwise import linear_kernel
from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd

def ContentBasedRecommender(title, indices, distance_matrix):
    df = pd.read_csv("c://Users//cbeer//Desktop//data-science-learning/python-for-machine-learning//dat//netflix_titles.csv")
    # Row position of the requested title
    id_ = indices[title]
    # Pair each row position with its score against the requested title
    distances = list(enumerate(distance_matrix[id_]))
    # linear_kernel returns similarities, so sort from highest to lowest
    distances = sorted(distances, key=lambda x: x[1], reverse=True)
    # Skip the first entry (normally the title itself) and keep the next five
    distances = distances[1:6]
    recommendations = [distance[0] for distance in distances]
    return df['title'].iloc[recommendations]

def main():
    df = pd.read_csv("c://Users//cbeer//Desktop//data-science-learning/python-for-machine-learning//dat//netflix_titles.csv")
    # Instantiate a new vectorizer that ignores common English stop words
    tfidf = TfidfVectorizer(stop_words='english')

    # Replace NaN with an empty string
    df['title'] = df['title'].fillna('')

    # Transform the titles into the TF-IDF matrix
    tfidf_matrix = tfidf.fit_transform(df['title'])
    # Rows are L2-normalized, so the dot product equals the cosine similarity
    distance_matrix = linear_kernel(tfidf_matrix)
    # Map each title to its row position
    indices = pd.Series(df.index, index=df['title']).drop_duplicates()
    out = ContentBasedRecommender("Kong: King of the Apes", indices, distance_matrix)
    print(out)
    return out

main()


1419                       The King
3320             Love and Hong Kong
5238    Hong Kong West Side Stories
142                    King of Boys
2775                     King’s War
Name: title, dtype: object


1419                       The King
3320             Love and Hong Kong
5238    Hong Kong West Side Stories
142                    King of Boys
2775                     King’s War
Name: title, dtype: object

In [35]:
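# Passing the DataFrame to the TfidfVectorizer constructor at best creates a
# vectorizer object; it does not compute a TF-IDF matrix. To get the matrix,
# call fit_transform on a text column, as in the previous cell.
tfidf_matrix = TfidfVectorizer(df)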

0     81145628    Movie      Norm of the North: King Sized Adventure   
1     80117401    Movie                   Jandino: Whatever it Takes   
2     70234439  TV Show                           Transformers Prime   
3     80058654  TV Show             Transformers: Robots in Disguise   
4     80125979    Movie                                 #realityhigh   
...        ...      ...                                          ...   
6229  80000063  TV Show                                 Red vs. Blue   
6230  70286564  TV Show                                        Maron   
6231  80116008    Movie       Little Baby Bum: Nursery Rhyme Friends   
6232  70281022  TV Show  A Young Doctor's Notebook and Other Stories   
6233  70153404  TV Show                                      Friends   

                      director  \
0     Richard Finn, Tim Maltby   
1                          NaN   
2                          NaN   
3                          NaN   
4             Fernando Lebrija   
...

## Memory-based collaborative approaches

A memory-based collaborative approach depends on using different methods of comparing users based on the interaction matrix directly, without using a latent model.
If you do not understand what a latent model is, do not worry! It will be discussed more in the next section.

### User-user method

The user-user memory-based approach identifies the users most similar to a given user based on their interaction history, then suggests the items that are most common among those neighbors, giving priority to items that the user has not interacted with before.

To recommend an item to a specific user, you need to find that user’s k-nearest neighbors using a similarity measure under which two users are similar when they interacted with the same items in a similar manner.

After computing the nearest neighbors, the system can recommend the items that are most popular among them and that do not yet appear in the user’s row of the interaction matrix.

User-user methods tend to give more personalized recommendations than item-item methods because they search the neighborhood of similar users before recommending. However, the method has high variance: if each of the most similar users has interacted with only one other item, these uncommon items are weighted heavily, which makes the recommendations vary considerably.
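
The steps above can be sketched in a few lines of NumPy. This is only one possible reading of the method: the interaction matrix is invented, cosine similarity is an assumed choice of similarity measure, and `recommend_user_user` is a hypothetical helper.

```python
import numpy as np

# Rows = users, columns = items; 1 = interacted, 0 = not (made-up data)
interactions = np.array([
    [1, 1, 0, 0],   # user 0, the target user
    [1, 1, 1, 0],   # user 1
    [1, 1, 1, 0],   # user 2
    [0, 0, 0, 1],   # user 3
])

def recommend_user_user(matrix, user, k=2):
    """Recommend the unseen item that is most popular among the k nearest users."""
    # Cosine similarity between the target user and every user.
    # Assumes every user has at least one interaction (no zero rows).
    norms = np.linalg.norm(matrix, axis=1)
    sims = matrix @ matrix[user] / (norms * norms[user])
    sims[user] = -1.0                      # never pick the user as their own neighbor
    neighbors = np.argsort(sims)[::-1][:k]
    # Count how many neighbors interacted with each item
    scores = matrix[neighbors].sum(axis=0).astype(float)
    scores[matrix[user] > 0] = -1.0        # hide items the user already has
    return int(np.argmax(scores))

recommended = recommend_user_user(interactions, user=0)
print(recommended)
```

Users 1 and 2 are the nearest neighbors of user 0, and the only item they have that user 0 lacks is item 2.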

### Item-item method

The item-item method treats two items as similar when most users have interacted with them in a similar manner.

The system then recommends to the user items similar to those the user has already interacted with positively.
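
A comparable sketch for the item-item method compares the columns of the interaction matrix instead of its rows. As before, the data, the use of cosine similarity, and the helper names are assumptions made for illustration.

```python
import numpy as np

# Rows = users, columns = items; 1 = positive interaction (made-up data)
interactions = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
])

def item_similarity(matrix):
    """Cosine similarity between item columns (assumes no all-zero column)."""
    item_vectors = matrix.T.astype(float)
    norms = np.linalg.norm(item_vectors, axis=1, keepdims=True)
    normalized = item_vectors / norms
    return normalized @ normalized.T

def recommend_item_item(matrix, user):
    """Recommend the unseen item most similar to the items the user already has."""
    sims = item_similarity(matrix)
    # For each item, add up its similarity to every item the user interacted with
    scores = sims @ matrix[user].astype(float)
    scores[matrix[user] > 0] = -np.inf     # hide items the user already has
    return int(np.argmax(scores))

recommended = recommend_item_item(interactions, user=0)
print(recommended)
```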

## Model-based collaborative approaches

Model-based collaborative approaches assume a latent (hidden, intermediate) model that explains the interactions in the user-item interaction matrix.
One important example is matrix factorization, which relies on an intermediate model that decomposes the sparse interaction matrix into two dense matrices.
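
To give a feel for the idea, the sketch below factorizes a small, made-up rating matrix into a user matrix `P` and an item matrix `Q` with plain stochastic gradient descent on the observed entries. The hyperparameters and helper names are assumptions for this example, and this is not the algorithm used by any particular library.

```python
import numpy as np

# Made-up ratings; 0 marks a missing (unobserved) rating
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)
observed = R > 0

def rmse(R, P, Q, mask):
    """Root mean square error over the observed entries only."""
    err = (R - P @ Q.T)[mask]
    return float(np.sqrt(np.mean(err ** 2)))

def factorize(R, mask, k=2, lr=0.01, reg=0.02, epochs=500, seed=0):
    """Learn dense matrices P (users x k) and Q (items x k) so that P @ Q.T ~ R."""
    rng = np.random.default_rng(seed)
    P = rng.random((R.shape[0], k))
    Q = rng.random((R.shape[1], k))
    for _ in range(epochs):
        for u, i in zip(*np.nonzero(mask)):
            err = R[u, i] - P[u] @ Q[i]
            p_old = P[u].copy()
            # Gradient step with a small L2 penalty on both factor vectors
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_old - reg * Q[i])
    return P, Q

P, Q = factorize(R, observed)
# The product fills in every cell, including the ones that were missing
print(np.round(P @ Q.T, 2))
```

The entries of `P @ Q.T` at the positions where `R` is 0 are the model's predicted ratings for items the user has not rated.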

### Collaborative filtering recommender
Surprise is an easy-to-use Python scikit for building and analyzing recommender systems. You can install it with the following command, for example from an Anaconda prompt:


`pip install scikit-surprise`

You will use the Surprise library to implement a collaborative filter recommender system based on matrix factorization.
Singular value decomposition (SVD) is one of the most common dimensionality reduction techniques available. It assumes that data in a high-dimensional space can be represented by dense matrices in a lower-dimensional space. SVD is therefore one algorithm used to perform matrix factorization on user-item interaction matrices.
In this section, you do not need your computer. You will read and follow the example unless you want to run the code yourself step by step.

You will use the dataset `ratings_small.csv`, extracted from the MovieLens dataset we previously discussed.

After loading the file into a DataFrame named ratings, you can drop the timestamp column, which is not relevant to our problem, using the drop() function in Pandas:

```
ratings = pd.read_csv('ratings_small.csv')
ratings = ratings.drop('timestamp', axis=1)
```
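Split your dataset into training and testing sets. Surprise works on its own Dataset object, so the ratings are wrapped first; the column names below assume the standard MovieLens layout (userId, movieId, rating) and its 0.5 to 5 rating scale.

```
from surprise import Dataset, Reader
from surprise.model_selection import train_test_split

data = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], Reader(rating_scale=(0.5, 5)))
trainset, testset = train_test_split(data, test_size=0.25)
```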


Train an instance of Surprise’s SVD() class for matrix factorization.
You can then compare the ratings predicted by the system with the real ratings of the users to get the root mean square error (RMSE).
Note:
svd.test() returns a list of prediction objects that contain several fields, including the real ratings and the model-predicted ones.
Hence, using the accuracy.rmse() function, we can directly calculate the RMSE value for this problem.

```
from surprise import SVD, accuracy

svd = SVD()
svd.fit(trainset)
predictions = svd.test(testset)
accuracy.rmse(predictions)
```
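
For reference, the RMSE measure itself is simple to compute by hand. The sketch below uses NumPy and three invented rating pairs; it only illustrates the formula and is not how Surprise implements `accuracy.rmse()` internally.

```python
import numpy as np

def rmse(actual, predicted):
    """Square root of the mean squared difference between two rating lists."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Made-up real and predicted ratings: the errors are 1, 0 and 1
value = rmse([4.0, 3.0, 5.0], [3.0, 3.0, 4.0])
print(round(value, 4))
```

A lower RMSE means the predicted ratings are, on average, closer to the real ones.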

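Predict the rating that a single user would give to a single item, here user 1 and item 230:

```
# predict(uid, iid) returns a Prediction object; est is the estimated rating
p = svd.predict(1, 230).est
```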


In this mission, you will work with the Netflix dataset to practice implementing a collaborative filtering recommender system. You will use the Surprise library’s implementation of Singular Value Decomposition (SVD) to implement the recommender system. 


To complete this mission, perform the following task in the provided editor:

Create a collaborative-filtering-based recommender system that predicts the release year for the movie ‘Transformers Prime’ with date added September 8, 2018.
Your code should perform the following tasks in the provided editor:
Step 1: Write the code for using Surprise library and SVD() for matrix factorization. 
Step 2: Split the data into 60% training and 40% testing. 
Step 3: Print the root mean squared error (RMSE) value of the testing set.

In [7]:
import pandas as pd
from surprise import SVD, Dataset, Reader, accuracy
from surprise.model_selection import train_test_split

df = pd.read_csv("c://Users//cbeer//Desktop//data-science-learning/python-for-machine-learning//dat//netflix_titles.csv")
# Surprise expects three columns in the order (user, item, rating). Here title
# stands in for the user, date_added for the item and release_year for the rating.
df = df[['title', 'date_added', 'release_year']]

# The rating scale is the range of release years in the dataset
oldest = df['release_year'].min()
newest = df['release_year'].max()
print("Range: {0} to {1} ".format(oldest, newest))
reader = Reader(rating_scale=(1925, 2020))
data = Dataset.load_from_df(df, reader)

# Sample a random trainset and testset; the test set is 40% of the data
trainset, testset = train_test_split(data, test_size=0.4)

# Train the SVD algorithm on the trainset and predict ratings for the testset
svd = SVD()
svd.fit(trainset)
predictions = svd.test(testset)

# svd.predict('Transformers Prime', 'September 8, 2018').est would give the
# predicted release year asked for in the task.

# Then compute the RMSE
accuracy.rmse(predictions)



Range: 1925 to 2020 
RMSE: 7.8556


7.8556392071222785