## Content-based Recommender Systems

*Prepared by:*
**Jude Michael Teves**  
Faculty, Software Technology Department  
College of Computer Studies - De La Salle University

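This notebook demonstrates a simple content-based recommender: each movie is described by its genres, a user's tastes are summarized as a genre profile, and unseen movies are recommended according to how closely their genres match that profile.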

## Preliminaries

### Import library

In [1]:
import pandas as pd

### Load Data

We will use the MovieLens dataset. The data has already been preprocessed to make the later steps easier.

In [2]:
df_ratings = pd.read_csv('https://raw.githubusercontent.com/Cyntwikip/data-repository/main/movielens_movie_ratings.csv')
df_ratings.head()

,userId,movieId,rating
0,1,1,4.0
1,1,3,4.0
2,1,6,4.0
3,1,47,5.0
4,1,50,5.0


In [3]:
df_genres = pd.read_csv('https://raw.githubusercontent.com/Cyntwikip/data-repository/main/movielens_movie_genres.csv')
df_genres.head()

,movieId,title,Action,Adventure,Animation,Children,Comedy,Crime,Documentary,Drama,...,Film-Noir,Horror,IMAX,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western
0,1,Toy Story (1995),0,1,1,1,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,2,Jumanji (1995),0,1,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,3,Grumpier Old Men (1995),0,0,0,0,1,0,0,0,...,0,0,0,0,0,1,0,0,0,0
3,4,Waiting to Exhale (1995),0,0,0,0,1,0,0,1,...,0,0,0,0,0,1,0,0,0,0
4,5,Father of the Bride Part II (1995),0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0


## Exercise

- Build the Item Profile matrix.
- Let's focus on userId 1. Compute the user profile.  
- Ignore the ratings for now. Recommend movies that the user has not watched based on the genres.  

Hint! Use the following import to compute the similarity.

In [4]:
from sklearn.metrics.pairwise import cosine_similarity
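
Before applying it to the real data, it may help to see what `cosine_similarity` returns on a tiny example. The sketch below uses three made-up items and a made-up profile (the genre names and values are illustrative assumptions, not taken from the dataset) and checks the result against the textbook formula: dot product divided by the product of the vector norms.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Three toy items described by three hypothetical genre flags:
# [Action, Comedy, Drama]
items = np.array([
    [1, 0, 0],   # pure Action
    [1, 1, 0],   # Action + Comedy
    [0, 0, 1],   # pure Drama
])

# A toy profile that leans towards Action, with some Comedy.
profile = np.array([[0.8, 0.4, 0.0]])

# cosine_similarity expects 2-D inputs and returns an
# (n_items, n_profiles) matrix; ravel() flattens it to one score per item.
scores = cosine_similarity(items, profile).ravel()

# The same quantity by hand: dot product over the product of the norms.
manual = (items @ profile.T).ravel() / (
    np.linalg.norm(items, axis=1) * np.linalg.norm(profile)
)

print(scores)
print(np.allclose(scores, manual))
```

The Action + Comedy item scores highest because its direction is closest to the profile, while the pure Drama item shares no genre with the profile and scores 0.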

### Building the Item Profile matrix

In [5]:
df_item = df_genres.drop('title', axis=1).set_index('movieId')
df_item

movieId,Action,Adventure,Animation,Children,Comedy,Crime,Documentary,Drama,Fantasy,Film-Noir,Horror,IMAX,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western
1,0,1,1,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0
2,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0
4,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0
5,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
193581,1,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0
193583,0,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0
193585,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
193587,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
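
The file loaded above already contains one 0/1 column per genre. For reference, here is a minimal sketch of how such a matrix could be derived if the genres were instead stored as a single pipe-separated string per movie, which is how the raw MovieLens `movies.csv` stores them. The `raw` frame and its `genres` column are assumptions for illustration; the notebook's preprocessed file does not contain them.

```python
import pandas as pd

# Hypothetical rows in the raw MovieLens style:
# one pipe-separated genre string per movie.
raw = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Toy Story (1995)", "Jumanji (1995)", "Grumpier Old Men (1995)"],
    "genres": [
        "Adventure|Animation|Children|Comedy|Fantasy",
        "Adventure|Children|Fantasy",
        "Comedy|Romance",
    ],
})

# str.get_dummies splits each string on the separator and
# returns one 0/1 indicator column per distinct genre.
item_profile = raw.set_index("movieId")["genres"].str.get_dummies(sep="|")
print(item_profile)
```

Each row is then the item's genre vector, indexed by `movieId`, which is the same layout as `df_item`.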


### Computing the User Profile 

In [6]:
# Treat every movie the user has rated as "liked"; the rating values are ignored for now.
user_likes = df_ratings.query("userId==1")['movieId']
user_likes

0         1
1         3
2         6
3        47
4        50
       ... 
227    3744
228    3793
229    3809
230    4006
231    5060
Name: movieId, Length: 232, dtype: int64

In [7]:
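# Average the genre vectors of the user's movies: each value is the share of those movies tagged with that genre.
user_profile = df_item.loc[user_likes].mean(axis=0)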
user_profile

Action         0.387931
Adventure      0.366379
Animation      0.125000
Children       0.181034
Comedy         0.357759
Crime          0.193966
Documentary    0.000000
Drama          0.293103
Fantasy        0.202586
Film-Noir      0.004310
Horror         0.073276
IMAX           0.000000
Musical        0.094828
Mystery        0.077586
Romance        0.112069
Sci-Fi         0.172414
Thriller       0.237069
War            0.094828
Western        0.030172
dtype: float64

Take note of the top genres below: the recommended movies should mostly carry these genres.

In [8]:
user_profile.sort_values(ascending=False).head()

Action       0.387931
Adventure    0.366379
Comedy       0.357759
Drama        0.293103
Thriller     0.237069
dtype: float64
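
Because the item vectors are 0/1 flags, the mean used for the user profile is simply the fraction of the user's movies that carry each genre. The toy example below illustrates this with made-up movie IDs and genre flags (all values are assumptions for illustration only).

```python
import pandas as pd

# Toy item profile: rows are hypothetical movie IDs, columns are genre flags.
toy_items = pd.DataFrame(
    {"Action": [1, 1, 0, 0], "Comedy": [0, 1, 1, 0], "Drama": [0, 0, 1, 1]},
    index=pd.Index([10, 20, 30, 40], name="movieId"),
)

# Movies the toy user has watched.
watched = pd.Series([10, 20, 30], name="movieId")

# Select the watched rows and average each genre column.
toy_profile = toy_items.loc[watched].mean(axis=0)
print(toy_profile)
```

Two of the three watched movies are tagged Action and two are tagged Comedy, so both genres get a weight of 2/3, while Drama gets 1/3.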

### Retrieving Similar Items

In [9]:
df_scores = df_genres.copy()
# Cosine similarity between each movie's genre vector and the user profile (one score per movie).
scores = cosine_similarity(df_item, user_profile.values.reshape(1, -1)).reshape(-1)
df_scores['similarity'] = scores
df_scores.head()

,movieId,title,Action,Adventure,Animation,Children,Comedy,Crime,Documentary,Drama,...,Horror,IMAX,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western,similarity
0,1,Toy Story (1995),0,1,1,1,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0.634702
1,2,Jumanji (1995),0,1,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0.498514
2,3,Grumpier Old Men (1995),0,0,0,0,1,0,0,0,...,0,0,0,0,1,0,0,0,0,0.382473
3,4,Waiting to Exhale (1995),0,0,0,0,1,0,0,1,...,0,0,0,0,1,0,0,0,0,0.507109
4,5,Father of the Bride Part II (1995),0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0.411876


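The recommended movies below are consistent with our **User Profile**: every one of the top ten combines Action, Adventure, and Comedy, and each also carries Drama, Thriller, or both.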

In [12]:
# Rank all movies by similarity, then drop those the user has already rated.
df_scores_sorted = df_scores.sort_values('similarity', ascending=False)
df_scores_filtered = df_scores_sorted[~df_scores_sorted['movieId'].isin(user_likes)]
df_scores_filtered.head(10)

,movieId,title,Action,Adventure,Animation,Children,Comedy,Crime,Documentary,Drama,...,Horror,IMAX,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western,similarity
8597,117646,Dragonheart 2: A New Beginning (2000),1,1,0,0,1,0,0,1,...,0,0,0,0,0,0,1,0,0,0.867076
6570,55116,"Hunting Party, The (2007)",1,1,0,0,1,0,0,1,...,0,0,0,0,0,0,1,0,0,0.84553
3608,4956,"Stunt Man, The (1980)",1,1,0,0,1,0,0,1,...,0,0,0,0,1,0,1,0,0,0.824532
4681,6990,The Great Train Robbery (1978),1,1,0,0,1,1,0,1,...,0,0,0,0,0,0,0,0,0,0.823337
4005,5657,Flashback (1990),1,1,0,0,1,1,0,1,...,0,0,0,0,0,0,0,0,0,0.823337
9394,164226,Maximum Ride (2016),1,1,0,0,1,0,0,0,...,0,0,0,0,0,1,1,0,0,0.810351
3526,4818,Extreme Days (2001),1,1,0,0,1,0,0,1,...,0,0,0,0,0,0,0,0,0,0.808866
5471,26184,"Diamond Arm, The (Brilliantovaya ruka) (1968)",1,1,0,0,1,1,0,0,...,0,0,0,0,0,0,1,0,0,0.794487
7409,80219,Machete (2010),1,1,0,0,1,1,0,0,...,0,0,0,0,0,0,1,0,0,0.794487
5379,8968,After the Sunset (2004),1,1,0,0,1,1,0,0,...,0,0,0,0,0,0,1,0,0,0.794487
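
The three steps of this notebook (build the profile, score every item, filter out what the user has seen) can be wrapped into one small function. The following is a rough sketch on toy data rather than a definitive implementation; the function name `recommend`, its parameters, and the toy movie IDs are all hypothetical.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity


def recommend(item_profile, watched_ids, top_n=3):
    """Rank unseen items by cosine similarity to the mean profile of watched items."""
    # User profile: mean genre vector of the watched items, as a 1 x n_genres array.
    user_vec = item_profile.loc[watched_ids].mean(axis=0).to_numpy().reshape(1, -1)
    # One similarity score per item.
    scores = cosine_similarity(item_profile.to_numpy(), user_vec).ravel()
    ranked = pd.Series(scores, index=item_profile.index, name="similarity")
    # Drop items the user has already seen, then keep the best matches.
    unseen = ranked[~ranked.index.isin(watched_ids)]
    return unseen.sort_values(ascending=False).head(top_n)


# Toy item profile with hypothetical movie IDs and genre flags.
toy_items = pd.DataFrame(
    {"Action": [1, 1, 0, 1, 0], "Comedy": [0, 1, 1, 1, 0], "Drama": [0, 0, 1, 0, 1]},
    index=pd.Index([10, 20, 30, 40, 50], name="movieId"),
)

recs = recommend(toy_items, watched_ids=[10, 20])
print(recs)
```

With movies 10 and 20 watched, the profile leans towards Action with some Comedy, so movie 40 (Action + Comedy) ranks first, movie 30 (Comedy + Drama) second, and movie 50 (Drama only) last with a score of 0.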


## End
<sup>made by **Jude Michael Teves**</sup> <br>
<sup>for comments, corrections, suggestions, please email:</sup><sup> <href>judemichaelteves@gmail.com</href> or <href>jude.teves@dlsu.edu.ph</href></sup><br>