<div style="text-align:center"><h1> Collaborative Filtering </h1> </div>

The content-based system we developed does a good job of suggesting movies from
the input metadata, but it fails to provide personalized suggestions.
Since every user is different, more personalized recommendations can boost user engagement.

Content-based filtering suggests the same movies to every user. To make the
experience more personalized, we will explore Collaborative Filtering:

<img src="https://upload.wikimedia.org/wikipedia/commons/5/52/Collaborative_filtering.gif" width=350 />
<p style="text-align:center"><small>&copy; <a href="https://en.wikipedia.org/wiki/Recommender_system" >wikipedia</a></small></p>

Two main approaches:

      1. User-based filtering
      2. Item-based filtering

###### User-based filtering
This system recommends products that similar users have liked.
The similarity between two users can be calculated with cosine similarity or
Pearson correlation.
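
As a minimal sketch of the two similarity measures mentioned above, the snippet below compares hypothetical users who have all rated the same four movies. The user names and ratings are made up for illustration; in practice only the movies both users have rated would be compared.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def pearson_correlation(a, b):
    """Pearson correlation, i.e. cosine similarity of the mean-centred vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return cosine_similarity(a - a.mean(), b - b.mean())

# Hypothetical ratings given by three users to the same four movies.
alice = [5.0, 4.0, 1.0, 2.0]
bob = [4.0, 5.0, 2.0, 1.0]
carol = [1.0, 2.0, 5.0, 4.0]

print(pearson_correlation(alice, bob))    # similar tastes: positive correlation
print(pearson_correlation(alice, carol))  # opposite tastes: negative correlation
```

Pearson correlation subtracts each user's mean rating first, so it is less sensitive than plain cosine similarity to users who rate everything high or everything low.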

One major issue with this approach is that user preferences change over time,
so precomputing recommendations from similar users can lead to poor
performance.

###### Item-based filtering

Instead of relying on the choices of similar users, item-based CF recommends items similar to the
ones the user has already rated. Item similarities are more static, and they can also be computed with cosine similarity or Pearson correlation.
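
The idea can be sketched on a tiny utility matrix. The ratings below are hypothetical, and treating an unrated movie as 0 is a simplifying assumption made only to keep the example short.

```python
import numpy as np

# Hypothetical utility matrix: rows are users, columns are movies, 0 = not rated.
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [1.0, 0.0, 5.0, 4.0],
    [0.0, 1.0, 4.0, 5.0],
])

# Cosine similarity between every pair of movie columns.
norms = np.linalg.norm(ratings, axis=0)
item_sim = (ratings.T @ ratings) / np.outer(norms, norms)

def most_similar_items(item, k=1):
    """Return the indices of the k movies most similar to `item`."""
    order = np.argsort(-item_sim[item])
    return [int(j) for j in order if j != item][:k]

print(most_similar_items(0))  # movie 1 is rated most like movie 0
```

Because the similarity matrix is indexed by movie rather than by user, it can be computed once and reused across users.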


Both approaches run into scalability and sparsity problems as the number of users and movies grows.

#### Singular Value Decomposition

One way to handle scalability and sparsity is to use a <b>Latent Factor Model</b>.

We want to turn the recommendation problem into an optimization problem.
We can view it as measuring how well we predict the ratings a user gives
to items.

For that we can use Root Mean Square Error (RMSE) as the error metric: the lower the RMSE, the better
our predictions.
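
For reference, RMSE is the square root of the mean squared difference between actual and predicted ratings. A minimal sketch with made-up ratings (the `rmse` helper is written here for illustration and is not part of the notebook):

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error between two equally long rating sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Hypothetical actual and predicted ratings for three movies.
actual = [2.5, 3.0, 4.0]
predicted = [2.0, 3.0, 5.0]

print(rmse(actual, predicted))
```

Squaring the errors means one large miss is penalized more heavily than several small ones.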

What is a latent factor?

It is a broad idea that describes a property a user or an item has.
For example, a particular piece of music could belong to a specific genre.

SVD reduces the dimensionality of the utility matrix by extracting its latent
factors. Essentially, we map each user and each item into a latent space of dimension r.

This helps us understand the relationship between users and items, because they
become directly comparable.
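
To make the mapping into an r-dimensional latent space concrete, here is a small sketch using classical truncated SVD from NumPy on a hypothetical, fully observed rating matrix. Note the assumption: real rating data is mostly missing, and Surprise's `SVD` class learns the factors by stochastic gradient descent on the observed ratings rather than by an exact decomposition like this one.

```python
import numpy as np

# Hypothetical, fully observed utility matrix (4 users x 4 movies).
ratings = np.array([
    [5.0, 4.0, 1.0, 1.0],
    [4.0, 5.0, 1.0, 2.0],
    [1.0, 2.0, 5.0, 4.0],
    [2.0, 1.0, 4.0, 5.0],
])

U, s, Vt = np.linalg.svd(ratings, full_matrices=False)

r = 2                               # number of latent factors to keep
user_factors = U[:, :r] * s[:r]     # each user as a point in r dimensions
item_factors = Vt[:r, :].T          # each movie as a point in r dimensions

# A rating is approximated by the dot product of a user and an item vector.
approx = user_factors @ item_factors.T
print(np.round(approx, 2))
```

Users and movies now live in the same r-dimensional space, which is what makes them directly comparable.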

<img src = "https://kevinkolcheck.com/wp-content/uploads/2017/12/latent-factors.png" >

<p style="text-align:center"><small>&copy; <a href="https://kevinkolcheck.com/portfolio/recommender-system-matrix-factorization-alternating-least-squares/" >kevinkolcheck</a></small></p>

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
# we will use Surprise for SVD

rating = pd.read_csv('../DATA/ratings_small.csv')
rating.head()
# movies are rated on a scale of 5

Unnamed: 0,userId,movieId,rating,timestamp
0,1,31,2.5,1260759144
1,1,1029,3.0,1260759179
2,1,1061,3.0,1260759182
3,1,1129,2.0,1260759185
4,1,1172,4.0,1260759205


In [3]:
from surprise import Reader, Dataset, SVD
from surprise.model_selection import cross_validate

In [4]:
reader = Reader()

In [5]:
data = Dataset.load_from_df(rating[["userId","movieId","rating"]],reader=reader)

In [6]:
svd = SVD()

In [7]:
cross_validate(svd,data,measures=["RMSE","MAE"])

{'test_rmse': array([0.89901866, 0.88703058, 0.89311141, 0.89836594, 0.89797311]),
 'test_mae': array([0.69265941, 0.68094006, 0.68777561, 0.69287071, 0.69137591]),
 'fit_time': (6.077236890792847,
  5.701161861419678,
  5.565208196640015,
  5.7645111083984375,
  5.539697885513306),
 'test_time': (0.16824913024902344,
  0.20826506614685059,
  0.2850360870361328,
  0.1445457935333252,
  0.21256613731384277)}

In [8]:
# an RMSE of about 0.89 is good; now fit the model on the full dataset so we can make predictions

train = data.build_full_trainset()
svd.fit(train)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x11bab5f10>

In [9]:
# let's pick one user and check what they have rated
rating[rating['userId'] == 1]

Unnamed: 0,userId,movieId,rating,timestamp
0,1,31,2.5,1260759144
1,1,1029,3.0,1260759179
2,1,1061,3.0,1260759182
3,1,1129,2.0,1260759185
4,1,1172,4.0,1260759205
5,1,1263,2.0,1260759151
6,1,1287,2.0,1260759187
7,1,1293,2.0,1260759148
8,1,1339,3.5,1260759125
9,1,1343,2.0,1260759131


In [10]:
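# first check a movie the user has already rated
# (the third argument, r_ui, is only a reference "true" rating; it does not affect the estimate)
svd.predict(1,31,3)
# actual: 2.5, predicted: ~2.44, which is close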

Prediction(uid=1, iid=31, r_ui=3, est=2.4360629569775503, details={'was_impossible': False})

In [11]:
svd.predict(1,1029,3)
# actual: 3.0, predicted: ~2.77, which is close

Prediction(uid=1, iid=1029, r_ui=3, est=2.773331431478292, details={'was_impossible': False})

In [12]:
# the model's estimates are close to the actual ratings for these movies


In [13]:
svd.predict(1, 302, 3)

Prediction(uid=1, iid=302, r_ui=3, est=2.637185517602041, details={'was_impossible': False})

For movie ID 302 we get a predicted rating of about 2.6. A notable feature of this model is that it does not care what the movie is about or what it contains; the prediction is based purely on users' ratings.
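
Predicted ratings like this one can be turned into recommendations by scoring the movies a user has not rated yet and keeping the highest estimates. The sketch below is hypothetical: `top_n_for_user` and `fake_predict` are illustrative names, and the estimates are made-up stand-ins. With the fitted model above, one could pass `lambda uid, iid: svd.predict(uid, iid).est` as `predict` instead.

```python
def top_n_for_user(user_id, candidate_items, rated_items, predict, n=3):
    """Rank the unrated candidate items by predicted rating, highest first."""
    rated = set(rated_items)
    scored = [
        (item, predict(user_id, item))
        for item in candidate_items
        if item not in rated
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]

# Stand-in predictor with made-up estimates (an assumption for illustration).
fake_estimates = {31: 2.4, 302: 2.6, 1029: 2.8, 1172: 3.9, 1339: 3.1}

def fake_predict(user_id, item_id):
    return fake_estimates[item_id]

print(top_n_for_user(1, fake_estimates.keys(), rated_items=[31, 1029],
                     predict=fake_predict, n=2))
```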

In [14]:
# next, we should use a hybrid system that combines content-based and collaborative filtering

In [15]:
# Thank You!