## Data Collection

- **Explicit data** is information that is provided intentionally, i.e. input from the users such as movie ratings
- **Implicit data** is information that is not provided intentionally but gathered from available data streams like search history, clicks, order history, etc

## Case 1: Recommend the most popular items

- Recommend the items liked by the largest number of users
- There is no personalization involved in this approach
- Surprisingly, this approach still works well in places like news portals, where a “Popular News” column is subdivided into sections and the most-read articles of each section are displayed
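
A minimal sketch of this popularity-based approach, assuming the input is a list of hypothetical `(user, item)` "like" events (the function name and the data are illustrative, not part of the original notebook):

```python
from collections import Counter

def most_popular(interactions, n=3):
    """Return the n item ids liked by the most distinct users."""
    # Deduplicate so a user who liked the same item twice is counted once.
    unique_pairs = set(interactions)
    counts = Counter(item for _user, item in unique_pairs)
    # Sort by count (descending), then by item id for a deterministic order.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _count in ranked[:n]]

# Hypothetical (user, item) events.
likes = [("u1", "A"), ("u2", "A"), ("u3", "A"),
         ("u1", "B"), ("u2", "B"),
         ("u3", "C"), ("u3", "C")]
top_items = most_popular(likes, n=2)
print(top_items)
```

Every user receives the same list, which is exactly the lack of personalization noted above.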

## Case 2: Segmentation

- We can divide the users into multiple segments based on their preferences (user features) and recommend items to them based on the segment they belong to

## Case 3: Using a classifier to make recommendation

- We need to define some parameters (features) of the user and the item
- The outcome can be 1 if the user likes it or 0 otherwise
- Incorporates personalization
- It can work even if the user’s past history is short or not available
- The features might not be available, and even if they are, they may not be sufficient to build a good classifier
- As the number of users and items grows, building a good classifier becomes increasingly difficult

## Case 4: Recommendation Algorithms

### Content based algorithms

- <b> Idea: </b> If you like an item then you will also like a “similar” item
- It generally works well when it's easy to determine the context/properties of each item, for instance when all recommended items are of the same kind, as in movie or song recommendation.
- In content-based recommender systems, the descriptive attributes of items are used to make recommendations. The term “content” refers to these descriptions. In content-based methods, the ratings and buying behavior of users are combined with the content information available in the items.

<img src="images/Content Based Filtering.jpg">

***Movie Recommendation Example***

- **Profile vector:** It contains the past behavior of the user, i.e. the movies liked/disliked by the user and the ratings given by them
- **Item vector:** It contains the details of each movie, like genre, cast, director etc. 

Similarity between the profile vector and the item vector is calculated using one of the following methods:

- **Cosine Similarity:** Cosine of the angle between the profile vector (A) and the item vector (B). The closer the vectors, the smaller the angle and the larger the cosine. The value ranges from -1 to 1.
<img src="images/Cosine Similarity.jpg">
- **Euclidean Distance:** Similar items lie in close proximity to each other when plotted in n-dimensional space, so a smaller distance means greater similarity.
<img src="images/Euclidean Distance.jpg">
- **Pearson's Correlation:** It tells us how strongly two items are correlated.
<img src="images/Correlation.jpg">
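
The three measures can be sketched in plain Python as follows. This is an illustrative sketch that assumes both vectors have the same length and are not all zeros (or constant, for the Pearson case); the example vectors are made up.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between a and b (smaller = more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson_correlation(a, b):
    """Cosine similarity of the two vectors after subtracting their means."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)

# Hypothetical profile vector and two item vectors.
profile = [1.0, 2.0, 3.0]
item_same_direction = [2.0, 4.0, 6.0]
item_reversed = [3.0, 2.0, 1.0]

print(cosine_similarity(profile, item_same_direction))
print(euclidean_distance(profile, item_same_direction))
print(pearson_correlation(profile, item_reversed))
```

Note that cosine similarity and Pearson correlation are "higher is better", whereas Euclidean distance is "lower is better", so the sort order has to be chosen accordingly.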

The movies are arranged in descending order of similarity score, and one of the two approaches below is used for recommendations:
- **Top-n approach:** Where the top n movies are recommended
- **Rating scale approach:** Where a threshold is set and all the movies above that threshold are recommended
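
Both selection rules can be sketched on a hypothetical dictionary of similarity scores (the titles, scores, and function names are made up for illustration):

```python
def top_n(scores, n):
    """Return the n titles with the highest scores."""
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return [title for title, _score in ranked[:n]]

def above_threshold(scores, threshold):
    """Return every title whose score exceeds the threshold, best first."""
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return [title for title, score in ranked if score > threshold]

# Hypothetical similarity scores between a user profile and four movies.
scores = {"Movie A": 0.91, "Movie B": 0.40, "Movie C": 0.75, "Movie D": 0.62}

print(top_n(scores, 2))
print(above_threshold(scores, 0.6))
```

The top-n rule always returns a fixed number of items, while the threshold rule returns a variable number that depends on how many items score well.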

A major drawback of this algorithm is that it is limited to recommending items similar to those the user has already bought or liked. So if a user has watched or liked only action movies in the past, the system will recommend only action movies.

To improve on this type of system, we need an algorithm that can recommend items based not just on the content, but on the behavior of users as well.

### Collaborative filtering

#### User-User collaborative filtering

This algorithm first finds the similarity score between users. Based on this similarity score, it then picks out the most similar users and recommends products which these similar users have liked or bought previously

<img src = "Images/User-User Collaborative Filtering.jpg">

For the m×n ratings matrix R = $[r_{uj}]$ with m users and n items, let $I_{u}$ denote the set of item indices for which ratings have been specified by user (row) u. <br>
Therefore, the set of items rated by both users u and v is given by $I_{u} \cap I_{v}$.

- The first step is to compute the mean rating $\mu_{u}$ for each user u using her specified ratings
<img src="Images/User Ratings Mean.jpg">
- Then, the Pearson correlation coefficient (similarity) between the rows (users) u and v is defined as follows
<img src = "Images/User User Similarity.jpg?123456">
- Similarity computation is tricky because different users may have different scales of ratings. One user might be biased toward liking most items, whereas another user might be biased toward not liking most of the items. Therefore, the raw ratings need to be mean-centered row-wise.
- The Pearson coefficient is computed between the target user and all the other users.
- One way of defining the peer group of the target user is to use the set of k users with the highest Pearson coefficient with the target.
- Since the number of observed ratings in the top-k peer group of a target user may vary significantly with the item at hand, the closest k users are found for the target user separately for each predicted item, such that each of these k users has specified a rating for that item.
- The weighted average of these ratings can be returned as the predicted rating for that item. Here, each rating is weighted with the Pearson correlation coefficient of its owner to the target user.
- Because users rate on different scales, the weighted average is better taken over the mean-centered ratings of the item in the top-k peer group of target user u, which gives a mean-centered prediction.
- The mean rating of the target user is then added back to this prediction to provide a raw rating prediction $\hat{r}_{uj}$ of target user u for item j.

Let $P_{u}(j)$ be the set of k closest users to target user u who have specified ratings for item j:

<img src="Images/User-User Prediction.jpg?123">

- This algorithm first calculates the similarity between each pair of users based on their ratings and then uses these similarities to calculate the predictions
- Based on these prediction values, recommendations are made.

*User Item Matrix*

<img src="Images/User Item Matrix.jpg?123" width="750">

In this case, the ratings of five users 1...5 are indicated for six items denoted by 1...6. <br>
Consider the case where the target user index is 3, and we want to make item predictions on the basis of the ratings in the matrix above. <br>
We need to compute $\hat{r}_{31}$ and $\hat{r}_{36}$ of user 3 for items 1 and 6 in order to determine the top recommended item.
- The first step is to compute the similarity between user 3 and all the other users.

<img src = "Images/User User Similarity Example.jpg" width="600">

- The top 2 closest users to user 3 are users 1 and 2 according to both measures.
- By using the Pearson-weighted average of the raw ratings of users 1 and 2, the following predictions are obtained for user 3 with respect to the unrated items 1 and 6

<img src = "Images/User User Prediction Example 1.jpg" width = "300">

- Thus, item 1 should be prioritized over item 6 as a recommendation to user 3.
- The predicted ratings for user 3 are very high compared to the other ratings given by user 3. This is because the users in the peer group {1, 2} give much higher ratings than user 3.
- To avoid such scenarios, we can use mean-centered ratings for prediction.

<img src = "Images/User User Prediction Example 2.jpg" width = "400">

-  The mean-centering process enables a much better relative prediction with respect to the ratings that have already been observed

<img src="Images/User Item Matrix 2.jpg?123" width="750">
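
The user-based procedure described above (user means, Pearson similarity over co-rated items, top-k peers per item, mean-centered weighted average) can be sketched in plain Python. The 5 × 6 ratings matrix below is an illustrative one whose values are assumed for this sketch, and users and items are zero-indexed, so "user 3, item 1" is `(2, 0)`.

```python
import math

# Illustrative ratings matrix (rows = users, columns = items).
# None marks a missing rating. The values are assumed for this sketch.
R = [
    [7, 6, 7, 4, 5, 4],
    [6, 7, None, 4, 3, 4],
    [None, 3, 3, 1, 1, None],
    [1, 2, 2, 3, 3, 4],
    [1, None, 1, 2, 3, 3],
]

def user_mean(row):
    """Mean of a user's specified ratings."""
    rated = [r for r in row if r is not None]
    return sum(rated) / len(rated)

def pearson(u, v):
    """Pearson correlation of users u and v over their co-rated items."""
    mu_u, mu_v = user_mean(R[u]), user_mean(R[v])
    common = [k for k in range(len(R[u]))
              if R[u][k] is not None and R[v][k] is not None]
    num = sum((R[u][k] - mu_u) * (R[v][k] - mu_v) for k in common)
    den_u = math.sqrt(sum((R[u][k] - mu_u) ** 2 for k in common))
    den_v = math.sqrt(sum((R[v][k] - mu_v) ** 2 for k in common))
    return num / (den_u * den_v)

def predict_user_based(u, j, k=2):
    """Mean-centered prediction of user u's rating for item j."""
    # Only users who actually rated item j can be peers for this item.
    raters = [v for v in range(len(R)) if v != u and R[v][j] is not None]
    peers = sorted(raters, key=lambda v: pearson(u, v), reverse=True)[:k]
    num = sum(pearson(u, v) * (R[v][j] - user_mean(R[v])) for v in peers)
    den = sum(abs(pearson(u, v)) for v in peers)
    return user_mean(R[u]) + num / den

print(round(pearson(2, 0), 3), round(pearson(2, 1), 3))
print(round(predict_user_based(2, 0), 2), round(predict_user_based(2, 5), 2))
```

With these assumed values, the two predictions come out at roughly 3.35 and 0.86, so the first item is still ranked above the last one, but on a scale that matches the target user's own ratings.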

This algorithm is quite time consuming, as it involves calculating the similarity for each pair of users and then calculating predictions from those similarity scores. One way of handling this problem is to restrict the computation to a smaller set of neighbors, chosen in one of the following ways:
- Select a threshold similarity and choose all the users above that value
- Randomly select the users
- Arrange the neighbors in descending order of their similarity value and choose top-N users
- Use clustering for choosing neighbors

#### Item-Item collaborative filtering
In this algorithm, we compute the similarity between each pair of items. <br>
It is effective when the number of users is larger than the number of items being recommended.

<img src="Images/Item-Item Collaborative Filtering.jpg">

In item-based collaborative filtering, peer groups are constructed in terms of items rather than users. <br>
As in the user-based method, the ratings are mean-centered before the similarities are computed: the mean rating of each user (row) is subtracted from that user's ratings to create a mean-centered matrix.

Let $U_{i}$ be the indices of the set of users who have specified ratings for item i. <br>
Then the adjusted cosine similarity between the items (columns) i and j is defined as follows:


<img src = "Images/Cosine Similarity Item Based.jpg?lastmod = 123" width = "500">

Because mean-centered ratings ($S_{ui}$) are used, this similarity is called the adjusted cosine. <br> 
Although the Pearson correlation can also be used on the columns in the case of the item-based method, the adjusted cosine generally provides superior results.

Consider the case of determining rating of item t for user u.
- The first step is to determine the top-k most similar items to item t based on the adjusted cosine similarity
- Let the top-k matching items to item t for which user u has specified ratings be denoted by $Q_{t}(u)$.
- The weighted average of these (raw) ratings is reported as the predicted value.
- The weight of item j in this average is equal to the adjusted cosine similarity between item j and the target item t.
- Therefore, the predicted rating $\hat{r}_{ut}$ of user u for target item t is as follows

<img src="Images/Item-Item Prediction.jpg?lastmod=12345678">

*User Item Matrix*

<img src="Images/User Item Matrix.jpg?123" width="750">

<img src="Images/User Item Matrix 2.jpg?123" width="750">

The missing ratings of user 3 are predicted with the item-based algorithm. Because the ratings of items 1 and 6 are missing for user 3, the similarity of the columns for items 1 and 6 needs to be computed with respect to the other columns (items).
- First, the similarities between items are computed after mean-centering the ratings.
- The values of the adjusted cosine between items are

<img src = "Images/Item Item Similarity Example.jpg">

- It is evident that items 2 and 3 are most similar to item 1, whereas items 4 and 5 are most similar to item 6.
- Therefore, the weighted average of raw ratings of user 3 for items 2 and 3 is used to predict the rating $\hat{r}_{31}$ of item 1, whereas the weighted average of the raw ratings of user 3 for items 4 and 5 is used to predict the rating $\hat{r}_{36}$ of item 6:

<img src = "Images/Item Item Prediction Example 1.jpg">

- Thus, the item-based method also suggests that item 1 is more likely to be preferred by user 3 than item 6. However, in this case, because the ratings are predicted using the ratings of user 3 herself, the predicted ratings tend to be much more consistent with the other ratings of this user.
- The greater prediction accuracy of the item-based method is its main advantage
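
The item-based procedure can be sketched in the same way. As before, the ratings matrix is an illustrative one with assumed values, indices are zero-based, and the function names are hypothetical. Each row is centered by its user's mean before the column similarities are computed.

```python
import math

# Illustrative ratings matrix (rows = users, columns = items); None = missing.
R = [
    [7, 6, 7, 4, 5, 4],
    [6, 7, None, 4, 3, 4],
    [None, 3, 3, 1, 1, None],
    [1, 2, 2, 3, 3, 4],
    [1, None, 1, 2, 3, 3],
]

def user_mean(row):
    rated = [r for r in row if r is not None]
    return sum(rated) / len(rated)

# Mean-center each row (user) with that user's own mean rating.
S = [[None if r is None else r - user_mean(row) for r in row] for row in R]

def adjusted_cosine(i, j):
    """Adjusted cosine between items i and j over users who rated both."""
    common = [u for u in range(len(S))
              if S[u][i] is not None and S[u][j] is not None]
    num = sum(S[u][i] * S[u][j] for u in common)
    den_i = math.sqrt(sum(S[u][i] ** 2 for u in common))
    den_j = math.sqrt(sum(S[u][j] ** 2 for u in common))
    return num / (den_i * den_j)

def predict_item_based(u, t, k=2):
    """Predict user u's rating for item t from u's own raw ratings."""
    rated = [j for j in range(len(R[u])) if j != t and R[u][j] is not None]
    peers = sorted(rated, key=lambda j: adjusted_cosine(j, t), reverse=True)[:k]
    num = sum(adjusted_cosine(j, t) * R[u][j] for j in peers)
    den = sum(abs(adjusted_cosine(j, t)) for j in peers)
    return num / den

print(round(adjusted_cosine(0, 1), 3), round(adjusted_cosine(0, 2), 3))
print(round(predict_item_based(2, 0), 2), round(predict_item_based(2, 5), 2))
```

Because the prediction is a weighted average of the target user's own ratings, it necessarily stays within the range of ratings that user has already given.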

#### Cold start
What will happen if a new user or a new item is added in the dataset?
- Visitor Cold Start: A new user has no history to base recommendations on. To handle this, we can apply a popularity-based strategy; once the user's preferences are known, recommending products becomes easier.
- Product Cold Start: A new item has no interactions yet. To handle this, we can use the content of the product (content-based filtering) for recommendations at first, and eventually the user actions on the product

## Case Study in Python using the MovieLens Dataset

In [2]:
import pandas as pd

In [3]:
# Reading users file:
u_cols = ['user_id', 'age', 'sex', 'occupation', 'zip_code']
users = pd.read_csv('Datasets/ml-100k/u.user', sep = '|', names = u_cols, encoding = 'latin-1')
print(users.shape)
users.head()

(943, 5)


| | user_id | age | sex | occupation | zip_code |
|---|---|---|---|---|---|
| 0 | 1 | 24 | M | technician | 85711 |
| 1 | 2 | 53 | F | other | 94043 |
| 2 | 3 | 23 | M | writer | 32067 |
| 3 | 4 | 24 | M | technician | 43537 |
| 4 | 5 | 33 | F | other | 15213 |


In [4]:
# Reading ratings file:
r_cols = ['user_id', 'movie_id', 'rating', 'unix_timestamp']
ratings = pd.read_csv('Datasets/ml-100k/u.data', sep = '\t', names = r_cols, encoding = 'latin-1')
print(ratings.shape)
ratings.head()

(100000, 4)


| | user_id | movie_id | rating | unix_timestamp |
|---|---|---|---|---|
| 0 | 196 | 242 | 3 | 881250949 |
| 1 | 186 | 302 | 3 | 891717742 |
| 2 | 22 | 377 | 1 | 878887116 |
| 3 | 244 | 51 | 2 | 880606923 |
| 4 | 166 | 346 | 1 | 886397596 |


In [5]:
# Reading items file:
i_cols = ['movie id', 'movie title' ,'release date','video release date', 'IMDb URL', 'unknown', 'Action', 'Adventure',
 'Animation', 'Children\'s', 'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy',
 'Film-Noir', 'Horror', 'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller', 'War', 'Western']
items = pd.read_csv('Datasets/ml-100k/u.item', sep = '|', names = i_cols, encoding = 'latin-1')
print(items.shape)
items.head()

(1682, 24)


| | movie id | movie title | release date | video release date | IMDb URL | unknown | Action | Adventure | Animation | Children's | ... | Fantasy | Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi | Thriller | War | Western |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Toy Story (1995) | 01-Jan-1995 | | http://us.imdb.com/M/title-exact?Toy%20Story%2... | 0 | 0 | 0 | 1 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 2 | GoldenEye (1995) | 01-Jan-1995 | | http://us.imdb.com/M/title-exact?GoldenEye%20(... | 0 | 1 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 2 | 3 | Four Rooms (1995) | 01-Jan-1995 | | http://us.imdb.com/M/title-exact?Four%20Rooms%... | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 3 | 4 | Get Shorty (1995) | 01-Jan-1995 | | http://us.imdb.com/M/title-exact?Get%20Shorty%... | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 5 | Copycat (1995) | 01-Jan-1995 | | http://us.imdb.com/M/title-exact?Copycat%20(1995) | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |


In [6]:
# Train and Test sets
r_cols = ['user_id', 'movie_id', 'rating', 'unix_timestamp']
ratings_base = pd.read_csv('Datasets/ml-100k/ua.base', sep = '\t', names = r_cols, encoding = 'latin-1')
ratings_test = pd.read_csv('Datasets/ml-100k/ua.test', sep = '\t', names = r_cols, encoding = 'latin-1')
ratings_base.shape, ratings_test.shape

((90570, 4), (9430, 4))

### Build collaborative filtering model from scratch

Recommend movies based on user-user similarity and item-item similarity

In [7]:
# unique users and movies
n_users = ratings.user_id.unique().shape[0]
n_movies = ratings.movie_id.unique().shape[0]

In [9]:
import numpy as np

In [10]:
for line in ratings.itertuples():
    print(line)
    break

Pandas(Index=0, user_id=196, movie_id=242, rating=3, unix_timestamp=881250949)


In [11]:
# user-item matrix
data_matrix = np.zeros((n_users,n_movies))
for line in ratings.itertuples():
    data_matrix[line[1]-1,line[2]-1] = line[3]

In [12]:
data_matrix.shape

(943, 1682)

In [13]:
data_matrix[:5,:5]

array([[5., 3., 4., 3., 3.],
       [4., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [4., 3., 0., 0., 0.]])

In [14]:
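# Calculate pairwise cosine distances between users (rows) and between items (columns).
# Note: pairwise_distances with metric = 'cosine' returns the cosine *distance*
# (1 - cosine similarity), so each diagonal entry is 0 and smaller values mean "more similar".
from sklearn.metrics.pairwise import pairwise_distances
user_similarity = pairwise_distances(data_matrix, metric = 'cosine')
item_similarity = pairwise_distances(data_matrix.T, metric = 'cosine')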
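
The arrays computed above hold cosine distances, which is why their diagonals are 0 in the output below. A minimal NumPy sketch of the relationship between cosine similarity and cosine distance, on a hypothetical toy matrix (the helper name `cosine_similarity_matrix` is an assumption, not part of the notebook):

```python
import numpy as np

def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    unit_rows = X / norms
    return unit_rows @ unit_rows.T

# Hypothetical 3 x 3 ratings matrix (0 = not rated).
toy = np.array([[5.0, 3.0, 0.0],
                [4.0, 0.0, 0.0],
                [0.0, 0.0, 2.0]])

similarity = cosine_similarity_matrix(toy)
distance = 1.0 - similarity  # cosine distance

print(np.round(similarity, 3))
print(np.round(distance, 3))
```

If a true similarity is wanted as the weight in a prediction, one option is to use `1 - distance` in place of the distance matrix.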

In [17]:
user_similarity

array([[0.        , 0.83306902, 0.95254046, ..., 0.85138306, 0.82049212,
        0.60182526],
       [0.83306902, 0.        , 0.88940868, ..., 0.83851522, 0.82773219,
        0.89420212],
       [0.95254046, 0.88940868, 0.        , ..., 0.89875744, 0.86658385,
        0.97344413],
       ...,
       [0.85138306, 0.83851522, 0.89875744, ..., 0.        , 0.8983582 ,
        0.90488042],
       [0.82049212, 0.82773219, 0.86658385, ..., 0.8983582 , 0.        ,
        0.81753534],
       [0.60182526, 0.89420212, 0.97344413, ..., 0.90488042, 0.81753534,
        0.        ]])

In [18]:
item_similarity

array([[0.        , 0.59761782, 0.66975521, ..., 1.        , 0.95281693,
        0.95281693],
       [0.59761782, 0.        , 0.72693082, ..., 1.        , 0.92170064,
        0.92170064],
       [0.66975521, 0.72693082, 0.        , ..., 1.        , 1.        ,
        0.90312495],
       ...,
       [1.        , 1.        , 1.        , ..., 0.        , 1.        ,
        1.        ],
       [0.95281693, 0.92170064, 1.        , ..., 1.        , 0.        ,
        1.        ],
       [0.95281693, 0.92170064, 0.90312495, ..., 1.        , 1.        ,
        0.        ]])

In [19]:
user_similarity.shape, item_similarity.shape

((943, 943), (1682, 1682))

In [20]:
mean_user_rating = data_matrix.mean(axis = 1)
mean_user_rating.shape

(943,)

In [21]:
mean_user_rating[:, np.newaxis].shape

(943, 1)

In [22]:
data_matrix - mean_user_rating[:, np.newaxis]

array([[ 4.41617122,  2.41617122,  3.41617122, ..., -0.58382878,
        -0.58382878, -0.58382878],
       [ 3.86325803, -0.13674197, -0.13674197, ..., -0.13674197,
        -0.13674197, -0.13674197],
       [-0.08977408, -0.08977408, -0.08977408, ..., -0.08977408,
        -0.08977408, -0.08977408],
       ...,
       [ 4.9470868 , -0.0529132 , -0.0529132 , ..., -0.0529132 ,
        -0.0529132 , -0.0529132 ],
       [-0.20035672, -0.20035672, -0.20035672, ..., -0.20035672,
        -0.20035672, -0.20035672],
       [-0.34066587,  4.65933413, -0.34066587, ..., -0.34066587,
        -0.34066587, -0.34066587]])
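
Note that `data_matrix.mean(axis = 1)` averages over every entry of a row, including the zeros that stand for missing ratings, which pulls each user's mean toward zero. A minimal sketch, on a hypothetical toy matrix, of computing the mean over observed ratings only and centering just those entries:

```python
import numpy as np

# Hypothetical ratings matrix (0 = not rated); the last user has no ratings.
toy = np.array([[5.0, 3.0, 0.0, 0.0],
                [4.0, 0.0, 0.0, 0.0],
                [0.0, 0.0, 0.0, 0.0]])

observed = toy > 0
counts = observed.sum(axis=1)
sums = toy.sum(axis=1)

# Mean over observed ratings only; users without ratings get a mean of 0.
mean_observed = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
mean_all = toy.mean(axis=1)  # what the cell above computes

# Center only the observed entries and leave missing entries at 0.
centered = np.where(observed, toy - mean_observed[:, np.newaxis], 0.0)

print(mean_all)
print(mean_observed)
print(centered)
```

This is one possible refinement rather than what the notebook does; the cells that follow keep the simpler all-entries mean.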

In [23]:
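# Function to make predictions
def predict(ratings, similarity, type = 'user'):
    if type == 'user':
        # Mean-center each user's row, take the similarity-weighted average of the
        # centered ratings, then add the user's mean back.
        # Note: the mean is taken over all entries, including the zeros used for missing ratings.
        mean_user_rating = ratings.mean(axis = 1)
        ratings_diff = (ratings - mean_user_rating[:, np.newaxis])
        pred = mean_user_rating[:, np.newaxis] + similarity.dot(ratings_diff)/np.array([np.abs(similarity).sum(axis = 1)]).T
    elif type == 'item':
        # Similarity-weighted average of the raw ratings over all items.
        pred = ratings.dot(similarity)/np.array([np.abs(similarity).sum(axis = 1)])
    return pred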

In [24]:
user_prediction = predict(data_matrix, user_similarity, type = 'user')
item_prediction = predict(data_matrix, item_similarity, type = 'item')

In [25]:
user_prediction.shape

(943, 1682)

In [26]:
user_prediction

array([[ 2.06532606,  0.73430275,  0.62992381, ...,  0.39359041,
         0.39304874,  0.3927712 ],
       [ 1.76308836,  0.38404019,  0.19617889, ..., -0.08837789,
        -0.0869183 , -0.08671183],
       [ 1.79590398,  0.32904733,  0.15882885, ..., -0.13699223,
        -0.13496852, -0.13476488],
       ...,
       [ 1.59151513,  0.27526889,  0.10219534, ..., -0.16735162,
        -0.16657451, -0.16641377],
       [ 1.81036267,  0.40479877,  0.27545013, ..., -0.00907358,
        -0.00846587, -0.00804858],
       [ 1.8384313 ,  0.47964837,  0.38496292, ...,  0.14686675,
         0.14629808,  0.14641455]])

In [27]:
item_prediction.shape

(943, 1682)

In [28]:
item_prediction

array([[0.44627765, 0.475473  , 0.50593755, ..., 0.58815455, 0.5731069 ,
        0.56669645],
       [0.10854432, 0.13295661, 0.12558851, ..., 0.13445801, 0.13657587,
        0.13711081],
       [0.08568497, 0.09169006, 0.08764343, ..., 0.08465892, 0.08976784,
        0.09084451],
       ...,
       [0.03230047, 0.0450241 , 0.04292449, ..., 0.05302764, 0.0519099 ,
        0.05228033],
       [0.15777917, 0.17409459, 0.18900003, ..., 0.19979296, 0.19739388,
        0.20003117],
       [0.24767207, 0.24489212, 0.28263031, ..., 0.34410424, 0.33051406,
        0.33102478]])

## Basics of Matrix Factorization

The key idea in dimensionality reduction techniques is that the reduced, rotated, and completely specified representation can be robustly estimated from an incomplete data matrix. <br>
Once the completely specified representation has been obtained, one can rotate it back to the original axis system in order to obtain the fully specified representation. <br>
Under the covers, dimensionality reduction methods leverage the row and column correlations to create the fully specified and reduced representation. <br>
Matrix Factorization methods provide a neat way to leverage all row and column correlations in one shot to estimate the entire data matrix.

### Basic Matrix Factorization Principles 

In the basic matrix factorization model, the m × n ratings matrix R is approximately factorized into an m × k matrix U and an n × k matrix V, as $R \approx UV^{T}$

Each column of U (or V) is referred to as a latent vector or latent component, whereas each row of U (or V) is referred to as a latent factor.

<img src = "Images\Matrix Factorization Example.jpg">

The ith row $\vec{u_{i}}$ of U is referred to as a user factor, and it contains k entries corresponding to the affinity of user i towards the k concepts in the ratings matrix.
- For example, in the case above, $\vec{u_{i}}$ is a 2-dimensional vector containing the affinity of user i towards the history and romance genres in the ratings matrix.

Each row $\vec{v_{j}}$ of V is referred to as an item factor, and it represents the affinity of the jth item towards these k concepts. 
- For example, in the case above, $\vec{v_{j}}$ contains the affinity of item j towards the two categories of movies

Each rating $r_{ij}$ in R can be approximately expressed as a dot product of the ith user factor and jth item factor:
$r_{ij} \approx \vec{u_{i}} \cdot \vec{v_{j}}$

Since the latent factors $\vec{u_{i}}$ and $\vec{v_{j}}$ can be viewed as the affinities of user i and item j for the k different concepts, the rating can be expressed as a sum over these concepts:

<img src = "Images\Matrix Factorization Rating.JPG" width = "100">

In the case above, the summation is expressed as 
<img src = "Images\Matrix Factorization Rating Example.JPG" width = "100">
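
As a small illustration of $R \approx UV^{T}$, the sketch below builds predicted ratings from hypothetical factor matrices with k = 2 concepts. The numbers are made up and are not the ones in the figure.

```python
# Hypothetical factors: 3 users x 2 concepts and 3 items x 2 concepts.
U = [[1.0, 0.0],   # user 0 only cares about concept 0
     [0.0, 1.0],   # user 1 only cares about concept 1
     [1.0, 1.0]]   # user 2 cares about both
V = [[1.0, 0.0],   # item 0 is purely concept 0
     [0.5, 0.5],   # item 1 is a mix of both concepts
     [0.0, 1.0]]   # item 2 is purely concept 1

def predicted_rating(i, j):
    """Dot product of user factor i and item factor j (sum over concepts)."""
    return sum(u_is * v_js for u_is, v_js in zip(U[i], V[j]))

# Full approximation of the ratings matrix: entry (i, j) is u_i . v_j.
R_hat = [[predicted_rating(i, j) for j in range(len(V))]
         for i in range(len(U))]

for row in R_hat:
    print(row)
```

Each predicted rating is large only when the user and the item load on the same concepts.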

#### How can we determine the factor matrices U and V so that the fully specified matrix R matches $UV^{T}$ as closely as possible?

This can be formulated as an optimization problem with respect to the matrices U and V:

<img src = "Images\Matrix Factorization Optimization.JPG" width = "400">

A variety of gradient descent methods can be used to solve this optimization problem.

Let the set of all user-item pairs (i, j), which are observed in R, be denoted by S. Here i is the index of a user and j is the index of an item.
The set S of observed user-item pairs is defined as follows:
    
<img src = "Images\Matrix Factorization Set.JPG" width = "400">

<img src = "Images\Matrix Factorization Optimization 1.JPG" width = "500">

The objective function sums up the error only over the observed entries in S.

Here $u_{is}$ and $v_{js}$ are the unknown variables which need to be learned to minimize the objective function. This can be achieved by gradient descent methods.

<img src = "Images\Matrix Factorization Gradient Descent.JPG" width = "500">

<img src = "Images\Matrix Factorization Gradient Descent Algorithm.JPG" width = "750">
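
As a reading aid, the objective and the gradient descent updates can also be written out in text. This is a sketch that assumes the unregularized squared-error objective over the observed set S described above; the error symbol $e_{ij} = r_{ij} - \sum_{s=1}^{k} u_{is} v_{js}$ and the step size $\alpha$ are notation introduced here for illustration.

$$\text{Minimize } J = \frac{1}{2} \sum_{(i,j) \in S} e_{ij}^{2} = \frac{1}{2} \sum_{(i,j) \in S} \left( r_{ij} - \sum_{s=1}^{k} u_{is} v_{js} \right)^{2}$$

$$\frac{\partial J}{\partial u_{iq}} = -\sum_{j:(i,j) \in S} e_{ij} v_{jq}, \qquad \frac{\partial J}{\partial v_{jq}} = -\sum_{i:(i,j) \in S} e_{ij} u_{iq}$$

$$u_{iq} \leftarrow u_{iq} + \alpha \sum_{j:(i,j) \in S} e_{ij} v_{jq}, \qquad v_{jq} \leftarrow v_{jq} + \alpha \sum_{i:(i,j) \in S} e_{ij} u_{iq}$$

The `MF` class in the next section additionally includes bias terms and a regularization weight `beta`, and it applies the updates one observed rating at a time (stochastic gradient descent) rather than summing over all of S.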

### Building a recommendation engine using matrix factorization

In [58]:
class MF():

    # Initializing the user-movie rating matrix, no. of latent features, alpha and beta.
    def __init__(self, R, K, alpha, beta, iterations):
        self.R = R
        self.num_users, self.num_items = R.shape
        self.K = K
        self.alpha = alpha
        self.beta = beta
        self.iterations = iterations

    # Initializing user-feature and movie-feature matrix 
    def train(self):
        self.P = np.random.normal(scale=1./self.K, size=(self.num_users, self.K))
        self.Q = np.random.normal(scale=1./self.K, size=(self.num_items, self.K))

        # Initializing the bias terms
        self.b_u = np.zeros(self.num_users)
        self.b_i = np.zeros(self.num_items)
        self.b = np.mean(self.R[np.where(self.R != 0)])

        # List of training samples
        self.samples = [
        (i, j, self.R[i, j])
        for i in range(self.num_users)
        for j in range(self.num_items)
        if self.R[i, j] > 0
        ]

        # Stochastic gradient descent for given number of iterations
        training_process = []
        for i in range(self.iterations):
            np.random.shuffle(self.samples)
            self.sgd()
            mse = self.mse()
            training_process.append((i, mse))
            if (i+1) % 20 == 0:
                print("Iteration: %d ; error = %.4f" % (i+1, mse))

        return training_process

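    # Computing the root of the total squared error over the observed ratings
    # (despite the name, the sum is not divided by the number of ratings)
    def mse(self):
        xs, ys = self.R.nonzero()
        predicted = self.full_matrix()
        error = 0
        for x, y in zip(xs, ys):
            error += pow(self.R[x, y] - predicted[x, y], 2)
        return np.sqrt(error)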

    # Stochastic gradient descent to get optimized P and Q matrix
    def sgd(self):
        for i, j, r in self.samples:
            prediction = self.get_rating(i, j)
            e = (r - prediction)

            # Update the user and item biases
            self.b_u[i] += self.alpha * (e - self.beta * self.b_u[i])
            self.b_i[j] += self.alpha * (e - self.beta * self.b_i[j])

            # Keep a copy of the user factors so that the update of Q uses
            # the values of P from before this step
            P_i = self.P[i, :].copy()
            self.P[i, :] += self.alpha * (e * self.Q[j, :] - self.beta * self.P[i,:])
            self.Q[j, :] += self.alpha * (e * P_i - self.beta * self.Q[j,:])

    # Rating for user i and movie j
    def get_rating(self, i, j):
        prediction = self.b + self.b_u[i] + self.b_i[j] + self.P[i, :].dot(self.Q[j, :].T)
        return prediction

    # Full user-movie rating matrix (uses self rather than the global mf instance)
    def full_matrix(self):
        return self.b + self.b_u[:, np.newaxis] + self.b_i[np.newaxis, :] + self.P.dot(self.Q.T)

Convert the user item ratings to matrix form

In [59]:
R= np.array(ratings.pivot(index = 'user_id', columns ='movie_id', values = 'rating').fillna(0))

In [61]:
mf = MF(R, K=20, alpha=0.001, beta=0.01, iterations=100)
training_process = mf.train()
print()
print("P x Q:")
print(mf.full_matrix())
print()

Iteration: 20 ; error = 296.1260
Iteration: 40 ; error = 291.0691
Iteration: 60 ; error = 287.6773
Iteration: 80 ; error = 282.2684
Iteration: 100 ; error = 273.0257

P x Q:
[[3.84823603 3.35749775 3.22557226 ... 3.29693447 3.42494487 3.3798756 ]
 [3.87050735 3.31918712 3.11892638 ... 3.3685576  3.50228129 3.45023079]
 [3.40625431 2.75845158 2.53213911 ... 2.81245311 2.93065692 2.91516687]
 ...
 [4.24505526 3.64255364 3.40883074 ... 3.6453093  3.79136316 3.75419681]
 [4.37021679 3.73638916 3.50187019 ... 3.78169863 3.90826854 3.87378199]
 [3.59036706 3.21432188 3.04078225 ... 3.26959698 3.35246093 3.33333072]]



## Evaluation Metrics

#### Recall
What proportion of the items that a user likes were actually recommended?
<img src = "Images/Recall.jpg">

#### Precision
Out of all the recommended items, how many did the user actually like?
<img src = "Images/Precision.jpg">
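
Precision and recall for a single user's top-k list can be sketched as follows. The function name, the cutoff `k`, and the example items are assumptions made for illustration.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Return (precision, recall) for the first k recommended items."""
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: four recommendations, three items the user likes.
recommended = ["A", "B", "C", "D"]
relevant = {"A", "C", "E"}

precision, recall = precision_recall_at_k(recommended, relevant, k=4)
print(precision, recall)
```

Here two of the four recommendations are liked (precision 0.5) and two of the three liked items were recommended (recall 2/3).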

#### RMSE (Root Mean Squared Error)
The lower the RMSE value, the better the recommendations
<img src = "Images/RMSE.jpg">
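
A minimal sketch of RMSE over a set of held-out ratings, assuming two equally long lists of actual and predicted values (the numbers are made up):

```python
import math

def rmse(actual, predicted):
    """Root of the mean squared difference between actual and predicted ratings."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical held-out ratings and the model's predictions for them.
actual = [4, 3, 5]
predicted = [3, 3, 4]
print(rmse(actual, predicted))
```

Unlike the `mse` method of the `MF` class above, this divides by the number of ratings before taking the square root, so values are comparable across datasets of different sizes.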

The above metrics tell us how accurate our recommendations are, but they do not take the order of the recommendations into account

#### Mean Reciprocal Rank
<img src = "Images/MRR.jpg">
- Suppose we have recommended 3 movies to a user, say A, B, C in the given order, but the user only liked movie C. As the rank of movie C is 3, the reciprocal rank will be 1/3 <br>
- The larger the mean reciprocal rank, the better the recommendations
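
The example above can be sketched as follows, assuming each user is described by an ordered recommendation list and the set of items they liked (the function names and the second user are hypothetical):

```python
def reciprocal_rank(recommended, liked):
    """1 / rank of the first liked item, or 0.0 if none was recommended."""
    for rank, item in enumerate(recommended, start=1):
        if item in liked:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(users):
    """Average the reciprocal rank over (recommended, liked) pairs."""
    return sum(reciprocal_rank(rec, liked) for rec, liked in users) / len(users)

users = [
    (["A", "B", "C"], {"C"}),  # first liked item at rank 3 -> 1/3
    (["X", "Y"], {"X"}),       # first liked item at rank 1 -> 1
]
print(mean_reciprocal_rank(users))
```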

#### MAP at k (Mean Average Precision at cutoff k)
<img src = "Images/MAP.jpg">
- Suppose we have made three recommendations [0, 1, 1]. Here 0 means the recommendation is not correct while 1 means that the recommendation is correct. Then the precision at k will be [0, 1/2, 2/3], and the average precision will be (1/3)*(0+1/2+2/3) ≈ 0.39 <br>
- The larger the mean average precision, the more correct the recommendations
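
A sketch of average precision that follows the convention of the example above, where the sum is divided by the number of recommendations. A common variant divides by the number of correct recommendations instead, which would give a different value for the same list. The function names are assumptions.

```python
def average_precision(hits):
    """hits: list of 0/1 flags in recommendation order (1 = correct)."""
    running_hits = 0
    precisions = []
    for position, hit in enumerate(hits, start=1):
        running_hits += hit
        # A position contributes its precision only if the item there is correct.
        precisions.append(running_hits / position if hit else 0.0)
    return sum(precisions) / len(hits)

def mean_average_precision(all_hits):
    """Average the per-user average precision over all users."""
    return sum(average_precision(h) for h in all_hits) / len(all_hits)

print(round(average_precision([0, 1, 1]), 4))
```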

#### NDCG (Normalized Discounted Cumulative Gain)
- The main difference between MAP and NDCG is that MAP assumes that an item is either of interest or not, while NDCG works with graded relevance scores
- Let us understand it with an example: suppose that out of 10 movies, A to J, the first five (A, B, C, D and E) are relevant to the user, while the other five (F, G, H, I and J) are not. The recommendation was [A,B,C,D]. The NDCG in this case will be 1, as all the recommended products are relevant for the user
- The higher the NDCG value, the better the recommendations
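
A minimal sketch of NDCG using the common logarithmic discount, assuming the input is the list of relevance scores of the recommended items in ranked order and that the ideal ordering is obtained by sorting that same list (both are assumptions of this sketch):

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: relevance at rank r is divided by log2(r + 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """DCG normalized by the DCG of the best possible ordering of the same list."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# The example above: four recommendations, all relevant.
print(ndcg([1, 1, 1, 1]))
# A relevant item placed second instead of first is penalized.
print(round(ndcg([0, 1]), 4))
```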

References: https://www.analyticsvidhya.com/blog/2018/06/comprehensive-guide-recommendation-engine-python/