# **Matrix factorization for finding Hidden genres**

For this, we need to look at the rating matrix again, but now there is an additional problem: the more data we have, the bigger the matrix becomes, and the more time-consuming the computations get. To preserve the features of the data while reducing the computation required, we use a technique called `dimensionality reduction`. In this notebook, we split the rating matrix into smaller parts using a mathematical technique called `Singular Value Decomposition` (SVD).

Before going into these topics, it is important to understand the basics of matrix factorization. Assume we have a matrix `R`; we can then decompose it into the following form.

<center>

__R = U.V__
</center>

If R has dimensions n\*m, then U has dimensions n\*d and V has dimensions d\*m. This is called `UV-Decomposition`. In the recommender systems field, R is the ratings matrix, U is the user-feature matrix, and V is the item-feature matrix. The idea behind the factorization is to find values for U and V whose product is as close as possible to R. In theory, this amounts to solving a system of equations, one per known rating, that the matrix multiplication has to satisfy.
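To make the shapes concrete, here is a minimal sketch with random factors. The sizes `n`, `m`, `d` and the random values are arbitrary assumptions chosen for illustration; nothing here is fitted to real ratings.

```python
import numpy as np

n, m, d = 6, 6, 2                 # users, items, latent features (d chosen arbitrarily)
rng = np.random.default_rng(0)

U = rng.random((n, d))            # user-feature matrix, n x d
V = rng.random((d, m))            # item-feature matrix, d x m
R_approx = U @ V                  # n x m, same shape as the ratings matrix

# A single predicted rating is the dot product of one user row and one item column.
user, item = 2, 4
r_ui = U[user, :] @ V[:, item]

print(R_approx.shape)
```

Storing U and V takes n\*d + d\*m numbers instead of n\*m, which is where the saving comes from when d is much smaller than n and m.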

# SVD (Singular Value Decomposition)



One of the most common ways of factorizing matrices is SVD. In SVD we construct three matrices, namely `U`, `V* (V transpose)` and `Σ (Sigma)`. Here U and V act as the factors, while Σ holds the weight of each latent dimension.
<pre style='color:yellow'>
<center>M = U.Σ.V*</center>

- M  = Data matrix
- U  = User feature matrix
- Σ  = Diagonal weights matrix (singular values)
- V* = Item feature matrix
</pre>
For a more theoretical understanding of SVD, you can watch [the videos here.](https://www.youtube.com/watch?v=gXbThCXjZFM&list=PLMrJAkhIeNNSVjnsviglFoY2nXildDCcv&index=1)

Instead of implementing it myself, I have used the NumPy implementation of the algorithm.

In [1]:
import pandas as pd
import numpy as np

movies = ['mib', 'st', 'av', 'b', 'ss', 'lm']
users = ['Sara', 'Jesper', 'Therese', 'Helle', 'Pietro', 'Ekaterina']
M = pd.DataFrame([
                    [5.0, 3.0, 0.0, 2.0, 2.0, 2.0],
                    [4.0, 3.0, 4.0, 0.0, 3.0, 3.0],
                    [5.0, 2.0, 5.0, 2.0, 1.0, 1.0],
                    [3.0, 5.0, 3.0, 0.0, 1.0, 1.0],
                    [3.0, 3.0, 3.0, 2.0, 4.0, 5.0],
                    [2.0, 3.0, 2.0, 3.0, 5.0, 5.0]],
                columns=movies,
                index=users)

from numpy import linalg

U, sigma, V_t = linalg.svd(M)

If we multiply the three matrices from above back together, the result matches the original values only up to floating-point rounding error, which is close enough to be usable. Note that `sigma` is returned as a 1-D array of singular values sorted in descending order, so it shows how much information each column of U and each row of V_t contributes.

We can also use sigma to reduce the dimensions of the matrices while retaining most of the information in the original data. To do that, we keep only the largest values of sigma.
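As a rough check of both points, the sketch below rebuilds the matrix from the full decomposition and measures how much of the total squared singular-value weight the first k values carry. The 90% threshold and the name `energy` are assumptions for illustration, not a rule from this notebook.

```python
import numpy as np

# Same ratings as the DataFrame M above, as a plain array.
M = np.array([
    [5.0, 3.0, 0.0, 2.0, 2.0, 2.0],
    [4.0, 3.0, 4.0, 0.0, 3.0, 3.0],
    [5.0, 2.0, 5.0, 2.0, 1.0, 1.0],
    [3.0, 5.0, 3.0, 0.0, 1.0, 1.0],
    [3.0, 3.0, 3.0, 2.0, 4.0, 5.0],
    [2.0, 3.0, 2.0, 3.0, 5.0, 5.0],
])

U, sigma, V_t = np.linalg.svd(M)

# sigma is a 1-D array, so it has to be turned into a diagonal matrix first.
M_full = U @ np.diag(sigma) @ V_t
print(np.allclose(M, M_full))     # equal up to floating-point rounding

# Cumulative share of the squared singular values kept by the first k components.
energy = np.cumsum(sigma ** 2) / np.sum(sigma ** 2)

# Smallest k whose cumulative share reaches the (assumed) 90% threshold.
k = int(np.searchsorted(energy, 0.90) + 1)
print(energy, k)
```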

In [9]:
def rank_k(k):
    '''
    Function to reduce the rank to the given level
    '''
    # np.asmatrix keeps `*` as matrix multiplication (np.mat was removed in NumPy 2.0)
    U_reduced = np.asmatrix(U[:, :k])
    Vt_reduced = np.asmatrix(V_t[:k, :])
    Sigma_reduced = np.eye(k) * sigma[:k]
    
    return U_reduced, Sigma_reduced, Vt_reduced

U_reduced, Sigma_reduced, Vt_reduced = rank_k(4)
M_hat = U_reduced * Sigma_reduced * Vt_reduced

print(M_hat)

[[ 4.87147087  3.11444112  0.04893344  2.23870109  1.94083799  1.920736  ]
 [ 3.49344678  3.45787572  4.19067126  0.94886084  2.61521613  2.82032378]
 [ 5.22111879  1.8034114   4.91572235  1.58969108  1.09528095  1.14205388]
 [ 3.25351113  4.77315242  2.90384191 -0.4721446   1.14157873  1.13455568]
 [ 2.93061675  3.04700483  3.03112668  2.11137004  4.29526848  4.67079756]
 [ 2.27270952  2.76664391  1.89315701  2.50473044  4.91596291  5.35161957]]


Here, based on the row and column indices, we can identify the predicted rating of a particular user for a particular item.

Also, as we can see, we need to multiply three matrices to get M_hat. To avoid that, we can store just two decomposed matrices: we take the square root of the Σ matrix and multiply it into each of the other two matrices, as below.

In [10]:
def rank_k_v2(k):
    '''
    Updated version of rank reduction function.
    '''
    # np.asmatrix keeps `*` as matrix multiplication (np.mat was removed in NumPy 2.0)
    U_reduced = np.asmatrix(U[:, :k])
    Vt_reduced = np.asmatrix(V_t[:k, :])
    Sigma_reduced = np.eye(k) * sigma[:k]
    Sigma_sqrt = np.sqrt(Sigma_reduced)

    return U_reduced*Sigma_sqrt, Sigma_sqrt*Vt_reduced

U_reduced, Vt_reduced = rank_k_v2(4)
M_hat = U_reduced * Vt_reduced

print(M_hat)

[[ 4.87147087  3.11444112  0.04893344  2.23870109  1.94083799  1.920736  ]
 [ 3.49344678  3.45787572  4.19067126  0.94886084  2.61521613  2.82032378]
 [ 5.22111879  1.8034114   4.91572235  1.58969108  1.09528095  1.14205388]
 [ 3.25351113  4.77315242  2.90384191 -0.4721446   1.14157873  1.13455568]
 [ 2.93061675  3.04700483  3.03112668  2.11137004  4.29526848  4.67079756]
 [ 2.27270952  2.76664391  1.89315701  2.50473044  4.91596291  5.35161957]]


The problem with the above is that we marked the ratings we did not know with zero. This causes the decomposed matrices to produce values close to zero for those entries as well. To avoid this problem, we use a technique called `Imputation`.

Basically, this means that we can either normalize each data row so that its mean is zero, or calculate the mean for each user/item and fill the missing values with it. Both of these are commonly treated as imputation methods. Either way, we get a somewhat sensible value for the missing entries in the decomposition.
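Both options can be sketched with pandas as below. This assumes that a rating of 0 always means "not rated"; the names `ratings`, `M_filled` and `M_centered` are made up for this example.

```python
import numpy as np
import pandas as pd

movies = ['mib', 'st', 'av', 'b', 'ss', 'lm']
users = ['Sara', 'Jesper', 'Therese', 'Helle', 'Pietro', 'Ekaterina']
M = pd.DataFrame([
    [5.0, 3.0, 0.0, 2.0, 2.0, 2.0],
    [4.0, 3.0, 4.0, 0.0, 3.0, 3.0],
    [5.0, 2.0, 5.0, 2.0, 1.0, 1.0],
    [3.0, 5.0, 3.0, 0.0, 1.0, 1.0],
    [3.0, 3.0, 3.0, 2.0, 4.0, 5.0],
    [2.0, 3.0, 2.0, 3.0, 5.0, 5.0]],
    columns=movies, index=users)

# Assumption: 0 marks an unknown rating, so turn it into NaN first.
ratings = M.replace(0.0, np.nan)

# Mean of each user's known ratings (NaN values are skipped by default).
user_means = ratings.mean(axis=1)

# Option A: fill each user's missing ratings with that user's mean.
M_filled = ratings.apply(lambda row: row.fillna(row.mean()), axis=1)

# Option B: centre each user's ratings on zero, then fill the gaps with 0.
M_centered = ratings.sub(user_means, axis=0).fillna(0.0)

print(M_filled.round(2))
print(M_centered.round(2))
```

With option B the user means have to be added back to the reconstructed matrix to get predictions on the original rating scale.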

> ### __Adding new user to the decomposed matrix__

With the SVD method, it is really easy to add new users to the decomposed matrix. Because of how the matrix multiplication works, we can project the new user's ratings into the reduced space and append the resulting row directly to the user (U) matrix.

<pre style='color:yellow'>
<center>u_reduced = u_original * V * Σ^-1</center>

here,

- u_reduced = reduced vector of user ratings.
- u_original = original ratings given by the new user.
- V = item matrix from the SVD method (the transpose of Vt).
- Σ^-1 = inverse of the Σ matrix of SVD.

</pre>

In [11]:
from numpy.linalg import inv

# new user original ratings
u_original = np.array([4.0, 5.0, 0.0, 3.0, 3.0, 0.0])

# reduced vector
u_reduced = u_original *Vt_reduced.T* inv(Sigma_reduced)

print(u_reduced)

[[-1.45894059  0.23555071  1.9293472  -0.58325078]]


> ### __Adding new item to the decomposed matrix__

Just as we add a new user to the decomposed matrix, we can add new items to the V matrix as well.

<pre style='color:yellow'>
<center>i_reduced = i_original_t * U * Σ^-1</center>

here,

- i_reduced = the vector in the reduced space to represent the new item.
- i_original_t = original item ratings transpose vector.
- U = User Matrix from SVD method.
- Σ^-1 = Inverse of the Σ matrix of SVD.

</pre>
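The item formula can be sketched in the same way as the user case. The ratings in `i_original` are hypothetical values invented for this example, and plain NumPy arrays with `@` are used here instead of the matrix objects from the cells above.

```python
import numpy as np

# Same ratings as the DataFrame M above, as a plain array.
M = np.array([
    [5.0, 3.0, 0.0, 2.0, 2.0, 2.0],
    [4.0, 3.0, 4.0, 0.0, 3.0, 3.0],
    [5.0, 2.0, 5.0, 2.0, 1.0, 1.0],
    [3.0, 5.0, 3.0, 0.0, 1.0, 1.0],
    [3.0, 3.0, 3.0, 2.0, 4.0, 5.0],
    [2.0, 3.0, 2.0, 3.0, 5.0, 5.0],
])

U, sigma, V_t = np.linalg.svd(M)

k = 4
U_k = U[:, :k]
Sigma_k_inv = np.diag(1.0 / sigma[:k])   # inverse of a diagonal matrix

# Hypothetical ratings for a new movie, one per existing user (0 = not rated).
i_original = np.array([4.0, 0.0, 5.0, 3.0, 0.0, 2.0])
i_reduced = i_original @ U_k @ Sigma_k_inv

# Append the new item as an extra column of the reduced item matrix.
Vt_extended = np.column_stack([V_t[:k, :], i_reduced])

# Sanity check: folding in an existing movie gives back its own column of V_t.
i_check = M[:, 0] @ U_k @ Sigma_k_inv
print(np.allclose(i_check, V_t[:k, 0]))
```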





> One thing you might wonder is why the heck we decompose a matrix filled with mean values and then use the decomposed matrices to estimate the original matrix again. The answer is that the point of decomposing is to reduce the size of the original matrix and to extract hidden topics for further processing.

> ## __Recommendations with SVD__

To make recommendations using SVD, we can use a few methods.

1. Calculate M_hat and use it to find the highest rated item for the user (which they haven't consumed). (weird to me as well, apparently it works! :-/ )
2. Use the decomposed user matrix to do collaborative filtering.
3. Use item similarity on the decomposed item matrix.

We can test the three methods to see how they work.
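Methods 1 and 3 can be sketched as follows. The helper `recommend`, the choice of k = 4, and the use of cosine similarity are assumptions made for this example. The SVD is run on the zero-filled matrix to stay consistent with the cells above.

```python
import numpy as np
import pandas as pd

movies = ['mib', 'st', 'av', 'b', 'ss', 'lm']
users = ['Sara', 'Jesper', 'Therese', 'Helle', 'Pietro', 'Ekaterina']
M = pd.DataFrame([
    [5.0, 3.0, 0.0, 2.0, 2.0, 2.0],
    [4.0, 3.0, 4.0, 0.0, 3.0, 3.0],
    [5.0, 2.0, 5.0, 2.0, 1.0, 1.0],
    [3.0, 5.0, 3.0, 0.0, 1.0, 1.0],
    [3.0, 3.0, 3.0, 2.0, 4.0, 5.0],
    [2.0, 3.0, 2.0, 3.0, 5.0, 5.0]],
    columns=movies, index=users)

U, sigma, V_t = np.linalg.svd(M.to_numpy())
k = 4
M_hat = pd.DataFrame(U[:, :k] @ np.diag(sigma[:k]) @ V_t[:k, :],
                     index=users, columns=movies)

def recommend(user, n=1):
    """Method 1: top-n predicted ratings among items the user has not rated."""
    unseen = M.loc[user] == 0          # assumption: 0 means "not rated"
    return M_hat.loc[user][unseen].sort_values(ascending=False).head(n)

# Method 3: cosine similarity between items in the reduced space.
item_vecs = (np.diag(sigma[:k]) @ V_t[:k, :]).T      # one row per item
unit = item_vecs / np.linalg.norm(item_vecs, axis=1, keepdims=True)
item_sim = pd.DataFrame(unit @ unit.T, index=movies, columns=movies)

print(recommend('Sara'))
print(item_sim.round(2))
```

Because the unknown ratings are zeros here, the predicted scores for unseen items come out close to zero, so it would make more sense to run this on an imputed matrix.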

> ## **Problems of SVD**

1. Need to fill out the empty values.
2. Slow to calculate on large matrices.
3. Even though we can add new users and items on the go, we need to recalculate the Sigma matrix often to make the decomposition accurate.
4. Not always explainable.



# Baseline Predictors

Practically speaking, baseline predictors are a type of method that we can use to recommend items to users.

In this technique, we measure the inherent bias of items and users and then use those biases as the baseline for our prediction. The intuition behind baseline predictors is that if an item is considered good, its ratings will generally be high, and the opposite holds for bad items. Likewise, some users are inherently very negative or very positive. In all these cases we can see some kind of bias towards an item. If we can measure this bias, it can be used as a baseline, which is what we try to do in baseline predictors.

We use the following equation as the predictor.


<pre style='color:yellow'>
<center>r_ui = µ + b_u + b_i</center>

here,

- r_ui = base prediction for item i from user u.
- µ = average of all ratings
- b_u = user bias
- b_i = item bias

</pre>

We can calculate the user bias and item bias terms using the existing ratings.

 1. Calculate the average of all ratings by taking the sum of all elements and dividing by the number of non-zero elements.
 2. For the given user, select a rating they have already given and fill in the r_ui and µ values. (At this step, we know them.)
 3. Do that for many items. In this way we get many equations.
 4. Solve those equations as a least-squares problem to get the bias terms.


<center><img src="./images/Baseline predictors.jpg" width="200px" /></center>
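The four steps can be sketched with `np.linalg.lstsq`. The design matrix `A` and the helper `baseline` are names made up for this example, and 0 is again assumed to mean "not rated". The biases are not unique (a constant can be moved between the user and item terms), so `lstsq` returns the minimum-norm solution.

```python
import numpy as np

# Same ratings as the DataFrame M above, as a plain array.
R = np.array([
    [5.0, 3.0, 0.0, 2.0, 2.0, 2.0],
    [4.0, 3.0, 4.0, 0.0, 3.0, 3.0],
    [5.0, 2.0, 5.0, 2.0, 1.0, 1.0],
    [3.0, 5.0, 3.0, 0.0, 1.0, 1.0],
    [3.0, 3.0, 3.0, 2.0, 4.0, 5.0],
    [2.0, 3.0, 2.0, 3.0, 5.0, 5.0],
])

known = R > 0                      # assumption: 0 means "not rated"
mu = R[known].mean()               # step 1: average of the known ratings

# Steps 2-3: one equation per known rating, r_ui - mu = b_u + b_i.
rows, cols = np.nonzero(known)
n_users, n_items = R.shape
A = np.zeros((rows.size, n_users + n_items))
A[np.arange(rows.size), rows] = 1.0              # selects b_u
A[np.arange(rows.size), n_users + cols] = 1.0    # selects b_i
y = R[rows, cols] - mu

# Step 4: solve the over-determined system in the least-squares sense.
b, *_ = np.linalg.lstsq(A, y, rcond=None)
b_u, b_i = b[:n_users], b[n_users:]

def baseline(u, i):
    """Baseline prediction mu + b_u + b_i for user index u and item index i."""
    return mu + b_u[u] + b_i[i]

print(round(mu, 3))
print(baseline(0, 2))   # Sara's baseline for 'av', which she has not rated
```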

Since this calculation is time-consuming and comparatively complex, a simpler method is used as well.

