# Model-based CF

In this step we implement PureSVD, a model-based Collaborative Filtering method.
To do so, you must:

- Read the train file extracted from the dataset
- Build a sparse ratings matrix from it
- Extract the singular values and singular vectors of this matrix via truncated SVD
- Combine these latent factors to predict a user-item rating
- Recommend the items with the highest scores

In [None]:
# import libs
import operator
import scipy
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from scipy.sparse import csr_matrix
from collections import OrderedDict
from scipy.sparse import linalg 

# useful command
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

plt.rcParams.update({'font.size': 14})

## Reading train and test files

You can read these files however you prefer. I suggest reading them with the pandas library and creating the sparse matrix afterwards.

In [None]:
df_train = 
df_test = 

df_train.head()
df_test.head()
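
As a minimal sketch of the pandas approach: the tab separator, the itemId and rating column names, and the inline sample data below are assumptions made for illustration (only the userId column is used elsewhere in this notebook). Adapt them to the actual layout of your files.

```python
import io
import pandas as pd

# Hypothetical sample standing in for the real train file
sample = "userId\titemId\trating\n1\t10\t4\n1\t12\t5\n2\t10\t3\n"

# With a real file, pass its path instead of the StringIO buffer
df_sample = pd.read_csv(io.StringIO(sample), sep="\t")
print(df_sample.head())
```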

## Creating Sparse Matrix

I suggest using csr_matrix from scipy.

In [None]:
# Select users, items and ratings logs (i.e., all information from each column)
users = 
items = 
ratings = 

In [None]:
# Define the matrix dimensions based on the max index related to users and items
nb_users = max(users)
nb_items = max(items)

In [None]:
# Creating matrix of ratings
ratings_matrix = csr_matrix((ratings, (users, items)), shape=(nb_users+1, nb_items+1))

ratings_matrix.shape
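
To make the construction concrete, here is a small sketch on toy data. The arrays toy_users, toy_items and toy_ratings are hypothetical and only illustrate how the three parallel arrays map to rows, columns and values.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy rating logs (hypothetical): parallel arrays of user ids, item ids and ratings
toy_users = np.array([0, 0, 1, 2])
toy_items = np.array([1, 3, 0, 3])
toy_ratings = np.array([4.0, 5.0, 3.0, 2.0])

# Ids are used directly as row/column indices, so each dimension is (max id + 1)
toy_matrix = csr_matrix(
    (toy_ratings, (toy_users, toy_items)),
    shape=(toy_users.max() + 1, toy_items.max() + 1),
)

print(toy_matrix.shape)      # (3, 4)
print(toy_matrix.toarray())
```

Because ids are used as indices, a dataset whose ids start at 1 leaves row 0 and column 0 empty, which is harmless but worth keeping in mind when iterating over users.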

## A useful function

This function saves the recommendations to a file.

In [None]:
def dumpRecommendation(recommendation, users_targets, file_name):
    # 'with' closes the file even if an error occurs while writing
    with open(file_name, 'w') as file_out:
        # for each target user
        for userId in users_targets:
            # items in recommendation order; the score field is always written as 0.0
            issuedItems = ",".join(str(itemId) + ":" + str(0.0) for itemId in recommendation[userId])
            # one line per user: userId<TAB>[item:0.0,item:0.0,...]
            # (join also yields a well-formed "[]" when a user has no items)
            file_out.write(str(userId) + "\t" + "[" + issuedItems + "]" + "\n")

## PureSVD Recommendation

In the PureSVD model, the prediction is based on the latent factors extracted via SVD.

- Given a ratings matrix *R (m x n)*, we apply a truncated SVD to extract three matrices:
    - *U* represents the user factors *(m x f)*
    - *S* holds the singular values associated with each pair of singular vectors *(f x f)*
    - *Q^T* represents the item factors *(f x n)*


- The prediction is computed as:
$$
\widehat{r}_{ui} = r_u \cdot Q \cdot q_i^T
$$
where *r_u* is row *u* of the ratings matrix *(1 x n)*, *Q* is the item-factor matrix *(n x f)* and *q_i* is row *i* of *Q* *(1 x f)*.
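
The prediction rule can be checked on a toy example. The sketch below uses a small dense matrix and numpy's full SVD rather than the sparse solver, and follows the convention of the code cells in this notebook, where Q = Q_t.transpose() has shape (n x f). The matrix R and all variable names here are hypothetical.

```python
import numpy as np

# Toy dense ratings matrix (hypothetical): 4 users x 5 items, 0 = unrated
R = np.array([
    [5, 3, 0, 1, 0],
    [4, 0, 0, 1, 2],
    [1, 1, 0, 5, 4],
    [0, 1, 5, 4, 0],
], dtype=float)

f = 2  # number of latent factors kept

# Full SVD, then keep the f largest singular triplets
U_full, s_full, Qt_full = np.linalg.svd(R, full_matrices=False)
U_f, s_f, Qt_f = U_full[:, :f], s_full[:f], Qt_full[:f, :]
Q_f = Qt_f.T  # item factors, shape (n x f)

# Score of item i for user u: r_u . Q . q_i^T
u, i = 0, 2
score_ui = R[u, :] @ Q_f @ Q_f[i, :]

# All scores at once: R . Q . Q^T, which equals the rank-f reconstruction
scores = R @ Q_f @ Q_f.T
low_rank = U_f @ np.diag(s_f) @ Qt_f
print(np.allclose(scores, low_rank))  # True
```

The last check shows why only the item factors are needed at prediction time: projecting a user's rating row onto Q and back gives the same scores as the rank-f product of all three factor matrices.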

### Extracting users and items latent factors

Define the number of latent factors and use it to run the SVD method.

In [None]:
numFactors = 

In [None]:
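# svds expects a floating-point matrix, and numFactors must be smaller than min(ratings_matrix.shape)
U, S, Q_t = scipy.sparse.linalg.svds(ratings_matrix.astype(float), k=numFactors, return_singular_vectors=True)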

In [None]:
U.shape
S.shape
Q_t.shape
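
For reference, the sketch below shows the shapes to expect from svds on a toy matrix. The random data and the names toy, U_k, S_k and Qt_k are hypothetical. Note that the singular values come back as a 1-D array, so np.diag is needed if an explicit (f x f) matrix is wanted.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

# Toy sparse matrix (hypothetical): 6 users x 5 items, float dtype as svds expects
rng = np.random.default_rng(0)
dense = rng.integers(0, 6, size=(6, 5)).astype(float)
toy = csr_matrix(dense)

k = 2  # must satisfy 1 <= k < min(toy.shape)
U_k, S_k, Qt_k = svds(toy, k=k)

print(U_k.shape)   # (6, 2)  users x factors
print(S_k.shape)   # (2,)    singular values as a 1-D array
print(Qt_k.shape)  # (2, 5)  factors x items

S_diag = np.diag(S_k)  # explicit (f x f) diagonal matrix
```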

### Predicting items ratings

Predict a score for each user-item pair using the PureSVD prediction rule.

In [None]:
prediction_matrix = csr_matrix((nb_users+1, nb_items+1))

prediction_matrix.shape

In [None]:
# Compute the predictions for each user
Q = Q_t.transpose()

for u in range(ratings_matrix.shape[0]):
    r_u = 
    # optimization: instead of multiplying by one 'q_i' at a time, we can use the whole Q
    aux = 
    prediction_matrix[u,:] = 
    
prediction_matrix.shape

In [None]:
# Optimized way
prediction_matrix = csr_matrix((nb_users+1, nb_items+1))

Q = Q_t.transpose()

aux_matrix = 
prediction_matrix = 

In [None]:
prediction_matrix.shape

### Recommending items

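The score of an item is the dot product between the user's and the item's latent vectors, i.e. an unnormalized cosine similarity. For each user, we recommend the top-k highest-scoring items that the user has not seen yet.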

In [None]:
# Size of each recommendation
top_k = 10

In [None]:
# Setting the recommendations for each user
recommendation = {}

for u in range(ratings_matrix.shape[0]):
    recommendation[u] = []
    cont = 0
    # sorting items by relevance
    order = np.argsort(prediction_matrix[u,:])[::-1]
    # recommending the best items that the user has never seen
    for i in order:
        # recommending the top-k items 
        if (cont < top_k):
            if ( ):
                recommendation[u].append(i)
                cont += 1
        else:
            break
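
The selection logic can be tried in isolation on toy data. In the sketch below, toy_scores and toy_seen are hypothetical dense arrays, and treating a zero entry in the ratings as "never seen" is an assumption. It is one possible filter, not necessarily the one intended for the blank condition above.

```python
import numpy as np

# Hypothetical toy data: predicted scores and observed ratings for 2 users x 6 items
toy_scores = np.array([
    [0.9, 0.1, 0.8, 0.4, 0.7, 0.2],
    [0.3, 0.6, 0.5, 0.9, 0.1, 0.8],
])
toy_seen = np.array([
    [5, 0, 0, 0, 3, 0],   # user 0 already rated items 0 and 4
    [0, 0, 4, 0, 0, 0],   # user 1 already rated item 2
])
k = 3

toy_recs = {}
for u in range(toy_scores.shape[0]):
    # Items sorted by decreasing predicted score
    order = np.argsort(toy_scores[u, :])[::-1]
    # Keep only items the user has not rated (assumed: rating == 0), then cut to the top k
    unseen = [int(i) for i in order if toy_seen[u, i] == 0]
    toy_recs[u] = unseen[:k]

print(toy_recs)  # {0: [2, 3, 5], 1: [3, 5, 1]}
```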

In [None]:
# Save in a file
users_targets = df_test['userId'].unique()
dumpRecommendation(recommendation, users_targets, "recList_PureSVD.txt")

In [None]:
recommendation[300]
recommendation[3000]
recommendation[6010]