In [22]:
import numpy as np
import pandas as pd
from scipy.sparse import coo_matrix
from scipy.sparse.linalg import svds

# Singular Value Decomposition

$$ \underset{(n, d)}A \approx \underset{(n, n)}U * \underset{(n, d)}\Sigma * \underset{(d, d)} V^T  $$

Cualquier matriz de tamaño (n, d) se puede descomponer en producto de tres factores

* En *U* de tamaño (n, n) es una matriz ortogonal que contiene los vectores singulares izquierdos de *A*.
* En $\Sigma$ que es una matriz diagonal (n,d), cuyos valores son los valores singulares de la matriz *A* ordenados en valor decreciente
* En *V* que es una matriz transpuesta (d,d), cuyos valores son los vectores singulares derechos de *A*.

*Ortogonal significa que multiplicando la transpuesta por si misma, se obtiene la matriz identidad*

Con esto lo que se consigue es que podemos ir elminando vectores de las matrices con la información que no es fundamental, (limpiar los datos) y quedarnos con aquella información más determinante.

## Aplicación práctica

Lo que se hace con los motores de recomendación, es para una película que tu no has visto, teniendo en cuenta tus características y las de otros usuarios. Mediante SVD nos quedamos con los usuarios que son parecidos a ti, y vemos las peliculas que no has visto
## Preprocesamos los datos

# Singular Value Decomposition

## 1.  Basic Math introduction

This algorith is based in matrix factorization. Matrix factorization say that every matrix of size (n, d) can be decomposed in a product of three factors:

* *U* of size (n, n) is an ortogonal matrix that contains the left singular vectors of *A*.
* $\Sigma$ is a diagonal matrix (n,d), that constains the singular values of *A* in a decreasing order.
* *V* is a transpose (d,d), that contain the right singular vectors of *A*.

With this, we can start deleting vectores in the matrices wich the info that is not fundamental (cleaning data) and keep the *singular* information

## 2. Creating a recomender system with SVD and Surprise
 

###  Load librarys

In [25]:
import pandas as pd
from pandas.core.frame import DataFrame
from pandas.io.parsers import read_csv
from surprise import SVDpp
from surprise import Dataset, Reader
from surprise.model_selection import train_test_split
from surprise import accuracy
from collections import defaultdict

### Load data 

In [17]:
movies = pd.read_csv("/home/rubiales/Desktop/Projects/General/SVD/data/movies.csv")
ratings = pd.read_csv("/home/rubiales/Desktop/Projects/General/SVD/data/ratings.csv")
df_movies = ratings.merge(movies, on="movieId", how="left")
df_movies_to_model = movies[movies.columns[:-3]]

### Preprocessing

In [4]:
reader = Reader()
data = Dataset.load_from_df(df_movies_to_model[movies.columns[:-3]], reader)
train, test = train_test_split(data, test_size=0.25)

### Training and testing

In [5]:
svd = SVDpp()
svd.fit(train)
preds = svd.test(test)

### Evaluation

It's a good evaluations marks so let's train the model with the complete Dataset

In [6]:
accuracy.mae(preds)
accuracy.rmse(preds)

MAE:  0.6639
RMSE: 0.8679


0.8678512394549509

### Train all data

In [7]:
trainfull = data.build_full_trainset()

svd = SVDpp()
svd.fit(trainfull)

svd.predict(uid=1, iid=1)

Prediction(uid=1, iid=1, r_ui=None, est=4.823116918604602, details={'was_impossible': False})

### Create a recommender function

In [15]:
def recommend_system(userId, dataframe, algorithm, n_commends):
    """
with the parameters, returns back the top n recommends movies.

Parameters
-----------

userId: the user ID of the person that we want recommendations

dataframe: the DataFrame of movies.

algorithm: the algorith used to recommend movies.

n_commends: the number of movies recommended.


return
------

ID of movies that a specific user will like.

    """
    movie_ids = dataframe['movieId'].to_list()
    movies_watched = dataframe[dataframe["userId"] == userId]["movieId"]
    movies_no_watched = [movie for movie in movie_ids if movie not in movies_watched]

    preds = [algorithm.predict(uid=userId, iid=movie) for movie in movies_no_watched]
    commends_ratting = {pred[1]:pred[3] for pred in preds}
    order_dict = {k: v for k, v in sorted(commends_ratting.items(), key=lambda item: item[1])}

    top_predictions = list(order_dict.keys())[:n_commends]

    return dataframe[dataframe["movieId"].isin(top_predictions)][["title", "genres"]].drop_duplicates()

In [18]:
movies_recommended = recommend_system(1, df_movies, svd, 5)
print("ID of the movies recommended:", movies_recommended)

ID of the movies recommended:                                               title  \
148    I Still Know What You Did Last Summer (1998)   
2554                 Speed 2: Cruise Control (1997)   
2795                        Inspector Gadget (1999)   
14369                            Spice World (1997)   
26886                                Jason X (2002)   

                                 genres  
148             Horror|Mystery|Thriller  
2554            Action|Romance|Thriller  
2795   Action|Adventure|Children|Comedy  
14369                            Comedy  
26886            Horror|Sci-Fi|Thriller  


### Check if have sense and conclusion

In [46]:
def check_movies_user(userId, dataframe, n):
    return dataframe[dataframe["userId"] ==userId].sort_values("rating", ascending=False)[:n]

In [48]:
print(f"Movies user likes:", check_movies_user(1, df_movies, 20))

Movies user likes:      userId  movieId  rating  timestamp  \
231       1     5060     5.0  964984002   
185       1     2872     5.0  964981680   
89        1     1291     5.0  964981909   
90        1     1298     5.0  964984086   
190       1     2948     5.0  964982191   
189       1     2947     5.0  964982176   
188       1     2944     5.0  964981872   
186       1     2899     5.0  964982703   
184       1     2858     5.0  964980868   
179       1     2700     5.0  964980985   
98        1     1517     5.0  964981107   
100       1     1573     5.0  964982290   
102       1     1587     5.0  964982346   
103       1     1617     5.0  964982951   
181       1     2761     5.0  964982703   
105       1     1625     5.0  964983504   
88        1     1282     5.0  964982703   
87        1     1278     5.0  964983414   
86        1     1275     5.0  964982290   
85        1     1270     5.0  964983705   

                                                 title  \
231                

## 3. Conclusions
* Algorithm: have good evaluation metrics, we can improve it with other options from surprise library as cross validation
* Recommender function: can be improved and make more general and in a better way for production, but for this task is enought
* Results: We can see the user loves Thriller, action, adventure, and some comedy, and our recommender systems give back this kind of movies and the user have never seen before. Perfect!