# Desafio de Tripulaciones 
03/22 Group 3


### ALGORITHM USED:
#### Singular Value Decomposition

$$ \underset{(n, d)}{A} = \underset{(n, n)}{U} \, \underset{(n, d)}{\Sigma} \, \underset{(d, d)}{V^T} $$

Any matrix of size (n, d) can be decomposed into a product of three factors:

* *U*, of size (n, n), is an orthogonal matrix whose columns are the left singular vectors of *A*.
* $\Sigma$ is a diagonal matrix of size (n, d) whose diagonal entries are the singular values of *A*, sorted in decreasing order.
* *V*, of size (d, d), is an orthogonal matrix whose columns are the right singular vectors of *A*; it appears transposed in the product.

*Orthogonal means that multiplying the transpose of a matrix by the matrix itself gives the identity matrix.*

This lets us drop the singular vectors that carry non-essential information (cleaning the data) and keep only the most decisive information. Keeping just the largest singular values turns the exact decomposition into an approximation of *A*.
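The decomposition and the effect of dropping singular values can be checked numerically. The following is a minimal sketch with NumPy; the matrix `A` and the choice `k = 2` are made up for illustration and are not part of the project data.

```python
import numpy as np

# Hypothetical 4x3 matrix (n=4 rows, d=3 columns), for illustration only
A = np.array([
    [5.0, 4.0, 1.0],
    [4.0, 5.0, 1.0],
    [1.0, 1.0, 5.0],
    [2.0, 1.0, 4.0],
])

# Full SVD: U is (n, n), s holds the singular values, Vt is V transposed, (d, d)
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the (n, d) diagonal matrix Sigma from the singular values
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

# The full decomposition reproduces A up to floating-point error
A_full = U @ Sigma @ Vt

# Keep only the k largest singular values (rank-k approximation)
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Frobenius norm of the information discarded by the truncation
truncation_error = np.linalg.norm(A - A_k)
```

With only the smallest singular value dropped, `truncation_error` equals that singular value, which is why discarding small singular values loses little information.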

## Practical application

A recommendation engine estimates how much you would like an activity you have not done yet, based on your own ratings and those of other users. With SVD we identify the users whose tastes are similar to yours and recommend activities that you have not tried yet.
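The idea can be sketched on a toy ratings matrix. This is an illustration under simplifying assumptions (unrated cells are filled with each user's mean, and a plain rank-2 SVD is used); the `SVDpp` model used later in this notebook is a different, more elaborate model fitted only on the observed ratings. The matrix `R` is hypothetical.

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are activities, 0 = not rated
R = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 5.0, 1.0],
    [5.0, 5.0, 4.0, 0.0],
    [1.0, 1.0, 2.0, 5.0],
])
rated = R > 0

# Fill unrated cells with each user's mean rating, then center each row
user_means = R.sum(axis=1) / rated.sum(axis=1)
R_filled = np.where(rated, R, user_means[:, None])
R_centered = R_filled - user_means[:, None]

# A rank-k SVD keeps the dominant taste patterns shared across users
k = 2
U, s, Vt = np.linalg.svd(R_centered, full_matrices=False)
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :] + user_means[:, None]

# For user 0, pick the unrated activity with the highest estimated score
user = 0
candidates = np.flatnonzero(~rated[user])
best = candidates[np.argmax(R_hat[user, candidates])]
```

Here `R_hat[user, best]` is the estimated score of the recommended activity; only activities the user has not rated are considered as candidates.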

### Loading libraries

In [1]:
import numpy as np
import pandas as pd
from scipy.sparse import coo_matrix
from scipy.sparse.linalg import svds
from pandas.core.frame import DataFrame
from pandas.io.parsers import read_csv
from surprise import SVDpp
from surprise import Dataset, Reader
from surprise.model_selection import train_test_split
from surprise import accuracy
from collections import defaultdict

### Loading the data

**Note:** We are temporarily training the algorithm on a movie dataset; once the survey results are ready, we will train the model on that data.

In [2]:
movies = pd.read_csv("./data/movies.csv")
ratings = pd.read_csv("./data/ratings.csv")
df_movies = ratings.merge(movies, on="movieId", how="left")
df_activities = ratings[ratings.columns[:-1]]

In [3]:
df_activities.columns

Index(['userId', 'movieId', 'rating'], dtype='object')

### Preprocessing

In [4]:
reader = Reader()
data = Dataset.load_from_df(df_activities, reader)

train, test = train_test_split(data, test_size=0.25)
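`Reader()` assumes a rating scale of (1, 5) unless told otherwise, so it may be worth checking the actual bounds of the rating column before loading the data. A minimal sketch with pandas follows; `toy_ratings` is a hypothetical stand-in for `df_activities`, and the commented `Reader` call shows how the scale could be passed explicitly.

```python
import pandas as pd

# Hypothetical stand-in for df_activities, with the same three columns
toy_ratings = pd.DataFrame({
    "userId": [1, 1, 2, 2],
    "movieId": [10, 20, 10, 30],
    "rating": [4.0, 0.5, 5.0, 3.5],
})

# Observed lower and upper bounds of the rating column
rating_scale = (
    float(toy_ratings["rating"].min()),
    float(toy_ratings["rating"].max()),
)

# With surprise installed, the scale could then be passed explicitly:
# reader = Reader(rating_scale=rating_scale)
```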

### Training and testing

In [5]:
svd = SVDpp()
SVD_model_for_pickle = svd.fit(train)
preds = svd.test(test)

### Evaluation

In [45]:
accuracy.mae(preds)
accuracy.rmse(preds)

MAE:  0.6653
RMSE: 0.8695


0.8694624799057272
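For reference, both metrics are simple functions of the differences between true and estimated ratings. The following sketch computes them by hand; the `pairs` values are hypothetical and unrelated to the results above.

```python
import math

# Hypothetical (true rating, estimated rating) pairs, for illustration only
pairs = [(4.0, 3.5), (2.0, 3.0), (5.0, 4.5), (3.0, 3.0)]

errors = [true - est for true, est in pairs]

# MAE: mean of the absolute errors
mae = sum(abs(e) for e in errors) / len(errors)

# RMSE: square root of the mean squared error, which penalizes large misses more
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
```

RMSE is never smaller than MAE on the same predictions, which matches the ordering of the two values reported above.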

### Train on all data

In [13]:
trainfull = data.build_full_trainset()

svd = SVDpp()
SVD_model_for_pickle = svd.fit(trainfull)

SVD_model_for_pickle.predict(uid=1, iid=1)

In [10]:
userId = 1
n_recommendations = 5

# Unique item IDs, and the set of items this user has already rated
item_ids = df_activities['movieId'].unique().tolist()
items_finished = set(df_activities.loc[df_activities['userId'] == userId, 'movieId'])

# Membership is tested against a set because `in` on a pandas Series checks the index, not the values
items_no_finished = [item for item in item_ids if item not in items_finished]

# Estimate the rating of every item the user has not rated yet
preds = [SVD_model_for_pickle.predict(uid=userId, iid=item) for item in items_no_finished]

# Map item ID -> estimated rating
recommendations_rating = {pred.iid: pred.est for pred in preds}

# Sort by estimated rating, highest first
order_dict = dict(sorted(recommendations_rating.items(), key=lambda item: item[1], reverse=True))

top_predictions = list(order_dict.keys())[:n_recommendations]
top_predictions
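Picking the N best items from a mapping of estimated ratings does not require sorting the whole mapping. A minimal sketch using the standard library's `heapq.nlargest` follows; the `estimated` dictionary is hypothetical.

```python
import heapq

# Hypothetical item ID -> estimated rating mapping, for illustration only
estimated = {101: 3.2, 102: 4.8, 103: 2.1, 104: 4.5, 105: 3.9}

n_recommendations = 3

# nlargest returns the keys with the highest estimates, best first
top_items = heapq.nlargest(n_recommendations, estimated, key=estimated.get)
```

For a small N over a large catalog this avoids a full sort while still returning the items in descending order of estimated rating.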

### Defiant Recommender function for the DESAFIO

In [16]:
def defiant_recommender(userId, dataframe, algorithm, n_recommendations, column_iid=None, column_uid=None):
    """
    This function uses a trained algorithm to find the top n recommended items for a given user ID.

    Parameters
    -----------

    userId (int): the user ID of the person that we want recommendations for.

    dataframe (object): the DataFrame containing three columns: userID, itemID and rating.

    algorithm (object): the trained algorithm used to recommend items.

    n_recommendations (int): the number of items recommended.

    column_iid (string): name of the column containing the item ID.

    column_uid (string): name of the column containing the user ID.


    return
    ------

    List of IDs of the items with the highest estimated rating for that user.

    """
    # Unique item IDs, and the set of items this user has already rated
    item_ids = dataframe[column_iid].unique().tolist()
    items_finished = set(dataframe.loc[dataframe[column_uid] == userId, column_iid])

    # Membership is tested against a set because `in` on a pandas Series checks the index, not the values
    items_no_finished = [item for item in item_ids if item not in items_finished]

    # Estimate the rating of every unrated item with the algorithm passed as an argument
    preds = [algorithm.predict(uid=userId, iid=item) for item in items_no_finished]

    # Map item ID -> estimated rating
    recommendations_rating = {pred.iid: pred.est for pred in preds}

    # Sort by estimated rating, highest first
    order_dict = dict(sorted(recommendations_rating.items(), key=lambda item: item[1], reverse=True))

    top_predictions = list(order_dict.keys())[:n_recommendations]

    return top_predictions

In [17]:
activities_recommended = defiant_recommender(1, df_activities, SVD_model_for_pickle, 5, 'movieId', 'userId')
print("ID of the activities recommended:", activities_recommended)


In [8]:
# Save the model
import pickle

with open('DEFIANT_RECOMMENDER.model', "wb") as archivo_salida:
    pickle.dump(SVD_model_for_pickle, archivo_salida)

In [9]:
# Load the model back
with open('DEFIANT_RECOMMENDER.model', "rb") as archivo_entrada:
    defiant_pickle = pickle.load(archivo_entrada)
    
print(defiant_pickle)

<surprise.prediction_algorithms.matrix_factorization.SVDpp object at 0x7ff5c3a14bd0>


### Function to check the logic of the recommendation


In [8]:
def check_activities_user(userId, dataframe, n, column_rating= None, column_uid= None):
    """
    This function shows the top n rated items for a given user ID.

    Parameters
    -----------

    userId (int): the user ID of the person whose top rated items we want to see.

    dataframe (object): the DataFrame containing three columns; userID, itemID and rating.

    n (int): number of top rated items to show.

    column_rating (string): name of the column containing the item rating.

    column_uid (string): name of the column containing the user ID.


    return
    ------

    A dataframe with the n top rated items by that given user.

    """
    return dataframe[dataframe[column_uid] == userId].sort_values(column_rating, ascending=False)[:n]

In [11]:
print("Top activities user likes:\n", check_activities_user(1, df_activities, 5, 'rating', 'userId'))

Top activities user likes:
      userId  movieId  rating
231       1     5060     5.0
185       1     2872     5.0
89        1     1291     5.0
90        1     1298     5.0
190       1     2948     5.0
