### Sistema de Recomendacion
- Para esta parte de Machine Learning nos apoyamos fuertemente en <a href = 'https://rubialesalberto.medium.com/singular-value-decomposition-svd-con-python-creando-un-sistema-de-recomendacion-640de1b8b002'> Sistema de Recomendacion con SVD </a>

In [104]:
import pandas as pd
import numpy as np


In [105]:
from surprise import SVD
from surprise import Dataset, SVDpp
from surprise import Reader
from surprise.model_selection import train_test_split

In [106]:

from surprise import accuracy
from collections import defaultdict

In [107]:
# Cargamos todo el dataset. 
df_aux = pd.read_csv('data_total.csv')
df_aux.head(2)


  df_aux = pd.read_csv('data_total.csv')


Unnamed: 0,id,type,title,director,cast,country,date_added,release_year,rating_x,duration_int,duration_type,listed_in,description,userId,rating_y,timestamp,score
0,as1,movie,the grand seduction,don mckellar,"brendan gleeson, taylor kitsch, gordon pinsent",canada,2021-03-30,2014,g,113,min,"comedy, drama",a small fishing village must procure a local d...,543,5.0,1059596582,3.4671
1,as1,movie,the grand seduction,don mckellar,"brendan gleeson, taylor kitsch, gordon pinsent",canada,2021-03-30,2014,g,113,min,"comedy, drama",a small fishing village must procure a local d...,595,3.0,839937603,3.4671


In [108]:
# Vemos que columnas usaremos para nuestro Sistema de Recomendacion.
columnas = ['id', 'title', 'listed_in', 'userId', 'rating_y']


In [109]:
df = df_aux[columnas]
df

Unnamed: 0,id,title,listed_in,userId,rating_y
0,as1,the grand seduction,"comedy, drama",543,5.0
1,as1,the grand seduction,"comedy, drama",595,3.0
2,as1,the grand seduction,"comedy, drama",611,3.0
3,as1,the grand seduction,"comedy, drama",2523,3.5
4,as1,the grand seduction,"comedy, drama",3082,4.0
...,...,...,...,...,...
11024284,ns8807,zubaan,"dramas, international movies, music & musicals",122699,4.0
11024285,ns8807,zubaan,"dramas, international movies, music & musicals",122869,4.0
11024286,ns8807,zubaan,"dramas, international movies, music & musicals",123708,3.0
11024287,ns8807,zubaan,"dramas, international movies, music & musicals",123841,4.0


In [110]:
df = df.rename(columns={"id": "movieId", "rating_y": "score"})
df

Unnamed: 0,movieId,title,listed_in,userId,score
0,as1,the grand seduction,"comedy, drama",543,5.0
1,as1,the grand seduction,"comedy, drama",595,3.0
2,as1,the grand seduction,"comedy, drama",611,3.0
3,as1,the grand seduction,"comedy, drama",2523,3.5
4,as1,the grand seduction,"comedy, drama",3082,4.0
...,...,...,...,...,...
11024284,ns8807,zubaan,"dramas, international movies, music & musicals",122699,4.0
11024285,ns8807,zubaan,"dramas, international movies, music & musicals",122869,4.0
11024286,ns8807,zubaan,"dramas, international movies, music & musicals",123708,3.0
11024287,ns8807,zubaan,"dramas, international movies, music & musicals",123841,4.0


In [111]:
# Vemos las peliculas con mayor popularidad. Nos basamos en la cantidad de veces que un usuario puntuo cierta pelicula. 
df.title.value_counts()



married at first sight                      1938
moesha                                      1524
national treasure                           1501
cold case files                             1485
yu-gi-oh! arc-v                             1483
                                            ... 
dave                                         404
cop and a half: new recruit                  402
i am alone                                   401
rob schneider: asian momma, mexican kids     399
mitzi (nbc 1968)                             396
Name: title, Length: 22042, dtype: int64

Usamos Reader() del paquete Surprise para poner los datos en el formato que nos piden los algoritmos


In [112]:
reader = Reader()


In [113]:
N_filas = 100000 # Limitamos el dataset a N_filas

data = Dataset.load_from_df(df[['userId', 'movieId', 'score']][:N_filas], reader)

In [114]:
# Separo en train y test
train, test = train_test_split(data, test_size=0.25)

# Instanciamos el algoritmo y entrenamos
svd = SVDpp()
svd.fit(train)
preds = svd.test(test)

# Métricas de evaluación
accuracy.mae(preds)
accuracy.rmse(preds)


MAE:  0.8425
RMSE: 1.0545


1.0544690154655187

In [115]:

# Creamos todo el dataset completo con Train y Test
trainfull = data.build_full_trainset()

# Instanciamos de nuevo el algoritmo
svd = SVDpp()
# Entrenamos el algoritmo
svd.fit(trainfull)



<surprise.prediction_algorithms.matrix_factorization.SVDpp at 0x1aabab09780>

In [116]:
# Realizamos una predicción para ver que todo funciona
svd.predict(uid=1, iid=1)

Prediction(uid=1, iid=1, r_ui=None, est=3.533905, details={'was_impossible': False})

In [117]:

# Creamos una función que pasandole, un usuario, un DataFrame, un algoritmo y el número de recomendaciones que queremos
#  con los parametros, devuelve las mejores n peliculas.
def recommend_system(userId, dataframe, algorithm, n_commends):
    """
    Parametros
    -----------
    userId: El Id de usuario de la persona que queremos recomendarle peliculas.
    dataframe: El dataframe que utilizamos.
    algoritmo: El algoritmo que creamos para recomendar peliculas.
    n_commends: El numero de peliculas recomendadas
    return
    ------
    IDs de peliculas que a un usuario especifico le gustarian..
    """
    movie_ids = df['movieId'].to_list()
    movies_watched = dataframe[dataframe["userId"] == userId]["movieId"]
    movies_no_watched = [movie for movie in movie_ids if movie not in movies_watched]

    preds = [algorithm.predict(uid=userId, iid=movie) for movie in movies_no_watched]
    commends_ratting = {pred[1]:pred[3] for pred in preds}
    order_dict = {k: v for k, v in sorted(commends_ratting.items(), key=lambda item: item[1])}

    top_predictions = list(order_dict.keys())[:n_commends]

    return dataframe[dataframe["movieId"].isin(top_predictions)][["title"]].drop_duplicates()




In [118]:
#Función que nos da las películas de un usuario que más le gustan, según la puntuación que le ha dado
def check_movies_user(userId, dataframe, n):
    return dataframe[dataframe["userId"] ==userId].sort_values("score", ascending=False)[:n] 
 


In [122]:
#Vemos las películas que ve, y las que le hemos recomendado.
movies_recommended = recommend_system(3, df, svd, 5)

In [120]:
print(f"Movies user likes:", check_movies_user(3, df, 20))


Movies user likes:          movieId                                           title  \
6311426   hs2056                    biography: the trump dynasty   
5743811    hs871                                     mayans m.c.   
10330254  ns7360                                      lucky days   
9496761   ns5624                                    semana santa   
9395246   ns5413                                  criminal minds   
8707262   ns3980                       charlie's colorforms city   
8554979   ns3663  frankenstein’s monster’s monster, frankenstein   
7427291   ns1310                        my teacher, my obsession   
6466982   hs2381                               la diosa coronada   
755418    as1577                                       mr. robot   
10916767  ns8584                              thorne: sleepyhead   
4550923   as9500                                      kept woman   
3311191   as6909                                    mexican moon   
2930085   as6112             

In [121]:
print("ID of the movies recommended:", movies_recommended)

ID of the movies recommended:                                      title
11981               zambezi: force of life
22510  yoga for mindfulness and meditation
26915                      yancy derringer
58508                    where hands touch
65690                                wendy
