In [1]:
import pandas as pd
from surprise import dump
import inference
import aggregation_functions as agg

# Training

Training the aggregated-predictions model:

In [2]:
!python train_agg_predictions_model.py --test_size 0.3 --top_k 10 --relevance_rating_threshold 3.0 --relevance_min_ratings 2

Reading dataset...
Splitting dataset into train and test...
Using GridSearchCV to find the best hyperparameters...
Best MSE 0.7617908227039981 with parameters: {'n_epochs': 60, 'lr_all': 0.01, 'reg_all': 0.1, 'n_factors': 300}
Training the best model...
Making predictions on the test set...
Evaluating the model...
Precision@10 (borda_count): 0.6901639344262296
Precision@10 (average): 0.1098360655737705
Precision@10 (additive_utilitarian): 0.9327868852459016
Precision@10 (multiplicative): 0.9327868852459016
Precision@10 (fairness): 0.3704918032786885
Precision@10 (least_misery): 0.03442622950819673
Precision@10 (highest_frequency): 0.8836065573770492
Saving the model to models/svd_agg_pred_model.dump ...
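
The Precision@10 values reported above depend on the `--top_k` and `--relevance_rating_threshold` options. As a rough illustration of the metric, the sketch below computes Precision@k for a single ranked list. The function name `precision_at_k` and the toy data are hypothetical, and the training script may define relevance differently (for instance through `--relevance_min_ratings`, whose exact semantics are not shown in this notebook).

```python
def precision_at_k(recommended, true_ratings, k=10, threshold=3.0):
    """Fraction of the top-k recommended items whose true rating is >= threshold."""
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    # Items without a held-out rating are counted as not relevant (an assumption).
    hits = sum(1 for item in top_k if true_ratings.get(item, 0.0) >= threshold)
    return hits / len(top_k)


# Hypothetical toy data: a ranked recommendation list and held-out ratings.
ranked = ["A", "B", "C", "D"]
held_out = {"A": 4.5, "B": 2.0, "C": 3.0}  # "D" has no held-out rating
score = precision_at_k(ranked, held_out, k=4, threshold=3.0)
print(score)
```

Raising the threshold makes fewer items count as relevant, so scores obtained with different thresholds are not directly comparable.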


Training the aggregated model:

In [3]:
!python train_agg_model.py --test_size 0.3 --top_k 10 --relevance_rating_threshold 3.5

Reading dataset...
Splitting dataset into train and test...
Using GridSearchCV to find the best hyperparameters...
Best MSE 0.80196557399294 with parameters: {'n_epochs': 60, 'lr_all': 0.005, 'reg_all': 0.1, 'n_factors': 300}
Training the best model...
Making predictions on the test set...
Evaluating the model...
Precision@10: 0.8770491803278688
Saving the model to models/svd_agg_model.dump ...
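
One common way to build an aggregated model is to collapse each group's ratings into a single pseudo-user profile before training. The sketch below assumes that approach, using a per-movie average. Whether `train_agg_model.py` does exactly this is not shown in this notebook, and `build_group_ratings` is a hypothetical helper.

```python
from collections import defaultdict


def build_group_ratings(ratings, user_to_group):
    """Average member ratings per (group, movie) to form pseudo-user profiles.

    ratings: iterable of (user_id, movie_id, rating) tuples.
    user_to_group: dict mapping user_id -> group_id.
    """
    buckets = defaultdict(list)
    for user_id, movie_id, rating in ratings:
        group_id = user_to_group.get(user_id)
        if group_id is not None:  # skip users that belong to no group
            buckets[(group_id, movie_id)].append(rating)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


# Hypothetical toy data: user 3 has no group and is ignored.
toy_ratings = [(1, "m1", 4.0), (2, "m1", 2.0), (2, "m2", 5.0), (3, "m1", 3.0)]
toy_groups = {1: 9, 2: 9}
group_profile = build_group_ratings(toy_ratings, toy_groups)
print(group_profile)
```

The resulting (group, movie, rating) triples can then be treated like ordinary user ratings when fitting a collaborative-filtering model.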


# Inference

Select a group to recommend movies to.

In [4]:
df_user_groups = pd.read_csv("data/user_group_mapping.csv")
df_user_groups.sample(5)

,userId,groupId
422,564,42
200,22,20
448,247,44
506,267,50
142,530,14


In [5]:
df_group = df_user_groups[df_user_groups["groupId"] == 9]
users_in_group = df_group["userId"].values

In [6]:
df = pd.read_csv("data/ratings.csv")
group_ratings = df[df["userId"].isin(users_in_group)]
df_movies = pd.read_csv("data/movies.csv")
group_ratings = group_ratings.merge(df_movies, on="movieId")[
    ["userId", "title", "rating"]
]
group_ratings

,userId,title,rating
0,26,GoldenEye (1995),3.0
1,26,Babe (1995),3.0
2,26,Seven (a.k.a. Se7en) (1995),4.0
3,26,Apollo 13 (1995),3.0
4,26,Batman Forever (1995),3.0
...,...,...,...
536,578,Kick-Ass (2010),4.5
537,578,Letters to Juliet (2010),4.0
538,578,Piranha (Piranha 3D) (2010),0.5
539,578,127 Hours (2010),5.0


We display the user-movie matrix:

In [7]:
user_item_matrix = group_ratings.pivot(index="userId", columns="title", values="rating")
user_item_matrix

title,(500) Days of Summer (2009),101 Dalmatians (One Hundred and One Dalmatians) (1961),127 Hours (2010),13 Going on 30 (2004),27 Dresses (2008),"400 Blows, The (Les quatre cents coups) (1959)",50 First Dates (2004),8 1/2 (8½) (1963),Abbott and Costello Meet Frankenstein (1948),About a Boy (2002),...,X-Men (2000),X-Men: First Class (2011),"Yards, The (2000)","Year of Living Dangerously, The (1982)",Yi Yi (2000),You Can Count on Me (2000),You've Got Mail (1998),"Young Victoria, The (2009)","Yours, Mine and Ours (2005)",Zombieland (2009)
userId,,,,,,,,,,,,,,,,,,,,,
26,,,,,,,,,,,...,,,,,,,,,,
143,5.0,,,5.0,3.0,,5.0,,,,...,4.5,4.0,,,,,,,3.0,1.0
150,,,,,,,,,,,...,,,,,,,,,,
155,,,,,,,,,,,...,4.0,,,,,,,,,
185,,4.0,,,,,,,,3.0,...,,,,,,,,,,
335,,,,,,,,,,,...,,,,,,,3.0,,,
410,,,,,,4.0,,4.0,5.0,,...,,,4.0,,5.0,5.0,,,,
541,,,,,,,,,,,...,,,,,,,,,,
547,,,,,,,,,,,...,,,,3.0,,,,,,
578,,,5.0,,,,,,,,...,,,,,,,,4.5,,


We get the 5 highest-rated movies for each user in the group:

In [8]:
for user in group_ratings["userId"].unique():
    user_ratings = group_ratings[group_ratings["userId"] == user]
    top_movies = (
        user_ratings.sort_values(by="rating", ascending=False).head(5)["title"].tolist()
    )
    print(f"User {user} - Top movies:\n{top_movies}\n")

User 26 - Top movies:
['Pulp Fiction (1994)', 'Die Hard: With a Vengeance (1995)', 'Batman (1989)', 'Fugitive, The (1993)', 'Firm, The (1993)']

User 143 - Top movies:
['Forrest Gump (1994)', '(500) Days of Summer (2009)', '13 Going on 30 (2004)', 'Bill Cosby, Himself (1983)', 'Uptown Girls (2003)']

User 150 - Top movies:
['Star Trek: First Contact (1996)', 'Twelve Monkeys (a.k.a. 12 Monkeys) (1995)', 'Birdcage, The (1996)', 'Mission: Impossible (1996)', 'Heat (1995)']

User 155 - Top movies:
['Armour of God II: Operation Condor (Operation Condor) (Fei ying gai wak) (1991)', 'Star Wars: Episode I - The Phantom Menace (1999)', 'Office Space (1999)', 'Dinosaur (2000)', 'Men in Black (a.k.a. MIB) (1997)']

User 185 - Top movies:
["Singin' in the Rain (1952)", 'Requiem for a Dream (2000)', 'Beautiful Mind, A (2001)', 'Memento (2000)', 'Traffic (2000)']

User 335 - Top movies:
['Usual Suspects, The (1995)', 'Terminator 2: Judgment Day (1991)', 'Pulp Fict

We load the models trained earlier:

In [9]:
_, agg_predictions_model = dump.load("models/svd_agg_pred_model.dump")
_, agg_model = dump.load("models/svd_agg_model.dump")

We recommend movies to the group using the aggregated-predictions model with two different prediction-aggregation strategies:

In [10]:
inference.recommend_movies_agg_predictions(
    agg_predictions_model,
    users_in_group,
    n=10,
    agg_function=agg.PreferenceAggregationFunction.LEAST_MISERY,
)

['Top Hat (1935)',
 'His Girl Friday (1940)',
 'Lawrence of Arabia (1962)',
 'Touch of Evil (1958)',
 'Jules and Jim (Jules et Jim) (1961)',
 'Hustler, The (1961)',
 'Jetée, La (1962)',
 'Sophie Scholl: The Final Days (Sophie Scholl - Die letzten Tage) (2005)',
 'Louis C.K.: Shameless (2007)',
 'Three Billboards Outside Ebbing, Missouri (2017)']

In [11]:
inference.recommend_movies_agg_predictions(
    agg_predictions_model,
    users_in_group,
    n=10,
    agg_function=agg.PreferenceAggregationFunction.BORDA_COUNT,
)

['Shawshank Redemption, The (1994)',
 'Top Hat (1935)',
 'His Girl Friday (1940)',
 'Touch of Evil (1958)',
 'Big Sleep, The (1946)',
 'Hustler, The (1961)',
 'Jetée, La (1962)',
 'Sophie Scholl: The Final Days (Sophie Scholl - Die letzten Tage) (2005)',
 'Louis C.K.: Shameless (2007)',
 'Three Billboards Outside Ebbing, Missouri (2017)']
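
The two lists above differ because each strategy combines the members' individual predictions in its own way. The sketch below illustrates the general idea behind least misery and Borda count on toy data. It is not the code in the `aggregation_functions` module, whose implementation is not shown here; the helper names and the predicted ratings are hypothetical.

```python
def least_misery(predictions):
    """Score each item by the lowest predicted rating among group members."""
    items = next(iter(predictions.values())).keys()
    return {item: min(user[item] for user in predictions.values()) for item in items}


def borda_count(predictions):
    """Sum rank points per item: each user's worst item gets 0, the next 1, ..."""
    scores = {}
    for user_preds in predictions.values():
        ordered = sorted(user_preds, key=user_preds.get)  # ascending by rating
        for points, item in enumerate(ordered):
            scores[item] = scores.get(item, 0) + points
    return scores


def top_n(scores, n):
    """Return the n items with the highest aggregated score."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical predicted ratings: user -> {item: predicted rating}.
preds = {
    "u1": {"A": 5.0, "B": 4.0, "C": 3.9},
    "u2": {"A": 4.9, "B": 4.1, "C": 4.0},
    "u3": {"A": 1.0, "B": 3.0, "C": 3.5},
}
print(top_n(least_misery(preds), 1))
print(top_n(borda_count(preds), 1))
```

With this toy data, least misery favors item C, which nobody dislikes, while Borda count favors item A, which two of the three users rank first.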

We recommend movies to the group using the aggregated model:

In [12]:
inference.recommend_movies_agg_model(agg_model, group_id=9, n=10)

['Wallace & Gromit: The Best of Aardman Animation (1996)',
 'Sunset Blvd. (a.k.a. Sunset Boulevard) (1950)',
 'Laura (1944)',
 'Paths of Glory (1957)',
 'Boondock Saints, The (2000)',
 'Double Indemnity (1944)',
 'Conversation, The (1974)',
 'You Can Count on Me (2000)',
 'Yi Yi (2000)',
 'Trial, The (Procès, Le) (1962)']