In this file I develop the second recommender system required for this project.
It works as follows:
![alt text](../img/user_item.jpg "Title")

Suppose we have user 1 and user 2. User 1 likes the Rubik's cube, the dice, and a card game. User 2 also likes the Rubik's cube and the dice, but has not played the card game.
Put very simply, the user-item recommender would recognize that these two users share similar tastes and would recommend the card game to user 2, because it is played by another user whose tastes are similar to theirs.

The algorithm rests on the idea that two users are similar when they like the same items (here, games). So if user 1 has played a game that user 2 has not, and the two users are similar, user 2 is likely to enjoy that game.
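The reasoning above can be sketched with plain Python sets. This is only an illustration of the idea, not the method used in the rest of the notebook; the names `likes` and `recommend_from_neighbor` and the `min_shared` threshold are assumptions made up for the example.

```python
# Hypothetical toy data: which items each user likes
likes = {
    "user_1": {"rubik_cube", "dice", "card_game"},
    "user_2": {"rubik_cube", "dice"},
}

def recommend_from_neighbor(target, neighbor, likes, min_shared=2):
    """Suggest items the neighbor likes that the target has not tried,
    but only if the two users share at least `min_shared` liked items."""
    shared = likes[target] & likes[neighbor]
    if len(shared) < min_shared:
        return set()
    return likes[neighbor] - likes[target]

recommendations = recommend_from_neighbor("user_2", "user_1", likes)
print(recommendations)
```

User 2 shares two liked items with user 1, so the one item only user 1 likes becomes the suggestion. The notebook replaces this set overlap with cosine similarity over review scores.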

We start by importing the libraries we need.

In [1]:
# Data processing
import pandas as pd
import numpy as np
import scipy.stats as stats

# Data visualization
import seaborn as sns

# Similarity
from sklearn.metrics.pairwise import cosine_similarity

In [2]:
juegos = pd.read_parquet("../Datasets/steam_games_complete.parquet")
juegos.head()

Unnamed: 0,item_id,item_name,developer,genres,tags,specs,release_date,price
88310,761140,Lost Summoner Kitty,Kotoshiro,"[Action, Casual, Indie, Simulation, Strategy]","[Strategy, Action, Indie, Casual, Simulation]",[Single-player],2018-01-04,4.99
88311,643980,Ironbound,Secret Level SRL,"[Free to Play, Indie, RPG, Strategy]","[Free to Play, Strategy, Indie, RPG, Card Game...","[Single-player, Multi-player, Online Multi-Pla...",2018-01-04,0.0
88312,670290,Real Pool 3D - Poolians,Poolians.com,"[Casual, Free to Play, Indie, Simulation, Sports]","[Free to Play, Simulation, Sports, Casual, Ind...","[Single-player, Multi-player, Online Multi-Pla...",2017-07-24,0.0
88313,767400,弹炸人2222,彼岸领域,"[Action, Adventure, Casual]","[Action, Adventure, Casual]",[Single-player],2017-12-07,0.99
88315,772540,Battle Royale Trainer,Trickjump Games Ltd,"[Action, Adventure, Simulation]","[Action, Adventure, Simulation, FPS, Shooter, ...","[Single-player, Steam Achievements]",2018-01-04,3.99


In [3]:
reseñas = pd.read_parquet("../Datasets/reviews_con_puntaje.parquet")
reseñas.head()

Unnamed: 0,funny,posted,last_edited,item_id,helpful,recommend,review,user_id,puntaje
0,,"Posted November 5, 2011.",,1250,No ratings yet,True,Simple yet with great replayability. In my opi...,76561197970982479,2
1,,"Posted July 15, 2011.",,22200,No ratings yet,True,It's unique and worth a playthrough.,76561197970982479,2
2,,"Posted April 21, 2011.",,43110,No ratings yet,True,Great atmosphere. The gunplay can be a bit chu...,76561197970982479,2
3,,"Posted June 24, 2014.",,251610,15 of 20 people (75%) found this review helpful,True,I know what you think when you see this title ...,js41637,2
4,,"Posted September 8, 2013.",,227300,0 of 1 people (0%) found this review helpful,True,For a simple (it's actually not all that simpl...,js41637,2


In [4]:
user_items = pd.read_parquet("../EDA/user_items_complete.parquet")
user_items.head()

Unnamed: 0,item_id,item_name,playtime_forever,playtime_2weeks,user_id
0,10,Counter-Strike,6,0,76561197970982479
1,20,Team Fortress Classic,0,0,76561197970982479
2,30,Day of Defeat,7,0,76561197970982479
3,40,Deathmatch Classic,0,0,76561197970982479
4,50,Half-Life: Opposing Force,0,0,76561197970982479


Next, let's look at the size of these three datasets.

In [5]:
print(len(juegos))
print()
print(len(reseñas))
print()
print(len(user_items))


22530

59305

5153209


Let's start by filtering the user_items dataset, since it is by far the largest.

In [6]:
# First, look at the maximum value of the playtime_forever column in user_items
user_items["playtime_forever"].max()

642773

I infer that this value is measured in minutes. If it were hours, 642773 / 24 ≈ 26782 days played and 26782 / 365 ≈ 73 years played, which is absurd. If it is minutes, 642773 / 60 ≈ 10712 hours played, 10712 / 24 ≈ 446 days played, and 446 / 365 ≈ 1.2 years, which is far more plausible.
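The same unit check can be written as a few lines of arithmetic. This is a small sketch; the variable names are made up for the example, and the only input is the maximum value printed above.

```python
max_playtime = 642773  # maximum of playtime_forever reported above

# Reading the value as hours: hours -> days -> years
years_if_hours = max_playtime / 24 / 365

# Reading the value as minutes: minutes -> hours -> days -> years
years_if_minutes = max_playtime / 60 / 24 / 365

print(round(years_if_hours, 1))
print(round(years_if_minutes, 2))
```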


I will keep only the user-game pairs in which the user has spent more than 10 days in total (240 hours, or 14,400 minutes) on that game. Let's see how much this shrinks the dataset.

In [7]:
len(user_items[user_items["playtime_forever"] > 14400])

66487

That takes the dataset from about 5 million rows down to only 66,487. That is still a fair amount, but this filter cuts the dataset by about 98.7% of its original size.
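The size reduction can be checked directly from the two row counts reported above. A minimal sketch, with variable names chosen only for this example:

```python
rows_before = 5153209  # len(user_items) before filtering
rows_after = 66487     # rows with playtime_forever > 14400

# Fraction of rows removed by the filter
reduction = 1 - rows_after / rows_before
print(f"{reduction:.1%}")
```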

In [8]:
# Save the filtered dataset
user_items = user_items[user_items["playtime_forever"] > 14400]
user_items

Unnamed: 0,item_id,item_name,playtime_forever,playtime_2weeks,user_id
178,730,Counter-Strike: Global Offensive,23532,0,76561197970982479
1301,466170,Idling to Rule the Gods,28545,1554,evcentric
1354,12210,Grand Theft Auto IV,52062,0,Riot-Punch
1361,21660,Street Fighter IV,23903,0,Riot-Punch
1907,730,Counter-Strike: Global Offensive,19800,0,doctr
...,...,...,...,...,...
5150878,304930,Unturned,14476,0,76561198285507552
5150968,730,Counter-Strike: Global Offensive,29789,0,sexyawp
5151088,730,Counter-Strike: Global Offensive,16635,0,tarik22
5152561,304930,Unturned,20201,0,76561198304604920


In [9]:
len(user_items["item_id"].unique())

1071

Next, I will keep only the games that have more than 100 reviews.

In [10]:
# Boolean Series indexed by item_id: True if the game has more than 100 reviews
juegos_con_mas_de_100_reseñas = reseñas["item_id"].value_counts() > 100
juegos_con_mas_de_100_reseñas.value_counts()

count
False    3583
True       99
Name: count, dtype: int64
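One compact way to apply this kind of count-based filter is `value_counts` together with `isin`. The sketch below uses a made-up toy table and a small threshold (`min_reviews`) purely for illustration; it is an alternative formulation, not the approach the notebook itself takes.

```python
import pandas as pd

# Hypothetical toy reviews: game "a" has 3 reviews, "b" has 1, "c" has 2
toy_reviews = pd.DataFrame({
    "item_id": ["a", "a", "a", "b", "c", "c"],
    "user_id": ["u1", "u2", "u3", "u1", "u2", "u3"],
})
min_reviews = 2  # illustrative threshold; the notebook uses 100

# Count reviews per game and keep the ids that exceed the threshold
counts = toy_reviews["item_id"].value_counts()
popular_items = counts[counts > min_reviews].index

# Keep only the reviews of those games
filtered = toy_reviews[toy_reviews["item_id"].isin(popular_items)]
print(filtered)
```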

In [11]:
len(reseñas[reseñas["item_id"] == "211820"])

351

In [12]:
reseñas

Unnamed: 0,funny,posted,last_edited,item_id,helpful,recommend,review,user_id,puntaje
0,,"Posted November 5, 2011.",,1250,No ratings yet,True,Simple yet with great replayability. In my opi...,76561197970982479,2
1,,"Posted July 15, 2011.",,22200,No ratings yet,True,It's unique and worth a playthrough.,76561197970982479,2
2,,"Posted April 21, 2011.",,43110,No ratings yet,True,Great atmosphere. The gunplay can be a bit chu...,76561197970982479,2
3,,"Posted June 24, 2014.",,251610,15 of 20 people (75%) found this review helpful,True,I know what you think when you see this title ...,js41637,2
4,,"Posted September 8, 2013.",,227300,0 of 1 people (0%) found this review helpful,True,For a simple (it's actually not all that simpl...,js41637,2
...,...,...,...,...,...,...,...,...,...
59300,,Posted July 10.,,70,No ratings yet,True,a must have classic from steam definitely wort...,76561198312638244,2
59301,,Posted July 8.,,362890,No ratings yet,True,this game is a perfect remake of the original ...,76561198312638244,2
59302,1 person found this review funny,Posted July 3.,,273110,1 of 2 people (50%) found this review helpful,True,had so much fun plaing this and collecting res...,LydiaMorley,2
59303,,Posted July 20.,,730,No ratings yet,True,:D,LydiaMorley,2


In [13]:
# Build the row mask: look up each review's item_id in the boolean Series
lista_booleana_para_mascara = reseñas["item_id"].map(juegos_con_mas_de_100_reseñas).tolist()


In [14]:
lista_booleana_para_mascara

[True,
 False,
 False,
 False,
 True,
 ...]

In [15]:
reseñas = reseñas[lista_booleana_para_mascara]
reseñas

Unnamed: 0,funny,posted,last_edited,item_id,helpful,recommend,review,user_id,puntaje
0,,"Posted November 5, 2011.",,1250,No ratings yet,True,Simple yet with great replayability. In my opi...,76561197970982479,2
4,,"Posted September 8, 2013.",,227300,0 of 1 people (0%) found this review helpful,True,For a simple (it's actually not all that simpl...,js41637,2
6,,Posted February 3.,,248820,No ratings yet,True,A suitably punishing roguelike platformer. Wi...,evcentric,2
16,,"Posted November 22, 2012.",,207610,No ratings yet,True,The ending to this game is.... ♥♥♥♥♥♥♥.... Jus...,doctr,2
18,3 people found this review funny,"Posted April 15, 2014.",,211420,35 of 43 people (81%) found this review helpful,True,Git gud,maplemage,2
...,...,...,...,...,...,...,...,...,...
59295,,Posted May 31.,,261030,0 of 1 people (0%) found this review helpful,True,I cried in the end its so sadding ]'; I wish l...,76561198306599751,0
59296,,Posted June 17.,,730,0 of 1 people (0%) found this review helpful,True,Gra naprawdę fajna.Ale jest kilka rzeczy do kt...,Ghoustik,1
59297,1 person found this review funny,Posted June 23.,,570,1 of 1 people (100%) found this review helpful,True,Well Done,76561198310819422,2
59303,,Posted July 20.,,730,No ratings yet,True,:D,LydiaMorley,2


In [16]:
usuarios_con_mas_de_5_reseñas = reseñas["user_id"].value_counts() > 5
usuarios_con_mas_de_5_reseñas

user_id
76561198063316459     True
OfficialSenix         True
76561198094304449     True
ITCGaming101          True
ironcass              True
                     ...  
Melonman231          False
pandashan            False
76561198059167447    False
76561198116801566    False
76561197970982479    False
Name: count, Length: 19620, dtype: bool

In [17]:
usuarios_con_mas_de_5_reseñas.value_counts()

count
False    19164
True       456
Name: count, dtype: int64

In [18]:
# Build the row mask: look up each review's user_id in the boolean Series
lista_booleana_para_mascara = reseñas["user_id"].map(usuarios_con_mas_de_5_reseñas).tolist()

lista_booleana_para_mascara

[False,
 False,
 False,
 False,
 False,
 ...]

In [19]:
reseñas[lista_booleana_para_mascara]

Unnamed: 0,funny,posted,last_edited,item_id,helpful,recommend,review,user_id,puntaje
39,,"Posted July 14, 2014.",,304930,2 of 4 people (50%) found this review helpful,True,"At first, it looked to me that...""Wow, just an...",DJKamBer,2
40,,"Posted April 27, 2013.",,570,0 of 1 people (0%) found this review helpful,True,"Awesome graphics, texture. Reqiures HARDCORE t...",DJKamBer,2
41,1 person found this review funny,"Posted July 20, 2015.",,218620,2 of 6 people (33%) found this review helpful,True,Upside to PAYDAY 2:+ Updates are released ever...,DJKamBer,2
42,,"Posted November 4, 2013.",,224260,No ratings yet,True,Awesome game! It's like L4D2 and Killing Floor...,DJKamBer,2
43,,"Posted July 12, 2013.",,1250,No ratings yet,True,"Compared to Left 4 Dead 2, this game REALLY gi...",DJKamBer,2
...,...,...,...,...,...,...,...,...,...
58954,2 people found this review funny,"Posted June 1, 2015.",,230410,6 of 9 people (67%) found this review helpful,True,A game for in between. Nothing special,SponsoredByBenQ,1
58955,2 people found this review funny,"Posted June 1, 2015.",,304930,6 of 11 people (55%) found this review helpful,True,Nice Game,SponsoredByBenQ,2
58956,,"Posted May 7, 2015.",,730,1 of 2 people (50%) found this review helpful,True,top game. mm sucks sometimes cause of many rus...,SponsoredByBenQ,2
58957,5 people found this review funny,"Posted May 18, 2015.",,236390,8 of 17 people (47%) found this review helpful,True,"A game for in between. Despite all this, a goo...",SponsoredByBenQ,2


In [20]:
# The user DJKamBer should have more than 5 reviews; let's check:
print(usuarios_con_mas_de_5_reseñas["DJKamBer"])
print(len(reseñas[reseñas["user_id"] == "DJKamBer"]))

True
6


I apply the mask to keep only the 456 users with more than 5 reviews.

In [21]:
reseñas = reseñas[lista_booleana_para_mascara]
reseñas

Unnamed: 0,funny,posted,last_edited,item_id,helpful,recommend,review,user_id,puntaje
39,,"Posted July 14, 2014.",,304930,2 of 4 people (50%) found this review helpful,True,"At first, it looked to me that...""Wow, just an...",DJKamBer,2
40,,"Posted April 27, 2013.",,570,0 of 1 people (0%) found this review helpful,True,"Awesome graphics, texture. Reqiures HARDCORE t...",DJKamBer,2
41,1 person found this review funny,"Posted July 20, 2015.",,218620,2 of 6 people (33%) found this review helpful,True,Upside to PAYDAY 2:+ Updates are released ever...,DJKamBer,2
42,,"Posted November 4, 2013.",,224260,No ratings yet,True,Awesome game! It's like L4D2 and Killing Floor...,DJKamBer,2
43,,"Posted July 12, 2013.",,1250,No ratings yet,True,"Compared to Left 4 Dead 2, this game REALLY gi...",DJKamBer,2
...,...,...,...,...,...,...,...,...,...
58954,2 people found this review funny,"Posted June 1, 2015.",,230410,6 of 9 people (67%) found this review helpful,True,A game for in between. Nothing special,SponsoredByBenQ,1
58955,2 people found this review funny,"Posted June 1, 2015.",,304930,6 of 11 people (55%) found this review helpful,True,Nice Game,SponsoredByBenQ,2
58956,,"Posted May 7, 2015.",,730,1 of 2 people (50%) found this review helpful,True,top game. mm sucks sometimes cause of many rus...,SponsoredByBenQ,2
58957,5 people found this review funny,"Posted May 18, 2015.",,236390,8 of 17 people (47%) found this review helpful,True,"A game for in between. Despite all this, a goo...",SponsoredByBenQ,2


In [43]:
# Shift every review score up by 1, so that negative reviews are 1, neutral are 2 and positive are 3.
# Take an explicit copy first so the assignment does not act on a slice of the original DataFrame.
# Run this cell only once: every re-run adds another 1 to the scores.
reseñas = reseñas.copy()
reseñas["puntaje"] += 1
reseñas


Unnamed: 0,funny,posted,last_edited,item_id,helpful,recommend,review,user_id,puntaje
39,,"Posted July 14, 2014.",,304930,2 of 4 people (50%) found this review helpful,True,"At first, it looked to me that...""Wow, just an...",DJKamBer,4
40,,"Posted April 27, 2013.",,570,0 of 1 people (0%) found this review helpful,True,"Awesome graphics, texture. Reqiures HARDCORE t...",DJKamBer,4
41,1 person found this review funny,"Posted July 20, 2015.",,218620,2 of 6 people (33%) found this review helpful,True,Upside to PAYDAY 2:+ Updates are released ever...,DJKamBer,4
42,,"Posted November 4, 2013.",,224260,No ratings yet,True,Awesome game! It's like L4D2 and Killing Floor...,DJKamBer,4
43,,"Posted July 12, 2013.",,1250,No ratings yet,True,"Compared to Left 4 Dead 2, this game REALLY gi...",DJKamBer,4
...,...,...,...,...,...,...,...,...,...
58954,2 people found this review funny,"Posted June 1, 2015.",,230410,6 of 9 people (67%) found this review helpful,True,A game for in between. Nothing special,SponsoredByBenQ,3
58955,2 people found this review funny,"Posted June 1, 2015.",,304930,6 of 11 people (55%) found this review helpful,True,Nice Game,SponsoredByBenQ,4
58956,,"Posted May 7, 2015.",,730,1 of 2 people (50%) found this review helpful,True,top game. mm sucks sometimes cause of many rus...,SponsoredByBenQ,4
58957,5 people found this review funny,"Posted May 18, 2015.",,236390,8 of 17 people (47%) found this review helpful,True,"A game for in between. Despite all this, a goo...",SponsoredByBenQ,4
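Modifying a column of a DataFrame that was itself produced by boolean filtering can trigger a `SettingWithCopyWarning` in some pandas versions. A common way to make the intent explicit is to call `.copy()` on the filtered result before changing it. The sketch below shows this on made-up data; the frame and column names are hypothetical.

```python
import pandas as pd

# Hypothetical toy scores on the original 0/1/2 scale
toy = pd.DataFrame({"user_id": ["u1", "u2", "u3"], "score": [0, 1, 2]})

# Filter, then take an explicit copy before modifying the result
subset = toy[toy["score"] >= 1].copy()
subset["score"] += 1

print(subset)  # scores of the kept rows are shifted by one
print(toy)     # the original frame is unchanged
```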


In [23]:
print(len(reseñas["item_id"].unique())) # 99 games
print(len(reseñas["user_id"].unique())) # 456 users

99
456


Now I will join the reviews and games tables so that each review carries the game's name and not just its item_id, which will make the later steps much simpler.

In [24]:
juegos_y_reseñas = pd.merge(reseñas,juegos,on="item_id")
juegos_y_reseñas

Unnamed: 0,funny,posted,last_edited,item_id,helpful,recommend,review,user_id,puntaje,item_name,developer,genres,tags,specs,release_date,price
0,,"Posted July 14, 2014.",,304930,2 of 4 people (50%) found this review helpful,True,"At first, it looked to me that...""Wow, just an...",DJKamBer,3,Unturned,Smartly Dressed Games,"[Action, Adventure, Casual, Free to Play, Indie]","[Free to Play, Survival, Zombies, Multiplayer,...","[Single-player, Online Multi-Player, Online Co...",2017-07-07,0.00
1,,"Posted November 4, 2013.",,224260,No ratings yet,True,Awesome game! It's like L4D2 and Killing Floor...,DJKamBer,3,No More Room in Hell,No More Room in Hell Team,"[Action, Free to Play, Indie]","[Free to Play, Zombies, Multiplayer, Survival,...","[Multi-player, Co-op, Cross-Platform Multiplay...",2011-10-31,0.00
2,,"Posted July 12, 2013.",,1250,No ratings yet,True,"Compared to Left 4 Dead 2, this game REALLY gi...",DJKamBer,3,Killing Floor,Tripwire Interactive,[Action],"[FPS, Zombies, Co-op, Survival, Action, Multip...","[Single-player, Multi-player, Co-op, Cross-Pla...",2009-05-14,19.99
3,,"Posted August 19, 2012.",,440,No ratings yet,True,You won't regret playing it!,DJKamBer,3,Team Fortress 2,Valve,"[Action, Free to Play]","[Free to Play, Multiplayer, FPS, Action, Shoot...","[Multi-player, Cross-Platform Multiplayer, Ste...",2007-10-10,0.00
4,,Posted July 30.,,4000,0 of 1 people (0%) found this review helpful,True,Vale a pena a pagar 20 R$ nesse jogo porque: ...,diego9031,2,Garry's Mod,Facepunch Studios,"[Indie, Simulation]","[Sandbox, Multiplayer, Funny, Moddable, Buildi...","[Single-player, Multi-player, Co-op, Cross-Pla...",2006-11-29,9.99
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2551,2 people found this review funny,"Posted June 1, 2015.",,107410,23 of 26 people (88%) found this review helpful,True,A really nice game. If you want to play a real...,SponsoredByBenQ,3,Arma 3,Bohemia Interactive,"[Action, Simulation, Strategy]","[Simulation, Military, Multiplayer, Realistic,...","[Single-player, Multi-player, Online Multi-Pla...",2013-09-12,39.99
2552,,"Posted May 18, 2015.",,440,5 of 6 people (83%) found this review helpful,True,"A game for in between. Despite all this, a goo...",SponsoredByBenQ,3,Team Fortress 2,Valve,"[Action, Free to Play]","[Free to Play, Multiplayer, FPS, Action, Shoot...","[Multi-player, Cross-Platform Multiplayer, Ste...",2007-10-10,0.00
2553,2 people found this review funny,"Posted June 1, 2015.",,230410,6 of 9 people (67%) found this review helpful,True,A game for in between. Nothing special,SponsoredByBenQ,2,Warframe,Digital Extremes,"[Action, Free to Play]","[Free to Play, Action, Co-op, Multiplayer, Thi...","[Single-player, Multi-player, Co-op, Steam Tra...",2013-03-25,0.00
2554,2 people found this review funny,"Posted June 1, 2015.",,304930,6 of 11 people (55%) found this review helpful,True,Nice Game,SponsoredByBenQ,3,Unturned,Smartly Dressed Games,"[Action, Adventure, Casual, Free to Play, Indie]","[Free to Play, Survival, Zombies, Multiplayer,...","[Single-player, Online Multi-Player, Online Co...",2017-07-07,0.00
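`pd.merge` performs an inner join by default, so any review whose `item_id` has no match in the games table is silently dropped. The sketch below illustrates this on made-up tables (all names and values are hypothetical) and shows how `how="left"` would keep such rows with a missing name instead.

```python
import pandas as pd

toy_reviews = pd.DataFrame({
    "item_id": ["10", "20", "99"],
    "user_id": ["u1", "u1", "u2"],
    "score": [3, 1, 2],
})
toy_games = pd.DataFrame({
    "item_id": ["10", "20"],
    "item_name": ["Game A", "Game B"],
})

# Default how="inner": the review for item "99" has no catalog entry and is dropped
merged = pd.merge(toy_reviews, toy_games, on="item_id")
print(merged)

# how="left" keeps that review, with a missing item_name
merged_left = pd.merge(toy_reviews, toy_games, on="item_id", how="left")
print(merged_left["item_name"].isna().sum())
```

Comparing the row counts before and after a merge is a quick way to see whether any reviews were lost in the join.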


Next I remove the columns and rows that are not needed, for clearer display and more efficient processing.

In [25]:
# Let's start with the columns
juegos_y_reseñas.columns

Index(['funny', 'posted', 'last_edited', 'item_id', 'helpful', 'recommend',
       'review', 'user_id', 'puntaje', 'item_name', 'developer', 'genres',
       'tags', 'specs', 'release_date', 'price'],
      dtype='object')

The only columns I need are:

- 'user_id'
- 'puntaje'
- 'item_name'

user_id will be the index of the table I am about to build, item_name will provide the column names, and puntaje will provide the values inside the table.


I now drop the columns I do not need.

In [26]:
# Keep only the three columns needed for the user-item matrix
juegos_y_reseñas = juegos_y_reseñas[["user_id", "puntaje", "item_name"]]
    
print(juegos_y_reseñas.columns)

Index(['user_id', 'puntaje', 'item_name'], dtype='object')


Now for the rows.

In [27]:
juegos_y_reseñas

Unnamed: 0,user_id,puntaje,item_name
0,DJKamBer,3,Unturned
1,DJKamBer,3,No More Room in Hell
2,DJKamBer,3,Killing Floor
3,DJKamBer,3,Team Fortress 2
4,diego9031,2,Garry's Mod
...,...,...,...
2551,SponsoredByBenQ,3,Arma 3
2552,SponsoredByBenQ,3,Team Fortress 2
2553,SponsoredByBenQ,2,Warframe
2554,SponsoredByBenQ,3,Unturned


In [28]:
# Check for nulls
juegos_y_reseñas.isnull().sum() # There are none, so no rows are removed

user_id      0
puntaje      0
item_name    0
dtype: int64

I now build the table.

In [29]:
matrix = juegos_y_reseñas.pivot_table(index='user_id',columns='item_name',values='puntaje')
matrix.head()

item_name,APB Reloaded,ARK: Survival Evolved,Ace of Spades: Battle Builder,AdVenture Capitalist,Arma 3,Awesomenauts - the 2D moba,Bad Rats: the Rats' Revenge,BattleBlock Theater®,BioShock Infinite,Blacklight: Retribution,...,The Stanley Parable,The Walking Dead,The Walking Dead: Season 2,Tomb Raider,Trove,Undertale,Unturned,Verdun,Warframe,XCOM: Enemy Unknown
09879655452567,,3.0,,,,,,,,,...,,,,,1.0,,,,,
10051997,,,,,,,,,,,...,,,,,,,,,,
1011001,,,,,,,,,,,...,,,,,,,,,,
111222333444555666888,,,,,,,,1.0,2.0,,...,,,,,,,,,,
1234567io9872345678765432,1.0,,,,1.0,,,,,,...,,,,,,,1.0,,,
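To make the shape of this user-item table concrete, here is a small sketch of `pivot_table` on made-up data (the frame, users and games are hypothetical). It also shows two behaviours worth keeping in mind: user-game pairs with no review become NaN, and repeated reviews of the same game by the same user are averaged, since the default aggregation is the mean.

```python
import pandas as pd

toy = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u2", "u2"],
    "item_name": ["Game A", "Game B", "Game A", "Game C", "Game C"],
    "score": [3, 1, 2, 1, 3],
})

# Rows are users, columns are games, cells are scores.
# Missing user-game pairs become NaN; duplicates are averaged.
toy_matrix = toy.pivot_table(index="user_id", columns="item_name", values="score")
print(toy_matrix)
```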


Now I subtract each row's mean from that row to normalize the table, so that each user's scores are centered on their own average.

In [30]:
matrix_norm = matrix.subtract(matrix.mean(axis=1), axis = 'rows')
matrix_norm.head()

item_name,APB Reloaded,ARK: Survival Evolved,Ace of Spades: Battle Builder,AdVenture Capitalist,Arma 3,Awesomenauts - the 2D moba,Bad Rats: the Rats' Revenge,BattleBlock Theater®,BioShock Infinite,Blacklight: Retribution,...,The Stanley Parable,The Walking Dead,The Walking Dead: Season 2,Tomb Raider,Trove,Undertale,Unturned,Verdun,Warframe,XCOM: Enemy Unknown
09879655452567,,1.0,,,,,,,,,...,,,,,-1.0,,,,,
10051997,,,,,,,,,,,...,,,,,,,,,,
1011001,,,,,,,,,,,...,,,,,,,,,,
111222333444555666888,,,,,,,,-0.666667,0.333333,,...,,,,,,,,,,
1234567io9872345678765432,-0.2,,,,-0.2,,,,,,...,,,,,,,-0.2,,,
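The row-wise centering can be followed on a tiny made-up matrix. This sketch uses hypothetical users and games and passes `axis="index"`, the documented name for the row axis.

```python
import numpy as np
import pandas as pd

toy_matrix = pd.DataFrame(
    {"Game A": [3.0, 2.0], "Game B": [1.0, np.nan], "Game C": [np.nan, 2.0]},
    index=pd.Index(["u1", "u2"], name="user_id"),
)

# mean(axis=1) skips NaN, so each user's mean uses only the games they scored
user_means = toy_matrix.mean(axis=1)

# Align user_means with the row index and subtract one mean per user
toy_norm = toy_matrix.subtract(user_means, axis="index")
print(toy_norm)
```

A user who gives the same score to every game ends up with zeros everywhere, so centering leaves no signal to compare that user against others.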


In [31]:
matrix_norm

item_name,APB Reloaded,ARK: Survival Evolved,Ace of Spades: Battle Builder,AdVenture Capitalist,Arma 3,Awesomenauts - the 2D moba,Bad Rats: the Rats' Revenge,BattleBlock Theater®,BioShock Infinite,Blacklight: Retribution,...,The Stanley Parable,The Walking Dead,The Walking Dead: Season 2,Tomb Raider,Trove,Undertale,Unturned,Verdun,Warframe,XCOM: Enemy Unknown
09879655452567,,1.0,,,,,,,,,...,,,,,-1.0,,,,,
10051997,,,,,,,,,,,...,,,,,,,,,,
1011001,,,,,,,,,,,...,,,,,,,,,,
111222333444555666888,,,,,,,,-0.666667,0.333333,,...,,,,,,,,,,
1234567io9872345678765432,-0.2,,,,-0.2,,,,,,...,,,,,,,-0.2,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
zakbot,,-0.6,-0.6,,,,,,,,...,,,,,,,,,,
zayyntt,,,,,,,,,,,...,,,,,,,-0.2,,,
zerzang,,,,,,,,,,,...,,,,,,,,,,
zrustz16,,,,,,,,,,,...,,,,,,,-0.4,,,


In [32]:
# Fill NaN with 0: a zero adds nothing to the dot product or to the norms used by cosine similarity
matrix_norm_filled = matrix_norm.fillna(0)

# Compute the cosine similarity between every pair of users (rows)
similitud_del_coseno = cosine_similarity(matrix_norm_filled)

# Show the raw similarity array
similitud_del_coseno

array([[ 1.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.        ],
       [ 0.        ,  1.        , -0.15474612, ...,  0.        ,
        -0.59628479,  0.        ],
       [ 0.        , -0.15474612,  1.        , ...,  0.        ,
         0.2076137 ,  0.        ],
       ...,
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.        ],
       [ 0.        , -0.59628479,  0.2076137 , ...,  0.        ,
         1.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.        ]])
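To show what the similarity values mean, here is a NumPy-only sketch of row-wise cosine similarity on made-up vectors. The function `cosine_similarity_rows` is a hypothetical helper written for this illustration, not part of scikit-learn; it assigns similarity 0 to all-zero rows, which is an assumption chosen to match the rows of zeros visible in the output above.

```python
import numpy as np

def cosine_similarity_rows(x):
    """Cosine similarity between every pair of rows of a 2-D array.
    Rows that are all zeros get similarity 0 with every row."""
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    safe_norms = np.where(norms == 0, 1.0, norms)  # avoid dividing by zero
    unit = x / safe_norms
    return unit @ unit.T

# Hypothetical mean-centered scores with NaN already replaced by 0
toy = np.array([
    [1.0, -1.0, 0.0],   # user A
    [2.0, -2.0, 0.0],   # user B: same direction as A
    [-1.0, 1.0, 0.0],   # user C: opposite direction to A
    [0.0, 0.0, 0.0],    # user D: no information after centering
])
sim = cosine_similarity_rows(toy)
print(np.round(sim, 2))
```

Users A and B agree perfectly even though B's scores are twice as large, because cosine similarity compares direction only. Users A and C disagree completely, and user D is similar to no one, including themselves.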

Next I will look at the similarity between users by placing them on both the index and the columns.

In [33]:
# Convert to a DataFrame for better readability
similitud_de_usuarios_coseno = pd.DataFrame(similitud_del_coseno, index=matrix_norm.index, columns=matrix_norm.index)

similitud_de_usuarios_coseno

user_id,09879655452567,10051997,1011001,111222333444555666888,1234567io9872345678765432,1873410337,1snap,210396,29123,2sBs,...,xtomx_freedom,yotuic,you_re_ded,youngbenaffleck,zaaikbr,zakbot,zayyntt,zerzang,zrustz16,zyr0n1c
09879655452567,1.000000,0.000000,0.000000,0.000000,0.000000,0.111803,0.265334,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,-0.167705,0.000000,0.0,0.000000,0.0
10051997,0.000000,1.000000,-0.154746,-0.333333,-0.730297,0.821584,0.000000,0.117851,0.288675,0.484481,...,0.521749,-0.125988,0.195180,0.182574,0.000000,0.000000,-0.091287,0.0,-0.596285,0.0
1011001,0.000000,-0.154746,1.000000,0.154746,0.355983,-0.339032,0.000000,0.109422,0.268028,-0.034602,...,0.069205,0.058489,0.181220,-0.288177,0.346023,0.000000,-0.084758,0.0,0.207614,0.0
111222333444555666888,0.000000,-0.333333,0.154746,1.000000,0.365148,-0.365148,0.000000,0.000000,0.000000,-0.186339,...,-0.149071,0.062994,0.000000,-0.091287,0.000000,0.000000,0.000000,0.0,0.223607,0.0
1234567io9872345678765432,0.000000,-0.730297,0.355983,0.365148,1.000000,-0.800000,0.000000,0.000000,0.000000,-0.306186,...,-0.244949,0.345033,-0.106904,-0.275000,0.000000,0.000000,0.050000,0.0,0.571548,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
zakbot,-0.167705,0.000000,0.000000,0.000000,0.000000,-0.075000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.262500,0.000000,1.000000,0.000000,0.0,0.000000,0.0
zayyntt,0.000000,-0.091287,-0.084758,0.000000,0.050000,0.000000,0.339032,-0.064550,-0.158114,0.183712,...,-0.122474,0.207020,-0.213809,0.000000,-0.163299,0.000000,1.000000,0.0,0.081650,0.0
zerzang,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.0
zrustz16,0.000000,-0.596285,0.207614,0.223607,0.571548,-0.571548,0.000000,0.000000,0.000000,-0.250000,...,-0.200000,0.422577,-0.174574,-0.122474,0.000000,0.000000,0.081650,0.0,1.000000,0.0


This is where the first prototype of the recommender system begins.

In [34]:
# Pick any user to find similar users for; here, the first one:
usuario_elegido = "09879655452567"

# Drop the chosen user's own row and show the DataFrame
similitud_de_usuarios_coseno.drop(index=usuario_elegido,inplace=True)
similitud_de_usuarios_coseno.head()

user_id,09879655452567,10051997,1011001,111222333444555666888,1234567io9872345678765432,1873410337,1snap,210396,29123,2sBs,...,xtomx_freedom,yotuic,you_re_ded,youngbenaffleck,zaaikbr,zakbot,zayyntt,zerzang,zrustz16,zyr0n1c
10051997,0.0,1.0,-0.154746,-0.333333,-0.730297,0.821584,0.0,0.117851,0.288675,0.484481,...,0.521749,-0.125988,0.19518,0.182574,0.0,0.0,-0.091287,0.0,-0.596285,0.0
1011001,0.0,-0.154746,1.0,0.154746,0.355983,-0.339032,0.0,0.109422,0.268028,-0.034602,...,0.069205,0.058489,0.18122,-0.288177,0.346023,0.0,-0.084758,0.0,0.207614,0.0
111222333444555666888,0.0,-0.333333,0.154746,1.0,0.365148,-0.365148,0.0,0.0,0.0,-0.186339,...,-0.149071,0.062994,0.0,-0.091287,0.0,0.0,0.0,0.0,0.223607,0.0
1234567io9872345678765432,0.0,-0.730297,0.355983,0.365148,1.0,-0.8,0.0,0.0,0.0,-0.306186,...,-0.244949,0.345033,-0.106904,-0.275,0.0,0.0,0.05,0.0,0.571548,0.0
1873410337,0.111803,0.821584,-0.339032,-0.365148,-0.8,1.0,0.0,0.0,0.0,0.408248,...,0.244949,-0.138013,0.0,0.275,0.0,-0.075,0.0,0.0,-0.571548,0.0


In [35]:
# Number of similar users to look for; here, the top 9 most similar users
cantidad_de_usuarios_similares = 9

# Minimum cosine similarity required, set to 0.36
similitud_minima_entre_usuarios = 0.36

In [36]:
# Find the similar users and sort them from most to least similar
usuarios_similares = (
    similitud_de_usuarios_coseno[similitud_de_usuarios_coseno[usuario_elegido] > similitud_minima_entre_usuarios][usuario_elegido]
    .sort_values(ascending=False)[:cantidad_de_usuarios_similares]
)

In [37]:
print(f'The users similar to user {usuario_elegido} are', usuarios_similares)

The users similar to user 09879655452567 are user_id
tarjla               0.462910
ImWinKo              0.433013
WCM03                0.381385
76561198096443555    0.365148
ShevilleWarhand      0.365148
Name: 09879655452567, dtype: float64


For the chosen user, the most similar user has a similarity of about 46%. Only five users exceed the 0.36 threshold, so the list is shorter than the nine requested.
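
The threshold-then-top-k selection used in the cell above can be sketched on a toy similarity matrix. The user names and similarity values here are hypothetical and only illustrate the filtering pattern; this is not the notebook's data.

```python
import pandas as pd

# Hypothetical symmetric similarity matrix (illustrative values only)
usuarios = ["ana", "beto", "caro", "dani"]
similitud = pd.DataFrame(
    [[1.00, 0.50, 0.20, 0.40],
     [0.50, 1.00, 0.10, 0.30],
     [0.20, 0.10, 1.00, 0.60],
     [0.40, 0.30, 0.60, 1.00]],
    index=usuarios, columns=usuarios,
)

usuario = "ana"
top_k = 2
umbral = 0.36

# Take the chosen user's column and drop the user itself,
# so the self-similarity of 1.0 never appears in the result
candidatos = similitud[usuario].drop(usuario)

# Keep only scores above the threshold, then the top_k highest
similares = candidatos[candidatos > umbral].nlargest(top_k)
print(similares)
```

If fewer users clear the threshold than `top_k` asks for, the result is simply shorter, which is what happens in the notebook when nine users are requested and five are returned.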

Next, I keep only the games that the chosen user has played and drop every other column.

In [38]:
juegos_jugados_por_usuario_elegido = matrix_norm[matrix_norm.index == usuario_elegido].dropna(axis=1,how="all")
juegos_jugados_por_usuario_elegido


item_name,ARK: Survival Evolved,Garry's Mod,Heroes & Generals,The Forest,Trove
user_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
9879655452567,1.0,0.0,1.0,-1.0,-1.0


In [39]:
# Games that the similar users have played. We drop the games that none of the similar users has played
juegos_de_usuarios_similares = matrix_norm[matrix_norm.index.isin(usuarios_similares.index)].dropna(axis=1, how='all')
juegos_de_usuarios_similares

item_name,Arma 3,Call of Duty®: Black Ops III,Counter-Strike: Global Offensive,Counter-Strike: Source,DayZ,Five Nights at Freddy's,Garry's Mod,Heroes & Generals,Middle-earth™: Shadow of Mordor™,No More Room in Hell,...,Red Orchestra 2: Heroes of Stalingrad with Rising Storm,Rust,Sid Meier's Civilization® V,Starbound,Team Fortress 2,Terraria,The Forest,Trove,Unturned,Verdun
user_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
76561198096443555,,,-0.666667,,-0.666667,,,1.333333,0.333333,,...,,-0.666667,,,,,,,,0.333333
ImWinKo,0.666667,,-1.333333,,0.666667,,,0.666667,,,...,,0.666667,,,,,-1.333333,,,
ShevilleWarhand,,0.333333,,,,0.333333,,,,,...,,,,,0.333333,,0.333333,-1.666667,,
WCM03,,,0.875,0.875,,,,0.875,,-1.125,...,-0.125,-1.125,0.875,,,,-1.125,,,
tarjla,,,0.285714,,,,0.285714,,,,...,,0.285714,,0.285714,,0.285714,-1.714286,,0.285714,


In [40]:
# Drop the games the chosen user has already played
juegos_de_usuarios_similares.drop(juegos_jugados_por_usuario_elegido.columns,axis=1, inplace=True, errors='ignore')

# And look at the resulting table
juegos_de_usuarios_similares

item_name,Arma 3,Call of Duty®: Black Ops III,Counter-Strike: Global Offensive,Counter-Strike: Source,DayZ,Five Nights at Freddy's,Middle-earth™: Shadow of Mordor™,No More Room in Hell,Realm of the Mad God,Red Orchestra 2: Heroes of Stalingrad with Rising Storm,Rust,Sid Meier's Civilization® V,Starbound,Team Fortress 2,Terraria,Unturned,Verdun
user_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
76561198096443555,,,-0.666667,,-0.666667,,0.333333,,,,-0.666667,,,,,,0.333333
ImWinKo,0.666667,,-1.333333,,0.666667,,,,,,0.666667,,,,,,
ShevilleWarhand,,0.333333,,,,0.333333,,,0.333333,,,,,0.333333,,,
WCM03,,,0.875,0.875,,,,-1.125,,-0.125,-1.125,0.875,,,,,
tarjla,,,0.285714,,,,,,,,0.285714,,0.285714,,0.285714,0.285714,
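
The two filtering steps above (keep the games any similar user has played, then remove the ones the chosen user already played) can be sketched on a small table. The user names, game names, and ratings below are hypothetical, chosen only to make the effect of `dropna(axis=1, how="all")` and `errors="ignore"` visible.

```python
import numpy as np
import pandas as pd

# Hypothetical centered ratings: rows are users, columns are games, NaN = not played
ratings = pd.DataFrame(
    {
        "game_a": [1.0, 0.5, np.nan],
        "game_b": [np.nan, -0.5, 0.25],
        "game_c": [np.nan, np.nan, -0.25],
        "game_d": [-1.0, np.nan, np.nan],
    },
    index=["target", "peer_1", "peer_2"],
)

# Games the target user has played: columns that are not entirely NaN in that row
played = ratings.loc[["target"]].dropna(axis=1, how="all")

# Games played by at least one peer
peer_games = ratings.loc[["peer_1", "peer_2"]].dropna(axis=1, how="all")

# Remove games the target already played; errors="ignore" tolerates played
# games that no peer has (game_d here), which are absent from peer_games
candidates = peer_games.drop(columns=played.columns, errors="ignore")
print(list(candidates.columns))
```

This sketch returns a new frame from `drop` instead of using `inplace=True`, which is one way to avoid modifying a filtered view of the original matrix.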


In [41]:
len(juegos_de_usuarios_similares.columns) # 17 candidate games to recommend

17

In [42]:
# Dictionary to store the average score of each game
puntaje_promedio = {}

# Iterate over the similar users' games (the ones the chosen user has not played)
for juego in juegos_de_usuarios_similares.columns:
  # Ratings for the current game
  puntaje_juego = juegos_de_usuarios_similares[juego]

  # Running total of the weighted score for the game
  total = 0

  # Number of ratings the game has
  cantidad = 0

  # Iterate over the similar users
  for usuario in usuarios_similares.index:

    # Only count the game if this user rated it
    if pd.notna(puntaje_juego[usuario]):

      # puntaje_parcial is the similarity between the chosen user and this user, multiplied by the rating this user gave the game
      puntaje_parcial = usuarios_similares[usuario] * puntaje_juego[usuario]

      # Add the partial score to the running total
      total += puntaje_parcial

      # Count one more rating
      cantidad += 1

  # After going through every user, compute the average score for the game and store it in the dictionary, keyed by game name
  puntaje_promedio[juego] = total / cantidad

# Convert the dictionary to a DataFrame
item_score = pd.DataFrame(puntaje_promedio.items(), columns=['item_name', 'item_score'])

# Sort the games by score
ranked_item_score = item_score.sort_values(by='item_score', ascending=False)

# Show the top m games
m = 5
ranked_item_score.head(m)

Unnamed: 0,item_name,item_score
3,Counter-Strike: Source,0.333712
11,Sid Meier's Civilization® V,0.333712
0,Arma 3,0.288675
15,Unturned,0.13226
14,Terraria,0.13226
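
The scoring loop above averages, for each game, the product of each similar user's similarity and that user's rating, counting only the users who rated the game. Under that reading, the same computation can be sketched in vectorized form. The names and numbers below are hypothetical toy data, not values from the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical similarities to the chosen user (illustrative values only)
sims = pd.Series({"peer_1": 0.5, "peer_2": 0.4})

# Hypothetical centered ratings for games the chosen user has not played
ratings = pd.DataFrame(
    {"game_b": [1.0, 0.5], "game_c": [np.nan, -1.0]},
    index=["peer_1", "peer_2"],
)

# Multiply each row by that user's similarity (aligned on the index).
# NaN ratings stay NaN and mean() skips them by default, so each game
# is averaged over the users who actually rated it.
scores = ratings.mul(sims, axis=0).mean().sort_values(ascending=False)
print(scores)
```

Like the loop, this divides by the number of raters. A common variant divides by the sum of the raters' similarities instead, which yields a true weighted average; that would change the scores and is not what the notebook computes.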


This completes the recommendation system. In this same folder, I will create another file that wraps the recommendation system in a function.