# Assignment 2: Video Game Recommendation

Author: Mariano Zibecchi  
mzibecchi@gmail.com

The solution to the first part of the assignment consists of defining a function, __recomend_games_for_user__.

The function takes as parameters the username, the trained model and the number of recommendations to return.

It returns a list of recommended games.

In this assignment we work with a subset of data about [Steam video games](http://cseweb.ucsd.edu/~jmcauley/datasets.html#steam_data). To make the assignment a little easier, you are given the dataset already preprocessed. This notebook also shows the cleaning process so that there is a record of it (however, given the size of the data, we do not recommend spending time on it unless you find it personally useful).

The dataset has two parts: a list of games (items) and a list of user reviews of those games. The reviews file is very large in its original form (1.3 GB), so you will work on a sample of it.

Unlike the LastFM dataset used in [Assignment 1](./practico1.ipynb), this data was not designed with a recommender system in mind, so it requires more general work on the dataset.

As in the previous assignment, the goal is to build a recommender system. This one is more complete, though: you must build two systems. The first, given a username, recommends a list of games; the second, given a game title, recommends a list of similar games. In addition, the second system (the one that recommends games based on a particular game) must use content information (i.e., either content-based filtering or a hybrid approach).

## Obtaining and cleaning the dataset

The dataset originally comes in files that are supposed to be in JSON format. In reality, each line of a file is a separate JSON object. There is a catch, however: the lines are malformed, because instead of the double quotes (**"**) required by the JSON standard they use single quotes (**'**). Fortunately, each line can be evaluated as a Python dictionary, which lets us work with them directly.
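A minimal sketch of that approach, assuming each raw file is gzip-compressed with one Python dict literal per line (the file layout and both helper names are assumptions, not the original preprocessing code). It uses `ast.literal_eval` rather than `eval`, since `literal_eval` only accepts Python literals and will not execute arbitrary code:

```python
import ast
import gzip
import json


def parse_python_dict_lines(lines):
    """Parse an iterable of lines, each holding one Python dict literal."""
    records = []
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            records.append(ast.literal_eval(line))
    return records


def load_python_dict_file(path):
    """Hypothetical helper: read a gzip-compressed text file, one dict per line."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return parse_python_dict_lines(fh)


# A line in the same single-quoted style (values taken from a review shown later).
sample = "{'username': 'SPejsMan', 'product_id': 227940, 'hours': 23.0, 'early_access': True}"

try:
    json.loads(sample)  # strict JSON parsing rejects single-quoted keys
except json.JSONDecodeError as err:
    print("json.loads fails:", err.msg)

records = parse_python_dict_lines([sample, ""])
print(records[0]["product_id"], records[0]["early_access"])
```

The resulting list of dicts can then be turned into a table with `pd.DataFrame(records)`.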

## Exercise 1: Exploratory Data Analysis

With the data in hand, we can load it and start the assignment. First, we explore the data. The main goal at this stage is to identify the variables we will work with. Unlike the previous assignment, this dataset is not documented, so exploration is necessary to understand what will drive our recommender system.

```python
import pandas as pd
```

### Features of the video game dataset

The features of the video game dataset contain the information needed to build the "content vector" used by the second recommender system. Your task is to analyze this dataset and discard redundant information.

```python
games_df = pd.read_json("./data/steam/games.json.gz")
games_df.sample(15)
```

| | publisher | genres | app_name | title | release_date | tags | discount_price | specs | price | early_access | id | developer | sentiment | metascore |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 30422 | SCS Software | [Indie, Simulation] | Euro Truck Simulator 2 | Euro Truck Simulator 2 | 2013-01-16 | [Simulation, Driving, Open World, Realistic, R... | | [Single-player, Steam Achievements, Steam Trad... | 19.99 | False | 227300.0 | SCS Software | Overwhelmingly Positive | 79.0 |
| 22251 | Modern Dream Ltd | [Casual, Indie] | Savana | Savana | 2016-07-26 | [Indie, Casual, Point & Click] | | [Single-player] | 14.99 | False | 494970.0 | Modern Dream Ltd | 4 user reviews | |
| 21015 | Joe Censored Games | [Action, Casual] | Omega Reaction | Omega Reaction | 2016-10-17 | [Action, Casual, Indie, Twin Stick Shooter, Co... | | [Single-player, Partial Controller Support] | 4.99 | False | 516520.0 | Joe Censored Games | 2 user reviews | |
| 5219 | Individual Software | [Accounting] | Quicken Legal Business Pro | Quicken Legal Business Pro | 2015-10-15 | [Software, Education, Great Soundtrack, Atmosp... | | | 44.99 | False | 411350.0 | Nolo | | |
| 29049 | Microids | [Action, Adventure, Indie] | Iron Storm | Iron Storm | 2002-10-25 | [Action, Adventure, Indie, FPS, Alternate Hist... | | [Single-player, Multi-player] | 2.99 | False | 296180.0 | Microids | Mostly Positive | 69.0 |
| 631 | Kalypso Media Digital | [Strategy] | Patrician IV: Rise of a Dynasty | Patrician IV: Rise of a Dynasty | 2011-04-11 | [Strategy, Trading, Simulation, Economy] | | [Single-player, Multi-player, Co-op, Downloada... | 9.99 | False | 57730.0 | Gaming Minds Studios | Mixed | 67.0 |
| 29600 | THQ Nordic | [Action, Indie, Racing, Simulation, Sports, Ea... | Next Car Game: Wreckfest | Next Car Game: Wreckfest | 2014-01-14 | [Early Access, Racing, Destruction, Simulation... | | [Single-player, Multi-player, Online Multi-Pla... | 39.99 | True | 228380.0 | Bugbear | Mixed | |
| 1275 | Ubisoft | [Casual, Simulation] | Rocksmith - Fall Out Boy - Thnks fr th Mmrs | Rocksmith - Fall Out Boy - Thnks fr th Mmrs | 2013-02-05 | [Casual, Simulation] | | [Single-player, Shared/Split Screen, Downloada... | 2.99 | False | 222085.0 | Ubisoft - San Francisco | | |
| 27186 | Black Cloud Studios | [Adventure, Indie, RPG, Early Access] | After Reset RPG | After Reset RPG | 2015-03-09 | [Early Access, RPG, Adventure, Indie, Post-apo... | | [Single-player] | 49.99 | True | 335850.0 | Black Cloud Studios | Mixed | |
| 30367 | Paradox Interactive | [Strategy] | Impire | Impire | 2013-02-14 | [Strategy, Fantasy, Villain Protagonist] | | [Single-player, Multi-player, Co-op, Steam Ach... | 19.99 | False | 202130.0 | Cyanide Montreal | Mixed | 45.0 |
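In the sample above, `app_name` and `title` hold the same value in every row shown, and `publisher` and `developer` often coincide, so they are candidates for redundant columns. A sketch for quantifying that on the full `games_df`, demonstrated on a toy frame built from rows shown in this notebook (the helper names are hypothetical):

```python
import pandas as pd


def column_agreement(df, col_a, col_b):
    """Share of rows where the two columns are equal, among rows where both are present."""
    both = df[[col_a, col_b]].dropna()
    if both.empty:
        return float("nan")
    return float((both[col_a] == both[col_b]).mean())


def missing_filled_by(df, target, source):
    """Number of rows where `target` is missing but `source` has a value."""
    return int((df[target].isna() & df[source].notna()).sum())


# Toy frame with values copied from the games_df outputs in this notebook.
toy = pd.DataFrame({
    "app_name": ["Euro Truck Simulator 2", "Savana", "Log Challenge"],
    "title": ["Euro Truck Simulator 2", "Savana", None],
})
print(column_agreement(toy, "app_name", "title"))   # agreement among non-null pairs
print(missing_filled_by(toy, "title", "app_name"))  # titles recoverable from app_name
```

On the real data the call would be `column_agreement(games_df, 'app_name', 'title')`; a value close to 1.0 would support keeping only one of the two columns and using `app_name` to fill missing titles, as in the *Log Challenge* row.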


```python
len(games_df)
```

```text
32135
```

```python
games_df.dtypes
```

```text
publisher          object
genres             object
app_name           object
title              object
release_date       object
tags               object
discount_price    float64
specs              object
price              object
early_access         bool
id                float64
developer          object
sentiment          object
metascore          object
dtype: object
```

```python
games_df['id'].isnull().sum()
```

```text
2
```

```python
# Inspect the rows whose id is missing before dropping them
games_df[games_df['id'].isnull()]
```

| | publisher | genres | app_name | title | release_date | tags | discount_price | specs | price | early_access | id | developer | sentiment | metascore |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 74 | | | | | | | 14.99 | | 19.99 | False | | | | |
| 30961 | Warner Bros. Interactive Entertainment, Feral ... | [Action, Adventure] | Batman: Arkham City - Game of the Year Edition | Batman: Arkham City - Game of the Year Edition | 2012-09-07 | [Action, Open World, Batman, Adventure, Stealt... | | [Single-player, Steam Achievements, Steam Trad... | 19.99 | False | | Rocksteady Studios,Feral Interactive (Mac) | Overwhelmingly Positive | 91.0 |


```python
# Drop only the rows without an id; other columns may legitimately be empty
games_df = games_df.dropna(subset=['id'])
```

```python
games_df[games_df['id'].isnull()]
```

| | publisher | genres | app_name | title | release_date | tags | discount_price | specs | price | early_access | id | developer | sentiment | metascore |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|


```python
games_df
```

| | publisher | genres | app_name | title | release_date | tags | discount_price | specs | price | early_access | id | developer | sentiment | metascore |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Kotoshiro | [Action, Casual, Indie, Simulation, Strategy] | Lost Summoner Kitty | Lost Summoner Kitty | 2018-01-04 | [Strategy, Action, Indie, Casual, Simulation] | 4.49 | [Single-player] | 4.99 | False | 761140.0 | Kotoshiro | | |
| 1 | Making Fun, Inc. | [Free to Play, Indie, RPG, Strategy] | Ironbound | Ironbound | 2018-01-04 | [Free to Play, Strategy, Indie, RPG, Card Game... | | [Single-player, Multi-player, Online Multi-Pla... | Free To Play | False | 643980.0 | Secret Level SRL | Mostly Positive | |
| 2 | Poolians.com | [Casual, Free to Play, Indie, Simulation, Sports] | Real Pool 3D - Poolians | Real Pool 3D - Poolians | 2017-07-24 | [Free to Play, Simulation, Sports, Casual, Ind... | | [Single-player, Multi-player, Online Multi-Pla... | Free to Play | False | 670290.0 | Poolians.com | Mostly Positive | |
| 3 | 彼岸领域 | [Action, Adventure, Casual] | 弹炸人2222 | 弹炸人2222 | 2017-12-07 | [Action, Adventure, Casual] | 0.83 | [Single-player] | 0.99 | False | 767400.0 | 彼岸领域 | | |
| 4 | | | Log Challenge | | | [Action, Indie, Casual, Sports] | 1.79 | [Single-player, Full controller support, HTC V... | 2.99 | False | 773570.0 | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 32130 | Ghost_RUS Games | [Casual, Indie, Simulation, Strategy] | Colony On Mars | Colony On Mars | 2018-01-04 | [Strategy, Indie, Casual, Simulation] | 1.49 | [Single-player, Steam Achievements] | 1.99 | False | 773640.0 | Nikita "Ghost_RUS" | | |
| 32131 | Sacada | [Casual, Indie, Strategy] | LOGistICAL: South Africa | LOGistICAL: South Africa | 2018-01-04 | [Strategy, Indie, Casual] | 4.24 | [Single-player, Steam Achievements, Steam Clou... | 4.99 | False | 733530.0 | Sacada | | |
| 32132 | Laush Studio | [Indie, Racing, Simulation] | Russian Roads | Russian Roads | 2018-01-04 | [Indie, Simulation, Racing] | 1.39 | [Single-player, Steam Achievements, Steam Trad... | 1.99 | False | 610660.0 | Laush Dmitriy Sergeevich | | |
| 32133 | SIXNAILS | [Casual, Indie] | EXIT 2 - Directions | EXIT 2 - Directions | 2017-09-02 | [Indie, Casual, Puzzle, Singleplayer, Atmosphe... | | [Single-player, Steam Achievements, Steam Cloud] | 4.99 | False | 658870.0 | xropi,stev3ns | 1 user reviews | |


```python
# With the NaN rows gone, the ids can be stored as integers
games_df['id'] = games_df['id'].astype(int)
```

```python
games_df.head()
```

| | publisher | genres | app_name | title | release_date | tags | discount_price | specs | price | early_access | id | developer | sentiment | metascore |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Kotoshiro | [Action, Casual, Indie, Simulation, Strategy] | Lost Summoner Kitty | Lost Summoner Kitty | 2018-01-04 | [Strategy, Action, Indie, Casual, Simulation] | 4.49 | [Single-player] | 4.99 | False | 761140 | Kotoshiro | | |
| 1 | Making Fun, Inc. | [Free to Play, Indie, RPG, Strategy] | Ironbound | Ironbound | 2018-01-04 | [Free to Play, Strategy, Indie, RPG, Card Game... | | [Single-player, Multi-player, Online Multi-Pla... | Free To Play | False | 643980 | Secret Level SRL | Mostly Positive | |
| 2 | Poolians.com | [Casual, Free to Play, Indie, Simulation, Sports] | Real Pool 3D - Poolians | Real Pool 3D - Poolians | 2017-07-24 | [Free to Play, Simulation, Sports, Casual, Ind... | | [Single-player, Multi-player, Online Multi-Pla... | Free to Play | False | 670290 | Poolians.com | Mostly Positive | |
| 3 | 彼岸领域 | [Action, Adventure, Casual] | 弹炸人2222 | 弹炸人2222 | 2017-12-07 | [Action, Adventure, Casual] | 0.83 | [Single-player] | 0.99 | False | 767400 | 彼岸领域 | | |
| 4 | | | Log Challenge | | | [Action, Indie, Casual, Sports] | 1.79 | [Single-player, Full controller support, HTC V... | 2.99 | False | 773570 | | | |
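`price` has dtype `object` because it mixes numbers with strings such as "Free To Play" and "Free to Play" (rows 1 and 2). A sketch of a conversion to floats, assuming any string starting with "free" means the game costs nothing; every other non-numeric string becomes NaN, so it is worth listing them first with `games_df['price'][pd.to_numeric(games_df['price'], errors='coerce').isna()].unique()`:

```python
import pandas as pd


def clean_price(price):
    """Convert a mixed-type price column to floats.

    Strings starting with "free" (any capitalization) map to 0.0;
    any other value that is not numeric becomes NaN.
    """
    as_text = price.astype(str).str.strip().str.lower()
    numeric = pd.to_numeric(price, errors="coerce")
    return numeric.mask(as_text.str.startswith("free"), 0.0)


toy_price = pd.Series([4.99, "Free To Play", "Free to Play", 0.99, None], dtype=object)
result = clean_price(toy_price)
print(result.tolist())
```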


```python
games_df.metascore.unique()
```

```text
array([None, 96, 84, 80, 76, 70, 'NA', 69, 81, 75, 72, 66, 67, 77, 91, 89,
       83, 61, 88, 65, 94, 57, 86, 87, 92, 79, 82, 58, 74, 85, 90, 68, 71,
       60, 73, 59, 64, 54, 53, 78, 51, 44, 63, 38, 56, 49, 52, 62, 93, 48,
       34, 95, 43, 55, 24, 46, 41, 20, 39, 45, 35, 47, 40, 36, 50, 32, 37,
       33, 42, 27, 29, 30], dtype=object)
```
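`metascore` mixes integers, `None` and the string `'NA'`, which is why its dtype is `object`. A sketch of converting it to a numeric column on a toy series; the same `pd.to_numeric` call would apply to `games_df['metascore']`:

```python
import pandas as pd

toy_metascore = pd.Series([None, 96, "NA", 69, "NA"], dtype=object)

# errors="coerce" turns the string "NA" (and None) into NaN,
# leaving a float column that can be averaged, binned or compared.
metascore = pd.to_numeric(toy_metascore, errors="coerce")
print(metascore.tolist())
print(metascore.isna().sum())
```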

```python
games_df.sentiment.unique()
```

```text
array([None, 'Mostly Positive', 'Mixed', '1 user reviews',
       '3 user reviews', '8 user reviews', 'Very Positive',
       'Overwhelmingly Positive', '6 user reviews', '5 user reviews',
       '2 user reviews', 'Very Negative', 'Positive', 'Mostly Negative',
       '9 user reviews', 'Negative', '4 user reviews', '7 user reviews',
       'Overwhelmingly Negative'], dtype=object)
```
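`sentiment` mixes two kinds of values: Steam review-summary labels (from "Overwhelmingly Negative" to "Overwhelmingly Positive") and strings such as "3 user reviews", which only report a review count (presumably because there were too few reviews for a summary; this is an inference from the values, not documented). A sketch that splits them into a label column and a count column, on a toy series:

```python
import pandas as pd

toy_sentiment = pd.Series(
    ["Mostly Positive", "3 user reviews", None, "Overwhelmingly Negative", "1 user reviews"]
)

# Extract the number from values like "3 user reviews"; NaN everywhere else.
count = toy_sentiment.str.extract(r"^(\d+) user reviews$", expand=False)

sentiment_label = toy_sentiment.where(count.isna())  # keep only real labels
review_count = pd.to_numeric(count)                  # NaN where no count was given

print(sentiment_label.tolist())
print(review_count.tolist())
```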

### Features of the reviews dataset

This dataset provides information about users and their interactions with video games. As the sample below shows, there is no explicit rating, only an implicit one that you will have to compute as part of your work (you need to find which feature carries information that can serve as an equivalent of a rating).

In [14]:
reviews_df = pd.read_json("./data/steam/reviews.json.gz")
reviews_df.sample(15)

Unnamed: 0,username,product_id,page_order,text,hours,products,date,early_access,page,compensation,found_funny,user_id
478742,Credit to Ding,440,3,I like the part where i get bodyshot by a ftp ...,1723.1,155.0,2016-03-26,False,9039,,1.0,
111904,Big Smoke,382490,9,Too addicting,4.2,30.0,2015-09-01,False,182,,,7.65612e+16
581076,ƒêa–òy \ Exams Yay /,409710,5,Already made a review for Bioshock several yea...,1.6,89.0,2017-11-27,False,20,,,
14722,MitziWho,270210,3,I love this game you can play songs from your ...,19.1,95.0,2014-03-15,True,185,,1.0,7.65612e+16
699125,Toxic Larva,434420,9,I don't recommend this game,3.4,561.0,2017-01-08,False,13,,,7.65612e+16
142218,Haduu,236450,8,BEAST GAME 1000000 HIGH SCORE GET ON MY LEVEL ...,0.3,109.0,2014-09-22,False,287,,,7.65612e+16
515030,cvblade,48220,3,"Better then 5, Leaps and bounds better than 4,...",8.2,181.0,2015-01-26,False,117,,,7.65612e+16
542526,professor_chaos_,291650,4,"This game has so much potential, but the comba...",11.6,82.0,2016-01-15,False,302,,,7.65612e+16
469105,nilssonbst,620,3,"***REVIEW FOR MAC***\nModel\nMac Air, Mid-2012...",2.2,142.0,2015-01-30,False,3142,,,
131097,ixurge,296050,4,played 15 minutes.\nbored to death.,0.2,829.0,2016-01-22,False,8,,,


Dataset description:

* `hours`: hours played before writing the review
* `early_access`: whether the review refers to an Early Access version of the product
  > Early Access is not meant to be a form of pre-purchase, but a tool to get your game in front of Steam users and gather feedback while finishing your game. Early Access titles must deliver a playable game or usable software to the customer at the time of purchase, while pre-purchase games are delivered at a future date.
* `found_funny`: number of other users who found the review funny
* `compensation`: set when the reviewer received the product for free (its only non-null value is "Product received for free")
* `products`: number of different games in the account's library

```python
user = 'TheDoofMoments'
reviews_df[reviews_df['username'] == user]
```

| | username | product_id | page_order | text | hours | products | date | early_access | page | compensation | found_funny | user_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 127491 | TheDoofMoments | 200210 | 0 | I like this game | 625.3 | 65.0 | 2017-07-02 | False | 150 | | | |


```python
user = 'SPejsMan'
reviews_df[reviews_df['username'] == user]
```

| | username | product_id | page_order | text | hours | products | date | early_access | page | compensation | found_funny | user_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | SPejsMan | 227940 | 0 | Just one word... Balance! | 23.0 | 92.0 | 2015-02-25 | True | 3159 | | | |
| 87241 | SPejsMan | 282070 | 7 | It is good to be bad. | 13.8 | 92.0 | 2016-11-26 | False | 459 | | | |


```python
user = 'Agron Bulchard-Chataeu'
reviews_df[reviews_df['username'] == user]
```

| | username | product_id | page_order | text | hours | products | date | early_access | page | compensation | found_funny | user_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|


Let's look at one particular game, __Total War™: ROME II - Emperor Edition__, and the reviews it received.


```python
games_df[games_df['id'] == 214950]
```

| | publisher | genres | app_name | title | release_date | tags | discount_price | specs | price | early_access | id | developer | sentiment | metascore |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 30024 | SEGA | [Strategy] | Total War™: ROME II - Emperor Edition | Total War™: ROME II - Emperor Edition | 2013-09-02 | [Strategy, Historical, Turn-Based Strategy, Gr... | | [Single-player, Multi-player, Steam Trading Ca... | 59.95 | False | 214950 | Creative Assembly | Mostly Positive | 76 |


```python
roma_reviews_df = reviews_df[reviews_df['product_id'] == 214950]
roma_reviews_df
```

| | username | product_id | page_order | text | hours | products | date | early_access | page | compensation | found_funny | user_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7 | CaptainPlanet | 214950 | 1 | If you like slaughtering in the name of Rome, ... | 203.3 | 274.0 | 2016-10-27 | False | 285 | | | 7.656120e+16 |
| 1449 | Ciddie Fobbler | 214950 | 5 | Does ok at the things that are ok | 130.6 | 115.0 | 2015-08-08 | False | 621 | | | |
| 1896 | turtleownage | 214950 | 1 | Super fun, battles are epic and a treat to wat... | 7.1 | 40.0 | 2017-10-25 | False | 78 | | | 7.656120e+16 |
| 2259 | battlegalactika | 214950 | 8 | The combat is simply broken\nIt doesnt feel li... | 76.9 | 5.0 | 2017-05-12 | False | 153 | | | 7.656120e+16 |
| 2468 | Terminally Chill™ | 214950 | 8 | Stiltzkin: are you high\nStiltzkin: youve been... | 43.0 | 17.0 | 2015-02-15 | False | 816 | | 10.0 | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 697932 | sosacharly30 | 214950 | 1 | WORSE♥♥♥♥♥♥I HAVE EVER PLAYED BY CA I SWEAR SL... | 805.1 | 23.0 | 2014-08-01 | False | 1195 | | | 7.656120e+16 |
| 698520 | DrFDisK | 214950 | 1 | Worse than I expected. Still not a finished ga... | 6.1 | 265.0 | 2015-06-28 | False | 664 | | | |
| 699106 | AMP2533 | 214950 | 1 | I personally like this game but after having s... | 96.4 | 95.0 | 2016-02-10 | False | 467 | | | |
| 699216 | Adjusted for Infeeltion | 214950 | 4 | A great strategy game built on the reliable to... | 70.8 | 170.0 | 2016-11-23 | False | 266 | | | |


```python
user = 'sosacharly30'
reviews_df[reviews_df['username'] == user].text
```

```text
697932    WORSE♥♥♥♥♥♥I HAVE EVER PLAYED BY CA I SWEAR SL...
Name: text, dtype: object
```

```python
roma_reviews_df.username.value_counts()
```

```text
Caesar                   2
bart.es                  2
Fuckmuffin               2
Sitout.AmnesiacGaming    2
Gutenisse                2
                        ..
Anonymous                1
jhtheone13030            1
bigh100                  1
Tobias Fünke             1
AMP2533                  1
Name: username, Length: 1228, dtype: int64
```

```python
user = 'Caesar'
roma_reviews_df[roma_reviews_df['username'] == user]
```

| | username | product_id | page_order | text | hours | products | date | early_access | page | compensation | found_funny | user_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 209383 | Caesar | 214950 | 4 | can't compare to rome 1 | 686.2 | 36.0 | 2014-09-29 | False | 1081 | | | |
| 647322 | Caesar | 214950 | 4 | I love everything simply because of the mod ba... | 495.3 | 86.0 | 2014-09-20 | False | 1117 | | | |
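The two *Caesar* rows for the same game report different library sizes (`products` = 36 vs 86), which suggests that `username` is a display name rather than a unique identifier and that these may be two different accounts. (`user_id` cannot settle this, since `describe()` below shows it is only present in 285,587 of the 700,000 rows.) That matters later because the rating table is keyed by `username`. A sketch for detecting such cases, on a toy frame built from values shown in this notebook:

```python
import pandas as pd

toy = pd.DataFrame({
    "username": ["Caesar", "Caesar", "SPejsMan", "SPejsMan"],
    "product_id": [214950, 214950, 227940, 282070],
    "hours": [686.2, 495.3, 23.0, 13.8],
    "products": [36.0, 86.0, 92.0, 92.0],
})

# All rows that share a (username, product_id) pair with another row.
dup_mask = toy.duplicated(subset=["username", "product_id"], keep=False)
print(toy[dup_mask])

# A "user" whose rows report several library sizes may be several accounts.
per_user_products = toy.groupby("username")["products"].nunique()
print(per_user_products[per_user_products > 1])
```

On the full data, `reviews_df.duplicated(subset=['username', 'product_id'], keep=False).sum()` would count the affected rows; whether to aggregate, drop or keep them is a modeling decision.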


```python
len(reviews_df)
```

```text
700000
```

```python
reviews_df.describe()
```

| | product_id | page_order | hours | products | page | found_funny | user_id |
|---|---|---|---|---|---|---|---|
| count | 700000.0 | 700000.0 | 697558.0 | 698708.0 | 700000.0 | 107375.0 | 285587.0 |
| mean | 251130.840463 | 4.485711 | 111.498797 | 236.192551 | 890.90857 | 7.738217 | 7.65612e+16 |
| std | 150044.054746 | 2.875279 | 385.359458 | 485.33717 | 1923.769739 | 71.931147 | 100973000.0 |
| min | 10.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 7.65612e+16 |
| 25% | 203770.0 | 2.0 | 4.0 | 45.0 | 52.0 | 1.0 | 7.65612e+16 |
| 50% | 252490.0 | 4.0 | 15.2 | 110.0 | 237.0 | 1.0 | 7.65612e+16 |
| 75% | 346110.0 | 7.0 | 59.3 | 246.0 | 828.0 | 2.0 | 7.65612e+16 |
| max | 773900.0 | 9.0 | 18570.9 | 12832.0 | 18371.0 | 6956.0 | 7.65612e+16 |


```python
reviews_df.dtypes
```

```text
username                object
product_id               int64
page_order               int64
text                    object
hours                  float64
products               float64
date            datetime64[ns]
early_access              bool
page                     int64
compensation            object
found_funny            float64
user_id                float64
dtype: object
```

```python
reviews_df.compensation.value_counts()
```

```text
Product received for free    13286
Name: compensation, dtype: int64
```
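Since `compensation` has a single non-null value, "Product received for free" (13,286 rows), it carries at most one bit of information per review. A sketch of turning it into a boolean flag, which could be used, for example, to filter out or down-weight reviews of free copies (that use is an assumption, not something the assignment requires):

```python
import pandas as pd

toy_compensation = pd.Series([None, "Product received for free", None], dtype=object)

# Presence/absence of the value is all the column encodes.
received_for_free = toy_compensation.notna()
print(received_for_free.tolist())  # [False, True, False]
```

On the real data: `reviews_df['received_for_free'] = reviews_df['compensation'].notna()`.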

#### Games with the most reviews

```python
# Counting any always-present column (here page_order) gives the number of reviews per game
reviews_df.groupby('product_id').agg({'page_order': 'count'}).sort_values('page_order', ascending=False).head()
```

| product_id | page_order |
|---|---|
| 440 | 16570 |
| 252490 | 9161 |
| 49520 | 6349 |
| 377160 | 6335 |
| 271590 | 5239 |
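The table above only shows numeric `product_id`s. A sketch of attaching game names by joining with `games_df` (whose `id` column was cast to `int` earlier), shown on toy frames whose ids and names come from `games_df` and whose review counts are made up:

```python
import pandas as pd

# Toy stand-ins for reviews_df and games_df.
reviews = pd.DataFrame({"product_id": [214950, 214950, 227300, 214950]})
games = pd.DataFrame({
    "id": [214950, 227300],
    "app_name": ["Total War™: ROME II - Emperor Edition", "Euro Truck Simulator 2"],
})

top = (
    reviews["product_id"].value_counts()          # reviews per game, most first
    .rename_axis("product_id")
    .reset_index(name="n_reviews")
    .merge(games, left_on="product_id", right_on="id", how="left")
    .loc[:, ["product_id", "app_name", "n_reviews"]]
)
print(top)
```

`how="left"` keeps reviewed games that are missing from the games table (their `app_name` is NaN), which also makes it easy to check how well the two datasets line up.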


```python
reviews_df.found_funny.value_counts()
```

```text
1.0       62816
2.0       17737
3.0        7511
4.0        4259
5.0        2585
          ...  
285.0         1
1153.0        1
1170.0        1
432.0         1
253.0         1
Name: found_funny, Length: 589, dtype: int64
```

## Exercise 2 - User-Based Recommender System

This recommender system must train an algorithm and provide an interface that, given a user, returns a list of the most recommended games.

```python
# Total hours each user reported across all of their reviews
cant_horas_user = reviews_df.groupby('username')['hours'].sum()
cant_horas_user
```

```text
username
!                      123.3
!             *          5.6
! 5tryx                  0.9
! DeadlyDeal !           6.3
! Indelible             58.3
                       ...  
󰀈Spectra󰀈                4.9
󰀍 loopuleasa 󰀍          96.9
󰀓Oxymoronicphalanx󰀓      0.4
󰀕СЭНС󰀕                  21.1
󰀗 Lolicage               7.5
Name: hours, Length: 495383, dtype: float64
```

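```python
# Attach each user's total hours to every one of their reviews.
# transform('sum') returns a value aligned with each row; rows whose username
# is missing get NaN, so their rating below is NaN rather than negative.
reviews_df['cant_horas_user'] = reviews_df.groupby('username')['hours'].transform('sum')
```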


```python
reviews_df[reviews_df['username'] == 'Caesar']
```

| | username | product_id | page_order | text | hours | products | date | early_access | page | compensation | found_funny | user_id | cant_horas_user |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 77242 | Caesar | 221100 | 5 | Forever "Early Access". | 67.0 | 21.0 | 2016-02-01 | True | 2517 | | | 7.65612e+16 | 1723.3 |
| 195973 | Caesar | 327070 | 0 | Game's unplayable (I know it's EA, but still) ... | 1.3 | 280.0 | 2016-07-08 | True | 73 | | | | 1723.3 |
| 209383 | Caesar | 214950 | 4 | can't compare to rome 1 | 686.2 | 36.0 | 2014-09-29 | False | 1081 | | | | 1723.3 |
| 235587 | Caesar | 219640 | 8 | Total loss of money,time and resources. | 5.1 | 88.0 | 2017-03-27 | False | 290 | | | 7.65612e+16 | 1723.3 |
| 275411 | Caesar | 204100 | 0 | Good pc port I can max settings @ 1080p with 2... | 9.5 | 48.0 | 2015-01-31 | False | 420 | | | | 1723.3 |
| 285946 | Caesar | 441790 | 0 | For an EA this game is fun. Graphics are amazi... | 81.8 | 37.0 | 2016-04-30 | True | 46 | | | | 1723.3 |
| 298652 | Caesar | 588430 | 7 | pay to win. | 3.3 | 76.0 | 2017-06-20 | False | 249 | | | | 1723.3 |
| 349061 | Caesar | 223850 | 3 | The ultimate pay to win game.\n10/10 | 19.3 | 183.0 | 2017-04-06 | False | 18 | | 3.0 | | 1723.3 |
| 397510 | Caesar | 582660 | 8 | Incredible design and textures. Game has great... | 120.5 | 92.0 | 2017-07-04 | False | 346 | | | 7.65612e+16 | 1723.3 |
| 408255 | Caesar | 204300 | 8 | Awesomenauts is a side-scrolling 3v3 DOTA-styl... | 100.3 | 76.0 | 2015-05-14 | False | 545 | | | | 1723.3 |


```python
# Implicit rating: share of the user's total reviewed playtime spent on this game
reviews_df['rating'] = reviews_df['hours'] / reviews_df['cant_horas_user']
```
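In formula form, the implicit rating is $r_{u,i} = h_{u,i} / \sum_j h_{u,j}$, where $h_{u,i}$ is the number of hours user $u$ reported for game $i$ and the sum runs over the games that user reviewed in this dataset. One consequence is that any user with a single review and nonzero playtime gets a rating of exactly 1.0, regardless of how many hours they played. A toy check using values shown earlier in the notebook (SPejsMan's two reviews give 23.0 / 36.8 = 0.625, the value that appears in the `muc_df.head()` output below):

```python
import pandas as pd

toy = pd.DataFrame({
    "username": ["SPejsMan", "SPejsMan", "TheDoofMoments"],
    "product_id": [227940, 282070, 200210],
    "hours": [23.0, 13.8, 625.3],
})

# Divide each review's hours by the reviewer's total reviewed hours.
total_hours = toy.groupby("username")["hours"].transform("sum")
toy["rating"] = toy["hours"] / total_hours
print(toy)
```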


```python
reviews_df[reviews_df['username'] == 'Caesar']
```

| | username | product_id | page_order | text | hours | products | date | early_access | page | compensation | found_funny | user_id | cant_horas_user | rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 77242 | Caesar | 221100 | 5 | Forever "Early Access". | 67.0 | 21.0 | 2016-02-01 | True | 2517 | | | 7.65612e+16 | 1723.3 | 0.038879 |
| 195973 | Caesar | 327070 | 0 | Game's unplayable (I know it's EA, but still) ... | 1.3 | 280.0 | 2016-07-08 | True | 73 | | | | 1723.3 | 0.000754 |
| 209383 | Caesar | 214950 | 4 | can't compare to rome 1 | 686.2 | 36.0 | 2014-09-29 | False | 1081 | | | | 1723.3 | 0.39819 |
| 235587 | Caesar | 219640 | 8 | Total loss of money,time and resources. | 5.1 | 88.0 | 2017-03-27 | False | 290 | | | 7.65612e+16 | 1723.3 | 0.002959 |
| 275411 | Caesar | 204100 | 0 | Good pc port I can max settings @ 1080p with 2... | 9.5 | 48.0 | 2015-01-31 | False | 420 | | | | 1723.3 | 0.005513 |
| 285946 | Caesar | 441790 | 0 | For an EA this game is fun. Graphics are amazi... | 81.8 | 37.0 | 2016-04-30 | True | 46 | | | | 1723.3 | 0.047467 |
| 298652 | Caesar | 588430 | 7 | pay to win. | 3.3 | 76.0 | 2017-06-20 | False | 249 | | | | 1723.3 | 0.001915 |
| 349061 | Caesar | 223850 | 3 | The ultimate pay to win game.\n10/10 | 19.3 | 183.0 | 2017-04-06 | False | 18 | | 3.0 | | 1723.3 | 0.011199 |
| 397510 | Caesar | 582660 | 8 | Incredible design and textures. Game has great... | 120.5 | 92.0 | 2017-07-04 | False | 346 | | | 7.65612e+16 | 1723.3 | 0.069924 |
| 408255 | Caesar | 204300 | 8 | Awesomenauts is a side-scrolling 3v3 DOTA-styl... | 100.3 | 76.0 | 2015-05-14 | False | 545 | | | | 1723.3 | 0.058202 |


We check whether any null values remain in the rating we computed.

```python
reviews_df['rating'].isnull().sum()
```

```text
2616
```

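We replace the nulls with 0. They come from reviews with no `hours` value (700,000 - 697,558 = 2,442 rows according to `describe()`) and, presumably, from users whose total hours are 0, which gives 0/0.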

```python
# Reviews without a computable rating are treated as rating 0
reviews_df['rating'] = reviews_df['rating'].fillna(0)
```

```python
reviews_df['rating'].isnull().sum()
```

```text
0
```

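```python
import seaborn as sns

# Distribution of the implicit rating: histogram with a KDE overlay.
# (sns.distplot is deprecated; histplot with kde=True replaces it.)
ax = sns.histplot(reviews_df['rating'], kde=True)
```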

### Building the user-item matrix

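```python
# Long format: one (user, item, rating) triple per review
muc_df = reviews_df[['username', 'product_id', 'rating']]
muc_df.head()
```

| | username | product_id | rating |
|---|---|---|---|
| 0 | SPejsMan | 227940 | 0.625 |
| 1 | Spodermen | 270170 | 0.308176 |
| 2 | josh | 41700 | 0.052637 |
| 3 | Sammyrism | 332310 | 0.054054 |
| 4 | moonmirroir | 303210 | 1.0 |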


```python
len(reviews_df)
```

```text
700000
```

```python
len(muc_df)
```

```text
700000
```

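```python
# Random sample of 5,000 reviews to keep training tractable.
# No random_state is set, so the sample (and the results below) vary between runs.
muc_df = muc_df.sample(5000)
```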

```python
muc_df.head()
```

| | username | product_id | rating |
|---|---|---|---|
| 444663 | S-Zachel | 428750 | 1.0 |
| 468004 | igrvak | 427520 | 1.0 |
| 399901 | Johnny Hammerstix | 346110 | 0.920511 |
| 135067 | Tord | 232430 | 1.0 |
| 349192 | King Ling Ming Chigga Ting II | 8930 | 0.952644 |


```python
from surprise import Dataset, Reader, KNNWithMeans, BaselineOnly
from surprise.accuracy import rmse
from surprise.model_selection import cross_validate, train_test_split
```


```python
# Use the observed range of the implicit rating as Surprise's rating scale
reader = Reader(rating_scale=(muc_df.rating.min(), muc_df.rating.max()))

# Columns must be in (user, item, rating) order
ratings = Dataset.load_from_df(muc_df[["username", "product_id", "rating"]], reader)

# Train on every sampled rating (no held-out split)
trainset = ratings.build_full_trainset()

# Baseline model (global mean + user bias + item bias), biases fitted with ALS
bsl_options = {'method': 'als',
               'n_epochs': 5,
               'reg_u': 12,
               'reg_i': 5
               }
model = BaselineOnly(bsl_options=bsl_options)

model.fit(trainset)
```

```text
Estimating biases using als...

<surprise.prediction_algorithms.baseline_only.BaselineOnly at 0x126182fd0>
```

### Working on the recommendation function

```python
from collections import defaultdict

def get_top_n(predictions, n=10):
    '''Return the top-N recommendations for each user from a set of predictions.

    Args:
        predictions(list of Prediction objects): The list of predictions, as
            returned by the test method of an algorithm.
        n(int): The number of recommendations to output for each user. Default
            is 10.

    Returns:
    A dict where keys are user (raw) ids and values are lists of tuples:
        [(raw item id, rating estimation), ...] of size n.
    '''

    # First map the predictions to each user.
    top_n = defaultdict(list)
    for uid, iid, true_r, est, _ in predictions:
        top_n[uid].append((iid, est))

    # Then sort the predictions for each user and retrieve the n highest ones.
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[uid] = user_ratings[:n]

    return top_n
```


```python
# Predict ratings for all (user, item) pairs that are NOT in the training set.
testset = trainset.build_anti_testset()
predictions = model.test(testset)

top_n = get_top_n(predictions, n=10)

# Print the recommended items for each user
for uid, user_ratings in top_n.items():
    print(uid, [iid for (iid, _) in user_ratings])
```

```text
S-Zachel [346110, 346900, 440900, 231430, 363970, 4000, 214950, 226860, 316010, 48700]
igrvak [346110, 346900, 440900, 231430, 363970, 4000, 214950, 226860, 316010, 48700]
Johnny Hammerstix [346900, 440900, 231430, 363970, 4000, 214950, 226860, 316010, 48700, 440]
Tord [346110, 346900, 440900, 231430, 363970, 4000, 214950, 226860, 316010, 48700]
King Ling Ming Chigga Ting II [346110, 346900, 440900, 231430, 363970, 4000, 214950, 226860, 316010, 48700]
yourmom260 [346110, 346900, 440900, 231430, 363970, 4000, 214950, 226860, 316010, 48700]
UNLOLS [346110, 346900, 440900, 231430, 363970, 4000, 214950, 226860, 316010, 48700]
❄Ｓｅｒｅｎｉｔｙ❄ [346110, 346900, 440900, 231430, 363970, 4000, 214950, 226860, 316010, 48700]
Dropthegun [346110, 346900, 440900, 231430, 363970, 4000, 214950, 226860, 316010, 48700]
OFFINE [346110, 346900, 440900, 231430, 363970, 4000, 214950, 226860, 316010, 48700]
Jeton [346110, 346900, 440900, 231430, 363970, 4000, 214950, 226860, 316010, 48700]
fal
```

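```python
def recomend_games_for_user(username, n=10):
    """Return up to n (product_id, estimated rating) pairs for `username`.

    Looks up the precomputed `top_n` dict, which holds at most 10 items per
    user (the n passed to get_top_n above); unknown users get an empty list.
    """
    return top_n.get(username, [])[:n]
```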
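The introduction describes __recomend_games_for_user__ as taking the username, the trained model and the number of recommendations, whereas the function above only looks up the precomputed `top_n` dictionary. A sketch of a variant with that signature, which scores only the games the user has not reviewed yet and avoids building the anti-testset for every user. It assumes a model fitted with Surprise and relies on `model.trainset`, `Trainset.to_inner_uid`, `Trainset.ur`, `Trainset.all_items`, `Trainset.to_raw_iid` and `model.predict(...).est`; the function name is hypothetical:

```python
def recommend_games_for_user_sketch(username, model, n=10):
    """Top-n unseen games for `username`, scored with a fitted Surprise model.

    Returns a list of (product_id, estimated rating) pairs, best first.
    """
    trainset = model.trainset  # set by Surprise's fit()
    try:
        inner_uid = trainset.to_inner_uid(username)
    except ValueError:
        return []  # user not in the training data: no history to personalise on

    # Inner ids of the games this user already rated (reviewed).
    seen = {inner_iid for inner_iid, _ in trainset.ur[inner_uid]}

    scored = []
    for inner_iid in trainset.all_items():
        if inner_iid in seen:
            continue
        raw_iid = trainset.to_raw_iid(inner_iid)
        scored.append((raw_iid, model.predict(username, raw_iid).est))

    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]
```

Usage in this notebook would be `recommend_games_for_user_sketch('S-Zachel', model, n=5)`.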


```python
recomend_games_for_user('S-Zachel')
```

```text
[(346110, 0.9237605965808111),
 (346900, 0.9120861331317033),
 (440900, 0.904827413142782),
 (231430, 0.8997169062284731),
 (363970, 0.8964866480534623),
 (4000, 0.8955804189012952),
 (214950, 0.8906655333454503),
 (226860, 0.8893000594882192),
 (316010, 0.8868024043135004),
 (48700, 0.8855440916283479)]
```

```python
recomend_games_for_user('Tord')
```

```text
[(346110, 0.937597961445493),
 (346900, 0.9259234979963853),
 (440900, 0.9186647780074642),
 (231430, 0.9135542710931551),
 (363970, 0.9103240129181442),
 (4000, 0.9094177837659773),
 (214950, 0.9045028982101323),
 (226860, 0.9031374243529012),
 (316010, 0.9006397691781824),
 (48700, 0.8993814564930299)]
```
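The two lists above contain the same games in the same order, and every score for *Tord* is higher than the one for *S-Zachel* by the same amount (about 0.0138). This is expected from `BaselineOnly`, which predicts $\hat r_{ui} = \mu + b_u + b_i$: for any user, the ordering of candidate games depends only on the item biases $b_i$, and $b_u$ only shifts all scores. (*Johnny Hammerstix*'s list differs only because game 346110 is one of his sampled reviews, so it is excluded from his anti-testset.) A toy numeric illustration with random biases:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 0.8                            # global mean (toy value)
b_u = rng.normal(0, 0.05, size=3)   # user biases for 3 toy users
b_i = rng.normal(0, 0.05, size=6)   # item biases for 6 toy items

# Baseline estimate for every (user, item) pair: mu + b_u + b_i
r_hat = mu + b_u[:, None] + b_i[None, :]

rankings = np.argsort(-r_hat, axis=1)  # best item first, per user
print(rankings)                        # every row is the same permutation
print(r_hat[0] - r_hat[1])             # constant gap equal to b_u[0] - b_u[1]
```

Per-user rankings therefore require a model with user-item interaction terms, for example the `KNNWithMeans` algorithm already imported above.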

## Exercise 3 - Game-Based Recommender System

Similar to the previous case, except that this system takes the name of a game as input and returns a list of similar games. The system must be built on the games' content information (i.e., content-based filtering or a hybrid system).

```python
# TODO: complete
```
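As a possible starting point (a sketch, not the author's solution), each game can be represented by a multi-hot vector of its `genres`, `tags` and `specs` labels, and other games ranked by cosine similarity to the query game. The function names are hypothetical, titles are matched exactly against `app_name`, and all labels are weighted equally; TF-IDF weighting or adding `developer`/`publisher` would be natural refinements. For the full table of about 32,000 games, a sparse matrix (e.g. `scipy.sparse`) would use less memory than the dense array used here.

```python
import numpy as np
import pandas as pd


def build_content_matrix(games, columns=("genres", "tags", "specs")):
    """Multi-hot matrix with one row per game and one column per distinct label.

    Labels are prefixed with their column ("tags:Strategy" differs from
    "genres:Strategy"); missing values (NaN/None) contribute no labels.
    """
    label_sets = []
    for record in games.to_dict("records"):
        labels = set()
        for col in columns:
            value = record.get(col)
            if isinstance(value, list):
                labels.update(f"{col}:{item}" for item in value)
        label_sets.append(labels)

    vocab = sorted(set().union(*label_sets))
    position = {label: j for j, label in enumerate(vocab)}
    matrix = np.zeros((len(label_sets), len(vocab)))
    for row, labels in enumerate(label_sets):
        for label in labels:
            matrix[row, position[label]] = 1.0
    return matrix


def similar_games(title, games, matrix, n=5):
    """Up to n (app_name, cosine similarity) pairs most similar to `title`."""
    matches = np.flatnonzero(games["app_name"].to_numpy() == title)
    if matches.size == 0:
        return []  # no exact app_name match
    q = matches[0]
    query = matrix[q]
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query)
    dots = matrix @ query
    # Cosine similarity; games without any label get similarity 0.
    sims = np.divide(dots, norms, out=np.zeros(len(dots)), where=norms > 0)
    sims[q] = -np.inf  # never recommend the query game itself
    best = np.argsort(-sims)[: min(n, len(sims) - 1)]
    return [(games["app_name"].iloc[j], float(sims[j])) for j in best]
```

Usage on the notebook data would be `m = build_content_matrix(games_df)` followed by `similar_games('Euro Truck Simulator 2', games_df, m)`.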