# Introduction, objectives, and contents

This work covers the ETL (Extraction, Transformation and Loading) phase. The goal of this phase is to produce clean datasets that are ready to be used in later phases of the project.

Contents:
* Importing libraries
* Loading data
* Data preparation for each dataset
    * Feature engineering
    * Data type checks
    * Duplicate values
    * Null values
* Exporting the clean datasets
* Building and exporting dataframes for the API endpoints

# Importing libraries

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib as mpl
from matplotlib import pyplot as plt
from math import factorial
from scipy import stats as st
import json
import gzip
import ast
from pandas import json_normalize
from textblob import TextBlob
import re

# Loading data

There are 3 datasets in total, in compressed JSON format, so each one is loaded separately to apply the handling it needs.

GAMES dataset: this file could be loaded as uncompressed JSON, so its loading code is simple.

In [2]:
steam_games = pd.read_json('steam_games.json', lines=True)

REVIEWS dataset: because this dataset has a less standardized structure, each line of the file has to be parsed individually. The parsed lines are collected in a list, which is then turned into a dataframe.

In [3]:
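dataset_list_reviews = []
with gzip.open('user_reviews.json.gz', 'rb') as file:
    for line in file:
        # Parse each line as a Python literal and collect the resulting dict
        dataset_list_reviews.append(ast.literal_eval(line.decode('utf-8')))
# The with block closes the file on exit, so no explicit close() is needed
user_reviews = pd.DataFrame(dataset_list_reviews)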
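
The use of `ast.literal_eval` suggests that the lines are written as Python literals (single quotes, `True` instead of `true`) rather than strict JSON. Under that assumption, the sketch below shows why `json.loads` would reject such a line while `ast.literal_eval` accepts it. The sample line is made up, and the gzip stream is built in memory so that no file is needed.

```python
import ast
import gzip
import io
import json

# Hypothetical sample line in Python-literal style (not taken from the dataset).
sample_line = b"{'user_id': 'u1', 'reviews': [{'item_id': '10', 'recommend': True}]}\n"

# Build a small gzip stream in memory instead of reading a file from disk.
buffer = io.BytesIO()
with gzip.open(buffer, 'wb') as gz:
    gz.write(sample_line)
buffer.seek(0)

records = []
json_failures = 0
with gzip.open(buffer, 'rb') as gz:
    for raw in gz:
        text = raw.decode('utf-8')
        try:
            json.loads(text)              # strict JSON requires double quotes
        except json.JSONDecodeError:
            json_failures += 1
        records.append(ast.literal_eval(text))  # accepts Python literal syntax
```

Because `ast.literal_eval` only evaluates literals, it is a safer choice than `eval` for this kind of input.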

ITEMS dataset: because this dataset also has a less standardized structure, each line of the file has to be parsed individually. The parsed lines are collected in a list, which is then turned into a dataframe.

In [4]:
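dataset_list_items = []
with gzip.open('users_items.json.gz', 'rb') as file:
    for line in file:
        # Parse each line as a Python literal and collect the resulting dict
        dataset_list_items.append(ast.literal_eval(line.decode('utf-8')))
# The with block closes the file on exit, so no explicit close() is needed
user_items = pd.DataFrame(dataset_list_items)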

# Data preparation

## GAMES dataset

### Feature engineering - GAMES dataset

In [5]:
df_games = steam_games
df_games.sample(2)

Unnamed: 0,publisher,genres,app_name,title,url,release_date,tags,reviews_url,specs,price,early_access,id,developer
95201,Flox Studios Ltd.,"[Action, Indie, RPG]",Frozen State,Frozen State,http://store.steampowered.com/app/270270/Froze...,2016-08-05,"[Survival, RPG, Horror, Zombies, Action, Craft...",http://steamcommunity.com/app/270270/reviews/?...,"[Single-player, Steam Achievements, Full contr...",11.99,0.0,270270.0,Flox Studios Ltd.
38369,,,,,,,,,,,,,


In [6]:
# Rename the "id" field
df_games.rename(columns={'id': 'item_id'}, inplace=True)

In [7]:
# Explode the fields whose values are lists
df_games = df_games.explode('genres')
df_games = df_games.explode('tags')
df_games = df_games.explode('specs')
df_games.sample(5)

Unnamed: 0,publisher,genres,app_name,title,url,release_date,tags,reviews_url,specs,price,early_access,item_id,developer
116383,,RPG,Fantasy Grounds - C&C: A5 The Shattered Horn,Fantasy Grounds - C&amp;C: A5 The Shattered Horn,http://store.steampowered.com/app/332020/Fanta...,2014-11-04,Strategy,http://steamcommunity.com/app/332020/reviews/?...,Downloadable Content,3.99,0.0,332020.0,"SmiteWorks USA, LLC"
90737,Winter Wolves,Indie,Heileen 3: New Horizons,Heileen 3: New Horizons,http://store.steampowered.com/app/305490/Heile...,2012-12-12,Casual,http://steamcommunity.com/app/305490/reviews/?...,Single-player,24.99,0.0,305490.0,Winter Wolves
105719,Crayder Studios,Indie,Wishmere,Wishmere,http://store.steampowered.com/app/419020/Wishm...,2017-09-26,Pixel Graphics,http://steamcommunity.com/app/419020/reviews/?...,Steam Trading Cards,11.99,0.0,419020.0,Crayder Studios
103494,GamersHype Productions,Indie,Box Maze 2 - Unlock All Levels,Box Maze 2 - Unlock All Levels,http://store.steampowered.com/app/714320/Box_M...,2017-10-03,Adventure,http://steamcommunity.com/app/714320/reviews/?...,Steam Cloud,0.99,0.0,714320.0,GamersHype Productions
113036,Quandary Solutions LTD,Action,Dexterity Ball 3D™,Dexterity Ball 3D™,http://store.steampowered.com/app/403680/Dexte...,2015-12-07,Multiplayer,http://steamcommunity.com/app/403680/reviews/?...,Multi-player,6.99,0.0,403680.0,Quandary Solutions LTD
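
Exploding three list-valued columns in sequence yields one row per (genre, tag, spec) combination for each game, which is why the row count grows so sharply at this step. A minimal plain-Python sketch of that effect, using a made-up game record rather than real data:

```python
from itertools import product

# Hypothetical record; the values are illustrative, not taken from the dataset.
game = {
    'item_id': 1,
    'genres': ['Action', 'Indie'],
    'tags': ['Survival', 'RPG', 'Horror'],
    'specs': ['Single-player', 'Steam Cloud'],
}

# Three sequential explodes behave like a Cartesian product of the lists.
exploded = [
    {'item_id': game['item_id'], 'genres': g, 'tags': t, 'specs': s}
    for g, t, s in product(game['genres'], game['tags'], game['specs'])
]

# Keeping only item_id and genres leaves many repeated rows.
projected = [(row['item_id'], row['genres']) for row in exploded]
unique_rows = sorted(set(projected))
```

One option, not applied in this notebook, would be to select the needed columns before exploding, so that only `genres` is expanded and the repeated rows are never created.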


In [8]:
# Add the "year" field
default_date = pd.to_datetime('1900-01-01')  # Sentinel date that replaces invalid values in 'release_date'
df_games['release_date'] = pd.to_datetime(df_games['release_date'], errors='coerce').fillna(default_date)  # The column is already datetime after this step

df_games['year'] = df_games['release_date'].dt.year
df_games = df_games[df_games['year'] != 1900]  # Drop the rows that received the sentinel date
df_games.sample(5)

Unnamed: 0,publisher,genres,app_name,title,url,release_date,tags,reviews_url,specs,price,early_access,item_id,developer,year
119459,"Devolver Digital, Croteam",Indie,Serious Sam Double D XXL,Serious Sam Double D XXL,http://store.steampowered.com/app/111600/Serio...,2011-08-30,Gore,http://steamcommunity.com/app/111600/reviews/?...,Local Co-op,9.99,0.0,111600.0,Mommy's Best Games,2011
107756,Salmi Games,Indie,Ellipsis,Ellipsis,http://store.steampowered.com/app/514620/Ellip...,2017-01-25,Bullet Hell,http://steamcommunity.com/app/514620/reviews/?...,Full controller support,9.99,0.0,514620.0,Salmi Games,2017
100990,Nezon Production,Action,Formata,Formata,http://store.steampowered.com/app/580040/Formata/,2017-12-07,War,http://steamcommunity.com/app/580040/reviews/?...,Steam Achievements,14.99,0.0,580040.0,Nezon Production,2017
118606,The Men Who Wear Many Hats,Strategy,Organ Trail: Director's Cut,Organ Trail: Director's Cut,http://store.steampowered.com/app/233740/Organ...,2013-03-19,Horror,http://steamcommunity.com/app/233740/reviews/?...,Steam Achievements,4.99,0.0,233740.0,The Men Who Wear Many Hats,2013
90484,The Behemoth,Action,BattleBlock Theater®,BattleBlock Theater®,http://store.steampowered.com/app/238460/Battl...,2014-05-15,Singleplayer,http://steamcommunity.com/app/238460/reviews/?...,Full controller support,14.99,0.0,238460.0,The Behemoth,2014


In [9]:
# Drop the fields that will not be used
#df_games_eliminarcampos = ['url', 'title','release_date', 'reviews_url', 'specs', ]
# df_games = df_games.drop(df_games_eliminarcampos, axis=1)

In [10]:
# Select the fields to use
df_games = df_games[['item_id', 'app_name', 'genres', 'year', 'price', 'developer']]
df_games.sample(5)

Unnamed: 0,item_id,app_name,genres,year,price,developer
115137,367380.0,One Manga Day - Bonus Content,Indie,2015,3.99,DeXP
113483,390920.0,Army of Pixels,Strategy,2015,4.99,"Gergely Zsolnay,Richard Markos"
108658,559340.0,Sunset Rangers,Early Access,2016,14.99,www.fishermangamedev.com
103648,717030.0,Kritika Online: Free Elite Player's Pack,RPG,2017,Free,ALLM
117615,265930.0,Goat Simulator,Simulation,2014,9.99,Coffee Stain Studios


### Data type checks - GAMES dataset

In [11]:
df_games.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1983295 entries, 88310 to 120443
Data columns (total 6 columns):
 #   Column     Dtype  
---  ------     -----  
 0   item_id    float64
 1   app_name   object 
 2   genres     object 
 3   year       int32  
 4   price      object 
 5   developer  object 
dtypes: float64(1), int32(1), object(4)
memory usage: 98.4+ MB


In [12]:
# Data type conversion

df_games['price'] = pd.to_numeric(df_games['price'], errors='coerce')  # Convert to numeric; non-numeric values such as 'Free' become NaN
#df_games['early_access'] = df_games['early_access'].astype(bool)       # Convert to boolean
df_games['item_id'] = pd.to_numeric(df_games['item_id'], errors='coerce')        # Convert to numeric, coercing errors to NaN
df_games['item_id'] = df_games['item_id'].fillna(0)                    # Assign the result back instead of using inplace on a column selection
df_games['item_id'] = df_games['item_id'].astype(int)                  # Convert to integer
df_games.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1983295 entries, 88310 to 120443
Data columns (total 6 columns):
 #   Column     Dtype  
---  ------     -----  
 0   item_id    int64  
 1   app_name   object 
 2   genres     object 
 3   year       int32  
 4   price      float64
 5   developer  object 
dtypes: float64(1), int32(1), int64(1), object(3)
memory usage: 98.4+ MB
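
With `errors='coerce'`, every price that cannot be read as a number becomes NaN, including labels such as `Free` seen in an earlier sample. The sketch below is a rough plain-Python analogue of that behaviour, not the pandas implementation; `coerce_price` is a hypothetical helper and the input values are illustrative.

```python
import math

def coerce_price(value):
    """Return value as a float, or NaN when it cannot be parsed as a number."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan

# 'Free' appears in the price column; the other values are made up.
raw_prices = [4.99, '14.99', 'Free', None]
coerced = [coerce_price(p) for p in raw_prices]
```

If free titles should count as price 0 rather than as missing, the corresponding labels would have to be mapped before the conversion. Which labels mean "free" is something to check against the data.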


In [13]:
df_games.sample(5)

Unnamed: 0,item_id,app_name,genres,year,price,developer
114686,369560,The Story Goes On,Action,2015,4.99,Scarecrow Arts
117928,261400,Episode 11 - King Midas,Casual,2014,0.99,Spicyhorse
108518,540330,Hunger Dungeon Deluxe Edition + Sound Track,Action,2016,11.99,Buka Game Studio
118532,227400,Darkfall Unholy Wars,Indie,2013,,Aventurine SA
95860,507010,Mr.President!,Adventure,2016,4.99,Game Developer X


### Duplicate value checks - GAMES dataset

In [14]:
df_games.duplicated().sum()

1911358

In [15]:
df_games = df_games.drop_duplicates().reset_index(drop=True)
df_games = df_games.drop_duplicates(subset=['item_id'])     # Keep only the first row for each "item_id"
df_games.info()

<class 'pandas.core.frame.DataFrame'>
Index: 29782 entries, 0 to 71935
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   item_id    29782 non-null  int64  
 1   app_name   29781 non-null  object 
 2   genres     28548 non-null  object 
 3   year       29782 non-null  int32  
 4   price      27210 non-null  float64
 5   developer  28532 non-null  object 
dtypes: float64(1), int32(1), int64(1), object(3)
memory usage: 1.5+ MB


### Null value checks - GAMES dataset

In [16]:
df_games.isnull().sum()

item_id         0
app_name        1
genres       1234
year            0
price        2572
developer    1250
dtype: int64

Here it is hard to identify a pattern in the null values that would explain why they occur. Given what this dataset is needed for, we drop the records that have null values in the dataset's main fields (`genres` and `app_name`).

In [17]:
df_games = df_games.dropna(subset=['genres', 'app_name'])

In [18]:
df_games.isnull().sum()

item_id         0
app_name        0
genres          0
year            0
price        2502
developer     169
dtype: int64

## REVIEWS dataset

### Feature engineering - REVIEWS dataset

In [19]:
df_reviews = user_reviews
df_reviews.sample(2)

Unnamed: 0,user_id,user_url,reviews
10839,76561198052516588,http://steamcommunity.com/profiles/76561198052...,"[{'funny': '1 person found this review funny',..."
24569,76561198094722660,http://steamcommunity.com/profiles/76561198094...,"[{'funny': '', 'posted': 'Posted June 14, 2014..."


In [20]:
# Explode the "reviews" field

df_reviews = user_reviews.explode('reviews')
df_reviews = pd.concat([df_reviews.drop(['reviews'], axis=1), df_reviews['reviews'].apply(pd.Series)], axis=1)
df_reviews.sample(5)

Unnamed: 0,user_id,user_url,funny,posted,last_edited,item_id,helpful,recommend,review,0
772,76561198038374904,http://steamcommunity.com/profiles/76561198038...,32 people found this review funny,Posted June 18.,,450660,83 of 106 people (78%) found this review helpful,True,You get to play as Ron Pearlman.What more do y...,
7203,76561198049796309,http://steamcommunity.com/profiles/76561198049...,,"Posted November 29, 2014.",,298630,No ratings yet,True,Bloody Brilliant little game and has already t...,
19538,dearcatherine,http://steamcommunity.com/id/dearcatherine,,"Posted June 20, 2015.",,293680,2 of 2 people (100%) found this review helpful,True,"95% dialogue, would like to see more animation...",
1375,lazrknight,http://steamcommunity.com/id/lazrknight,,"Posted May 3, 2014.",,72850,2 of 3 people (67%) found this review helpful,True,A game of pure epicness. Check it out if u hav...,
17629,STEAM0082987612,http://steamcommunity.com/id/STEAM0082987612,,"Posted May 26, 2014.",,4000,0 of 2 people (0%) found this review helpful,True,"If you don't own this game, I feel sorry for you",


In [21]:
# Add the "posted_date" and "year" fields

def extract_posted_date(posted_str):         # Extract the date from the "posted" field
    pattern = r'Posted (\w+ \d{1,2}, \d{4})' # Observed pattern: month, day, year
    match = re.search(pattern, posted_str)
    if match:
        return match.group(1)
    else:
        return None

# Apply the function to extract the date from the "posted" field
df_reviews['posted_date'] = df_reviews['posted'].apply(lambda x: np.nan if pd.isna(x) else extract_posted_date(x))

df_reviews['posted_date'] = pd.to_datetime(df_reviews['posted_date'])
df_reviews['year'] = df_reviews['posted_date'].dt.year
df_reviews['year'] = df_reviews['year'].fillna(0)
df_reviews['year'] = df_reviews['year'].astype(int)

df_reviews.sample(2)

Unnamed: 0,user_id,user_url,funny,posted,last_edited,item_id,helpful,recommend,review,0,posted_date,year
21012,maarkthe,http://steamcommunity.com/id/maarkthe,,"Posted June 26, 2014.",,8930,No ratings yet,True,Esse é um jogo de estratégia nota 10 para se j...,,2014-06-26,2014
2119,fgtkms,http://steamcommunity.com/id/fgtkms,1 person found this review funny,"Posted January 22, 2015.",,222880,1 of 1 people (100%) found this review helpful,True,Good Game. 11/10 IGN,,2015-01-22,2015
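
The pattern only matches strings that include a year. Values such as `Posted June 18.`, which appear in the exploded data, do not match, so they end up as `NaT` and then as year 0 after `fillna(0)`. A small standard-library sketch of that behaviour follows; `extract_year` is a hypothetical helper, not part of the notebook.

```python
import re
from datetime import datetime

PATTERN = re.compile(r'Posted (\w+ \d{1,2}, \d{4})')

def extract_year(posted_str):
    """Return the year from a 'Posted <Month> <day>, <year>.' string, or 0 if absent."""
    match = PATTERN.search(posted_str)
    if match is None:
        return 0
    # Parse the captured text, for example 'June 26, 2014'.
    return datetime.strptime(match.group(1), '%B %d, %Y').year

with_year = extract_year('Posted June 26, 2014.')
without_year = extract_year('Posted June 18.')
```

Any per-year aggregation built on this field would need to decide how to treat the rows whose year is 0.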


In [22]:
# Sentiment analysis based on the "review" field

df_reviews['review'] = df_reviews['review'].astype(str)
df_reviews['polarity'] = df_reviews['review'].apply(lambda text: TextBlob(text).sentiment.polarity)
df_reviews['sentiment'] = pd.cut(df_reviews['polarity'], bins=[-float('inf'), -0.001, 0.0, float('inf')], labels=[0, 1, 2])
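
The `pd.cut` call uses right-closed bins by default, so the three labels correspond to polarity at or below -0.001 (0, negative), polarity above -0.001 and up to 0.0 (1, neutral), and polarity above 0.0 (2, positive). A plain-Python sketch of the same mapping, with `sentiment_label` as a hypothetical helper:

```python
def sentiment_label(polarity):
    """Map a polarity score to 0 (negative), 1 (neutral) or 2 (positive)."""
    if polarity <= -0.001:
        return 0
    if polarity <= 0.0:
        return 1
    return 2

# Illustrative polarity values, including the bin edges.
labels = [sentiment_label(p) for p in (-0.4, -0.001, -0.0005, 0.0, 0.25)]
```

Note that very small negative scores, between -0.001 and 0, fall into the neutral bucket under these bins.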

In [23]:
df_reviews.sample(5)

Unnamed: 0,user_id,user_url,funny,posted,last_edited,item_id,helpful,recommend,review,0,posted_date,year,polarity,sentiment
17818,qwertysmosh,http://steamcommunity.com/id/qwertysmosh,,Posted July 26.,,730,No ratings yet,True,Great game of my late great great grand dad.Pl...,,NaT,0,0.255579,2
8097,76561198084406709,http://steamcommunity.com/profiles/76561198084...,,Posted August 29.,,264710,No ratings yet,True,"Really like the game, could have a lower level...",,NaT,0,-0.1,0
7244,23745736,http://steamcommunity.com/id/23745736,,"Posted June 15, 2015.",,252490,No ratings yet,True,just one thing. LAGGGGGGGGGGGGGGGGGGGGGGGGGGGG...,,2015-06-15,2015,0.0,1
6452,nova_prime,http://steamcommunity.com/id/nova_prime,,"Posted December 21, 2014.",Last edited January 13.,230410,2 of 4 people (50%) found this review helpful,True,"Good gunplay, good frames, fun abilities, tons...",,2014-12-21,2014,0.20404,2
21662,coil71,http://steamcommunity.com/id/coil71,1 person found this review funny,"Posted December 4, 2013.",,230410,2 of 4 people (50%) found this review helpful,True,I like Cats,,2013-12-04,2013,0.0,1


In [24]:
# Select the fields to use
df_reviews = df_reviews[['item_id', 'user_id', 'recommend', 'year', 'polarity', 'sentiment']]
df_reviews.sample(5)

Unnamed: 0,item_id,user_id,recommend,year,polarity,sentiment
6096,230410,76561198094782691,True,2014,0.15,2
16391,313120,76561198020061745,True,2015,-0.4,0
4310,401920,MattBowles,True,2015,0.0,1
24062,4000,mog1,True,2014,0.35,2
18492,220,76561197977981554,True,0,0.0,1


### Data type checks - REVIEWS dataset

In [25]:
df_reviews.info()

<class 'pandas.core.frame.DataFrame'>
Index: 59333 entries, 0 to 25798
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype   
---  ------     --------------  -----   
 0   item_id    59305 non-null  object  
 1   user_id    59333 non-null  object  
 2   recommend  59305 non-null  object  
 3   year       59333 non-null  int64   
 4   polarity   59333 non-null  float64 
 5   sentiment  59333 non-null  category
dtypes: category(1), float64(1), int64(1), object(3)
memory usage: 2.8+ MB


In [26]:
# Data type conversion
df_reviews['item_id'] = pd.to_numeric(df_reviews['item_id'], errors='coerce')
df_reviews['recommend'] = df_reviews['recommend'].astype(bool)
df_reviews['sentiment'] = pd.to_numeric(df_reviews['sentiment'], errors='coerce')
df_reviews.info()

<class 'pandas.core.frame.DataFrame'>
Index: 59333 entries, 0 to 25798
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   item_id    59305 non-null  float64
 1   user_id    59333 non-null  object 
 2   recommend  59333 non-null  bool   
 3   year       59333 non-null  int64  
 4   polarity   59333 non-null  float64
 5   sentiment  59333 non-null  int64  
dtypes: bool(1), float64(2), int64(2), object(1)
memory usage: 2.8+ MB


### Duplicate value checks - REVIEWS dataset

In [27]:
df_reviews.duplicated().sum()

874

In [28]:
# Remove duplicates
df_reviews = df_reviews.drop_duplicates().reset_index(drop=True)
df_reviews.duplicated().sum()

0

### Null value checks - REVIEWS dataset

In [29]:
df_reviews.isnull().sum()

item_id      28
user_id       0
recommend     0
year          0
polarity      0
sentiment     0
dtype: int64

In [30]:
df_reviews = df_reviews.dropna(subset=['item_id'])
df_reviews.isnull().sum()

item_id      0
user_id      0
recommend    0
year         0
polarity     0
sentiment    0
dtype: int64

## USAGE dataset

### Feature engineering - USAGE dataset

In [31]:
df_usage = user_items
df_usage.sample(2)

Unnamed: 0,user_id,items_count,steam_id,user_url,items
71679,76561198077073196,4,76561198077073196,http://steamcommunity.com/profiles/76561198077...,"[{'item_id': '4000', 'item_name': 'Garry's Mod..."
56116,76561198055786959,19,76561198055786959,http://steamcommunity.com/profiles/76561198055...,"[{'item_id': '6060', 'item_name': 'STAR WARS™ ..."


In [32]:
# Explode the "items" field

df_usage = user_items.explode('items')

df_usage = df_usage.reset_index(drop=True)
def obtener_elemento(diccionario, clave_busqueda):
    if isinstance(diccionario, dict):
        return diccionario.get(clave_busqueda)
    else:
        return diccionario

# Each field is extracted separately to avoid excessive processing time
df_usage['item_id'] = df_usage['items'].apply(lambda x: obtener_elemento(x, 'item_id'))
df_usage['item_name'] = df_usage['items'].apply(lambda x: obtener_elemento(x, 'item_name'))
df_usage['playtime_forever'] = df_usage['items'].apply(lambda x: obtener_elemento(x, 'playtime_forever'))
df_usage['playtime_2weeks'] = df_usage['items'].apply(lambda x: obtener_elemento(x, 'playtime_2weeks'))

df_usage.sample(2)

Unnamed: 0,user_id,items_count,steam_id,user_url,items,item_id,item_name,playtime_forever,playtime_2weeks
3573797,76561197971402201,50,76561197971402201,http://steamcommunity.com/profiles/76561197971...,"{'item_id': '265930', 'item_name': 'Goat Simul...",265930,Goat Simulator,290.0,0.0
922600,76561198041784567,133,76561198041784567,http://steamcommunity.com/profiles/76561198041...,"{'item_id': '47810', 'item_name': 'Dragon Age:...",47810,Dragon Age: Origins - Ultimate Edition,71.0,0.0
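
The helper returns non-dict inputs unchanged. Assuming that users with an empty `items` list become NaN after `explode` (the documented pandas behaviour for empty lists), that NaN flows into every extracted column, which would explain matching null counts in `item_id` and `playtime_forever`. The sketch below reproduces the helper and runs it on illustrative inputs.

```python
import math

def obtener_elemento(diccionario, clave_busqueda):
    # Same logic as the notebook helper: look up the key on dicts,
    # pass any other value through unchanged.
    if isinstance(diccionario, dict):
        return diccionario.get(clave_busqueda)
    else:
        return diccionario

# Illustrative inputs: one item dict and one NaN placeholder.
item = {'item_id': '4000', 'item_name': "Garry's Mod", 'playtime_forever': 290}
from_dict = obtener_elemento(item, 'item_id')
missing_key = obtener_elemento(item, 'playtime_2weeks')
from_nan = obtener_elemento(math.nan, 'item_id')
```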


In [33]:
# Select the fields to use

df_usage = df_usage[['item_id', 'user_id', 'playtime_forever']]
df_usage.sample(2)

Unnamed: 0,item_id,user_id,playtime_forever
378357,440950,rawrvixen,0.0
4538272,253980,Aleatorias,0.0


### Data type checks - USAGE dataset


In [34]:
df_usage.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5170015 entries, 0 to 5170014
Data columns (total 3 columns):
 #   Column            Dtype  
---  ------            -----  
 0   item_id           object 
 1   user_id           object 
 2   playtime_forever  float64
dtypes: float64(1), object(2)
memory usage: 118.3+ MB


In [35]:
# Data type conversion
df_usage['item_id'] = pd.to_numeric(df_usage['item_id'], errors='coerce')
df_usage.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5170015 entries, 0 to 5170014
Data columns (total 3 columns):
 #   Column            Dtype  
---  ------            -----  
 0   item_id           float64
 1   user_id           object 
 2   playtime_forever  float64
dtypes: float64(2), object(1)
memory usage: 118.3+ MB


### Duplicate value checks - USAGE dataset

In [36]:
df_usage.duplicated().sum()

59209

In [37]:
# Remove duplicates
df_usage = df_usage.drop_duplicates().reset_index(drop=True)
df_usage.duplicated().sum()

0

### Null value checks - USAGE dataset

In [38]:
df_usage.isnull().sum()

item_id             16714
user_id                 0
playtime_forever    16714
dtype: int64

In [39]:
df_usage = df_usage.dropna(subset=['item_id'])
df_usage.isnull().sum()

item_id             0
user_id             0
playtime_forever    0
dtype: int64

# Exporting the clean datasets

In [40]:
df_games.to_csv('df_games.csv', index=False)
df_reviews.to_csv('df_reviews.csv', index=False)
df_usage.to_csv('df_usage.csv', index=False)

# Building and exporting dataframes for the endpoints

## Samples

In [41]:
df_games.sample(2)

Unnamed: 0,item_id,app_name,genres,year,price,developer
49937,510050,You Have 10 Seconds,Free to Play,2016,,tamationgames
48706,529190,On A Roll 3D - Soundtrack,Action,2016,,Battenberg Software


In [42]:
df_reviews.sample(2)

Unnamed: 0,item_id,user_id,recommend,year,polarity,sentiment
44141,93200.0,burningfeetman,True,2011,0.23,2
10838,440.0,YumiHayashibiara,True,2014,-0.4,0


In [43]:
df_usage.sample(2)

Unnamed: 0,item_id,user_id,playtime_forever
2985623,281370.0,mixadance,31.0
1788295,220260.0,purplecubefruit,0.0


## Endpoint 1


def PlayTimeGenre(genre: str): Must return the year with the most hours played for the given genre.

Example return value: {"Año de lanzamiento con más horas jugadas para Género X" : 2013}

In [44]:
df_e1 = pd.merge(df_games, df_usage, on="item_id", how="inner")
df_e1 = df_e1[['genres', 'year','playtime_forever']]

# Index of the row with the maximum playtime_forever within each genre
df_e1_indmax = df_e1.groupby('genres')['playtime_forever'].idxmax()
# Use those indices to look up the corresponding years
df_e1 = df_e1.loc[df_e1_indmax, ['genres', 'year', 'playtime_forever']]
# Keep only the genre and the year of that row

df_e1 = df_e1[['genres', 'year']]
df_e1.to_csv('df_e1.csv', index=False)
df_e1

Unnamed: 0,genres,year
847814,Action,2012
1363800,Adventure,2015
1346922,Animation &amp; Modeling,2015
2389134,Audio Production,2014
3381479,Casual,2011
2905284,Design &amp; Illustration,2012
2273948,Early Access,2014
1054235,Education,2014
3253181,Free to Play,2012
27418,Indie,2006
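
The cell above selects, for each genre, the year of the single row with the highest `playtime_forever`. If the endpoint is meant to return the year with the most hours in total, the hours would have to be summed per genre and year first. A minimal plain-Python sketch of that alternative follows, using made-up rows and a hypothetical function name `play_time_genre`.

```python
from collections import defaultdict

# Made-up (genre, release_year, playtime) rows standing in for the merged data.
rows = [
    ('Action', 2012, 500),
    ('Action', 2013, 300),
    ('Action', 2013, 300),
    ('Indie', 2006, 50),
]

def play_time_genre(rows, genre):
    """Return the release year with the largest summed playtime for a genre."""
    totals = defaultdict(float)
    for row_genre, year, playtime in rows:
        if row_genre == genre:
            totals[year] += playtime
    if not totals:
        return None  # genre not present in the data
    return max(totals, key=totals.get)

best_action_year = play_time_genre(rows, 'Action')
```

In this example the largest single row belongs to 2012, but 2013 accumulates more hours overall, so the two approaches give different answers.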


## Endpoint 2

def UserForGenre( genero : str ): Must return the user with the most accumulated hours played for the given genre, together with a list of the accumulated hours played per year.

Example return value: {"Usuario con más horas jugadas para Género X" : us213ndjss09sdf, "Horas jugadas":[{Año: 2013, Horas: 203}, {Año: 2012, Horas: 100}, {Año: 2011, Horas: 23}]}

In [45]:
df_e2 = pd.merge(df_games, df_usage, on="item_id", how="inner")
df_e2 = df_e2[['genres', 'year', 'user_id','playtime_forever']]

# Index of the row with the maximum playtime_forever within each genre
df_e2_indmax = df_e2.groupby('genres')['playtime_forever'].idxmax()
# Use those indices to look up the corresponding user and year
df_e2 = df_e2.loc[df_e2_indmax, ['genres', 'year', 'user_id', 'playtime_forever']]



In [46]:
df_e2_users = df_e2[['genres', 'user_id']]
df_e2_users.to_csv('df_e2_users.csv', index=False)
df_e2_users

Unnamed: 0,genres,user_id
847814,Action,Evilutional
1363800,Adventure,idonothack
1346922,Animation &amp; Modeling,ScottyG555
2389134,Audio Production,Lickidactyl
3381479,Casual,tsunamitad
2905284,Design &amp; Illustration,76561198035718256
2273948,Early Access,76561198084846677
1054235,Education,SeedyDog
3253181,Free to Play,76561198063368177
27418,Indie,wolop


In [47]:
df_e2_playtime = df_e2[['genres', 'year', 'user_id', 'playtime_forever']]
df_e2_playtime.to_csv('df_e2_playtime.csv', index=False)
df_e2_playtime

Unnamed: 0,genres,year,user_id,playtime_forever
847814,Action,2012,Evilutional,635295.0
1363800,Adventure,2015,idonothack,333482.0
1346922,Animation &amp; Modeling,2015,ScottyG555,168314.0
2389134,Audio Production,2014,Lickidactyl,109916.0
3381479,Casual,2011,tsunamitad,600068.0
2905284,Design &amp; Illustration,2012,76561198035718256,102554.0
2273948,Early Access,2014,76561198084846677,1241.0
1054235,Education,2014,SeedyDog,3082.0
3253181,Free to Play,2012,76561198063368177,439912.0
27418,Indie,2006,wolop,642773.0
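
As with Endpoint 1, the cells above keep one row per genre (the single largest `playtime_forever`). The endpoint description asks for accumulated hours, which would mean summing per user first and then breaking that user's hours down by year. A plain-Python sketch under that reading follows, with made-up rows and a hypothetical function name `user_for_genre`.

```python
from collections import defaultdict

# Made-up (genre, release_year, user_id, playtime) rows standing in for the merged data.
rows = [
    ('Action', 2012, 'user_a', 100),
    ('Action', 2013, 'user_a', 50),
    ('Action', 2012, 'user_b', 120),
    ('Indie', 2006, 'user_b', 900),
]

def user_for_genre(rows, genre):
    """Return the user with the most summed playtime for a genre and that user's hours per year."""
    per_user = defaultdict(float)
    for row_genre, _, user, playtime in rows:
        if row_genre == genre:
            per_user[user] += playtime
    if not per_user:
        return None, {}
    top_user = max(per_user, key=per_user.get)

    # Second pass: accumulate the top user's playtime per year.
    per_year = defaultdict(float)
    for row_genre, year, user, playtime in rows:
        if row_genre == genre and user == top_user:
            per_year[year] += playtime
    return top_user, dict(per_year)

top_user, hours_by_year = user_for_genre(rows, 'Action')
```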


## Endpoint 3

def UsersRecommend( año : int ): Returns the top 3 games MOST recommended by users for the given year (reviews.recommend = True and positive/neutral comments).

Example return value: [{"Puesto 1" : X}, {"Puesto 2" : Y},{"Puesto 3" : Z}]
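
No cell in this notebook builds the dataframe for this endpoint. One possible reading of the description is to count, per game, the reviews for the given year that have `recommend = True` and a neutral or positive sentiment (1 or 2), and then take the three highest counts. The sketch below follows that reading with made-up rows; `users_recommend` is a hypothetical name and the function returns item ids rather than game names.

```python
from collections import Counter

# Made-up (item_id, year, recommend, sentiment) review rows.
reviews = [
    (10, 2014, True, 2),
    (10, 2014, True, 1),
    (20, 2014, True, 2),
    (30, 2014, True, 0),   # recommended but negative sentiment: excluded
    (40, 2014, False, 2),  # not recommended: excluded
    (10, 2015, True, 2),   # different year: excluded
]

def users_recommend(reviews, year):
    """Return up to three item ids with the most qualifying reviews for a year."""
    counts = Counter(
        item_id
        for item_id, review_year, recommend, sentiment in reviews
        if review_year == year and recommend and sentiment >= 1
    )
    return [item_id for item_id, _ in counts.most_common(3)]

top_2014 = users_recommend(reviews, 2014)
```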

## Endpoint 4

def UsersNotRecommend( año : int ): Returns the top 3 games LEAST recommended by users for the given year (reviews.recommend = False and negative comments).

Example return value: [{"Puesto 1" : X}, {"Puesto 2" : Y},{"Puesto 3" : Z}]

## Endpoint 5

def sentiment_analysis( año : int ): Given the release year, returns the number of user review records in each sentiment analysis category.

Example return value: {Negative = 182, Neutral = 120, Positive = 278}
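
No cell in this notebook builds the dataframe for this endpoint either. A minimal sketch of the counting step follows, assuming the sentiment codes 0, 1 and 2 stand for negative, neutral and positive. The rows are made up, and the year used is whichever year is attached to each row; using the release year would require joining the reviews with the games data on `item_id` first.

```python
from collections import Counter

# Assumed meaning of the sentiment codes produced earlier in the notebook.
LABELS = {0: 'Negative', 1: 'Neutral', 2: 'Positive'}

# Made-up (year, sentiment) pairs.
reviews = [(2014, 2), (2014, 2), (2014, 0), (2014, 1), (2015, 2)]

def sentiment_analysis(reviews, year):
    """Count the reviews in each sentiment category for a given year."""
    counts = Counter(
        sentiment for review_year, sentiment in reviews if review_year == year
    )
    return {name: counts.get(code, 0) for code, name in LABELS.items()}

result_2014 = sentiment_analysis(reviews, 2014)
```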
