# Feature Engineering

This notebook focuses on feature engineering, an essential step in preparing data for machine learning models. Here we select, transform, or create new variables from the previously processed datasets.

In [1]:
import pandas as pd
from textblob import TextBlob

## Loading the Processed Data


In [2]:
steam_games = pd.read_pickle("../data/steam_games.pkl")
user_reviews = pd.read_pickle("../data/user_reviews.pkl")
users_items = pd.read_pickle("../data/users_items.pkl")

In [6]:
print(steam_games)
print(user_reviews)
print(users_items)

           genres             app_name release_date  price      id  \
0          Action  Lost Summoner Kitty   2018-01-04   4.99  761140   
1          Casual  Lost Summoner Kitty   2018-01-04   4.99  761140   
2           Indie  Lost Summoner Kitty   2018-01-04   4.99  761140   
3      Simulation  Lost Summoner Kitty   2018-01-04   4.99  761140   
4        Strategy  Lost Summoner Kitty   2018-01-04   4.99  761140   
...           ...                  ...          ...    ...     ...   
67927       Indie        Russian Roads   2018-01-04   1.99  610660   
67928      Racing        Russian Roads   2018-01-04   1.99  610660   
67929  Simulation        Russian Roads   2018-01-04   1.99  610660   
67930      Casual    Exit 2 Directions   2017-09-02   4.99  658870   
67931       Indie    Exit 2 Directions   2017-09-02   4.99  658870   

                      developer  release_year  
0                     Kotoshiro          2018  
1                     Kotoshiro          2018  
2              

## Sentiment Analysis of User Reviews

The goal is to score the sentiment of each review by assigning a label based on the polarity of its text: 0 for negative, 1 for neutral, and 2 for positive.

In [54]:


def analyze_sentiment(review_text):
    # Missing or non-text reviews are treated as neutral
    if not isinstance(review_text, str):
        return 1
    # Compute the polarity once; TextBlob returns a score in [-1, 1]
    polarity = TextBlob(review_text).sentiment.polarity
    if polarity > 0.1:
        return 2  # Positive
    elif polarity < -0.1:
        return 0  # Negative
    else:
        return 1  # Neutral


user_reviews['sentiment_analysis'] = user_reviews['review'].apply(
    analyze_sentiment)

# Save the changes
user_reviews.to_pickle("../data/user_reviews.pkl")
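
The label mapping used by `analyze_sentiment` can be checked on its own, without TextBlob. The sketch below assumes a polarity score in [-1, 1] and the same ±0.1 cut-offs; `polarity_to_label` is a hypothetical helper name that does not appear elsewhere in this notebook.

```python
from typing import Optional

NEGATIVE, NEUTRAL, POSITIVE = 0, 1, 2


def polarity_to_label(polarity: Optional[float], threshold: float = 0.1) -> int:
    """Map a polarity score in [-1, 1] to 0 (negative), 1 (neutral) or 2 (positive)."""
    if polarity is None:
        return NEUTRAL  # no review text: neutral by default
    if polarity > threshold:
        return POSITIVE
    if polarity < -threshold:
        return NEGATIVE
    return NEUTRAL  # scores inside [-threshold, threshold]


labels = [polarity_to_label(p) for p in (0.8, 0.1, -0.05, -0.4, None)]
print(labels)
```

Note that a score exactly at the cut-off (0.1 or -0.1) falls in the neutral band, because both comparisons are strict.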

In [17]:
# Show the first 10 reviews and their sentiment labels as a spot check

for i in range(10):
    print(
        f"Review: {user_reviews['review'].iloc[i]} \nSentiment: {user_reviews['sentiment_analysis'].iloc[i]} \n")

Review: Simple yet with great replayability. In my opinion does "zombie" hordes and team work better than left 4 dead plus has a global leveling system. Alot of down to earth "zombie" splattering fun for the whole family. Amazed this sort of FPS is so rare. 
Sentiment: 2 

Review: It's unique and worth a playthrough. 
Sentiment: 2 

Review: Great atmosphere. The gunplay can be a bit chunky at times but at the end of the day this game is definitely worth it and I hope they do a sequel...so buy the game so I get a sequel! 
Sentiment: 1 

Review: I know what you think when you see this title "Barbie Dreamhouse Party" but do not be intimidated by it's title, this is easily one of my GOTYs. You don't get any of that cliche game mechanics that all the latest games have, this is simply good core gameplay. Yes, you can't 360 noscope your friends, but what you can do is show them up with your bad ♥♥♥ dance moves and put them to shame as you show them what true fashion and color combinations are

## Functions for the API

We will create and test the functions that will be exposed as API endpoints.



In [None]:
def developer_info(desarrollador: str, steam_games_df):
    # Filter games by developer; steam_games has one row per (game, genre),
    # so keep a single row per game id before counting
    dev_games = steam_games_df[steam_games_df['developer'] == desarrollador]
    dev_games = dev_games.drop_duplicates(subset='id')
    # Group by release year: count games and compute the share priced at 0
    games_per_year = dev_games.groupby('release_year').agg(
        {'app_name': 'count', 'price': lambda x: (x == 0).mean()})
    games_per_year.rename(columns={'app_name': 'Cantidad de Juegos',
                          'price': 'Porcentaje Juegos Gratuitos'}, inplace=True)
    games_per_year['Porcentaje Juegos Gratuitos'] = games_per_year['Porcentaje Juegos Gratuitos'] * 100
    return games_per_year.reset_index().to_dict('records')


developer_info('Valve', steam_games)
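
The printed `steam_games` frame has one row per game and genre, so a per-year count over its rows counts each game once per genre. The sketch below illustrates the effect on made-up data with the same column names; `yearly_summary` is a hypothetical helper, not a function from this notebook.

```python
import pandas as pd

# Toy data shaped like steam_games: one row per (game, genre).
# All values are invented for illustration.
games = pd.DataFrame({
    "id": [1, 1, 2, 3, 3, 3],
    "app_name": ["A", "A", "B", "C", "C", "C"],
    "genres": ["Action", "Indie", "Action", "Casual", "Indie", "RPG"],
    "price": [0.0, 0.0, 9.99, 4.99, 4.99, 4.99],
    "release_year": [2017, 2017, 2017, 2018, 2018, 2018],
})


def yearly_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Rows per release year and the percentage of rows with price 0."""
    return df.groupby("release_year").agg(
        games=("id", "count"),
        free_pct=("price", lambda s: (s == 0).mean() * 100),
    )


raw = yearly_summary(games)                                  # counts rows
unique = yearly_summary(games.drop_duplicates(subset="id"))  # counts games

print(raw)
print(unique)
```

On the raw rows, 2017 shows three "games" of which about 67% are free; after dropping duplicate ids it shows two games, half of them free.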

In [26]:
def user_data(user_id: str, user_reviews_df, user_items_df):
    # Select the items owned by the user
    user_games = user_items_df[user_items_df['user_id'] == user_id]
    # Placeholder for total spend: total playtime is used as a stand-in
    total_spent = user_games['playtime_forever'].sum()

    # Compute the percentage of the user's reviews that recommend the game
    user_recommendations = user_reviews_df[user_reviews_df['user_id'] == user_id]
    recommend_percentage = (user_recommendations['recommend'].mean()) * 100

    # Count the user's items
    items_count = user_games.shape[0]

    return {
        "Usuario": user_id,
        # Assumes that spend is related to hours played
        "Dinero gastado": f"{total_spent} horas (simulado)",
        "Porcentaje de recomendación": f"{recommend_percentage:.2f}%",
        "Cantidad de items": items_count
    }


user_data('76561197960265729', user_reviews, users_items)

{'Usuario': '76561197960265729',
 'Dinero gastado': '0 horas (simulado)',
 'Porcentaje de recomendación': 'nan%',
 'Cantidad de items': 0}
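
The `'nan%'` in the output above appears because this user has no matching rows, and the mean of an empty selection is NaN. One possible way to handle that case is sketched below; `recommend_percentage` as a standalone function is a hypothetical helper, and returning `None` for users without reviews is an assumption about the desired behavior.

```python
import pandas as pd


def recommend_percentage(recommend: pd.Series):
    """Percentage of True values, or None when the user has no reviews."""
    if recommend.empty:
        return None
    return float(recommend.mean() * 100)


no_reviews = pd.Series([], dtype=bool)
some_reviews = pd.Series([True, True, False, True])

print(recommend_percentage(no_reviews))
print(recommend_percentage(some_reviews))
```

The caller can then decide how to present a missing value, for example by omitting the percentage instead of formatting NaN into a string.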

In [48]:
def user_for_genre(genre: str, steam_games_df, user_items_df):
    # Filter games by genre (case-insensitive substring match)
    genre_games = steam_games_df[steam_games_df['genres'].str.contains(
        genre, case=False, na=False)]

    # Join with user_items to get the items users own in this genre
    genre_user_items = user_items_df.merge(
        genre_games, left_on='item_id', right_on='id')

    # Group by user and sum the total playtime
    user_playtime = genre_user_items.groupby(
        'user_id')['playtime_forever'].sum().reset_index()

    # Find the user with the most playtime
    top_user = user_playtime.loc[user_playtime['playtime_forever'].idxmax()]

    # Keep the top user's games and sum their playtime by release year
    top_user_games = genre_user_items[genre_user_items['user_id']
                                      == top_user['user_id']]
    hours_played_by_year = top_user_games.groupby(
        'release_year')['playtime_forever'].sum().reset_index()

    return {
        "Usuario con más horas jugadas para Género": top_user['user_id'],
        "Horas jugadas": hours_played_by_year.to_dict('records')
    }


user_for_genre('Action', steam_games, users_items)

{'Usuario con más horas jugadas para Género': 'Sp3ctre',
 'Horas jugadas': [{'release_year': 1995.0, 'playtime_forever': 217},
  {'release_year': 1999.0, 'playtime_forever': 44},
  {'release_year': 2000.0, 'playtime_forever': 70644},
  {'release_year': 2001.0, 'playtime_forever': 13},
  {'release_year': 2002.0, 'playtime_forever': 238},
  {'release_year': 2003.0, 'playtime_forever': 7673},
  {'release_year': 2004.0, 'playtime_forever': 127411},
  {'release_year': 2005.0, 'playtime_forever': 21339},
  {'release_year': 2006.0, 'playtime_forever': 896},
  {'release_year': 2007.0, 'playtime_forever': 112784},
  {'release_year': 2008.0, 'playtime_forever': 224},
  {'release_year': 2009.0, 'playtime_forever': 108326},
  {'release_year': 2010.0, 'playtime_forever': 78083},
  {'release_year': 2011.0, 'playtime_forever': 154896},
  {'release_year': 2012.0, 'playtime_forever': 378296},
  {'release_year': 2013.0, 'playtime_forever': 120306},
  {'release_year': 2014.0, 'playtime_forever': 130452},

In [46]:
steam_games['release_year'].unique()

array([2018., 2017., 1997., 1998., 2016., 2006., 2005., 2003., 2007.,
       2002., 2000., 1995., 1996., 1994., 2001., 1993., 2004., 2008.,
       2009.,   nan, 1999., 1992., 1989., 2010., 2011., 2013., 2012.,
       2014., 1983., 1984., 2015., 1990., 1988., 1991., 1987., 1986.,
       2021., 2019., 1985.])
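
The years are stored as floats because the column contains `nan`, which is also why the API output above shows values such as `1995.0`. A minimal sketch of converting such a column to pandas' nullable integer dtype, using made-up values:

```python
import pandas as pd

# Invented sample with one missing year, mirroring the float dtype above.
years = pd.Series([2018.0, 2017.0, float("nan"), 1999.0], name="release_year")

# "Int64" (capital I) is the nullable integer dtype: missing values stay missing.
years_int = years.astype("Int64")

print(years_int.dtype)
print(years_int.dropna().tolist())
```

Whether to convert the column in the saved dataset or only when building a response is a design choice this notebook does not settle.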

In [47]:
def best_developer_year(year: int, steam_games_df, user_reviews_df):
    # Filter games by release year
    year_games = steam_games_df[steam_games_df['release_year'] == year]

    # Join with user_reviews to get the reviews of that year's games
    reviews_year_games = user_reviews_df.merge(year_games, left_on='item_id', right_on='id')

    # Keep only positive recommendations
    positive_reviews = reviews_year_games[reviews_year_games['recommend'] == True]

    # Group by developer and count the recommendations
    developer_recommendations = positive_reviews.groupby('developer')['recommend'].count().reset_index()

    # Sort and take the top 3
    top_3_developers = developer_recommendations.sort_values(by='recommend', ascending=False).head(3)

    return top_3_developers.to_dict('records')

best_developer_year(1999, steam_games, user_reviews)

[{'developer': 'Irrational Games,Looking Glass Studios', 'recommend': 12},
 {'developer': 'Valve', 'recommend': 10},
 {'developer': 'Team17 Digital Ltd', 'recommend': 5}]
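
Because `steam_games` holds one row per game and genre, merging reviews against it can repeat each review once per genre, which may inflate the counts above. Whether this changes the ranking on the real data is not checked here; the sketch below only illustrates the mechanism on invented data, with `positive_counts` as a hypothetical helper.

```python
import pandas as pd

# Invented data: game 10 is listed under two genres, game 20 under one.
games = pd.DataFrame({
    "id": [10, 10, 20],
    "genres": ["Action", "Indie", "Action"],
    "developer": ["Dev A", "Dev A", "Dev B"],
})
reviews = pd.DataFrame({
    "item_id": [10, 20, 20],
    "recommend": [True, True, True],
})


def positive_counts(games_df: pd.DataFrame) -> dict:
    """Count positive recommendations per developer after joining on game id."""
    merged = reviews.merge(games_df, left_on="item_id", right_on="id")
    positive = merged[merged["recommend"]]
    return positive.groupby("developer")["recommend"].count().to_dict()


inflated = positive_counts(games)  # one match per (game, genre) row
per_game = positive_counts(games[["id", "developer"]].drop_duplicates())

print(inflated)
print(per_game)
```

In the toy data, "Dev A" has a single review but is counted twice when the genre rows are kept, so it ties with "Dev B" instead of trailing it.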

In [4]:
def developer_reviews_analysis(desarrolladora: str, steam_games_df, user_reviews_df):
    # Filter games by developer
    dev_games = steam_games_df[steam_games_df['developer'] == desarrolladora]

    # Join with user_reviews to get the reviews of that developer's games
    dev_reviews = user_reviews_df.merge(dev_games, left_on='item_id', right_on='id')

    # Count reviews per sentiment label (0 = negative, 1 = neutral, 2 = positive)
    sentiment_count = dev_reviews['sentiment_analysis'].value_counts()

    # Build the result
    result = {
        desarrolladora: {
            'Negative': sentiment_count.get(0, 0),  # Defaults to 0 when there are no negative reviews
            'Positive': sentiment_count.get(2, 0)   # Defaults to 0 when there are no positive reviews
        }
    }

    return result

developer_reviews_analysis('Kotoshiro', steam_games, user_reviews)

{'Kotoshiro': {'Negative': 0, 'Positive': 0}}