# **Creating the 'sentiment_analysis' column**

In this Jupyter notebook we create the 'sentiment_analysis' column using Natural Language Processing (NLP).  
We use the already-transformed parquet file 'user_reviews.parquet'.

In [1]:
import pandas as pd
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

Lexicon used for the sentiment analysis (VADER)

In [2]:
nltk.download("vader_lexicon")

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\Usuario\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

We define the function that classifies the reviews

In [3]:
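# Build the analyzer once; creating it inside the function reloads the lexicon on every call.
sia = SentimentIntensityAnalyzer()

def analizar_sentimiento(texto):
    """Return 0 (negative), 1 (neutral) or 2 (positive) for a review text."""
    # Missing, non-string or blank reviews are treated as neutral.
    if not isinstance(texto, str) or texto.strip() == '':
        return 1

    # The compound score summarizes the polarity of the whole text in [-1, 1].
    sentimiento = sia.polarity_scores(texto)["compound"]

    if sentimiento > 0.2:
        return 2
    elif sentimiento < -0.2:
        return 0
    return 1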
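
The thresholds used above can be looked at in isolation. The following is a minimal sketch, not part of the original notebook: `classify_compound` and its `threshold` parameter are hypothetical names introduced here, and the function only mirrors the cut-offs of `analizar_sentimiento` so that the boundary cases are easy to check without loading NLTK.

```python
def classify_compound(score, threshold=0.2):
    """Map a compound score in [-1, 1] to 0 (negative), 1 (neutral) or 2 (positive)."""
    if score > threshold:
        return 2  # clearly positive
    if score < -threshold:
        return 0  # clearly negative
    # Everything in [-threshold, threshold], boundaries included, is neutral.
    return 1


# A few illustrative scores, including both boundary values.
for score in (0.85, 0.2, 0.0, -0.2, -0.6):
    print(score, classify_compound(score))
```

Because both comparisons are strict, a score of exactly 0.2 or -0.2 falls into the neutral class.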

We load the parquet file

In [4]:
ruta_user_reviews = r"C:\Users\Usuario\Desktop\Labs\Proyecto_1\datasets\user_reviews.parquet"

df = pd.read_parquet(ruta_user_reviews)
df

Unnamed: 0,user_id,item_id,recommend,review
0,76561197970982479,1250,True,Simple yet with great replayability. In my opi...
1,76561197970982479,22200,True,It's unique and worth a playthrough.
2,76561197970982479,43110,True,Great atmosphere. The gunplay can be a bit chu...
3,js41637,251610,True,I know what you think when you see this title ...
4,js41637,227300,True,For a simple (it's actually not all that simpl...
...,...,...,...,...
58453,76561198312638244,70,True,a must have classic from steam definitely wort...
58454,76561198312638244,362890,True,this game is a perfect remake of the original ...
58455,LydiaMorley,273110,True,had so much fun plaing this and collecting res...
58456,LydiaMorley,730,True,:D


In [5]:
df["sentiment_analysis"] = df["review"].fillna('').apply(analizar_sentimiento)
df

Unnamed: 0,user_id,item_id,recommend,review,sentiment_analysis
0,76561197970982479,1250,True,Simple yet with great replayability. In my opi...,2
1,76561197970982479,22200,True,It's unique and worth a playthrough.,2
2,76561197970982479,43110,True,Great atmosphere. The gunplay can be a bit chu...,2
3,js41637,251610,True,I know what you think when you see this title ...,2
4,js41637,227300,True,For a simple (it's actually not all that simpl...,2
...,...,...,...,...,...
58453,76561198312638244,70,True,a must have classic from steam definitely wort...,2
58454,76561198312638244,362890,True,this game is a perfect remake of the original ...,2
58455,LydiaMorley,273110,True,had so much fun plaing this and collecting res...,2
58456,LydiaMorley,730,True,:D,2


In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 58458 entries, 0 to 58457
Data columns (total 5 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   user_id             58458 non-null  object
 1   item_id             58430 non-null  object
 2   recommend           58430 non-null  object
 3   review              58430 non-null  object
 4   sentiment_analysis  58458 non-null  int64 
dtypes: int64(1), object(4)
memory usage: 2.2+ MB


We now drop the 'review' column

In [7]:
df = df.drop("review", axis=1)
df

Unnamed: 0,user_id,item_id,recommend,sentiment_analysis
0,76561197970982479,1250,True,2
1,76561197970982479,22200,True,2
2,76561197970982479,43110,True,2
3,js41637,251610,True,2
4,js41637,227300,True,2
...,...,...,...,...
58453,76561198312638244,70,True,2
58454,76561198312638244,362890,True,2
58455,LydiaMorley,273110,True,2
58456,LydiaMorley,730,True,2


We export to a parquet file

In [8]:
df.to_parquet("./datasets_endpoints/user_reviews_sentiment.parquet")
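
The export above writes into `./datasets_endpoints`, and the write will typically fail if that folder does not exist yet. The following is a small sketch of one way to guard against that, assuming it is acceptable to create the folder on the fly. `ensure_output_path` is a hypothetical helper, not part of the original notebook, and the demonstration runs inside a temporary directory so it leaves nothing behind.

```python
import tempfile
from pathlib import Path


def ensure_output_path(directory, filename):
    """Create `directory` (and any missing parents) and return the path to `filename` inside it."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)  # no error if the folder already exists
    return out_dir / filename


# Demonstration in a throwaway location.
with tempfile.TemporaryDirectory() as tmp:
    ruta_salida = ensure_output_path(Path(tmp) / "datasets_endpoints",
                                     "user_reviews_sentiment.parquet")
    print(ruta_salida.parent.is_dir())
```

In the notebook, the same helper could be called with `"./datasets_endpoints"` and its result passed to `df.to_parquet(...)`.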