# Polarity Analysis

Polarity analysis assigns a numeric value (positive or negative) to words or expressions according to the emotion they most commonly convey.

### VADER Sentiment Analysis
VADER Sentiment Analysis is a sentiment analysis tool based on <b>lexical rules</b>. It works from a set of words, or <b>lexicon</b>, whose meaning is mostly associated with positive or negative expressions. When analyzing a sentence, VADER takes into account every word that can affect the sentiment expressed and returns an estimate of the proportion of positive and negative expressions.

Each word VADER recognizes has an associated positive or negative <b>numeric value</b>, depending on the emotion it expresses. A word that is not in the list of rated words is ignored in the evaluation.
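
The lookup described above can be sketched with a toy lexicon. This is only an illustration: the words, their values, and the helper name `raw_valence` are invented here and are not taken from VADER's lexicon file. The real tool also applies rules for negation, intensifiers, punctuation, and capitalization that this sketch leaves out.

```python
# Toy lexicon: these words and valence values are invented for illustration,
# they are NOT VADER's actual ratings.
TOY_LEXICON = {"great": 3.0, "good": 2.0, "boring": -1.5, "waste": -2.0}

def raw_valence(sentence):
    """Sum the valence of every known word; unknown words contribute nothing."""
    total = 0.0
    for word in sentence.lower().split():
        word = word.strip(".,!?")             # drop surrounding punctuation
        total += TOY_LEXICON.get(word, 0.0)   # words outside the lexicon add 0
    return total

print(raw_valence("Great acting, good script"))  # only 'great' and 'good' count
print(raw_valence("The plot was boring"))        # 'the', 'plot', 'was' are ignored
```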

When it runs a sentiment analysis on a text, VADER returns 4 values:
- Proportion of the content classified as positive
- Proportion of the content classified as neutral
- Proportion of the content classified as negative
- Compound score: the sum of the word values, normalized to the range -1 to 1
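
The normalization behind the compound score can be sketched as follows. To my understanding of the reference implementation, the summed valence `x` is mapped through `x / sqrt(x^2 + alpha)` with `alpha = 15`; treat both the formula and the constant as an assumption to check against the version you have installed.

```python
import math

def normalize(score, alpha=15):
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return score / math.sqrt(score * score + alpha)

# A larger absolute sum moves the result closer to -1 or 1 without reaching it.
for total in (-8.0, -1.0, 0.0, 1.0, 8.0):
    print(total, round(normalize(total), 4))
```

The mapping is symmetric around zero, so a text with no rated words gets a compound score of exactly 0.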


In [3]:
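import nltk
# Only needed when using the copy of VADER bundled with nltk:
#nltk.download('vader_lexicon')
# Install the standalone package used below:
#conda install -c conda-forge vadersentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer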

### Get polarity

In [4]:
sentence = "This is a great product! Very good quality"
# Instantiate the analyzer
sentiment_analyzer = SentimentIntensityAnalyzer()
# Analyze the polarity of the sentence
analisis = sentiment_analyzer.polarity_scores(sentence)
print(analisis)

{'neg': 0.0, 'neu': 0.442, 'pos': 0.558, 'compound': 0.8217}


## Example

In [6]:
import pandas as pd
df = pd.read_csv("movie-reviews.csv")
df

Unnamed: 0,reviews
0,Excellent movie. The acting was great and it i...
1,A bit boring at first but enjoyable.
2,Very boring. You cant tell it has a low budget.
3,I loved it! Best movie of the year!
4,The special effects looked cheap and the plot ...
5,The actors did a good job but the script is te...
6,Boring. So boring.
7,Not the worst movie I've seen
8,A waste of time


In [7]:
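# Create the columns that will hold the scores
df["negative"] = ""
df["neutral"] = ""
df["positive"] = ""
df["result"] = ""
for index, row in df.iterrows():
    # Analyze each review
    analisis = sentiment_analyzer.polarity_scores(row['reviews'])
    # Write through df.at: rows yielded by iterrows() can be copies,
    # so assigning to `row` is not guaranteed to update df
    df.at[index, "negative"] = analisis["neg"]
    df.at[index, "neutral"] = analisis["neu"]
    df.at[index, "positive"] = analisis["pos"]
    # Decide which compound values count as positive or negative.
    # Note: both comparisons use 0.6, so "Neutral" is only assigned
    # when compound is exactly 0.6
    if analisis['compound'] > 0.6:
        df.at[index, "result"] = "Positive"
    elif analisis['compound'] < 0.6:
        df.at[index, "result"] = "Negative"
    else:
        df.at[index, "result"] = "Neutral"
df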

Unnamed: 0,reviews,negative,neutral,positive,result
0,Excellent movie. The acting was great and it i...,0.0,0.469,0.531,Positive
1,A bit boring at first but enjoyable.,0.157,0.476,0.367,Negative
2,Very boring. You cant tell it has a low budget.,0.37,0.63,0.0,Negative
3,I loved it! Best movie of the year!,0.0,0.409,0.591,Positive
4,The special effects looked cheap and the plot ...,0.0,0.803,0.197,Negative
5,The actors did a good job but the script is te...,0.275,0.596,0.129,Negative
6,Boring. So boring.,0.83,0.17,0.0,Negative
7,Not the worst movie I've seen,0.0,0.603,0.397,Negative
8,A waste of time,0.483,0.517,0.0,Negative
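
In the loop above both comparisons use 0.6, so a row is labeled Neutral only when its compound score is exactly 0.6, and every review that is not strongly positive ends up as Negative. A minimal sketch of a symmetric alternative follows. The helper name `label_from_compound` is hypothetical, and the default cutoff of 0.05 follows the convention suggested in the VADER project documentation rather than a choice made in this notebook.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a label using symmetric cutoffs."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"

# 0.8217 is the compound score printed for the example sentence earlier.
print(label_from_compound(0.8217))
# A stricter cutoff leaves a wide neutral band between -0.6 and 0.6.
print(label_from_compound(0.5, threshold=0.6))
```

Applying it to the table would change some of the labels shown above, so the output would need to be regenerated.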


# Exercise

- Fetch tweets from the API that are not retweets and that contain the hashtag #SpiderMan2, in English.
- Clean the data
- Evaluate the polarity

In [8]:
# Fetch tweets from the API that are not retweets and contain the hashtag #SpiderMan2, in English.
import os
import pandas as pd
import requests
from dotenv import load_dotenv
# Load the values from the .env file into environment variables
load_dotenv()
# Read the token into a variable
bearer_token = os.environ.get("BEARER_TOKEN")

url = "https://api.twitter.com/2/tweets/search/recent"
headers = {
    "Authorization": f"Bearer {bearer_token}",
    "User-Agent": "v2FullArchiveSearchPython"
}

params = {
    'query': '#SpiderMan2 lang:en -is:retweet',
    'max_results': 50
}
def get_data(url, params):
    results = []

    # Request at most 10 pages
    for _ in range(10):
        response = requests.get(url, headers=headers, params=params)
        # Raise an exception if the response is not successful
        if response.status_code != 200:
            raise Exception(response.status_code, response.text)
        payload = response.json()
        # 'data' is absent when a page has no matching tweets
        results.append(pd.json_normalize(payload.get('data', [])))
        meta_data = payload['meta']
        if 'next_token' not in meta_data:
            break
        # Reuse the same query and add the token for the next page
        params = {**params, 'next_token': meta_data['next_token']}
    return pd.concat(results)
df = get_data(url, params)
df.to_csv('polaridad.csv')
df

Unnamed: 0,id,text
0,1451906290830102531,#SpiderMan2 Fusion Reactor 🤞🏻🙂🔥\n\n#SpiderManN...
1,1451755872976138252,yo wait what if Peter and Miles fight Venom fi...
2,1451752406178488325,Checkout our latest episode! #SpiderMan2 #DCTi...
3,1451748933869445128,Spiderman Miles Morales Appreciation Post🕷🕸🥰🥵🔥...
4,1451744923158470657,Marvel Friday: Swan Song (Spider-Man 2) https:...
...,...,...
16,1449483708793294850,@yeahmergo Can’t wait for #SpiderManNoWayHome!...
17,1449466843849183239,New #SpiderManNoWayHomeleak shows Tom Holland’...
18,1449445076510277635,#Spiderman2’s Yuri Lowenthal chats about fan l...
19,1449431178507952133,I wish #Venom is playable character in #Spider...
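
The `next_token` pagination used in `get_data` can be tried without network access. The sketch below is an assumption-laden stand-in: `collect_pages`, `fake_fetch`, and the two fake pages are hypothetical and only mimic the `data` / `meta.next_token` shape the code above reads from the response.

```python
def collect_pages(fetch_page, max_pages=10):
    """Call fetch_page(token) until no next_token is returned or max_pages is hit."""
    rows = []
    token = None                      # the first request carries no token
    for _ in range(max_pages):
        payload = fetch_page(token)
        rows.extend(payload.get("data", []))              # a page may have no 'data'
        token = payload.get("meta", {}).get("next_token")
        if token is None:             # last page reached
            break
    return rows

# Hypothetical two-page response, keyed by the token that requests each page.
FAKE_PAGES = {
    None: {"data": [{"id": "1", "text": "first"}], "meta": {"next_token": "abc"}},
    "abc": {"data": [{"id": "2", "text": "second"}], "meta": {}},
}

def fake_fetch(token):
    return FAKE_PAGES[token]

print([row["id"] for row in collect_pages(fake_fetch)])  # ['1', '2']
```

Passing the fetch step in as a function keeps the loop independent of `requests`, which makes the stopping conditions easy to check in isolation.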


In [11]:
# Cleaning
df = pd.read_csv("polaridad.csv")

# Remove question and exclamation marks
df['text'] = df['text'].str.replace('¿', '')
df['text'] = df['text'].str.replace('?', '', regex=False) # without regex=False this raised a warning
df['text'] = df['text'].str.replace('!', '')
df['text'] = df['text'].str.replace('¡', '')
# Remove accents
df['text'] = df['text'].str.normalize('NFKD').str.encode('ascii', errors='ignore').str.decode('utf-8')
# Remove the hashtag symbol
df['text'] = df['text'].str.replace('#', '')
# Remove mentions, any other non-alphanumeric character (newlines included), and URLs
df['text'] = df['text'].replace(r'(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)', '', regex=True)
# Remove numeric characters
df['text'] = df['text'].replace(r'[0-9]+', '', regex=True)
# Remove emojis (note: this casts every column to str)
df = df.astype(str).apply(lambda x: x.str.encode('ascii', 'ignore').str.decode('ascii'))
# Remove URLs
df['text'] = df['text'].replace(r'http\S+', '', regex=True).replace(r'www\S+', '', regex=True)
# Remove line breaks, tabs and repeated spaces
df['text'] = df['text'].str.replace('\n', '')
df['text'] = df['text'].str.replace('\t', '')
df['text'] = df['text'].str.replace(' {2,}', ' ', regex=True)
df['text'] = df['text'].str.strip()
# Convert the text to lowercase
df['text'] = df['text'].str.lower()
# Remove rows whose text is empty
df = df[df['text'].astype(bool)]
df

Unnamed: 0.1,Unnamed: 0,id,text
0,0,1451906290830102531,spiderman fusion reactor spidermannowayhome
1,1,1451755872976138252,yo wait what if peter and miles fight venom fi...
2,2,1451752406178488325,checkout our latest episode spiderman dctitans...
3,3,1451748933869445128,spiderman miles morales appreciation postthose...
4,4,1451744923158470657,marvel friday swan song spiderman spiderman sp...
...,...,...,...
66,16,1449483708793294850,cant wait for spidermannowayhome plus the spid...
67,17,1449466843849183239,new spidermannowayhomeleak shows tom hollands ...
68,18,1449445076510277635,spidermans yuri lowenthal chats about fan love...
69,19,1449431178507952133,i wish venom is playable character in spiderma...
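
Row 3 of the output above contains `postthose`: the line break between two words was deleted instead of being turned into a space, so the words fused. A minimal sketch of the same cleaning as a single function that avoids this is shown below. The function name `clean_tweet` and the order of the steps are my own choices, not the notebook's method.

```python
import re

def clean_tweet(text):
    """Return lowercase text with URLs, mentions, digits and symbols removed."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs first, while ':' and '/' still exist
    text = re.sub(r"@\w+", " ", text)                   # mentions
    text = text.replace("#", "")                        # keep the hashtag word, drop the symbol
    text = re.sub(r"['’]", "", text)                    # "can't" -> "cant", as in the output above
    text = re.sub(r"[^A-Za-z\s]", " ", text)            # digits, punctuation, emojis -> space
    text = re.sub(r"\s+", " ", text)                    # newlines, tabs, repeated spaces -> one space
    return text.strip().lower()

print(clean_tweet("Appreciation Post\nThose #SpiderMan2 pics! https://t.co/x @user"))
# appreciation post those spiderman pics
```

With pandas it could be applied as `df['text'].map(clean_tweet)`. Note that VADER treats exclamation marks and capitalization as intensity signals, so cleaning this aggressively before scoring can lower the magnitude of the scores.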


In [12]:
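# Polarity

# Create the columns that will hold the scores
df["negative"] = ""
df["neutral"] = ""
df["positive"] = ""
df["result"] = ""
for index, row in df.iterrows():
    # Analyze each tweet
    analisis = sentiment_analyzer.polarity_scores(row['text'])
    # Write through df.at: rows yielded by iterrows() can be copies,
    # so assigning to `row` is not guaranteed to update df
    df.at[index, "negative"] = analisis["neg"]
    df.at[index, "neutral"] = analisis["neu"]
    df.at[index, "positive"] = analisis["pos"]
    # Decide which compound values count as positive or negative.
    # Note: both comparisons use 0.6, so "Neutral" is only assigned
    # when compound is exactly 0.6
    if analisis['compound'] > 0.6:
        df.at[index, "result"] = "Positive"
    elif analisis['compound'] < 0.6:
        df.at[index, "result"] = "Negative"
    else:
        df.at[index, "result"] = "Neutral"
df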

Unnamed: 0.1,Unnamed: 0,id,text,negative,neutral,positive,result
0,0,1451906290830102531,spiderman fusion reactor spidermannowayhome,0.0,1.0,0.0,Negative
1,1,1451755872976138252,yo wait what if peter and miles fight venom fi...,0.129,0.821,0.05,Negative
2,2,1451752406178488325,checkout our latest episode spiderman dctitans...,0.0,1.0,0.0,Negative
3,3,1451748933869445128,spiderman miles morales appreciation postthose...,0.052,0.658,0.291,Positive
4,4,1451744923158470657,marvel friday swan song spiderman spiderman sp...,0.0,0.865,0.135,Negative
...,...,...,...,...,...,...,...
66,16,1449483708793294850,cant wait for spidermannowayhome plus the spid...,0.0,1.0,0.0,Negative
67,17,1449466843849183239,new spidermannowayhomeleak shows tom hollands ...,0.087,0.913,0.0,Negative
68,18,1449445076510277635,spidermans yuri lowenthal chats about fan love...,0.0,0.667,0.333,Positive
69,19,1449431178507952133,i wish venom is playable character in spiderma...,0.0,0.56,0.44,Positive
