# Polarity Analysis

Polarity analysis consists of assigning a numeric value (positive or negative) to words or expressions according to the emotion they most commonly convey.

### Vader Sentiment Analysis
VADER Sentiment Analysis is a sentiment analysis tool based on <b>lexical rules</b>. It works from a set of words, or <b>lexicon</b>, whose semantic meaning is mostly associated with positive or negative expressions. When analyzing a sentence, VADER takes into account every word that may affect the sentiment expressed and returns the estimated proportion of positive and negative expressions.

Each word that VADER recognizes has an associated <b>numeric value</b>, positive or negative depending on the emotion it expresses. A word that is not in the list of rated words is not taken into account in the evaluation.
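To make the lookup idea concrete, here is a minimal sketch of lexicon-based scoring. The `TOY_LEXICON` dictionary, its valence values, and the `raw_valence` helper are made up for illustration; they are not taken from the real VADER lexicon, and the real tool also applies rules for negation, intensifiers, punctuation, and capitalization that this sketch leaves out.

```python
# Toy lexicon: hypothetical words and valence values, for illustration only.
TOY_LEXICON = {"great": 3.0, "good": 2.0, "boring": -1.5, "terrible": -2.5}


def raw_valence(sentence, lexicon=TOY_LEXICON):
    """Sum the valence of every known word; unknown words contribute nothing."""
    total = 0.0
    for word in sentence.lower().split():
        word = word.strip(".,!?")        # drop surrounding punctuation
        total += lexicon.get(word, 0.0)  # words outside the lexicon add 0
    return total


print(raw_valence("A great product with good quality"))  # 5.0
print(raw_valence("The table is made of wood"))          # 0.0
```

The second sentence scores zero because none of its words appear in the toy lexicon, which mirrors how unrated words are skipped.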

When it analyzes the sentiment of a text, VADER returns four values:
- Proportion (from 0 to 1) of the content that falls into the positive category
- Proportion (from 0 to 1) of the content that falls into the neutral category
- Proportion (from 0 to 1) of the content that falls into the negative category
- Compound score: the sum of the values obtained, normalized between -1 and 1
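The normalization of the compound score can be sketched as follows. This assumes the formula `x / sqrt(x^2 + alpha)` with `alpha = 15`, which is my understanding of the reference implementation; treat both the formula and the constant as assumptions to check against the library source.

```python
import math


def normalize(score, alpha=15):
    """Squash an unbounded valence sum into the interval [-1, 1].

    Assumed formula: score / sqrt(score^2 + alpha).
    """
    norm = score / math.sqrt(score * score + alpha)
    # Clamp defensively so rounding can never push the result out of range
    return max(-1.0, min(1.0, norm))


for raw in (-8.0, -1.0, 0.0, 1.0, 8.0):
    print(raw, round(normalize(raw), 4))
```

Large positive or negative sums approach 1 or -1 without ever exceeding them, which is why scores of long, emotional texts stay comparable to those of short ones.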


In [1]:
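import nltk
# This download is only required by nltk.sentiment.vader; the vaderSentiment
# package imported below ships with its own copy of the lexicon
nltk.download('vader_lexicon')
# conda install -c conda-forge vadersentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer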

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /home/lucas/nltk_data...


### Getting the polarity

In [2]:
sentence = "This is a great product! Very good quality"
# Instantiate the analyzer
sentiment_analyzer = SentimentIntensityAnalyzer()
# Analyze the polarity of the sentence
analisis = sentiment_analyzer.polarity_scores(sentence)
print(analisis)

{'neg': 0.0, 'neu': 0.442, 'pos': 0.558, 'compound': 0.8217}


## Example

In [6]:
import pandas as pd
df = pd.read_csv("movie-reviews.csv")
df

Unnamed: 0,reviews
0,Excellent movie. The acting was great and it i...
1,A bit boring at first but enjoyable.
2,Very boring. You cant tell it has a low budget.
3,I loved it! Best movie of the year!
4,The special effects looked cheap and the plot ...
5,The actors did a good job but the script is te...
6,Boring. So boring.
7,Not the worst movie I've seen
8,A waste of time


In [7]:
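# Create empty columns for the three scores and the final label
df["negative"] = ""
df["neutral"] = ""
df["positive"] = ""
df["result"] = ""
for index, row in df.iterrows():
    # Analyze each review
    analisis = sentiment_analyzer.polarity_scores(row["reviews"])
    # Write through df.at: assigning to `row` may only modify a copy
    df.at[index, "negative"] = analisis["neg"]
    df.at[index, "neutral"] = analisis["neu"]
    df.at[index, "positive"] = analisis["pos"]
    # Decide which compound scores count as positive or negative.
    # Note: with these cutoffs every score below 0.6 is labeled "Negative",
    # so "Neutral" is only reached when the score is exactly 0.6.
    if analisis["compound"] > 0.6:
        df.at[index, "result"] = "Positive"
    elif analisis["compound"] < 0.6:
        df.at[index, "result"] = "Negative"
    else:
        df.at[index, "result"] = "Neutral"
df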

Unnamed: 0,reviews,negative,neutral,positive,result
0,Excellent movie. The acting was great and it i...,0.0,0.469,0.531,Positive
1,A bit boring at first but enjoyable.,0.157,0.476,0.367,Negative
2,Very boring. You cant tell it has a low budget.,0.37,0.63,0.0,Negative
3,I loved it! Best movie of the year!,0.0,0.409,0.591,Positive
4,The special effects looked cheap and the plot ...,0.0,0.803,0.197,Negative
5,The actors did a good job but the script is te...,0.275,0.596,0.129,Negative
6,Boring. So boring.,0.83,0.17,0.0,Negative
7,Not the worst movie I've seen,0.0,0.603,0.397,Negative
8,A waste of time,0.483,0.517,0.0,Negative
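In the table above, reviews such as "Not the worst movie I've seen" have no negative content at all and are still labeled Negative, because every compound score below 0.6 falls into that branch. A minimal sketch of a labeling helper with a separate negative cutoff follows. The function name `label_from_compound` and the value `-0.6` are assumptions chosen to mirror the positive cutoff, not something taken from the original cell, and applying it would change the labels shown above.

```python
def label_from_compound(compound, pos_threshold=0.6, neg_threshold=-0.6):
    """Map a compound score to a label using two separate cutoffs.

    The default of -0.6 for neg_threshold is a hypothetical choice that
    mirrors the positive cutoff; tune both values for your data.
    """
    if compound > pos_threshold:
        return "Positive"
    if compound < neg_threshold:
        return "Negative"
    return "Neutral"


print(label_from_compound(0.8217))  # Positive
print(label_from_compound(0.3))     # Neutral
print(label_from_compound(-0.75))   # Negative
```

Keeping the cutoffs as parameters makes it easy to try narrower bands, such as the 0.4 and -0.2 pair used later in the exercise.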


# Exercise

- Retrieve from the API the Tweets in English that are not retweets and that contain the hashtag #SpiderMan2.
- Clean the data
- Evaluate the polarity

In [9]:
import os
from dotenv import load_dotenv
import pandas as pd
import requests
# Load the values from the .env file into the environment variables
load_dotenv()
# Read the token from the environment into a variable
bearer_token = os.environ.get("BEARER_TOKEN")
url = "https://api.twitter.com/2/tweets/search/recent"
headers = {
    "Authorization": f"Bearer {bearer_token}",
    "User-Agent":"v2FullArchiveSearchPython"
} 
hashtag='#SpiderMan2'
params = {
    'query': f'{hashtag} -is:retweet lang:en',
    'max_results':100
}
response = requests.get(url, headers=headers, params=params)
# Raise an exception if the request was not successful
if response.status_code != 200:
    raise Exception(response.status_code, response.text)
json_response = response.json()['data']
df = pd.json_normalize(json_response)
df

Unnamed: 0,id,text
0,1450253980819496961,"Happy birthday, @NajJeter. I can't wait for yo..."
1,1450231450872262667,Day 37 of asking Insomniac to please add the C...
2,1450176713380573184,@NetflixUK @netflix where is #spiderman2?? How...
3,1450120733972119552,@RaggySays hello Jason how are you today I wan...
4,1450120372527960068,@AdamNoonchester hello Adam how are you today ...
...,...,...
95,1448125476749332488,@insomniacgames \nI have one important questio...
96,1448125117494607879,“Let me slip away!!!” 🔥🔥🔥 #SpiderMan2 #Quarant...
97,1448123375218462731,@ToastDraws @PhaseZeroCB thanks for hanging. t...
98,1448123153406906368,Allways good to see you Normie #avengeme #Spid...
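The request cell indexes the response with `['data']`, which raises a `KeyError` if that key is absent. The sketch below reads the key defensively. It assumes that a search with no matches returns a payload without a `data` key, which is an assumption about the API's behavior worth verifying; the helper name `extract_tweets` and the sample payloads are hypothetical.

```python
def extract_tweets(payload):
    """Return the list of tweet dicts in a search payload, or [] if none."""
    return payload.get("data", [])


# Hypothetical payloads shaped like the parsed JSON of a search response
with_results = {"data": [{"id": "1", "text": "hello"}], "meta": {"result_count": 1}}
without_results = {"meta": {"result_count": 0}}

print(len(extract_tweets(with_results)))     # 1
print(len(extract_tweets(without_results)))  # 0
```

An empty list can still be passed to `pd.json_normalize`, so the rest of the notebook would receive an empty DataFrame instead of failing.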


In [44]:
import re

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import TweetTokenizer

# nltk.download('stopwords')
print(stopwords.words('english'))
stop_words = set(stopwords.words('english'))

# Instantiate the tokenizer and apply it to the text column
tt = TweetTokenizer()
tokenized_text = df['text'].apply(tt.tokenize)

# Character ranges that cover common emoji, compiled once
emoji_pattern = re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F1E0-\U0001F1FF"  # flags (iOS)
    "]+",
    flags=re.UNICODE,
)

def clean(x):
    """Return True if the token x should be kept."""
    without_emoji = emoji_pattern.sub('', x)
    return (
        x.lower() not in stop_words  # not a stop word
        and x[0] != '@'              # not a mention
        and x[0] != '#'              # not a hashtag
        and without_emoji != ''      # not made up only of emoji
    )

# Keep only the tokens that pass the filter
df["tokenized_text"] = tokenized_text.apply(lambda tweet: [x for x in tweet if clean(x)])
df

['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', '

Unnamed: 0,id,text,tokenized_text,polaridad,negative,neutral,positive,result
0,1450253980819496961,"Happy birthday, @NajJeter. I can't wait for yo...","[Happy, birthday, ,, ., can't, wait, return, M...","Happy birthday , . can't wait return Miles #Sp...",0.0,0.802,0.198,Neutral
1,1450231450872262667,Day 37 of asking Insomniac to please add the C...,"[Day, 37, asking, Insomniac, please, add, Capt...",Day 37 asking Insomniac please add Captain Uni...,0.0,0.881,0.119,Neutral
2,1450176713380573184,@NetflixUK @netflix where is #spiderman2?? How...,"[?, ?, could, leave, ?, Preparing]",#spiderman2 ? ? could leave ? Preparing #Spide...,0.179,0.821,0.0,Negative
3,1450120733972119552,@RaggySays hello Jason how are you today I wan...,"[hello, Jason, today, want, present, new, suit...",hello Jason today want present new suit Peter ...,0.0,0.767,0.233,Positive
4,1450120372527960068,@AdamNoonchester hello Adam how are you today ...,"[hello, Adam, today, want, present, new, suit,...",hello Adam today want present new suit Peter P...,0.0,0.789,0.211,Positive
...,...,...,...,...,...,...,...,...
95,1448125476749332488,@insomniacgames \nI have one important questio...,"[one, important, question, ., know, said, woul...",one important question . know said #Spiderman2...,0.0,0.661,0.339,Positive
96,1448125117494607879,“Let me slip away!!!” 🔥🔥🔥 #SpiderMan2 #Quarant...,"[“, Let, slip, away, !, !, !, ”, https://t.co/...",“ Let slip away ! ! ! ” #SpiderMan2 #Quarantin...,0.0,1.0,0.0,Negative
97,1448123375218462731,@ToastDraws @PhaseZeroCB thanks for hanging. t...,"[thanks, hanging, ., fun, ., love, movie, .]",thanks hanging . fun . love movie . #SpiderMan...,0.0,0.402,0.598,Positive
98,1448123153406906368,Allways good to see you Normie #avengeme #Spid...,"[Allways, good, see, Normie, https://t.co/3yB4...",Allways good see Normie #avengeme #SpiderManNo...,0.0,0.734,0.266,Neutral


In [45]:
# Join the cleaned tokens of each tweet back into a single string
df["polaridad"] = df["tokenized_text"].apply(" ".join)
df

Unnamed: 0,id,text,tokenized_text,polaridad,negative,neutral,positive,result
0,1450253980819496961,"Happy birthday, @NajJeter. I can't wait for yo...","[Happy, birthday, ,, ., can't, wait, return, M...","Happy birthday , . can't wait return Miles PS5...",0.0,0.802,0.198,Neutral
1,1450231450872262667,Day 37 of asking Insomniac to please add the C...,"[Day, 37, asking, Insomniac, please, add, Capt...",Day 37 asking Insomniac please add Captain Uni...,0.0,0.881,0.119,Neutral
2,1450176713380573184,@NetflixUK @netflix where is #spiderman2?? How...,"[?, ?, could, leave, ?, Preparing]",? ? could leave ? Preparing,0.179,0.821,0.0,Negative
3,1450120733972119552,@RaggySays hello Jason how are you today I wan...,"[hello, Jason, today, want, present, new, suit...",hello Jason today want present new suit Peter ...,0.0,0.767,0.233,Positive
4,1450120372527960068,@AdamNoonchester hello Adam how are you today ...,"[hello, Adam, today, want, present, new, suit,...",hello Adam today want present new suit Peter P...,0.0,0.789,0.211,Positive
...,...,...,...,...,...,...,...,...
95,1448125476749332488,@insomniacgames \nI have one important questio...,"[one, important, question, ., know, said, woul...",one important question . know said would dark ...,0.0,0.661,0.339,Positive
96,1448125117494607879,“Let me slip away!!!” 🔥🔥🔥 #SpiderMan2 #Quarant...,"[“, Let, slip, away, !, !, !, ”, https://t.co/...",“ Let slip away ! ! ! ” https://t.co/V3yxiBwsAK,0.0,1.0,0.0,Negative
97,1448123375218462731,@ToastDraws @PhaseZeroCB thanks for hanging. t...,"[thanks, hanging, ., fun, ., love, movie, .]",thanks hanging . fun . love movie .,0.0,0.402,0.598,Positive
98,1448123153406906368,Allways good to see you Normie #avengeme #Spid...,"[Allways, good, see, Normie, https://t.co/3yB4...",Allways good see Normie https://t.co/3yB4fNOv2Y,0.0,0.734,0.266,Neutral
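The joined text still contains `https://t.co/...` links, which carry no sentiment. As an optional extra cleaning step, the sketch below drops tokens that look like links. The helper name `drop_url_tokens` is hypothetical, this step is not part of the original pipeline, and applying it could slightly change the scores reported in the following cells.

```python
def drop_url_tokens(tokens):
    """Keep every token that does not start like a web link."""
    return [t for t in tokens if not t.lower().startswith(("http://", "https://"))]


tokens = ["Allways", "good", "see", "Normie", "https://t.co/3yB4fNOv2Y"]
print(drop_url_tokens(tokens))  # ['Allways', 'good', 'see', 'Normie']
```

On the DataFrame it could be applied with `df["tokenized_text"].apply(drop_url_tokens)` before joining the tokens.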


In [49]:
# Create empty columns for the three scores and the final label
df["negative"] = ""
df["neutral"] = ""
df["positive"] = ""
df["result"] = ""
for index, row in df.iterrows():
    # Analyze each cleaned tweet
    analisis = sentiment_analyzer.polarity_scores(row["polaridad"])
    # Write through df.at: assigning to `row` may only modify a copy
    df.at[index, "negative"] = analisis["neg"]
    df.at[index, "neutral"] = analisis["neu"]
    df.at[index, "positive"] = analisis["pos"]
    # Decide which compound scores count as positive or negative
    if analisis["compound"] > 0.4:
        df.at[index, "result"] = "Positive"
    elif analisis["compound"] < -0.2:
        df.at[index, "result"] = "Negative"
    else:
        df.at[index, "result"] = "Neutral"
df[["text", "result"]]

Unnamed: 0,text,result
0,"Happy birthday, @NajJeter. I can't wait for yo...",Positive
1,Day 37 of asking Insomniac to please add the C...,Neutral
2,@NetflixUK @netflix where is #spiderman2?? How...,Neutral
3,@RaggySays hello Jason how are you today I wan...,Positive
4,@AdamNoonchester hello Adam how are you today ...,Positive
...,...,...
95,@insomniacgames \nI have one important questio...,Positive
96,“Let me slip away!!!” 🔥🔥🔥 #SpiderMan2 #Quarant...,Neutral
97,@ToastDraws @PhaseZeroCB thanks for hanging. t...,Positive
98,Allways good to see you Normie #avengeme #Spid...,Positive
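A quick way to summarize the labels is to count how often each one occurs. The sketch below uses only the labels of the first five rows shown above as sample data, so it runs without the DataFrame; on the full data, `df["result"].value_counts()` should give the equivalent summary.

```python
from collections import Counter

# Labels of the first five rows in the table above (sample data only)
labels = ["Positive", "Neutral", "Neutral", "Positive", "Positive"]

counts = Counter(labels)
print(counts.most_common())  # [('Positive', 3), ('Neutral', 2)]
```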


In [50]:
df.to_csv('SpiderMan.csv')

In [51]:
import csv
with open('SpiderMan.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row['polaridad'], "----", row['result'])

Happy birthday , . can't wait return Miles PS5 . - Giovanni De La Cruz ---- Positive
Day 37 asking Insomniac please add Captain Universe suit Spider-Man 2 either Peter Miles https://t.co/FlgmlqE6f2 ---- Neutral
? ? could leave ? Preparing ---- Neutral
hello Jason today want present new suit Peter Parker fusion spidey CAP hope like approve 2023 ️ ️ ️ ️ ️ https://t.co/3pESDyz5Lk ---- Positive
hello Adam today want present new suit Peter Parker fusion spidey CAP hope like approve 2023 ️ ️ ️ ️ ️ ️ ️ ️ https://t.co/kmjOizrl5C ---- Positive
hello ryan today want present new suit Peter Parker fusion spidey CAP hope like approve 2023 ️ ️ ️ ️ ️ https://t.co/Cx5BtIKH3M ---- Positive
forget games . https://t.co/6m6XKiqvGk ---- Negative
Day 36 asking Insomniac please add Captain Universe suit Spider-Man 2 either Peter Miles https://t.co/BJ51Xs5Jmk ---- Neutral
Found Symbiote Suit leaks . :P , https://t.co/wL1Z9GjJzs ---- Positive
OMG ! ! Ready https://t.co/hKStmeSLrT ---- Positive
Spiderman 2 Leak