<h1 align='center'>pysentimiento: A Multi-Task Toolkit for Sentiment Analysis and SocialNLP</h1>

Data Scientist Jr.: Karina Gonçalves Soares

Study links:

* [Github: pysentimiento](https://github.com/pysentimiento/pysentimiento)

* [pysentimiento: A Python Toolkit for Sentiment Analysis and SocialNLP tasks](https://arxiv.org/pdf/2106.09462.pdf)

* [More Scraped Data, Greater Bias](https://www.deeplearning.ai/the-batch/research-shows-that-training-on-larger-datasets-can-increase-social-bias/?utm_campaign=The%20Batch&utm_content=267121221&utm_medium=social&utm_source=facebook&hss_channel=fbp-1027125564106325)


* [ON HATE SCALING LAWS FOR DATA-SWAMPS](https://arxiv.org/pdf/2306.13141.pdf)

> In this notebook we study the `pysentimiento` library, a multilingual toolkit for opinion mining and sentiment analysis (centered on the Spanish language).

`pysentimiento` is a library that uses pre-trained transformer models for different SocialNLP tasks. Its base models are BETO (Spanish BERT) and RoBERTuito for Spanish, BERTweet for English, and similar models for Italian and Portuguese.



In [None]:
#%pip install pysentimiento

Let's create an analyzer. `create_analyzer` takes the task and the language as parameters.



In [2]:
from pysentimiento import create_analyzer

import transformers

transformers.logging.set_verbosity(transformers.logging.ERROR)

analyzer = create_analyzer(task="sentiment", lang="pt")

Downloading model.safetensors: 100%|██████████| 540M/540M [03:16<00:00, 2.74MB/s]
Downloading (…)okenizer_config.json: 100%|██████████| 562/562 [00:00<00:00, 2.36MB/s]
Downloading (…)solve/main/vocab.txt: 100%|██████████| 799k/799k [00:01<00:00, 462kB/s]
Downloading (…)solve/main/bpe.codes: 100%|██████████| 1.04M/1.04M [00:03<00:00, 331kB/s]
Downloading (…)in/added_tokens.json: 100%|██████████| 22.0/22.0 [00:00<00:00, 21.6kB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 167/167 [00:00<00:00, 388kB/s]
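
Because `create_analyzer` downloads and loads a model of several hundred megabytes (540M in the log above), it can be worth building each analyzer only once per session. The sketch below is one way to do that with `functools.lru_cache`. `make_cached_factory` and `fake_factory` are hypothetical names introduced here, and the stand-in factory exists only so the example runs without downloading anything.

```python
from functools import lru_cache
from typing import Any, Callable


def make_cached_factory(factory: Callable[..., Any]) -> Callable[[str, str], Any]:
    """Wrap an analyzer factory so each (task, lang) pair is built only once."""

    @lru_cache(maxsize=None)
    def cached(task: str, lang: str) -> Any:
        return factory(task=task, lang=lang)

    return cached


# Stand-in for create_analyzer (assumption: used only to keep this sketch offline).
calls = []


def fake_factory(task: str, lang: str) -> str:
    calls.append((task, lang))
    return f"analyzer[{task}/{lang}]"


get_analyzer = make_cached_factory(fake_factory)
first = get_analyzer("sentiment", "pt")
second = get_analyzer("sentiment", "pt")  # served from the cache, factory not called again
print(first is second, calls)
```

In the notebook, `make_cached_factory(create_analyzer)` would take the place of the stand-in.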


Let's look at some examples:

## **<font color="red">Example 1</font>**

In [3]:
exemplo1 = analyzer.predict("O Ronaldinho é muito bom como jogador!")

exemplo1

AnalyzerOutput(output=POS, probas={POS: 0.972, NEU: 0.021, NEG: 0.007})

In [6]:
# Unpack the 'exemplo1' result explicitly:
output_label = exemplo1.output  # predicted label (POS, NEU or NEG)
probabilities = exemplo1.probas  # probability associated with each label

# Individual probability for each label:
positive_probability = probabilities.get("POS", 0.0)
neutral_probability = probabilities.get("NEU", 0.0)
negative_probability = probabilities.get("NEG", 0.0)

# Print every probability as a percentage (the ':.2%' format multiplies by 100):
print(f"Label: {output_label}")
print(f"Positive probability: {positive_probability:.2%}")
print(f"Neutral probability: {neutral_probability:.2%}")
print(f"Negative probability: {negative_probability:.2%}")

Label: POS
Positive probability: 97.22%
Neutral probability: 2.10%
Negative probability: 0.68%


## **<font color="red">Example 2</font>**

In [7]:
analyzer.predict("A tinta da minha caneta acabou, nossa!")

AnalyzerOutput(output=NEG, probas={NEG: 0.939, NEU: 0.056, POS: 0.005})

## **<font color="red">Example 3</font>**

In [8]:
analyzer.predict("Que dia é hoje?")

AnalyzerOutput(output=NEU, probas={NEU: 0.931, NEG: 0.049, POS: 0.021})

## **<font color="red">Example 4</font>**

In [9]:
analyzer.predict("Hoje Eu estou 😭")

AnalyzerOutput(output=NEG, probas={NEG: 0.955, NEU: 0.026, POS: 0.019})
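
The four predictions above each have one clearly dominant label. A simple way to quantify how clear-cut a prediction is, sketched below, is the margin between the two highest probabilities. `top_margin` is a hypothetical helper, not part of `pysentimiento`, and the dictionaries are the rounded probabilities printed in the outputs above.

```python
def top_margin(probas: dict) -> tuple:
    """Return (best_label, margin between the two highest probabilities)."""
    ranked = sorted(probas.items(), key=lambda item: item[1], reverse=True)
    (best_label, best_p), (_, second_p) = ranked[0], ranked[1]
    return best_label, best_p - second_p


# Rounded values copied from the four outputs above.
examples = {
    "Example 1": {"POS": 0.972, "NEU": 0.021, "NEG": 0.007},
    "Example 2": {"NEG": 0.939, "NEU": 0.056, "POS": 0.005},
    "Example 3": {"NEU": 0.931, "NEG": 0.049, "POS": 0.021},
    "Example 4": {"NEG": 0.955, "NEU": 0.026, "POS": 0.019},
}

margins = {name: top_margin(p) for name, p in examples.items()}
for name, (label, margin) in margins.items():
    print(f"{name}: {label} (margin {margin:.3f})")
```

A small margin would flag a sentence whose label deserves a second look; none of the four examples here falls in that category.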

# <font color="yellow">Batch prediction</font>

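Given a list of sentences, `pysentimiento` can predict them all in a single batched call. The two cells below compare a per-sentence loop with one batched call. Note that on this small input of 15 sentences the recorded wall time is higher for the batched call (10.3 s versus 1.66 s for the loop), so any efficiency gain from batching presumably shows up only on larger inputs: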

In [12]:
%%time

from tqdm.auto import tqdm

oraciones = [
    "Amo ser Cientista de Dados",
    "Ver tanta pobreza me dá tristeza.",
    "O Sol está muito distante",    
] * 5

for sent in tqdm(oraciones):
    analyzer.predict(sent)

100%|██████████| 15/15 [00:01<00:00,  9.09it/s]

CPU times: user 3.06 s, sys: 0 ns, total: 3.06 s
Wall time: 1.66 s





In [13]:
%%time

rets = analyzer.predict(oraciones)

rets 

Map: 100%|██████████| 15/15 [00:01<00:00, 14.89 examples/s]


CPU times: user 4.71 s, sys: 846 ms, total: 5.55 s
Wall time: 10.3 s


[AnalyzerOutput(output=POS, probas={POS: 0.989, NEU: 0.009, NEG: 0.002}),
 AnalyzerOutput(output=NEG, probas={NEG: 0.987, NEU: 0.007, POS: 0.006}),
 AnalyzerOutput(output=NEU, probas={NEU: 0.847, NEG: 0.139, POS: 0.014}),
 AnalyzerOutput(output=POS, probas={POS: 0.989, NEU: 0.009, NEG: 0.002}),
 AnalyzerOutput(output=NEG, probas={NEG: 0.987, NEU: 0.007, POS: 0.006}),
 AnalyzerOutput(output=NEU, probas={NEU: 0.847, NEG: 0.139, POS: 0.014}),
 AnalyzerOutput(output=POS, probas={POS: 0.989, NEU: 0.009, NEG: 0.002}),
 AnalyzerOutput(output=NEG, probas={NEG: 0.987, NEU: 0.007, POS: 0.006}),
 AnalyzerOutput(output=NEU, probas={NEU: 0.847, NEG: 0.139, POS: 0.014}),
 AnalyzerOutput(output=POS, probas={POS: 0.989, NEU: 0.009, NEG: 0.002}),
 AnalyzerOutput(output=NEG, probas={NEG: 0.987, NEU: 0.007, POS: 0.006}),
 AnalyzerOutput(output=NEU, probas={NEU: 0.847, NEG: 0.139, POS: 0.014}),
 AnalyzerOutput(output=POS, probas={POS: 0.989, NEU: 0.009, NEG: 0.002}),
 AnalyzerOutput(output=NEG, probas={NE
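
Once a batch has been predicted, the list of results can be summarized like any other Python list. The sketch below counts predicted labels with `collections.Counter`. To stay runnable without the model, it uses a hypothetical `FakeOutput` namedtuple that mimics only the `.output` and `.probas` attributes used earlier in this notebook; in the notebook, `rets` itself would be passed in.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in exposing the two attributes used in this notebook.
FakeOutput = namedtuple("FakeOutput", ["output", "probas"])

# The three distinct results shown in the batch output above.
one_round = [
    FakeOutput("POS", {"POS": 0.989, "NEU": 0.009, "NEG": 0.002}),
    FakeOutput("NEG", {"NEG": 0.987, "NEU": 0.007, "POS": 0.006}),
    FakeOutput("NEU", {"NEU": 0.847, "NEG": 0.139, "POS": 0.014}),
]
results = one_round * 5  # same shape as the 15-sentence batch


def label_counts(outputs) -> Counter:
    """Count how many times each label was predicted."""
    return Counter(o.output for o in outputs)


counts = label_counts(results)
print(counts)
```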

# <font color="yellow">Emojis</font>

It also supports emojis, through the [emoji](https://pypi.org/project/emoji/) library.

In [14]:
analyzer.predict("🤢")

AnalyzerOutput(output=NEG, probas={NEG: 0.976, NEU: 0.016, POS: 0.008})

In [15]:
analyzer.predict(":)")

AnalyzerOutput(output=POS, probas={POS: 0.925, NEU: 0.069, NEG: 0.006})

Hashtags, too:

In [16]:
analyzer.predict("#IstoéUmaMerda")

AnalyzerOutput(output=NEG, probas={NEG: 0.992, POS: 0.004, NEU: 0.004})
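
A hashtag such as `#IstoéUmaMerda` packs several words into one token. As an illustration of how a preprocessing step could expose those words, the sketch below splits a camel-case hashtag at lowercase-to-uppercase boundaries. `split_hashtag` is a hypothetical helper written for this notebook; it is not claimed to be what `pysentimiento` does internally.

```python
import unicodedata


def split_hashtag(tag: str) -> str:
    """Split a camel-case hashtag into space-separated words."""
    # Normalize so accented letters are single characters with a defined case.
    body = unicodedata.normalize("NFC", tag).lstrip("#")
    pieces = []
    for prev, char in zip(" " + body, body):
        # Start a new word when a lowercase letter is followed by an uppercase one.
        if prev.islower() and char.isupper():
            pieces.append(" ")
        pieces.append(char)
    return "".join(pieces)


print(split_hashtag("#IstoéUmaMerda"))
```

Hashtags written entirely in lowercase are left untouched by this approach, since they contain no case boundary to split on.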

# <font color="yellow">Emotion Analysis</font>

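`pysentimiento` provides emotion analysis through models pre-trained on datasets such as [EmoEvent](https://github.com/fmplaza/EmoEvent). Note that the Portuguese model used below returns 28 labels (from admiration to neutral), which match the GoEmotions taxonomy rather than EmoEvent's smaller label set.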

In [17]:
# Instantiate the emotion analyzer:
emotion_analyzer = create_analyzer(task="emotion", lang="pt")

Downloading (…)lve/main/config.json: 100%|██████████| 2.12k/2.12k [00:00<00:00, 64.0kB/s]
Downloading pytorch_model.bin: 100%|██████████| 436M/436M [02:36<00:00, 2.79MB/s]
Downloading (…)okenizer_config.json: 100%|██████████| 367/367 [00:00<00:00, 358kB/s]
Downloading (…)solve/main/vocab.txt: 100%|██████████| 210k/210k [00:00<00:00, 1.20MB/s]
Downloading (…)/main/tokenizer.json: 100%|██████████| 678k/678k [00:00<00:00, 2.91MB/s]
Downloading (…)in/added_tokens.json: 100%|██████████| 61.0/61.0 [00:00<00:00, 60.4kB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 125/125 [00:00<00:00, 117kB/s]


In [18]:
emotion_1 = emotion_analyzer.predict("Eu machuquei meu joelho jogando bola, isso é terrível!")

emotion_1

AnalyzerOutput(output=['fear'], probas={admiration: 0.013, amusement: 0.002, anger: 0.009, annoyance: 0.013, approval: 0.006, caring: 0.005, confusion: 0.004, curiosity: 0.003, desire: 0.004, disappointment: 0.028, disapproval: 0.012, disgust: 0.160, embarrassment: 0.016, excitement: 0.002, fear: 0.890, gratitude: 0.005, grief: 0.008, joy: 0.001, love: 0.007, nervousness: 0.049, optimism: 0.006, pride: 0.002, realization: 0.005, relief: 0.001, remorse: 0.001, sadness: 0.074, surprise: 0.003, neutral: 0.009})

In [19]:
# The predicted emotion is:
emotion_1.output

['fear']

In [20]:
# There are many emotions, and only one of them has a high probability:
emotion_1.probas

{'admiration': 0.013477839529514313,
 'amusement': 0.0018875550013035536,
 'anger': 0.009365188889205456,
 'annoyance': 0.013239799067378044,
 'approval': 0.00569023285061121,
 'caring': 0.005219749640673399,
 'confusion': 0.004016146529465914,
 'curiosity': 0.0025675150100141764,
 'desire': 0.004219981841742992,
 'disappointment': 0.028135400265455246,
 'disapproval': 0.011510009877383709,
 'disgust': 0.16009768843650818,
 'embarrassment': 0.01590888574719429,
 'excitement': 0.001899012946523726,
 'fear': 0.8895401954650879,
 'gratitude': 0.005402678158134222,
 'grief': 0.008150695823132992,
 'joy': 0.0013532412704080343,
 'love': 0.006999167148023844,
 'nervousness': 0.04916279762983322,
 'optimism': 0.006161699071526527,
 'pride': 0.0018651897553354502,
 'realization': 0.005212029907852411,
 'relief': 0.000912616727873683,
 'remorse': 0.0012565108481794596,
 'sadness': 0.0744243785738945,
 'surprise': 0.0028698265086859465,
 'neutral': 0.008908512070775032}

In [21]:
# Pick the label with the highest probability:
name_of_emotion = max(emotion_1.probas, key=emotion_1.probas.get)

value_of_emotion = emotion_1.probas[name_of_emotion]

print(f"The emotion of our sentence is: {name_of_emotion} with a probability of: {value_of_emotion}")

The emotion of our sentence is: fear with a probability of: 0.8895401954650879
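
`max` returns only the single most likely emotion, but here a few others (disgust, sadness) also carry some probability. A minimal sketch for listing the top-k emotions follows. `top_k_emotions` is a hypothetical helper, and the dictionary is a subset of the `emotion_1.probas` values printed above, rounded to three decimals.

```python
import heapq


def top_k_emotions(probas: dict, k: int = 3) -> list:
    """Return the k (label, probability) pairs with the highest probability."""
    return heapq.nlargest(k, probas.items(), key=lambda item: item[1])


# Subset of emotion_1.probas, rounded; in the notebook, pass emotion_1.probas directly.
sample_probas = {
    "disappointment": 0.028,
    "disgust": 0.160,
    "fear": 0.890,
    "nervousness": 0.049,
    "sadness": 0.074,
    "neutral": 0.009,
}

top3 = top_k_emotions(sample_probas, k=3)
for label, p in top3:
    print(f"{label}: {p:.1%}")
```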


In [22]:
emotion_analyzer.predict("Oh meu Deus!")  # In English: "omg"

AnalyzerOutput(output=['surprise'], probas={admiration: 0.012, amusement: 0.007, anger: 0.005, annoyance: 0.010, approval: 0.002, caring: 0.001, confusion: 0.003, curiosity: 0.011, desire: 0.003, disappointment: 0.004, disapproval: 0.002, disgust: 0.001, embarrassment: 0.005, excitement: 0.143, fear: 0.002, gratitude: 0.003, grief: 0.002, joy: 0.006, love: 0.003, nervousness: 0.001, optimism: 0.002, pride: 0.002, realization: 0.031, relief: 0.003, remorse: 0.001, sadness: 0.002, surprise: 0.944, neutral: 0.034})

In [23]:
emotion_2 = emotion_analyzer.predict("Gol do Brasil!")

emotion_2

AnalyzerOutput(output=[], probas={admiration: 0.095, amusement: 0.261, anger: 0.016, annoyance: 0.016, approval: 0.005, caring: 0.001, confusion: 0.001, curiosity: 0.001, desire: 0.001, disappointment: 0.003, disapproval: 0.001, disgust: 0.001, embarrassment: 0.002, excitement: 0.089, fear: 0.000, gratitude: 0.003, grief: 0.001, joy: 0.168, love: 0.001, nervousness: 0.000, optimism: 0.001, pride: 0.006, realization: 0.014, relief: 0.004, remorse: 0.001, sadness: 0.001, surprise: 0.035, neutral: 0.231})
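
For "Gol do Brasil!" the `output` list is empty even though `probas` is populated: no emotion stands out, and the highest value (amusement, 0.261) is far below the 0.89 and 0.94 seen in the earlier examples. This is consistent with the emotion task being multi-label, with `output` keeping only the labels above some cutoff. The sketch below assumes a cutoff of 0.5 (an assumption, not confirmed by the output above) and adds a fallback to the most probable label. Both function names are hypothetical.

```python
def labels_above(probas: dict, threshold: float = 0.5) -> list:
    """Return every label whose probability exceeds the (assumed) threshold."""
    return [label for label, p in probas.items() if p > threshold]


def labels_or_fallback(probas: dict, threshold: float = 0.5) -> list:
    """Like labels_above, but fall back to the single most probable label."""
    selected = labels_above(probas, threshold)
    if selected:
        return selected
    return [max(probas, key=probas.get)]


# Highest entries of the "Gol do Brasil!" output above (rounded values).
gol_probas = {
    "admiration": 0.095,
    "amusement": 0.261,
    "excitement": 0.089,
    "joy": 0.168,
    "neutral": 0.231,
}

strict = labels_above(gol_probas)
relaxed = labels_or_fallback(gol_probas)
print(strict, relaxed)
```

Whether to use the fallback depends on the application: an empty list is an honest "no clear emotion", while the fallback always yields a label at the cost of reporting low-confidence guesses.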