# pysentimiento: A multilingual toolkit for Sentiment Analysis and SocialNLP tasks

# Sentiment Analysis and Opinion Mining in Italian



In this notebook we show a brief example of how to use [pysentimiento](https://github.com/pysentimiento/pysentimiento/), a multilingual toolkit for opinion mining and sentiment analysis.

`pysentimiento` supports the following tasks for Italian:

- Sentiment Analysis 
- Hate Speech Detection
- Irony Detection
- Emotion Analysis

First, let's install the library:



In [1]:
!pip install pysentimiento


Let's create an analyzer. The `create_analyzer` function takes the task and the language as parameters.

In [2]:
from pysentimiento import create_analyzer
analyzer = create_analyzer(task="sentiment", lang="it")



The sentiment analysis module for Italian is based on the [SENTIPOLC@EvalITA](http://www.di.unito.it/~tutreeb/sentipolc-evalita16/) dataset, which is annotated with two labels:

- does the text have a positive sentiment? (`pos`)
- does the text have a negative sentiment? (`neg`)

`pos` and `neg` are binary variables, so there are four possible combinations:

- if `pos` is 0 and `neg` is 0, the text is neutral
- if `pos` is 1 and `neg` is 0, the text has a positive sentiment
- if `pos` is 0 and `neg` is 1, the text has a negative sentiment
- if `pos` is 1 and `neg` is 1, the text has both a positive and a negative sentiment (we call this a mixed sentiment)

(see [TASKS](docs/TASKS.md) for more information)
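The four combinations can be sketched as a small helper that turns a list of predicted labels into a single name. The function `sentiment_label` and the names `"neutral"`, `"positive"`, `"negative"`, and `"mixed"` are illustrative assumptions for this notebook, not part of `pysentimiento`.

```python
def sentiment_label(output):
    """Map a list of predicted labels ('pos' and/or 'neg') to a single name.

    Hypothetical helper for illustration; it only encodes the four
    pos/neg combinations described in the text.
    """
    labels = set(output)
    unknown = labels - {"pos", "neg"}
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    if labels == {"pos", "neg"}:
        return "mixed"
    if labels == {"pos"}:
        return "positive"
    if labels == {"neg"}:
        return "negative"
    return "neutral"  # neither label is present


print(sentiment_label(["pos", "neg"]))  # prints "mixed"
print(sentiment_label([]))              # prints "neutral"
```

The helper takes a plain list, so it could be applied to the label list an analyzer returns without depending on the library itself.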



In [3]:
# A positive text in Italian
analyzer.predict("Questo è fantastico")

AnalyzerOutput(output=['pos'], probas={pos: 0.969, neg: 0.009})

The output contains only the `pos` label.

In [5]:
# A negative text in Italian
analyzer.predict("Questo è una merda")

AnalyzerOutput(output=['neg'], probas={pos: 0.029, neg: 0.989})

In [9]:
# A mixed-sentiment text in Italian
analyzer.predict("Sono contento che il Bayern abbia vinto, ma sono triste per Messi")

AnalyzerOutput(output=['pos', 'neg'], probas={pos: 0.827, neg: 0.988})

In [10]:
# A neutral text in Italian
analyzer.predict("Cosa è questo?")

AnalyzerOutput(output=[], probas={pos: 0.010, neg: 0.018})

### Emojis

The analyzer also handles emojis, through the [emoji](https://pypi.org/project/emoji/) library.


In [11]:
analyzer.predict("🤢")

AnalyzerOutput(output=['neg'], probas={pos: 0.069, neg: 0.992})

## Hate Speech

`pysentimiento` also supports hate speech detection for Italian, with models trained on the [HaSpeeDe@EvalITA](http://www.di.unito.it/~tutreeb/haspeede-evalita20/index.html) dataset.

This task also uses a multi-label approach, where the outputs are:

- does the text contain hate speech? (`hateful`)
- does the text contain a stereotype? (`stereotype`)
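In a multi-label setting each label is decided independently, so a text can receive zero, one, or both labels. A minimal sketch of that idea follows, assuming a fixed 0.5 cutoff on each probability. The cutoff, the helper name `labels_above_threshold`, and the probabilities are assumptions for illustration and are not taken from the library's code.

```python
def labels_above_threshold(probas, threshold=0.5):
    """Return every label whose probability is above the threshold.

    Hypothetical helper: each label is checked on its own, which is
    what distinguishes multi-label from single-label classification.
    """
    return [label for label, p in probas.items() if p > threshold]


# Illustrative probabilities, not real model output
print(labels_above_threshold({"hateful": 0.9, "stereotype": 0.2}))  # ['hateful']
print(labels_above_threshold({"hateful": 0.8, "stereotype": 0.7}))  # ['hateful', 'stereotype']
print(labels_above_threshold({"hateful": 0.1, "stereotype": 0.3}))  # []
```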

In [12]:
hate_speech_analyzer = create_analyzer(task="hate_speech", lang="it")

loading configuration file config.json from cache at /users/jmperez/.cache/huggingface/hub/models--pysentimiento--bert-it-hate-speech/snapshots/9a60ac39953b872bca7f904729b0c421de42ecc9/config.json
Model config BertConfig {
  "_name_or_path": "pysentimiento/bert-it-hate-speech",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "hateful",
    "1": "stereotype"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "hateful": 0,
    "stereotype": 1
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "problem_type": "multi_label_classification",
  "torch_dtype": "float32",
  "transformers_version": "4.26.0",
  "type_vocab_size"

In [18]:
hate_speech_analyzer.predict("Non voglio vedere un altro immigrato nel mio paese! Spero che li restituiscano presto in barca")

AnalyzerOutput(output=['hateful'], probas={hateful: 0.907, stereotype: 0.155})

In [19]:
hate_speech_analyzer.predict("Hai mai visto un povero ebreo? Sono tutti ricchi e avidi")

AnalyzerOutput(output=['stereotype'], probas={hateful: 0.184, stereotype: 0.898})

In [20]:
hate_speech_analyzer.predict("Tutti i musulmani portano dentro il terrorista. non puoi fidarti di loro")

AnalyzerOutput(output=['hateful', 'stereotype'], probas={hateful: 0.838, stereotype: 0.770})

## Emotion detection

We use the [FEEL-IT](https://github.com/MilaNLProc/feel-it) dataset for emotion detection. This dataset is annotated with four emotions:

- anger
- fear
- joy
- sadness
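Unlike the sentiment and hate speech tasks, emotion detection assigns exactly one of these four labels to each text. A minimal sketch of that single-label decision is to pick the label with the highest probability. The helper name `top_emotion` and the probabilities below are assumptions for illustration, not the library's implementation.

```python
def top_emotion(probas):
    """Return the single label with the highest probability.

    Hypothetical helper: single-label classification keeps only the
    best-scoring class, however close the others are.
    """
    if not probas:
        raise ValueError("probas must not be empty")
    return max(probas, key=probas.get)


# Illustrative probabilities that sum to 1, not real model output
example = {"joy": 0.70, "sadness": 0.15, "anger": 0.10, "fear": 0.05}
print(top_emotion(example))  # prints "joy"
```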



In [30]:
emotion_analyzer = create_analyzer("emotion", "it")


loading configuration file config.json from cache at /users/jmperez/.cache/huggingface/hub/models--pysentimiento--bert-it-emotion/snapshots/24abd20917487cc870876f4e936f56d9641dfe7e/config.json
Model config BertConfig {
  "_name_or_path": "pysentimiento/bert-it-emotion",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "joy",
    "1": "anger",
    "2": "sadness",
    "3": "fear"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "anger": 1,
    "fear": 3,
    "joy": 0,
    "sadness": 2
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "problem_type": "single_label_classification",
  "torch_dtype": "float32",
  "transfo

In [31]:
emotion_analyzer.predict("Sono felice di essere qui")

AnalyzerOutput(output=joy, probas={joy: 0.972, sadness: 0.019, anger: 0.004, fear: 0.004})

In [32]:
emotion_analyzer.predict("Siamo fuori della Copa. E un giorno tristísimo..")

AnalyzerOutput(output=sadness, probas={sadness: 0.956, fear: 0.027, joy: 0.013, anger: 0.004})

## Preprocessing

`pysentimiento` features a preprocessing module with various options for manipulating hashtags, emojis, character repetition, and so on.

In [21]:
from pysentimiento.preprocessing import preprocess_tweet

preprocess_tweet("📢 @MatteoSalvini dice che \"l'Italia non è un Paese razzista\"", lang="it")

'emoji altoparlante emoji  ##user dice che "l\'Italia non è un Paese razzista"'

In [23]:
preprocess_tweet(
    "📢 @MatteoSalvini dice che \"l'Italia non è un Paese razzista\"", 
    lang="it", preprocess_handles=False, demoji=False)

'📢 @MatteoSalvini dice che "l\'Italia non è un Paese razzista"'
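To give an intuition for this kind of normalization, here is a rough sketch of two of the steps mentioned above: replacing user handles with a placeholder and shortening character repetition. The function `simple_preprocess`, its parameters, and the `@user` placeholder are hypothetical; this is not how `preprocess_tweet` is implemented, and it does not handle emojis or hashtags.

```python
import re


def simple_preprocess(text, user_token="@user", max_repeat=3):
    """Replace @handles and cap runs of a repeated character.

    Hypothetical illustration only; the real preprocessing module
    offers more options and may behave differently.
    """
    # Replace anything that looks like a handle with a placeholder token
    text = re.sub(r"@\w+", user_token, text)
    # Shorten runs longer than max_repeat to exactly max_repeat characters
    pattern = r"(.)\1{%d,}" % max_repeat
    return re.sub(pattern, lambda m: m.group(1) * max_repeat, text)


print(simple_preprocess("@MatteoSalvini ciaooooo"))  # prints "@user ciaooo"
```

A regex this simple also rewrites the part after `@` in e-mail addresses, so a real pipeline would need a stricter pattern.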