# pysentimiento: A multilingual toolkit for Sentiment Analysis and SocialNLP tasks

In this notebook we show a brief example of how to use [pysentimiento](https://github.com/pysentimiento/pysentimiento/), a multilingual toolkit for opinion mining and sentiment analysis (although focused on Spanish).

`pysentimiento` is a library that uses pre-trained [transformers](https://github.com/huggingface/transformers) models for different SocialNLP tasks. Its base models are [BETO](https://github.com/dccuchile/beto) and [RoBERTuito](https://github.com/pysentimiento/robertuito) for Spanish, and BERTweet for English.

 
First, let's install the library:

In [1]:
!pip install pysentimiento

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pysentimiento
  Downloading pysentimiento-0.4.0-py3-none-any.whl (30 kB)
Collecting emoji<2.0.0,>=1.6.1
  Downloading emoji-1.7.0.tar.gz (175 kB)
Collecting datasets<2.0.0,>=1.13.3
  Downloading datasets-1.18.4-py3-none-any.whl (312 kB)
Collecting transformers==4.13
  Downloading transformers-4.13.0-py3-none-any.whl (3.3 MB)
Collecting huggingface-hub<1.0,>=0.1.0
  Downloading huggingface_hub-0.8.1-py3-none-any.whl (101 kB)
Collecting pyyaml>=5.1
  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)

Let's create an analyzer. The `create_analyzer` function takes the task and the language as parameters (it currently supports "es" and "en").

In [2]:
from pysentimiento import create_analyzer
analyzer = create_analyzer(task="sentiment", lang="es")



Let's check out some examples:

In [3]:
analyzer.predict("Qué gran jugador es Messi")

AnalyzerOutput(output=POS, probas={POS: 0.994, NEG: 0.003, NEU: 0.003})

In [4]:
analyzer.predict("Esto es pésimo")

AnalyzerOutput(output=NEG, probas={NEG: 0.948, NEU: 0.048, POS: 0.004})

In [5]:
analyzer.predict("Qué es esto?")

AnalyzerOutput(output=NEU, probas={NEU: 0.802, NEG: 0.188, POS: 0.010})
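
In each result above, `output` matches the label with the highest value in `probas`. The sketch below reproduces that relation with plain dictionaries copied from the three outputs. It does not call `pysentimiento`, and the helper name `top_label` is our own, introduced only for illustration.

```python
def top_label(probas):
    """Return the label whose probability is highest."""
    return max(probas, key=probas.get)

# Probabilities copied from the three predictions shown above
examples = [
    {"POS": 0.994, "NEG": 0.003, "NEU": 0.003},
    {"NEG": 0.948, "NEU": 0.048, "POS": 0.004},
    {"NEU": 0.802, "NEG": 0.188, "POS": 0.010},
]

predicted = [top_label(p) for p in examples]
print(predicted)  # ['POS', 'NEG', 'NEU']
```

The third example is worth a look: NEU wins, but NEG still holds 0.188, so the full distribution can be more informative than the single label.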

### Batch prediction

If we have a set of sentences, `pysentimiento` can predict them together efficiently.

In [6]:
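%%time
from tqdm.auto import tqdm

# Build a list of 60 sentences by repeating three examples
oraciones = [
    "Qué gran jugador es Messi",
    "Esto es pésimo",
    "No sé, cómo se llama?",
] * 20

# Baseline: predict one sentence at a time
for sent in tqdm(oraciones):
    analyzer.predict(sent)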


CPU times: user 7.94 s, sys: 85.4 ms, total: 8.02 s
Wall time: 8.46 s


In [7]:
%%time
# Pass the whole list so the sentences are predicted in batches
rets = analyzer.predict(oraciones)



***** Running Prediction *****
  Num examples = 60
  Batch size = 32


CPU times: user 1.93 s, sys: 20.3 ms, total: 1.95 s
Wall time: 2 s
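
In the timings above, wall time drops from 8.46 s for the one-by-one loop to 2 s for the single call, roughly a 4x speedup on these 60 sentences. The sketch below shows, in plain Python, how a list of 60 items splits into batches of 32. It only illustrates the chunking idea; it is not the library's internal implementation, and the helper name `chunk` is hypothetical.

```python
def chunk(items, batch_size):
    """Yield consecutive slices of `items` holding at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

sentences = [
    "Qué gran jugador es Messi",
    "Esto es pésimo",
    "No sé, cómo se llama?",
] * 20

batches = list(chunk(sentences, 32))
batch_sizes = [len(b) for b in batches]
print(len(batches), batch_sizes)  # 2 [32, 28]
```

With a batch size of 32, the model runs two forward passes instead of 60, which is where most of the saving comes from.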


### Emojis

`pysentimiento` also supports emojis, which it handles through the [emoji](https://pypi.org/project/emoji/) library.

In [8]:
analyzer.predict("🤢")

AnalyzerOutput(output=NEG, probas={NEG: 0.981, NEU: 0.017, POS: 0.002})

Hashtags are handled too:

In [9]:
analyzer.predict("#EstoEsUnaMierda")

AnalyzerOutput(output=NEG, probas={NEG: 0.998, NEU: 0.001, POS: 0.001})

## Emotion Analysis

`pysentimiento` provides emotion analysis through models pre-trained on the [EmoEvent](https://github.com/fmplaza/EmoEvent-multilingual-corpus/) datasets.

In [10]:
emotion_analyzer = create_analyzer(task="emotion", lang="en")


In [11]:
emotion_analyzer.predict("This is so terrible...")

AnalyzerOutput(output=sadness, probas={sadness: 0.978, fear: 0.013, disgust: 0.003, others: 0.002, surprise: 0.002, anger: 0.001, joy: 0.001})

In [12]:
emotion_analyzer.predict("omg")

AnalyzerOutput(output=surprise, probas={surprise: 0.982, others: 0.007, fear: 0.003, joy: 0.003, sadness: 0.002, anger: 0.002, disgust: 0.001})

In [13]:
emotion_analyzer.predict("yayyyy")

AnalyzerOutput(output=joy, probas={joy: 0.879, others: 0.106, surprise: 0.005, anger: 0.005, sadness: 0.002, disgust: 0.002, fear: 0.002})

In [14]:
emotion_analyzer.predict("People in the world is really worried because of Coronavirus")

AnalyzerOutput(output=fear, probas={fear: 0.939, others: 0.043, surprise: 0.005, joy: 0.004, disgust: 0.004, sadness: 0.002, anger: 0.002})

## Hate Speech

In [15]:
hate_speech_analyzer = create_analyzer(task="hate_speech", lang="es")


In [16]:
hate_speech_analyzer.predict("Esto es una mierda pero no es odio")

AnalyzerOutput(output=[], probas={hateful: 0.022, targeted: 0.009, aggressive: 0.018})

In [17]:
hate_speech_analyzer.predict("Esto es odio porque los inmigrantes deben ser aniquilados")

AnalyzerOutput(output=['hateful'], probas={hateful: 0.835, targeted: 0.008, aggressive: 0.476})

In [18]:
hate_speech_analyzer.predict("Vaya guarra barata y de poca monta es Juana Pérez!")

AnalyzerOutput(output=['hateful', 'targeted', 'aggressive'], probas={hateful: 0.985, targeted: 0.983, aggressive: 0.973})
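
Unlike the sentiment and emotion analyzers, this one returns a list of labels, and the probabilities do not sum to one, so each label appears to be scored independently. The three outputs above are consistent with keeping every label whose probability exceeds 0.5. That threshold is an assumption inferred from these examples, not something the notebook confirms. A minimal sketch with the values copied from above (the function name is our own):

```python
def labels_above_threshold(probas, threshold=0.5):
    """Keep every label whose probability exceeds `threshold` (assumed to be 0.5)."""
    return [label for label, p in probas.items() if p > threshold]

# Probabilities copied from the three hate speech predictions above
outputs = [
    {"hateful": 0.022, "targeted": 0.009, "aggressive": 0.018},
    {"hateful": 0.835, "targeted": 0.008, "aggressive": 0.476},
    {"hateful": 0.985, "targeted": 0.983, "aggressive": 0.973},
]

selected = [labels_above_threshold(p) for p in outputs]
print(selected)
# [[], ['hateful'], ['hateful', 'targeted', 'aggressive']]
```

In the second example `aggressive` scores 0.476 and is left out, which is what suggests a cut-off near 0.5.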

## Token Labeling tasks

`pysentimiento` also features POS tagging and NER analyzers, thanks to the multilingual [LinCE](https://ritual.uh.edu/lince/) dataset.


In [20]:
ner_analyzer = create_analyzer("ner", lang="es")




In [21]:
ner_analyzer.predict("Me voy de vacaciones a República Dominicana 😎")

[{'end': 43,
  'score': 0.9965726,
  'start': 22,
  'text': 'República Dominicana',
  'type': 'LOC',
  'word': 'república dominicana'}]

In [24]:
ner_analyzer.predict("Me llamo Juan Manuel Pérez y vivo en 🇦🇷😎")

[{'end': 26,
  'score': 0.9978905,
  'start': 8,
  'text': 'Juan Manuel Pérez',
  'type': 'PER',
  'word': 'juan manuel pérez'},
 {'end': 61,
  'score': 0.9908846,
  'start': 51,
  'text': 'argentina',
  'type': 'LOC',
  'word': 'argentina'}]
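
The NER analyzer returns a list of dictionaries, one per entity. Note that the `start` and `end` values in the last output (51 to 61) exceed the length of the raw input, which suggests they index the preprocessed text (where the flag emoji has been replaced by words) rather than the original string. For that reason the sketch below relies on the `text` and `type` fields instead of slicing. It works on a copy of the output above and does not call the library; `group_by_type` and `min_score` are hypothetical names.

```python
from collections import defaultdict

# Fields copied from the NER output above
entities = [
    {"text": "Juan Manuel Pérez", "type": "PER", "score": 0.9978905},
    {"text": "argentina", "type": "LOC", "score": 0.9908846},
]

def group_by_type(entities, min_score=0.0):
    """Map each entity type to the list of entity texts scoring at least `min_score`."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["type"]].append(ent["text"])
    return dict(grouped)

grouped = group_by_type(entities)
print(grouped)  # {'PER': ['Juan Manuel Pérez'], 'LOC': ['argentina']}
```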

In [25]:
pos_tagger = create_analyzer("pos", "es")


In [27]:
pos_tagger.predict("Me llamo Juan Manuel Pérez y vivo en Argentina")

[{'end': 2,
  'score': 0.9308241,
  'start': 0,
  'text': 'Me',
  'type': 'PRON',
  'word': 'Me'},
 {'end': 8,
  'score': 0.99886525,
  'start': 2,
  'text': 'llamo',
  'type': 'VERB',
  'word': 'llamo'},
 {'end': 13,
  'score': 0.9999205,
  'start': 8,
  'text': 'Juan',
  'type': 'PROPN',
  'word': 'Juan'},
 {'end': 20,
  'score': 0.99989665,
  'start': 13,
  'text': 'Manuel',
  'type': 'PROPN',
  'word': 'Manuel'},
 {'end': 26,
  'score': 0.99976057,
  'start': 20,
  'text': 'Pérez',
  'type': 'PROPN',
  'word': 'Pérez'},
 {'end': 28,
  'score': 0.9998472,
  'start': 26,
  'text': 'y',
  'type': 'CONJ',
  'word': 'y'},
 {'end': 33,
  'score': 0.9996673,
  'start': 28,
  'text': 'vivo',
  'type': 'VERB',
  'word': 'vivo'},
 {'end': 36,
  'score': 0.99951565,
  'start': 33,
  'text': 'en',
  'type': 'ADP',
  'word': 'en'},
 {'end': 46,
  'score': 0.9998746,
  'start': 36,
  'text': 'Argentina',
  'type': 'PROPN',
  'word': 'Argentina'}]
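
The POS tagger output has the same shape as the NER output, with one dictionary per token. A compact way to read it is the `word/TAG` format. The sketch below builds that string from the `text` and `type` fields copied from the output above, without calling the library.

```python
# (text, type) pairs copied from the POS tagger output above
tokens = [
    {"text": "Me", "type": "PRON"},
    {"text": "llamo", "type": "VERB"},
    {"text": "Juan", "type": "PROPN"},
    {"text": "Manuel", "type": "PROPN"},
    {"text": "Pérez", "type": "PROPN"},
    {"text": "y", "type": "CONJ"},
    {"text": "vivo", "type": "VERB"},
    {"text": "en", "type": "ADP"},
    {"text": "Argentina", "type": "PROPN"},
]

# Join every token with its tag, separated by spaces
tagged = " ".join(f"{tok['text']}/{tok['type']}" for tok in tokens)
print(tagged)
# Me/PRON llamo/VERB Juan/PROPN Manuel/PROPN Pérez/PROPN y/CONJ vivo/VERB en/ADP Argentina/PROPN
```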

## Preprocessing

`pysentimiento` has a tweet preprocessing module with several options for handling hashtags, emojis, character repetitions, and more.

In [None]:
from pysentimiento.preprocessing import preprocess_tweet

preprocess_tweet("📢 @realDonaldTrump ha sido banneado de Twitter #BreakingNews")

'emoji altavoz de mano emoji  @usuario ha sido banneado de Twitter breaking news'
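
The output above shows three transformations: the emoji is replaced by a Spanish description, the user handle becomes `@usuario`, and the CamelCase hashtag is split into lowercase words. The sketch below imitates the last two steps with regular expressions. It is a rough approximation under our own assumptions, not the code behind `preprocess_tweet`, and the function names are hypothetical.

```python
import re

UPPER = "A-ZÁÉÍÓÚÜÑ"
LOWER = "a-záéíóúüñ"
# A capitalized or lowercase word, an all-caps run, or a run of digits
WORD_RE = re.compile(rf"[{UPPER}]?[{LOWER}]+|[{UPPER}]+(?![{LOWER}])|\d+")

def normalize_mentions(text, placeholder="@usuario"):
    """Replace every @handle with a fixed placeholder."""
    return re.sub(r"@\w+", placeholder, text)

def split_hashtags(text):
    """Drop the '#' and split CamelCase hashtags into lowercase words."""
    def _split(match):
        words = WORD_RE.findall(match.group(1))
        return " ".join(word.lower() for word in words)
    return re.sub(r"#(\w+)", _split, text)

tweet = "@realDonaldTrump ha sido banneado de Twitter #BreakingNews"
cleaned = split_hashtags(normalize_mentions(tweet))
print(cleaned)  # @usuario ha sido banneado de Twitter breaking news
```

The same hashtag splitting would turn `#EstoEsUnaMierda` into `esto es una mierda`, which may help explain why the sentiment analyzer handles that example so confidently.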