# Lab3.4 Sentiment Classification using transformer models

Copyright: Vrije Universiteit Amsterdam, Faculty of Humanities, CLTL

This notebook explains how you can use a transformer model that is fine-tuned for sentiment analysis. Fine-tuned transformer models are published regularly on the Hugging Face platform: https://huggingface.co/models

These models are very large (up to several gigabytes) and require a computer with sufficient memory to load them. Loading also takes some time. You can copy a model to your disk and load the local copy instead, but loading it still requires substantial memory.

The Hugging Face transformers library provides an option to create a **pipeline** to perform an NLP task with a pretrained model:

"The pipelines are a great and easy way to use models for inference. These pipelines are objects that abstract most of the complex code from the library, offering a simple API dedicated to several tasks, including Named Entity Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction and Question Answering."

More information can be found here: https://huggingface.co/transformers/v3.0.2/main_classes/pipelines.html

We will use the `pipeline` function to load a fine-tuned model for sentiment analysis.

In [2]:
from transformers import pipeline

We load the transformer model 'distilbert-base-uncased-finetuned-sst-2-english', which is fine-tuned for binary classification, from the Hugging Face repository:

https://huggingface.co/models

We need to load both the model for sequence classification and the tokenizer, which converts sentences into tokens according to the vocabulary of the model.

Loading the model takes some time.

## Using an English fine-tuned transformer model

In [4]:
sentimentenglish = pipeline("sentiment-analysis", 
                            model="distilbert-base-uncased-finetuned-sst-2-english", 
                            tokenizer="distilbert-base-uncased-finetuned-sst-2-english")

We have now created an instance of a pipeline that can tokenize any sentence, obtain a sentence embedding from the transformer language model, and perform the **sentiment-analysis** task. Let's try it out on an example sentence.

In [5]:
sentence_pos_en = "Nice hotel and the service is great"

In [6]:
sentimentenglish(sentence_pos_en)

[{'label': 'POSITIVE', 'score': 0.9998814463615417}]

In [7]:
sentence_neg_en = "The rooms are dirty and the wifi does not work"

In [8]:
sentimentenglish(sentence_neg_en)

[{'label': 'NEGATIVE', 'score': 0.9997869729995728}]

This is easy to use, and the model assigns both examples the expected label with a score above 0.99.
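
The introduction mentioned that a model can be copied to disk and loaded from the local copy. A minimal sketch of how that could look is given here. The helper names `resolve_model_source` and `load_sentiment_pipeline` and the directory `models/sst2-en` are hypothetical and not part of the original notebook; the sketch assumes that the pipeline's `save_pretrained` method writes a `config.json` file into the target directory.

```python
from pathlib import Path


def resolve_model_source(model_name, local_dir):
    """Return the local directory if it holds a saved model, else the hub name."""
    local_dir = Path(local_dir)
    # Assumption: a saved model directory contains a config.json file
    if (local_dir / "config.json").is_file():
        return str(local_dir)
    return model_name


def load_sentiment_pipeline(model_name, local_dir):
    """Load from disk when possible; otherwise download and save a copy."""
    # Imported here so the helper above also works without transformers installed
    from transformers import pipeline

    source = resolve_model_source(model_name, local_dir)
    classifier = pipeline("sentiment-analysis", model=source, tokenizer=source)
    if source == model_name:
        # First run: keep a copy of the model and tokenizer for next time
        classifier.save_pretrained(str(local_dir))
    return classifier


# Example call (downloads on the first run; the directory name is hypothetical):
# sentimentenglish = load_sentiment_pipeline(
#     "distilbert-base-uncased-finetuned-sst-2-english", "models/sst2-en")
```

Loading from disk avoids the download, but the model still has to fit in memory.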

## Using a Dutch fine-tuned transformer model

We can use a fine-tuned Dutch model for Dutch sentiment analysis by creating another pipeline. Again, loading this model takes some time. Note that after loading, both models are held in memory. If you run into problems while loading, restart the notebook and try again with only the Dutch pipeline.

In [10]:
sentimentdutch = pipeline("sentiment-analysis", 
                          model="wietsedv/bert-base-dutch-cased-finetuned-sentiment", 
                          tokenizer="wietsedv/bert-base-dutch-cased-finetuned-sentiment")

We test it on two similar Dutch sentences:

In [11]:
sentence_pos_nl="Mooi hotel en de service is geweldig"
sentence_neg_nl="De kamers zijn smerig en de wifi doet het niet"

In [12]:
sentimentdutch(sentence_pos_nl)

[{'label': 'pos', 'score': 0.9999955892562866}]

In [13]:
sentimentdutch(sentence_neg_nl)

[{'label': 'neg', 'score': 0.6675440073013306}]

This seems to work fine too, although the score for the negative label in the second example is much lower (about 0.67, compared with more than 0.99 for the positive example).
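
The two models use different label names (`POSITIVE`/`NEGATIVE` versus `pos`/`neg`), and the Dutch negative prediction comes with a noticeably lower score. The following sketch shows one possible way to map both outputs to a shared label set and to flag low-scoring predictions. The mapping `LABEL_MAP`, the helper `normalize_prediction` and the threshold of 0.8 are assumptions made for illustration, not something defined by either model.

```python
# Hypothetical mapping; extend it if a model uses other label names
LABEL_MAP = {
    "POSITIVE": "positive",
    "NEGATIVE": "negative",
    "pos": "positive",
    "neg": "negative",
}


def normalize_prediction(prediction, threshold=0.8):
    """Map one pipeline prediction to a shared label set and flag low scores."""
    # Unknown labels are passed through unchanged
    label = LABEL_MAP.get(prediction["label"], prediction["label"])
    score = prediction["score"]
    return {"label": label, "score": score, "confident": score >= threshold}


# Outputs copied from the cells above
english_neg = {"label": "NEGATIVE", "score": 0.9997869729995728}
dutch_neg = {"label": "neg", "score": 0.6675440073013306}

for prediction in (english_neg, dutch_neg):
    print(normalize_prediction(prediction))
```

In the notebook you would pass `sentimentenglish(sentence_neg_en)[0]` or `sentimentdutch(sentence_neg_nl)[0]` instead of the copied dictionaries.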

## BERT fine-tuned for emotion detection with the go_emotions dataset

We will now load the language model BERT that is fine-tuned for emotion detection using the *go_emotions* data set. Go_emotions has 28 nuanced emotion labels including neutral, many more than the basic Ekman emotions that we have seen before.

We will load the finetuned model from the huggingface.co platform as part of a so-called transformer *pipeline*. Pipelines are predefined NLP tasks that deploy a trained model for a specific type of task. See the website for an overview of the different pipelines defined by huggingface.co:

https://huggingface.co/docs/transformers/main_classes/pipelines

The pipelines are abstractions over specific tasks such as sentiment analysis and entity recognition. In the case of sentiment analysis, the complete sentence representation of the model is taken as the input and classified with the defined labels, i.e. a sequence classification task. In the case of entity recognition, each token in a sentence is classified separately, i.e. a token classification (sequence labeling) task. Whether a fine-tuned model can be used for a task depends on the way it was fine-tuned with labeled data.

We will define a *sentiment-analysis* pipeline and load the BERT model that was finetuned to classify sentences with the 28 GO_EMOTION labels. It will return a score for all the labels when we set the parameter *return_all_scores* to True.

In [3]:
model_name = "bhadresh-savani/bert-base-go-emotion" 
emotion_pipeline = pipeline('sentiment-analysis', 
                    model=model_name, return_all_scores=True, truncation=True)

We have now created an instance *emotion_pipeline* of a transformer pipeline, analogous to a sentiment analysis classification task, that we can apply to any utterance. The pipeline uses the tokenizer of the fine-tuned model and feeds the sentence to the classifier as a sequence of contextualized token representations.

In [6]:
emotion_labels = emotion_pipeline("Thanks for using it.")
for result in emotion_labels[0]:
    print(result)

{'label': 'admiration', 'score': 0.0007500764913856983}
{'label': 'amusement', 'score': 0.00011047106818296015}
{'label': 'anger', 'score': 9.69245083979331e-05}
{'label': 'annoyance', 'score': 0.0002597433340270072}
{'label': 'approval', 'score': 0.0011426000855863094}
{'label': 'caring', 'score': 0.00030970710213296115}
{'label': 'confusion', 'score': 0.00014959769032429904}
{'label': 'curiosity', 'score': 0.00015838834224268794}
{'label': 'desire', 'score': 0.0001385686337016523}
{'label': 'disappointment', 'score': 0.00016352151578757912}
{'label': 'disapproval', 'score': 0.00020030527957715094}
{'label': 'disgust', 'score': 5.9684312873287126e-05}
{'label': 'embarrassment', 'score': 5.588319982052781e-05}
{'label': 'excitement', 'score': 0.00018467492191120982}
{'label': 'fear', 'score': 5.239497113507241e-05}
{'label': 'gratitude', 'score': 0.9934592247009277}
{'label': 'grief', 'score': 2.022587250394281e-05}
{'label': 'joy', 'score': 0.0003203642263542861}
{'label': 'love', 'sc
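
With a score for every label, the printed list is long and the relevant emotions are easy to miss. A small sketch of how the highest-scoring labels could be selected is given below. The helper `top_emotions` is hypothetical, and `sample_scores` contains only a few entries copied from the output above, so that the example runs without loading the model.

```python
def top_emotions(label_scores, k=3):
    """Return the k highest-scoring label dictionaries, best first."""
    return sorted(label_scores, key=lambda item: item["score"], reverse=True)[:k]


# A few entries copied from the output above
sample_scores = [
    {"label": "admiration", "score": 0.0007500764913856983},
    {"label": "approval", "score": 0.0011426000855863094},
    {"label": "gratitude", "score": 0.9934592247009277},
    {"label": "joy", "score": 0.0003203642263542861},
]

for item in top_emotions(sample_scores, k=2):
    print(f"{item['label']}: {item['score']:.4f}")
```

In the notebook you would call `top_emotions(emotion_labels[0])` on the full list of scores.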

# End of this notebook