# Lab3.4 Sentiment Classification using transformer models

Copyright: Vrije Universiteit Amsterdam, Faculty of Humanities, CLTL

This notebook explains how you can use a transformer model that is fine-tuned for sentiment analysis. Fine-tuned transformer models are published regularly on the Hugging Face platform: https://huggingface.co/models

These models are large (up to several gigabytes) and require a computer with sufficient memory to load them. Loading them also takes some time. You can also copy such a model to your disk and load the local copy, but loading it still requires a substantial amount of memory.
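Working with a local copy could look like the minimal sketch below. It uses the `pipeline` function that is introduced in the next paragraphs. The helper names `local_model_dir` and `load_sentiment_pipeline`, the `local_models` folder, and the check for `config.json` are assumptions made for this illustration, not part of the original lab.

```python
from pathlib import Path


def local_model_dir(model_name, root="local_models"):
    """Return the folder for the local copy of a model (hypothetical layout)."""
    # Replace "/" so names like "user/model" map to a single folder.
    return Path(root) / model_name.replace("/", "__")


def load_sentiment_pipeline(model_name, root="local_models"):
    """Load from the local copy if present, otherwise download and save it."""
    # Imported here so the helper above works without the heavy dependency.
    from transformers import pipeline

    target = local_model_dir(model_name, root)
    if (target / "config.json").is_file():
        # A saved copy exists: load the model and tokenizer from disk.
        return pipeline("sentiment-analysis", model=str(target), tokenizer=str(target))
    # First use: download the model, then store a copy for next time.
    classifier = pipeline("sentiment-analysis", model=model_name, tokenizer=model_name)
    classifier.save_pretrained(str(target))
    return classifier


# Only the path helper runs here; nothing is downloaded.
print(local_model_dir("distilbert-base-uncased-finetuned-sst-2-english"))
```

Loading from disk avoids the repeated download, but the model still has to fit in memory once it is loaded.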

The Hugging Face transformers library provides an option to create a **pipeline** to perform an NLP task with a pretrained model:

"The pipelines are a great and easy way to use models for inference. These pipelines are objects that abstract most of the complex code from the library, offering a simple API dedicated to several tasks, including Named Entity Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction and Question Answering."

More information can be found here: https://huggingface.co/transformers/v3.0.2/main_classes/pipelines.html

We will use the pipeline function to load a fine-tuned model and perform sentiment analysis.

In [2]:
from transformers import pipeline

We load the transformer model 'distilbert-base-uncased-finetuned-sst-2-english', which is fine-tuned for binary classification, from the Hugging Face repository:

https://huggingface.co/models

We need to load the model for sequence classification and the tokenizer that converts the sentences into tokens according to the vocabulary of the model.

Loading the model takes some time.

In [4]:
sentimentenglish = pipeline("sentiment-analysis", 
                            model="distilbert-base-uncased-finetuned-sst-2-english", 
                            tokenizer="distilbert-base-uncased-finetuned-sst-2-english")

We have now created a pipeline instance that can tokenize any sentence, obtain a sentence embedding from the transformer language model, and perform the **sentiment-analysis** task. Let's try it out on an example sentence.
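To make the final classification step more concrete, the sketch below shows how a label and a score could be derived from a model's raw class scores (logits) with a softmax. The logit values, the label order, and the `softmax` helper are illustrative assumptions and are not taken from the model; the pipeline performs this step internally.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtract the maximum for numerical stability before exponentiating.
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for the two classes of a binary sentiment model.
labels = ["NEGATIVE", "POSITIVE"]
logits = [-4.2, 4.8]

probs = softmax(logits)
# Pick the class with the highest probability.
best = max(range(len(probs)), key=probs.__getitem__)
prediction = {"label": labels[best], "score": probs[best]}
print(prediction)
```

A large gap between the two logits gives a score close to 1, while logits that are close together give a score near 0.5.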

In [5]:
sentence_pos_en = "Nice hotel and the service is great"

In [6]:
sentimentenglish(sentence_pos_en)

[{'label': 'POSITIVE', 'score': 0.9998814463615417}]

In [7]:
sentence_neg_en = "The rooms are dirty and the wifi does not work"

In [8]:
sentimentenglish(sentence_neg_en)

[{'label': 'NEGATIVE', 'score': 0.9997869729995728}]
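The pipeline returns a list with one dictionary per input sentence, as the two outputs above show. A small helper such as the hypothetical `to_rows` below could pair each sentence with its label and a rounded score. The results are copied from the outputs above, so the snippet runs without loading the model.

```python
def to_rows(sentences, results, ndigits=4):
    """Pair each sentence with its predicted label and rounded score."""
    return [
        (sentence, result["label"], round(result["score"], ndigits))
        for sentence, result in zip(sentences, results)
    ]


sentences = [
    "Nice hotel and the service is great",
    "The rooms are dirty and the wifi does not work",
]
# Results copied from the two cell outputs above.
results = [
    {"label": "POSITIVE", "score": 0.9998814463615417},
    {"label": "NEGATIVE", "score": 0.9997869729995728},
]

rows = to_rows(sentences, results)
for sentence, label, score in rows:
    print(f"{label:<8} {score:.4f}  {sentence}")
```

With the model loaded, calling `sentimentenglish(sentences)` on the list should return results of this shape, since the pipeline also accepts a list of strings.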

This is easy and seems to work very well on these two examples.

## Using a Dutch fine-tuned transformer model

We can use a fine-tuned Dutch model for Dutch sentiment analysis by creating another pipeline. Again, loading this model takes some time. Also note that after loading, both models are held in memory. If you have issues loading, you may want to start over and try again with just the Dutch pipeline.

In [10]:
sentimentdutch = pipeline("sentiment-analysis", 
                          model="wietsedv/bert-base-dutch-cased-finetuned-sentiment", 
                          tokenizer="wietsedv/bert-base-dutch-cased-finetuned-sentiment")

We test it on two similar Dutch sentences:

In [11]:
sentence_pos_nl = "Mooi hotel en de service is geweldig"
sentence_neg_nl = "De kamers zijn smerig en de wifi doet het niet"

In [12]:
sentimentdutch(sentence_pos_nl)

[{'label': 'pos', 'score': 0.9999955892562866}]

In [13]:
sentimentdutch(sentence_neg_nl)

[{'label': 'neg', 'score': 0.6675440073013306}]

This seems to work fine too, although the score for the negative label in the second example is much lower.
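The two models use different label names (`POSITIVE`/`NEGATIVE` versus `pos`/`neg`), and a low score may deserve a second look. The sketch below normalizes the labels and marks predictions below a threshold. The `summarize` helper, the label mapping, and the threshold of 0.9 are assumptions chosen for illustration; the results are copied from the outputs above.

```python
# Assumed mapping between the label names seen in the outputs above.
LABEL_MAP = {
    "POSITIVE": "positive",
    "NEGATIVE": "negative",
    "pos": "positive",
    "neg": "negative",
}


def summarize(result, threshold=0.9):
    """Normalize the label and report whether the score reaches the threshold."""
    # Unknown labels are passed through unchanged.
    label = LABEL_MAP.get(result["label"], result["label"])
    confident = result["score"] >= threshold
    return label, confident


# Results copied from the Dutch cell outputs above.
dutch_pos = {"label": "pos", "score": 0.9999955892562866}
dutch_neg = {"label": "neg", "score": 0.6675440073013306}

print(summarize(dutch_pos))
print(summarize(dutch_neg))
```

With this threshold the positive example counts as confident and the negative one does not, which matches the observation about the lower score.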

# End of this notebook