# Lab4.4 GO Emotion Classification using a fine-tuned BERT model

Copyright: Vrije Universiteit Amsterdam, Faculty of Humanities, CLTL

This notebook shows how you can use a BERT model that was fine-tuned for emotion detection using the GO dataset from Google.
The GO dataset consists of 58k English Reddit comments, labeled for 27 emotion categories or Neutral:

https://github.com/google-research/google-research/tree/master/goemotions

REFERENCE:
Demszky, Dorottya, Dana Movshovitz-Attias, Jeongwoo Ko, Alan Cowen, Gaurav Nemade, and Sujith Ravi. "GoEmotions: A dataset of fine-grained emotions." arXiv preprint arXiv:2005.00547 (2020).


The GO emotions are more nuanced than the six basic Ekman emotions, which were derived from facial expressions. They are based on a diverse range of emotion data and not just on facial expressions:

https://ai.googleblog.com/2021/10/goemotions-dataset-for-fine-grained.html


```
LABELS = [
    'admiration',
    'amusement',
    'anger',
    'annoyance',
    'approval',
    'caring',
    'confusion',
    'curiosity',
    'desire',
    'disappointment',
    'disapproval',
    'disgust',
    'embarrassment',
    'excitement',
    'fear',
    'gratitude',
    'grief',
    'joy',
    'love',
    'nervousness',
    'optimism',
    'pride',
    'realization',
    'relief',
    'remorse',
    'sadness',
    'surprise',
    'neutral',
]
```


We will load a BERT language model that is fine-tuned for emotion detection using this *go_emotions* data set. The **pipeline** task for using this model is *sentiment-analysis*. Here, **sentiment-analysis** should be understood as a name for the type of classification that is carried out: **text classification**, in which a label is assigned to the text as a whole and not to individual tokens. The **sentiment-analysis** pipeline can be used as an alias for any model that is fine-tuned for text classification, regardless of the labels that have been used. The labels themselves do not carry any meaning for the model.

Furthermore, we will use a parameter to either get the highest scoring label or all labels with a score.
Below we set the parameter *return_all_scores* to True to get a score over all labels.

In [1]:
from transformers import pipeline


In [2]:
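# Name of the fine-tuned model on the Hugging Face hub
model_name = "bhadresh-savani/bert-base-go-emotion"
# return_all_scores=True gives a score for every label instead of only the best one;
# truncation=True cuts off inputs that are longer than the model can handle
emotion_pipeline = pipeline('sentiment-analysis',
                            model=model_name, return_all_scores=True, truncation=True)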

We have now created an instance *emotion_pipeline* of a transformer pipeline, in analogy to a sentiment analysis classification task, that we can apply to any utterance. The pipeline will use the tokenizer of the fine-tuned model and feed the sentence representation to the classifier as a sequence of contextualized token representations.
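
To make the last step more concrete, the following minimal sketch shows how a text classifier could turn its raw output into labeled scores. It assumes that the scores come from a softmax over one raw value (logit) per label, which the notebook does not state explicitly. The four labels and the logit values are hypothetical and only serve as an illustration; the real model has 28 labels.

```python
import math

# Hypothetical values for illustration only: the real model has 28 labels
# and its logits come from the classification layer on top of BERT.
id2label = {0: "anger", 1: "gratitude", 2: "joy", 3: "neutral"}
logits = [-1.2, 4.5, 0.3, 0.1]


def softmax(values):
    """Turn raw logits into scores that sum to 1."""
    highest = max(values)
    # Subtracting the maximum keeps the exponentials numerically stable
    exps = [math.exp(v - highest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


scores = softmax(logits)

# The model only knows positions 0..3; the names are attached afterwards
predictions = [{"label": id2label[i], "score": s} for i, s in enumerate(scores)]
for prediction in predictions:
    print(prediction)
```

The classifier itself only produces a score per position. The label names are looked up afterwards, which is why the same pipeline works for any set of labels.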

In [3]:
emotion_labels = emotion_pipeline("Thanks for using it.")

In [4]:
for result in emotion_labels[0]:
    print(result)

{'label': 'admiration', 'score': 0.0007500767824240029}
{'label': 'amusement', 'score': 0.00011047106818296015}
{'label': 'anger', 'score': 9.69246102613397e-05}
{'label': 'annoyance', 'score': 0.0002597436250653118}
{'label': 'approval', 'score': 0.0011426006676629186}
{'label': 'caring', 'score': 0.00030970710213296115}
{'label': 'confusion', 'score': 0.00014959769032429904}
{'label': 'curiosity', 'score': 0.00015838850231375545}
{'label': 'desire', 'score': 0.0001385686337016523}
{'label': 'disappointment', 'score': 0.00016352151578757912}
{'label': 'disapproval', 'score': 0.00020030527957715094}
{'label': 'disgust', 'score': 5.9684312873287126e-05}
{'label': 'embarrassment', 'score': 5.588319982052781e-05}
{'label': 'excitement', 'score': 0.00018467512563802302}
{'label': 'fear', 'score': 5.239497113507241e-05}
{'label': 'gratitude', 'score': 0.9934592247009277}
{'label': 'grief', 'score': 2.022587250394281e-05}
{'label': 'joy', 'score': 0.0003203645464964211}
{'label': 'love', 'sc
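
When all scores are returned, the highest scoring label can still be selected afterwards. The sketch below does this on a shortened copy of the output above (four of the entries, copied as printed), so that it runs without loading the model. The names `emotion_labels_sample`, `top` and `top_k` are only used for this illustration.

```python
# A shortened copy of the pipeline output (four entries), nested in a list
# in the same way as the real result: one inner list per input text.
emotion_labels_sample = [[
    {'label': 'admiration', 'score': 0.0007500767824240029},
    {'label': 'approval', 'score': 0.0011426006676629186},
    {'label': 'gratitude', 'score': 0.9934592247009277},
    {'label': 'joy', 'score': 0.0003203645464964211},
]]

# Take the predictions for the first (and only) input text
predictions = emotion_labels_sample[0]

# Highest scoring label
top = max(predictions, key=lambda p: p['score'])
print(top['label'], top['score'])

# The k highest scoring labels, sorted from high to low
k = 2
top_k = sorted(predictions, key=lambda p: p['score'], reverse=True)[:k]
print([p['label'] for p in top_k])
```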

Although the GO emotions are a lot more nuanced than the Ekman emotions, it is possible to map the more specific emotions to Ekman's more basic ones and even to sentiments. The next mapping, taken from the original GitHub repository of goemotions, does just that:

In [6]:
### Mapping GO_Emotions to sentiment values
sentiment_map={
"positive": ["amusement", "excitement", "joy", "love", "desire", "optimism", "caring", "pride", "admiration", "gratitude", "relief", "approval"],
"negative": ["fear", "nervousness", "remorse", "embarrassment", "disappointment", "sadness", "grief", "disgust", "anger", "annoyance", "disapproval"],
"ambiguous": ["realization", "surprise", "curiosity", "confusion"]
}

### Mapping GO_Emotions to Ekman values
ekman_map={
"anger": ["anger", "annoyance", "disapproval"],
"disgust": ["disgust"],
"fear": ["fear", "nervousness"],
"joy": ["joy", "amusement", "approval", "excitement", "gratitude",  "love", "optimism", "relief", "pride", "admiration", "desire", "caring"],
"sadness": ["sadness", "disappointment", "embarrassment", "grief",  "remorse"],
"surprise": ["surprise", "realization", "confusion", "curiosity"],
"neutral": ["neutral"]
}
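
Both mappings go from a category to a list of GO emotions. For lookups in the other direction, and to check which GO emotions a mapping covers, the mappings can be inverted. The sketch below repeats the label list and the two mappings so that it runs on its own; the helper functions `invert` and `unmapped` are not part of the notebook and are only an illustration.

```python
LABELS = [
    'admiration', 'amusement', 'anger', 'annoyance', 'approval', 'caring',
    'confusion', 'curiosity', 'desire', 'disappointment', 'disapproval',
    'disgust', 'embarrassment', 'excitement', 'fear', 'gratitude', 'grief',
    'joy', 'love', 'nervousness', 'optimism', 'pride', 'realization',
    'relief', 'remorse', 'sadness', 'surprise', 'neutral',
]

sentiment_map = {
    "positive": ["amusement", "excitement", "joy", "love", "desire", "optimism",
                 "caring", "pride", "admiration", "gratitude", "relief", "approval"],
    "negative": ["fear", "nervousness", "remorse", "embarrassment", "disappointment",
                 "sadness", "grief", "disgust", "anger", "annoyance", "disapproval"],
    "ambiguous": ["realization", "surprise", "curiosity", "confusion"],
}

ekman_map = {
    "anger": ["anger", "annoyance", "disapproval"],
    "disgust": ["disgust"],
    "fear": ["fear", "nervousness"],
    "joy": ["joy", "amusement", "approval", "excitement", "gratitude", "love",
            "optimism", "relief", "pride", "admiration", "desire", "caring"],
    "sadness": ["sadness", "disappointment", "embarrassment", "grief", "remorse"],
    "surprise": ["surprise", "realization", "confusion", "curiosity"],
    "neutral": ["neutral"],
}


def invert(emotion_map):
    """Build a lookup from GO emotion to category."""
    return {go_emotion: category
            for category, go_emotions in emotion_map.items()
            for go_emotion in go_emotions}


def unmapped(emotion_map, labels):
    """Return the labels that do not occur anywhere in the mapping."""
    lookup = invert(emotion_map)
    return [label for label in labels if label not in lookup]


go_to_ekman = invert(ekman_map)
print(go_to_ekman['annoyance'])
print('Not in ekman_map:', unmapped(ekman_map, LABELS))
print('Not in sentiment_map:', unmapped(sentiment_map, LABELS))
```

The check shows that `ekman_map` covers all 28 labels, whereas `sentiment_map` has no entry for *neutral*. A *neutral* prediction is therefore dropped when GO emotions are mapped to sentiment values.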

We can now make a few simple auxiliary functions that can translate GO emotions to Ekman or to sentiment values. The functions sort the predictions by their score and select the scores above a certain threshold. Only the ones above the threshold are mapped.

In [7]:
### Sort a list of results in JSON format by the value of the score element
def sort_predictions(predictions):
    return sorted(predictions, key=lambda x: x['score'], reverse=True)


### Use a mapping to get a dictionary of the mapped GO_emotion scores above the threshold
def get_mapped_scores(emotion_map, go_emotion_scores, threshold):
    mapped_scores = {}

    # go_emotion_scores[0] holds the predictions for the first (single) input text
    for prediction in go_emotion_scores[0]:
        if prediction['score'] >= threshold:
            go_emotion = prediction['label']
            for key in emotion_map:
                if go_emotion in emotion_map[key]:
                    # Collect all scores that map to the same target label
                    mapped_scores.setdefault(key, []).append(prediction['score'])
    return mapped_scores

### Get the averaged score for an emotion or sentiment from the GO_emotion scores above a threshold
### mapped according to the emotion_map
def get_averaged_mapped_scores_by_threshold(emotion_map, go_emotion_scores, threshold):
    averaged_mapped_scores = []
    mapped_scores = get_mapped_scores(emotion_map, go_emotion_scores, threshold)
    for emotion in mapped_scores:
        lst = mapped_scores[emotion]
        averaged_score= sum(lst)/len(lst)
        averaged_mapped_scores.append({'label':emotion, 'score':averaged_score})
    return sort_predictions(averaged_mapped_scores)

Using these functions, we can print the averaged Ekman and the averaged sentiment score for any GO emotion classification result:

In [13]:
threshold = 0.05
print('Threshold', threshold)
ekman_labels = get_averaged_mapped_scores_by_threshold(ekman_map, emotion_labels, threshold)
for ekman in ekman_labels:
    print(ekman)

print()
threshold = 0.0001
print('Threshold', threshold)
ekman_labels = get_averaged_mapped_scores_by_threshold(ekman_map, emotion_labels, threshold)
for ekman in ekman_labels:
    print(ekman)

Threshold 0.05
{'label': 'joy', 'score': 0.9934592247009277}

Threshold 0.0001
{'label': 'joy', 'score': 0.09065005526248239}
{'label': 'neutral', 'score': 0.000802881782874465}
{'label': 'anger', 'score': 0.00023002445232123137}
{'label': 'surprise', 'score': 0.00021510607984964736}
{'label': 'sadness', 'score': 0.00016352151578757912}
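
The score for *joy* drops from about 0.99 to about 0.09 when the threshold is lowered, because many very small scores now pass the threshold and are averaged together with the one high score. The sketch below reproduces this effect with three hypothetical scores that are assumed to map to the same Ekman category. The function `sum_above` is an alternative that is not used in this notebook and is only shown for comparison.

```python
# Hypothetical scores of three GO emotions that map to the same category
joy_scores = [0.99, 0.006, 0.004]


def mean_above(scores, threshold):
    """Average the scores at or above the threshold (None if there are none)."""
    kept = [s for s in scores if s >= threshold]
    return sum(kept) / len(kept) if kept else None


def sum_above(scores, threshold):
    """Alternative, not used in the notebook: add up the scores instead."""
    return sum(s for s in scores if s >= threshold)


# High threshold: only the score 0.99 passes, so the average is that score
print(mean_above(joy_scores, 0.05))

# Low threshold: all three pass and the small scores pull the average down
print(mean_above(joy_scores, 0.001))

# Summing keeps the total and is not diluted by small scores
print(sum_above(joy_scores, 0.001))
```

Whether averaging or summing is more suitable depends on how the mapped scores are used afterwards. With averaging, the threshold has a large influence on the result.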


In [14]:
threshold = 0.05
print('Threshold', threshold)
sentiment_scores = get_averaged_mapped_scores_by_threshold(sentiment_map, emotion_labels, threshold)
for sentiment in sentiment_scores:
    print(sentiment)

print()
threshold = 0.0001
print('Threshold', threshold)
sentiment_scores = get_averaged_mapped_scores_by_threshold(sentiment_map, emotion_labels, threshold)
for sentiment in sentiment_scores:
    print(sentiment)

Threshold 0.05
{'label': 'positive', 'score': 0.9934592247009277}

Threshold 0.0001
{'label': 'positive', 'score': 0.09065005526248239}
{'label': 'ambiguous', 'score': 0.00021510607984964736}
{'label': 'negative', 'score': 0.00020785680681001395}


# End of this notebook
