# Lab5 Final assignment: putting things together

The final assignment is an individual assignment. You need to have an emotional conversation with Eliza.

## 1. Loading a conversation saved on disk

Using the notebook **eliza-chat.ipynb**, you can create a conversation with Eliza and save it to disk. For this final assignment, we ask each group 

In [6]:
import pandas as pd

In [5]:
file = 'conversation.csv'
df = pd.read_csv(file)
df.head()

Unnamed: 0.1,Unnamed: 0,utterance,speaker,turn_id
0,0,Hello Piek. How are you feeling today?,Eliza,1
1,1,I am sad,Piek,1
2,2,How do you feel about being sad?,Eliza,1
3,3,Bad,Piek,1
4,4,How do you feel when you say that?,Eliza,1
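
The `Unnamed: 0` and `Unnamed: 0.1` columns in the output above are typically index columns left over from earlier `to_csv` calls. Below is a minimal sketch of dropping them after loading, assuming they carry no information you need. The helper name `drop_index_columns` is hypothetical, and the CSV text is an in-memory copy of the first rows shown above so that the sketch runs without the file.

```python
import io

import pandas as pd

# In-memory copy of the first rows shown above, so the sketch runs without the file.
csv_text = """Unnamed: 0.1,Unnamed: 0,utterance,speaker,turn_id
0,0,Hello Piek. How are you feeling today?,Eliza,1
1,1,I am sad,Piek,1
2,2,How do you feel about being sad?,Eliza,1
"""


def drop_index_columns(frame):
    """Return a copy without leftover 'Unnamed: ...' index columns (hypothetical helper)."""
    keep = [col for col in frame.columns if not str(col).startswith("Unnamed")]
    return frame[keep].copy()


raw_df = pd.read_csv(io.StringIO(csv_text))
clean_df = drop_index_columns(raw_df)
print(list(clean_df.columns))
```

With your own file, you would pass the result of `pd.read_csv('conversation.csv')` to the helper instead of the in-memory text.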


In [None]:
# 2. Annotate the data with an emotion

## 3. BERT Finetuned for emotion detection with GO dataset

We will now load the language model BERT that is finetuned for emotion detection using the *go_emotions* data set. Go_emotions has 28 labels (27 nuanced emotions plus neutral), many more than the basic Ekman emotions that we have seen before.

We will load the finetuned model from the huggingface.co platform as part of a so-called transformer *pipeline*. Pipelines are predefined NLP tasks that deploy a trained model for a specific type of task. See the website for an overview of the different pipelines defined by huggingface.co:

https://huggingface.co/docs/transformers/main_classes/pipelines

The pipelines are abstractions over specific tasks such as sentiment analysis and entity recognition. In the case of sentiment analysis, the representation of the complete sentence is taken as the input and classified with the defined labels. In the case of entity recognition, each token in a sentence is classified separately, i.e. a token classification (sequence labeling) task. Whether a finetuned model can be used for a task depends on the way it was finetuned with labeled data.

We will define a *sentiment-analysis* pipeline and load the BERT model that was finetuned to classify sentences with the 28 GO_EMOTION labels. It will return a score for every label when we set the parameter *return_all_scores* to True. (Recent versions of the transformers library deprecate this parameter in favor of *top_k=None*.)

In [4]:
#import numpy as np
from transformers import pipeline

In [8]:
model_name = "bhadresh-savani/bert-base-go-emotion" 
emotion = pipeline('sentiment-analysis', 
                    model=model_name, return_all_scores=True, truncation=True)

We have now created an instance *emotion* of a transformer pipeline, set up as a sentiment-analysis classification task, that we can apply to any utterance. The pipeline uses the tokenizer of the finetuned model and feeds the sentence to the classifier as a sequence of contextualized token representations.

In [9]:
emotion_labels = emotion("Thanks for using it.")
print(emotion_labels[0])

[{'label': 'admiration', 'score': 0.0007500764913856983}, {'label': 'amusement', 'score': 0.00011047106818296015}, {'label': 'anger', 'score': 9.69245083979331e-05}, {'label': 'annoyance', 'score': 0.0002597433340270072}, {'label': 'approval', 'score': 0.0011426000855863094}, {'label': 'caring', 'score': 0.00030970710213296115}, {'label': 'confusion', 'score': 0.00014959769032429904}, {'label': 'curiosity', 'score': 0.00015838834224268794}, {'label': 'desire', 'score': 0.0001385686337016523}, {'label': 'disappointment', 'score': 0.00016352151578757912}, {'label': 'disapproval', 'score': 0.00020030527957715094}, {'label': 'disgust', 'score': 5.9684312873287126e-05}, {'label': 'embarrassment', 'score': 5.588319982052781e-05}, {'label': 'excitement', 'score': 0.00018467492191120982}, {'label': 'fear', 'score': 5.239497113507241e-05}, {'label': 'gratitude', 'score': 0.9934592247009277}, {'label': 'grief', 'score': 2.022587250394281e-05}, {'label': 'joy', 'score': 0.0003203642263542861}, {'
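
The output above is a list of label/score dictionaries, one per GO_EMOTION label. To read off the most likely labels, you can sort that list by score. The sketch below uses a few pairs copied from the output above; the helper name `top_k` is hypothetical and not part of the transformers library.

```python
# A few label/score pairs copied from the output above (the full list has 28 entries).
scores = [
    {'label': 'admiration', 'score': 0.0007500764913856983},
    {'label': 'approval', 'score': 0.0011426000855863094},
    {'label': 'gratitude', 'score': 0.9934592247009277},
    {'label': 'joy', 'score': 0.0003203642263542861},
]


def top_k(predictions, k=3):
    """Return the k highest-scoring label/score dicts (hypothetical helper)."""
    return sorted(predictions, key=lambda p: p['score'], reverse=True)[:k]


for prediction in top_k(scores, k=2):
    print(f"{prediction['label']}: {prediction['score']:.4f}")
```

For "Thanks for using it." this puts *gratitude* first by a wide margin, followed by *approval*.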

## 4. Applying emotion classification to Eliza conversations

In the next part, we will apply the GO_EMOTION classifier *emotion* to the conversation that we stored in a pandas DataFrame. We will also map the GO_EMOTIONS to the 6 basic Ekman emotions plus neutral, as well as to sentiment values. For the mappings, we defined a few simple utility functions, which the next cell imports from *lab5_util*. We also defined a sort function to list the emotions from the highest score down.
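
To make the idea of mapping fine-grained labels to coarser categories concrete, here is a minimal sketch of one possible approach: average the scores of all GO labels that belong to a category and then sort the categories by that average. Everything in it is an assumption for illustration: the mapping `example_map` is partial and hypothetical, the scores are made up, and `average_mapped_scores` is not the function from *lab5_util*, which may work differently.

```python
# Hypothetical, partial mapping from GO labels to coarser categories (illustration only).
example_map = {
    'joy': ['joy', 'gratitude', 'admiration'],
    'sadness': ['sadness', 'grief'],
    'neutral': ['neutral'],
}


def average_mapped_scores(mapping, predictions):
    """Average the scores of the member labels per category, highest average first."""
    by_label = {p['label']: p['score'] for p in predictions}
    averaged = []
    for category, members in mapping.items():
        member_scores = [by_label[m] for m in members if m in by_label]
        if member_scores:  # skip categories without any scored member label
            mean = sum(member_scores) / len(member_scores)
            averaged.append({'label': category, 'score': mean})
    return sorted(averaged, key=lambda p: p['score'], reverse=True)


# Made-up scores for a single utterance, not real model output.
predictions = [
    {'label': 'gratitude', 'score': 0.9},
    {'label': 'joy', 'score': 0.06},
    {'label': 'sadness', 'score': 0.03},
    {'label': 'neutral', 'score': 0.01},
]

result = average_mapped_scores(example_map, predictions)
print(result[0]['label'])
```

Note that averaging lowers the score of categories with many member labels, which is one possible reason why mapped scores can be lower than the top GO score.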

In [10]:
import lab5_util as util

In [11]:
sentiment_emotions = []
sentiment_scores = []
ekman_emotions = []
ekman_scores = []
go_emotions = []
go_scores = []

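# Assumption: sort_predictions, get_averaged_mapped_scores, ekman and sentiment
# are defined in lab5_util, which was imported above as util.
for utterance in df['utterance']:
    # Scores for all 28 GO_EMOTION labels for this utterance
    emotion_labels = emotion(utterance)

    # Highest-scoring GO_EMOTION label
    sorted_emotion_labels = util.sort_predictions(emotion_labels[0])
    go_emotions.append(sorted_emotion_labels[0]['label'])
    go_scores.append(sorted_emotion_labels[0]['score'])

    # Highest-scoring Ekman emotion after mapping
    ekman_labels = util.get_averaged_mapped_scores(util.ekman, emotion_labels)
    ekman_emotions.append(ekman_labels[0]['label'])
    ekman_scores.append(ekman_labels[0]['score'])

    # Highest-scoring sentiment value after mapping
    sentiment_labels = util.get_averaged_mapped_scores(util.sentiment, emotion_labels)
    sentiment_emotions.append(sentiment_labels[0]['label'])
    sentiment_scores.append(sentiment_labels[0]['score'])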


df['Sentiment']=sentiment_emotions
df['SentimentScore']=sentiment_scores
df['Ekman']=ekman_emotions
df['EkmanScore']=ekman_scores
df['Go']=go_emotions
df['GoScore']=go_scores
df.head()

Unnamed: 0,utterance,speaker,turn_id,Sentiment,SentimentScore,Ekman,EkmanScore,Go,GoScore
0,Hello Piek. How are you feeling today?,Eliza,1,ambiguous,0.126107,neutral,0.287125,curiosity,0.330824
1,I am bored,Piek,1,negative,0.067721,neutral,0.13914,annoyance,0.167206
2,Did you come to me because you are bored?,Eliza,1,ambiguous,0.134255,neutral,0.249972,curiosity,0.383398
3,Yes,Piek,1,positive,0.029085,neutral,0.611555,neutral,0.611555
4,You seem quite sure.,Eliza,1,positive,0.059274,neutral,0.19611,approval,0.570112


In [2]:
file = "conversation_with_emotion.csv"
# index=False avoids writing the row index as an extra "Unnamed: 0" column
df.to_csv(file, index=False)

## End of Notebook