# STAT 441 Project Proposal - Self Attention LSTM Implementation

# Problem Statement

This project aims to implement the technique described in this paper to classify each conversation as happy, sad, angry, or others.

# Data Set Format

The training dataset contains 5 columns: a unique ID, turn 1, turn 2, turn 3, and the label, where turn 2 is the response to turn 1 and turn 3 is the response to turn 2. The test set contains the first four columns, but the label is absent; the dev set is labelled, as its class distribution below shows. In the training set, happy, sad, and angry make up 14.1%, 18.1%, and 18.3% of the data respectively, and the rest (about 49.6%) is "others". In the dev set, however, each of the three emotions makes up only about 5%, and "others" about 85%.

Data provided by Microsoft for SemEval2019 Task 3 - EmoContext.

In [1]:
import pandas as pd

In [2]:
!ls

Email Confirmation.pdf  EmoContext.ipynb  data


In [33]:
df = pd.read_csv("train.txt", sep="\t")
df_dev = pd.read_csv("dev.txt", sep="\t")

In [34]:
df

| Unnamed: 0 | id | turn1 | turn2 | turn3 | label |
|---|---|---|---|---|---|
| 0 | 0 | Don't worry I'm girl | hmm how do I know if you are | What's ur name? | others |
| 1 | 1 | When did I? | saw many times i think -_- | No. I never saw you | angry |
| 2 | 2 | By | by Google Chrome | Where you live | others |
| 3 | 3 | U r ridiculous | I might be ridiculous but I am telling the truth. | U little disgusting whore | angry |
| 4 | 4 | Just for time pass | wt do u do 4 a living then | Maybe | others |
| ... | ... | ... | ... | ... | ... |
| 30155 | 30155 | I don't work | I could take your shift | I am a student | others |
| 30156 | 30156 | I'm not getting you 😭😭😭 | Why are you crying?? | Because you are not making any sense | sad |
| 30157 | 30157 | Haha | no, seriously. What is up with that o-o | Had your breakfast? | others |
| 30158 | 30158 | Do you sing? | yea a lil | Nice | others |


The data contains emoji, text faces (e.g. `-_-`, `o-o`), slang, and typos.

In the training set, "others" makes up about 50% of the data, and the three emotions are distributed roughly evenly.

In [35]:
df.groupby("label").count()[["id"]]/len(df)

| label  | id       |
|--------|----------|
| angry  | 0.18256  |
| happy  | 0.140683 |
| others | 0.495623 |
| sad    | 0.181134 |

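The same proportions can be computed more directly with pandas' `value_counts(normalize=True)`. A minimal sketch on a toy frame (the six labels below are made up for illustration, not taken from the dataset):

```python
import pandas as pd

# Toy stand-in for df; these six labels are invented for illustration.
toy = pd.DataFrame({"label": ["others", "angry", "others", "sad", "happy", "others"]})

# Share of each label, sorted by label name like the groupby output above.
proportions = toy["label"].value_counts(normalize=True).sort_index()
print(proportions)
```

On the real data this would be `df["label"].value_counts(normalize=True)`. Unlike `groupby("label").count()[["id"]]`, it does not depend on the `id` column being free of missing values.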

In [36]:
df_dev.groupby("label").count()[["id"]]/len(df_dev)

| label  | id       |
|--------|----------|
| angry  | 0.054446 |
| happy  | 0.051543 |
| others | 0.848639 |
| sad    | 0.045372 |


# Plan of Attack

## Stage 1 - Data Cleaning
Misspelled words and abbreviations will be normalized with regular expressions; the regex script from [this repository](https://github.com/iamgroot42/nelec/blob/master/regex.py) will help with this task. The cleaned words will then be mapped to pretrained [GloVe](https://nlp.stanford.edu/projects/glove/) word embeddings, which capture semantic similarity between words. [ELMo](https://arxiv.org/abs/1802.05365), a pretrained contextual embedding that gives a word different vectors depending on its context, will also be used to improve emotion classification accuracy. Emoji and text faces will be handled by [DeepMoji](https://www.aclweb.org/anthology/D17-1169/).
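The regex step could look roughly like the sketch below. It is a simplified stand-in, not the contents of the linked `regex.py`: the `ABBREVIATIONS` table and the `normalize` helper are hypothetical, chosen to match slang visible in the samples above. Known abbreviations are replaced as whole words (using `\b` word boundaries), and letters repeated three or more times are shortened to two.

```python
import re

# Hypothetical abbreviation table for illustration only; the linked
# regex.py script defines its own, much larger set of rules.
ABBREVIATIONS = {
    "u": "you",
    "r": "are",
    "ur": "your",
    "wt": "what",
    "lil": "little",
}

# Match any abbreviation as a whole word, ignoring case.
ABBREV_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, ABBREVIATIONS)) + r")\b", re.IGNORECASE
)
# A letter repeated three or more times in a row ("sooooo").
REPEAT_RE = re.compile(r"([A-Za-z])\1{2,}")


def normalize(text: str) -> str:
    """Expand known abbreviations and shorten elongated words."""
    text = ABBREV_RE.sub(lambda m: ABBREVIATIONS[m.group(1).lower()], text)
    # Keep two copies of the repeated letter: "sooooo" -> "soo".
    return REPEAT_RE.sub(r"\1\1", text)


print(normalize("U r ridiculous"))         # you are ridiculous
print(normalize("wt do u do 4 a living"))  # what do you do 4 a living
print(normalize("sooooo happy"))           # soo happy
```

Restricting the repeat rule to letters keeps numbers such as "1000" intact.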

During this stage, each sentence will also be transformed into a word matrix in which each word is represented by a feature vector. Semantically similar words lie close together in this high-dimensional space.
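A minimal sketch of building such a word matrix, assuming embeddings in GloVe's plain-text format (one word per line followed by its vector components, separated by spaces). The three-dimensional vectors are invented for illustration; real GloVe vectors have a few dozen to a few hundred dimensions. Mapping unknown words to a zero vector is one common choice among several, not necessarily the one this project will use.

```python
import math

# Toy embeddings in GloVe's text format: "word v1 v2 ... vd" per line.
# These numbers are invented for illustration, not real GloVe vectors.
GLOVE_LINES = """happy 0.9 0.1 0.0
glad 0.8 0.2 0.1
sad -0.7 0.3 0.0
angry -0.6 -0.8 0.1""".splitlines()


def load_embeddings(lines):
    """Parse GloVe-style lines into a dict mapping word -> list of floats."""
    table = {}
    for line in lines:
        word, *values = line.split()
        table[word] = [float(v) for v in values]
    return table


def sentence_matrix(tokens, table, dim):
    """One row per token; out-of-vocabulary tokens get a zero vector."""
    return [table.get(tok.lower(), [0.0] * dim) for tok in tokens]


def cosine(u, v):
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


emb = load_embeddings(GLOVE_LINES)
matrix = sentence_matrix("I am so happy".split(), emb, dim=3)
print(len(matrix), len(matrix[0]))  # 4 3
# Similar words are closer: "happy" vs "glad" beats "happy" vs "sad".
print(cosine(emb["happy"], emb["glad"]) > cosine(emb["happy"], emb["sad"]))  # True
```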


## Stage 2 - Model Training

Due to limited computing power, I decided to implement one of the paper's baseline models, [SA-LSTM](https://arxiv.org/pdf/1904.00132.pdf).

The model first encodes all the words appearing in the conversation as a sequence $x = (x_1, \dots, x_T)$, and GloVe is applied to this sequence to give $G(x)$. The concatenated conversation is passed directly to ELMo to give $E(x)$, whose vector at position $t$ is $E_t$. Then a two-layer bidirectional LSTM encoder runs over the concatenation of $G(x)$ and $E(x)$ to produce the hidden states $h_t^e = \mathrm{LSTM}^e([G(x_t); E_t], h_{t-1}^e)$. Next, a self-attention layer is applied to the encoder states, $h_x^{sa} = \mathrm{SA}(h_x^e)$. Finally, a fully connected layer projects the result onto the emotion space, $\mathrm{output}_x^{SL} = \mathrm{output}(h_x^{sa})$, and an $\arg\max$ over these outputs gives the emotion the model predicts.
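The steps after the encoder can be sketched in NumPy as below. This is an illustration under assumptions, not the paper's implementation: the BiLSTM states $h^e$ are replaced by random numbers, the self-attention is plain scaled dot-product attention without learned query/key/value projections, the attended states are mean-pooled into one vector, and the output-layer weights are random rather than trained. The point is only to show the shapes and the final $\arg\max$.

```python
import numpy as np

rng = np.random.default_rng(0)

EMOTIONS = ["happy", "sad", "angry", "others"]
T, d = 12, 8  # sequence length and encoder state size (illustrative values)

# Stand-in for the BiLSTM encoder states h^e, one row per time step.
h_e = rng.normal(size=(T, d))


def self_attention(h):
    """Scaled dot-product self-attention with Q = K = V = h (no learned projections)."""
    scores = h @ h.T / np.sqrt(h.shape[1])         # (T, T) pairwise similarities
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over each row
    return weights @ h, weights                    # (T, d) attended states


h_sa, attn = self_attention(h_e)

# Mean-pool over time, then a fully connected layer onto the 4 emotion classes.
W = rng.normal(size=(d, len(EMOTIONS)))
b = np.zeros(len(EMOTIONS))
logits = h_sa.mean(axis=0) @ W + b

prediction = EMOTIONS[int(np.argmax(logits))]
print(h_sa.shape, logits.shape, prediction)
```

In a trained model, `W`, `b`, and any attention projections would be learned jointly with the encoder, for example in a deep-learning framework with automatic differentiation.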

## Stage 3 - Evaluation and Conclusion

The model will be evaluated with the following classification metrics:

1. Precision
2. Recall
3. F1 Score

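I will construct a [multi-class confusion matrix](https://www.youtube.com/watch?v=FAr2GmWNbT0) whose rows are the true labels and whose columns are the predicted labels. Tpx counts the correctly classified samples of class x, and Exy counts the samples whose true label is y but which were predicted as x. The matrix looks like:

|         | Anger | Sad | Happy | ... |
|---------|-------|-----|-------|-----|
| Anger   | Tpa   | Esa | Eha   | ... |
| Sad     | Eas   | Tps | Ehs   | ... |
| Happy   | Eah   | Esh | Tph   | ... |
| ...     | ...   | ... | ...   | ... |

**Example Precision**

The precision of Anger divides the correct Anger predictions by all Anger predictions (the Anger column): Pa = Tpa/(Tpa+Eas+Eah+...)

**Example Recall**

The recall of Anger divides the correct Anger predictions by all truly angry samples (the Anger row): Ra = Tpa/(Tpa+Esa+Eha+...)

**Example F1**

The F1 score of Anger is the harmonic mean of its precision and recall: F1a = 2\*Pa\*Ra/(Pa+Ra)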
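The confusion matrix and the per-class precision, recall, and F1 described above can be sketched in plain Python. In this sketch rows are true labels and columns are predicted labels, and the label lists at the bottom are made-up toy data. The last function computes a micro-averaged F1 over the three emotion classes only (ignoring "others"); to my understanding this is the official EmoContext metric, but treat that as an assumption to check against the task description.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """cm[t][p] = number of samples with true label t predicted as p."""
    pairs = Counter(zip(y_true, y_pred))
    return {t: {p: pairs[(t, p)] for p in labels} for t in labels}


def precision_recall_f1(cm, label):
    """Per-class scores: precision uses the column, recall uses the row."""
    tp = cm[label][label]
    predicted = sum(cm[t][label] for t in cm)  # column sum
    actual = sum(cm[label].values())           # row sum
    p = tp / predicted if predicted else 0.0
    r = tp / actual if actual else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def micro_f1(cm, labels):
    """Micro-averaged F1 over `labels`, pooling TP/FP/FN counts across them."""
    tp = sum(cm[c][c] for c in labels)
    fp = sum(cm[t][c] for c in labels for t in cm if t != c)
    fn = sum(cm[c][p] for c in labels for p in cm[c] if p != c)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0


LABELS = ["angry", "sad", "happy", "others"]
y_true = ["angry", "angry", "sad", "happy", "others", "others"]
y_pred = ["angry", "sad", "sad", "happy", "angry", "others"]

cm = confusion_matrix(y_true, y_pred, LABELS)
print(precision_recall_f1(cm, "angry"))          # (0.5, 0.5, 0.5)
print(micro_f1(cm, ["angry", "sad", "happy"]))
```

Here the angry sample predicted as sad counts both as a false negative for Anger and as a false positive for Sad, which is why the micro average pools counts rather than averaging per-class scores.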

