# Using LSTM Network With Attention To Recognize Emotions
In this notebook, we will build an emotion classifier based on LSTMs and an attention mechanism, using the Keras library and a publicly available [dataset](https://github.com/huseinzol05/NLP-Dataset/tree/master/emotion-english).

Recurrent neural networks, LSTMs and GRUs in particular, are widely used in natural language processing applications such as classification and language modeling. The attention mechanism is also very popular these days, especially in machine translation, where the words in the source and target sentences need to be aligned.

In classification tasks (such as emotion recognition), an LSTM processes the input words sequentially, and its last output is taken to represent the meaning of the sentence. With an attention mechanism, the model instead takes a weighted average of the outputs at every time step, and it learns how to generate these weights from the input sequence. In this way, the model learns where to attend in the input sentence. This is very useful in translation: the translation model attends to a position in the source sentence while generating each word of the target sentence.
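Written as equations, one common form of this weighted average, and the one the model built later in this notebook uses (a single-unit `Dense` layer as the scoring function), is shown below. The symbols are our own notation, not the notebook's: $h_t$ is the LSTM output at step $t$, $T$ the number of time steps, $w$ and $b$ the learned scoring parameters, $\alpha_t$ the attention weights and $c$ the resulting sentence vector.

$$
e_t = w^\top h_t + b, \qquad
\alpha_t = \frac{\exp(e_t)}{\sum_{k=1}^{T} \exp(e_k)}, \qquad
c = \sum_{t=1}^{T} \alpha_t h_t
$$

The vector $c$ takes the place of the last output $h_T$ as the sentence representation. Since the softmax is unchanged when the same constant is added to every score, the bias $b$ has no effect on the weights $\alpha_t$.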


![Attention Mechanism](https://image.ibb.co/iJ7WRL/attention.jpg)
**Attention Mechanism**

*Picture is taken from [Feed-forward networks with attention can solve some long-term memory problems](https://arxiv.org/pdf/1512.08756)*


In this study, we will see that the attention mechanism can be useful for classification tasks as well. It also makes the model's predictions interpretable: we can highlight the words the model attended to in order to understand why it made a given prediction. As we will see in the last section, the outputs look quite fascinating.

![](https://preview.ibb.co/exVpgL/Capture.png)

by [Eray Yildiz](https://twitter.com/erayildiz)

## Emotion Dataset
In this notebook, we are working with an emotion classification dataset of tweets, each labeled with one of 6 categories (joy, sadness, anger, fear, love, surprise).

### Let's start exploring the dataset

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)


# Load the dataset
dataset = pd.read_csv("../input/emotion.data")

In [None]:
# Plot label histogram
dataset.emotions.value_counts().plot.bar()

In [None]:
# Print some samples
dataset.head(10)

## Preparing data for model training
### Tokenization
Since the data is already tokenized and lowercased, we just need to split each text on spaces.


In [None]:
input_sentences = [text.split(" ") for text in dataset["text"].values.tolist()]
labels = dataset["emotions"].values.tolist()
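Splitting on a single space assumes that tokens are separated by exactly one space. We have not checked that every text in this dataset satisfies this; where one doesn't, `str.split(" ")` produces empty-string tokens, whereas `str.split()` with no argument splits on runs of whitespace and ignores leading and trailing whitespace. A quick comparison on a made-up string (not taken from the dataset):

```python
# Compare splitting on a single space with splitting on runs of whitespace.
# The sample string is made up for illustration; it is not from the dataset.
text = "i feel  so happy today "

by_space = text.split(" ")    # keeps an empty string for each extra space
by_whitespace = text.split()  # collapses whitespace runs and drops the ends

print(by_space)       # ['i', 'feel', '', 'so', 'happy', 'today', '']
print(by_whitespace)  # ['i', 'feel', 'so', 'happy', 'today']
```

If you switch to `split()`, use the same call wherever new text is tokenized (the visualization cell at the end of this notebook also calls `split(" ")`), so that training and inference see the same tokens.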

### Creating Vocabulary (word index)

In [None]:
# Initialize the word2id dictionary that will be used to encode words
word2id = dict()

max_words = 0  # maximum number of words in a sentence

# Construction of word2id dict.
# Ids start at 1 because pad_sequences pads with 0, so 0 must not be a real word.
for sentence in input_sentences:
    for word in sentence:
        # Add the word to word2id if it is not there yet
        if word not in word2id:
            word2id[word] = len(word2id) + 1
    # If the sentence is longer than max_words, update max_words
    if len(sentence) > max_words:
        max_words = len(sentence)

# Construction of label2id and id2label dicts
# (labels are sorted so that the label ids are the same on every run)
label2id = {l: i for i, l in enumerate(sorted(set(labels)))}
id2label = {v: k for k, v in label2id.items()}
id2label
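Why the padding id matters: `pad_sequences` fills with 0 by default, so a vocabulary that hands out ids starting at 0 gives its first word the same id as padding, and the model can no longer tell them apart. A minimal pure-Python sketch of a vocabulary that leaves 0 free, run on two made-up sentences; `build_vocab` and `PAD_ID` are hypothetical names used only for illustration:

```python
PAD_ID = 0  # pad_sequences fills with 0 by default, so no real word may use this id

def build_vocab(sentences):
    """Map each distinct word to an id starting at 1, leaving 0 for padding (hypothetical helper)."""
    word2id = {}
    for sentence in sentences:
        for word in sentence:
            if word not in word2id:
                word2id[word] = len(word2id) + 1
    return word2id

# Two made-up tokenized sentences
vocab = build_vocab([["i", "feel", "happy"], ["i", "feel", "scared"]])
print(vocab)  # {'i': 1, 'feel': 2, 'happy': 3, 'scared': 4}

# An embedding table must then cover ids 0..len(vocab), i.e. len(vocab) + 1 rows
embedding_rows = len(vocab) + 1
print(embedding_rows)  # 5
```

This is also why the embedding layer in the model below is sized `len(word2id) + 1`.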

### Encoding samples with corresponding integer values

In [None]:
import keras

# Encode input words and labels
X = [[word2id[word] for word in sentence] for sentence in input_sentences]
Y = [label2id[label] for label in labels]

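# Left-pad each sequence with zeros up to max_words
# (pad_sequences pads and truncates at the front by default)
from keras.preprocessing.sequence import pad_sequences
X = pad_sequences(X, maxlen=max_words)

# Convert Y to a one-hot numpy array
Y = keras.utils.to_categorical(Y, num_classes=len(label2id), dtype='float32')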

# Print shapes
print("Shape of X: {}".format(X.shape))
print("Shape of Y: {}".format(Y.shape))
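To make the output of `pad_sequences` and `to_categorical` concrete, the sketch below mimics them in NumPy. It is an illustrative re-implementation that assumes the Keras defaults (`padding='pre'`, `truncating='pre'`, fill value 0), not the Keras source; `pad_pre` and `one_hot` are hypothetical names, and the inputs are made up.

```python
import numpy as np

def pad_pre(sequences, maxlen, value=0):
    """Left-pad and left-truncate integer sequences to maxlen (mimics pad_sequences' defaults)."""
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for i, seq in enumerate(sequences):
        trimmed = seq[-maxlen:]  # keep the last maxlen ids ('pre' truncation)
        if len(trimmed):         # guard: out[i, -0:] would select the whole row
            out[i, -len(trimmed):] = trimmed  # right-align, zeros go in front ('pre' padding)
    return out

def one_hot(labels, num_classes):
    """Turn integer class ids into float32 one-hot rows (mimics to_categorical)."""
    return np.eye(num_classes, dtype="float32")[labels]

X_demo = pad_pre([[3, 7], [5, 1, 2, 9]], maxlen=3)
Y_demo = one_hot([2, 0], num_classes=3)
print(X_demo.tolist())  # [[0, 3, 7], [1, 2, 9]]
print(Y_demo.tolist())  # [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
```

Pre-padding puts each sentence's real tokens at the end of its row, which is why the visualization cell near the end of this notebook reads the last `len(tokenized_sample)` attention weights.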


## Build an LSTM model with attention

In [None]:
embedding_dim = 100 # The dimension of word embeddings

# Define input tensor
sequence_input = keras.Input(shape=(max_words,), dtype='int32')

# Word embedding layer
embedded_inputs = keras.layers.Embedding(len(word2id) + 1,
                                         embedding_dim,
                                         input_length=max_words)(sequence_input)

# Apply dropout to prevent overfitting
embedded_inputs = keras.layers.Dropout(0.2)(embedded_inputs)

# Apply a bidirectional LSTM over the embedded inputs; return_sequences=True keeps
# one output per time step, shape (batch, max_words, 2 * embedding_dim)
lstm_outs = keras.layers.Bidirectional(
    keras.layers.LSTM(embedding_dim, return_sequences=True)
)(embedded_inputs)

# Apply dropout to LSTM outputs to prevent overfitting
lstm_outs = keras.layers.Dropout(0.2)(lstm_outs)

# Attention mechanism
# 1. Score each time step with a single-unit dense layer: (batch, max_words, 1)
attention_vector = keras.layers.TimeDistributed(keras.layers.Dense(1))(lstm_outs)
# 2. Flatten the scores to (batch, max_words)
attention_vector = keras.layers.Reshape((max_words,))(attention_vector)
# 3. Normalize the scores over the time steps with a softmax
attention_vector = keras.layers.Activation('softmax', name='attention_vec')(attention_vector)
# 4. Weighted sum of the LSTM outputs over time: (batch, 2 * embedding_dim)
attention_output = keras.layers.Dot(axes=1)([lstm_outs, attention_vector])

# Classifier head: a ReLU hidden layer followed by a softmax output layer
fc = keras.layers.Dense(embedding_dim, activation='relu')(attention_output)
output = keras.layers.Dense(len(label2id), activation='softmax')(fc)

# Finally building model
model = keras.Model(inputs=[sequence_input], outputs=output)
model.compile(loss="categorical_crossentropy", metrics=["accuracy"], optimizer='adam')

# Print model summary
model.summary()
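The attention block can be hard to follow from layer names alone, so here is a NumPy sketch of the same computation on one batch that makes the tensor shapes explicit. The weights are random stand-ins for the trained `Dense(1)` parameters, and the sizes (batch of 2, 5 time steps, 200 features, i.e. 2 × `embedding_dim` for the bidirectional LSTM) are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, steps, features = 2, 5, 200  # features = 2 * embedding_dim for a BiLSTM with 100 units

lstm_outs = rng.normal(size=(batch, steps, features))  # stand-in for the BiLSTM outputs
w = rng.normal(size=(features, 1))                     # stand-in for the Dense(1) kernel
b = 0.1                                                # stand-in for the Dense(1) bias

# TimeDistributed(Dense(1)): one scalar score per time step -> (batch, steps, 1)
scores = lstm_outs @ w + b
# Reshape((steps,)): drop the trailing axis -> (batch, steps)
scores = scores.reshape(batch, steps)
# Softmax over the time axis (subtracting the row max for numerical stability)
exp = np.exp(scores - scores.max(axis=1, keepdims=True))
attention = exp / exp.sum(axis=1, keepdims=True)
# Dot(axes=1): attention-weighted sum of the outputs over time -> (batch, features)
context = np.einsum("bt,btf->bf", attention, lstm_outs)

print(scores.shape, attention.shape, context.shape)  # (2, 5) (2, 5) (2, 200)
```

Each row of `attention` sums to 1, so `context` is a convex combination of the per-step outputs rather than only the last one.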





## Training the model

In [None]:
# Train the model for 2 epochs
model.fit(X, Y, epochs=2, batch_size=64, validation_split=0.1, shuffle=True)

The validation accuracy is about 93%, a very good result for a six-class classification task.
The performance can be improved further by training the model for a few more epochs.
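One detail about `validation_split`: according to the Keras documentation, the validation data is taken from the *last* samples of the arrays, before shuffling, so `shuffle=True` does not change which rows are held out. If the CSV happens to be ordered (we have not checked whether this one is), those last 10% may not be representative. A sketch of an explicit random hold-out on toy arrays; `random_split` is a hypothetical helper, and the seed is an arbitrary choice:

```python
import numpy as np

def random_split(X, Y, val_fraction=0.1, seed=42):
    """Shuffle the row indices once with a fixed seed, then hold out val_fraction of them (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_val = int(len(X) * val_fraction)
    val_idx, train_idx = order[:n_val], order[n_val:]
    return X[train_idx], Y[train_idx], X[val_idx], Y[val_idx]

# Toy arrays standing in for the padded X and one-hot Y built earlier
X_demo = np.arange(20).reshape(10, 2)
Y_demo = np.eye(2)[np.arange(10) % 2]
X_tr, Y_tr, X_val, Y_val = random_split(X_demo, Y_demo)
print(X_tr.shape, X_val.shape)  # (9, 2) (1, 2)

# With the real data, the split could then be passed to Keras explicitly, e.g.:
# model.fit(X_tr, Y_tr, epochs=2, batch_size=64, validation_data=(X_val, Y_val), shuffle=True)
```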

**Let's take a closer look at the model's predictions and attention weights**

In [None]:
# Build a second model that shares the trained layers and returns the attention vector as well as the label prediction
model_with_attentions = keras.Model(inputs=model.input,
                                    outputs=[model.output, 
                                             model.get_layer('attention_vec').output])

In [None]:
import random
import math
import html

import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display, HTML

# Select a random sample to illustrate
sample_text = random.choice(dataset["text"].values.tolist())

# Encode the sample
tokenized_sample = sample_text.split(" ")
encoded_samples = [[word2id[word] for word in tokenized_sample]]

# Padding
encoded_samples = keras.preprocessing.sequence.pad_sequences(encoded_samples, maxlen=max_words)

# Make predictions
label_probs, attentions = model_with_attentions.predict(encoded_samples)
label_probs = {id2label[_id]: prob for _id, prob in enumerate(label_probs[0])}

# Get word attentions from the attention vector. The sequence is left-padded, so the
# real tokens correspond to the last len(tokenized_sample) positions. A list (not a dict)
# keeps one entry per occurrence of a repeated word. The square root boosts small
# weights so that they remain visible in the highlighting.
token_attentions = [(token, math.sqrt(attention_score))
                    for token, attention_score in zip(tokenized_sample,
                                                      attentions[0][-len(tokenized_sample):])]


# VISUALIZATION
plt.rcdefaults()

def rgb_to_hex(rgb):
    return '#%02x%02x%02x' % rgb

def attention2color(attention_score):
    # 0 -> white, 1 -> pure red
    r = 255 - int(attention_score * 255)
    color = rgb_to_hex((255, r, r))
    return str(color)

# Build an HTML string to visualize attentions (tokens are escaped so that
# characters such as '<' or '&' cannot break the markup)
html_text = "<hr><p style='font-size: large'><b>Text:  </b>"
for token, attention in token_attentions:
    html_text += "<span style='background-color:{};'>{}</span> ".format(attention2color(attention),
                                                                       html.escape(token))
html_text += "</p>"
# Display the text enriched with attention scores
display(HTML(html_text))

# PLOT EMOTION SCORES
emotions = list(label_probs.keys())
scores = list(label_probs.values())
plt.figure(figsize=(5, 2))
plt.bar(np.arange(len(emotions)), scores, align='center', alpha=0.5,
        color=['black', 'red', 'green', 'blue', 'cyan', 'purple'])
plt.xticks(np.arange(len(emotions)), emotions)
plt.ylabel('Scores')
plt.show()
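Two details of the cell above are easy to get wrong, so here is a self-contained sketch with made-up tokens and weights (nothing here comes from the trained model). Because `pad_sequences` pads at the front by default, the real tokens occupy the *last* positions of the attention row, so a tail slice lines the weights up with the words. And pairing tokens with weights in a list keeps one entry per occurrence of a repeated word, whereas a dict keyed by token silently merges them.

```python
import math

# Made-up example: a padded length of 8 with 5 real tokens, one of them repeated
tokens = ["i", "feel", "so", "so", "happy"]
# Made-up attention row: 3 padding positions at the front, then the 5 tokens (sums to 1)
attention_row = [0.0, 0.0, 0.0, 0.05, 0.10, 0.15, 0.15, 0.55]

# With left padding, the last len(tokens) weights belong to the real tokens
aligned = attention_row[-len(tokens):]

# One (token, display score) pair per occurrence; sqrt boosts small weights for display
pairs = [(token, math.sqrt(score)) for token, score in zip(tokens, aligned)]
merged = dict(pairs)  # what a dict keyed by token would keep

print([token for token, _ in pairs])  # ['i', 'feel', 'so', 'so', 'happy']
print(list(merged))                   # ['i', 'feel', 'so', 'happy']
```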



**We have used an attention mechanism with an LSTM network to recognize emotions in text.
We have shown that the attention mechanism can be useful for classification tasks, not just for sequence tasks.
We have visualized the attention weights to make the model's predictions interpretable (and to make them look fancy).
Enjoy using attention mechanisms in different applications!**

*All feedback is welcome.*

