# Emotion Analysis from English Tweets Using BERT

### Workflow: 
1. Import Data
2. Data preprocessing and downloading BERT
3. Training and validation
4. Saving the model

This notebook performs multiclass text classification (joy, sadness, fear, anger) on tweets with BERT, using the ktrain library.

In [None]:
!pip install ktrain

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

import ktrain
from ktrain import text

## 1. Import Data

In [None]:
data_train = pd.read_csv('/content/drive/MyDrive/tweets-dataset/en-train.csv')
data_test = pd.read_csv('/content/drive/MyDrive/tweets-dataset/en-test.csv')
data_val = pd.read_csv('/content/drive/MyDrive/tweets-dataset/en-dev.csv')

# concatenate the training and dev sets; the test set is used for validation later
data_train = pd.concat([data_train, data_val], ignore_index=True)

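# emotion labels in the dataset; note that ktrain encodes labels in
# alphabetical order, so this list is not in the encoded order
class_names = ['joy', 'sadness', 'fear', 'anger']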
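
After concatenating the splits, it can help to check how many tweets each emotion has. The following is a minimal sketch on a toy frame, not the real dataset: the rows and the names `toy_train`, `toy_val`, `toy_all`, and `label_counts` are hypothetical, and only the column names `Text` and `Emotion` come from this notebook.

```python
import pandas as pd

# Toy stand-ins for the real CSV files (hypothetical rows, same column names)
toy_train = pd.DataFrame({
    "Text": ["so happy today", "this is scary"],
    "Emotion": ["joy", "fear"],
})
toy_val = pd.DataFrame({
    "Text": ["I am furious", "what a great day"],
    "Emotion": ["anger", "joy"],
})

# Same concatenation as above: ignore_index=True renumbers rows from 0
toy_all = pd.concat([toy_train, toy_val], ignore_index=True)

# Number of examples per emotion label
label_counts = toy_all["Emotion"].value_counts()
print(label_counts)
```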

In [None]:
# set the hyperparameters
maxlen = 64
batch_size = 16
lr = 5e-5
epochs = 3

## 2. Data Preprocessing

In [None]:
# the test set is passed as val_df, so it serves as the validation data
(x_train, y_train), (x_test, y_test), preproc = text.texts_from_df(
    data_train,
    'Text',
    label_columns=['Emotion'],
    val_df=data_test,
    max_features=35000,
    maxlen=maxlen,
    val_pct=0.125,  # has no effect because val_df is supplied
    preprocess_mode='bert',
    lang='en',
    is_regression=False,
)

['anger', 'fear', 'joy', 'sadness']
   anger  fear  joy  sadness
0    0.0   0.0  1.0      0.0
1    0.0   1.0  0.0      0.0
2    0.0   0.0  0.0      1.0
3    0.0   0.0  1.0      0.0
4    1.0   0.0  0.0      0.0
['anger', 'fear', 'joy', 'sadness']
   anger  fear  joy  sadness
0    0.0   0.0  1.0      0.0
1    0.0   0.0  0.0      1.0
2    0.0   0.0  1.0      0.0
3    0.0   0.0  1.0      0.0
4    0.0   1.0  0.0      0.0
downloading pretrained BERT model (uncased_L-12_H-768_A-12.zip)...
extracting pretrained BERT model...
done.

cleanup downloaded zip...
done.

preprocessing train...
language: en


Is Multi-Label? False
preprocessing test...
language: en
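
The preprocessing output above lists the label columns as `['anger', 'fear', 'joy', 'sadness']`, which is alphabetical and differs from the hand-written `class_names` list. The sketch below illustrates why that order matters. It uses `pd.get_dummies` as a stand-in for the encoding step (an assumption; ktrain's internals may differ), and the five toy labels mirror the first rows printed above.

```python
import pandas as pd

# Toy labels mirroring the first five training rows printed above
labels = pd.Series(["joy", "fear", "sadness", "joy", "anger"])

# One-hot encode; columns come out sorted alphabetically
one_hot = pd.get_dummies(labels, dtype=float)
encoded_order = list(one_hot.columns)

# The hand-written list from the notebook, in a different order
manual_order = ["joy", "sadness", "fear", "anger"]

# Class index of the first example (a 'joy' tweet)
first_index = int(one_hot.to_numpy().argmax(axis=1)[0])

correct_name = encoded_order[first_index]  # name under the encoded order
wrong_name = manual_order[first_index]     # name under the manual order
print(encoded_order, first_index, correct_name, wrong_name)
```

Any report that maps class indices to names should therefore use the encoded order, for example the list returned by `preproc.get_classes()`.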


## 3. Training and validation

In [None]:
model = text.text_classifier('bert', train_data=(x_train, y_train), preproc=preproc)

learner = ktrain.get_learner(model, train_data=(x_train, y_train), 
                             val_data=(x_test, y_test),
                             batch_size=batch_size)

Is Multi-Label? False
maxlen is 64
done.


In [None]:
# train the model
history = learner.fit_onecycle(lr, epochs)



begin training using onecycle policy with max lr of 5e-05...
Epoch 1/3
Epoch 2/3
Epoch 3/3
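
`fit_onecycle` trains with a one-cycle learning-rate policy, where the rate rises to the maximum (`5e-05` here) and then falls again. The exact schedule ktrain uses is not shown in this notebook, so the following is only an illustration of the general idea with a simple triangular shape. The function name `triangular_one_cycle`, the starting divisor of 10, and the 12 steps are assumptions chosen for the example.

```python
import numpy as np

def triangular_one_cycle(max_lr, total_steps, start_div=10.0):
    """Linear warm-up from max_lr/start_div to max_lr, then linear decay back.

    Illustrative only; not ktrain's actual implementation.
    """
    base_lr = max_lr / start_div
    half = total_steps // 2
    # Warm-up phase: rises towards max_lr without reaching it
    up = np.linspace(base_lr, max_lr, half, endpoint=False)
    # Decay phase: starts at max_lr and ends at base_lr
    down = np.linspace(max_lr, base_lr, total_steps - half)
    return np.concatenate([up, down])

schedule = triangular_one_cycle(max_lr=5e-5, total_steps=12)
print(schedule)
```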


In [None]:
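# validate; class names are taken from preproc so that they follow the
# encoded (alphabetical) label order rather than the hand-written list
learner.validate(val_data=(x_test, y_test), class_names=preproc.get_classes())

              precision    recall  f1-score   support

       anger       0.89      0.78      0.83       618
        fear       0.80      0.89      0.84       605
         joy       0.91      0.90      0.91       592
     sadness       0.82      0.85      0.84       404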

    accuracy                           0.86      2219
   macro avg       0.86      0.86      0.85      2219
weighted avg       0.86      0.86      0.86      2219



array([[484,  74,  19,  41],
       [ 28, 537,  21,  19],
       [ 14,  30, 534,  14],
       [ 17,  29,  14, 344]])
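
The per-class figures in the report can be recomputed from the confusion matrix. This is a minimal sketch assuming the scikit-learn convention that rows are true classes and columns are predicted classes; the row sums match the `support` column, which is consistent with that assumption.

```python
import numpy as np

# Confusion matrix copied from the validation output above
cm = np.array([[484,  74,  19,  41],
               [ 28, 537,  21,  19],
               [ 14,  30, 534,  14],
               [ 17,  29,  14, 344]])

true_pos = np.diag(cm)                 # correctly classified per class
recall = true_pos / cm.sum(axis=1)     # divide by true count (row sums)
precision = true_pos / cm.sum(axis=0)  # divide by predicted count (column sums)
f1 = 2 * precision * recall / (precision + recall)
accuracy = true_pos.sum() / cm.sum()

print(np.round(precision, 2))
print(np.round(recall, 2))
print(np.round(f1, 2))
print(round(float(accuracy), 2))
```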

Test the trained model on a new input.

In [None]:
predictor = ktrain.get_predictor(learner.model, preproc)

In [None]:
message = "I can't wait to watch the new movie"
prediction = predictor.predict(message)
print(f' Message: {message}\n Predicted: {prediction}')

 Message: I can't wait to watch the new movie
 Predicted: joy


## 4. Saving the model
To reload the saved predictor later, call `ktrain.load_predictor` with the same path.


In [None]:
predictor.save("/content/drive/MyDrive/models/en-bert-model")