## Sentiment Analysis with Deep Learning using BERT

### * Run on Google Colab using GPU

We will use the SMILE Twitter dataset.

_Wang, Bo; Tsakalidis, Adam; Liakata, Maria; Zubiaga, Arkaitz; Procter, Rob; Jensen, Eric (2016): SMILE Twitter Emotion dataset. figshare. Dataset. https://doi.org/10.6084/m9.figshare.3187909.v2_

In [1]:
import pandas as pd

In [2]:
from google.colab import drive
drive.mount("/content/drive")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [3]:
path = "/content/drive/My Drive/data_colab/smile_annotations_final.csv"
df = pd.read_csv(path, names=["id", "text", "emotion"], index_col="id")

df.head()

id,text,emotion
611857364396965889,@aandraous @britishmuseum @AndrewsAntonio Merc...,nocode
614484565059596288,Dorian Gray with Rainbow Scarf #LoveWins (from...,happy
614746522043973632,@SelectShowcase @Tate_StIves ... Replace with ...,happy
614877582664835073,@Sofabsports thank you for following me back. ...,happy
611932373039644672,@britishmuseum @TudorHistory What a beautiful ...,happy


### 1. Exploratory Data Analysis & Pre-processing

In [4]:
# the classes are severely imbalanced, and some tweets carry more than one emotion
df.emotion.value_counts()

nocode               1572
happy                1137
not-relevant          214
angry                  57
surprise               35
sad                    32
happy|surprise         11
happy|sad               9
disgust|angry           7
disgust                 6
sad|disgust             2
sad|angry               2
sad|disgust|angry       1
Name: emotion, dtype: int64

In [5]:
targets = ["nocode", "|"]

df2 = df.loc[~df.emotion.apply(lambda sentence: any(word in sentence for word in targets))].copy()
df2.head()

id,text,emotion
614484565059596288,Dorian Gray with Rainbow Scarf #LoveWins (from...,happy
614746522043973632,@SelectShowcase @Tate_StIves ... Replace with ...,happy
614877582664835073,@Sofabsports thank you for following me back. ...,happy
611932373039644672,@britishmuseum @TudorHistory What a beautiful ...,happy
611570404268883969,@NationalGallery @ThePoldarkian I have always ...,happy


In [6]:
# win: 4.53 ms ± 253 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# %timeit df.loc[~df.emotion.apply(lambda sentence: any(word in sentence for word in targets))]

In [7]:
# 6.72 ms ± 275 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# %timeit df.loc[~(df.emotion.str.contains("nocode") | df.emotion.str.contains(r"\|"))]

In [8]:
df2.emotion.value_counts()

happy           1137
not-relevant     214
angry             57
surprise          35
sad               32
disgust            6
Name: emotion, dtype: int64
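
The effect of the filter can be checked without pandas. The sketch below applies the same substring test to the label counts reported above (the counts are copied from the output; the variable names are illustrative) and prints each remaining class's share, which makes the imbalance explicit.

```python
from collections import Counter

# Raw label counts copied from the `value_counts()` output above.
raw_counts = Counter({
    "nocode": 1572, "happy": 1137, "not-relevant": 214, "angry": 57,
    "surprise": 35, "sad": 32, "happy|surprise": 11, "happy|sad": 9,
    "disgust|angry": 7, "disgust": 6, "sad|disgust": 2, "sad|angry": 2,
    "sad|disgust|angry": 1,
})

targets = ["nocode", "|"]

# Keep a label only if it contains none of the target substrings.
kept = {label: n for label, n in raw_counts.items()
        if not any(t in label for t in targets)}

total_kept = sum(kept.values())
shares = {label: n / total_kept for label, n in kept.items()}

print(total_kept)
for label, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{label:<13}{share:.1%}")
```

Roughly three quarters of the remaining tweets are labelled `happy`, which is worth keeping in mind when reading the evaluation results.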

In [9]:
emotion_dict = {}
emotions = df2.emotion.unique()

# assign each emotion an integer label in order of first appearance
for index, emotion in enumerate(emotions):
    emotion_dict[emotion] = index

emotion_dict

{'angry': 2,
 'disgust': 3,
 'happy': 0,
 'not-relevant': 1,
 'sad': 4,
 'surprise': 5}
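
The same label mapping can be built in one expression. This is a minimal sketch on a hypothetical list of labels (not the real data); it assumes, as `unique()` does, that ids are assigned in order of first appearance.

```python
# Hypothetical sample of labels, in the order they might appear in the data.
labels = ["happy", "happy", "not-relevant", "angry", "happy", "disgust", "sad", "surprise"]

# dict.fromkeys keeps the first occurrence of each label, in order of appearance.
emotion_to_id = {emotion: index for index, emotion in enumerate(dict.fromkeys(labels))}
id_to_emotion = {index: emotion for emotion, index in emotion_to_id.items()}

print(emotion_to_id)
print(id_to_emotion)
```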

In [10]:
# 2.93 ms ± 427 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# %timeit df2["label"] = df2.emotion.replace(emotion_dict)

In [11]:
# win: 1.15 ms ± 36.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
df2["label"] = df2["emotion"].map(emotion_dict)
df2.tail(10)

id,text,emotion,label
611227963976253440,THIS is why #ilovemuseums ! Torres Strait danc...,happy,0
612242969035411456,@britishmuseum on INSTA. See this magnificent ...,happy,0
614900716960915456,This was definitely one of the best cocktails ...,happy,0
614053885733412864,Good to see @liveatlica's art collection @Leed...,happy,0
610405281604993024,"@RAMMuseum thanks, we'll have a look next week...",happy,0
611258135270060033,@_TheWhitechapel @Campaignforwool @SlowTextile...,not-relevant,1
612214539468279808,“@britishmuseum: Thanks for ranking us #1 in @...,happy,0
613678555935973376,MT @AliHaggett: Looking forward to our public ...,happy,0
615246897670922240,@MrStuchbery @britishmuseum Mesmerising.,happy,0
613016084371914753,@NationalGallery The 2nd GENOCIDE against #Bia...,not-relevant,1


### 2. Training/ Validation Split

In [12]:
df2.index.values

array([614484565059596288, 614746522043973632, 614877582664835073, ...,
       613678555935973376, 615246897670922240, 613016084371914753])

In [13]:
df2.label.values

array([0, 0, 0, ..., 0, 0, 1])

In [14]:
from sklearn.model_selection import train_test_split

In [15]:
X_train, X_test, y_train, y_test = train_test_split(
    df2.index.values,
    df2.label.values,
    test_size = 0.15,
    random_state = 42,
    stratify = df2.label.values  # keep the class proportions the same in train and test; the dataset is heavily imbalanced
)

In [16]:
X_train[:3]  # the id as index

array([612587098240090112, 613306427340419072, 613086766199894016])

In [17]:
X_test[:3]

array([611118985791324161, 610411898186649601, 611796344047554560])

In [18]:
df2["data_type"] = ["no_data"]*df2.shape[0]  # temporary placeholder, overwritten with "train"/"test" below
df2.head()

id,text,emotion,label,data_type
614484565059596288,Dorian Gray with Rainbow Scarf #LoveWins (from...,happy,0,no_data
614746522043973632,@SelectShowcase @Tate_StIves ... Replace with ...,happy,0,no_data
614877582664835073,@Sofabsports thank you for following me back. ...,happy,0,no_data
611932373039644672,@britishmuseum @TudorHistory What a beautiful ...,happy,0,no_data
611570404268883969,@NationalGallery @ThePoldarkian I have always ...,happy,0,no_data


In [19]:
df2.loc[X_train, "data_type"] = "train"
df2.loc[X_test, "data_type"] = "test"

In [20]:
df2.groupby(["emotion", "label", "data_type"]).count()

emotion,label,data_type,text
angry,2,test,9
angry,2,train,48
disgust,3,test,1
disgust,3,train,5
happy,0,test,171
happy,0,train,966
not-relevant,1,test,32
not-relevant,1,train,182
sad,4,test,5
sad,4,train,27
surprise,5,test,5
surprise,5,train,30
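
A stratified split gives each class roughly `test_size` of its rows. As a rough illustration (simple rounding, not scikit-learn's exact allocation routine), the sketch below estimates the per-class test counts from the class totals reported earlier; for this dataset the rounding happens to agree with the counts shown in the table above.

```python
# Class counts after filtering, copied from the `value_counts()` output.
class_counts = {"happy": 1137, "not-relevant": 214, "angry": 57,
                "surprise": 35, "sad": 32, "disgust": 6}
test_size = 0.15

# Rough per-class allocation: each class contributes about test_size of its rows.
# This rounding is an illustration, not scikit-learn's exact allocation routine.
expected_test = {label: round(n * test_size) for label, n in class_counts.items()}
expected_train = {label: n - expected_test[label] for label, n in class_counts.items()}

print(expected_test)
print(sum(expected_test.values()), sum(expected_train.values()))
```

Note how small the minority classes become: `disgust` ends up with a single test example, so its per-class score can only be 0 or 1.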


### 3. Loading Tokeniser & Encoding Data

In [21]:
# !pip install transformers

In [22]:
from transformers import BertTokenizer
from torch.utils.data import TensorDataset

In [23]:
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased", 
                                          do_lower_case=True)

In [24]:
df2[df2.data_type == "train"].text.values[0]

'Dorian Gray with Rainbow Scarf #LoveWins (from @britishmuseum http://t.co/Q4XSwL0esu) http://t.co/h0evbTBWRq'

In [25]:
encoded_data_train = tokenizer.batch_encode_plus(
    df2[df2.data_type == "train"].text.values,  # contents to be analysed
    add_special_tokens=True,  # add the [CLS] and [SEP] special tokens
    truncation=True,
    max_length=256,
    padding="max_length",  # replaces the deprecated pad_to_max_length=True
    return_attention_mask=True,  # 1 marks real tokens, 0 marks padding
    return_tensors="pt"  # return PyTorch tensors
)

encoded_data_test = tokenizer.batch_encode_plus(
    df2[df2.data_type == "test"].text.values,
    add_special_tokens=True,
    truncation=True,
    max_length=256,
    padding="max_length",
    return_attention_mask=True,
    return_tensors="pt"
)
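
To make the padding and the attention mask concrete, here is a minimal pure-Python sketch of what happens to one sequence of token ids. `pad_and_mask` is a hypothetical helper written for illustration, not part of `transformers`; the ids 101, 102 and 0 are the `[CLS]`, `[SEP]` and `[PAD]` ids of the `bert-base-uncased` vocabulary.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # special token ids in the bert-base-uncased vocabulary

def pad_and_mask(token_ids, max_length):
    """Hypothetical helper: wrap ids in [CLS]/[SEP], truncate, pad, and build the mask."""
    body = token_ids[: max_length - 2]          # leave room for [CLS] and [SEP]
    ids = [CLS_ID] + body + [SEP_ID]
    n_pad = max_length - len(ids)
    attention_mask = [1] * len(ids) + [0] * n_pad  # 1 = real token, 0 = padding
    return ids + [PAD_ID] * n_pad, attention_mask

input_ids, attention_mask = pad_and_mask([16092, 3897], max_length=8)
print(input_ids)
print(attention_mask)
```

Every sequence comes out with the same length, which is what lets the tokenizer stack them into one tensor per field.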

In [26]:
type(encoded_data_train)

transformers.tokenization_utils_base.BatchEncoding

In [27]:
encoded_data_train

{'input_ids': tensor([[  101, 16092,  3897,  ...,     0,     0,     0],
        [  101,  1030, 27034,  ...,     0,     0,     0],
        [  101,  1030, 10682,  ...,     0,     0,     0],
        ...,
        [  101, 11047,  1030,  ...,     0,     0,     0],
        [  101,  1030,  3680,  ...,     0,     0,     0],
        [  101,  1030,  2120,  ...,     0,     0,     0]]), 'token_type_ids': tensor([[0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        ...,
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        ...,
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0]])}

In [28]:
encoded_data_train["input_ids"]

tensor([[  101, 16092,  3897,  ...,     0,     0,     0],
        [  101,  1030, 27034,  ...,     0,     0,     0],
        [  101,  1030, 10682,  ...,     0,     0,     0],
        ...,
        [  101, 11047,  1030,  ...,     0,     0,     0],
        [  101,  1030,  3680,  ...,     0,     0,     0],
        [  101,  1030,  2120,  ...,     0,     0,     0]])

In [29]:
encoded_data_train["attention_mask"]

tensor([[1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        ...,
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0]])

In [30]:
df2[df2.data_type == "train"].label.values

array([0, 0, 0, ..., 0, 0, 1])

In [31]:
import torch

torch.tensor(df2[df2.data_type == "train"].label.values)  # `torch.tensor` converts array to tensor

tensor([0, 0, 0,  ..., 0, 0, 1])

In [32]:
input_id_train = encoded_data_train["input_ids"]
attention_mask_train = encoded_data_train["attention_mask"]
label_train = torch.tensor(df2[df2.data_type == "train"].label.values)

input_id_test = encoded_data_test["input_ids"]
attention_mask_test = encoded_data_test["attention_mask"]
label_test = torch.tensor(df2[df2.data_type == "test"].label.values)

In [33]:
dataset_train = TensorDataset(input_id_train, attention_mask_train, label_train)
dataset_test = TensorDataset(input_id_test, attention_mask_test, label_test)

In [34]:
display(len(dataset_train), len(dataset_test))

1258

223

### 4. Creating Data Loaders

In [35]:
from torch.utils.data import DataLoader, RandomSampler, SequentialSampler

In [36]:
batch_size = 32

dataloader_train = DataLoader(
    dataset_train,
    sampler=RandomSampler(dataset_train),  # shuffle every epoch so training does not depend on the order of the data
    batch_size=batch_size
)

dataloader_test = DataLoader(
    dataset_test,
    sampler=SequentialSampler(dataset_test),
    batch_size=batch_size
)
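
The number of batches per epoch follows directly from the dataset sizes and the batch size. A small arithmetic sketch, assuming the default `DataLoader` behaviour of keeping the last, smaller batch:

```python
import math

n_train, n_test = 1258, 223   # dataset sizes reported above
batch_size = 32
epochs = 10                   # value used later in the notebook

batches_per_epoch_train = math.ceil(n_train / batch_size)
batches_per_epoch_test = math.ceil(n_test / batch_size)
last_train_batch = n_train - (batches_per_epoch_train - 1) * batch_size

print(batches_per_epoch_train, batches_per_epoch_test, last_train_batch)
print(batches_per_epoch_train * epochs)  # total optimiser steps over all epochs
```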

### 5. Setting Up BERT Pre-trained Model

In [37]:
from transformers import BertForSequenceClassification

In [38]:
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(emotion_dict),  # 6 emotions
    output_attentions=False,  # do not return the attention weights
    output_hidden_states=False  # do not return the intermediate hidden states
)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

### 6. Setting Up Optimiser & Scheduler

In [39]:
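# note: `AdamW` from transformers is deprecated in later releases; `torch.optim.AdamW` is the usual replacement
from transformers import AdamW, get_linear_schedule_with_warmup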

In [40]:
optimizer = AdamW(
    model.parameters(),
    lr=1e-5,
    eps=1e-8
)

In [41]:
epochs = 10

scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=0,
    num_training_steps=len(dataloader_train)*epochs
)
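
With `num_warmup_steps=0`, the schedule simply decays the learning rate linearly from its initial value to zero over all training steps. The function below is a minimal sketch of that behaviour (`linear_schedule_lr` is an illustrative name, not a library function), assuming 40 batches per epoch for 10 epochs.

```python
def linear_schedule_lr(step, base_lr, num_warmup_steps, num_training_steps):
    """Sketch of a linear warmup followed by a linear decay of the learning rate."""
    if step < num_warmup_steps:
        factor = step / max(1, num_warmup_steps)
    else:
        factor = max(0.0, (num_training_steps - step)
                     / max(1, num_training_steps - num_warmup_steps))
    return base_lr * factor

total_steps = 40 * 10  # assumes 40 batches per epoch (ceil(1258 / 32)) and 10 epochs
for step in (0, 100, 200, 400):
    print(step, linear_schedule_lr(step, 1e-5, 0, total_steps))
```

Halfway through training the rate has dropped to half of `lr`, and it reaches zero on the final step.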

### 7. Defining Performance Metrics

In [42]:
import numpy as np

In [43]:
from sklearn.metrics import f1_score

In [44]:
np.argmax([1,2,3,4,5,8,9,22,3])  # returns the index of the largest value (here 22, at index 7)

7

In [45]:
# Evaluate model's overall performance

def f1_score_func(pred, label):
    # `argmax` returns the index of the highest score in each row, which is the predicted label
    pred_flat = np.argmax(pred, axis=1).flatten()  # axis=1: compare across the class columns, giving one result per row
    label_flat = label.flatten()
    
    return f1_score(label_flat, pred_flat, average="weighted")
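
To show what the weighted average means, the sketch below computes the same kind of score by hand with NumPy: an F1 score per class, averaged with weights equal to each class's number of true examples. It is an illustration on toy data (the function name and inputs are made up), and should agree with `f1_score(..., average="weighted")` on the same inputs.

```python
import numpy as np

def weighted_f1_from_logits(logits, labels):
    """Sketch: per-class F1 averaged with weights equal to each class's support."""
    preds = np.argmax(logits, axis=1)
    total = 0.0
    for c in np.unique(labels):
        tp = np.sum((preds == c) & (labels == c))
        fp = np.sum((preds == c) & (labels != c))
        fn = np.sum((preds != c) & (labels == c))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom > 0 else 0.0
        total += f1 * np.sum(labels == c)  # weight by the class's support
    return total / len(labels)

labels = np.array([0, 0, 0, 1, 1, 2])
preds = np.array([0, 0, 1, 1, 1, 0])
logits = np.eye(3)[preds]  # toy logits whose argmax equals `preds`

print(weighted_f1_from_logits(logits, labels))
```

Because the weights are the class sizes, a dominant class such as `happy` largely determines the score, even when small classes are never predicted correctly.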

In [46]:
emotion_dict

{'angry': 2,
 'disgust': 3,
 'happy': 0,
 'not-relevant': 1,
 'sad': 4,
 'surprise': 5}

In [47]:
emotion_dict.items()  # `items` returns the (key, value) pairs as tuples

dict_items([('happy', 0), ('not-relevant', 1), ('angry', 2), ('disgust', 3), ('sad', 4), ('surprise', 5)])

In [48]:
emo_inv = {v: k for k, v in emotion_dict.items()}  # inverted mapping: label -> emotion
emo_inv

{0: 'happy',
 1: 'not-relevant',
 2: 'angry',
 3: 'disgust',
 4: 'sad',
 5: 'surprise'}

In [49]:
emo_inv[0]

'happy'

In [50]:
len([0, 0, 0])

3

In [51]:
# Evaluate model's performance per class (share of each true class predicted correctly, i.e. per-class recall)

def accuracy_per_class(pred, label):
    emotion_dict_inverse = {v: k for k, v in emotion_dict.items()}
    
    pred_flat = np.argmax(pred, axis=1).flatten()
    label_flat = label.flatten()
      
    for class_id in np.unique(label_flat):  # `class_id` is the integer label, not the emotion name
        y_pred = pred_flat[label_flat == class_id]
        y_true = label_flat[label_flat == class_id]
        
        n_correct = len(y_pred[y_pred == class_id])
        accuracy = n_correct / len(y_true)
        print(f'Class: {emotion_dict_inverse[class_id]}')
        print(f'Accuracy: {n_correct}/{len(y_true)}')
        print(f'Accuracy score: {accuracy}\n')

### 8. Creating Training Loop

In [52]:
import random

# for the sake of reproducibility
seed_val = 42
random.seed(seed_val)
np.random.seed(seed_val)

torch.manual_seed(seed_val)
torch.cuda.manual_seed_all(seed_val)

In [53]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

print(device)

cuda


Unlike in training, in testing:
- we don't compute gradients, run backpropagation, or update the weights
- we only need the loss and the logits (the raw prediction scores)

In [54]:
def test(dataloader_test):
    model.eval()
    
    loss_test_total = 0
    prediction, true_val = [], []  # lists that collect one numpy array per batch
    
    for batch in dataloader_test:
        batch = tuple(b.to(device) for b in batch)
        
        inputs = {
            "input_ids": batch[0],
            "attention_mask": batch[1],
            "labels": batch[2]
        }
        
        with torch.no_grad():
            outputs = model(**inputs)
        
        # output component #1: loss
        loss = outputs[0]
        loss_test_total += loss.item()
        
        # output component #2: logits (raw prediction scores)
        logit = outputs[1]
        logit = logit.detach().cpu().numpy()  # move the logits to the CPU (needed when running on GPU), then convert to numpy
        prediction.append(logit)
        
        # label/ the answer
        label_id = inputs["labels"].cpu().numpy()
        true_val.append(label_id)
        

    loss_test_avg = loss_test_total/ len(dataloader_test)
    
    predictions = np.concatenate(prediction, axis=0)
    true_vals = np.concatenate(true_val, axis=0)
    
    return loss_test_avg, predictions, true_vals
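
The `predictions` returned here are raw logits, not probabilities. If probabilities are wanted, a softmax can be applied row by row; the sketch below uses NumPy and hypothetical logits for two tweets. The predicted class is unchanged, since softmax preserves the ordering of the scores.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row maximum keeps the exponentials stable."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Hypothetical logits for two tweets over the six emotion classes.
toy_logits = np.array([[4.0, 1.0, 0.5, -1.0, 0.0, 0.2],
                       [0.3, 2.5, 0.1, -0.5, 0.4, 0.0]])
probs = softmax(toy_logits)

print(probs.round(3))
print(probs.argmax(axis=1), toy_logits.argmax(axis=1))
```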

In [55]:
from tqdm.notebook import tqdm

for epoch in tqdm(range(1, epochs+1)):
    model.train()  # enter training mode (enables dropout); gradients are computed in the loop below
    
    loss_train_total = 0
    progress_bar = tqdm(
        dataloader_train, 
        desc='Epochs: {:1d}'.format(epoch), 
        leave=False,  # clear each epoch's progress bar once it finishes
        disable=False
    )
    
    for batch in progress_bar:
        model.zero_grad()  # reset the gradients left over from the previous batch
        
        # each batch from the dataloader is a tuple: (input ids, attention mask, labels)
        # move every tensor in the batch to the same device as the model; important when using a GPU
        batch = tuple(b.to(device) for b in batch)
        inputs = {
            'input_ids': batch[0],  # tokenised text to be analysed for sentiment
            'attention_mask': batch[1],
            'labels': batch[2]
        }
        outputs = model(**inputs)  # unpack the dict into keyword arguments
         
        loss = outputs[0]
        loss_train_total += loss.item()
        loss.backward()  # backpropagation
        
        # cap the overall gradient norm at 1.0 to guard against exploding gradients
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        
        optimizer.step()
        scheduler.step()
        
        # the loss is already averaged over the batch; `len(batch)` is 3 (the tuple size), not the batch size
        progress_bar.set_postfix({'training_loss': '{:.3f}'.format(loss.item())})
    
    
    path = "/content/drive/My Drive/data_colab/"
    torch.save(model.state_dict(), f"{path}finetuned_BERT_epoch_{epoch}.model")  # save model after each epoch
    
    # Epoch performance
    tqdm.write(f"\nEpoch {epoch}")
    
    loss_train_avg = loss_train_total/ len(dataloader_train)
    tqdm.write(f"Training loss: {loss_train_avg}")
    
    test_loss, prediction, true_val = test(dataloader_test)
    test_f1 = f1_score_func(prediction, true_val)
    tqdm.write(f"Test loss: {test_loss}")
    tqdm.write(f"F1 Score (weighted): {test_f1}")
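
The `clip_grad_norm_` call in the training loop rescales all gradients together when their combined L2 norm exceeds the limit. Below is a minimal NumPy sketch of that idea on hypothetical gradients; `clip_by_global_norm` is an illustrative name, and the small `eps` in the denominator is an assumption made here to avoid division by zero.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=1.0, eps=1e-6):
    """Sketch: rescale all gradients together if their combined L2 norm exceeds max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (total_norm + eps))  # never scale gradients up
    return [g * scale for g in grads], total_norm

# Hypothetical gradients for two parameter tensors; their combined norm is 5.
grads = [np.array([3.0, 0.0]), np.array([0.0, 4.0])]
clipped, norm_before = clip_by_global_norm(grads)
norm_after = np.sqrt(sum(np.sum(g ** 2) for g in clipped))

print(norm_before, norm_after)
```

The direction of the update is preserved; only its length is capped, and gradients already below the limit pass through unchanged.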

Epoch 1
Training loss: 1.3548611104488373
Test loss: 0.9456564017704555
F1 Score (weighted): 0.6656119824269878


Epoch 2
Training loss: 0.804074002802372
Test loss: 0.7020841922078814
F1 Score (weighted): 0.7452248875722216


Epoch 3
Training loss: 0.5993424527347088
Test loss: 0.5989199791635785
F1 Score (weighted): 0.7605075783146086


Epoch 4
Training loss: 0.49051300510764123
Test loss: 0.5124728764806475
F1 Score (weighted): 0.7879741347273492


Epoch 5
Training loss: 0.41701488718390467
Test loss: 0.5085259633404868
F1 Score (weighted): 0.7930194559811127


Epoch 6
Training loss: 0.3759343532845378
Test loss: 0.4858198698077883
F1 Score (weighted): 0.8003348079020725


Epoch 7
Training loss: 0.3315363831818104
Test loss: 0.49274374331746784
F1 Score (weighted): 0.7988818795233631


Epoch 8
Training loss: 0.3047343537211418
Test loss: 0.49030993240220205
F1 Score (weighted): 0.7979789048190489


Epoch 9
Training loss: 0.2917662085965276
Test loss: 0.46849004711423603
F1 Score (weighted): 0.8129769602425505


Epoch 10
Training loss: 0.2814688391983509
Test loss: 0.4854891278914043
F1 Score (weighted): 0.808076459058371
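
The per-epoch scores can be compared programmatically to pick a checkpoint. The sketch below copies the weighted F1 and test loss values from the log above and finds the best epoch by each criterion; the variable names are illustrative. Since the same test set is used both for this choice and for the final evaluation, the selected score should be read as somewhat optimistic.

```python
# Weighted F1 and test loss per epoch, copied from the training log above (epochs 1 to 10).
f1_by_epoch = [
    0.6656119824269878, 0.7452248875722216, 0.7605075783146086,
    0.7879741347273492, 0.7930194559811127, 0.8003348079020725,
    0.7988818795233631, 0.7979789048190489, 0.8129769602425505,
    0.808076459058371,
]
test_loss_by_epoch = [
    0.9456564017704555, 0.7020841922078814, 0.5989199791635785,
    0.5124728764806475, 0.5085259633404868, 0.4858198698077883,
    0.49274374331746784, 0.49030993240220205, 0.46849004711423603,
    0.4854891278914043,
]

# Epochs are numbered from 1, list indices from 0.
best_epoch_by_f1 = max(range(len(f1_by_epoch)), key=f1_by_epoch.__getitem__) + 1
best_epoch_by_loss = min(range(len(test_loss_by_epoch)), key=test_loss_by_epoch.__getitem__) + 1

print(best_epoch_by_f1, best_epoch_by_loss)
```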



### 9. Loading and Evaluating Model

In [56]:
model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                      num_labels=len(emotion_dict),
                                                      output_attentions=False,
                                                      output_hidden_states=False)

model.to(device)
pass

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

In [60]:
path = "/content/drive/My Drive/data_colab/"
model.load_state_dict(torch.load(f'{path}finetuned_BERT_epoch_9.model', map_location=torch.device('cpu')))

<All keys matched successfully>

In [61]:
_, predictions, true_vals = test(dataloader_test)

In [62]:
accuracy_per_class(predictions, true_vals)

Class: happy
Accuracy: 166/171
Accuracy score: 0.9707602339181286

Class: not-relevant
Accuracy: 21/32
Accuracy score: 0.65625

Class: angry
Accuracy: 1/9
Accuracy score: 0.1111111111111111

Class: disgust
Accuracy: 0/1
Accuracy score: 0.0

Class: sad
Accuracy: 0/5
Accuracy score: 0.0

Class: surprise
Accuracy: 0/5
Accuracy score: 0.0
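
These per-class results can be summarised in two ways that tell very different stories. The sketch below uses the counts printed above for the epoch 9 checkpoint to compute the overall accuracy and the unweighted (macro) average of the per-class accuracies.

```python
# (correct, total) per class, copied from the epoch 9 output above.
per_class = {
    "happy": (166, 171), "not-relevant": (21, 32), "angry": (1, 9),
    "disgust": (0, 1), "sad": (0, 5), "surprise": (0, 5),
}

correct = sum(c for c, _ in per_class.values())
total = sum(t for _, t in per_class.values())
overall_accuracy = correct / total
macro_accuracy = sum(c / t for c, t in per_class.values()) / len(per_class)

print(f"overall: {correct}/{total} = {overall_accuracy:.3f}")
print(f"macro average of per-class accuracy: {macro_accuracy:.3f}")
```

The overall figure is dominated by `happy`, while the macro average shows that the model rarely recognises the minority emotions.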



In [63]:
path = "/content/drive/My Drive/data_colab/"
model.load_state_dict(torch.load(f'{path}finetuned_BERT_epoch_10.model', map_location=torch.device('cpu')))

<All keys matched successfully>

In [64]:
_, predictions, true_vals = test(dataloader_test)

In [65]:
accuracy_per_class(predictions, true_vals)

Class: happy
Accuracy: 166/171
Accuracy score: 0.9707602339181286

Class: not-relevant
Accuracy: 20/32
Accuracy score: 0.625

Class: angry
Accuracy: 1/9
Accuracy score: 0.1111111111111111

Class: disgust
Accuracy: 0/1
Accuracy score: 0.0

Class: sad
Accuracy: 0/5
Accuracy score: 0.0

Class: surprise
Accuracy: 0/5
Accuracy score: 0.0

