# Sentiment Analysis with Deep Learning using BERT

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Task 2: Exploratory Data Analysis and Preprocessing

We will use the SMILE Twitter dataset.

_Wang, Bo; Tsakalidis, Adam; Liakata, Maria; Zubiaga, Arkaitz; Procter, Rob; Jensen, Eric (2016): SMILE Twitter Emotion dataset. figshare. Dataset. https://doi.org/10.6084/m9.figshare.3187909.v2_

In [2]:
import torch
import pandas as pd
from tqdm.notebook import tqdm

In [3]:
df = pd.read_csv('/content/drive/MyDrive/Sentiment-Analyst/smile-annotations-final.csv', names = ['id', 'text', 'category'])
df.set_index('id', inplace = True)

In [4]:
df.head()

id,text,category
611857364396965889,@aandraous @britishmuseum @AndrewsAntonio Merc...,nocode
614484565059596288,Dorian Gray with Rainbow Scarf #LoveWins (from...,happy
614746522043973632,@SelectShowcase @Tate_StIves ... Replace with ...,happy
614877582664835073,@Sofabsports thank you for following me back. ...,happy
611932373039644672,@britishmuseum @TudorHistory What a beautiful ...,happy


In [5]:
df.text.iloc[0]

'@aandraous @britishmuseum @AndrewsAntonio Merci pour le partage! @openwinemap'

In [6]:
df.category.value_counts()

category
nocode               1572
happy                1137
not-relevant          214
angry                  57
surprise               35
sad                    32
happy|surprise         11
happy|sad               9
disgust|angry           7
disgust                 6
sad|disgust             2
sad|angry               2
sad|disgust|angry       1
Name: count, dtype: int64

In [7]:
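# Drop rows annotated with more than one emotion (labels joined by '|').
# The raw string avoids the invalid-escape warning that '\|' triggers.
df = df[~df.category.str.contains(r'\|')]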
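
The filter above keeps only rows whose category holds a single emotion. A minimal sketch of the same idea on an invented toy frame (the texts and labels are made up for illustration), using `regex=False` as an alternative to escaping the pipe, and also dropping `nocode` as the next cell does:

```python
import pandas as pd

# Toy stand-in for the annotations; texts and labels are invented.
toy = pd.DataFrame({
    'text': ['tweet a', 'tweet b', 'tweet c', 'tweet d'],
    'category': ['happy', 'happy|sad', 'nocode', 'angry'],
})

# regex=False treats '|' as a literal character, so no escaping is needed.
is_multi = toy.category.str.contains('|', regex=False)

# Keep single-label rows and drop the 'nocode' placeholder.
single = toy[~is_multi & (toy.category != 'nocode')]

print(single.category.tolist())  # ['happy', 'angry']
```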

In [8]:
df = df[df.category != 'nocode']

In [9]:
df.category.value_counts()

category
happy           1137
not-relevant     214
angry             57
surprise          35
sad               32
disgust            6
Name: count, dtype: int64

In [10]:
possible_labels = df.category.unique()

In [11]:
len(possible_labels)

6

In [12]:
label_dict = {}
for index, label in enumerate(possible_labels):
    label_dict[label] = index

In [13]:
label_dict

{'happy': 0,
 'not-relevant': 1,
 'angry': 2,
 'disgust': 3,
 'sad': 4,
 'surprise': 5}

In [14]:
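# map() does a plain dictionary lookup per row and avoids the downcasting
# warning that replace() raises in recent pandas versions
df['label'] = df.category.map(label_dict)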
df.head()

id,text,category,label
614484565059596288,Dorian Gray with Rainbow Scarf #LoveWins (from...,happy,0
614746522043973632,@SelectShowcase @Tate_StIves ... Replace with ...,happy,0
614877582664835073,@Sofabsports thank you for following me back. ...,happy,0
611932373039644672,@britishmuseum @TudorHistory What a beautiful ...,happy,0
611570404268883969,@NationalGallery @ThePoldarkian I have always ...,happy,0


## Task 3: Training/Validation Split

In [15]:
from sklearn.model_selection import train_test_split

In [16]:
X_train, X_val, y_train, y_val = train_test_split(df.index.values, df.label.values, test_size = 0.15, random_state = 17, stratify = df.label.values)

In [17]:
df['data_type'] = ['not_set'] * df.shape[0]

In [18]:
df.loc[X_train, 'data_type'] = 'train'
df.loc[X_val, 'data_type'] = 'val'

In [19]:
df.groupby(['category', 'label', 'data_type']).count()

category,label,data_type,text
angry,2,train,48
angry,2,val,9
disgust,3,train,5
disgust,3,val,1
happy,0,train,966
happy,0,val,171
not-relevant,1,train,182
not-relevant,1,val,32
sad,4,train,27
sad,4,val,5
surprise,5,train,30
surprise,5,val,5
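
The counts above keep roughly the same class ratio in both splits, which is what `stratify` is for. A small sketch on invented labels (85 of one class, 15 of another; the `_toy` names are hypothetical) shows the effect in isolation:

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Invented, imbalanced toy labels: 85% class 0, 15% class 1.
labels = [0] * 85 + [1] * 15
ids = list(range(len(labels)))

ids_train, ids_val, y_train_toy, y_val_toy = train_test_split(
    ids, labels, test_size=0.2, random_state=17, stratify=labels)

# Both splits keep the 85/15 ratio of the full label list.
print(Counter(y_train_toy))
print(Counter(y_val_toy))
```

Without `stratify`, a rare class such as `disgust` (6 examples in total) could easily end up entirely in one split.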


## Task 4: Loading Tokenizer and Encoding our Data

In [20]:
from transformers import BertTokenizer
from torch.utils.data import TensorDataset

In [21]:
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case = True)


In [22]:
# padding='max_length' replaces the deprecated pad_to_max_length=True;
# truncation=True makes the cut-off at max_length explicit.
encoded_data_train = tokenizer.batch_encode_plus(
    df[df.data_type == 'train'].text.values,
    add_special_tokens = True,
    return_attention_mask = True,
    padding = 'max_length',
    truncation = True,
    max_length = 256,
    return_tensors = 'pt'
)

encoded_data_val = tokenizer.batch_encode_plus(
    df[df.data_type == 'val'].text.values,
    add_special_tokens = True,
    return_attention_mask = True,
    padding = 'max_length',
    truncation = True,
    max_length = 256,
    return_tensors = 'pt'
)

input_ids_train = encoded_data_train['input_ids']
attention_masks_train = encoded_data_train['attention_mask']
labels_train = torch.tensor(df[df.data_type == 'train'].label.values)

input_ids_val = encoded_data_val['input_ids']
attention_masks_val = encoded_data_val['attention_mask']
labels_val = torch.tensor(df[df.data_type == 'val'].label.values)

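
The encoding step pads every tweet to a fixed length and returns an attention mask marking which positions are real tokens. The following is a toy illustration of that idea in plain Python, not the tokenizer's actual implementation: `pad_and_mask` is a hypothetical helper, and it ignores that the real tokenizer keeps the closing special token when it truncates.

```python
def pad_and_mask(token_ids, max_length, pad_id=0):
    """Truncate or right-pad one sequence and build its attention mask."""
    kept = token_ids[:max_length]
    n_pad = max_length - len(kept)
    input_ids = kept + [pad_id] * n_pad
    # 1 marks a real token, 0 marks padding the model should ignore.
    attention_mask = [1] * len(kept) + [0] * n_pad
    return input_ids, attention_mask

# 101 and 102 are the [CLS] and [SEP] ids in the bert-base-uncased vocabulary
# and 0 is [PAD]; the ids in between are arbitrary placeholders.
ids, mask = pad_and_mask([101, 7592, 2088, 102], max_length=6)
print(ids)   # [101, 7592, 2088, 102, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0]
```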

In [23]:
dataset_train = TensorDataset(input_ids_train, attention_masks_train, labels_train)
dataset_val = TensorDataset(input_ids_val, attention_masks_val, labels_val)

In [24]:
len(dataset_train)

1258

In [25]:
len(dataset_val)

223

## Task 5: Setting up BERT Pretrained Model

In [26]:
from transformers import BertForSequenceClassification

In [27]:
model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                      num_labels=len(label_dict),
                                                      output_attentions=False,
                                                      output_hidden_states=False)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


## Task 6: Creating Data Loaders

In [28]:
from torch.utils.data import DataLoader, RandomSampler, SequentialSampler

In [29]:
batch_size = 32

dataloader_train = DataLoader(dataset_train,
                              sampler = RandomSampler(dataset_train),
                              batch_size = batch_size)

dataloader_validation = DataLoader(dataset_val,
                                   sampler = SequentialSampler(dataset_val),
                                   batch_size = batch_size)
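
With 1258 training and 223 validation examples and a batch size of 32, the number of batches per epoch can be checked by hand. A quick sketch of the arithmetic, assuming the default `drop_last=False`:

```python
import math

n_train, n_val, batch_size = 1258, 223, 32

# DataLoader keeps the last, smaller batch by default (drop_last=False),
# so the batch count is the ceiling of the division.
train_batches = math.ceil(n_train / batch_size)
val_batches = math.ceil(n_val / batch_size)
print(train_batches, val_batches)  # 40 7
```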

## Task 7: Setting Up Optimizer and Scheduler

In [30]:
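from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

In [31]:
# The AdamW class in transformers is deprecated and missing from recent releases;
# torch.optim.AdamW is the replacement. weight_decay = 0.0 matches the
# default of the transformers implementation.
optimizer = AdamW(model.parameters(),
                  lr = 1e-5,
                  eps = 1e-8,
                  weight_decay = 0.0)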



In [41]:
epochs = 30

scheduler = get_linear_schedule_with_warmup(optimizer,
                                            num_warmup_steps = 0,
                                            num_training_steps = len(dataloader_train) * epochs)
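
With `num_warmup_steps = 0`, the scheduler decays the learning rate linearly from its base value to zero over all training steps. A rough sketch of the multiplier it applies is below; `linear_schedule_factor` is a hypothetical helper written for illustration, and the 40 steps per epoch come from 1258 training examples at batch size 32.

```python
def linear_schedule_factor(step, num_warmup_steps, num_training_steps):
    """Multiplier applied to the base learning rate at a given step."""
    if step < num_warmup_steps:
        # Linear ramp from 0 to 1 during warmup.
        return step / max(1, num_warmup_steps)
    # Linear decay from 1 to 0 over the remaining steps.
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))

total_steps = 40 * 30  # batches per epoch * epochs
base_lr = 1e-5
for step in (0, 600, 1200):
    print(step, base_lr * linear_schedule_factor(step, 0, total_steps))
```

Halfway through training the learning rate is half the base value, and it reaches zero at the final step.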

## Task 8: Defining our Performance Metrics

The accuracy metric follows the approach of the accuracy function in [this tutorial](https://mccormickml.com/2019/07/22/BERT-fine-tuning/#41-bertforsequenceclassification).

In [33]:
import numpy as np

In [34]:
from sklearn.metrics import f1_score

In [35]:
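def f1_score_func(preds, labels):
    # preds: (n_samples, n_classes) logits; labels: (n_samples,) class ids
    preds_flat = np.argmax(preds, axis = 1).flatten()
    labels_flat = labels.flatten()
    # 'weighted' averages the per-class F1 scores by class support
    return f1_score(labels_flat, preds_flat, average = 'weighted')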
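
To make the `average = 'weighted'` setting concrete, here is a hand-rolled sketch of the same quantity in NumPy: the F1 score of each class is averaged with weights proportional to the number of true examples of that class. `weighted_f1` is a hypothetical helper for illustration and the labels are invented; it should agree with scikit-learn on inputs like these.

```python
import numpy as np

def weighted_f1(labels, preds):
    labels = np.asarray(labels)
    preds = np.asarray(preds)
    total = 0.0
    for c in np.unique(labels):
        tp = np.sum((preds == c) & (labels == c))
        fp = np.sum((preds == c) & (labels != c))
        fn = np.sum((preds != c) & (labels == c))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # Weight each class by its support (number of true examples).
        total += f1 * np.sum(labels == c)
    return total / len(labels)

score = weighted_f1([0, 0, 0, 1], [0, 0, 1, 1])
print(round(score, 4))  # 0.7667
```

Because the weights follow class support, a dominant class such as `happy` drives most of the score on this dataset.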

In [36]:
def accuracy_per_class(preds, labels):
    label_dict_inverse = {v: k for k, v in label_dict.items()}

    preds_flat = np.argmax(preds, axis = 1).flatten()
    labels_flat = labels.flatten()

    # "Accuracy" here is the share of each true class predicted correctly (per-class recall)
    for label in np.unique(labels_flat):
        y_preds = preds_flat[labels_flat == label]
        y_true = labels_flat[labels_flat == label]

        print(f'Class: {label_dict_inverse[label]}')
        print(f'Accuracy: {len(y_preds[y_preds == label])}/{len(y_true)}\n')

## Task 9: Creating our Training Loop

The training loop is adapted from an older version of Hugging Face's `run_glue.py` script, available [here](https://github.com/huggingface/transformers/blob/5bfcd0485ece086ebcbed2d008813037968a9e58/examples/run_glue.py#L128).

In [37]:
import random

seed_val = 17
random.seed(seed_val)
np.random.seed(seed_val)
torch.manual_seed(seed_val)
torch.cuda.manual_seed_all(seed_val)

In [54]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

print(device)

cuda


In [55]:
def evaluate(dataloader_val):

    model.eval()

    loss_val_total = 0
    predictions, true_vals = [], []

    for batch in dataloader_val:

        batch = tuple(b.to(device) for b in batch)

        inputs = {'input_ids':      batch[0],
                  'attention_mask': batch[1],
                  'labels':         batch[2],
                 }

        with torch.no_grad():
            outputs = model(**inputs)

        loss = outputs[0]
        logits = outputs[1]
        loss_val_total += loss.item()

        logits = logits.detach().cpu().numpy()
        label_ids = inputs['labels'].cpu().numpy()
        predictions.append(logits)
        true_vals.append(label_ids)

    loss_val_avg = loss_val_total/len(dataloader_val)

    predictions = np.concatenate(predictions, axis=0)
    true_vals = np.concatenate(true_vals, axis=0)

    return loss_val_avg, predictions, true_vals


In [42]:
for epoch in tqdm(range(1, epochs+1)):

    model.train()
    loss_train_total = 0

    progress_bar = tqdm(dataloader_train, desc='Epoch {:1d}'.format(epoch), leave=False, disable=False)

    for batch in progress_bar:

        model.zero_grad()
        batch = tuple(b.to(device) for b in batch)
        inputs = {'input_ids':      batch[0],
                  'attention_mask': batch[1],
                  'labels':         batch[2],
                 }

        outputs = model(**inputs)
        loss = outputs[0]
        loss_train_total += loss.item()
        loss.backward()

        # Clip the global gradient norm to 1.0 to guard against exploding gradients
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

        optimizer.step()
        scheduler.step()
        # The loss is already averaged over the batch; len(batch) is 3
        # (ids, masks, labels), not the batch size, so it is not divided here
        progress_bar.set_postfix({'training_loss': '{:.3f}'.format(loss.item())})

    # Save a checkpoint at the end of every epoch
    torch.save(model.state_dict(), f'finetuned_BERT_epoch_{epoch}.model')

    tqdm.write(f'\nEpoch {epoch}')

    loss_train_avg = loss_train_total/len(dataloader_train)
    tqdm.write(f'Training loss: {loss_train_avg}')

    val_loss, predictions, true_vals = evaluate(dataloader_validation)
    val_f1 = f1_score_func(predictions, true_vals)
    tqdm.write(f'Validation loss: {val_loss}')
    tqdm.write(f'F1 Score (Weighted): {val_f1}')
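
The call to `clip_grad_norm_` in the loop rescales all gradients together whenever their combined L2 norm exceeds 1.0. A NumPy sketch of that rule follows; `clip_by_global_norm` is a hypothetical helper, the gradient values are invented, and the small `eps` term is an assumption about how the division is stabilised.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=1.0, eps=1e-6):
    # Norm over all gradient arrays taken together, not per array.
    total_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    # Scale down only when the norm exceeds max_norm; never scale up.
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm

grads = [np.array([3.0, 0.0]), np.array([4.0])]
clipped, norm_before = clip_by_global_norm(grads)
norm_after = np.sqrt(sum(float(np.sum(g ** 2)) for g in clipped))
print(norm_before, round(norm_after, 4))  # 5.0 1.0
```

The direction of the update is preserved; only its length is capped.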


Epoch 1
Training loss: 0.29696322586387397
Validation loss: 0.6326609253883362
F1 Score (Weighted): 0.7858849957352656


Epoch 2
Training loss: 0.2633219677954912
Validation loss: 0.6511100062302181
F1 Score (Weighted): 0.7785173794006669


Epoch 3
Training loss: 0.23405554555356503
Validation loss: 0.6246594701494489
F1 Score (Weighted): 0.7887115001187657


Epoch 4
Training loss: 0.20960419476032258
Validation loss: 0.6325259251253945
F1 Score (Weighted): 0.8218466137748649


Epoch 5
Training loss: 0.1773056940641254
Validation loss: 0.6190956788403648
F1 Score (Weighted): 0.8433262991295174


Epoch 6
Training loss: 0.1412730866111815
Validation loss: 0.581586514200483
F1 Score (Weighted): 0.8520107370987863


Epoch 7
Training loss: 0.10652122220490128
Validation loss: 0.590651701603617
F1 Score (Weighted): 0.8415064759033736


Epoch 8
Training loss: 0.0761635162634775
Validation loss: 0.5685746158872332
F1 Score (Weighted): 0.853128053460266


Epoch 9
Training loss: 0.055633841059170665
Validation loss: 0.6097104251384735
F1 Score (Weighted): 0.8457040816058667


Epoch 10
Training loss: 0.038746275240555406
Validation loss: 0.6209697265710149
F1 Score (Weighted): 0.8513791869224728


Epoch 11
Training loss: 0.02833210953976959
Validation loss: 0.617911017366818
F1 Score (Weighted): 0.854945550802738


Epoch 12
Training loss: 0.022095848922617733
Validation loss: 0.6624106775437083
F1 Score (Weighted): 0.8458462082316635


Epoch 13
Training loss: 0.016960937005933374
Validation loss: 0.6618872976728848
F1 Score (Weighted): 0.8477448704988301


Epoch 14
Training loss: 0.014725930686108769
Validation loss: 0.6380120973501887
F1 Score (Weighted): 0.8554747875457435


Epoch 15
Training loss: 0.012041952763684093
Validation loss: 0.6554695538112095
F1 Score (Weighted): 0.8605009415099101


Epoch 16
Training loss: 0.010624463163549081
Validation loss: 0.675890034862927
F1 Score (Weighted): 0.8458851652270276


Epoch 17
Training loss: 0.009859781572595238
Validation loss: 0.6796422877482006
F1 Score (Weighted): 0.8517524429085724


Epoch 18
Training loss: 0.00872826508129947
Validation loss: 0.6856160249028888
F1 Score (Weighted): 0.8482085428273769


Epoch 19
Training loss: 0.007974665379151702
Validation loss: 0.682513667004449
F1 Score (Weighted): 0.8450161503702558


Epoch 20
Training loss: 0.007845946052111686
Validation loss: 0.6874155168022428
F1 Score (Weighted): 0.8525365277219805


Epoch 21
Training loss: 0.006655804000911303
Validation loss: 0.7042980790138245
F1 Score (Weighted): 0.8508538755460547


Epoch 22
Training loss: 0.006045964115764945
Validation loss: 0.7153518157345908
F1 Score (Weighted): 0.8497211682189262


Epoch 23
Training loss: 0.005988801931380294
Validation loss: 0.7187087046248573
F1 Score (Weighted): 0.8525365277219805


Epoch 24
Training loss: 0.005545883954619057
Validation loss: 0.7216807987008776
F1 Score (Weighted): 0.8458718665000368


Epoch 25
Training loss: 0.005797095719026401
Validation loss: 0.7150924674102238
F1 Score (Weighted): 0.8492506581356623


Epoch 26
Training loss: 0.005633360520005226
Validation loss: 0.7171597863946643
F1 Score (Weighted): 0.8492506581356623


Epoch 27
Training loss: 0.0051873236574465405
Validation loss: 0.7155688468899045
F1 Score (Weighted): 0.8486357562285582


Epoch 28
Training loss: 0.005089835976832546
Validation loss: 0.717156823192324
F1 Score (Weighted): 0.8486357562285582


Epoch 29
Training loss: 0.005331626735278405
Validation loss: 0.7178554322038379
F1 Score (Weighted): 0.8469540698464465


Epoch 30
Training loss: 0.005224104144144803
Validation loss: 0.7193094534533364
F1 Score (Weighted): 0.8486357562285582
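
In the log above, training loss keeps falling through epoch 30 while validation loss is lowest at epoch 8 (about 0.569) and the weighted F1 peaks at epoch 15 (about 0.8605), a pattern that suggests overfitting in the later epochs. Since a checkpoint is saved after every epoch, one option is to pick the checkpoint with the best validation F1 rather than the last one. A small sketch of that selection, using a handful of values from the log rounded to four decimals:

```python
# Weighted validation F1 for a few epochs, copied from the training log
# and rounded to four decimals.
val_f1_by_epoch = {8: 0.8531, 11: 0.8549, 14: 0.8555, 15: 0.8605, 30: 0.8486}

# Pick the epoch with the highest validation F1.
best_epoch = max(val_f1_by_epoch, key=val_f1_by_epoch.get)

# File name follows the pattern used by torch.save in the training loop.
best_checkpoint = f'finetuned_BERT_epoch_{best_epoch}.model'
print(best_epoch, best_checkpoint)  # 15 finetuned_BERT_epoch_15.model
```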


## Task 10: Loading and Evaluating our Model

In [43]:
model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                      num_labels=len(label_dict),
                                                      output_attentions=False,
                                                      output_hidden_states=False)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [53]:
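# Move the freshly created model to the active device and load the weights onto that same device
model.to(device)
model.load_state_dict(torch.load('/content/finetuned_BERT_epoch_30.model', map_location = device))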

<All keys matched successfully>

In [56]:
_, predictions, true_vals = evaluate(dataloader_validation)

In [57]:
accuracy_per_class(predictions, true_vals)

Class: happy
Accuracy: 161/171

Class: not-relevant
Accuracy: 19/32

Class: angry
Accuracy: 7/9

Class: disgust
Accuracy: 0/1

Class: sad
Accuracy: 1/5

Class: surprise
Accuracy: 2/5
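
The per-class counts above can be combined into two summary numbers: overall accuracy, which the large `happy` class dominates, and the unweighted mean of the per-class rates, which treats every class equally. A short sketch using the counts printed above:

```python
# (correct, total) per class, copied from the output above.
per_class = {
    'happy': (161, 171), 'not-relevant': (19, 32), 'angry': (7, 9),
    'disgust': (0, 1), 'sad': (1, 5), 'surprise': (2, 5),
}

correct = sum(c for c, _ in per_class.values())
total = sum(n for _, n in per_class.values())
overall_accuracy = correct / total

# Unweighted mean of per-class rates (macro-averaged recall).
macro_recall = sum(c / n for c, n in per_class.values()) / len(per_class)

print(f'{overall_accuracy:.3f} {macro_recall:.3f}')  # 0.852 0.486
```

The gap between the two numbers reflects the class imbalance: the model does well on `happy` but poorly on the rare classes.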

