# **Sentiment Analysis using BERT**

## What is BERT?

BERT (Bidirectional Encoder Representations from Transformers) is a deep learning model designed for natural language processing (NLP). It is built on a neural network, specifically the Transformer encoder.
A neural network is an artificial network of neurons, used in machine learning and artificial intelligence, that tries to mimic how the human brain works.
BERT is pre-trained on large amounts of text, so the model learns general language representations that can then be fine-tuned for a task such as sentiment analysis.

**1. Exploratory Data Analysis and Preprocessing**

In [3]:
! pip install torch torchvision



In [4]:
! pip install tqdm



In [5]:
import torch
import pandas as pd
from tqdm.notebook import tqdm

In [6]:
from google.colab import files
uploaded = files.upload()

Saving smileannotationsfinal.csv to smileannotationsfinal.csv


In [7]:
df = pd.read_csv('smileannotationsfinal.csv')
df.head(10)

Unnamed: 0,id,text,category
0,611857364396965000,@aandraous @britishmuseum @AndrewsAntonio Merc...,nocode
1,614484565059596000,Dorian Gray with Rainbow Scarf #LoveWins (from...,happy
2,614746522043973000,@SelectShowcase @Tate_StIves ... Replace with ...,happy
3,614877582664835000,@Sofabsports thank you for following me back. ...,happy
4,611932373039644000,@britishmuseum @TudorHistory What a beautiful ...,happy
5,611570404268883000,@NationalGallery @ThePoldarkian I have always ...,happy
6,614456889863208000,"@britishmuseum say wot, mate?",nocode
7,614016385442807000,Two workshops on evaluating audience engagemen...,nocode
8,610916556751642000,"A Forest Road, by Thomas Gainsborough 1750 Oil...",nocode
9,614499696015503000,Lucky @FitzMuseum_UK! Good luck @MirandaStearn...,happy


In [8]:
df.category.value_counts() 

nocode               1572
happy                1137
not-relevant          214
angry                  57
surprise               35
sad                    32
happy|surprise         11
happy|sad               9
disgust|angry           7
disgust                 6
sad|disgust             2
sad|angry               2
sad|disgust|angry       1
Name: category, dtype: int64

In [9]:
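# Drop tweets annotated with more than one emotion (labels joined by '|');
# regex=False matches the literal '|' without an escape sequence
df = df[~df.category.str.contains('|', regex=False)]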

In [10]:
df = df[df.category != 'nocode']

In [11]:
df.category.value_counts()

happy           1137
not-relevant     214
angry             57
surprise          35
sad               32
disgust            6
Name: category, dtype: int64

In [12]:
possible_labels = df.category.unique()

In [13]:
label_dict = {}
for index, possible_label in enumerate(possible_labels):
    label_dict[possible_label] = index

In [14]:
label_dict

{'angry': 2,
 'disgust': 3,
 'happy': 0,
 'not-relevant': 1,
 'sad': 4,
 'surprise': 5}
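`df.category.unique()` returns labels in order of first appearance, so the integer ids above depend on the row order of the CSV. A minimal sketch of a deterministic alternative, assuming a sorted mapping is acceptable; `build_label_dict` is a hypothetical helper and the category list is toy data, not the real dataset.

```python
def build_label_dict(categories):
    """Map each distinct category to an integer id, in sorted order."""
    return {label: index for index, label in enumerate(sorted(set(categories)))}


# Toy categories standing in for df.category (assumption, not the real data).
categories = ["happy", "not-relevant", "happy", "angry", "sad", "disgust", "surprise"]
label_dict_sorted = build_label_dict(categories)

# The inverse mapping turns predicted ids back into readable names.
inverse = {index: label for label, index in label_dict_sorted.items()}
print(label_dict_sorted)
```

With a sorted mapping, the ids stay the same even if the rows are shuffled or filtered differently, which matters when a saved model is reloaded in a later session.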

In [15]:
df['label'] = df.category.replace(label_dict)

In [16]:
df.head(10)

Unnamed: 0,id,text,category,label
1,614484565059596000,Dorian Gray with Rainbow Scarf #LoveWins (from...,happy,0
2,614746522043973000,@SelectShowcase @Tate_StIves ... Replace with ...,happy,0
3,614877582664835000,@Sofabsports thank you for following me back. ...,happy,0
4,611932373039644000,@britishmuseum @TudorHistory What a beautiful ...,happy,0
5,611570404268883000,@NationalGallery @ThePoldarkian I have always ...,happy,0
9,614499696015503000,Lucky @FitzMuseum_UK! Good luck @MirandaStearn...,happy,0
12,613601881441570000,Yr 9 art students are off to the @britishmuseu...,happy,0
15,613696526297210000,@RAMMuseum Please vote for us as @sainsbury #s...,not-relevant,1
16,610746718641102000,#AskTheGallery Have you got plans to privatise...,not-relevant,1
18,612648200588038000,@BarbyWT @britishmuseum so beautiful,happy,0


**2. Training/Validation Split**

In [17]:
from sklearn.model_selection import train_test_split

In [18]:
x_train, x_val, y_train, y_val =  train_test_split(df.index.values,
                                                   df.label.values,
                                                   test_size=0.15,
                                                   random_state=17,
                                                   stratify=df.label.values
)

In [19]:
df['data_type'] = 'not_set'

In [20]:
df.head()

Unnamed: 0,id,text,category,label,data_type
1,614484565059596000,Dorian Gray with Rainbow Scarf #LoveWins (from...,happy,0,not_set
2,614746522043973000,@SelectShowcase @Tate_StIves ... Replace with ...,happy,0,not_set
3,614877582664835000,@Sofabsports thank you for following me back. ...,happy,0,not_set
4,611932373039644000,@britishmuseum @TudorHistory What a beautiful ...,happy,0,not_set
5,611570404268883000,@NationalGallery @ThePoldarkian I have always ...,happy,0,not_set


In [21]:
df.loc[x_train, 'data_type'] = 'train'
df.loc[x_val, 'data_type'] = 'val'

In [22]:
df.groupby(['category', 'label', 'data_type']).count()

category,label,data_type,id,text
angry,2,train,48,48
angry,2,val,9,9
disgust,3,train,5,5
disgust,3,val,1,1
happy,0,train,966,966
happy,0,val,171,171
not-relevant,1,train,182,182
not-relevant,1,val,32,32
sad,4,train,27,27
sad,4,val,5,5
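The per-class counts follow from stratification: each class contributes roughly 15% of its rows to validation. A minimal sketch of that idea in plain Python, assuming simple per-class rounding. This is not scikit-learn's implementation, and `stratified_split` is a hypothetical helper; the class sizes are copied from the `value_counts()` output after filtering.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_size=0.15, seed=17):
    """Return (train_idx, val_idx) with about test_size of each class in val."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Keep at least one validation example for every class.
        n_val = max(1, round(len(indices) * test_size))
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Class sizes taken from the filtered value_counts output.
sizes = {"happy": 1137, "not-relevant": 214, "angry": 57,
         "surprise": 35, "sad": 32, "disgust": 6}
labels = [name for name, n in sizes.items() for _ in range(n)]

train_idx, val_idx = stratified_split(labels)
val_counts = Counter(labels[i] for i in val_idx)
print(len(train_idx), len(val_idx), dict(val_counts))
```

Without stratification, a rare class such as `disgust` (6 rows) could easily end up with no validation examples at all.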


**3. Loading Tokenizer and Encoding our Data**

In [23]:
! pip install transformers

Collecting transformers
  Downloading transformers-4.12.5-py3-none-any.whl (3.1 MB)
Collecting pyyaml>=5.1
  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)
Collecting sacremoses
  Downloading sacremoses-0.0.46-py3-none-any.whl (895 kB)
Collecting tokenizers<0.11,>=0.10.1
  Downloading tokenizers-0.10.3-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (3.3 MB)
Collecting huggingface-hub<1.0,>=0.1.0
  Downloading huggingface_hub-0.2.1-py3-none-any.whl (61 kB)
Installing collected packages: pyyaml, tokenizers, sacremoses, huggingface-hub, transformers

In [24]:
from transformers import BertTokenizer
from torch.utils.data import TensorDataset

**Tokenizer**

In [25]:
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', 
                                          do_lower_case=True)


**Encoding**

In [26]:
# Encode the training data; padding and truncation are requested explicitly
# so that every sequence ends up exactly max_length tokens long
encoded_data_train = tokenizer.batch_encode_plus(
    df[df.data_type=='train'].text.tolist(), 
    add_special_tokens=True, 
    return_attention_mask=True, 
    padding='max_length', 
    truncation=True, 
    max_length=256, 
    return_tensors='pt'
)

# Encode the validation data with the same settings
encoded_data_val = tokenizer.batch_encode_plus(
    df[df.data_type=='val'].text.tolist(), 
    add_special_tokens=True, 
    return_attention_mask=True, 
    padding='max_length', 
    truncation=True, 
    max_length=256, 
    return_tensors='pt'
)

# Split the encodings into the tensors BERT needs for training
input_ids_train = encoded_data_train['input_ids']
attention_masks_train = encoded_data_train['attention_mask']
labels_train = torch.tensor(df[df.data_type=='train'].label.values)

input_ids_val = encoded_data_val['input_ids']
attention_masks_val = encoded_data_val['attention_mask']
labels_val = torch.tensor(df[df.data_type=='val'].label.values)
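To make the roles of `input_ids` and `attention_mask` concrete, here is a toy sketch of fixed-length padding and truncation in plain Python. It is an illustration under assumptions, not the tokenizer's implementation: `pad_and_mask` is a hypothetical helper, the two middle token ids are made up, and the real tokenizer keeps the closing `[SEP]` token when it truncates, which this version does not.

```python
def pad_and_mask(token_ids, max_length, pad_id=0):
    """Truncate or right-pad a list of token ids and build its attention mask."""
    ids = token_ids[:max_length]          # truncation
    mask = [1] * len(ids)                 # 1 marks a real token
    padding = max_length - len(ids)
    # 0 in the mask tells the model to ignore the padded positions
    return ids + [pad_id] * padding, mask + [0] * padding


# 101 and 102 are [CLS] and [SEP] in the bert-base-uncased vocabulary;
# the ids in between are placeholders for word pieces.
ids, mask = pad_and_mask([101, 2061, 3376, 102], max_length=8)
print(ids)
print(mask)
```

Every row has the same length after this step, which is what allows the encodings to be stacked into a single tensor.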


**Converting the inputs into features that BERT understands**

In [27]:
# Create separate datasets for training and validation
dataset_train = TensorDataset(input_ids_train, attention_masks_train, labels_train)
dataset_val = TensorDataset(input_ids_val, attention_masks_val, labels_val)

In [28]:
len(dataset_train)

1258

In [29]:
len(dataset_val)

223

**4. Setting up BERT Pretrained Model**

In [30]:
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                      num_labels=len(label_dict),
                                                      output_attentions=False,
                                                      output_hidden_states=False)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

**5. Creating Data Loaders**

In [31]:
from torch.utils.data import DataLoader, RandomSampler, SequentialSampler

In [32]:
batch_size = 32

# We need two data loaders: shuffled batches for training,
# and a fixed order for validation so evaluation is repeatable
dataloader_train = DataLoader(dataset_train, 
                              sampler=RandomSampler(dataset_train),
                              batch_size=batch_size)

dataloader_validation = DataLoader(dataset_val, 
                              sampler=SequentialSampler(dataset_val),
                              batch_size=batch_size)

**6. Setting Up Optimiser and Scheduler**

In [33]:
from transformers import AdamW, get_linear_schedule_with_warmup

In [34]:
optimizer = AdamW(model.parameters(),
                  lr=1e-5, 
                  eps=1e-8)

In [35]:
epochs = 10

scheduler = get_linear_schedule_with_warmup(optimizer, 
                                            num_warmup_steps=0,
                                            num_training_steps=len(dataloader_train)*epochs)
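The scheduler scales the base learning rate up linearly during warmup and then down linearly to zero by the last training step. A minimal sketch of that shape in plain Python, written from the documented behaviour rather than copied from the library; `linear_schedule_factor` is a hypothetical helper and the step count is an assumed example.

```python
def linear_schedule_factor(step, num_warmup_steps, num_training_steps):
    """Multiplier applied to the base learning rate at a given optimizer step."""
    if step < num_warmup_steps:
        # Linear ramp from 0 to 1 during warmup
        return step / max(1, num_warmup_steps)
    remaining = num_training_steps - step
    # Linear decay from 1 to 0 over the rest of training
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


base_lr = 1e-5
total_steps = 400   # assumed example: 40 batches per epoch for 10 epochs
for step in (0, 100, 200, 400):
    print(step, base_lr * linear_schedule_factor(step, 0, total_steps))
```

With `num_warmup_steps=0`, as in this notebook, training starts at the full learning rate and only the decay phase applies.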

**7. Defining our Performance Metrics**

In [36]:
import numpy as np

In [37]:
from sklearn.metrics import f1_score

In [38]:
def f1_score_func(preds, labels):

    # preds holds one row of logits per example; argmax over axis=1
    # gives the predicted class id for each example
    preds_flat = np.argmax(preds, axis=1).flatten()

    # Flatten the true labels to the same 1-D shape
    labels_flat = labels.flatten()

    # Return the weighted F1 score as defined by sklearn
    return f1_score(labels_flat, preds_flat, average='weighted')
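With `average='weighted'`, the per-class F1 scores are averaged using each class's number of true examples as the weight. A minimal plain-Python sketch of that calculation, assuming a class with no correct predictions scores 0; `weighted_f1` is a hypothetical helper for illustration and the label lists are made up.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        score += f1 * n_true / total
    return score


y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0]   # a model that always predicts the majority class
print(round(weighted_f1(y_true, y_pred), 4))
```

The example shows why this metric needs care on imbalanced data: a model that only ever predicts the majority class still gets a moderate weighted F1, because the large class dominates the average.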

In [39]:
def accuracy_per_class(preds, labels):
    label_dict_inverse = {v: k for k, v in label_dict.items()}
    
    preds_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()

    # Iterate over the unique true labels in labels_flat
    for label in np.unique(labels_flat):
        # Select the predictions for the examples whose true label is `label`,
        # e.g. for 'happy', all predictions made on tweets that are truly happy
        y_preds = preds_flat[labels_flat==label]
        y_true = labels_flat[labels_flat==label]
        print(f'Class: {label_dict_inverse[label]}')
        print(f'Accuracy: {len(y_preds[y_preds==label])}/{len(y_true)}\n')
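Because each count is restricted to examples of one true class, the figure printed as "Accuracy" is the per-class recall. A small sketch that returns the counts instead of printing them, which makes them easier to compare across runs; `per_class_recall` is a hypothetical helper and the inputs are made-up class ids.

```python
def per_class_recall(pred_labels, true_labels):
    """Return {label: (correct, total)} over the examples whose true class is label."""
    counts = {}
    for pred, true in zip(pred_labels, true_labels):
        correct, total = counts.get(true, (0, 0))
        counts[true] = (correct + (pred == true), total + 1)
    return counts


# Made-up predicted and true class ids for five examples
result = per_class_recall([0, 0, 1, 0, 2], [0, 0, 1, 1, 2])
for label, (correct, total) in sorted(result.items()):
    print(f'Class {label}: {correct}/{total}')
```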

**8. Create a training loop to control PyTorch finetuning of BERT using CPU or GPU acceleration**

In [40]:
import random

seed_val = 17
random.seed(seed_val)
np.random.seed(seed_val)
torch.manual_seed(seed_val)
torch.cuda.manual_seed_all(seed_val)

In [41]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

print(device)

cuda


In [42]:
def evaluate(dataloader_val):

    model.eval()
    
    loss_val_total = 0
    predictions, true_vals = [], []
    
    for batch in tqdm(dataloader_val):
        
        batch = tuple(b.to(device) for b in batch)
        
        inputs = {'input_ids':      batch[0],
                  'attention_mask': batch[1],
                  'labels':         batch[2],
                 }

        with torch.no_grad():        
            outputs = model(**inputs)
            
        loss = outputs[0]
        logits = outputs[1]
        loss_val_total += loss.item()

        logits = logits.detach().cpu().numpy()
        label_ids = inputs['labels'].cpu().numpy()
        predictions.append(logits)
        true_vals.append(label_ids)
    
    loss_val_avg = loss_val_total/len(dataloader_val) 
    
    predictions = np.concatenate(predictions, axis=0)
    true_vals = np.concatenate(true_vals, axis=0)
            
    return loss_val_avg, predictions, true_vals
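`evaluate` returns raw logits, one row per example. A minimal sketch of turning one such row into a probability distribution and a readable label, assuming the `label_dict` mapping printed earlier in the notebook; the logit values are made up.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Mapping copied from the label_dict output earlier in the notebook.
label_dict = {'happy': 0, 'not-relevant': 1, 'angry': 2,
              'disgust': 3, 'sad': 4, 'surprise': 5}
id_to_label = {v: k for k, v in label_dict.items()}

logits = [2.0, 0.5, -1.0, -2.0, -1.5, -0.5]   # made-up logits for one tweet
probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(id_to_label[best], round(probs[best], 3))
```

Taking the argmax of the logits gives the same class as taking the argmax of the probabilities, so the softmax is only needed when a confidence value is wanted.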

In [45]:
for epoch in tqdm(range(1, epochs+1)):
    
    model.train()          
    
    loss_train_total = 0   

    # Progress bar to monitor training within the epoch
    progress_bar = tqdm(dataloader_train, desc='Epoch {:1d}'.format(epoch), leave=False, disable=False)
    for batch in progress_bar:

        model.zero_grad() # reset the gradients left over from the previous batch
        
        # Each batch is a tuple of 3 tensors: input ids, attention masks, labels
        batch = tuple(b.to(device) for b in batch)
        
        # INPUTS
        # Pulling out the inputs in the form of dictionary
        inputs = {'input_ids':      batch[0],
                  'attention_mask': batch[1],
                  'labels':         batch[2],
                 }       

        # OUTPUTS
        outputs = model(**inputs) # '**' unpacks the dictionary into keyword arguments
        
        loss = outputs[0]
        loss_train_total += loss.item()
        loss.backward()           # backpropagation

        # Gradient clipping: rescale the gradients so their total norm is at most 1.0
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

        optimizer.step()
        scheduler.step()
        
        # The loss is already averaged over the batch, so report it directly
        # (len(batch) is 3, the number of tensors in the tuple, not the batch size)
        progress_bar.set_postfix({'training_loss': '{:.3f}'.format(loss.item())})
         
        
    torch.save(model.state_dict(), f'finetuned_BERT_epoch_{epoch}.model')
        
    tqdm.write(f'\nEpoch {epoch}')
    
    loss_train_avg = loss_train_total/len(dataloader_train)            
    tqdm.write(f'Training loss: {loss_train_avg}')
    
    val_loss, predictions, true_vals = evaluate(dataloader_validation)
    val_f1 = f1_score_func(predictions, true_vals)
    tqdm.write(f'Validation loss: {val_loss}')
    tqdm.write(f'F1 Score (Weighted): {val_f1}')

Epoch 1
Training loss: 1.184937820241258
Validation loss: 0.8800227208571001
F1 Score (Weighted): 0.6583524464831804

Epoch 2
Training loss: 0.7992182494820775
Validation loss: 0.7030731060288169
F1 Score (Weighted): 0.6583524464831804

Epoch 3
Training loss: 0.6369195056928171
Validation loss: 0.6269495541399176
F1 Score (Weighted): 0.729676232764156

Epoch 4
Training loss: 0.5107632214958603
Validation loss: 0.5763687572695992
F1 Score (Weighted): 0.7670198919156909

Epoch 5
Training loss: 0.4409492164850235
Validation loss: 0.5318177884275263
F1 Score (Weighted): 0.8066912282149007

Epoch 6
Training loss: 0.38097117982200673
Validation loss: 0.5054571032524109
F1 Score (Weighted): 0.8004495144045174

Epoch 7
Training loss: 0.328777547220926
Validation loss: 0.5139308165420186
F1 Score (Weighted): 0.8017811294659648

Epoch 8
Training loss: 0.3082683185065115
Validation loss: 0.5894641822034662
F1 Score (Weighted): 0.8013017738784203

Epoch 9
Training loss: 0.28850736630124013
Validation loss: 0.5135540975765749
F1 Score (Weighted): 0.7988953364467442

Epoch 10
Training loss: 0.2673632361598917
Validation loss: 0.5247098817066713
F1 Score (Weighted): 0.8040043057369044
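The training loop calls `torch.nn.utils.clip_grad_norm_` before every optimizer step. A minimal sketch of what clipping by global norm does, in plain Python with lists standing in for gradient tensors; `clip_by_global_norm` is a hypothetical helper and the gradient values are made up.

```python
import math


def clip_by_global_norm(grads, max_norm=1.0, eps=1e-6):
    """Scale all gradients together so their combined L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    # Gradients are only ever scaled down, never up
    scale = min(1.0, max_norm / (total_norm + eps))
    return [[g * scale for g in vec] for vec in grads], total_norm


grads = [[3.0, 0.0], [0.0, 4.0]]   # made-up gradients for two parameters
clipped, norm_before = clip_by_global_norm(grads)
norm_after = math.sqrt(sum(g * g for vec in clipped for g in vec))
print(norm_before, norm_after)
```

All gradients share one scale factor, so the direction of the update is preserved and only its size is limited.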


**9. Loading finetuned BERT model and evaluate its performance**

In [46]:
model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                      num_labels=len(label_dict),
                                                      output_attentions=False,
                                                      output_hidden_states=False)

model.to(device)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, element

In [47]:
model.load_state_dict(torch.load('/content/finetuned_BERT_epoch_10.model', map_location=torch.device('cpu')))

<All keys matched successfully>

In [48]:
_, predictions, true_vals = evaluate(dataloader_validation)


In [49]:
accuracy_per_class(predictions, true_vals)

Class: happy
Accuracy: 240/249

Class: not-relevant
Accuracy: 22/45

Class: angry
Accuracy: 8/16

Class: disgust
Accuracy: 0/1

Class: sad
Accuracy: 0/8

Class: surprise
Accuracy: 1/8
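The notebook evaluates the epoch-10 checkpoint, but the training log shows validation loss and weighted F1 peaking at earlier epochs. A minimal sketch of picking a checkpoint from the logged metrics, assuming the values from the training log rounded to four decimals; which metric to select on is a judgment call, and nothing here is re-run.

```python
# (validation loss, weighted F1) per epoch, rounded from the training log.
history = {
    1: (0.8800, 0.6584),
    2: (0.7031, 0.6584),
    3: (0.6269, 0.7297),
    4: (0.5764, 0.7670),
    5: (0.5318, 0.8067),
    6: (0.5055, 0.8004),
    7: (0.5139, 0.8018),
    8: (0.5895, 0.8013),
    9: (0.5136, 0.7989),
    10: (0.5247, 0.8040),
}

best_by_loss = min(history, key=lambda epoch: history[epoch][0])
best_by_f1 = max(history, key=lambda epoch: history[epoch][1])

# File name pattern used by torch.save in the training loop
checkpoint = f'finetuned_BERT_epoch_{best_by_f1}.model'
print(best_by_loss, best_by_f1, checkpoint)
```

Since a checkpoint is saved after every epoch, any of them can be loaded with `model.load_state_dict` in place of the epoch-10 file.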

