<a href="https://colab.research.google.com/github/ITU-Business-Analytics-Team/Business_Analytics_for_Professionals/blob/main/Part%20I%20%3A%20Methods%20%26%20Technologies%20for%20Business%20Analytics/Chapter%207%3A%20Text%20Analytics/7_6_3_Deep_Learning_Based_Sentiment_Analysis_Bert.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Sentiment Analysis (Opinion Mining)**
## Deep Learning Based Sentiment Analysis

Sentiment analysis of commodity news was previously investigated with statistical methods. As noted at the end of that notebook, sentiment analysis with few instances (we have only 1,120 news items in total for training and testing) is a difficult problem. In recent years, NLP research has produced high-quality classification models, most of which are based on deep learning. This notebook introduces the BERT approach.

### BERT

BERT (Bidirectional Encoder Representations from Transformers) was developed by Google, which has applied it in products such as Google Search, and is based on the Transformer architecture. It can be implemented with either PyTorch or TensorFlow. In this notebook, it is implemented with PyTorch. First, we need to make sure the necessary libraries are installed.

In [None]:
!pip install transformers
!pip install torch torchvision torchaudio

Collecting transformers
  Downloading transformers-4.12.3-py3-none-any.whl (3.1 MB)
Collecting sacremoses
  Downloading sacremoses-0.0.46-py3-none-any.whl (895 kB)
Collecting tokenizers<0.11,>=0.10.1
  Downloading tokenizers-0.10.3-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (3.3 MB)
Collecting huggingface-hub<1.0,>=0.1.0
  Downloading huggingface_hub-0.1.1-py3-none-any.whl (59 kB)
Collecting pyyaml>=5.1
  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)
Installing collected packages: pyyaml, tokenizers, sacremoses, huggingface-hub, transformers
  Attempting

Now we can import them.

In [None]:
# for the deep learning implementation
import torch
# to work on the dataset
import pandas as pd
# to follow progress as bar in notebook
from tqdm.notebook import tqdm

Because several deep learning models will be compared, a test dataset was prepared in advance and is reused across the other deep learning based sentiment analysis notebooks. To improve model performance, noisy prefixes such as 'News: ', 'UPDATE', and 'METALS-' are removed from the dataset.

In [None]:
# read the data  
url=   'https://docs.google.com/spreadsheets/d/1XXyxrd7r0mx7kyLaYHDVwh6BFJzo8cPD/edit?usp=sharing&ouid=108589602591644119588&rtpof=true&sd=true'
path = 'https://drive.google.com/uc?export=download&id='+url.split('/')[-2]

df = pd.read_excel(path)
# str.lstrip() strips a *set of characters*, not a prefix, so it can also eat
# the first letters of a headline; remove the literal prefixes with a regex
for prefix in (r'News ?: ?', r'UPDATE', r'METALS-'):
    df['summary'] = df['summary'].str.replace('^' + prefix, '', regex=True)
df.rename(columns={'summary':'text'}, inplace = True)
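
One detail worth checking when removing such prefixes: Python's `str.lstrip` treats its argument as a set of characters rather than a literal prefix, so it can also remove the first letters of the headline itself. The sketch below contrasts it with an anchored regular expression. The helper name `strip_prefix` and the sample headline are illustrative assumptions and are not taken from the dataset.

```python
import re

# Hypothetical sample headline (not taken from the dataset)
headline = 'News : Nickel prices rise'

# lstrip removes any leading characters found in the set {'N','e','w','s',' ',':'}
stripped_chars = headline.lstrip('News :')

def strip_prefix(text, prefix_pattern=r'News ?: ?'):
    """Remove a prefix only when it occurs at the very start of the text."""
    return re.sub('^' + prefix_pattern, '', text)

stripped_prefix = strip_prefix(headline)

print(repr(stripped_chars))   # the 'N' of 'Nickel' is lost as well
print(repr(stripped_prefix))  # only the prefix is removed
```

Headlines that do not start with the prefix pass through `strip_prefix` unchanged.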

In [None]:
url=   'https://docs.google.com/spreadsheets/d/145tqf2J949KGCYnH-Nx3hiaHTogiZFn4/edit?usp=sharing&ouid=108589602591644119588&rtpof=true&sd=true'
path = 'https://drive.google.com/uc?export=download&id='+url.split('/')[-2]
test_df = pd.read_excel(path)
# same prefix clean-up as for the training data (regex instead of lstrip,
# which strips a set of characters rather than a literal prefix)
for prefix in (r'News ?: ?', r'UPDATE', r'METALS-'):
    test_df['summary'] = test_df['summary'].str.replace('^' + prefix, '', regex=True)
test_df.rename(columns={'summary':'text'}, inplace = True)
test_df

| | text | sentiment |
|---|---|---|
| 0 | Copper at near 2-week highs on hopes China imp... | 0 |
| 1 | China's Yunnan to help firms stockpile 110,000... | 1 |
| 2 | COLUMN-Politics trumps aluminium as U.S. reimp... | -1 |
| 3 | Base metals decline on weak China demand outlook | -1 |
| 4 | ALUMINIUM FALLS TO $1,751.50/T, LOWEST SINCE... | -1 |
| ... | ... | ... |
| 163 | China names former Chinalco exec as industry m... | 1 |
| 164 | Copper edges off two-year low as Washington so... | 0 |
| 165 | Uncertainty on global growth, trade war weighs... | -1 |
| 166 | Copper gains after Fed chief rekindles rate cu... | 1 |


The data should be treated as a whole, so the two datasets are merged carefully: duplicates are dropped and any news item that also appears in the test dataset is removed from the training data.

In [None]:
df = df.drop_duplicates().merge(test_df.drop_duplicates(), on=test_df.columns.to_list(), 
                   how='left', indicator=True, right_index = False, left_index = False)
df = df.loc[df._merge=='left_only',df.columns!='_merge']
df = df.reset_index(drop = True, inplace= False)
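
The merge above acts as an anti-join: a left merge with `indicator=True` marks each row as `left_only` or `both`, and keeping only `left_only` rows discards everything that also occurs in the test data. A minimal sketch on toy data follows; the frames `train` and `test` and their values are made up for illustration.

```python
import pandas as pd

# Toy frames; the values are illustrative only
train = pd.DataFrame({'text': ['a', 'b', 'c', 'c'], 'sentiment': [1, -1, 0, 0]})
test = pd.DataFrame({'text': ['b', 'd'], 'sentiment': [-1, 1]})

# A left merge with indicator=True adds a '_merge' column that tells
# whether each row was found only in the left frame or in both frames
merged = train.drop_duplicates().merge(
    test.drop_duplicates(), on=['text', 'sentiment'], how='left', indicator=True
)

# Keep only the rows that do not occur in the test frame
train_only = merged.loc[merged['_merge'] == 'left_only', ['text', 'sentiment']]
train_only = train_only.reset_index(drop=True)

print(train_only)
```

Row `b` is dropped because it also appears in `test`, and the duplicated row `c` is kept once.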

As seen in the statistical sentiment analysis notebook, the dataset is skewed toward positive news. We can check this again now that the new test dataset has been separated out.

In [None]:
df.sentiment.value_counts()

 1    486
-1    366
 0     64
Name: sentiment, dtype: int64

PyTorch's classification losses expect class labels to be non-negative integers (0 to C-1 for C classes); however, our labels include -1 to represent negative news. To avoid a conflict, we can renumber the classes as below.

In [None]:
possible_labels = df.sentiment.unique()
label_dict = {}
for index, possible_label in enumerate(possible_labels):
    label_dict[possible_label] = index
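
Because the mapping follows the order in which the labels first appear, the resulting indices depend on the data. A small sketch in plain Python, assuming a made-up label sequence; `dict.fromkeys` is used here to mimic the order-of-appearance behaviour of `unique()`, and the `example_` names are illustrative.

```python
# Made-up sentiment sequence, for illustration only
sentiments = [1, -1, 1, 0, -1]

# Unique values in order of first appearance
example_labels = list(dict.fromkeys(sentiments))

# Map each original label to a non-negative class index
example_label_dict = {label: index for index, label in enumerate(example_labels)}

# The inverse mapping converts class indices back to sentiments
example_label_dict_inverse = {index: label for label, index in example_label_dict.items()}

print(example_label_dict)          # {1: 0, -1: 1, 0: 2}
print(example_label_dict_inverse)  # {0: 1, 1: -1, 2: 0}
```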

To make the classification more intuitive, we store the renumbered sentiment in a new ***'label'*** column.

In [None]:
df['label'] = df.sentiment.replace(label_dict)
test_df['label'] = test_df.sentiment.replace(label_dict)

We will also need a data_type column to tell the two datasets apart.

In [None]:
df['data_type'] = ['train']*df.shape[0]
test_df['data_type'] = ['val']*test_df.shape[0]

Which label corresponds to which sentiment can be seen as follows:

In [None]:
df.groupby(['sentiment', 'label', 'data_type']).count()

| sentiment | label | data_type | text |
|---|---|---|---|
| -1 | 1 | train | 366 |
| 0 | 2 | train | 64 |
| 1 | 0 | train | 486 |


Since the dataset is skewed, the train and test datasets should share the same class distribution so that per-class comparisons are fair. Let's check it.

In [None]:
test_df.groupby(['sentiment', 'label', 'data_type']).count()

| sentiment | label | data_type | text |
|---|---|---|---|
| -1 | 1 | val | 65 |
| 0 | 2 | val | 12 |
| 1 | 0 | val | 91 |


The train and test datasets are ready to use, so we continue with the implementation of BERT. First, we import BertTokenizer to split the text into tokens and TensorDataset to wrap the encoded train and test data as tensors.

In [None]:
from transformers import BertTokenizer
from torch.utils.data import TensorDataset

The 'bert-base-uncased' pretrained tokenizer will be used. Other pretrained models and their tokenizers are listed at: https://huggingface.co/transformers/pretrained_models.html

In [None]:
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)

As a next step, we need to convert our train and test datasets into encoded batches of PyTorch tensors (`return_tensors='pt'`). The main parameters are:
- **add_special_tokens:** Adds the special tokens BERT expects, `[CLS]` at the start and `[SEP]` at the end of each sequence.
- **return_attention_mask:** Returns a mask with 1 for real tokens and 0 for padding, so that the attention mechanism of the Transformer (introduced in the 'Attention Is All You Need' paper) ignores the padded positions.
- **pad_to_max_length:** News items differ in length (i.e. they consist of different numbers of tokens). Shorter ones are padded with a special padding token up to `max_length`, which is set to 70 here. Recent versions of the library spell this option `padding='max_length'`.
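
What padding and the attention mask look like can be sketched in plain Python. The token ids in the middle of the example are made up; only the conventions (padding id 0, and ids 101 and 102 for `[CLS]` and `[SEP]` in `bert-base-uncased`, mask value 1 for real tokens and 0 for padding) reflect the tokenizer. `pad_and_mask` is a hypothetical helper, not a library function.

```python
def pad_and_mask(token_ids, max_length, pad_id=0):
    """Truncate or right-pad a list of token ids and build its attention mask.

    Simplified: the real tokenizer keeps [SEP] as the last token when it
    truncates, whereas this sketch just cuts the list.
    """
    ids = token_ids[:max_length]
    n_pad = max_length - len(ids)
    attention_mask = [1] * len(ids) + [0] * n_pad   # 1 = real token, 0 = padding
    return ids + [pad_id] * n_pad, attention_mask

# 101 and 102 are the ids of [CLS] and [SEP]; the ids in between are invented
example = [101, 2001, 2002, 2003, 102]
input_ids, attention_mask = pad_and_mask(example, max_length=8)

print(input_ids)       # [101, 2001, 2002, 2003, 102, 0, 0, 0]
print(attention_mask)  # [1, 1, 1, 1, 1, 0, 0, 0]
```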

In [None]:
encoded_data_train = tokenizer.batch_encode_plus(
    df.text.values, 
    add_special_tokens=True, 
    return_attention_mask=True, 
    padding='max_length',   # replaces the deprecated pad_to_max_length=True
    truncation=True,        # explicitly cut sequences longer than max_length
    max_length=70, 
    return_tensors='pt'
)

encoded_data_val = tokenizer.batch_encode_plus(
    test_df.text.values, 
    add_special_tokens=True, 
    return_attention_mask=True, 
    padding='max_length',   # replaces the deprecated pad_to_max_length=True
    truncation=True,        # explicitly cut sequences longer than max_length
    max_length=70, 
    return_tensors='pt'
)


input_ids_train = encoded_data_train['input_ids']
attention_masks_train = encoded_data_train['attention_mask']
labels_train = torch.tensor(df.label.values)

input_ids_val = encoded_data_val['input_ids']
attention_masks_val = encoded_data_val['attention_mask']
labels_val = torch.tensor(test_df.label.values)


In [None]:
dataset_train = TensorDataset(input_ids_train, attention_masks_train, labels_train)
dataset_val = TensorDataset(input_ids_val, attention_masks_val, labels_val)

In [None]:
len(dataset_train)

916

In [None]:
len(dataset_val)

168

We have created the tensor datasets. Next, we need from the transformers library the complete deep learning architecture, pretrained under the name bert-base-uncased, with a classification layer on top.

In [None]:
from transformers import BertForSequenceClassification

In [None]:
model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                      num_labels=len(label_dict),
                                                      output_attentions=False,
                                                      output_hidden_states=False)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

As we approach the end of the implementation, we need a few more helpers to feed the datasets to the pretrained model in batches of 32 instances each.

In [None]:
from torch.utils.data import DataLoader, RandomSampler, SequentialSampler

In [None]:
batch_size = 32

dataloader_train = DataLoader(dataset_train, 
                              sampler=RandomSampler(dataset_train), 
                              batch_size=batch_size)

dataloader_validation = DataLoader(dataset_val, 
                                   sampler=SequentialSampler(dataset_val), 
                                   batch_size=batch_size)
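
With a batch size of 32, the number of batches per epoch follows from the dataset sizes reported above (916 training and 168 validation instances); the last batch is simply smaller than the others. A quick check in plain Python:

```python
import math

batch_size = 32
n_train, n_val = 916, 168   # dataset sizes reported earlier in this notebook

train_batches = math.ceil(n_train / batch_size)
val_batches = math.ceil(n_val / batch_size)
last_train_batch = n_train - (train_batches - 1) * batch_size

print(train_batches, val_batches, last_train_batch)  # 29 6 20
```

The 29 training batches match the length of the per-epoch progress bar shown during training.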

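Every deep learning model is trained by minimizing a cost function with an optimizer. BertForSequenceClassification computes the classification loss itself when labels are passed in, so we only need to choose an optimizer. We will use AdamW, a variant of the widely used and robust Adam optimizer.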

In [None]:
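# transformers.AdamW is deprecated in favour of the PyTorch implementation
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

In [None]:
optimizer = AdamW(model.parameters(),
                  lr=2e-5, 
                  eps=1e-8,
                  weight_decay=0.0)  # same default as the deprecated transformers.AdamW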

We will train our model for 10 epochs and use the best-performing epoch.

In [None]:
epochs = 10

scheduler = get_linear_schedule_with_warmup(optimizer, 
                                            num_warmup_steps=0,
                                            num_training_steps=len(dataloader_train)*epochs)
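
The linear schedule scales the base learning rate by a factor that rises linearly during warm-up and then decays linearly to zero. The function below is a plain-Python sketch of that factor, written from the schedule's definition rather than copied from the library; `linear_factor` is a hypothetical name. With 29 batches per epoch and 10 epochs there are 290 optimizer steps.

```python
def linear_factor(step, num_warmup_steps, num_training_steps):
    """Multiplier applied to the base learning rate at a given optimizer step."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))

base_lr = 2e-5
total_steps = 29 * 10   # batches per epoch * epochs

# Learning rate at the start, the middle, and the end of training
for step in (0, 145, 290):
    lr = base_lr * linear_factor(step, num_warmup_steps=0, num_training_steps=total_steps)
    print(step, lr)
```

With no warm-up steps, as configured above, the learning rate starts at its full value, is halved midway, and reaches zero at the last step.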

To decide which epoch performs best and to assess the overall model, we need evaluation metrics. Since our dataset is imbalanced, accuracy cannot be the only metric we use. In addition to accuracy, we will use the F1 score.

In [None]:
import numpy as np
from sklearn.metrics import f1_score

In [None]:
def f1_score_func(preds, labels):
    preds_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()
    return f1_score(labels_flat, preds_flat, average='weighted')
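
The weighted F1 score averages the per-class F1 scores, weighting each class by its number of true instances. A plain-Python sketch on made-up labels shows the arithmetic; the helper `weighted_f1` is illustrative, and the notebook itself relies on scikit-learn.

```python
def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    total = 0.0
    for cls in set(y_true):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        support = sum(t == cls for t in y_true)
        total += f1 * support
    return total / len(y_true)

# Invented labels: class 2 is rare and is never predicted correctly
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 0]

print(round(weighted_f1(y_true, y_pred), 4))  # 0.6
```

Here the per-class scores are 2/3, 0.8, and 0 with supports 3, 2, and 1, giving (2 + 1.6 + 0) / 6 = 0.6.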

In [None]:
def accuracy_per_class(preds, labels, test=False):
    # `test` is kept for backward compatibility; the overall accuracy is
    # computed from the labels passed in, so no global dataset is needed
    label_dict_inverse = {v: k for k, v in label_dict.items()}
    
    preds_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()

    correct_total = 0
    for label in np.unique(labels_flat):
        y_preds = preds_flat[labels_flat==label]
        y_true = labels_flat[labels_flat==label]
        correct = len(y_preds[y_preds==label])
        correct_total += correct
        print(f'Class: {label_dict_inverse[label]}')
        print(f'Accuracy: {correct/len(y_true)*100}\n')
    print(f'Overall Accuracy: {correct_total/len(labels_flat)*100}\n')
    

When developing machine learning models, results should be reproducible. We therefore fix the random seeds, which makes the random components of training repeatable.

In [None]:
import random

seed_val = 17
random.seed(seed_val)
np.random.seed(seed_val)
torch.manual_seed(seed_val)
torch.cuda.manual_seed_all(seed_val)

PyTorch can run on either a CUDA GPU or the CPU. CUDA is the faster option, but it may not be available to everyone. The run shown here uses the CPU, but if your device has a GPU, the following code will use it.

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

print(device)

cpu


The following function evaluates the model on a dataloader and returns the average loss, the predictions, and the true labels; it is called after each epoch.

In [None]:
def evaluate(dataloader_val):

    model.eval()
    
    loss_val_total = 0
    predictions, true_vals = [], []
    
    for batch in dataloader_val:
        
        batch = tuple(b.to(device) for b in batch)
        
        inputs = {'input_ids':      batch[0],
                  'attention_mask': batch[1],
                  'labels':         batch[2],
                 }

        with torch.no_grad():        
            outputs = model(**inputs)
            
        loss = outputs[0]
        logits = outputs[1]
        loss_val_total += loss.item()

        logits = logits.detach().cpu().numpy()
        label_ids = inputs['labels'].cpu().numpy()
        predictions.append(logits)
        true_vals.append(label_ids)
    
    loss_val_avg = loss_val_total/len(dataloader_val) 
    
    predictions = np.concatenate(predictions, axis=0)
    true_vals = np.concatenate(true_vals, axis=0)
            
    return loss_val_avg, predictions, true_vals
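
The `predictions` returned by `evaluate` are raw logits, one row per news item and one column per class. The sketch below shows how a single row can be turned into probabilities and a sentiment, assuming the label mapping shown earlier (sentiment 1 as class 0, -1 as class 1, 0 as class 2) and made-up logit values.

```python
import math

# Inverse of the label mapping shown earlier: class index -> sentiment
label_dict_inverse = {0: 1, 1: -1, 2: 0}

def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]   # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

logits = [0.2, 1.7, -0.9]   # invented values for one news item
probs = softmax(logits)

# The predicted class is the index of the largest probability (argmax)
predicted_class = max(range(len(probs)), key=probs.__getitem__)

print(predicted_class, label_dict_inverse[predicted_class])  # 1 -1
```

Taking the argmax of the logits directly gives the same class, which is why the metric functions above skip the softmax step.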

As the final step of the BERT implementation, we train the model for the chosen number of epochs and evaluate its performance after each one.

In [None]:
for epoch in tqdm(range(1, epochs+1)):
    
    model.train()
    
    loss_train_total = 0

    progress_bar = tqdm(dataloader_train, desc='Epoch {:1d}'.format(epoch), leave=False, disable=False)
    for batch in progress_bar:

        model.zero_grad()
        
        batch = tuple(b.to(device) for b in batch)
        
        inputs = {'input_ids':      batch[0],
                  'attention_mask': batch[1],
                  'labels':         batch[2],
                 }       

        outputs = model(**inputs)
        
        loss = outputs[0]
        loss_train_total += loss.item()
        loss.backward()

        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

        optimizer.step()
        scheduler.step()
        
        # the loss is already averaged over the batch; len(batch) is the number of tensors (3), not the batch size
        progress_bar.set_postfix({'training_loss': '{:.3f}'.format(loss.item())})
         
        
    torch.save(model.state_dict(), f'finetuned_METALNEWS_BERT_epoch_{epoch}.model')
        
    tqdm.write(f'\nEpoch {epoch}')
    
    loss_train_avg = loss_train_total/len(dataloader_train)             
    tqdm.write(f'Training loss: {loss_train_avg}')
    
    val_loss, predictions, true_vals = evaluate(dataloader_validation)
    val_f1 = f1_score_func(predictions, true_vals)
    tqdm.write(f'Validation loss: {val_loss}')
    tqdm.write(f'F1 Score (Weighted): {val_f1}')

Epoch 1
Training loss: 0.9555424636807935
Validation loss: 0.8733222484588623
F1 Score (Weighted): 0.4765416065416065

Epoch 2
Training loss: 0.8348842633181605
Validation loss: 0.7842856645584106
F1 Score (Weighted): 0.670002300437083

Epoch 3
Training loss: 0.7018413276507937
Validation loss: 0.7349705497423807
F1 Score (Weighted): 0.7024582560296846

In [None]:
accuracy_per_class(predictions, true_vals)

Because the optimization involves randomness, each run can lead to different results. To reduce this effect, the algorithm can be run multiple times and the results averaged. Below is a shared fine-tuned model for this problem. You can either load its weights and biases or obtain your own by running the training until you reach a well-performing model.

In [None]:
import gdown
url = "https://drive.google.com/uc?id=1nJSo1L5_aTyyaWRu0RnYSxJla0URPmdG"
output = 'finetuned_METALNEWS_BERT_epoch_10.model'
gdown.download(url, output, quiet=False)

In [None]:
# load from the path used for the download above instead of a Colab-only absolute path
model.load_state_dict(torch.load(output, map_location=device))
_, predictions, true_vals = evaluate(dataloader_validation)

In [None]:
accuracy_per_class(predictions, true_vals)

The improvement in accuracy is clear: 73.81% with BERT compared with 67.9% for the neural network approach.

In [None]:
f1_score_func(predictions, true_vals)