# Sentiment analysis with BERT
In this example we use a BERT model for binary classification: deciding whether comments about films are positive or negative. The same approach carries over to any binary classification problem whose input is natural-language text and where context matters.

We take the BERT model from Hugging Face, a large repository that hosts many text-processing models.

[Hugging Face](https://huggingface.co/)

![BERT sentiment analysis](https://drive.google.com/uc?export=view&id=1UwciEQKNZ4SoXn_c0l31hsyZ-8jLdtVf)

In this example we build a small neural network on top of BERT: it receives the BERT output (an embedding with 768 dimensions) as input and produces a 2-dimensional output, one value for each class we want to predict.

## Installing libraries
First of all, we need to install the transformers library.

In [1]:
!pip install transformers

Collecting transformers
  Downloading https://files.pythonhosted.org/packages/d8/b2/57495b5309f09fa501866e225c84532d1fd89536ea62406b2181933fb418/transformers-4.5.1-py3-none-any.whl (2.1MB)
Collecting sacremoses
  Downloading https://files.pythonhosted.org/packages/08/cd/342e584ee544d044fb573ae697404ce22ede086c9e87ce5960772084cad0/sacremoses-0.0.44.tar.gz (862kB)
Collecting tokenizers<0.11,>=0.10.1
  Downloading https://files.pythonhosted.org/packages/ae/04/5b870f26a858552025a62f1649c20d29d2672c02ff3c3fb4c688ca46467a/tokenizers-0.10.2-cp37-cp37m-manylinux2010_x86_64.whl (3.3MB)
Building wheels for collected packages: sacremoses
  Building wheel for sacremoses (setup.py) ... done
  Created wheel for sacremoses: filename=sacremoses-0.0.44-cp37-none-any.whl size=886084 sha256=87fb

## Imports
Short description of the packages used in this project:

- BertModel: the pretrained BERT model.
- BertTokenizer: converts text into the tokens that BertModel takes as input.
- AdamW: optimizer used for gradient descent.
- get_linear_schedule_with_warmup: learning-rate scheduler that raises the learning rate linearly during warmup and then decreases it linearly to zero.
- torch: performs all tensor computations.
- train_test_split: splits the dataset into train and test sets.
- wrap: wraps long text so it prints in a readable format.

In [2]:
from transformers import BertModel, BertTokenizer, AdamW, get_linear_schedule_with_warmup
import torch
import numpy as np
from sklearn.model_selection import train_test_split
from torch import nn, optim
from torch.utils.data import Dataset, DataLoader
import pandas as pd
from textwrap import wrap

## Variable definitions
Define some parameters:
- RANDOM_SEED: fixes the random seed so that re-running the program gives the same results.
- MAX_LEN: maximum number of tokens kept per text.
- BATCH_SIZE: number of examples processed at the same time.
- DATASET_PATH: path to the dataset, which is specific to each problem (in this case, comments labelled as positive or negative).
- NCLASSES: number of classes the model predicts.

Select the CUDA device if it is available; a GPU is necessary if you don't want training to take too long.
As in this example, you can use Google Colab to get a free GPU.
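
The effect of fixing a seed can be illustrated with the standard library alone. This is only a sketch: the notebook seeds NumPy and PyTorch, while the hypothetical helper `draw` below uses Python's `random` module to show the same idea.

```python
import random


def draw(seed, n=5):
    """Return n pseudo-random integers after seeding the generator."""
    random.seed(seed)
    return [random.randint(0, 99) for _ in range(n)]


first = draw(42)
second = draw(42)

# Same seed, same sequence: this is what makes reruns comparable.
print(first == second)
```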

In [3]:
# Initialization
RANDOM_SEED = 42
MAX_LEN = 200
BATCH_SIZE = 16
DATASET_PATH = '/content/drive/My Drive/BERT/IMDB_Dataset.csv'
NCLASSES = 2

np.random.seed(RANDOM_SEED)
torch.manual_seed(RANDOM_SEED)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

cuda:0


## Load data
For convenience, we load the data from Google Drive and read it with the pandas library.

In [4]:
# Load dataset
from google.colab import drive
drive.mount('/content/drive')

df = pd.read_csv(DATASET_PATH)
df = df[0:10000]

Mounted at /content/drive


To reduce training time, we use only the first 10000 rows. If computing time is not a concern, you can remove the last line of the code above.

In [5]:
print(df.head())
print(df.shape)
print("\n".join(wrap(df['review'][200])))


                                              review sentiment
0  One of the other reviewers has mentioned that ...  positive
1  A wonderful little production. <br /><br />The...  positive
2  I thought this was a wonderful way to spend ti...  positive
3  Basically there's a family where a little boy ...  negative
4  Petter Mattei's "Love in the Time of Money" is...  positive
(10000, 2)
Interesting and short television movie describes some of the
machinations surrounding Jay Leno's replacing Carson as host of the
Tonight Show. Film is currently very topical given the public drama
surrounding Conan O'Brien and Jay Leno.<br /><br />The film does a
good job of sparking viewers' interest in the events and showing some
of the concerns of the stakeholders, particularly of the NBC
executives. The portrayal of Ovitz was particularly compelling and
interesting, I thought.<br /><br />Still, many of the characters were
only very briefly limned or touched upon, and some of the acting
seemed perfunc

We need to convert the target column to a binary value (0 or 1), because the model is trained on numeric class labels.

In [6]:
df['label'] = (df['sentiment']=='positive').astype(int)
df.drop('sentiment', axis=1, inplace=True)
df.head()

                                              review  label
0  One of the other reviewers has mentioned that ...      1
1  A wonderful little production. <br /><br />The...      1
2  I thought this was a wonderful way to spend ti...      1
3  Basically there's a family where a little boy ...      0
4  Petter Mattei's "Love in the Time of Money" is...      1


## Preparing data
The most important part of this example is preparing the data in a format that BERT can interpret.
First we need to understand what tokenization is: the tokenizer converts sentences into numeric token ids, which BERT then maps to embeddings.

In [7]:
# TOKENIZATION
PRE_TRAINED_MODEL_NAME = 'bert-base-cased'
tokenizer = BertTokenizer.from_pretrained(PRE_TRAINED_MODEL_NAME)

To shed some light on tokenization, let's look at a simple example of how the tokenizer converts an input sentence into a vector of tokens.

In [8]:
# Tokenization example
sample_txt = 'I really loved that movie!'
tokens = tokenizer.tokenize(sample_txt)
token_ids = tokenizer.convert_tokens_to_ids(tokens)
print('Sentence: ', sample_txt)
print('Tokens: ', tokens)
print('Token ids: ', token_ids)

Sentence:  I really loved that movie!
Tokens:  ['I', 'really', 'loved', 'that', 'movie', '!']
Token ids:  [146, 1541, 3097, 1115, 2523, 106]
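
BERT tokenizers split words into sub-word pieces with the WordPiece scheme, which matches the longest known piece first. The sketch below imitates that idea on a tiny, hypothetical vocabulary (`TOY_VOCAB` is not the real bert-base-cased vocabulary, and `wordpiece` and `tokenize` are illustrative helpers). The toy vocabulary deliberately lacks "loved", so the word is split into two pieces, whereas the real tokenizer above keeps it as one token.

```python
# Hypothetical toy vocabulary, NOT the real bert-base-cased vocabulary.
TOY_VOCAB = {"[UNK]": 0, "I": 1, "really": 2, "love": 3, "##d": 4,
             "that": 5, "movie": 6, "!": 7}


def wordpiece(word, vocab):
    """Split one word into sub-word pieces with greedy longest-match-first."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # no piece matched: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


def tokenize(text, vocab):
    """Split on whitespace, detach trailing punctuation, then apply WordPiece."""
    tokens = []
    for chunk in text.split():
        word = chunk.rstrip("!?.,")
        punctuation = chunk[len(word):]
        if word:
            tokens.extend(wordpiece(word, vocab))
        tokens.extend(punctuation)  # each punctuation character is its own token
    return tokens


tokens = tokenize("I really loved that movie!", TOY_VOCAB)
token_ids = [TOY_VOCAB.get(token, TOY_VOCAB["[UNK]"]) for token in tokens]
print(tokens)
print(token_ids)
```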


BERT needs its input in a specific format. To make computation efficient, BERT works on batches of vectors that all have the same length, so shorter inputs are filled up with padding tokens. To exclude the padding from the analysis, we pass an attention_mask that tells BERT which positions are relevant (1) and which are padding (0).

In [10]:
# Encoding example for BERT
encoding = tokenizer.encode_plus(
    sample_txt,
    max_length = 10,
    truncation = True,
    add_special_tokens = True,
    return_token_type_ids = False,
    padding = 'max_length',  # pad up to max_length; padding=True would not pad a single sentence
    return_attention_mask = True,
    return_tensors = 'pt'
)



In [11]:
encoding.keys()

dict_keys(['input_ids', 'attention_mask'])

Printing encoding example:

In [12]:
print(tokenizer.convert_ids_to_tokens(encoding['input_ids'][0]))
print(encoding['input_ids'][0])
print(encoding['attention_mask'][0])

['[CLS]', 'I', 'really', 'loved', 'that', 'movie', '!', '[SEP]', '[PAD]', '[PAD]']
tensor([ 101,  146, 1541, 3097, 1115, 2523,  106,  102,    0,    0])
tensor([1, 1, 1, 1, 1, 1, 1, 1, 0, 0])
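
The padded ids and the mask printed above can be rebuilt by hand, which makes the relation between them explicit. This is a simplified sketch: `pad_and_mask` is a hypothetical helper, and unlike the real tokenizer it truncates by simply cutting the list, without re-attaching `[SEP]`.

```python
PAD_ID = 0  # id of [PAD] in the output above


def pad_and_mask(token_ids, max_length, pad_id=PAD_ID):
    """Truncate or right-pad token_ids to max_length and build the attention mask."""
    ids = list(token_ids[:max_length])
    mask = [1] * len(ids)           # 1 marks a real token
    n_pad = max_length - len(ids)
    return ids + [pad_id] * n_pad, mask + [0] * n_pad  # 0 marks padding


# [CLS] I really loved that movie ! [SEP], ids taken from the output above.
sample_ids = [101, 146, 1541, 3097, 1115, 2523, 106, 102]
input_ids, attention_mask = pad_and_mask(sample_ids, max_length=10)
print(input_ids)
print(attention_mask)
```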


The training loop needs the data wrapped in a class derived from torch.utils.data.Dataset, for which we only need to implement three methods:
- `__init__`: initializes the dataset.
- `__len__`: returns the length of the dataset.
- `__getitem__`: returns the data for one example that will be fed to the BERT model.

In [13]:
# DATASET creation

class IMDBDataset(Dataset):

    def __init__(self, reviews, labels, tokenizer, max_len):
        self.reviews = reviews
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_len = max_len

    def __len__(self):
        return len(self.reviews)

    def __getitem__(self, item):
        review = str(self.reviews[item])
        label = self.labels[item]
        # Use the tokenizer stored on the instance, not the global one
        encoding = self.tokenizer.encode_plus(
            review,
            max_length = self.max_len,
            truncation = True,
            add_special_tokens = True,
            return_token_type_ids = False,
            padding = 'max_length',  # fixed length so the DataLoader can stack examples
            return_attention_mask = True,
            return_tensors = 'pt'
        )

        return {
            'review': review,
            'input_ids': encoding['input_ids'].flatten(),
            'attention_mask': encoding['attention_mask'].flatten(),
            'label': torch.tensor(label, dtype=torch.long)
        }



To keep the code clear, we define a function that builds the dataset and wraps it in a DataLoader.

In [14]:
# Data loader:

def data_loader(df, tokenizer, max_len, batch_size):
    dataset = IMDBDataset(
        reviews = df.review.to_numpy(),
        labels = df.label.to_numpy(),
        tokenizer = tokenizer,
        max_len = max_len  # use the arguments, not the global constants
    )

    return DataLoader(dataset, batch_size = batch_size, num_workers = 4)
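
What a DataLoader does with a map-style dataset can be sketched without PyTorch. The names below (`ToyDataset`, `iterate_batches`) are hypothetical stand-ins: the real DataLoader also stacks tensors, can shuffle, and can use worker processes, none of which is modelled here.

```python
class ToyDataset:
    """Hypothetical stand-in for IMDBDataset: same protocol, no tokenizer."""

    def __init__(self, reviews, labels):
        self.reviews = reviews
        self.labels = labels

    def __len__(self):
        return len(self.reviews)

    def __getitem__(self, item):
        return {"review": self.reviews[item], "label": self.labels[item]}


def iterate_batches(dataset, batch_size):
    """Yield consecutive batches, collating each field into a list."""
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        items = [dataset[i] for i in range(start, stop)]
        yield {key: [item[key] for item in items] for key in items[0]}


dataset = ToyDataset(["good", "bad", "great", "awful", "fine"], [1, 0, 1, 0, 1])
batches = list(iterate_batches(dataset, batch_size=2))
print(len(batches))   # the last batch is smaller when the sizes do not divide evenly
print(batches[-1])
```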

In any machine learning project it is important to split the dataset into train and test sets. The train set is used to fit the model, and the test set is used to measure how good the model is on data it has not seen.
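
As a rough illustration of what such a split does, the sketch below shuffles row indices with a fixed seed and sets aside 20% of them. `split_rows` is a hypothetical helper, not scikit-learn's implementation, and it will not reproduce the exact rows that `train_test_split` selects.

```python
import random


def split_rows(rows, test_size, seed):
    """Shuffle row indices with a fixed seed and cut off a test_size fraction."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_size)
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test


rows = list(range(10000))  # stand-in for the 10000 reviews
train_rows, test_rows = split_rows(rows, test_size=0.2, seed=42)
print(len(train_rows), len(test_rows))
```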

In [15]:
df_train, df_test = train_test_split(df, test_size = 0.2, random_state=RANDOM_SEED)

train_data_loader = data_loader(df_train, tokenizer, MAX_LEN, BATCH_SIZE)
test_data_loader = data_loader(df_test, tokenizer, MAX_LEN, BATCH_SIZE)



## Modelling
It is easy to create a new model based on BERT: we only need to define the base model (BERT) and the layers added on top. Here we add a dropout layer, which helps the model avoid overfitting by randomly zeroing some of its inputs on every training pass.
After it comes a linear layer that receives the BERT output as input and has 2 output neurons, one for each class to predict.
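
The behaviour of a dropout layer can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not PyTorch's code: `dropout` is a hypothetical helper that zeroes each value with probability `p` during training and rescales the survivors by `1 / (1 - p)`, and does nothing in evaluation mode.

```python
import random


def dropout(values, p, rng, training=True):
    """Inverted dropout: zero each value with probability p, rescale the rest."""
    if not training or p == 0.0:
        return list(values)  # evaluation mode: values pass through unchanged
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else value * scale for value in values]


rng = random.Random(42)
features = [1.0] * 10

train_out = dropout(features, p=0.3, rng=rng)
eval_out = dropout(features, p=0.3, rng=rng, training=False)
print(train_out)
print(eval_out)
```

The rescaling keeps the expected value of each activation the same in both modes, which is why no correction is needed at evaluation time.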

In [16]:
class BERTSentimentClassifier(nn.Module):

    def __init__(self, n_classes):
        super().__init__()
        self.bert = BertModel.from_pretrained(PRE_TRAINED_MODEL_NAME)
        self.drop = nn.Dropout(p=0.3)
        # hidden_size is 768 for bert-base-cased
        self.linear = nn.Linear(self.bert.config.hidden_size, n_classes)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(
            input_ids = input_ids,
            attention_mask = attention_mask
        )
        # outputs[1] is the pooled output computed from the [CLS] token
        drop_output = self.drop(outputs[1])
        output = self.linear(drop_output)
        return output




Create the model and send it to the device.

Printing the model shows that BERT itself consists of many layers, with our two added layers at the end.

In [17]:
model = BERTSentimentClassifier(NCLASSES)
model = model.to(device)
print(model)

BERTSentimentClassifier(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(28996, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_

Let's define the parameters that we will use to train our model:
- EPOCHS: number of passes the model makes over the whole training dataset.
- optimizer: the optimizer function.
- total_steps: total number of optimizer steps, that is, the number of batches per epoch times EPOCHS.
- scheduler: adjusts the learning rate of the optimizer during training, which helps to reduce the error and obtain a more robust model.
- loss_fn: the function that we want to minimize.
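
To make total_steps and the scheduler concrete, the sketch below recomputes the step count for this notebook's sizes (8000 training rows, batches of 16, 5 epochs) and evaluates a linear warmup and decay rule at a few steps. `linear_schedule_lr` is a hypothetical re-implementation written for illustration, assuming the usual behaviour of a linear schedule with warmup; it is not the library function itself.

```python
import math


def linear_schedule_lr(step, base_lr, num_warmup_steps, num_training_steps):
    """Learning rate after `step` updates: linear warmup, then linear decay to 0."""
    if step < num_warmup_steps:
        return base_lr * step / max(1, num_warmup_steps)
    remaining = num_training_steps - step
    decay_steps = max(1, num_training_steps - num_warmup_steps)
    return base_lr * max(0.0, remaining / decay_steps)


n_train = 8000      # 80% of the 10000 rows
batch_size = 16
epochs = 5

steps_per_epoch = math.ceil(n_train / batch_size)  # batches per epoch
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)

for step in (0, total_steps // 2, total_steps):
    print(step, linear_schedule_lr(step, 2e-5, 0, total_steps))
```

With num_warmup_steps = 0, as in this notebook, the learning rate starts at its full value and falls linearly until the last step.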

In [18]:
# TRAINING
EPOCHS = 5
optimizer = AdamW(model.parameters(), lr=2e-5, correct_bias=False)
total_steps = len(train_data_loader) * EPOCHS
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps = 0,
    num_training_steps = total_steps
)
loss_fn = nn.CrossEntropyLoss().to(device)

The training function and the validation function are basically the same, with two differences. When training, we put the model in train mode and update its weights; when evaluating, we put the model in eval mode, which disables dropout, and run it under torch.no_grad(), so the weights stay fixed.
The last five lines of the loop in train_epoch (from loss.backward() to optimizer.zero_grad()) are dedicated to updating the weights.

In [19]:
# TRAINING EPOCH
def train_epoch(model, data_loader, loss_fn, optimizer, device, scheduler, n_examples):
    model = model.train()
    losses = []
    correct_predictions = 0
    for batch in data_loader:
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        labels = batch['label'].to(device)
        outputs = model(input_ids = input_ids, attention_mask = attention_mask)
        _, preds = torch.max(outputs, dim=1)
        loss = loss_fn(outputs, labels)
        correct_predictions += torch.sum(preds == labels)
        losses.append(loss.item())
        loss.backward()
        nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
    return correct_predictions.double()/n_examples, np.mean(losses)

# EVALUATING MODEL
def eval_model(model, data_loader, loss_fn, device, n_examples):
    model = model.eval()
    losses = []
    correct_predictions = 0
    with torch.no_grad():
        for batch in data_loader:
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            labels = batch['label'].to(device)
            outputs = model(input_ids = input_ids, attention_mask = attention_mask)
            _, preds = torch.max(outputs, dim=1)
            loss = loss_fn(outputs, labels)
            correct_predictions += torch.sum(preds == labels)
            losses.append(loss.item())
    return correct_predictions.double()/n_examples, np.mean(losses)
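
Both functions return an accuracy and a mean loss. The sketch below recomputes those two quantities for one small batch in plain Python, assuming the default mean-reduced cross-entropy; the logits and labels are hypothetical values, and `cross_entropy` and `batch_metrics` are illustrative helpers rather than the PyTorch functions.

```python
import math


def cross_entropy(logits, label):
    """Negative log-probability of the true class, computed from raw logits."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[label]


def batch_metrics(batch_logits, labels):
    """Return (number of correct predictions, mean loss) for one batch."""
    preds = [row.index(max(row)) for row in batch_logits]  # argmax per example
    correct = sum(int(pred == label) for pred, label in zip(preds, labels))
    losses = [cross_entropy(row, label) for row, label in zip(batch_logits, labels)]
    return correct, sum(losses) / len(losses)


# Hypothetical logits for three reviews (columns: negative, positive).
batch_logits = [[2.0, -1.0], [0.5, 1.5], [1.0, 0.0]]
labels = [0, 1, 1]

correct, mean_loss = batch_metrics(batch_logits, labels)
print(correct / len(labels), mean_loss)
```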

For each epoch we train the model and validate it, printing the results on screen. If you want to let the algorithm train for a long time and keep only the best version, save the model at the end of every epoch whose validation results improve, in this case lower loss and higher accuracy.
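
The "keep only the best model" idea can be sketched as follows. The validation accuracies are hypothetical numbers and `save_fn` is a placeholder callback; in this notebook it could be something like `torch.save(model.state_dict(), path)` with a path of your choice, which is an assumption and not part of the original code.

```python
def train_with_checkpoints(val_accuracies, save_fn):
    """Call save_fn(epoch) whenever the validation accuracy improves."""
    best_acc = float("-inf")
    best_epoch = None
    for epoch, val_acc in enumerate(val_accuracies, start=1):
        if val_acc > best_acc:
            best_acc, best_epoch = val_acc, epoch
            save_fn(epoch)  # placeholder for writing the model weights to disk
    return best_epoch, best_acc


saved_epochs = []
# Hypothetical validation accuracies for three epochs.
best_epoch, best_acc = train_with_checkpoints([0.80, 0.86, 0.84], saved_epochs.append)
print(best_epoch, best_acc, saved_epochs)
```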

In [None]:
# TRAINING

for epoch in range(EPOCHS):
    print('Epoch {} of {}'.format(epoch+1, EPOCHS))
    print('------------------')
    train_acc, train_loss = train_epoch(
        model, train_data_loader, loss_fn, optimizer, device, scheduler, len(df_train)
    )
    test_acc, test_loss = eval_model(
        model, test_data_loader, loss_fn, device, len(df_test)
    )
    print('Training: Loss: {}, accuracy: {}'.format(train_loss, train_acc))
    print('Validation: Loss: {}, accuracy: {}'.format(test_loss, test_acc))
    print('')

Epoch 1 of 5
------------------
Training: Loss: 0.40202889354526994, accuracy: 0.816375
Validation: Loss: 0.2888660610467195, accuracy: 0.8855000000000001

Epoch 2 of 5
------------------
Training: Loss: 0.18809325058874674, accuracy: 0.933125
Validation: Loss: 0.4618920519771054, accuracy: 0.883

Epoch 3 of 5
------------------

This function receives a review as input and predicts whether its sentiment is positive or negative.

In [None]:
def classifySentiment(review_text):
    encoding_review = tokenizer.encode_plus(
        review_text,
        max_length = MAX_LEN,
        truncation = True,
        add_special_tokens = True,
        return_token_type_ids = False,
        padding = 'max_length',
        return_attention_mask = True,
        return_tensors = 'pt'
    )

    input_ids = encoding_review['input_ids'].to(device)
    attention_mask = encoding_review['attention_mask'].to(device)
    model.eval()  # make sure dropout is disabled at inference time
    with torch.no_grad():
        output = model(input_ids, attention_mask)
    _, prediction = torch.max(output, dim=1)
    print("\n".join(wrap(review_text)))
    # label 1 = positive, label 0 = negative
    if prediction.item() == 1:
        print('Predicted sentiment: positive')
    else:
        print('Predicted sentiment: negative')


  

In [None]:
review_text = "Avengers: Infinity War at least had the good taste to abstain from Jeremy Renner. No such luck in Endgame."

classifySentiment(review_text)
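
The function above reports only the winning class. If a confidence value is also wanted, one option is to pass the two logits through a softmax. The sketch below does this in plain Python with hypothetical logits; `softmax` and `LABELS` are illustrative names, and in the notebook the equivalent would be applied to the model output tensor.

```python
import math


def softmax(logits):
    """Turn raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [value / total for value in exps]


LABELS = ["negative", "positive"]  # index 0 / 1, matching the label column

logits = [1.2, -0.8]  # hypothetical model output for one review
probs = softmax(logits)
pred = probs.index(max(probs))
print(LABELS[pred], round(probs[pred], 3))
```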