Installing the transformers library with pip

In [33]:
!pip install transformers




Basic imports

In [32]:
import torch
import random 
import numpy as np 


The transformers library provides a tokenizer for each of its transformer models. Here we use the uncased BERT model, which ignores casing (i.e. it lowercases every word), so we load the matching pre-trained bert-base-uncased tokenizer.

In [34]:
from transformers import BertTokenizer

In [35]:
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

In [36]:
len(tokenizer.vocab)

30522

This is how the bert-base-uncased tokenizer tokenizes a string. Note that the mixed-case input is lowercased.

In [37]:
tokens = tokenizer.tokenize("macHine LEArning")
print(tokens)

['machine', 'learning']


In [38]:
# named `ids` rather than `id` to avoid shadowing Python's built-in id()
ids = tokenizer.convert_tokens_to_ids(tokens)
print(ids)

[3698, 4083]
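
To make the two steps above (text to tokens, tokens to ids) more concrete, here is a minimal sketch of greedy longest-match-first WordPiece splitting in plain Python. It is a simplification, not the library's implementation: `toy_vocab` and `wordpiece_tokenize` are hypothetical names, the vocabulary has 6 entries instead of 30522, and punctuation and accent handling are left out.

```python
def wordpiece_tokenize(text, vocab, unk_token="[UNK]"):
    """Simplified uncased WordPiece: lowercase, split on whitespace,
    then greedily match the longest vocabulary piece from the left."""
    tokens = []
    for word in text.lower().split():
        pieces = []
        start = 0
        while start < len(word):
            end = len(word)
            piece = None
            while start < end:
                candidate = word[start:end]
                if start > 0:
                    # pieces that continue a word carry the '##' prefix
                    candidate = "##" + candidate
                if candidate in vocab:
                    piece = candidate
                    break
                end -= 1
            if piece is None:
                # no piece matched: the whole word becomes the unknown token
                pieces = [unk_token]
                break
            pieces.append(piece)
            start = end
        tokens.extend(pieces)
    return tokens


# hypothetical toy vocabulary (token -> id); the ids are NOT the real BERT ids
toy_vocab = {"[UNK]": 0, "machine": 1, "learn": 2, "##ing": 3, "token": 4, "##izer": 5}

tokens = wordpiece_tokenize("macHine LEArning tokenizer", toy_vocab)
ids = [toy_vocab[t] for t in tokens]
print(tokens)
print(ids)
```

With the real vocabulary, 'learning' is a single entry, which is why the output above shows it unsplit.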


The transformer was also trained with special tokens that mark the beginning and end of the sentence, as well as standard padding and unknown tokens. We can get all of these from the tokenizer.

In [39]:

init_token = tokenizer.cls_token
eos_token = tokenizer.sep_token
pad_token = tokenizer.pad_token
unk_token = tokenizer.unk_token

print(init_token, eos_token, pad_token, unk_token)

[CLS] [SEP] [PAD] [UNK]


Ids of the init, eos, padding and unknown tokens

In [40]:
init_token_idx = tokenizer.convert_tokens_to_ids(init_token)
eos_token_idx = tokenizer.convert_tokens_to_ids(eos_token)
pad_token_idx = tokenizer.convert_tokens_to_ids(pad_token)
unk_token_idx = tokenizer.convert_tokens_to_ids(unk_token)

print(init_token_idx, eos_token_idx, pad_token_idx, unk_token_idx)

101 102 0 100


The transformer model was trained with a fixed maximum input length, which we can read from the tokenizer.

In [41]:

max_input_length = tokenizer.max_model_input_sizes['bert-base-uncased']

print(max_input_length)

512


Preprocessing function that tokenizes our text and cuts it to the maximum length.
Note: we keep at most max_input_length - 2 tokens because the init token and the eos token are added at the start and end.

In [42]:

def tokenize_and_cut(sentence):
    tokens = tokenizer.tokenize(sentence) 
    tokens = tokens[:max_input_length-2]
    return tokens
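
As a quick check of the "- 2" arithmetic, the sketch below cuts a list of dummy ids and then wraps it with the init and eos ids (101 and 102, as printed earlier). `cut_and_wrap` is a hypothetical helper for illustration only; in this notebook the wrapping is done by the torchtext field.

```python
MAX_INPUT_LENGTH = 512        # value printed by the tokenizer above
CLS_ID, SEP_ID = 101, 102     # ids of [CLS] and [SEP] printed above


def cut_and_wrap(token_ids, max_len=MAX_INPUT_LENGTH):
    """Keep at most max_len - 2 ids, then add the init and eos ids."""
    body = token_ids[:max_len - 2]
    return [CLS_ID] + body + [SEP_ID]


long_ids = list(range(1000, 1600))   # 600 dummy ids, longer than the limit
wrapped = cut_and_wrap(long_ids)
print(len(wrapped))                  # never exceeds MAX_INPUT_LENGTH
```

Short inputs are left untouched apart from the two added ids.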

Defining the text field and the label field. The label field uses dtype torch.long because the multiclass loss used below (nn.CrossEntropyLoss) expects targets as integer class indices, not floats.

In [43]:

from torchtext import data

TEXT = data.Field(batch_first = True,
                  use_vocab = False,
                  tokenize = tokenize_and_cut,
                  preprocessing = tokenizer.convert_tokens_to_ids,
                  init_token = init_token_idx,
                  eos_token = eos_token_idx,
                  pad_token = pad_token_idx,
                  unk_token = unk_token_idx)

LABEL = data.LabelField(dtype = torch.long)
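
Because sentences in a batch differ in length, the shorter ones are filled up with the pad id (0, as printed earlier). A minimal sketch of that padding step in plain Python follows; `pad_batch` is a hypothetical helper, not a torchtext function.

```python
PAD_ID = 0   # id of [PAD] printed above


def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad every sequence to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]


# two already-wrapped sequences: [CLS] machine learning [SEP] and [CLS] machine [SEP]
batch = pad_batch([[101, 3698, 4083, 102], [101, 3698, 102]])
print(batch)
```

The result is rectangular, so with batch_first = True it has shape [batch size, sent len].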

Splitting the dataset into training, validation and test sets

In [44]:

from torchtext import datasets

train_data, test_data = datasets.TREC.splits(TEXT, LABEL)

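# random.seed() returns None, so seed first and pass the resulting state explicitly
random.seed(1234)
train_data, valid_data = train_data.split(random_state = random.getstate())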

Number of examples in the training, validation and test sets

In [45]:
print(f"Number of training examples: {len(train_data)}")
print(f"Number of validation examples: {len(valid_data)}")
print(f"Number of testing examples: {len(test_data)}")

Number of training examples: 3816
Number of validation examples: 1636
Number of testing examples: 500
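
The training and validation counts above are consistent with a 70/30 split of the 5452 original training examples. The sketch below reproduces that arithmetic with a seeded shuffle; `split_indices` is a hypothetical helper, and the 0.7 ratio is assumed to be the default used by the split call.

```python
import random


def split_indices(n_examples, ratio=0.7, seed=1234):
    """Shuffle indices with a fixed seed and cut them at `ratio`."""
    rng = random.Random(seed)            # private generator, reproducible
    indices = list(range(n_examples))
    rng.shuffle(indices)
    n_train = int(round(ratio * n_examples))
    return indices[:n_train], indices[n_train:]


train_idx, valid_idx = split_indices(3816 + 1636)
print(len(train_idx), len(valid_idx))
```

Calling it again with the same seed gives the same split, which is the point of fixing the random state.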


In [46]:
LABEL.build_vocab(train_data)
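
Building the label vocabulary maps each class name to an integer index. A rough sketch of the idea, assuming labels are ordered by descending frequency with alphabetical order as a tie-break (this ordering is an assumption about torchtext, and `build_label_vocab` and the toy labels are hypothetical):

```python
from collections import Counter


def build_label_vocab(labels):
    """Return (itos, stoi): index -> label list and label -> index dict."""
    counts = Counter(labels)
    # most frequent label first; ties broken alphabetically
    itos = sorted(counts, key=lambda label: (-counts[label], label))
    stoi = {label: index for index, label in enumerate(itos)}
    return itos, stoi


toy_labels = ["HUM", "LOC", "HUM", "NUM", "HUM", "LOC"]   # hypothetical data
itos, stoi = build_label_vocab(toy_labels)
print(itos)
print(stoi)
```

The model predicts an index, and `itos` turns that index back into a readable class name.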

Preparing the iterators with our batch size. For this we use BucketIterator, which groups examples of similar length to reduce padding.

In [47]:

BATCH_SIZE = 128

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

train_iterator, valid_iterator, test_iterator = data.BucketIterator.splits(
    (train_data, valid_data, test_data), 
    batch_size = BATCH_SIZE, 
    device = device)
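
To see why bucketing helps, the sketch below compares the padding needed when sentences are batched in their original order with the padding needed after sorting by length. It is a simplified illustration (the real iterator also shuffles), and `bucket_batches`, `padding_needed` and the lengths are hypothetical.

```python
def bucket_batches(lengths, batch_size):
    """Sort example indices by length, then cut them into batches."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


def padding_needed(lengths, batches):
    """Total number of pad tokens if each batch is padded to its longest example."""
    total = 0
    for batch in batches:
        longest = max(lengths[i] for i in batch)
        total += sum(longest - lengths[i] for i in batch)
    return total


lengths = [5, 12, 6, 11, 4, 13]          # hypothetical sentence lengths
naive = [[0, 1], [2, 3], [4, 5]]         # batches in original order
bucketed = bucket_batches(lengths, batch_size=2)

print(padding_needed(lengths, naive))
print(padding_needed(lengths, bucketed))
```

Less padding means fewer wasted positions passing through BERT and the GRU.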

Loading our pretrained model. Note that we load the same checkpoint (bert-base-uncased) as for our tokenizer.

In [48]:
from transformers import BertTokenizer, BertModel

bert = BertModel.from_pretrained('bert-base-uncased')

Defining our model. We will use a bidirectional, 2-layer GRU on top of the BERT embeddings.

In [49]:
import torch.nn as nn

class BERTGRUSentiment(nn.Module):
    def __init__(self,
                 bert,
                 hidden_dim,
                 output_dim,
                 n_layers,
                 bidirectional,
                 dropout):
        
        super().__init__()
        
        self.bert = bert
        
        embedding_dim = bert.config.to_dict()['hidden_size']
        
        self.rnn = nn.GRU(embedding_dim,
                          hidden_dim,
                          num_layers = n_layers,
                          bidirectional = bidirectional,
                          batch_first = True,
                          dropout = 0 if n_layers < 2 else dropout)
        
        self.out = nn.Linear(hidden_dim * 2 if bidirectional else hidden_dim, output_dim)
        
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, text):
        
        #text = [batch size, sent len]
                
        with torch.no_grad():
            embedded = self.bert(text)[0]
                
        #embedded = [batch size, sent len, emb dim]
        
        _, hidden = self.rnn(embedded)
        
        #hidden = [n layers * n directions, batch size, hid dim]
        
        if self.rnn.bidirectional:
            hidden = self.dropout(torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim = 1))
        else:
            hidden = self.dropout(hidden[-1,:,:])
                
        #hidden = [batch size, hid dim]
        
        output = self.out(hidden)
        
        #output = [batch size, out dim]
        
        return output
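
The shape comments in forward can be traced with plain arithmetic. The sketch below is only a bookkeeping helper (`trace_shapes` is hypothetical), assuming the bert-base hidden size of 768, the hyperparameters used in this notebook, 6 output classes and an example sentence length of 20.

```python
def trace_shapes(batch_size, sent_len, emb_dim, hidden_dim,
                 n_layers, bidirectional, output_dim):
    """Return the tensor shape expected after each step of forward()."""
    n_directions = 2 if bidirectional else 1
    return {
        "text": (batch_size, sent_len),
        "embedded": (batch_size, sent_len, emb_dim),
        # final hidden state of every layer and direction
        "hidden": (n_layers * n_directions, batch_size, hidden_dim),
        # last layer only: forward and backward states concatenated
        "final_hidden": (batch_size, hidden_dim * n_directions),
        "output": (batch_size, output_dim),
    }


shapes = trace_shapes(batch_size=128, sent_len=20, emb_dim=768,
                      hidden_dim=256, n_layers=2, bidirectional=True,
                      output_dim=6)
for name, shape in shapes.items():
    print(name, shape)
```

In the bidirectional case, hidden[-2] and hidden[-1] are the forward and backward final states of the top layer, which is why they are the two slices that get concatenated.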


Declaring the hyperparameters and instantiating our model.

In [50]:
HIDDEN_DIM = 256
OUTPUT_DIM = len(LABEL.vocab)
N_LAYERS = 2
BIDIRECTIONAL = True
DROPOUT = 0.5

model = BERTGRUSentiment(bert,
                         HIDDEN_DIM,
                         OUTPUT_DIM,
                         N_LAYERS,
                         BIDIRECTIONAL,
                         DROPOUT)

Freezing the pretrained BERT parameters: we use BERT as a fixed feature extractor (transfer learning) and train only the GRU and the linear layer.

In [51]:
for name, param in model.named_parameters():                
    if name.startswith('bert'):
        param.requires_grad = False
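
With BERT frozen, only the GRU and the linear layer are trained. Their sizes can be worked out by hand; the sketch below does so, assuming the standard GRU layout (3 gates, two bias vectors per gate), a BERT hidden size of 768 and 6 output classes. `gru_param_count` and `linear_param_count` are hypothetical helpers.

```python
def gru_param_count(input_dim, hidden_dim, n_layers, bidirectional):
    """Number of weights and biases in a (possibly stacked, bidirectional) GRU."""
    n_directions = 2 if bidirectional else 1
    total = 0
    layer_input = input_dim
    for _ in range(n_layers):
        # 3 gates, each with input weights, hidden weights and two bias vectors
        per_direction = 3 * (layer_input * hidden_dim
                             + hidden_dim * hidden_dim
                             + 2 * hidden_dim)
        total += per_direction * n_directions
        # the next layer sees the concatenated outputs of all directions
        layer_input = hidden_dim * n_directions
    return total


def linear_param_count(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


gru_params = gru_param_count(768, 256, n_layers=2, bidirectional=True)
head_params = linear_param_count(256 * 2, 6)
print(gru_params, head_params, gru_params + head_params)
```

That is under 3 million trainable parameters, a small fraction of the BERT weights that stay fixed.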

Defining our optimizer and our loss function

In [52]:
import torch.optim as optim

optimizer = optim.Adam(model.parameters())

In [53]:
criterion = nn.CrossEntropyLoss()
model = model.to(device)
criterion = criterion.to(device)
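
For intuition, cross-entropy on raw class scores is the negative log of the softmax probability assigned to the correct class, averaged over the batch. A plain-Python sketch of that formula (hypothetical helper names, not the PyTorch implementation):

```python
import math


def cross_entropy(logits, target):
    """-log softmax(logits)[target], computed stably via log-sum-exp."""
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_sum_exp - logits[target]


def batch_cross_entropy(batch_logits, targets):
    """Mean loss over a batch of examples."""
    losses = [cross_entropy(l, t) for l, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)


# a model that scores all 6 classes equally has loss log(6)
uniform_loss = cross_entropy([0.0] * 6, target=3)
# a confident, correct prediction has a loss close to 0
confident_loss = cross_entropy([0.0, 0.0, 0.0, 10.0, 0.0, 0.0], target=3)
print(uniform_loss, confident_loss)
```

A loss near log(6), about 1.79, therefore corresponds to guessing uniformly among the six classes.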

Accuracy function: the fraction of examples whose highest-scoring class matches the label

In [54]:
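def categorical_accuracy(preds, y):
    # preds = [batch size, n classes], y = [batch size]
    max_preds = preds.argmax(dim = 1)  # index of the highest-scoring class
    correct = max_preds.eq(y)
    # divide by a Python int so the result stays on the same device as preds
    return correct.sum().float() / y.shape[0]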
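
The same computation on plain Python lists, as a small worked example (the scores and labels are made up, and `categorical_accuracy_py` is a hypothetical name):

```python
def categorical_accuracy_py(scores, labels):
    """Fraction of rows whose arg-max column equals the label."""
    predictions = [max(range(len(row)), key=row.__getitem__) for row in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


scores = [
    [0.1, 2.0, -1.0],   # arg-max 1
    [1.5, 0.2, 0.3],    # arg-max 0
    [0.0, 0.1, 0.9],    # arg-max 2
    [0.4, 0.3, 0.2],    # arg-max 0
]
labels = [1, 0, 1, 0]
print(categorical_accuracy_py(scores, labels))
```

Three of the four predictions match their labels; only the third row is wrong.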

Training Function

In [55]:
def train(model, iterator, optimizer, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    
    model.train()
    
    for batch in iterator:
        
        optimizer.zero_grad()
        
        predictions = model(batch.text)
        
        loss = criterion(predictions, batch.label)
        
        acc = categorical_accuracy(predictions, batch.label)
        
        loss.backward()
        
        optimizer.step()
        
        epoch_loss += loss.item()
        epoch_acc += acc.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)


Evaluation function

In [56]:
def evaluate(model, iterator, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    
    model.eval()
    
    with torch.no_grad():
    
        for batch in iterator:

            predictions = model(batch.text)
            
            loss = criterion(predictions, batch.label)
            
            acc = categorical_accuracy(predictions, batch.label)

            epoch_loss += loss.item()
            epoch_acc += acc.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)
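
Both functions average per-batch metrics, so a smaller final batch counts as much as a full one. The sketch below shows the size of that effect for a 500-example set with batch size 128, assuming batches of 128, 128, 128 and 116 examples; the per-batch correct counts are hypothetical.

```python
def mean_of_batch_means(batch_correct, batch_sizes):
    """What the functions above compute: average of per-batch accuracies."""
    per_batch = [c / n for c, n in zip(batch_correct, batch_sizes)]
    return sum(per_batch) / len(per_batch)


def sample_weighted_mean(batch_correct, batch_sizes):
    """Accuracy over all examples, each counted once."""
    return sum(batch_correct) / sum(batch_sizes)


batch_sizes = [128, 128, 128, 116]     # 500 examples in total
batch_correct = [120, 118, 115, 100]   # hypothetical correct counts

print(mean_of_batch_means(batch_correct, batch_sizes))
print(sample_weighted_mean(batch_correct, batch_sizes))
```

The two values differ only slightly here, and they coincide whenever all batches have the same size.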

Function to measure the elapsed time of each epoch during training

In [57]:
import time

def epoch_time(start_time, end_time):
    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time / 60)
    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
    return elapsed_mins, elapsed_secs
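
For reference, the same minutes-and-seconds split can be written with divmod. This is only an equivalent sketch for non-negative durations, not a required change.

```python
def epoch_time(start_time, end_time):
    """Split an elapsed duration in seconds into whole minutes and seconds."""
    elapsed = int(end_time - start_time)   # drop the fractional second
    return divmod(elapsed, 60)             # (minutes, seconds)


mins, secs = epoch_time(0.0, 137.5)
print(f"{mins}m {secs}s")
```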

Training...

In [58]:

N_EPOCHS = 5

best_valid_loss = float('inf')

for epoch in range(N_EPOCHS):

    start_time = time.time()
    
    train_loss, train_acc = train(model, train_iterator, optimizer, criterion)
    valid_loss, valid_acc = evaluate(model, valid_iterator, criterion)
    
    end_time = time.time()

    epoch_mins, epoch_secs = epoch_time(start_time, end_time)
    
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'tut5-model.pt')
    
    print(f'Epoch: {epoch+1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s')
    print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
    print(f'\t Val. Loss: {valid_loss:.3f} |  Val. Acc: {valid_acc*100:.2f}%')

Epoch: 01 | Epoch Time: 0m 17s
	Train Loss: 1.055 | Train Acc: 58.90%
	 Val. Loss: 0.535 |  Val. Acc: 81.24%
Epoch: 02 | Epoch Time: 0m 17s
	Train Loss: 0.432 | Train Acc: 84.87%
	 Val. Loss: 0.415 |  Val. Acc: 85.91%
Epoch: 03 | Epoch Time: 0m 18s
	Train Loss: 0.317 | Train Acc: 89.08%
	 Val. Loss: 0.417 |  Val. Acc: 86.84%
Epoch: 04 | Epoch Time: 0m 17s
	Train Loss: 0.267 | Train Acc: 90.48%
	 Val. Loss: 0.444 |  Val. Acc: 85.87%
Epoch: 05 | Epoch Time: 0m 17s
	Train Loss: 0.222 | Train Acc: 92.50%
	 Val. Loss: 0.383 |  Val. Acc: 86.92%
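
The loop saves a checkpoint only when the validation loss improves. Replaying the validation losses printed above shows which epochs wrote the file (`checkpoint_epochs` is a hypothetical helper that mirrors the if-statement in the loop):

```python
def checkpoint_epochs(valid_losses):
    """Return the epochs at which a new best model would be saved, and the best loss."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(valid_losses, start=1):
        if loss < best:
            best = loss
            saved.append(epoch)
    return saved, best


valid_losses = [0.535, 0.415, 0.417, 0.444, 0.383]   # from the log above
saved, best = checkpoint_epochs(valid_losses)
print(saved, best)
```

So the file on disk after training holds the epoch 5 weights, even though validation loss rose in epochs 3 and 4.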


Loading the best-performing model (lowest validation loss) and evaluating it on the test set.

In [59]:
model.load_state_dict(torch.load('tut5-model.pt'))

test_loss, test_acc = evaluate(model, test_iterator, criterion)

print(f'Test Loss: {test_loss:.3f} | Test Acc: {test_acc*100:.2f}%')

Test Loss: 0.277 | Test Acc: 91.88%


Hooray! We achieved close to 92% accuracy on the test set.

Function to predict the question class of a custom sentence using our model

In [62]:

def predict_class(model, tokenizer, sentence):
    model.eval()
    tokens = tokenizer.tokenize(sentence)
    tokens = tokens[:max_input_length-2]
    indexed = [init_token_idx] + tokenizer.convert_tokens_to_ids(tokens) + [eos_token_idx]
    tensor = torch.LongTensor(indexed).to(device)
    tensor = tensor.unsqueeze(0)
    prediction = model(tensor).argmax(dim=1)
    return prediction.item()

Testing the model on our own questions

In [67]:
pred_class = predict_class(model,tokenizer, "Who is Kushal?")
print(f'Predicted class is: {pred_class} = {LABEL.vocab.itos[pred_class]}')

Predicted class is: 0 = HUM


In [68]:
pred_class = predict_class(model,tokenizer, "How many hours are there in a day?")
print(f'Predicted class is: {pred_class} = {LABEL.vocab.itos[pred_class]}')

Predicted class is: 3 = NUM


In [69]:
pred_class = predict_class(model,tokenizer, "Where is India?")
print(f'Predicted class is: {pred_class} = {LABEL.vocab.itos[pred_class]}')

Predicted class is: 4 = LOC


In [70]:

pred_class = predict_class(model,tokenizer, "What does UNO stands for?")
print(f'Predicted class is: {pred_class} = {LABEL.vocab.itos[pred_class]}')

Predicted class is: 5 = ABBR
