# Benchmarking LSTMs

This notebook contains experiments on three sequence tasks:
* __Sequence Labelling__
    * A many-to-one, _Sentiment Analysis_ task on the IMDb dataset available with _torchtext_.
* __Sequence to Sequence - Same__
    * A many-to-many-same, _Predict missing word_ task on the Facebook bAbI dataset.
* __Sequence to Sequence - Different__
    * A many-to-many-different, _NTM toy_ task.

For each task, we run our implementation of a vanilla LSTM.

### Loading dependencies

In [0]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchtext.datasets import IMDB
from torchtext import data
from torchtext.vocab import GloVe

import os
import json
import time
import random
import copy
import numpy as np
from matplotlib import pyplot as plt
from sklearn.metrics import confusion_matrix, f1_score, classification_report

from IPython.display import Image

from seq_label import SeqLabel


device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# device = torch.device('cpu')

# Seed every random number generator in use so runs are repeatable
SEED = 1
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

# Task 1: Sequence Labelling

All code for this task carries the prefix or suffix **SeqLabel**.

### Common Parameters

In [0]:
batch_size = 16
# Fraction of the training data kept for training (the rest is used for validation)
split_ratio = 0.8
learning_rate = 0.001
epochs = 200
# Embedding dimension of each input token (GloVe vector size)
embed_dim = 300

### Loading and preparing IMDb data

In [3]:
from imdb import IMDB_dataset

imdb = IMDB_dataset(split_ratio, SEED)
imdb.load(verbose = True)
imdb.build_vocab(embed_dim)
train_loader, valid_loader, test_loader = imdb.create_data_loader(batch_size, 
                                                                  device)
vocab_len = len(imdb.TEXT.vocab)

Training data size:    20000
Validation data size:  5000
Test data size:        25000
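
The sizes printed above follow from `split_ratio`. The sketch below reproduces them with a hypothetical helper, `split_sizes`, assuming the split is taken over the 25,000 labelled IMDb training reviews; how `IMDB_dataset` performs the split internally is not shown in this notebook.

```python
# Hypothetical helper (not part of imdb.py): derive train/validation sizes
# from the total number of training examples and split_ratio.
def split_sizes(n_examples, split_ratio):
    n_train = int(round(n_examples * split_ratio))
    return n_train, n_examples - n_train


train_size, valid_size = split_sizes(25000, 0.8)
print("Training data size:   ", train_size)
print("Validation data size: ", valid_size)
```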


# LSTM

#### LSTM Parameters

In [0]:
# Number of hidden nodes
hidden_dim = 256
# Number of output nodes
output_dim = 1
# Number of LSTM layers to be stacked
layers = 1
# Boolean value: whether the LSTM is bidirectional
bidirectional = False
# Boolean value to use LayerNorm or not
layernorm = False
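
For reference, the standard LSTM update that `hidden_dim` refers to is given below. This is the textbook formulation, assuming `LSTMSeqLabel` follows it; the implementation in `seq_label.py` is not shown here and may differ (for example when `layernorm` is enabled). Here $x_t$ is the embedded input, $h_t$ and $c_t$ are the hidden and cell states of size `hidden_dim`, $\sigma$ is the logistic sigmoid and $\odot$ is element-wise multiplication.

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
g_t &= \tanh(W_g x_t + U_g h_{t-1} + b_g) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$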

### Our implementation

In [0]:
# Our implementation

from seq_label import LSTMSeqLabel

# Initializing model
model = LSTMSeqLabel(vocab_len, embed_dim, hidden_dim, output_dim, 
                      imdb.pretrained_weights, layers, bidirectional,
                      layernorm)
model.to(device)

print('Model parameters: ', model.count_parameters())

# Initializing optimizer and loss
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
loss_criterion = nn.BCEWithLogitsLoss()

# Initializing task
task = SeqLabel(model, optimizer, loss_criterion, device)

# Training
freq = 5    # epoch interval to calculate F1 score and save models
out_dir = "results/seq_label/"
# out_dir = "/content/drive/My Drive/colab/seq_label/"
model, stats = task.train(epochs, train_loader, valid_loader, freq, out_dir)

print("=" * 50)
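
With `output_dim = 1` and `nn.BCEWithLogitsLoss`, the model emits one raw logit per review and the loss applies the sigmoid internally. The following is a minimal pure-Python sketch of that quantity (mean reduction) and of a 0.5 decision threshold. The function names and the example values are illustrative assumptions; the thresholding actually used inside `SeqLabel` is not shown in this notebook.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy computed directly from logits.

    Uses the numerically stable form max(x, 0) - x*y + log(1 + exp(-|x|)).
    """
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


def predict(logits, threshold=0.5):
    """Turn logits into 0/1 labels by thresholding the sigmoid probability."""
    return [1 if sigmoid(x) >= threshold else 0 for x in logits]


# Illustrative values only
logits = [2.0, -1.0, 0.3, -3.0]
targets = [1, 0, 0, 1]
loss = bce_with_logits(logits, targets)
preds = predict(logits)
print(loss, preds)
```

Thresholding the probability at 0.5 is equivalent to checking whether the logit is non-negative, so the sigmoid can be skipped at prediction time.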

In [0]:
# Testing
f1_test = task.evaluate(test_loader, verbose=True)

### PyTorch implementation

In [0]:
# PyTorch implementation

from seq_label import PyTorchBaseline

# Initializing model
model = PyTorchBaseline(vocab_len, embed_dim, hidden_dim, output_dim, 
                       imdb.pretrained_weights, layers, bidirectional)
model.to(device)

print('Model parameters: ', model.count_parameters())

# Initializing optimizer and loss
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
loss_criterion = nn.BCEWithLogitsLoss()

# Initializing task
task = SeqLabel(model, optimizer, loss_criterion, device)

# Training
freq = 5    # epoch interval to calculate F1 score and save models
out_dir = "results/seq_label/pytorch/"
# out_dir = "/content/drive/My Drive/colab/seq_label/"
model, stats = task.train(epochs, train_loader, valid_loader, freq, out_dir)

print("=" * 50)

Model parameters:  571649
Beginning training model with 571649 parameters

Epoch #1: Batch 2500/2500 -- Loss = 0.6400628089904785; Time taken: 0.016690731048583984s
Epoch #1: Average loss is 0.688916947054863
Epoch #1: Train F1-score is 0.11512942715634053
Epoch #1: Validation F1-score is 0.5475675675675676
Time taken for epoch: 117.011958360672s

Epoch #2: Batch 2500/2500 -- Loss = 0.3095054626464844; Time taken: 0.016431331634521484s
Epoch #2: Average loss is 0.6341397064268589
Time taken for epoch: 77.60263323783875s

Epoch #3: Batch 2500/2500 -- Loss = 0.12923932075500488; Time taken: 0.02263617515563965s
Epoch #3: Average loss is 0.359573724719882
Time taken for epoch: 76.62852096557617s

Epoch #4: Batch 2500/2500 -- Loss = 0.09730026125907898; Time taken: 0.02712535858154297s
Epoch #4: Average loss is 0.30158056974858044
Time taken for epoch: 77.24223208427429s

Epoch #5: Batch 2500/2500 -- Loss = 0.39330965280532837; Time taken: 0.013465404510498047s
Epoch #5: Average loss is 0.

KeyboardInterrupt: ignored
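
The reported 571649 parameters can be checked by hand. The sketch below assumes that `PyTorchBaseline` wraps a single-layer, unidirectional `nn.LSTM` followed by one linear output layer, and that `count_parameters()` excludes the pretrained embedding; neither is shown in this notebook. The helper names are hypothetical. `nn.LSTM` stores four gate blocks with two bias vectors each.

```python
def lstm_param_count(input_dim, hidden_dim, layers=1, bidirectional=False):
    """Parameter count of a stacked LSTM laid out like torch.nn.LSTM."""
    directions = 2 if bidirectional else 1
    total = 0
    for layer in range(layers):
        in_dim = input_dim if layer == 0 else hidden_dim * directions
        weights = 4 * hidden_dim * (in_dim + hidden_dim)  # weight_ih + weight_hh
        biases = 2 * 4 * hidden_dim                       # bias_ih + bias_hh
        total += directions * (weights + biases)
    return total


def linear_param_count(in_features, out_features):
    return in_features * out_features + out_features


embed_dim, hidden_dim, output_dim = 300, 256, 1
total = (lstm_param_count(embed_dim, hidden_dim)
         + linear_param_count(hidden_dim, output_dim))
print("Model parameters: ", total)
```

Under these assumptions the LSTM contributes 571392 parameters and the output layer 257, which together match the printed count.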

In [0]:
# Testing
f1_test = task.evaluate(test_loader, verbose=True)

# Transformer

### Our implementation

In [4]:
from seq_label import TransformerSeqLabel
from transformer import NoamOpt

# ~171k parameters
model = TransformerSeqLabel(in_dim=vocab_len, out_dim=1, N=1, heads=4, embed_dim=embed_dim, model_dim=128, ff_dim=256, 
                            key_dim=32, value_dim=32, batch_first=False,
                            pretrained_vec=imdb.pretrained_weights)

model = model.to(device)

print('Model parameters: ', model.count_parameters())

Model parameters:  171137


In [6]:
# Initializing optimizer and loss
# optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
optimizer = NoamOpt(model.model_dim, 1, 2000,
        torch.optim.Adam(model.parameters(), lr=0, betas=(0.9, 0.98), eps=1e-9))
loss_criterion = nn.BCEWithLogitsLoss()

# Initializing task
task = SeqLabel(model, optimizer, loss_criterion, device)

# Training
freq = 5    # epoch interval to calculate F1 score and save models
out_dir = "results/seqLabel/transformer/"
model, stats = task.train(150, train_loader, valid_loader, freq, out_dir)

print("=" * 50)

# Testing
f1_test = task.evaluate(test_loader, verbose=True)

Beginning training model with 171137 parameters


Epoch #1: Average loss is 0.47241753222346305
Epoch #1: Train F1 is 0.8475949367088608
Epoch #1: Validation F1 is 0.8491663435440092
Time taken for epoch: 77.926922082901s


Epoch #2: Average loss is 0.3745592767238617
Time taken for epoch: 52.81982731819153s


Epoch #3: Average loss is 0.33830806418061254
Time taken for epoch: 53.295148611068726s


Epoch #4: Average loss is 0.3065046506434679
Time taken for epoch: 54.342206716537476s


Epoch #5: Average loss is 0.28564162150919437
Epoch #5: Train F1 is 0.8827127940090708
Epoch #5: Validation F1 is 0.8563334682314853
Time taken for epoch: 76.9979019165039s


Epoch #6: Average loss is 0.2663201034232974
Time taken for epoch: 53.23896503448486s


Epoch #7: Average loss is 0.2516104934856296
Time taken for epoch: 53.220128774642944s


Epoch #8: Average loss is 0.23765434447824954
Time taken for epoch: 53.5066499710083s


Epoch #9: Average loss is 0.2253943543717265
Time taken for epoch: 53

KeyboardInterrupt: ignored
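
The `NoamOpt` wrapper used above is not shown in this notebook. Assuming it follows the usual "Noam" schedule from the Transformer literature, with arguments in the order (model size, factor, warmup steps, optimizer), the learning rate rises linearly for `warmup` steps and then decays with the inverse square root of the step. A minimal sketch of that schedule, with `noam_rate` as a hypothetical stand-alone function:

```python
def noam_rate(step, model_dim, factor=1.0, warmup=2000):
    """lr = factor * model_dim^-0.5 * min(step^-0.5, step * warmup^-1.5).

    `step` counts optimizer updates and must start at 1.
    """
    return factor * model_dim ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)


# Values matching the call above: model_dim=128, factor=1, warmup=2000
for step in (1, 500, 2000, 8000):
    print(step, noam_rate(step, model_dim=128))
```

Under this assumption the rate peaks at step 2000, at just under 0.002, which is why the inner Adam optimizer can be created with `lr=0`: the wrapper overwrites the rate on every step.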

In [0]:
# !zip -r results-seq_label.zip results/seqLabel/

# from google.colab import files
# files.download('results-seq_label.zip')

In [14]:
# Testing
f1_test = task.evaluate(test_loader, verbose=True)
print('F1 score: ', f1_test)

Confusion Matrix: 
 [[10179  2321]
 [ 1361 11139]]
Classification Report: 
               precision    recall  f1-score   support

         0.0       0.88      0.81      0.85     12500
         1.0       0.83      0.89      0.86     12500

    accuracy                           0.85     25000
   macro avg       0.85      0.85      0.85     25000
weighted avg       0.85      0.85      0.85     25000

F1 score:  (0.8581664098613251, 0.4305374167611678)
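
The headline metrics can be recomputed from the confusion matrix alone. The sketch below assumes the scikit-learn layout (rows are true labels, columns are predicted labels) and treats class 1 as the positive class; both row sums equal the support of 12500, which is consistent with that layout.

```python
# Confusion matrix copied from the output above
cm = [[10179, 2321],
      [1361, 11139]]

tn, fp = cm[0]   # true class 0: correctly rejected, wrongly flagged positive
fn, tp = cm[1]   # true class 1: missed positives, correctly flagged positive

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print("precision:", round(precision, 2))
print("recall:   ", round(recall, 2))
print("F1:       ", f1)
print("accuracy: ", round(accuracy, 2))
```

The resulting F1 agrees with the first element of the tuple returned by `task.evaluate`.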
