# Long Short-Term Memory Network

*Jibin Mathew, PyTorch Artificial Intelligence Fundamentals (2020), p82*:

"...LSTM networks are a type of RNNs that has internal gates that helps in better information persistence. These gates are tiny neural networks that control when information needs to be saved and when it can be erased or forgotten. RNNs suffer from vanishing and exploding gradients, making it difficult to learn long-term dependencies. LSTMs are resistant to exploding and vanishing gradients, although it is still mathematically possible."
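To make the gate description above concrete, here is a minimal sketch of one LSTM step on scalars, following the standard LSTM equations. The function name `lstm_step` and the weight layout are hypothetical and chosen only for illustration; `nn.LSTM` does the same computation on vectors with learned weight matrices.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One step of a scalar LSTM cell; `w` maps a gate name to (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, b = w[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))       # how much of the old cell state to keep
    i = sigmoid(pre("input"))        # how much of the new candidate to write
    o = sigmoid(pre("output"))       # how much of the cell state to expose
    g = math.tanh(pre("candidate"))  # candidate value to write
    c = f * c_prev + i * g           # additive cell-state update
    h = o * math.tanh(c)
    return h, c

# Hand-picked weights (an assumption): forget gate open, input gate closed
weights = {
    "forget": (0.0, 0.0, 10.0),
    "input": (0.0, 0.0, -10.0),
    "output": (0.0, 0.0, 10.0),
    "candidate": (1.0, 0.0, 0.0),
}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=0.7, w=weights)
print(c)  # stays very close to the previous cell state 0.7
```

With the forget gate near 1 and the input gate near 0, the cell state is carried over almost unchanged, which is the mechanism that lets information persist over many steps.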

In [1]:
tokenizer = lambda words: words.split()
tokenizer("This is a test for the tokenizer.")

['This', 'is', 'a', 'test', 'for', 'the', 'tokenizer.']
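Note that `str.split()` leaves punctuation attached to the neighbouring word (`'tokenizer.'` above). A possible alternative, sketched here with the standard `re` module, splits punctuation into separate tokens. The name `regex_tokenizer` is hypothetical and not part of the book's recipe.

```python
import re

def regex_tokenizer(text):
    # Runs of word characters, or single punctuation marks, become tokens
    return re.findall(r"\w+|[^\w\s]", text)

tokens = regex_tokenizer("This is a test for the tokenizer.")
print(tokens)
# ['This', 'is', 'a', 'test', 'for', 'the', 'tokenizer', '.']
```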

In [11]:
import torch
# torchtext <= 0.8 API; in torchtext 0.9-0.11 these classes live under torchtext.legacy.data
from torchtext.data import Field

REVIEW = Field(sequential=True, tokenize=tokenizer, lower=True)
LABEL = Field(sequential=False, use_vocab=False)

## Developing a dataset

In [7]:
from torchtext.data import TabularDataset

train_datafields = [
    ("id", None),
    ("content", REVIEW),
    ("Business", LABEL),
    ("SciTech", LABEL),
    ("Sports", LABEL),
    ("World", LABEL)]

test_datafields = [
    ("id", None),
    ("content", REVIEW)]

train, valid = TabularDataset.splits(
    path='/Users/nikolavetnic/Desktop/Text Materials/DeepLearning/[AI] Jibin Mathew - PyTorch Artificial Intelligence Fundamentals (2020)/Chapter 4/',
    train='train.csv',
    validation='valid.csv',
    format='csv',
    skip_header=True,
    fields=train_datafields)

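# The original call named no file, so splits() returned an empty tuple.
# 'test.csv' is an assumed filename; adjust it to the actual test file.
test, = TabularDataset.splits(
    path='/Users/nikolavetnic/Desktop/Text Materials/DeepLearning/[AI] Jibin Mathew - PyTorch Artificial Intelligence Fundamentals (2020)/Chapter 4/',
    test='test.csv',
    format='csv',
    skip_header=True,
    fields=test_datafields)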

REVIEW.build_vocab(train, min_freq=2)
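The effect of `min_freq=2` can be illustrated with a rough pure-Python stand-in: tokens seen fewer than two times are left out of the vocabulary and later map to the unknown token. This is a simplified sketch, not torchtext's actual implementation; `build_vocab_sketch` and the toy corpus are hypothetical.

```python
from collections import Counter

def build_vocab_sketch(tokenized_texts, min_freq=1, specials=("<unk>", "<pad>")):
    counts = Counter(tok for text in tokenized_texts for tok in text)
    # Keep tokens that are frequent enough; most frequent first, ties alphabetical
    kept = sorted(
        (tok for tok, n in counts.items() if n >= min_freq),
        key=lambda tok: (-counts[tok], tok),
    )
    itos = list(specials) + kept                       # index -> token
    stoi = {tok: idx for idx, tok in enumerate(itos)}  # token -> index
    return itos, stoi

corpus = [["the", "game", "ended"], ["the", "market", "fell"], ["game", "over"]]
itos, stoi = build_vocab_sketch(corpus, min_freq=2)

unk = stoi["<unk>"]
ids = [stoi.get(tok, unk) for tok in ["the", "market", "game"]]
print(itos)  # ['<unk>', '<pad>', 'game', 'the']
print(ids)   # [3, 0, 2]
```

Raising `min_freq` shrinks the embedding table at the cost of mapping more rare words to `<unk>`.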

## Developing iterators

In [10]:
from torchtext.data import BucketIterator

BATCH_SIZE = 128

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

train_iter, valid_iter, test_iter = BucketIterator.splits(
    (train, valid, test),
    batch_size=BATCH_SIZE,
    device=device,
    sort_key=lambda x: len(x.content),  # the text field is named "content" in the datafields
    sort_within_batch=False)
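The reason for bucketing by length is that each batch is padded to its longest example, so grouping similar lengths wastes fewer padded positions. The following is a simplified sketch of that idea in plain Python; it is not how `BucketIterator` is implemented internally, and `bucket_batches` and `padding_needed` are hypothetical helpers.

```python
import random

def bucket_batches(lengths, batch_size, seed=0):
    """Group example indices into batches of similar length, then shuffle the batches."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    random.Random(seed).shuffle(batches)
    return batches

def padding_needed(lengths, batches):
    # Padded positions when every batch is padded to its longest example
    return sum(max(lengths[i] for i in b) - lengths[i] for b in batches for i in b)

lengths = [3, 50, 4, 48, 5, 52, 2, 49]         # token counts of eight toy examples
bucketed = bucket_batches(lengths, batch_size=4)
naive = [list(range(0, 4)), list(range(4, 8))]  # batches in file order

print(padding_needed(lengths, bucketed), padding_needed(lengths, naive))
# 15 195
```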

## Exploring word embeddings

In [19]:
from torchtext.vocab import Vectors

# loading the pretrained embedding vectors
vec = Vectors(
    'glove.6B.100d.txt',
    cache='./vec/glove_embedding/',
    url='http://nlp.stanford.edu/data/glove.6B.zip')

REVIEW.build_vocab(train, min_freq=2, vectors=vec)

./vec/glove_embedding/glove.6B.zip: 862MB [06:48, 2.11MB/s]                               
100%|█████████▉| 399441/400000 [00:29<00:00, 13623.07it/s]
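`build_vocab(..., vectors=vec)` attaches the GloVe matrix to the vocabulary as `REVIEW.vocab.vectors`, but an `nn.Embedding` layer does not pick it up on its own. One possible way to initialize a layer from such a matrix is sketched below; a small random tensor stands in for `REVIEW.vocab.vectors` so the snippet runs by itself.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for REVIEW.vocab.vectors: shape [vocab_size, embedding_dim]
pretrained = torch.randn(10, 100)

# freeze=False keeps the vectors trainable so they can be fine-tuned
embedding = nn.Embedding.from_pretrained(pretrained, freeze=False)

token_ids = torch.tensor([[1, 2], [3, 4], [5, 6]])  # [seq_len, batch]
embedded = embedding(token_ids)
print(embedded.shape)  # torch.Size([3, 2, 100])
```

For a model that already owns an embedding layer of matching shape, copying the matrix in place with `model.embedding.weight.data.copy_(REVIEW.vocab.vectors)` should have the same effect.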

## Building an LSTM network

In [20]:
import torch.nn as nn

class LSTMClassifier(nn.Module):
    
    def __init__(self, embedding_dim, hidden_dim, output_dim, dropout):
        super().__init__()
        self.embedding = nn.Embedding(len(REVIEW.vocab), embedding_dim)
        self.rnn = nn.LSTM(embedding_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, output_dim)
        self.dropout = nn.Dropout(dropout)
    
    def forward(self, x):
        x = self.embedding(x)
        output, (hidden, cell) = self.rnn(x)
        hidden = self.dropout(hidden)
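        # hidden is [1, batch, hidden_dim]; drop the layer dimension so the output is [batch, output_dim]
        return self.fc(hidden.squeeze(0))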
    
EMBEDDING_DIM = 100
HIDDEN_DIM = 256
OUTPUT_DIM = 1
DROPOUT = 0.5
    
model = LSTMClassifier(EMBEDDING_DIM, HIDDEN_DIM, OUTPUT_DIM, DROPOUT)


## Building a multilayer LSTM

*Jibin Mathew, PyTorch Artificial Intelligence Fundamentals (2020), p86*:

"...In this recipe, we added `num_layers` and the parameter in the constructor to control the number of layers of LSTMs in the model, and passed it as a keyword argument, `num_layers`, in the LSTM definition.

Then, in the `forward()` method, we took the hidden state only from the last LSTM layer using `hidden[-1]` since the shape of the hidden state is `[num_layers * num_directions, batch, hidden_dim]`, where `num_direction` is `1` by default. This meant that `hidden[-1]` gave the last layer's hidden state. By doing this, we could choose `num_layers` as a hyperparameter. The hidden state output from the lower layer was passed as the input of the higher state."
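The shape claim above can be checked directly. This is a small sketch with a random tensor standing in for an embedded batch; the sequence length and batch size are arbitrary values chosen for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
SEQ_LEN, BATCH, EMB, HID, LAYERS = 7, 4, 100, 256, 2

lstm = nn.LSTM(EMB, HID, num_layers=LAYERS)
x = torch.randn(SEQ_LEN, BATCH, EMB)  # stand-in for self.embedding(x)

output, (hidden, cell) = lstm(x)
print(hidden.shape)  # torch.Size([2, 4, 256]) -> [num_layers, batch, hidden_dim]
print(output.shape)  # torch.Size([7, 4, 256]) -> top layer only, every time step

# The top layer's final hidden state equals the last time step of `output`
same = torch.allclose(hidden[-1], output[-1])
print(same)  # True
```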

In [24]:
class MultiLSTMClassifier(nn.Module):
    
    def __init__(self, embedding_dim, hidden_dim, output_dim, dropout, num_layers):
        super().__init__()
        self.embedding = nn.Embedding(len(REVIEW.vocab), embedding_dim)
        self.rnn = nn.LSTM(embedding_dim, hidden_dim, num_layers=num_layers)
        self.fc = nn.Linear(hidden_dim, output_dim)
        self.dropout = nn.Dropout(dropout)
    
    def forward(self, x):
        x = self.embedding(x)
        output, (hidden, cell) = self.rnn(x)
        hidden = self.dropout(hidden)
        return self.fc(hidden[-1])

EMBEDDING_DIM = 100
HIDDEN_DIM = 256
OUTPUT_DIM = 1
DROPOUT = 0.5
NUM_LAYERS = 2

model = MultiLSTMClassifier(EMBEDDING_DIM, HIDDEN_DIM, OUTPUT_DIM, DROPOUT, NUM_LAYERS)

## Building a bidirectional LSTM

*Jibin Mathew, PyTorch Artificial Intelligence Fundamentals (2020), p88*:

"...In this recipe, we set the `bidirectional` flag to `True` in the LSTM definition. We concatenated the hidden states of the forward and backward LSTMs and passed them into the fully connected layer. Because of this, the input dimension of the fully connected layer was doubled to accommodate the forward and backward hidden state tensors.

In the `forward()` method, we concatenated the forward and backward hidden states using `torch.cat()`, and we used the last hidden states of the forward and backward LSTMs. In PyTorch, the hidden states are stacked as `[forward_layer_0, backward_layer_0, forward_layer_1, backward_layer_1, ..., forward_layer_n, backward_layer_n]`, and so the required tensors are `hidden[-2,:,:]`, `hidden[-1,:,:]`. After concatenation, we passed the hidden vector into the fully connected layer after squeezing out the extra dimensions."
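The stacking order described above can be verified against `output`, which holds the top layer's forward and backward states side by side at every time step. A minimal sketch, again with a random tensor in place of an embedded batch and arbitrary sequence length and batch size:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
SEQ_LEN, BATCH, EMB, HID, LAYERS = 7, 4, 100, 256, 2

lstm = nn.LSTM(EMB, HID, num_layers=LAYERS, bidirectional=True)
x = torch.randn(SEQ_LEN, BATCH, EMB)  # stand-in for self.embedding(x)

output, (hidden, cell) = lstm(x)
print(hidden.shape)  # torch.Size([4, 4, 256]) -> [num_layers * 2, batch, hidden_dim]
print(output.shape)  # torch.Size([7, 4, 512]) -> forward and backward halves per step

# Top layer: the forward pass finishes at the last time step,
# the backward pass finishes at the first time step
fwd_matches = torch.allclose(hidden[-2], output[-1, :, :HID])
bwd_matches = torch.allclose(hidden[-1], output[0, :, HID:])
print(fwd_matches, bwd_matches)  # True True

features = torch.cat((hidden[-2], hidden[-1]), dim=1)
print(features.shape)  # torch.Size([4, 512]) -> input to the fully connected layer
```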

In [25]:
class BiLSTMClassifier(nn.Module):
    
    def __init__(self, embedding_dim, hidden_dim, output_dim, dropout, num_layers):
        super().__init__()
        self.embedding = nn.Embedding(len(REVIEW.vocab), embedding_dim)
        self.rnn = nn.LSTM(embedding_dim, hidden_dim, num_layers=num_layers, bidirectional=True)
        self.fc = nn.Linear(2*hidden_dim, output_dim)
        self.dropout = nn.Dropout(dropout)
    
    def forward(self, x):
        x = self.embedding(x)
        output, (hidden, cell) = self.rnn(x)
        hidden = self.dropout(torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim=1))
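        # hidden is already [batch, 2 * hidden_dim] after the concat, so no squeeze is needed
        # (squeeze(0) would also drop the batch dimension whenever the batch size is 1)
        return self.fc(hidden)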

EMBEDDING_DIM = 100
HIDDEN_DIM = 256
OUTPUT_DIM = 1
DROPOUT = 0.5
NUM_LAYERS = 2
    
model = BiLSTMClassifier(EMBEDDING_DIM, HIDDEN_DIM, OUTPUT_DIM, DROPOUT, NUM_LAYERS)