### Introduction

We'll be using a **recurrent neural network** (RNN), as RNNs are commonly used for analysing sequences. An RNN takes in a sequence of words, $X=\{x_1, ..., x_T\}$, one at a time, and produces a _hidden state_, $h$, for each word. We use the RNN _recurrently_ by feeding in the current word $x_t$ together with the hidden state from the previous word, $h_{t-1}$, to produce the next hidden state, $h_t$.

$$h_t = \text{RNN}(x_t, h_{t-1})$$

Once we have our final hidden state, $h_T$, (from feeding in the last word in the sequence, $x_T$) we feed it through a linear layer, $f$, (also known as a fully connected layer), to receive our predicted sentiment, $\hat{y} = f(h_T)$.
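
The recurrence and the final linear layer can be sketched by unrolling the loop by hand. This is a minimal illustration using `nn.RNNCell`, not the model built later in this notebook; the sizes and the random input are hypothetical values chosen only to keep the example small.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
embedding_dim, hidden_size, seq_len = 4, 3, 5

cell = nn.RNNCell(embedding_dim, hidden_size)  # computes h_t from (x_t, h_{t-1})
f = nn.Linear(hidden_size, 1)                  # the linear layer f

x = torch.randn(seq_len, 1, embedding_dim)     # T = 5 word vectors, batch size 1
h = torch.zeros(1, hidden_size)                # initial hidden state h_0

for t in range(seq_len):
    h = cell(x[t], h)                          # h_t = RNN(x_t, h_{t-1})

y_hat = f(h)                                   # y_hat = f(h_T)
print(h.shape, y_hat.shape)
```

Only the last value of `h` reaches the linear layer, which is why the prediction has one value per sequence regardless of its length.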

**In this example we feed the final hidden state from the RNN into the fully connected layer. We could instead feed the RNN's output into the fully connected layer, which may give better accuracy. As a rule of thumb, classification tasks often use the output, while encoder-decoder tasks use the hidden state.**

**The output is the stack of hidden states from every time step, whereas hidden is only the final hidden state.**

In [228]:
import torch
from torchtext import data
from torchtext import datasets
import random
SEED = 1234

torch.manual_seed(SEED)
torch.backends.cudnn.deterministic = True


## Getting and transforming data

In [278]:
TEXT = data.Field(tokenize='spacy',batch_first=True)
LABEL = data.LabelField(dtype=torch.float)

`LABEL` is defined by a `LabelField`, a special subclass of the `Field` class used specifically for handling labels. TorchText sets tensors to be LongTensors by default, but our criterion expects both inputs to be FloatTensors. Setting `dtype` to `torch.float` handles this for us. The alternative would be to do the conversion inside the train function by passing `batch.label.float()` instead of `batch.label` to the criterion.
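
The alternative conversion can be sketched on its own, without TorchText. The logits and labels below are hypothetical stand-ins for `predictions` and `batch.label`.

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()

logits = torch.tensor([0.5, -1.0, 2.0])  # hypothetical raw model outputs
labels = torch.tensor([1, 0, 1])         # integer (Long) labels, as without dtype=torch.float

# Convert the labels to floats before handing them to the criterion.
loss = criterion(logits, labels.float())
print(labels.dtype, labels.float().dtype, loss.item())
```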

In [279]:
train_data, test_data = datasets.IMDB.splits(TEXT, LABEL)

In [280]:
# random.seed() returns None, so seed first and pass the RNG state explicitly
random.seed(SEED)
train_data, valid_data = train_data.split(split_ratio=0.7, random_state=random.getstate())

In [281]:
print(f'Number of training examples: {len(train_data)}')
print(f'Number of validation examples: {len(valid_data)}')
print(f'Number of testing examples: {len(test_data)}')

Number of training examples: 17500
Number of validation examples: 7500
Number of testing examples: 25000


In [282]:
TEXT.build_vocab(train_data, max_size = 25000)
LABEL.build_vocab(train_data)

In [283]:
print(f"Unique tokens in TEXT vocabulary: {len(TEXT.vocab)}")
print(f"Unique tokens in LABEL vocabulary: {len(LABEL.vocab)}")

Unique tokens in TEXT vocabulary: 25002
Unique tokens in LABEL vocabulary: 2
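
The TEXT vocabulary has 25002 entries rather than 25000 because two special tokens, `<unk>` and `<pad>`, are added on top of the `max_size` most frequent words. The following is a simplified pure-Python sketch of that idea, not TorchText's actual implementation; `build_vocab_sketch` and the toy sentences are hypothetical.

```python
from collections import Counter

def build_vocab_sketch(tokenized_texts, max_size, specials=("<unk>", "<pad>")):
    """Keep the max_size most frequent tokens, with the special tokens placed first."""
    counts = Counter(tok for text in tokenized_texts for tok in text)
    itos = list(specials) + [tok for tok, _ in counts.most_common(max_size)]
    stoi = {tok: i for i, tok in enumerate(itos)}
    return itos, stoi

texts = [["a", "good", "film"], ["a", "bad", "film"], ["good", "good"]]
itos, stoi = build_vocab_sketch(texts, max_size=3)

print(len(itos))                    # max_size + 2 special tokens
print(stoi["<unk>"], stoi["<pad>"])
print("bad" in stoi)                # the rarest word falls outside the vocabulary
```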


In [284]:
LABEL.vocab.stoi

defaultdict(None, {'neg': 0, 'pos': 1})

In [285]:
number_of_vocabs = len(TEXT.vocab)
embedding_dim = 300

In [286]:
BATCH_SIZE = 64

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

train_iterator, valid_iterator, test_iterator = data.BucketIterator.splits(
    (train_data, valid_data, test_data),
    batch_size=BATCH_SIZE,
    device=device)

In [287]:
trial_input = None
for batch in train_iterator:
    print("batch.text:")
    print(batch.text)
    print("batch.text.size:")
    print(batch.text.size())
    print("batch.label:")
    print(batch.label, len(batch.label))
    trial_input = batch.text
    break

batch.text:
tensor([[  261,    19,  4234,  ...,     1,     1,     1],
        [  314,    11,    19,  ...,     1,     1,     1],
        [  149, 10066,  4247,  ...,     1,     1,     1],
        ...,
        [  377,   170,    24,  ...,     1,     1,     1],
        [12801,   472, 16997,  ...,     1,     1,     1],
        [  377,    59,   303,  ...,     1,     1,     1]], device='cuda:0')
batch.text.size:
torch.Size([64, 932])
batch.label:
tensor([0., 1., 1., 1., 0., 0., 1., 0., 0., 1., 1., 1., 1., 1., 0., 1., 0., 0.,
        1., 1., 1., 1., 1., 1., 0., 0., 0., 1., 1., 0., 0., 1., 0., 0., 1., 1.,
        1., 0., 1., 0., 0., 0., 1., 0., 1., 0., 1., 1., 1., 0., 0., 0., 1., 0.,
        1., 0., 1., 1., 0., 1., 0., 1., 1., 0.], device='cuda:0') 64
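
The trailing 1s in `batch.text` above are padding: every review in a batch is padded to the length of the longest one, and index 1 is the `<pad>` token (assuming TorchText's default special-token order). A minimal sketch of the same effect with `pad_sequence`, using hypothetical short sequences:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

PAD_IDX = 1  # assumed index of '<pad>' in the vocabulary

# Three hypothetical tokenised reviews of different lengths.
seqs = [torch.tensor([261, 19, 4234]),
        torch.tensor([314, 11]),
        torch.tensor([149])]

# Pad every sequence to the length of the longest one in the batch.
batch = pad_sequence(seqs, batch_first=True, padding_value=PAD_IDX)
print(batch.size())
print(batch)
```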


## Creating our Model

### Feeding RNN's output or hidden state to fc layer?

`nn.RNN` returns `(output, hidden)`. Do we feed output or hidden into the final linear layer?

We can feed either one into the linear layer. The output is the stack of hidden states from every time step, whereas hidden is only the final hidden state. With `batch_first=True`, the size of output is `[batch_size, seq_len, hidden_size]`, whereas hidden is `[num_layers * num_directions, batch_size, hidden_size]`, which is `[1, batch_size, hidden_size]` for the single-layer, unidirectional RNN used here.

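1. If we use hidden, remove the leading dimension to get `[batch_size, hidden_size]`:
```hidden = hidden.squeeze(0)```

2. If we use output, take the last time step to get `[batch_size, hidden_size]`:
```output = output[:,-1,:]```

Either way, we can now feed the result into the fc layer.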
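
The two options can be compared directly on a standalone `nn.RNN`. This is a small sketch with a random input; the batch size and sequence length are hypothetical, while the embedding and hidden sizes match the model defined below.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

batch_size, seq_len, embedding_dim, hidden_size = 2, 7, 300, 200

rnn = nn.RNN(input_size=embedding_dim, hidden_size=hidden_size, batch_first=True)
x = torch.randn(batch_size, seq_len, embedding_dim)  # stand-in for an embedded batch

output, hidden = rnn(x)
print(output.shape)  # torch.Size([2, 7, 200])
print(hidden.shape)  # torch.Size([1, 2, 200])

# For a single-layer, unidirectional RNN, both options select the same values.
last_from_output = output[:, -1, :]
last_from_hidden = hidden.squeeze(0)
print(torch.allclose(last_from_output, last_from_hidden))
```

With more layers or a bidirectional RNN, `hidden` gains extra leading entries and the two selections no longer line up this simply.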


![](assets/rnn_hidden_output.png)


In [299]:
import torch.nn as nn
class Model(nn.Module):
    def __init__(self, number_of_vocabs, embedding_dim=300):
        super().__init__()
        self.embedding = nn.Embedding(number_of_vocabs, embedding_dim)
        self.rnn = nn.RNN(input_size=embedding_dim, hidden_size=200,batch_first=True)
        self.fc1 = nn.Linear(200,1)
            
    def forward(self, text):
        #text = [batch_size(64), x(variable sent len)]
        
        embedding = self.embedding(text)
        #embedding = [batch_size(64), x(variable sent len), embedding_dim(300)]
        
        output, hidden = self.rnn(embedding)
        #output = [batch_size(64), x(variable sent len), hidden_size(200)]
        #hidden = [num_layers*num_direction(1), batch_size(64), hidden_size(200)]
        
        assert torch.equal(output[:,-1,:], hidden.squeeze(0))

        
        hidden = hidden.squeeze(0)
        #hidden = [batch_size(64), hidden_size(200)]
        
        hidden = self.fc1(hidden)
        #hidden = [batch_size(64), 1]
        
        return hidden


In [300]:
print("number_of_vocabs: ",number_of_vocabs)
model = Model(number_of_vocabs)
model.to(device)  # unlike model.cuda(), this also works on CPU-only machines

number_of_vocabs:  25002


Model(
  (embedding): Embedding(25002, 300)
  (rnn): RNN(300, 200, batch_first=True)
  (fc1): Linear(in_features=200, out_features=1, bias=True)
)

In [301]:
# feed a sample batch through the model to check that the output dimension is as expected
a = model(trial_input)
a.size()

torch.Size([64, 1])

## Training our Model

In [297]:
import torch.optim as optim
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
optimizer = optim.SGD(model.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()
model = model.to(device)
criterion = criterion.to(device)
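
`BCEWithLogitsLoss` takes the raw, unbounded outputs of the linear layer (logits) and applies the sigmoid internally, which is why the model does not end in a sigmoid. A quick check of that equivalence, on hypothetical logits and targets:

```python
import torch
import torch.nn as nn

logits = torch.tensor([0.5, -1.0, 2.0])   # hypothetical raw model outputs
targets = torch.tensor([1.0, 0.0, 1.0])

loss_with_logits = nn.BCEWithLogitsLoss()(logits, targets)
loss_manual = nn.BCELoss()(torch.sigmoid(logits), targets)  # sigmoid applied by hand

print(loss_with_logits.item(), loss_manual.item())
print(torch.allclose(loss_with_logits, loss_manual))
```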

In [291]:
def binary_accuracy(preds, y):
    """
    Returns accuracy per batch, i.e. if you get 8/10 right, this returns 0.8, NOT 8
    """

    #round predictions to the closest integer
    rounded_preds = torch.round(torch.sigmoid(preds))
    correct = (rounded_preds == y).float() #convert into float for division 
    acc = correct.sum() / len(correct)
    return acc
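
A small worked example of `binary_accuracy`, with the function repeated so the snippet runs on its own. The logits and labels are hypothetical.

```python
import torch

def binary_accuracy(preds, y):
    # Same logic as above: sigmoid, round to 0 or 1, compare with the labels.
    rounded_preds = torch.round(torch.sigmoid(preds))
    correct = (rounded_preds == y).float()
    return correct.sum() / len(correct)

logits = torch.tensor([2.0, -1.0, 0.3, -0.2])  # sigmoid gives about 0.88, 0.27, 0.57, 0.45
labels = torch.tensor([1.0, 0.0, 0.0, 0.0])

acc = binary_accuracy(logits, labels)
print(acc.item())  # 0.75: the third prediction rounds to 1 but its label is 0
```

Rounding the sigmoid at 0.5 is the same as checking whether the logit is positive.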

In [292]:
def train(model, iterator, optimizer, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    
    model.train()
    
    for batch in iterator:
        
        optimizer.zero_grad()
                
        predictions = model(batch.text).squeeze(1)
        
        loss = criterion(predictions, batch.label)
        
        acc = binary_accuracy(predictions, batch.label)
        
        loss.backward()
        
        optimizer.step()
        
        epoch_loss += loss.item()
        epoch_acc += acc.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

In [293]:

def evaluate(model, iterator, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    
    model.eval()
    
    with torch.no_grad():
    
        for batch in iterator:

            predictions = model(batch.text).squeeze(1)
            
            loss = criterion(predictions, batch.label)
            
            acc = binary_accuracy(predictions, batch.label)

            epoch_loss += loss.item()
            epoch_acc += acc.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

In [294]:

import time

def epoch_time(start_time, end_time):
    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time / 60)
    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
    return elapsed_mins, elapsed_secs

In [295]:
N_EPOCHS = 5

best_valid_loss = float('inf')

for epoch in range(N_EPOCHS):

    start_time = time.time()
    
    train_loss, train_acc = train(model, train_iterator, optimizer, criterion)
    valid_loss, valid_acc = evaluate(model, valid_iterator, criterion)
    
    end_time = time.time()

    epoch_mins, epoch_secs = epoch_time(start_time, end_time)
    
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'tut1-model.pt')
    
    print(f'Epoch: {epoch+1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s')
    print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
    print(f'\t Val. Loss: {valid_loss:.3f} |  Val. Acc: {valid_acc*100:.2f}%')

Epoch: 01 | Epoch Time: 0m 12s
	Train Loss: 0.703 | Train Acc: 49.63%
	 Val. Loss: 0.700 |  Val. Acc: 50.55%
Epoch: 02 | Epoch Time: 0m 12s
	Train Loss: 0.703 | Train Acc: 49.64%
	 Val. Loss: 0.700 |  Val. Acc: 50.55%
Epoch: 03 | Epoch Time: 0m 13s
	Train Loss: 0.703 | Train Acc: 49.63%
	 Val. Loss: 0.700 |  Val. Acc: 50.55%
Epoch: 04 | Epoch Time: 0m 12s
	Train Loss: 0.703 | Train Acc: 49.65%
	 Val. Loss: 0.700 |  Val. Acc: 50.55%
Epoch: 05 | Epoch Time: 0m 12s
	Train Loss: 0.703 | Train Acc: 49.65%
	 Val. Loss: 0.700 |  Val. Acc: 50.55%


In [296]:
model.load_state_dict(torch.load('tut1-model.pt'))

test_loss, test_acc = evaluate(model, test_iterator, criterion)

print(f'Test Loss: {test_loss:.3f} | Test Acc: {test_acc*100:.2f}%')

Test Loss: 0.710 | Test Acc: 47.58%
