## Testing RNN

Most of the code in this file comes from the introductory tutorial on [Medium](https://medium.com/dair-ai/building-rnns-is-fun-with-pytorch-and-google-colab-3903ea9a3a79).

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import os
import numpy as np

## Single Recurrent Neural Network

In [14]:
class SingleRNN(nn.Module):
    def __init__(self, n_inputs, n_neurons):
        super(SingleRNN, self).__init__()
        
        self.Wx = torch.randn(n_inputs, n_neurons) # n_inputs X n_neurons: input-to-hidden weights
        self.Wy = torch.randn(n_neurons, n_neurons) # n_neurons X n_neurons: recurrent weights between time steps
        
        self.b = torch.zeros(1, n_neurons) # 1 X n_neurons: bias, broadcast over the batch
    
    def forward(self, X0, X1):
        self.Y0 = torch.tanh(torch.mm(X0, self.Wx) + self.b) # batch_size X n_neurons
        
        self.Y1 = torch.tanh(torch.mm(self.Y0, self.Wy) +
                            torch.mm(X1, self.Wx) + self.b) # batch_size X n_neurons
        
        return self.Y0, self.Y1
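
Written as equations, the two-step forward pass of `SingleRNN` is the recurrence below. The shape annotations are read off the code: $X_t$ is `batch_size X n_inputs`, $W_x$ is `n_inputs X n_neurons`, $W_y$ is `n_neurons X n_neurons`, and $b$ is `1 X n_neurons` (broadcast over the batch).

$$
\begin{aligned}
Y_0 &= \tanh(X_0 W_x + b) \\
Y_1 &= \tanh(Y_0 W_y + X_1 W_x + b)
\end{aligned}
$$

The same $W_x$, $W_y$ and $b$ are reused at both time steps. Only the term $Y_0 W_y$ carries information from $t=0$ to $t=1$.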

<img src="RNN_img.png"/>

## Test Model with one neuron

In [15]:
N_INPUT = 4 # Input dim
N_NEURONS = 1

X0_batch = torch.tensor([[0,1,2,0], [3,4,5,0], 
                         [6,7,8,0], [9,0,1,0]],
                        dtype = torch.float) #t=0 => 4 X 4

X1_batch = torch.tensor([[9,8,7,0], [0,0,0,0], 
                         [6,5,4,0], [3,2,1,0]],
                        dtype = torch.float) #t=1 => 4 X 4

model = SingleRNN(N_INPUT, N_NEURONS)

Y0_val, Y1_val = model(X0_batch, X1_batch)

In [16]:
print(Y0_val, Y1_val)

tensor([[ 0.9664],
        [ 0.9998],
        [ 1.0000],
        [-0.5101]]) tensor([[1.0000],
        [0.5358],
        [0.9989],
        [0.2595]])


## Model with many neurons

<img src="RNN_img2.png"/>

In [17]:
N_INPUT = 3 # number of features in input
N_NEURONS = 5 # number of units in layer

X0_batch = torch.tensor([[0,1,2], [3,4,5], 
                         [6,7,8], [9,0,1]],
                        dtype = torch.float) #t=0 => 4 X 3

X1_batch = torch.tensor([[9,8,7], [0,0,0], 
                         [6,5,4], [3,2,1]],
                        dtype = torch.float) #t=1 => 4 X 3

model = SingleRNN(N_INPUT, N_NEURONS)

Y0_val, Y1_val = model(X0_batch, X1_batch)

## Write general RNN with N time_steps

`nn.RNNCell` is a PyTorch module that creates and manages the weight matrices and biases itself. It computes a single time step, so we loop over the time steps ourselves.

In [18]:
class CleanBasicRNN(nn.Module):
    def __init__(self, batch_size, n_inputs, n_neurons, n_time_steps):
        super(CleanBasicRNN, self).__init__()
        
        self.rnn = nn.RNNCell(n_inputs, n_neurons)
        self.hx = torch.randn(batch_size, n_neurons) # initialize hidden state
        self.n_time_steps = n_time_steps

    def forward(self, X):
        output = []

        # for each time step
        for i in range(self.n_time_steps):
            self.hx = self.rnn(X[i], self.hx)
            output.append(self.hx)
        
        return output, self.hx

FIXED_BATCH_SIZE = 4 # our batch size is fixed for now
N_INPUT = 3
N_NEURONS = 5

X_batch = torch.tensor([[[0,1,2], [3,4,5], 
                         [6,7,8], [9,0,1]],
                        [[9,8,7], [0,0,0], 
                         [6,5,4], [3,2,1]]
                       ], dtype = torch.float) # X0 and X1


model = CleanBasicRNN(FIXED_BATCH_SIZE, N_INPUT, N_NEURONS, 2)
output_val, states_val = model(X_batch)
print(output_val) # contains all output for all timesteps
print(states_val) # contains values for final state or final timestep, i.e., t=1

[tensor([[ 0.0513, -0.5216,  0.7115, -0.8208, -0.5552],
        [-0.1247, -0.7843,  0.7987, -0.8954, -0.9682],
        [-0.9959, -0.5484,  0.7341, -0.9217, -0.8594],
        [ 0.8855,  0.9299, -0.9548,  0.9903, -0.9878]], grad_fn=<TanhBackward>), tensor([[ 0.1819, -0.2414,  0.9283,  0.3014, -0.9850],
        [ 0.9093, -0.2167,  0.3887, -0.0421, -0.2961],
        [ 0.5842,  0.0397,  0.7662,  0.2752, -0.9704],
        [ 0.8072, -0.4158,  0.4931,  0.2239, -0.7348]], grad_fn=<TanhBackward>)]
tensor([[ 0.1819, -0.2414,  0.9283,  0.3014, -0.9850],
        [ 0.9093, -0.2167,  0.3887, -0.0421, -0.2961],
        [ 0.5842,  0.0397,  0.7662,  0.2752, -0.9704],
        [ 0.8072, -0.4158,  0.4931,  0.2239, -0.7348]], grad_fn=<TanhBackward>)
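
To make explicit what `nn.RNNCell` does with its internal weights, the following sketch recomputes one step by hand and compares it with the module's result. The seed and the variable names (`cell`, `manual`, `matches`) are illustrative choices, not part of the original notebook. The parameter names `weight_ih`, `weight_hh`, `bias_ih` and `bias_hh` are the attributes exposed by `nn.RNNCell`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for repeatability

cell = nn.RNNCell(input_size=3, hidden_size=5)  # default nonlinearity is tanh
x = torch.randn(4, 3)  # (batch_size, n_inputs)
h = torch.randn(4, 5)  # (batch_size, n_neurons)

# One RNN step written out by hand with the cell's own parameters
manual = torch.tanh(x @ cell.weight_ih.T + cell.bias_ih
                    + h @ cell.weight_hh.T + cell.bias_hh)

h_next = cell(x, h)
matches = torch.allclose(h_next, manual, atol=1e-6)
print(h_next.shape, matches)
```

This is the same recurrence as in `SingleRNN`, except that the cell stores its weights transposed and keeps two separate bias vectors.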


## RNNs are well explained in this video:
[YouTube link](https://www.youtube.com/watch?v=ogZi5oIo4fI)

`hidden_size` in the RNN class is simply the output size of the RNN. Later we can add a fully-connected layer on top of the RNN output to increase the complexity of the model.

`hidden_0` is the initial hidden state. Because the network is recurrent, the first time step already needs a previous state to read from, so we have to provide one.

<img src="RNN-unrolled.png"/>

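In the figure, $h_t$ is the value the RNN emits at time step $t$. In a plain RNN this same value is also passed on as the hidden state for step $t+1$. In the code below, `out` collects $h_t$ for every time step, while `hidden` holds only the state after the last step. The model output, e.g. a label or the prediction of the next step in the time series, is derived from these values.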

In [89]:
cell = nn.RNN(input_size = 10, hidden_size = 100, batch_first = True)
inputs = torch.randn(1, 5, 10)# (batch_size, sequence_length, input_size)
hidden_0 = torch.zeros(1, 1, 100)# (num_layers, batch_size, hidden_size) 

out, hidden = cell(inputs, hidden_0)

print(out.shape, hidden.shape)

torch.Size([1, 5, 100]) torch.Size([1, 1, 100])


`out` contains the RNN output at every time step, so its second dimension is the sequence length. `hidden`, on the other hand, always has the same shape as `hidden_0`.
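
The relation between the two return values can be checked directly. The sketch below assumes a single-layer, unidirectional `nn.RNN` with the same sizes as above. The seed and the names `last_step`, `final_state` and `same` are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for repeatability

rnn = nn.RNN(input_size=10, hidden_size=100, batch_first=True)
inputs = torch.randn(1, 5, 10)     # (batch_size, sequence_length, input_size)
hidden_0 = torch.zeros(1, 1, 100)  # (num_layers, batch_size, hidden_size)

out, hidden = rnn(inputs, hidden_0)

last_step = out[:, -1, :]  # output at the final time step: (batch_size, hidden_size)
final_state = hidden[0]    # final state of layer 0:        (batch_size, hidden_size)
same = torch.allclose(last_step, final_state)
print(out.shape, hidden.shape, same)
```

For this configuration the last slice of `out` and the returned hidden state hold the same values. With several layers, `out` only reports the top layer, while `hidden` has one entry per layer.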

## Now let's try the example to learn HELLO from the YouTube video


In [91]:
h = [1,0,0,0]
e = [0,1,0,0]
l = [0,0,1,0]
o = [0,0,0,1]

cell = nn.RNN(input_size = 4, hidden_size = 2, batch_first = True)
inputs = torch.Tensor([[h]]) # dim (1,1,4)

# init the hidden vector with (num_layers*num_directions, batch_size, hidden_size)
hidden = torch.randn(1,1,2)

# This is a single time step of the RNN (sequence length 1)
out, hidden = cell(inputs,hidden)

print(out.data)

tensor([[[ 0.8001, -0.2796]]])


## This was simply to feed one letter into the RNN
Now we would like to feed "Hello" into the RNN

In [93]:
inputs = torch.Tensor([[h,e,l,l,o]]) # dim (1,5,4): (batch_size, sequence_length, input_size)
print(inputs.shape)
# init the hidden vector with (num_layers*num_directions, batch_size, hidden_size)
hidden = torch.randn(1,1,2)

# One call now runs all five time steps of the sequence
out, hidden = cell(inputs,hidden)

print(out.data)

torch.Size([1, 5, 4])
tensor([[[ 0.9305, -0.6559],
         [-0.9300,  0.0903],
         [ 0.7901, -0.8132],
         [-0.6055,  0.3513],
         [ 0.4348, -0.9057]]])


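## If we want to feed several words at a time
We need to stack them into a batch, i.e. a tensor of shape (batch_size, sequence_length, input_size). All words in one batch must have the same length.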
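
A minimal sketch of such a batch, reusing the four one-hot letters from above. The three words are made-up examples of equal length and the dictionary `one_hot` is a helper introduced here. Neither appears in the original notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for repeatability

# Hypothetical helper: letter -> one-hot vector
one_hot = {'h': [1, 0, 0, 0], 'e': [0, 1, 0, 0],
           'l': [0, 0, 1, 0], 'o': [0, 0, 0, 1]}
words = ["hello", "olleh", "lohel"]  # illustrative words, all of length 5

# Stack the words along the first (batch) dimension
batch = torch.tensor([[one_hot[c] for c in w] for w in words],
                     dtype=torch.float)  # (3, 5, 4)

rnn = nn.RNN(input_size=4, hidden_size=2, batch_first=True)
hidden = torch.zeros(1, len(words), 2)  # (num_layers, batch_size, hidden_size)

out, hidden = rnn(batch, hidden)
print(batch.shape, out.shape, hidden.shape)
```

Note that the batch size also appears in the second dimension of the hidden state, so `hidden` has to be created with the matching size.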

## Now let's teach the network to say hihello!
So far we have chosen the output dimension to be 2. In a real-world example we would want to predict, for example, the next letter in a word. Therefore the output dimension has to equal the input dimension.

First we need one more letter, `i`. Therefore the input and output dimensions will be 5.

In [110]:
h = [1,0,0,0,0]
i = [0,1,0,0,0]
e = [0,0,1,0,0]
l = [0,0,0,1,0]
o = [0,0,0,0,1]

# There is an easier way to do this:
chars = ['h', 'i', 'e', 'l', 'o']
x_data = [0,1,0,2,3,3] #hihell
x_one_hot = [[h,i,h,e,l,l]] # shape (1, 6, 5): (batch_size, sequence_length, input_size)

y_data = [1,0,2,3,3,4] #ihello

inputs = torch.Tensor(x_one_hot)
labels = torch.LongTensor(y_data)

num_classes = 5
input_size = 5
hidden_size = 5
batch_size = 1
sequence_length = 1 # For the start we do one by one
num_layers = 1 # We only have one layer RNN
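
The index lists and one-hot vectors can also be derived from the strings instead of being typed by hand. The sketch below uses `torch.nn.functional.one_hot`. The mapping `char2idx` is a helper name introduced here for illustration.

```python
import torch
import torch.nn.functional as F

chars = ['h', 'i', 'e', 'l', 'o']
char2idx = {c: i for i, c in enumerate(chars)}  # hypothetical helper mapping

x_data = [char2idx[c] for c in "hihell"]  # [0, 1, 0, 2, 3, 3]
y_data = [char2idx[c] for c in "ihello"]  # [1, 0, 2, 3, 3, 4]

# F.one_hot returns integer tensors, so convert to float for the RNN
x_one_hot = F.one_hot(torch.tensor(x_data), num_classes=len(chars)).float()
inputs = x_one_hot.unsqueeze(0)  # add the batch dimension: (1, 6, 5)
labels = torch.tensor(y_data)    # class indices, dtype int64

print(x_data, y_data, inputs.shape)
```

This gives the same numbers as the manual definitions and extends to a larger alphabet without further typing.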

## Define the RNN

In [158]:
# Lab 12 RNN
import sys
import torch
import torch.nn as nn
from torch.autograd import Variable  # no-op wrapper since PyTorch 0.4, kept as in the original lab code

torch.manual_seed(777)  # reproducibility
#            0    1    2    3    4
idx2char = ['h', 'i', 'e', 'l', 'o']

# Teach hihell -> ihello
x_data = [0, 1, 0, 2, 3, 3]   # hihell
one_hot_lookup = [[1, 0, 0, 0, 0],  # 0
                  [0, 1, 0, 0, 0],  # 1
                  [0, 0, 1, 0, 0],  # 2
                  [0, 0, 0, 1, 0],  # 3
                  [0, 0, 0, 0, 1]]  # 4

y_data = [1, 0, 2, 3, 3, 4]    # ihello
x_one_hot = [one_hot_lookup[x] for x in x_data]

# As we have one batch of samples, we will change them to variables only once
inputs = Variable(torch.Tensor(x_one_hot))
labels = Variable(torch.LongTensor(y_data))

num_classes = 5
input_size = 5  # one-hot size
hidden_size = 5  # output from the RNN. 5 to directly predict one-hot
batch_size = 1   # one sentence
sequence_length = 1  # One by one
num_layers = 1  # one-layer rnn


class Model(nn.Module):

    def __init__(self):
        super(Model, self).__init__()
        self.rnn = nn.RNN(input_size=input_size,
                          hidden_size=hidden_size, batch_first=True)

    def forward(self, hidden, x):
        # Reshape input (batch first)
        x = x.view(batch_size, sequence_length, input_size)

        # Propagate input through RNN
        # Input: (batch, seq_len, input_size)
        # hidden: (num_layers * num_directions, batch, hidden_size)
        out, hidden = self.rnn(x, hidden)
        return hidden, out.view(-1, num_classes)

    def init_hidden(self):
        # Initialize the hidden state (a plain RNN has no cell state)
        # (num_layers * num_directions, batch, hidden_size)
        return Variable(torch.zeros(num_layers, batch_size, hidden_size))


# Instantiate RNN model
model = Model()
print(model)

# Set loss and optimizer function
# CrossEntropyLoss = LogSoftmax + NLLLoss
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
# Train the model
labels = labels.view(6,1)
for epoch in range(100):
    optimizer.zero_grad()
    loss = 0
    hidden = model.init_hidden()

    sys.stdout.write("predicted string: ")
    for x, label in zip(inputs, labels):  # renamed from `input` to avoid shadowing the builtin
        hidden, output = model(hidden, x)
        val, idx = output.max(1)
        sys.stdout.write(idx2char[idx.item()])

        loss += criterion(output, label)

    print(", epoch: %d, loss: %1.3f" % (epoch + 1, loss.item()))

    loss.backward()
    optimizer.step()

print("Learning finished!")

Model(
  (rnn): RNN(5, 5, batch_first=True)
)
predicted string: llllll, epoch: 1, loss: 10.155
predicted string: llllll, epoch: 2, loss: 9.137
predicted string: llllll, epoch: 3, loss: 8.355
predicted string: llllll, epoch: 4, loss: 7.577
predicted string: llllll, epoch: 5, loss: 6.876
predicted string: lhelll, epoch: 6, loss: 6.327
predicted string: ihelll, epoch: 7, loss: 6.014
predicted string: ihelll, epoch: 8, loss: 5.787
predicted string: ihelll, epoch: 9, loss: 5.477
predicted string: ihelll, epoch: 10, loss: 5.274
predicted string: ihelll, epoch: 11, loss: 5.041
predicted string: ihello, epoch: 12, loss: 4.827
predicted string: ihello, epoch: 13, loss: 4.676
predicted string: ihello, epoch: 14, loss: 4.550
predicted string: ihello, epoch: 15, loss: 4.430
predicted string: ihello, epoch: 16, loss: 4.305
predicted string: ihello, epoch: 17, loss: 4.164
predicted string: ihelll, epoch: 18, loss: 4.003
predicted string: ihelll, epoch: 19, loss: 3.860
predicted string: ihelll, epoch
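
The comment `CrossEntropyLoss = LogSoftmax + NLLLoss` in the training code can be verified numerically. In this sketch the logits are random stand-ins for one RNN output row and the label is an arbitrary class index. Both are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed, only for repeatability

logits = torch.randn(1, 5)  # stand-in for one RNN output: (batch, num_classes)
label = torch.tensor([3])   # arbitrary target class index

# Loss as used in the training loop
ce = nn.CrossEntropyLoss()(logits, label)

# The same quantity computed in two explicit steps
nll = F.nll_loss(F.log_softmax(logits, dim=1), label)

same = torch.allclose(ce, nll)
print(ce.item(), nll.item(), same)
```

This is why the model can pass its raw outputs to the criterion: the softmax is applied inside the loss.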

## Now use this model to autofill the sentence.
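The goal is that feeding a single "h" makes the model produce the rest of "hihello". Note that the last cell below still feeds the true input letter at every step (teacher forcing) instead of the model's own predictions.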

In [301]:
hidden = model.init_hidden()
input = inputs[0]
print(hidden, input)

tensor([[[0., 0., 0., 0., 0.]]]) tensor([1., 0., 0., 0., 0.])


In [310]:
hidden, output = model(hidden, input)
print(hidden, output)
val, idx = output.max(1)
input = output #inputs[idx.data[0]]
print(input)

sys.stdout.write(idx2char[idx.data[0]])

tensor([[[-0.6325, -0.1642, -0.9999, -0.9997,  1.0000]]],
       grad_fn=<StackBackward>) tensor([[-0.6325, -0.1642, -0.9999, -0.9997,  1.0000]], grad_fn=<ViewBackward>)
tensor([[-0.6325, -0.1642, -0.9999, -0.9997,  1.0000]], grad_fn=<ViewBackward>)
o

In [316]:
hidden = model.init_hidden()
input = inputs[0]
print("h")
for ins in inputs:
    hidden, output = model(hidden, ins)
    val, idx = output.max(1)
    sys.stdout.write(idx2char[idx.data[0]])

h
ihello
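
For a real autofill, each predicted letter has to be fed back as the next input. The following self-contained sketch shows one way to do this with greedy decoding. As a simplification it trains on the whole sequence in a single call rather than letter by letter, and the names `rnn` and `generated` are introduced here. Whether the generated string is exactly "hihello" depends on how well training converged, so no output is shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(777)
idx2char = ['h', 'i', 'e', 'l', 'o']
x_idx = torch.tensor([0, 1, 0, 2, 3, 3])  # hihell
y_idx = torch.tensor([1, 0, 2, 3, 3, 4])  # ihello
inputs = F.one_hot(x_idx, num_classes=5).float().unsqueeze(0)  # (1, 6, 5)

rnn = nn.RNN(input_size=5, hidden_size=5, batch_first=True)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(rnn.parameters(), lr=0.1)

# Train with teacher forcing on the full sequence (assumed simplification)
for epoch in range(100):
    optimizer.zero_grad()
    out, _ = rnn(inputs, torch.zeros(1, 1, 5))
    loss = criterion(out.view(-1, 5), y_idx)
    loss.backward()
    optimizer.step()

# Greedy generation: start from 'h' and feed each prediction back in
generated = "h"
x = F.one_hot(torch.tensor([[0]]), num_classes=5).float()  # 'h' as (1, 1, 5)
hidden = torch.zeros(1, 1, 5)
with torch.no_grad():
    for _ in range(6):
        out, hidden = rnn(x, hidden)
        idx = out.view(-1).argmax().item()  # most likely next letter
        generated += idx2char[idx]
        # The one-hot vector of the prediction becomes the next input
        x = F.one_hot(torch.tensor([[idx]]), num_classes=5).float()

print(generated)
```

The key difference from the cell above is that the hidden state and the predicted letter are both carried over between steps, so only the first "h" comes from us.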

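## We see that the raw output is not very clean
The values for the other letters are not zero but strongly negative (close to -1), because the RNN output is a `tanh` activation in the range [-1, 1] rather than a probability distribution. This will have to be improved.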
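
One possible first step is to turn the raw output into probabilities with a softmax. The sketch below applies it to the output vector printed above, with the values copied by hand.

```python
import torch
import torch.nn.functional as F

idx2char = ['h', 'i', 'e', 'l', 'o']

# Raw RNN output copied from the printed tensor above
output = torch.tensor([[-0.6325, -0.1642, -0.9999, -0.9997, 1.0000]])

probs = F.softmax(output, dim=1)  # non-negative, sums to 1 per row
idx = probs.argmax(dim=1).item()

print(probs)
print(idx2char[idx])  # 'o', the same letter as the argmax of the raw output
```

Because `tanh` bounds every value to [-1, 1], the softmax cannot become very confident. A fully-connected layer on top of the RNN, as mentioned earlier, would be one way to obtain unbounded scores.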