# Recurrent Neural Networks in PyTorch

In this section we look at what a recurrent neural network does under the hood in PyTorch. We run the same computation twice: once with `nn.RNN`, and once manually with matrix multiplications and additions. This gives us the intuition we need to work with more complex architectures later.

In [1]:
import torch
import torch.nn as nn

The parameters below are deliberately given different values, so that the dimensions of each tensor are easy to tell apart. If you work through the notebook and wonder why an output has a particular shape, return to these parameters.

In [2]:
BATCH_SIZE = 4
SEQUENCE_LENGTH = 5
INPUT_SIZE = 1
HIDDEN_SIZE = 3
NUM_LAYERS = 2

A recurrent neural network in PyTorch is, as expected, an `nn.Module`. The module takes several important arguments; you can read more in the official [PyTorch documentation](https://pytorch.org/docs/stable/generated/torch.nn.RNN.html).

- input_size – The length of each vector in the sequence. If you have a sequence of length 10 and each element of the sequence is a 4-dimensional vector, then the input size is 4.

- hidden_size – The number of features in the hidden state, i.e. the size of the vector that each layer produces at every time step

- num_layers – The number of recurrent layers, defaults to 1

- nonlinearity – Either 'tanh' or 'relu', defaults to 'tanh'

- batch_first – This parameter can be tricky to grasp. In the networks we have used so far, the first dimension has always been the batch size. A recurrent neural network in PyTorch, on the other hand, expects inputs of shape `(sequence length, batch size, features)` by default. If you set this parameter to True, you must provide the RNN with inputs of shape `(batch size, sequence length, features)` instead. Below we use the default behaviour, but in our practical examples it will be convenient to set this to True.
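To make the effect of `batch_first` concrete, here is a minimal sketch that feeds the same data to two modules in both layouts. The names `rnn_default`, `rnn_batch_first` and the tensors are illustrative and not part of the notebook; the two modules have different random weights, so only the shapes are comparable.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Default layout: (sequence length, batch size, features)
rnn_default = nn.RNN(input_size=1, hidden_size=3, num_layers=2)
# batch_first layout: (batch size, sequence length, features)
rnn_batch_first = nn.RNN(input_size=1, hidden_size=3, num_layers=2, batch_first=True)

seq_first_input = torch.randn(5, 4, 1)                            # (5 steps, 4 samples, 1 feature)
batch_first_input = seq_first_input.transpose(0, 1).contiguous()  # (4 samples, 5 steps, 1 feature)

with torch.inference_mode():
    out_default, h_default = rnn_default(seq_first_input)
    out_bf, h_bf = rnn_batch_first(batch_first_input)

print(out_default.shape)  # torch.Size([5, 4, 3])
print(out_bf.shape)       # torch.Size([4, 5, 3])
# batch_first only affects the input and output tensors, not the hidden state
print(h_default.shape, h_bf.shape)  # torch.Size([2, 4, 3]) torch.Size([2, 4, 3])
```

Note that the hidden state keeps the shape `(num_layers, batch size, hidden_size)` in both cases.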


Our recurrent neural network expects inputs of size 1, generates vectors of size 3 and stacks two layers.

In [3]:
rnn = nn.RNN(input_size=INPUT_SIZE, hidden_size=HIDDEN_SIZE, num_layers=NUM_LAYERS)

In order to reconstruct the functionality of the `nn.RNN` module, we extract the weights and biases that the module was initialized with. There are two sets of weights and biases for each layer: `ih` is input-hidden and `hh` is hidden-hidden. The layers are numbered from zero, so the suffix is either `l0` or `l1`. If we created a three-layer network, there would also be an `l2`.

In [4]:
# ---------------------------- #
# layer 1 (parameters suffixed l0)
# ---------------------------- #

# input to hidden weights and biases
w_ih_l0 = rnn.weight_ih_l0
b_ih_l0 = rnn.bias_ih_l0

# hidden to hidden weights and biases
w_hh_l0 = rnn.weight_hh_l0
b_hh_l0 = rnn.bias_hh_l0

# ---------------------------- #
# layer 2 (parameters suffixed l1)
# ---------------------------- #
# input to hidden weights and biases
w_ih_l1 = rnn.weight_ih_l1
b_ih_l1 = rnn.bias_ih_l1

# hidden to hidden weights and biases
w_hh_l1 = rnn.weight_hh_l1
b_hh_l1 = rnn.bias_hh_l1
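The shapes of these parameters follow directly from `INPUT_SIZE` and `HIDDEN_SIZE`. As a small sketch, the snippet below builds a separate module with the same sizes (named `shape_demo_rnn` here, an illustrative name) and lists every parameter with its shape.

```python
import torch.nn as nn

# Same sizes as in the notebook: input size 1, hidden size 3, two layers
shape_demo_rnn = nn.RNN(input_size=1, hidden_size=3, num_layers=2)

# Collect the shape of every registered parameter by name
param_shapes = {name: tuple(param.shape) for name, param in shape_demo_rnn.named_parameters()}

for name, shape in param_shapes.items():
    print(f"{name:<14} {shape}")
```

The input-hidden weights of the first layer have shape `(3, 1)`, because they map an input of size 1 to a hidden state of size 3. The input-hidden weights of the second layer have shape `(3, 3)`, because that layer receives the hidden state of the first layer as its input. All hidden-hidden weights have shape `(3, 3)` and all biases have shape `(3,)`.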

We create a random sequence of shape (5, 4, 1), i.e. `(sequence length, batch size, features)`, and an initial hidden state filled with zeros.

In [5]:
sequence = torch.randn(SEQUENCE_LENGTH, BATCH_SIZE, INPUT_SIZE)
h_0 = torch.zeros(NUM_LAYERS, BATCH_SIZE, HIDDEN_SIZE)

The network returns two tensors: the output and the final hidden state of every layer.

In [6]:
with torch.inference_mode():
    output, h_n = rnn(sequence, h_0)

The `output` variable contains the hidden states produced by the topmost layer, for every time step and every sample in the batch.

In [7]:
output.shape

torch.Size([5, 4, 3])

The `h_n` variable, on the other hand, contains the last hidden state of each layer.

In [8]:
h_n.shape

torch.Size([2, 4, 3])
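The two return values overlap: for a unidirectional RNN, the last time step of `output` is the final hidden state of the topmost layer, so it should equal the last entry of `h_n`. The sketch below checks this on a freshly created module (all `demo_` names are illustrative), and also checks that leaving out the initial hidden state is equivalent to passing zeros.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

demo_rnn = nn.RNN(input_size=1, hidden_size=3, num_layers=2)
demo_sequence = torch.randn(5, 4, 1)   # (sequence length, batch size, features)
demo_h_0 = torch.zeros(2, 4, 3)        # (num layers, batch size, hidden size)

with torch.inference_mode():
    demo_output, demo_h_n = demo_rnn(demo_sequence, demo_h_0)
    # When no initial hidden state is given, PyTorch uses zeros
    output_without_h_0, _ = demo_rnn(demo_sequence)

# Last time step of the top layer vs. final hidden state of the top layer
last_step_matches = torch.allclose(demo_output[-1], demo_h_n[-1])
# Explicit zero state vs. default initial state
default_h_0_matches = torch.allclose(demo_output, output_without_h_0)

print(last_step_matches, default_h_0_matches)  # True True
```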

The explanations above might not make a lot of sense yet, so you should work through the manual implementation below. Once you do, you will have a much better grasp of what `output` and `h_n` actually stand for. Don't skip this part.
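For reference, the update that each layer applies at every time step can be written as follows. This is the formula from the PyTorch documentation for `nn.RNN` with the default `tanh` nonlinearity; the layer superscript $(l)$ is notation added here for clarity.

$$
h_t^{(l)} = \tanh\left(x_t^{(l)} W_{ih}^{(l)\top} + b_{ih}^{(l)} + h_{t-1}^{(l)} W_{hh}^{(l)\top} + b_{hh}^{(l)}\right)
$$

For the first layer, $x_t^{(0)}$ is the element of the input sequence at time step $t$. For every higher layer, the input is the hidden state that the layer below has just produced, $x_t^{(l)} = h_t^{(l-1)}$.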

In [9]:
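def manual_rnn():
    # running hidden state of each layer, shape (NUM_LAYERS, BATCH_SIZE, HIDDEN_SIZE)
    hidden = h_0.clone()
    # will hold the hidden state of the top layer for every time step
    output = torch.zeros(SEQUENCE_LENGTH, BATCH_SIZE, HIDDEN_SIZE)
    with torch.inference_mode():
        # seq is one time step for the whole batch, shape (BATCH_SIZE, INPUT_SIZE)
        for idx, seq in enumerate(sequence):
            for layer in range(NUM_LAYERS):
                if layer == 0:
                    # first layer: combine the input with the layer's previous hidden state
                    hidden[0] = torch.tanh(seq @ w_ih_l0.T + b_ih_l0 + hidden[0] @ w_hh_l0.T + b_hh_l0)
                elif layer == 1:
                    # second layer: its input is the fresh hidden state of the first layer
                    hidden[1] = torch.tanh(hidden[0] @ w_ih_l1.T + b_ih_l1 + hidden[1] @ w_hh_l1.T + b_hh_l1)
                    # the output collects the top-layer hidden state at this time step
                    output[idx] = hidden[1]
    return output, hidden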

In [10]:
manual_output, manual_h_n = manual_rnn()

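The values produced by the `nn.RNN` module and those produced by our manual implementation match. Any differences are confined to the last printed digit of a few entries and are due to floating-point rounding.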

In [11]:
output

tensor([[[-1.1792e-02,  1.9047e-01,  1.6709e-01],
         [ 7.2242e-02,  6.5224e-02,  2.0224e-01],
         [ 1.0300e-01,  1.9305e-02,  2.1496e-01],
         [ 1.0832e-02,  1.5682e-01,  1.7670e-01]],

        [[ 1.0195e-01,  9.4162e-02,  1.3910e-01],
         [ 1.3410e-01,  1.8453e-01,  1.5568e-01],
         [ 8.6578e-02,  3.0272e-01,  1.3517e-01],
         [ 1.2236e-01,  1.0065e-01,  1.4895e-01]],

        [[ 1.2442e-01,  1.0543e-01,  1.7694e-01],
         [ 2.4460e-01, -7.7739e-02,  2.1771e-01],
         [ 1.3163e-01, -1.7330e-04,  1.5564e-01],
         [ 2.3440e-01, -4.3322e-02,  2.2091e-01]],

        [[ 2.3210e-01, -1.6176e-02,  1.7971e-01],
         [ 1.3311e-01,  3.1258e-01,  1.3606e-01],
         [ 8.3280e-02,  2.2146e-01,  1.3285e-01],
         [ 2.1146e-01,  1.8934e-01,  1.6867e-01]],

        [[ 2.1319e-01,  1.2671e-01,  2.0186e-01],
         [ 1.5184e-01, -1.8205e-02,  1.5104e-01],
         [ 1.5311e-01, -1.2818e-02,  1.5881e-01],
         [ 3.1330e-01, -1.4031e-01,  2.391

In [12]:
manual_output

tensor([[[-1.1792e-02,  1.9047e-01,  1.6709e-01],
         [ 7.2242e-02,  6.5224e-02,  2.0224e-01],
         [ 1.0300e-01,  1.9305e-02,  2.1496e-01],
         [ 1.0832e-02,  1.5682e-01,  1.7670e-01]],

        [[ 1.0195e-01,  9.4162e-02,  1.3910e-01],
         [ 1.3410e-01,  1.8453e-01,  1.5568e-01],
         [ 8.6578e-02,  3.0272e-01,  1.3517e-01],
         [ 1.2236e-01,  1.0065e-01,  1.4895e-01]],

        [[ 1.2442e-01,  1.0543e-01,  1.7694e-01],
         [ 2.4460e-01, -7.7739e-02,  2.1771e-01],
         [ 1.3163e-01, -1.7332e-04,  1.5564e-01],
         [ 2.3440e-01, -4.3322e-02,  2.2091e-01]],

        [[ 2.3210e-01, -1.6176e-02,  1.7971e-01],
         [ 1.3310e-01,  3.1258e-01,  1.3606e-01],
         [ 8.3280e-02,  2.2146e-01,  1.3285e-01],
         [ 2.1146e-01,  1.8934e-01,  1.6867e-01]],

        [[ 2.1319e-01,  1.2671e-01,  2.0186e-01],
         [ 1.5184e-01, -1.8205e-02,  1.5104e-01],
         [ 1.5311e-01, -1.2818e-02,  1.5881e-01],
         [ 3.1330e-01, -1.4031e-01,  2.391

In [14]:
h_n

tensor([[[-0.0023, -0.4017,  0.0474],
         [-0.2172,  0.0534,  0.1672],
         [-0.2537, -0.1052,  0.1003],
         [-0.3814, -0.2900, -0.2007]],

        [[ 0.2132,  0.1267,  0.2019],
         [ 0.1518, -0.0182,  0.1510],
         [ 0.1531, -0.0128,  0.1588],
         [ 0.3133, -0.1403,  0.2392]]])

In [15]:
manual_h_n

tensor([[[-0.0023, -0.4017,  0.0474],
         [-0.2172,  0.0534,  0.1672],
         [-0.2537, -0.1052,  0.1003],
         [-0.3814, -0.2900, -0.2007]],

        [[ 0.2132,  0.1267,  0.2019],
         [ 0.1518, -0.0182,  0.1510],
         [ 0.1531, -0.0128,  0.1588],
         [ 0.3133, -0.1403,  0.2392]]])
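Comparing printed tensors by eye is error-prone, so a numerical check is a useful complement. The following self-contained sketch generalizes the manual loop to any number of layers and compares it with `nn.RNN` using `torch.allclose`. The helper `manual_rnn_forward` and the `check_` names are hypothetical and not part of the notebook; the helper assumes a unidirectional module with the default `tanh` nonlinearity, biases enabled and no dropout.

```python
import torch
import torch.nn as nn


def manual_rnn_forward(rnn, sequence, h_0):
    """Recompute a unidirectional tanh nn.RNN step by step (illustrative helper)."""
    # one running hidden state per layer, each of shape (batch size, hidden size)
    hidden = [h_0[layer] for layer in range(rnn.num_layers)]
    outputs = []
    for x_t in sequence:  # iterate over time steps
        layer_input = x_t
        for layer in range(rnn.num_layers):
            w_ih = getattr(rnn, f"weight_ih_l{layer}")
            w_hh = getattr(rnn, f"weight_hh_l{layer}")
            b_ih = getattr(rnn, f"bias_ih_l{layer}")
            b_hh = getattr(rnn, f"bias_hh_l{layer}")
            hidden[layer] = torch.tanh(
                layer_input @ w_ih.T + b_ih + hidden[layer] @ w_hh.T + b_hh
            )
            # the next layer receives this layer's new hidden state as input
            layer_input = hidden[layer]
        # the output collects the top-layer hidden state at each time step
        outputs.append(hidden[-1])
    return torch.stack(outputs), torch.stack(hidden)


torch.manual_seed(0)
check_rnn = nn.RNN(input_size=1, hidden_size=3, num_layers=2)
check_sequence = torch.randn(5, 4, 1)
check_h_0 = torch.zeros(2, 4, 3)

with torch.inference_mode():
    ref_output, ref_h_n = check_rnn(check_sequence, check_h_0)
    man_output, man_h_n = manual_rnn_forward(check_rnn, check_sequence, check_h_0)

# allclose tolerates the tiny rounding differences visible in the printed tensors
outputs_match = torch.allclose(ref_output, man_output, atol=1e-6)
hidden_match = torch.allclose(ref_h_n, man_h_n, atol=1e-6)
print(outputs_match, hidden_match)
```

Both checks should print `True`. Using a tolerance rather than exact equality is deliberate, because the two implementations may round intermediate results differently.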