# Recurrent Neural Networks
In this exercise, we will implement a simple one-layer recurrent neural network. We will use the formula for an [Elman RNN](https://en.wikipedia.org/wiki/Recurrent_neural_network#Elman_networks_and_Jordan_networks), one of the most basic and classical RNNs. The hidden state update and the output at time $t$ are defined as follows:

$$
\begin{align}
h_t &= \tanh(W_h x_t + U_h h_{t-1} + b_h) \\
y_t &= \tanh(W_y h_t + b_y)
\end{align}
$$
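
For reference, the shapes implied by these equations can be written out explicitly. The symbols $d_x$, $d_h$ and $d_y$ are shorthand introduced here for the input, hidden and output dimensionalities; they do not appear in the original definition.

$$
\begin{align}
x_t &\in \mathbb{R}^{d_x}, & h_t, b_h &\in \mathbb{R}^{d_h}, & y_t, b_y &\in \mathbb{R}^{d_y} \\
W_h &\in \mathbb{R}^{d_h \times d_x}, & U_h &\in \mathbb{R}^{d_h \times d_h}, & W_y &\in \mathbb{R}^{d_y \times d_h}
\end{align}
$$

In the implementation that follows, $W_h$, $U_h$ and $W_y$ correspond to `W_ih`, `W_hh` and `W_ho`, and $b_h$, $b_y$ to `b_ih`, `b_ho`.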

In [5]:
import torch
import torch.nn as nn

We start by defining the RNN as a subclass of `nn.Module`. The network's parameters are created in the `__init__` method. Use `input_dim`, `hidden_dim` and `output_dim` as arguments that define the dimensionality of the input/hidden/output vectors. Define your parameters as `nn.Parameter` with the appropriate dimensions. The documentation of `torch.nn` can be found [here](https://pytorch.org/docs/stable/nn.html).

In [6]:
class RNN(nn.Module):

    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.W_ih = nn.Parameter(torch.zeros(hidden_dim, input_dim))
        self.W_hh = nn.Parameter(torch.zeros(hidden_dim, hidden_dim))
        self.W_ho = nn.Parameter(torch.zeros(output_dim, hidden_dim))

        self.b_ih = nn.Parameter(torch.zeros(hidden_dim))
        self.b_ho = nn.Parameter(torch.zeros(output_dim))

Add a function `reset_parameters` that initializes your parameters. Pick a suitable distribution from [nn.init](https://pytorch.org/docs/stable/nn.init.html).

In [7]:
def reset_parameters(self):
    """Re-initialize all weights and biases in-place from U(-1, 1)."""
    for weight in self.parameters():
        nn.init.uniform_(weight, -1, 1)

RNN.reset_parameters = reset_parameters
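
With weights drawn from `uniform_(-1, 1)`, the pre-activations can grow large enough to push `tanh` close to ±1. As a hedged alternative, the sketch below shrinks the range to $1/\sqrt{\text{hidden\_dim}}$, which is the scheme PyTorch documents for its own `nn.RNN`. The function name `scaled_uniform_init_` and the stand-alone parameters are assumptions made for illustration; they are not part of the exercise.

```python
import math

import torch
import torch.nn as nn


def scaled_uniform_init_(parameters, hidden_dim):
    """Fill every parameter in-place from U(-k, k) with k = 1 / sqrt(hidden_dim)."""
    bound = 1.0 / math.sqrt(hidden_dim)
    for p in parameters:
        nn.init.uniform_(p, -bound, bound)
    return bound


# Hypothetical stand-alone parameters with RNN-like shapes.
hidden_dim, input_dim = 20, 5
params = [
    nn.Parameter(torch.zeros(hidden_dim, input_dim)),
    nn.Parameter(torch.zeros(hidden_dim, hidden_dim)),
    nn.Parameter(torch.zeros(hidden_dim)),
]
bound = scaled_uniform_init_(params, hidden_dim)
print('bound:', bound)
```

With the class above, the same helper could be called as `scaled_uniform_init_(rnn.parameters(), hidden_dim)`.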

Add a `forward` function that takes an input $x_t$ and the previous hidden state $h_{t-1}$ and returns the output $y_t$ together with the updated hidden state $h_t$. The initial hidden state $h_0$ can be initialized randomly or to all zeros.

In [8]:
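def forward(self, x, hidden_state):
    # h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1})
    hidden_state = torch.tanh(
        torch.matmul(self.W_ih, x) + self.b_ih +
        torch.matmul(self.W_hh, hidden_state)
    )
    # Linear read-out: unlike the formula above, no tanh is applied to y_t here.
    output = torch.matmul(self.W_ho, hidden_state) + self.b_ho
    return output, hidden_state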

RNN.forward = forward
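
The `forward` above expects a single unbatched vector `x`. A minimal sketch of a batched step, assuming inputs of shape `(batch, input_dim)` and weights laid out as in `__init__`, multiplies by the transposed weight matrices instead. The function `batched_step` and its argument list are hypothetical and not part of the exercise.

```python
import torch


def batched_step(x, h, W_ih, W_hh, W_ho, b_ih, b_ho):
    """One Elman step for a batch: x is (batch, input_dim), h is (batch, hidden_dim)."""
    h_new = torch.tanh(x @ W_ih.T + h @ W_hh.T + b_ih)
    y = h_new @ W_ho.T + b_ho  # linear read-out, as in the forward above
    return y, h_new


torch.manual_seed(0)
batch, input_dim, hidden_dim, output_dim = 3, 5, 20, 10

# Illustrative random weights with the same shapes as the RNN parameters.
W_ih = torch.randn(hidden_dim, input_dim)
W_hh = torch.randn(hidden_dim, hidden_dim)
W_ho = torch.randn(output_dim, hidden_dim)
b_ih = torch.randn(hidden_dim)
b_ho = torch.randn(output_dim)

x = torch.randn(batch, input_dim)
h = torch.zeros(batch, hidden_dim)
y, h_new = batched_step(x, h, W_ih, W_hh, W_ho, b_ih, b_ho)
print('y shape:', y.shape)
print('h shape:', h_new.shape)
```

Each row of the result should agree with what the unbatched `forward` would compute for the corresponding sample.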

Test your RNN with a single input.

In [9]:
input_dim = 5
hidden_dim = 20
output_dim = 10
rnn = RNN(input_dim, hidden_dim, output_dim)
rnn.reset_parameters()

x = torch.randn(input_dim)
hidden_state = torch.zeros(hidden_dim)
output, hidden_state = rnn(x, hidden_state) 

print('y shape:', output.shape)
print('hidden state shape:', hidden_state.shape)
print('y:', output)
print('hidden state:', hidden_state)

y shape: torch.Size([10])
hidden state shape: torch.Size([20])
y: tensor([ 3.6179, -3.9651,  2.4711,  3.0582,  1.3208, -0.0065,  2.2494, -3.7407,
        -2.2305, -0.4158], grad_fn=<AddBackward0>)
hidden state: tensor([ 0.3776, -0.8823,  0.6815,  0.8881, -0.8093, -0.9457, -0.9092,  0.6093,
        -0.8032,  0.6621,  0.8265,  0.2261, -0.9486, -0.3425,  0.8368,  0.8207,
         0.1809,  0.3537, -0.9602,  0.4349], grad_fn=<TanhBackward0>)


Now create an input sequence and run it through your RNN.

In [10]:
seq_length = 4
inputs = [torch.randn(input_dim) for _ in range(seq_length)]
hidden_state = torch.zeros(hidden_dim)
outputs = []

for x in inputs:
    output, new_hidden_state = rnn(x, hidden_state)
    hidden_state = new_hidden_state
    outputs.append(output)

print('y shape:', output.shape)
print('hidden state shape:', hidden_state.shape)
print('y:', output)
print('hidden state:', hidden_state)

y shape: torch.Size([10])
hidden state shape: torch.Size([20])
y: tensor([-0.1227, -0.3844, -2.2604,  3.4452, -6.7179,  0.0838, -1.3907,  0.9174,
        -0.0674,  2.3946], grad_fn=<AddBackward0>)
hidden state: tensor([ 0.9990,  0.9542, -1.0000,  0.7215,  0.9027,  0.9982, -0.9286, -0.9915,
        -0.9815,  0.8992, -0.9880, -0.2719,  0.9751,  0.9990,  0.9988,  0.9763,
        -0.9923,  0.8183,  0.9315,  0.9997], grad_fn=<TanhBackward0>)


The final hidden state is a fixed-size summary of the whole input sequence, since every input has passed through the recurrent update. It can be used as a feature vector for classification, or to initialize a decoder RNN for translation, for example.
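
As a rough sketch of the classification use case, the final hidden state can be fed to a linear layer that produces class scores. The number of classes, the stand-in hidden state and the dummy label are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden_dim, num_classes = 20, 3  # num_classes is an arbitrary choice

# Stand-in for the final hidden state of a sequence; tanh keeps it in (-1, 1).
final_hidden_state = torch.tanh(torch.randn(hidden_dim))

classifier = nn.Linear(hidden_dim, num_classes)
logits = classifier(final_hidden_state)   # shape: (num_classes,)
probs = torch.softmax(logits, dim=-1)

# cross_entropy is used here with a batch dimension, added via unsqueeze(0).
target = torch.tensor([1])                # dummy label
loss = nn.functional.cross_entropy(logits.unsqueeze(0), target)
print('class probabilities:', probs)
print('loss:', loss.item())
```

In a real model, the classifier and the RNN would be trained jointly by backpropagating this loss through the time steps.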

Now look at PyTorch's documentation for the [`nn.RNN`](https://pytorch.org/docs/stable/generated/torch.nn.RNN.html) and the [`nn.RNNCell`](https://pytorch.org/docs/stable/generated/torch.nn.RNNCell.html) classes. What is the difference between the two? How do they differ from the Wikipedia definition we used above? Run your input sequence through both `nn.RNN` and `nn.RNNCell`.

In [11]:
rnn = nn.RNNCell(10, 20)
input = torch.randn(6, 3, 10)
hx = torch.randn(3, 20)
output = []
for i in range(6):
    hx = rnn(input[i], hx)
    output.append(hx)

print('hx shape:', hx.shape)
print('hx:', hx)
print('len(output):', len(output))
print('output[-1] shape:', output[-1].shape)

hx shape: torch.Size([3, 20])
hx: tensor([[ 0.7233,  0.5792, -0.4003,  0.3691,  0.0636,  0.3248, -0.1921, -0.4253,
         -0.7624, -0.2697, -0.5016,  0.2533,  0.2468,  0.3396,  0.0448, -0.2077,
          0.3960, -0.4848, -0.5844,  0.4620],
        [ 0.0211,  0.3357, -0.0255,  0.3007, -0.1474,  0.1392, -0.0698, -0.4382,
         -0.3114,  0.2774,  0.0880, -0.7112,  0.1508,  0.5629,  0.2197,  0.6349,
         -0.1773, -0.5267, -0.1543, -0.1631],
        [ 0.5205,  0.5119,  0.4771, -0.2527,  0.5881, -0.5667,  0.2836,  0.0392,
          0.0441,  0.3735, -0.2442,  0.0407,  0.1120,  0.3315,  0.4730, -0.1075,
          0.0994,  0.1260,  0.0323, -0.2825]], grad_fn=<TanhBackward0>)
len(output): 6
output[-1] shape: torch.Size([3, 20])


In [12]:
rnn = nn.RNN(10, 20, 2)
input = torch.randn(5, 3, 10) # 5 = Sequence length, 3 = Batch size, 10 = Input dim
h0 = torch.randn(2, 3, 20) # 2 = Number of layers, 3 = Batch size, 20 = Hidden dim
output, hn = rnn(input, h0)

print('output shape:', output.shape)
print('hn shape:', hn.shape)
print('output:', output)
print('hn:', hn)


output shape: torch.Size([5, 3, 20])
hn shape: torch.Size([2, 3, 20])
output: tensor([[[-4.8626e-01,  4.0709e-01, -8.4299e-01, -4.0113e-01, -2.6314e-01,
          -1.8457e-01,  4.1829e-01,  9.7122e-01, -6.0532e-01, -2.3119e-01,
           7.4868e-01, -4.3750e-01, -1.3184e-01,  3.9995e-01, -7.0460e-01,
          -5.5162e-01, -5.0897e-01,  7.4554e-01, -5.6218e-02,  6.3047e-01],
         [ 6.3641e-01,  3.5804e-01, -6.2575e-01, -7.1356e-01, -4.5802e-01,
          -5.6249e-01,  2.4483e-01,  6.5745e-01,  1.2468e-01,  6.6862e-01,
          -3.4193e-01,  6.3295e-01, -6.8234e-02,  2.7414e-02, -3.7042e-01,
          -6.0987e-01, -4.9840e-01,  4.5116e-01,  1.4207e-01, -9.8684e-02],
         [ 4.2996e-01, -1.1258e-01,  1.2652e-01, -6.6382e-01,  3.3842e-01,
          -1.8776e-01, -2.2294e-02,  7.6567e-01,  5.3179e-01,  2.1342e-01,
          -1.4541e-01, -1.5331e-01, -1.7251e-01, -7.1197e-01, -4.9672e-01,
          -6.6058e-01,  2.8546e-01,  2.2175e-01, -6.4800e-01,  4.8958e-01]],

        [[-2.9645

In [14]:
rnn_cell = nn.RNNCell(input_dim, hidden_dim)
x = torch.stack(inputs)  # reuse the input sequence from above, shape (seq_length, input_dim)
hidden_state = torch.zeros(hidden_dim)
output = []

for i in range(seq_length):
    hidden_state = rnn_cell(x[i], hidden_state)
    output.append(hidden_state)

print('hidden state shape:', hidden_state.shape)
print('len(output):', len(output))

hidden state shape: torch.Size([20])
len(output): 4


In [15]:
torch_rnn = nn.RNN(input_dim, hidden_dim, num_layers=1)
x = torch.stack(inputs)  # reuse the input sequence from above, shape (seq_length, input_dim)
h0 = torch.zeros(1, hidden_dim)
output, hn = torch_rnn(x, h0)

print('output shape:', output.shape)

output shape: torch.Size([4, 20])
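
One way to see how the two classes relate is to copy the weights of an `nn.RNNCell` into a single-layer `nn.RNN` and compare the results: `nn.RNN` runs the loop over time steps internally, while `nn.RNNCell` computes one step per call. The sketch below assumes unbatched inputs, as in the cells above, and uses fresh tensors and variable names chosen for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_length, input_dim, hidden_dim = 4, 5, 20

cell = nn.RNNCell(input_dim, hidden_dim)
full = nn.RNN(input_dim, hidden_dim, num_layers=1)

# Copy the cell's parameters into layer 0 of the full RNN.
with torch.no_grad():
    full.weight_ih_l0.copy_(cell.weight_ih)
    full.weight_hh_l0.copy_(cell.weight_hh)
    full.bias_ih_l0.copy_(cell.bias_ih)
    full.bias_hh_l0.copy_(cell.bias_hh)

x = torch.randn(seq_length, input_dim)

# Manual loop over time steps with the cell.
h = torch.zeros(hidden_dim)
steps = []
for t in range(seq_length):
    h = cell(x[t], h)
    steps.append(h)
cell_output = torch.stack(steps)  # (seq_length, hidden_dim)

# Whole sequence at once with nn.RNN.
rnn_output, h_n = full(x, torch.zeros(1, hidden_dim))

print('outputs match:', torch.allclose(cell_output, rnn_output, atol=1e-5))
print('final states match:', torch.allclose(h, h_n[0], atol=1e-5))
```

Both classes keep two bias vectors (`bias_ih` and `bias_hh`) and return only hidden states; unlike the Elman definition above, they have no output layer producing $y_t$.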
