# Recurrent Neural Networks
In this exercise, we will implement a simple one-layer recurrent neural network. We will use the formula for an [Elman RNN](https://en.wikipedia.org/wiki/Recurrent_neural_network#Elman_networks_and_Jordan_networks), one of the most basic classical RNNs, with $\tanh$ as the activation function for both the hidden state and the output. The hidden state update and the output at time $t$ are defined as follows:

$$
\begin{aligned}
h_t &= \tanh(W_h x_t + U_h h_{t-1} + b_h) \\
y_t &= \tanh(W_y h_t + b_y)
\end{aligned}
$$
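
To make the shapes explicit, write $d_x$, $d_h$ and $d_y$ for the input, hidden and output dimensionalities. This notation is introduced here for illustration only; it corresponds to `input_dim`, `hidden_dim` and `output_dim` in the code. For the two formulas to be well defined, the vectors and matrices need the following shapes:

$$
\begin{aligned}
x_t &\in \mathbb{R}^{d_x}, & h_t, b_h &\in \mathbb{R}^{d_h}, & y_t, b_y &\in \mathbb{R}^{d_y}, \\
W_h &\in \mathbb{R}^{d_h \times d_x}, & U_h &\in \mathbb{R}^{d_h \times d_h}, & W_y &\in \mathbb{R}^{d_y \times d_h}.
\end{aligned}
$$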

In [1]:
import torch
import torch.nn as nn

We start by defining the RNN as a subclass of `nn.Module`. The network's parameters are created in the `__init__` method. Use `input_dim`, `hidden_dim` and `output_dim` as arguments that define the dimensionality of the input/hidden/output vectors. Define your parameters as `nn.Parameter` with the appropriate dimensions. The documentation of `torch.nn` can be found [here](https://pytorch.org/docs/stable/nn.html).

In [2]:
class RNN(nn.Module):
    
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()  # run the nn.Module init
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.w_h = nn.Parameter(torch.zeros(hidden_dim, input_dim))
        self.u_h = nn.Parameter(torch.zeros(hidden_dim, hidden_dim))
        self.b_h = nn.Parameter(torch.zeros(hidden_dim))
        self.w_y = nn.Parameter(torch.empty(output_dim, hidden_dim))  # torch.empty also works: reset_parameters overwrites the values
        self.b_y = nn.Parameter(torch.zeros(output_dim))

Add a function `reset_parameters` that initializes your parameters. Pick a suitable distribution from [nn.init](https://pytorch.org/docs/stable/nn.init.html).

In [3]:
import math

def reset_parameters(self):
    # Same scheme as PyTorch's built-in RNN: uniform in [-1/sqrt(hidden_dim), 1/sqrt(hidden_dim)]
    stdv = 1.0 / math.sqrt(self.hidden_dim)
    for weight in self.parameters():
        nn.init.uniform_(weight, -stdv, stdv)

RNN.reset_parameters = reset_parameters
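
As a quick sanity check of this initialization scheme, the sketch below applies the same uniform initialization to stand-alone tensors and confirms that every value lies within $\pm 1/\sqrt{\text{hidden\_dim}}$. The names `bound`, `weight` and `bias` are illustrative and not part of the exercise.

```python
import math

import torch
import torch.nn as nn

hidden_dim, input_dim = 20, 5  # illustrative sizes
bound = 1.0 / math.sqrt(hidden_dim)

# Stand-alone parameters with the shapes of W_h and b_h
weight = nn.Parameter(torch.empty(hidden_dim, input_dim))
bias = nn.Parameter(torch.empty(hidden_dim))

for p in (weight, bias):
    nn.init.uniform_(p, -bound, bound)  # fills the tensor in place

# Small tolerance because the values are stored in float32
within = all(p.abs().max().item() <= bound + 1e-6 for p in (weight, bias))
print('bound:', round(bound, 4))
print('all within bounds:', within)
```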

Add a `forward` function that takes an input $x_t$ and the previous hidden state $h_{t-1}$ and returns the output $y_t$ and the updated hidden state $h_t$.

In [4]:
def forward(self, x, hidden_state):
    # h_t = tanh(W_h x_t + U_h h_{t-1} + b_h)
    hidden_state = torch.tanh(
        torch.matmul(self.w_h, x)
        + torch.matmul(self.u_h, hidden_state)
        + self.b_h
    )
    # y_t = tanh(W_y h_t + b_y)
    y = torch.tanh(torch.matmul(self.w_y, hidden_state) + self.b_y)
    return y, hidden_state

RNN.forward = forward
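
The `forward` above multiplies each weight matrix with a single vector, so it processes one unbatched input at a time. As a sketch of one possible extension (not required by the exercise), the hypothetical function `batched_step` below applies the same update to a batch of row vectors by multiplying with the transposed weights. All names and sizes in it are illustrative.

```python
import torch


def batched_step(x, h_prev, w_h, u_h, b_h, w_y, b_y):
    """One Elman step for a batch.

    x is (batch, input_dim) and h_prev is (batch, hidden_dim).
    """
    h = torch.tanh(x @ w_h.T + h_prev @ u_h.T + b_h)  # (batch, hidden_dim)
    y = torch.tanh(h @ w_y.T + b_y)                   # (batch, output_dim)
    return y, h


torch.manual_seed(0)
batch, input_dim, hidden_dim, output_dim = 3, 5, 20, 10

# Random stand-in weights with the same shapes as in the RNN class
w_h = 0.1 * torch.randn(hidden_dim, input_dim)
u_h = 0.1 * torch.randn(hidden_dim, hidden_dim)
b_h = torch.zeros(hidden_dim)
w_y = 0.1 * torch.randn(output_dim, hidden_dim)
b_y = torch.zeros(output_dim)

x = torch.randn(batch, input_dim)
h0 = torch.zeros(batch, hidden_dim)
y, h1 = batched_step(x, h0, w_h, u_h, b_h, w_y, b_y)

# The first row of the batched result should agree with the per-vector formula
h_single = torch.tanh(w_h @ x[0] + u_h @ h0[0] + b_h)
print('y shape:', y.shape, 'h shape:', h1.shape)
print('matches single-vector update:', torch.allclose(h1[0], h_single, atol=1e-5))
```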

Test your RNN with a single input. The initial hidden state $h_0$ can be initialized randomly or set to all zeros.

In [5]:
input_dim = 5
hidden_dim = 20
output_dim = 10
rnn = RNN(input_dim, hidden_dim, output_dim)
rnn.reset_parameters()  # initialize parameters
x = torch.randn(input_dim)
h0 = torch.zeros(hidden_dim)
y, new_hidden_state = rnn(x, h0)
print('y shape:', y.shape)
print('y:', y)
print('h shape:', new_hidden_state.shape)

y shape: torch.Size([10])
y: tensor([-0.0942,  0.2157, -0.1932, -0.0863,  0.2586, -0.2111, -0.3795, -0.1257,
        -0.0737,  0.0977], grad_fn=<TanhBackward0>)
h shape: torch.Size([20])


Now create an input sequence and run it through your RNN.

In [6]:
seq_length = 4
inputs = [torch.randn(input_dim) for _ in range(seq_length)]
hidden_state = h0
outputs = []
for x in inputs:
    y, new_hidden_state = rnn(x, hidden_state)
    outputs.append(y)
    hidden_state = new_hidden_state
print('Final output shape:', y.shape)
print('Final output:', y)
print('Final hidden state shape:', hidden_state.shape)
print('Final hidden state:', hidden_state)

Final output shape: torch.Size([10])
Final output: tensor([ 0.2384,  0.0625, -0.2640, -0.0279, -0.1201, -0.1379, -0.2457, -0.1157,
         0.0913,  0.1543], grad_fn=<TanhBackward0>)
Final hidden state shape: torch.Size([20])
Final hidden state: tensor([-0.4002,  0.3011,  0.1140,  0.1071, -0.2995,  0.0858,  0.3285,  0.1874,
         0.5416,  0.4066,  0.3935, -0.4461,  0.2086, -0.2570, -0.1168,  0.0769,
         0.1403,  0.0268, -0.2610, -0.3753], grad_fn=<TanhBackward0>)


The final hidden state is a fixed-size summary of the input sequence. It can be used as a feature vector for classification, or to initialize a decoder RNN, for example in translation.
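
To illustrate the classification use case, here is a minimal sketch that feeds the final hidden state into a linear layer. The class name `SequenceClassifier`, the number of classes and the choice of PyTorch's built-in `nn.RNN` as the encoder are assumptions made for this example; the model is untrained, so only the shapes are meaningful.

```python
import torch
import torch.nn as nn


class SequenceClassifier(nn.Module):
    """Hypothetical classifier: encode the sequence, classify from the last hidden state."""

    def __init__(self, input_dim, hidden_dim, num_classes):
        super().__init__()
        self.encoder = nn.RNN(input_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, sequence):
        # sequence: (seq_length, input_dim), a single unbatched sequence
        _, h_n = self.encoder(sequence)   # h_n: (num_layers, hidden_dim)
        return self.classifier(h_n[-1])   # logits: (num_classes,)


model = SequenceClassifier(input_dim=5, hidden_dim=20, num_classes=3)
logits = model(torch.randn(4, 5))  # a random sequence of length 4
print('logits shape:', logits.shape)
```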

Now look at PyTorch's documentation for the [`nn.RNN`](https://pytorch.org/docs/stable/generated/torch.nn.RNN.html) and [`nn.RNNCell`](https://pytorch.org/docs/stable/generated/torch.nn.RNNCell.html) classes. What is the difference between the two? How do they differ from the Wikipedia definition we used above? Run your input sequence through both `nn.RNN` and `nn.RNNCell`.

In [7]:
# nn.RNNCell computes a single hidden state update, so we loop over time ourselves
rnn_cell = nn.RNNCell(input_dim, hidden_dim)
x = torch.stack(inputs)  # reuse the sequence from above: (seq_length, input_dim)
hx = torch.randn(hidden_dim)
output = []
for i in range(seq_length):
    hx = rnn_cell(x[i], hx)
    output.append(hx)
print('RNN cell output:', len(output), output[0].shape)

# nn.RNN iterates over the whole sequence internally
num_layers = 1
torch_rnn = nn.RNN(input_dim, hidden_dim, num_layers)
h0 = torch.randn(num_layers, hidden_dim)
output, hn = torch_rnn(x, h0)
print('RNN output:', output.shape)

RNN cell output: 4 torch.Size([20])
RNN output: torch.Size([4, 20])
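
One way to see how the two classes relate is to give them the same weights and compare their hidden states. The sketch below does this using the parameter names documented for `nn.RNN` (`weight_ih_l0`, `weight_hh_l0`, `bias_ih_l0`, `bias_hh_l0`) and `nn.RNNCell` (`weight_ih`, `weight_hh`, `bias_ih`, `bias_hh`); the variable names `shared_rnn` and `shared_cell` and the sizes are illustrative. The parameter names also hint at the differences from the Elman definition: PyTorch uses two bias vectors in the hidden state update, and neither class contains the output layer that produces $y_t$.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_dim, hidden_dim, seq_length = 5, 20, 4

shared_rnn = nn.RNN(input_dim, hidden_dim, num_layers=1)
shared_cell = nn.RNNCell(input_dim, hidden_dim)

# Copy the single layer's parameters from nn.RNN into the cell
with torch.no_grad():
    shared_cell.weight_ih.copy_(shared_rnn.weight_ih_l0)
    shared_cell.weight_hh.copy_(shared_rnn.weight_hh_l0)
    shared_cell.bias_ih.copy_(shared_rnn.bias_ih_l0)
    shared_cell.bias_hh.copy_(shared_rnn.bias_hh_l0)

x = torch.randn(seq_length, input_dim)
h0 = torch.zeros(1, hidden_dim)  # (num_layers, hidden_dim)

# nn.RNN processes the whole sequence in one call
rnn_out, h_n = shared_rnn(x, h0)

# nn.RNNCell has to be unrolled manually over the same sequence
h = h0[0]
cell_states = []
for t in range(seq_length):
    h = shared_cell(x[t], h)
    cell_states.append(h)
cell_out = torch.stack(cell_states)  # (seq_length, hidden_dim)

print('same hidden states:', torch.allclose(rnn_out, cell_out, atol=1e-5))
```

With shared weights, the output of `nn.RNN` is the stack of hidden states that the manual `nn.RNNCell` loop produces, so an output layer like $W_y$ has to be added separately in both cases.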
