# Recurrent Neural Networks
In this exercise, we will implement a simple one-layer recurrent neural network. We will use the formula for an [Elman RNN](https://en.wikipedia.org/wiki/Recurrent_neural_network#Elman_networks_and_Jordan_networks), one of the most basic and classical RNNs. The hidden state update and the output at time $t$ are defined as follows:

$$
\begin{align}
h_t &= \tanh(W_h x_t + U_h h_{t-1} + b_h) \\
y_t &= \tanh(W_y h_t + b_y)
\end{align}
$$
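To make the two equations concrete, here is one time step computed by hand with 1-dimensional toy values. The numbers ($W_h = 0.5$, $U_h = 1$, $W_y = 2$, zero biases, $x_1 = 1$, $h_0 = 0$) are arbitrary illustration choices, not values used later in the exercise.

```python
import math
import torch

# Toy 1-D example (values chosen for illustration): input_dim = hidden_dim = output_dim = 1.
W_h, U_h, b_h = torch.tensor([[0.5]]), torch.tensor([[1.0]]), torch.tensor([0.0])
W_y, b_y = torch.tensor([[2.0]]), torch.tensor([0.0])

x_1 = torch.tensor([1.0])   # first input
h_0 = torch.tensor([0.0])   # initial hidden state

h_1 = torch.tanh(W_h @ x_1 + U_h @ h_0 + b_h)   # tanh(0.5 * 1 + 1 * 0 + 0) = tanh(0.5) ≈ 0.4621
y_1 = torch.tanh(W_y @ h_1 + b_y)               # tanh(2 * 0.4621...) ≈ 0.7279

# Cross-check against plain scalar math.
print(math.isclose(h_1.item(), math.tanh(0.5), rel_tol=1e-6))
print(math.isclose(y_1.item(), math.tanh(2 * math.tanh(0.5)), rel_tol=1e-6))
```

Because $h_0 = 0$, the recurrent term $U_h h_0$ vanishes at the first step; from $t = 2$ on it carries information from earlier inputs.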

```python
import torch
import torch.nn as nn
```

We start by defining the RNN as a subclass of `nn.Module`. The network's parameters are created in the `__init__` method. Use `input_dim`, `hidden_dim` and `output_dim` as arguments that define the dimensionality of the input/hidden/output vectors. Define your parameters as `nn.Parameter` with the appropriate dimensions. The documentation of `torch.nn` can be found [here](https://pytorch.org/docs/stable/nn.html).
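A small sketch of the shape convention, using a hypothetical `ShapeDemo` module: storing a weight as `(out_features, in_features)` lets `W @ x` map an `in_features` vector to an `out_features` vector, and any `nn.Parameter` assigned as a module attribute is registered automatically (it shows up in `named_parameters()` and is seen by optimizers).

```python
import torch
import torch.nn as nn

class ShapeDemo(nn.Module):  # hypothetical module, for illustration only
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        # (rows, cols) = (hidden_dim, input_dim), so W @ x has shape (hidden_dim,)
        self.W = nn.Parameter(torch.zeros(hidden_dim, input_dim))
        self.b = nn.Parameter(torch.zeros(hidden_dim))

demo = ShapeDemo(input_dim=5, hidden_dim=20)
for name, p in demo.named_parameters():
    print(name, tuple(p.shape))
# W (20, 5)
# b (20,)
```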

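```python
class RNN(nn.Module):

    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        # Weights, mapped to the symbols in the equations above
        self.W_ih = nn.Parameter(torch.zeros(hidden_dim, input_dim))   # W_h: input -> hidden
        self.W_hh = nn.Parameter(torch.zeros(hidden_dim, hidden_dim))  # U_h: hidden -> hidden
        self.W_ho = nn.Parameter(torch.zeros(output_dim, hidden_dim))  # W_y: hidden -> output

        # Biases
        self.b_ih = nn.Parameter(torch.zeros(hidden_dim))  # b_h
        self.b_ho = nn.Parameter(torch.zeros(output_dim))  # b_y
        # The zeros are placeholders; reset_parameters() sets the actual initial values.
```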

Add a function `reset_parameters` that initializes your parameters. Pick a suitable distribution from [nn.init](https://pytorch.org/docs/stable/nn.init.html).

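```python
def reset_parameters(self):
    # Draw every weight and bias independently from U(-1, 1).
    for param in self.parameters():
        nn.init.uniform_(param, -1, 1)

# Attach the function to the class as a method.
RNN.reset_parameters = reset_parameters
```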
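U(-1, 1) works, but with 20 hidden units the pre-activations can become large, which pushes $\tanh$ toward saturation (many hidden values close to ±1). One alternative, sketched below under that assumption, is the scheme the `nn.RNN` documentation describes for its own parameters: every weight and bias drawn from $U(-\sqrt{k}, \sqrt{k})$ with $k = 1/\text{hidden\_dim}$. The function name `reset_parameters_scaled` is hypothetical, and an `nn.Linear` layer stands in for the RNN so the snippet runs on its own.

```python
import math
import torch
import torch.nn as nn

def reset_parameters_scaled(module, hidden_dim):
    """Hypothetical alternative init: U(-1/sqrt(hidden_dim), 1/sqrt(hidden_dim)) for every parameter."""
    bound = 1.0 / math.sqrt(hidden_dim)
    for p in module.parameters():
        nn.init.uniform_(p, -bound, bound)  # in-place, runs without gradient tracking
    return bound

hidden_dim = 20
layer = nn.Linear(5, hidden_dim)  # stand-in module; reset_parameters_scaled(rnn, hidden_dim) works the same way
bound = reset_parameters_scaled(layer, hidden_dim)
print(round(bound, 4), [tuple(p.shape) for p in layer.parameters()])
```

Scaling the range by $1/\sqrt{\text{hidden\_dim}}$ keeps the sum $U_h h_{t-1}$ from growing with the number of hidden units.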

Add a `forward` function that takes an input $x_t$ and the previous hidden state $h_{t-1}$ and returns the output $y_t$ and the updated hidden state $h_t$, in that order. The initial hidden state $h_0$ can be initialized randomly or to all zeros.

```python
def forward(self, x, hidden_state):
    # h_t = tanh(W_h x_t + U_h h_{t-1} + b_h)
    hidden_state = torch.tanh(
        torch.matmul(self.W_ih, x) + self.b_ih +
        torch.matmul(self.W_hh, hidden_state)
    )
    # y_t = tanh(W_y h_t + b_y)
    output = torch.tanh(torch.matmul(self.W_ho, hidden_state) + self.b_ho)
    return output, hidden_state

RNN.forward = forward
```

Test your RNN with a single input.

```python
input_dim = 5
hidden_dim = 20
output_dim = 10
rnn = RNN(input_dim, hidden_dim, output_dim)
rnn.reset_parameters()

x = torch.randn(input_dim)
hidden_state = torch.zeros(hidden_dim)
output, hidden_state = rnn(x, hidden_state)

print('y shape:', output.shape)
print('hidden state shape:', hidden_state.shape)
print('y within [-1, 1]:', bool((output.abs() <= 1).all()))
print('hidden state:', hidden_state)
```

```
y shape: torch.Size([10])
hidden state shape: torch.Size([20])
y within [-1, 1]: True
hidden state: tensor([-0.1219, -0.9612,  0.6818, -0.9993, -0.0551,  0.9607, -0.7810, -0.9295,
        -0.9291,  0.9873,  0.3704,  0.8568,  0.7495,  0.2143, -0.5961, -0.9334,
        -0.7909, -0.3600,  0.8832,  0.9368], grad_fn=<TanhBackward0>)
```
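The `forward` above works on a single 1-D input vector; it does not handle a 2-D `x` of shape `(batch, input_dim)`. Since `nn.RNN` and `nn.RNNCell` operate on batches, a batched form of the same two equations may be useful for comparison. This is a sketch with hypothetical plain-tensor parameters (same shapes as in the class), not part of the exercise's required interface.

```python
import torch

torch.manual_seed(0)
input_dim, hidden_dim, output_dim, batch = 5, 20, 10, 3

# Hypothetical stand-ins for the RNN's parameters (same shapes as in the RNN class).
W_h = torch.empty(hidden_dim, input_dim).uniform_(-1, 1)
U_h = torch.empty(hidden_dim, hidden_dim).uniform_(-1, 1)
b_h = torch.empty(hidden_dim).uniform_(-1, 1)
W_y = torch.empty(output_dim, hidden_dim).uniform_(-1, 1)
b_y = torch.empty(output_dim).uniform_(-1, 1)

def elman_step_batched(x, h_prev):
    """x: (batch, input_dim), h_prev: (batch, hidden_dim) -> y: (batch, output_dim), h: (batch, hidden_dim)."""
    # Right-multiplying by the transposed matrices applies them to every row at once;
    # the 1-D biases broadcast across the batch dimension.
    h = torch.tanh(x @ W_h.T + h_prev @ U_h.T + b_h)
    y = torch.tanh(h @ W_y.T + b_y)
    return y, h

x = torch.randn(batch, input_dim)
h0 = torch.zeros(batch, hidden_dim)
y, h = elman_step_batched(x, h0)

# Cross-check against the unbatched (matrix @ vector) form, one row at a time.
for i in range(batch):
    h_i = torch.tanh(W_h @ x[i] + U_h @ h0[i] + b_h)
    y_i = torch.tanh(W_y @ h_i + b_y)
    assert torch.allclose(h[i], h_i, atol=1e-5) and torch.allclose(y[i], y_i, atol=1e-5)

print(y.shape, h.shape)  # torch.Size([3, 10]) torch.Size([3, 20])
```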


Now create an input sequence and run it through your RNN.
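A minimal sketch of running a whole sequence: feed one input vector per time step and pass each new hidden state into the next step. To keep the snippet self-contained it uses plain tensors and a standalone `elman_step` function (hypothetical names) instead of the `RNN` class; with the class, the loop body would simply be `y_t, h = rnn(x_t, h)`.

```python
import torch

torch.manual_seed(0)
input_dim, hidden_dim, output_dim, seq_len = 5, 20, 10, 7

# Plain tensors standing in for the RNN's parameters (names follow the equations above).
W_h = torch.empty(hidden_dim, input_dim).uniform_(-1, 1)
U_h = torch.empty(hidden_dim, hidden_dim).uniform_(-1, 1)
b_h = torch.empty(hidden_dim).uniform_(-1, 1)
W_y = torch.empty(output_dim, hidden_dim).uniform_(-1, 1)
b_y = torch.empty(output_dim).uniform_(-1, 1)

def elman_step(x_t, h_prev):
    """One Elman update: returns (y_t, h_t)."""
    h_t = torch.tanh(W_h @ x_t + U_h @ h_prev + b_h)
    y_t = torch.tanh(W_y @ h_t + b_y)
    return y_t, h_t

xs = torch.randn(seq_len, input_dim)  # one sequence of seq_len input vectors
h = torch.zeros(hidden_dim)           # h_0 = 0
ys = []
for x_t in xs:                        # iterating over a 2-D tensor yields its rows (time steps)
    y_t, h = elman_step(x_t, h)
    ys.append(y_t)
ys = torch.stack(ys)                  # (seq_len, output_dim)

print(ys.shape)  # torch.Size([7, 10])
print(h.shape)   # torch.Size([20]), the final hidden state h_T
```

The same parameters are reused at every time step; only the hidden state changes as the loop advances.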

The final hidden state is a fixed-size summary of the whole input sequence (a lossy one, since everything has to fit into `hidden_dim` numbers). It can be used, for example, as a feature vector for classification or to initialize a decoder RNN for translation.
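As an illustration of the classification use, a linear layer can map the final hidden state $h_T$ to class scores. This is a sketch under assumptions: `num_classes = 3` and the linear head are hypothetical choices, and a random `tanh` vector stands in for $h_T$ so the snippet runs on its own.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden_dim, num_classes = 20, 3  # num_classes is a hypothetical example value

h_final = torch.tanh(torch.randn(hidden_dim))    # stand-in for the final hidden state h_T
classifier = nn.Linear(hidden_dim, num_classes)  # hypothetical linear classification head
logits = classifier(h_final)                     # (num_classes,)
probs = torch.softmax(logits, dim=-1)            # class probabilities, summing to 1
predicted_class = int(probs.argmax())

print(logits.shape, predicted_class)
```

In a trained model, the RNN and the classification head would be optimized jointly, so that $h_T$ learns to keep the information the classifier needs.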

Now look at PyTorch's documentation for the [`nn.RNN`](https://pytorch.org/docs/stable/generated/torch.nn.RNN.html) and the [`nn.RNNCell`](https://pytorch.org/docs/stable/generated/torch.nn.RNNCell.html) classes. What is the difference between the two? How do both differ from the Wikipedia definition we used above? Run your input sequence through both `nn.RNN` and `nn.RNNCell`.
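One way to compare the two classes is sketched below. According to their documentation, both compute $h_t = \tanh(x_t W_{ih}^T + b_{ih} + h_{t-1} W_{hh}^T + b_{hh})$: there are two bias vectors and no separate output layer, so the "output" of `nn.RNN` is the sequence of hidden states. `nn.RNN` consumes a whole sequence per call, while `nn.RNNCell` performs a single time step. The sketch copies the layer's weights into the cell and checks that looping the cell reproduces the layer; batch size 1 and the default `batch_first=False` layout `(seq_len, batch, input_dim)` are assumptions of this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_dim, hidden_dim, seq_len = 5, 20, 7

rnn_layer = nn.RNN(input_dim, hidden_dim)     # processes a whole sequence in one call
rnn_cell = nn.RNNCell(input_dim, hidden_dim)  # processes one time step per call

# Give the cell the same weights as the layer so their results are comparable.
with torch.no_grad():
    rnn_cell.weight_ih.copy_(rnn_layer.weight_ih_l0)
    rnn_cell.weight_hh.copy_(rnn_layer.weight_hh_l0)
    rnn_cell.bias_ih.copy_(rnn_layer.bias_ih_l0)
    rnn_cell.bias_hh.copy_(rnn_layer.bias_hh_l0)

xs = torch.randn(seq_len, 1, input_dim)  # (seq_len, batch=1, input_dim)
h0 = torch.zeros(1, 1, hidden_dim)       # (num_layers, batch, hidden_dim)

outputs, h_n = rnn_layer(xs, h0)  # outputs: (seq_len, 1, hidden_dim), h_n: (1, 1, hidden_dim)

h = h0[0]                         # (batch, hidden_dim) as expected by the cell
cell_states = []
for x_t in xs:                    # x_t: (1, input_dim)
    h = rnn_cell(x_t, h)
    cell_states.append(h)
cell_states = torch.stack(cell_states)  # (seq_len, 1, hidden_dim)

print(torch.allclose(outputs, cell_states, atol=1e-5))
print(torch.allclose(h_n[0], h, atol=1e-5))
```

The two biases $b_{ih} + b_{hh}$ play the role of the single $b_h$ in the Elman formula; an output $y_t$ as defined above would need an extra layer (for example `nn.Linear` followed by `torch.tanh`) on top of the hidden states.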