# Recurrent Neural Networks
In this exercise, we will implement a simple one-layer recurrent neural network. We will use the formula for an [Elman RNN](https://en.wikipedia.org/wiki/Recurrent_neural_network#Elman_networks_and_Jordan_networks), one of the most basic and classical RNNs. The hidden state update and the output at time $t$ are defined as follows:

$$
\begin{align}
h_t &= \tanh(W_h x_t + U_h h_{t-1} + b_h) \\
y_t &= \tanh(W_y h_t + b_y)
\end{align}
$$
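
As a quick shape check, the two equations can be evaluated by hand with plain tensors. This is only a minimal sketch: the dimensions (5, 20, 10) and the randomly drawn parameter values are assumptions chosen for illustration, not part of the exercise.

```python
import torch

torch.manual_seed(0)

# Illustrative dimensions (assumed, not prescribed by the exercise)
input_dim, hidden_dim, output_dim = 5, 20, 10

# Randomly drawn parameters with the shapes implied by the equations
W_h = torch.randn(hidden_dim, input_dim)   # input-to-hidden
U_h = torch.randn(hidden_dim, hidden_dim)  # hidden-to-hidden
b_h = torch.randn(hidden_dim)
W_y = torch.randn(output_dim, hidden_dim)  # hidden-to-output
b_y = torch.randn(output_dim)

x_t = torch.randn(input_dim)      # input at time t
h_prev = torch.zeros(hidden_dim)  # previous hidden state h_{t-1}

# One step of the Elman update
h_t = torch.tanh(W_h @ x_t + U_h @ h_prev + b_h)
y_t = torch.tanh(W_y @ h_t + b_y)

print(h_t.shape, y_t.shape)
```

Because both equations end in a $\tanh$, every entry of $h_t$ and $y_t$ lies in $[-1, 1]$.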

In [1]:
import torch
import torch.nn as nn

We start by defining the RNN as a subclass of `nn.Module`. The network's parameters are created in the `__init__` method. Use `input_dim`, `hidden_dim` and `output_dim` as arguments that define the dimensionality of the input/hidden/output vectors. Define your parameters as `nn.Parameter` with the appropriate dimensions. The documentation of `torch.nn` can be found [here](https://pytorch.org/docs/stable/nn.html).

In [None]:
class RNN(nn.Module):
    
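    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        # torch.empty leaves the values uninitialized; call reset_parameters() before use.
        # Weight matrices: input-to-hidden, hidden-to-hidden, hidden-to-output
        self.w_h = nn.Parameter(torch.empty(hidden_dim, input_dim))
        self.u_h = nn.Parameter(torch.empty(hidden_dim, hidden_dim))
        self.w_y = nn.Parameter(torch.empty(output_dim, hidden_dim))

        # Bias vectors for the hidden state and the output
        self.b_h = nn.Parameter(torch.empty(hidden_dim))
        self.b_y = nn.Parameter(torch.empty(output_dim))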

Add a function `reset_parameters` that initializes your parameters. Pick a suitable distribution from [nn.init](https://pytorch.org/docs/stable/nn.init.html).

In [17]:
def reset_parameters(self):
    for weight in self.parameters():
        nn.init.uniform_(weight, -1, 1)
        
RNN.reset_parameters = reset_parameters
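
Drawing from $[-1, 1]$ works for this small example, but with larger hidden dimensions the pre-activations tend to grow and push $\tanh$ into saturation. One common alternative, which is also the scheme documented for PyTorch's built-in `nn.RNN`, draws every parameter from $\mathcal{U}(-k, k)$ with $k = 1/\sqrt{\text{hidden\_dim}}$. The sketch below illustrates this on two stand-alone parameters; the helper name `scaled_uniform_init_` and the demo tensors are hypothetical and not part of the exercise.

```python
import math

import torch
import torch.nn as nn


def scaled_uniform_init_(params, hidden_dim):
    """Hypothetical helper: fill each parameter in place from U(-k, k), k = 1/sqrt(hidden_dim)."""
    bound = 1.0 / math.sqrt(hidden_dim)
    for p in params:
        nn.init.uniform_(p, -bound, bound)
    return bound


hidden_dim = 20
# Stand-in parameters with the shapes of w_h and b_h (assumed for illustration)
demo_params = [
    nn.Parameter(torch.empty(hidden_dim, 5)),
    nn.Parameter(torch.empty(hidden_dim)),
]
bound = scaled_uniform_init_(demo_params, hidden_dim)
print(bound, [float(p.abs().max()) for p in demo_params])
```

Inside the class, the same idea would amount to replacing the bounds `-1, 1` in `reset_parameters` with `-bound, bound`.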

Add a `forward` function that takes an input $x_t$ and the previous hidden state $h_{t-1}$ and returns the output $y_t$ and the updated hidden state $h_t$, in that order. The initial hidden state $h_0$ can be initialized randomly or to all zeros.

In [22]:
def forward(self, x, h_t):
    # Hidden state update: h_t = tanh(W_h x_t + U_h h_{t-1} + b_h)
    new_h_t = torch.tanh(self.w_h @ x + self.u_h @ h_t + self.b_h)
    # Output: y_t = tanh(W_y h_t + b_y)
    y_t = torch.tanh(self.w_y @ new_h_t + self.b_y)
    return y_t, new_h_t

RNN.forward = forward

Test your RNN with a single input.

In [27]:
input_dim = 5
hidden_dim = 20
output_dim = 10
rnn = RNN(input_dim, hidden_dim, output_dim)
rnn.reset_parameters()
x = torch.randn(input_dim)
h0 = torch.zeros(hidden_dim)
y, new_h_t = rnn(x, h0)
print('y shape: ', y.shape)
print('y: ', y)
print('h shape: ', new_h_t.shape)

y shape:  torch.Size([10])
y:  tensor([-0.8376,  0.9170,  0.9925,  0.9643, -0.4893,  0.5281, -0.8434,  0.9928,
        -0.6311,  0.0650], grad_fn=<TanhBackward0>)
h shape:  torch.Size([20])


Now create an input sequence and run it through your RNN.

In [None]:
seq_length = 4
inputs = [torch.randn(input_dim) for _ in range(seq_length)]
h_t = torch.zeros(hidden_dim)  # initial hidden state h_0
outputs = []
for x in inputs:
    # Feed the hidden state from the previous step back into the RNN
    y, h_t = rnn(x, h_t)
    outputs.append(y)

The final hidden state is a fixed-size summary of the whole input sequence. It can be used as a feature for classification or, for example, to initialize a decoder RNN for translation.
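
As a rough illustration of the classification use case, a linear layer can map the final hidden state to class scores. This is a minimal sketch under stated assumptions: `num_classes = 3` is arbitrary, and `h_final` is a random stand-in for the last hidden state produced by the loop above, so the block runs on its own.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

hidden_dim, num_classes = 20, 3  # assumed sizes for illustration

# Stand-in for the final hidden state of the sequence (values in [-1, 1] like tanh output)
h_final = torch.tanh(torch.randn(hidden_dim))

# Linear classification head on top of the sequence summary
classifier = nn.Linear(hidden_dim, num_classes)
logits = classifier(h_final)
probs = torch.softmax(logits, dim=-1)

print(probs)
```

In a real model, the classifier and the RNN would be trained jointly, so that the hidden state learns to keep the information relevant to the task.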

Now look at PyTorch's documentation for the [`nn.RNN`](https://pytorch.org/docs/stable/generated/torch.nn.RNN.html) and the [`nn.RNNCell`](https://pytorch.org/docs/stable/generated/torch.nn.RNNCell.html) classes. What is the difference between the two? How do they differ from the Elman definition from Wikipedia that we used above? Run your input sequence through both the `nn.RNN` and the `nn.RNNCell`.
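
One possible way to run a sequence through both classes is sketched below. It uses a batch dimension of size 1 and freshly sampled inputs so that it is self-contained. The sizes are assumptions, and copying the cell's weights into the `nn.RNN` layer is done only to show that both compute the same hidden states.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_dim, hidden_dim, seq_length = 5, 20, 4  # assumed sizes

# Shape (seq_length, batch=1, input_dim), the default layout expected by nn.RNN
inputs = torch.randn(seq_length, 1, input_dim)

# nn.RNNCell computes a single time step per call, so we write the loop ourselves.
cell = nn.RNNCell(input_dim, hidden_dim)
h_t = torch.zeros(1, hidden_dim)
cell_states = []
for x in inputs:  # x has shape (1, input_dim)
    h_t = cell(x, h_t)
    cell_states.append(h_t)
cell_states = torch.stack(cell_states)  # (seq_length, 1, hidden_dim)

# nn.RNN processes the whole sequence in one call.
rnn_layer = nn.RNN(input_dim, hidden_dim)
with torch.no_grad():
    # Copy the cell's parameters so that both modules compute the same function
    rnn_layer.weight_ih_l0.copy_(cell.weight_ih)
    rnn_layer.weight_hh_l0.copy_(cell.weight_hh)
    rnn_layer.bias_ih_l0.copy_(cell.bias_ih)
    rnn_layer.bias_hh_l0.copy_(cell.bias_hh)

h0 = torch.zeros(1, 1, hidden_dim)  # (num_layers, batch, hidden_dim)
all_states, h_n = rnn_layer(inputs, h0)

print(all_states.shape, h_n.shape)
print(torch.allclose(cell_states, all_states, atol=1e-5))
```

Note that both classes return hidden states only. Unlike the definition above, they have no output layer $y_t = \tanh(W_y h_t + b_y)$, and they use two bias vectors (`bias_ih` and `bias_hh`) instead of the single $b_h$.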