# Recurrent Neural Networks
In this exercise, we will implement a simple one-layer recurrent neural network. We will use the formula for an [Elman RNN](https://en.wikipedia.org/wiki/Recurrent_neural_network#Elman_networks_and_Jordan_networks), one of the most basic and classical RNNs. The hidden state update and the output at time $t$ are defined as follows:

$$
\begin{align}
h_t &= \tanh(W_h x_t + U_h h_{t-1} + b_h) \\
y_t &= \tanh(W_y h_t + b_y)
\end{align}
$$
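
For these two equations to be well-defined, the parameter shapes have to match the vector sizes. As a quick sanity check, assume $x_t \in \mathbb{R}^{n}$, $h_t \in \mathbb{R}^{m}$ and $y_t \in \mathbb{R}^{k}$ (the symbols $n$, $m$, $k$ are introduced here for illustration only and correspond to the input, hidden and output dimensions):

$$
\begin{align}
W_h &\in \mathbb{R}^{m \times n}, & U_h &\in \mathbb{R}^{m \times m}, & b_h &\in \mathbb{R}^{m} \\
W_y &\in \mathbb{R}^{k \times m}, & b_y &\in \mathbb{R}^{k}
\end{align}
$$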

In [1]:
import torch
import torch.nn as nn

We start by defining the RNN as a subclass of `nn.Module`. The network's parameters are created in the `__init__` method. Use `input_dim`, `hidden_dim` and `output_dim` as arguments that define the dimensionality of the input/hidden/output vectors. Define your parameters as `nn.Parameter` with the appropriate dimensions. The documentation of `torch.nn` can be found [here](https://pytorch.org/docs/stable/nn.html).

In [37]:
class RNN(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.output_dim = output_dim
        
        self.W_h = nn.Parameter(torch.randn(hidden_dim, input_dim))  # Weight matrix for input to hidden state
        self.U_h = nn.Parameter(torch.randn(hidden_dim, hidden_dim))  # Weight matrix for hidden to hidden state
        self.b_h = nn.Parameter(torch.zeros(hidden_dim))  # Bias for hidden state
        
        self.W_y = nn.Parameter(torch.randn(output_dim, hidden_dim))  # Weight matrix for hidden to output state
        self.b_y = nn.Parameter(torch.zeros(output_dim))  # Bias for output state
        
    def reset_parameters(self):
        # Re-draw every parameter (weights and biases) from U(-1, 1), in place.
        for weight in self.parameters():
            nn.init.uniform_(weight, -1, 1)
        
    def forward(self, x, h_previous):
        # x: (input_dim,), h_previous: (hidden_dim,); a single unbatched time step
        h_t = torch.tanh(self.W_h @ x + self.U_h @ h_previous + self.b_h)
        y_t = torch.tanh(self.W_y @ h_t + self.b_y)
        
        return y_t, h_t
        

Add a function `reset_parameters` that initializes your parameters. Pick a suitable distribution from [nn.init](https://pytorch.org/docs/stable/nn.init.html).
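
One possible choice, sketched below, is a uniform distribution whose range shrinks with the hidden size: $U(-1/\sqrt{d}, 1/\sqrt{d})$ with $d$ the hidden dimension. This is the scheme PyTorch documents for its own recurrent layers. With wide ranges or standard-normal weights, the pre-activations tend to be large and `tanh` saturates near $\pm 1$. The helper `init_uniform_scaled` and the stand-in module `demo` are hypothetical names used only for illustration.

```python
import math

import torch
import torch.nn as nn


def init_uniform_scaled(module: nn.Module, hidden_dim: int) -> None:
    """Fill every parameter of `module` in place from U(-b, b), b = 1/sqrt(hidden_dim)."""
    bound = 1.0 / math.sqrt(hidden_dim)
    for param in module.parameters():
        # nn.init.uniform_ modifies the tensor in place (note the trailing underscore).
        nn.init.uniform_(param, -bound, bound)


# Hypothetical stand-in module, used here only to demonstrate the helper.
demo = nn.Linear(128, 128)
init_uniform_scaled(demo, hidden_dim=128)

# Largest absolute parameter value; it cannot exceed the bound.
max_abs = max(p.abs().max().item() for p in demo.parameters())
print(max_abs <= 1.0 / math.sqrt(128))
```

The same loop could be placed inside a `reset_parameters` method and called at the end of `__init__`, so that a freshly constructed network starts from this distribution.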

Add a `forward` function that takes an input $x_t$ and the previous hidden state $h_{t-1}$ and returns the output $y_t$ and the updated hidden state $h_t$. The initial hidden state $h_0$ can be initialized randomly or to all zeros.

Test your RNN with a single input.

In [40]:
net_input_dim = 100
net_hidden_dim = 128
net_output_dim = 10

net_rnn = RNN(net_input_dim, net_hidden_dim, net_output_dim)

test_input = torch.randn(net_input_dim)
hidden_input = torch.zeros(net_hidden_dim)

y, h = net_rnn(test_input, hidden_input)
print(y)
print(h)

tensor([-1.0000, -0.9990,  1.0000,  1.0000,  0.9927, -1.0000,  1.0000,  1.0000,
         0.6235,  0.9701], grad_fn=<TanhBackward0>)
tensor([ 1.0000,  0.9682,  1.0000, -1.0000,  1.0000, -0.9999, -0.9948, -0.9818,
        -0.9997,  0.8280,  1.0000,  0.9997,  1.0000,  1.0000,  0.9999,  1.0000,
         1.0000, -1.0000, -0.9995,  0.9982,  0.9999,  1.0000, -1.0000,  0.9106,
        -0.9234,  0.9924, -0.9999, -0.9977, -0.5818,  1.0000, -1.0000,  1.0000,
         1.0000,  1.0000, -0.1461, -0.7147,  0.9377,  0.9633,  1.0000,  0.9998,
         0.9904,  1.0000, -0.9965, -1.0000,  1.0000, -0.6506,  0.2831, -1.0000,
         1.0000,  0.9994, -1.0000, -0.9972, -1.0000,  1.0000, -0.5986,  1.0000,
        -1.0000,  0.9998, -0.7417, -0.9985,  0.9891,  1.0000,  0.9832, -0.8445,
        -1.0000,  1.0000, -1.0000, -0.9122, -1.0000, -0.9999,  0.9995, -0.7235,
        -1.0000, -1.0000,  1.0000,  1.0000,  1.0000,  1.0000,  1.0000,  1.0000,
        -0.9980,  1.0000, -0.9993,  1.0000,  1.0000, -0.9917, -0.456

Now create an input sequence and run it through your RNN.

In [41]:
sequence_size = 10
test_sequence = torch.randn(sequence_size, net_input_dim)

outputs = []
h = hidden_input
for x in test_sequence:
    y, h = net_rnn(x, h)
    outputs.append(y)

print(len(outputs))
print(outputs[0].shape)

10
torch.Size([10])


The final hidden state is a fixed-size summary of the whole input sequence. It can be used as a feature vector for classification, or to initialize a decoder RNN, for example in translation.
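
As a rough illustration of the classification use case, the sketch below feeds the final hidden state into a linear layer. To keep the block self-contained it uses PyTorch's built-in `nn.RNNCell` rather than the class defined above. The number of classes and the untrained `classifier` head are assumptions made for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# num_classes is a hypothetical value chosen for this example.
input_dim, hidden_dim, num_classes, seq_len = 100, 128, 5, 10

cell = nn.RNNCell(input_dim, hidden_dim)          # one recurrent step per call
classifier = nn.Linear(hidden_dim, num_classes)   # untrained classification head

sequence = torch.randn(seq_len, 1, input_dim)     # (time, batch=1, features)
h = torch.zeros(1, hidden_dim)                    # initial hidden state h_0

# Consume the sequence step by step; only the last hidden state is kept.
for x_t in sequence:
    h = cell(x_t, h)

logits = classifier(h)                            # shape (1, num_classes)
probs = torch.softmax(logits, dim=-1)             # class probabilities
print(logits.shape)
```

In a real setup the cell and the classifier would be trained jointly, for example with a cross-entropy loss on `logits`.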

Now look at PyTorch's documentation for the [`nn.RNN`](https://pytorch.org/docs/stable/generated/torch.nn.RNN.html) and the [`nn.RNNCell`](https://pytorch.org/docs/stable/generated/torch.nn.RNNCell.html) classes. What is the difference between the two? How do they differ from the Wikipedia definition we used above? Run your input sequence through both the `nn.RNN` and the `nn.RNNCell`.
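
One way to approach this, sketched below, is to give both modules the same weights and compare their results. `nn.RNN` consumes a whole sequence in one call, while `nn.RNNCell` computes a single time step and leaves the loop to you. Both use two bias vectors (`bias_ih` and `bias_hh`) and return the hidden state itself, without a separate output layer such as $W_y$. The batch size of 1 and the tolerance are choices made for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_dim, hidden_dim, seq_len = 100, 128, 10

rnn = nn.RNN(input_dim, hidden_dim)       # default input layout: (time, batch, features)
cell = nn.RNNCell(input_dim, hidden_dim)  # processes one time step per call

# Copy the single-layer RNN's parameters into the cell so both compute the same function.
with torch.no_grad():
    cell.weight_ih.copy_(rnn.weight_ih_l0)
    cell.weight_hh.copy_(rnn.weight_hh_l0)
    cell.bias_ih.copy_(rnn.bias_ih_l0)
    cell.bias_hh.copy_(rnn.bias_hh_l0)

sequence = torch.randn(seq_len, 1, input_dim)   # batch size 1
h0 = torch.zeros(1, 1, hidden_dim)              # (num_layers, batch, hidden)

# nn.RNN: one call returns all hidden states and the final hidden state.
rnn_outputs, rnn_final = rnn(sequence, h0)

# nn.RNNCell: explicit loop over time, carrying the hidden state forward.
h = h0[0]
cell_steps = []
for x_t in sequence:
    h = cell(x_t, h)
    cell_steps.append(h)
cell_outputs = torch.stack(cell_steps)          # (seq_len, 1, hidden_dim)

print(rnn_outputs.shape, rnn_final.shape)
print(torch.allclose(rnn_outputs, cell_outputs, atol=1e-5))
```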