# Recurrent Neural Networks
In this exercise, we will implement a simple one-layer recurrent neural network. We will use the formula for an [Elman RNN](https://en.wikipedia.org/wiki/Recurrent_neural_network#Elman_networks_and_Jordan_networks), one of the most basic and classical RNNs. The hidden state update and the output at time $t$ are defined as follows:

$$
\begin{align}
h_t &= \tanh(W_h x_t + U_h h_{t-1} + b_h) \\
y_t &= \tanh(W_y h_t + b_y)
\end{align}
$$
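
For the matrix products above to be well defined, the parameters need matching shapes. As a sketch, write $d_x$, $d_h$ and $d_y$ for the input, hidden and output dimensions (these symbols are introduced here for illustration only and are not part of the original formula):

$$
\begin{align}
x_t &\in \mathbb{R}^{d_x}, & h_t, b_h &\in \mathbb{R}^{d_h}, & y_t, b_y &\in \mathbb{R}^{d_y} \\
W_h &\in \mathbb{R}^{d_h \times d_x}, & U_h &\in \mathbb{R}^{d_h \times d_h}, & W_y &\in \mathbb{R}^{d_y \times d_h}
\end{align}
$$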

In [2]:
import torch
import torch.nn as nn

We start by defining the RNN as a subclass of `nn.Module`. The network's parameters are created in the `__init__` method. Use `input_dim`, `hidden_dim` and `output_dim` as arguments that define the dimensionality of the input/hidden/output vectors. Define your parameters as `nn.Parameter` with the appropriate dimensions. The documentation of `torch.nn` can be found [here](https://pytorch.org/docs/stable/nn.html).

In [5]:
class RNN(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        # Parameters start as zeros; reset_parameters assigns the actual initial values.
        self.Wh = nn.Parameter(torch.zeros(hidden_dim, input_dim))   # input -> hidden
        self.Uh = nn.Parameter(torch.zeros(hidden_dim, hidden_dim))  # hidden -> hidden
        self.Wy = nn.Parameter(torch.zeros(output_dim, hidden_dim))  # hidden -> output
        self.bh = nn.Parameter(torch.zeros(hidden_dim))              # hidden bias
        self.by = nn.Parameter(torch.zeros(output_dim))              # output bias


Add a function `reset_parameters` that initializes your parameters. Pick a suitable distribution from [nn.init](https://pytorch.org/docs/stable/nn.init.html).

In [19]:
def reset_parameters(self):
    # Draw every weight matrix and bias vector from U(-1, 1), in place.
    for weight in self.parameters():
        nn.init.uniform_(weight, -1, 1)

RNN.reset_parameters = reset_parameters
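
The range $(-1, 1)$ is one possible choice. With bounds this wide, the pre-activations can become large and push `tanh` close to $\pm 1$. As a point of comparison, PyTorch's own `nn.RNN` documents a default initialization of $\mathcal{U}(-\sqrt{k}, \sqrt{k})$ with $k = 1/\text{hidden\_size}$. The following is a minimal sketch of that scheme; `scaled_uniform_` is a hypothetical helper name, and the two parameters are stand-ins rather than the class defined above.

```python
import math

import torch
import torch.nn as nn


def scaled_uniform_(params, hidden_dim):
    """Fill each tensor in place with samples from U(-1/sqrt(hidden_dim), 1/sqrt(hidden_dim))."""
    bound = 1.0 / math.sqrt(hidden_dim)
    for p in params:
        nn.init.uniform_(p, -bound, bound)
    return bound


torch.manual_seed(0)
hidden_dim = 20

# Stand-in parameters with the same shapes as Wh and Uh for input_dim = 10.
Wh = nn.Parameter(torch.zeros(hidden_dim, 10))
Uh = nn.Parameter(torch.zeros(hidden_dim, hidden_dim))

bound = scaled_uniform_([Wh, Uh], hidden_dim)
max_abs = max(Wh.abs().max().item(), Uh.abs().max().item())
print("bound:", bound)
print("largest absolute weight:", max_abs)
```

Whether this scaling is preferable depends on the task; it is shown only to illustrate how the bound can be tied to the hidden size.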

Add a `forward` function that takes an input $x_t$ and the previous hidden state $h_{t-1}$ and returns the updated hidden state $h_t$ and the output $y_t$. The initial hidden state $h_0$ can be initialized randomly or to all zeros.

In [20]:
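def forward(self, x, ht_1):
    # Hidden state update: combine the current input with the previous hidden state.
    ht = torch.tanh(self.Wh @ x + self.Uh @ ht_1 + self.bh)
    # The output is computed from the updated hidden state.
    y = torch.tanh(self.Wy @ ht + self.by)
    return ht, y
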
RNN.forward = forward
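
The `forward` above handles one unbatched vector at a time. As a hedged sketch of how the same update could process a batch, the step can be written with the input on the left of the matrix product, so that a leading batch dimension is carried through. `elman_step` is a hypothetical standalone function, and the random tensors stand in for the module's parameters.

```python
import torch


def elman_step(x, h_prev, Wh, Uh, bh, Wy, by):
    """One Elman step; x is (..., input_dim) and h_prev is (..., hidden_dim)."""
    # x @ Wh.T equals Wh @ x for a single vector and also works row-wise for a batch.
    h = torch.tanh(x @ Wh.T + h_prev @ Uh.T + bh)
    y = torch.tanh(h @ Wy.T + by)
    return h, y


torch.manual_seed(0)
batch, input_dim, hidden_dim, output_dim = 4, 10, 20, 15

# Random stand-ins for the parameters (assumption: same shapes as in the RNN class).
Wh = torch.randn(hidden_dim, input_dim)
Uh = torch.randn(hidden_dim, hidden_dim)
Wy = torch.randn(output_dim, hidden_dim)
bh = torch.randn(hidden_dim)
by = torch.randn(output_dim)

x = torch.randn(batch, input_dim)
h0 = torch.zeros(batch, hidden_dim)

h, y = elman_step(x, h0, Wh, Uh, bh, Wy, by)
print("h shape:", h.shape)
print("y shape:", y.shape)

# The first row of the batch should match the unbatched formula.
h_single = torch.tanh(Wh @ x[0] + Uh @ h0[0] + bh)
print("matches unbatched step:", torch.allclose(h[0], h_single, atol=1e-5))
```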

Test your RNN with a single input.

In [21]:
dim_input, dim_hidden, dim_output = (10, 20, 15)
h0 = torch.randn(dim_hidden)
x0 = torch.randn(dim_input)

In [36]:
model = RNN(dim_input, dim_hidden, dim_output)
model.reset_parameters()
# Calling the module directly dispatches to forward.
ht, y = model(x0, h0)

print("y shape: ", y.shape)
print("y: ", y)
print("ht shape: ", ht.shape)
print("ht: ", ht)

y shape:  torch.Size([15])
y:  tensor([ 0.8522, -0.4396,  0.9081,  0.7018,  0.6939, -0.2038, -0.7309, -0.3070,
        -0.9993, -0.9968, -0.3899,  0.9506, -0.3034, -0.4268, -0.9063],
       grad_fn=<TanhBackward0>)
ht shape:  torch.Size([20])
ht:  tensor([-0.9897, -0.8737, -0.9210,  0.7689,  0.9758, -1.0000,  0.9759,  0.9992,
        -0.9356,  0.4071,  0.9147, -0.2621, -0.9999, -0.8910, -0.9722, -0.3043,
        -0.7230,  0.6928, -0.8009, -0.9991], grad_fn=<TanhBackward0>)


Now create an input sequence and run it through your RNN.

In [37]:
seq_length = 5

inputs = [torch.randn(dim_input) for _ in range(seq_length)]

In [38]:
outputs = []
ht = h0
for xt in inputs:
    # Feed the hidden state from the previous step back into the network.
    ht, y = model(xt, ht)
    outputs.append(y)
    
print("ht\n", ht)
print("\ny\n", y)
print("\noutputs\n", outputs)

ht
 tensor([-9.8492e-01,  9.0891e-01,  6.6347e-01, -9.9267e-01, -9.8754e-01,
        -9.9162e-01,  9.2085e-01,  9.9529e-01,  6.6217e-01,  8.1873e-01,
         9.4253e-01, -9.8380e-01, -9.4849e-04,  9.5505e-01, -4.0467e-01,
        -2.1099e-01, -9.9999e-01, -7.8392e-01, -9.9894e-01, -5.9547e-01],
       grad_fn=<TanhBackward0>)

y
 tensor([-0.2970,  0.4125,  0.9999, -0.1348,  0.4817,  0.9889, -0.9941, -0.8282,
        -0.9976,  0.9502,  0.9996, -0.1354, -0.6858,  0.5237,  0.2370],
       grad_fn=<TanhBackward0>)

outputs
 [tensor([ 0.8530,  0.5062,  0.9869,  0.9053, -0.9133,  0.8298,  0.3715,  0.9877,
        -0.8625, -0.2985,  0.3582,  0.7245,  0.4831, -0.5751,  0.2007],
       grad_fn=<TanhBackward0>), tensor([-0.8989, -0.6206,  0.6532, -0.9993, -0.4434,  0.4054,  0.1613, -0.9830,
         0.8664,  0.1243,  0.9999,  0.7757,  0.9802, -0.9999, -0.7440],
       grad_fn=<TanhBackward0>), tensor([-0.4862,  0.4932, -0.0237,  0.9999,  0.1283,  0.5273,  0.8645, -0.3761,
        -0.9958, -0.99

The final hidden state is a fixed-size summary of the input sequence. It can be used, for example, as a feature vector for classification or to initialize a decoder RNN for translation.
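
As an illustration of the classification use case, the sketch below runs a sequence through the hidden state update and feeds only the final hidden state into a linear layer. The number of classes, the `classifier` head and the randomly drawn weights are assumptions made for this example; nothing here is trained.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_dim, hidden_dim, seq_length = 10, 20, 5
num_classes = 3  # hypothetical number of classes, not part of the exercise

# Random stand-ins for the parameters of the hidden state update.
Wh = torch.randn(hidden_dim, input_dim) * 0.1
Uh = torch.randn(hidden_dim, hidden_dim) * 0.1
bh = torch.zeros(hidden_dim)

# Hypothetical classifier head applied to the final hidden state.
classifier = nn.Linear(hidden_dim, num_classes)

inputs = [torch.randn(input_dim) for _ in range(seq_length)]
ht = torch.zeros(hidden_dim)
for xt in inputs:
    ht = torch.tanh(Wh @ xt + Uh @ ht + bh)

# Only the last hidden state is used as the sequence feature.
logits = classifier(ht)
probs = torch.softmax(logits, dim=-1)
print("logits shape:", logits.shape)
print("probabilities sum to:", probs.sum().item())
```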

Now look at PyTorch's documentation for the [`nn.RNN`](https://pytorch.org/docs/stable/generated/torch.nn.RNN.html) and the [`nn.RNNCell`](https://pytorch.org/docs/stable/generated/torch.nn.RNNCell.html) classes. How do the two differ? How do they differ from the Wikipedia definition we used above? Run your input sequence through both `nn.RNN` and `nn.RNNCell`.
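
One possible way to approach this is sketched below, assuming a batch size of 1 and PyTorch's default `(seq_len, batch, input_size)` layout. `nn.RNN` consumes the whole sequence in one call, while `nn.RNNCell` computes a single time step and has to be looped manually. Both keep two bias vectors (input-to-hidden and hidden-to-hidden) and have no output layer: what they return is the hidden state itself, unlike the $y_t$ equation above. Copying the weights from one module to the other is done here only to make the two results comparable.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_dim, hidden_dim, seq_length = 10, 20, 5

rnn = nn.RNN(input_dim, hidden_dim)       # processes a full sequence
cell = nn.RNNCell(input_dim, hidden_dim)  # processes one time step

# Share parameters so both modules compute the same function.
with torch.no_grad():
    cell.weight_ih.copy_(rnn.weight_ih_l0)
    cell.weight_hh.copy_(rnn.weight_hh_l0)
    cell.bias_ih.copy_(rnn.bias_ih_l0)
    cell.bias_hh.copy_(rnn.bias_hh_l0)

inputs = torch.randn(seq_length, 1, input_dim)  # (seq_len, batch, input_dim)
h0 = torch.zeros(1, hidden_dim)                 # (batch, hidden_dim)

# nn.RNN expects h0 with a leading num_layers dimension.
rnn_out, rnn_hn = rnn(inputs, h0.unsqueeze(0))

# nn.RNNCell needs an explicit loop over time steps.
h = h0
cell_steps = []
for x_t in inputs:
    h = cell(x_t, h)
    cell_steps.append(h)
cell_out = torch.stack(cell_steps)

print("nn.RNN output shape:", rnn_out.shape)
print("nn.RNNCell stacked shape:", cell_out.shape)
print("same hidden states:", torch.allclose(rnn_out, cell_out, atol=1e-5))
```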