# Recurrent Neural Networks
In this exercise, we will implement a simple one-layer recurrent neural network. We will use the formula for an [Elman RNN](https://en.wikipedia.org/wiki/Recurrent_neural_network#Elman_networks_and_Jordan_networks), one of the most basic and classical RNNs. The hidden state update and the output at time $t$ are defined as follows:

$$
\begin{align}
h_t &= \tanh(W_h x_t + U_h h_{t-1} + b_h) \\
y_t &= \tanh(W_y h_t + b_y)
\end{align}
$$
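
To keep track of shapes, it may help to write out the dimensions explicitly. The symbols $d_x$, $d_h$ and $d_y$ are notation introduced here for the input, hidden and output sizes; they are assumed to correspond to `input_dim`, `hidden_dim` and `output_dim` in the code.

$$
\begin{align}
x_t &\in \mathbb{R}^{d_x}, & h_t, b_h &\in \mathbb{R}^{d_h}, & y_t, b_y &\in \mathbb{R}^{d_y} \\
W_h &\in \mathbb{R}^{d_h \times d_x}, & U_h &\in \mathbb{R}^{d_h \times d_h}, & W_y &\in \mathbb{R}^{d_y \times d_h}
\end{align}
$$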

In [1]:
import torch
import torch.nn as nn

We start by defining the RNN as a subclass of `nn.Module`. The network's parameters are created in the `__init__` method. Use `input_dim`, `hidden_dim` and `output_dim` as arguments that define the dimensionality of the input/hidden/output vectors. Define your parameters as `nn.Parameter` with the appropriate dimensions. The documentation of `torch.nn` can be found [here](https://pytorch.org/docs/stable/nn.html).

In [2]:
class RNN(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.Wh = nn.Parameter(torch.zeros(hidden_dim, input_dim))
        self.Uh = nn.Parameter(torch.zeros(hidden_dim, hidden_dim))
        self.Wy = nn.Parameter(torch.zeros(output_dim, hidden_dim))
        self.bh = nn.Parameter(torch.zeros(hidden_dim))
        self.by = nn.Parameter(torch.zeros(output_dim))

Add a function `reset_parameters` that initializes your parameters. Pick a suitable distribution from [nn.init](https://pytorch.org/docs/stable/nn.init.html).

In [3]:
def reset_parameters(self):
    for weight in self.parameters():
        nn.init.uniform_(weight, -1, 1)

RNN.reset_parameters = reset_parameters
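
The range $[-1, 1]$ is fairly wide, so pre-activations tend to be large and $\tanh$ can saturate close to $\pm 1$. As one possible alternative, the sketch below draws every parameter from $U(-1/\sqrt{d_h}, 1/\sqrt{d_h})$ with $d_h$ the hidden size, which is the scheme PyTorch documents for `nn.RNN`. The helper name `scaled_uniform_` and the sizes are illustrative assumptions, not part of the exercise.

```python
import math
import torch
import torch.nn as nn

def scaled_uniform_(params, hidden_dim):
    """Fill each parameter in-place from U(-k, k) with k = 1 / sqrt(hidden_dim)."""
    bound = 1.0 / math.sqrt(hidden_dim)
    for p in params:
        nn.init.uniform_(p, -bound, bound)
    return bound

# Stand-alone parameters with the same shapes as in the exercise (assumed sizes).
input_dim, hidden_dim = 10, 20
Wh = nn.Parameter(torch.zeros(hidden_dim, input_dim))
Uh = nn.Parameter(torch.zeros(hidden_dim, hidden_dim))
bh = nn.Parameter(torch.zeros(hidden_dim))

bound = scaled_uniform_([Wh, Uh, bh], hidden_dim)

# One hidden-state update using the freshly initialized parameters.
x, h_prev = torch.randn(input_dim), torch.zeros(hidden_dim)
h = torch.tanh(Wh @ x + Uh @ h_prev + bh)
print("bound:", bound, "largest |h|:", h.abs().max().item())
```

Comparing the largest absolute hidden value under both initializations is a quick way to see how strongly the range affects saturation.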

Add a `forward` function that takes an input $x_t$ and the previous hidden state $h_{t-1}$ and returns the updated hidden state $h_t$ and the output $y_t$. The initial hidden state $h_0$ can be initialized randomly or to all zeros.

In [4]:
def forward(self, x, ht_1):
    ht = torch.tanh(self.Wh @ x + self.Uh @ ht_1 + self.bh)
    y = torch.tanh(self.Wy @ ht + self.by)
    return ht, y
    
RNN.forward = forward

Test your RNN with a single input.

In [5]:
dim_input, dim_hidden, dim_output = (10, 20, 15)
h0 = torch.randn(dim_hidden)
x0 = torch.randn(dim_input)

In [6]:
model = RNN(dim_input, dim_hidden, dim_output)
model.reset_parameters()
ht, y = model(x0, h0)  # calling the module dispatches to forward()

print("y shape: ", y.shape)
print("y: ", y)
print("ht shape: ", ht.shape)
print("ht: ", ht)

y shape:  torch.Size([15])
y:  tensor([-0.8026,  0.9993, -0.4994, -0.6240,  0.9978, -0.9041, -0.3223,  0.9963,
         0.9881, -0.9676,  0.9970,  0.9999, -0.9980, -0.4405,  0.9582],
       grad_fn=<TanhBackward0>)
ht shape:  torch.Size([20])
ht:  tensor([ 0.9676, -0.5179,  0.6834,  0.4823,  0.9545, -0.9994,  0.3752,  0.9804,
        -0.9777, -0.3434,  0.4345, -0.0618,  1.0000,  0.9723,  0.2229,  0.9930,
        -0.9636,  0.9763,  0.5521,  0.9998], grad_fn=<TanhBackward0>)


Now create an input sequence and run it through your RNN.

In [7]:
seq_length = 5

inputs = [torch.randn(dim_input) for _ in range(seq_length)]

In [8]:
outputs = []
ht = h0
for xt in inputs:
    ht, y = model(xt, ht)  # feed the previous hidden state back in
    outputs.append(y)
    
print("ht\n", ht)
print("\noutputs\n", outputs)

ht
 tensor([-0.9832,  0.6787,  0.7038, -0.9843, -0.9976, -0.5880, -0.8074, -0.9992,
         0.1631,  0.7951, -0.1714,  0.6469,  0.9971,  0.9482, -0.4259, -0.4039,
        -0.9999, -0.9874, -0.9555,  0.6234], grad_fn=<TanhBackward0>)

outputs
 [tensor([-0.9997,  0.9996,  0.9565, -0.9927, -0.5232,  0.3257, -0.9490,  0.4157,
        -0.7570, -0.9964,  0.7305,  0.9387, -0.8323,  0.9593,  0.1272],
       grad_fn=<TanhBackward0>), tensor([-0.8643,  0.9448, -0.0737,  0.8463, -1.0000,  0.8029, -0.9273,  0.9604,
        -0.9671,  0.8110, -0.9197,  0.8933, -0.9804,  1.0000,  0.9843],
       grad_fn=<TanhBackward0>), tensor([-0.9861, -0.7645,  0.0919, -0.9332, -0.9997,  0.9621, -0.9634,  0.6877,
        -0.9973, -0.5681, -0.9986, -0.9975, -0.0511,  0.9933,  0.5907],
       grad_fn=<TanhBackward0>), tensor([-0.9699,  0.9604, -0.7648, -0.9739,  0.9631,  0.8086, -0.9645, -0.8132,
         0.3085, -0.4652,  0.9772,  0.9975,  0.7106,  0.3331, -0.9957],
       grad_fn=<TanhBackward0>), tensor([ 0.6681

The final hidden state encodes all the information present in the input sequence. It can be used as a feature for classification, or to initialize a decoder RNN to do translation, for example.
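
As a rough sketch of the classification use case, the final hidden state can be passed through a linear layer to obtain class scores. The class `SequenceClassifier`, the number of classes and all sizes are hypothetical; for brevity the sketch uses PyTorch's built-in `nn.RNN` rather than the `RNN` class from this exercise.

```python
import torch
import torch.nn as nn

class SequenceClassifier(nn.Module):
    """Hypothetical example: classify a sequence from its final hidden state."""

    def __init__(self, input_dim, hidden_dim, num_classes):
        super().__init__()
        self.rnn = nn.RNN(input_dim, hidden_dim)        # tanh RNN, one layer
        self.head = nn.Linear(hidden_dim, num_classes)  # hidden state -> logits

    def forward(self, x):
        # x: (seq_length, input_dim), a single unbatched sequence
        _, h_n = self.rnn(x)       # h_n: (num_layers, hidden_dim)
        return self.head(h_n[-1])  # logits: (num_classes,)

clf = SequenceClassifier(input_dim=10, hidden_dim=20, num_classes=3)
logits = clf(torch.randn(5, 10))
print(logits.shape)
```

Only the last hidden state is used here; the per-step outputs are discarded, which is a design choice that suits tasks with one label per sequence.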

Now look at PyTorch's documentation for the [`nn.RNN`](https://pytorch.org/docs/stable/generated/torch.nn.RNN.html) and the [`nn.RNNCell`](https://pytorch.org/docs/stable/generated/torch.nn.RNNCell.html) classes. What is the difference between the two? How do they differ from the Wikipedia definition we used above? Run your input sequence through both the `nn.RNN` and the `nn.RNNCell`.

In [9]:
seq_length = 5

In [10]:
rnn_cell = nn.RNNCell(dim_input, dim_hidden)
x = torch.randn(seq_length, dim_input)
hidden_state = torch.zeros(dim_hidden)
output = []
for i in range(seq_length):
    hidden_state = rnn_cell(x[i], hidden_state)
    output.append(hidden_state)
print(len(output))
print(output[0].shape)

torch_rnn = nn.RNN(dim_input, dim_hidden, num_layers=1)
x = torch.randn(seq_length, dim_input)
h0 = torch.zeros(1, dim_hidden)
output, hn = torch_rnn(x, h0)
print(output.shape)

5
torch.Size([20])
torch.Size([5, 20])


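* `nn.RNN` processes a whole sequence in one call and expects 2D (unbatched) or 3D (batched) input; `nn.RNNCell` computes a single time step and expects 1D (unbatched) or 2D (batched) input, so we loop over the sequence ourselves.
* `nn.RNN` returns the hidden states of all time steps (`output`) together with the final hidden state (`h_n`); `nn.RNNCell` returns only the new hidden state.
* Unlike the Wikipedia definition, neither class has an output layer $y_t$, and both use two bias vectors (`b_ih` and `b_hh`) in the hidden state update.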
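
To check the relationship between the two classes, the sketch below copies the weights of a one-layer `nn.RNN` into an `nn.RNNCell` and compares the results on the same sequence. The sizes and the seed are arbitrary choices; the parameter names (`weight_ih_l0`, `weight_ih`, and so on) are the ones PyTorch uses for these modules.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dim_input, dim_hidden, seq_length = 10, 20, 5

rnn = nn.RNN(dim_input, dim_hidden, num_layers=1)
cell = nn.RNNCell(dim_input, dim_hidden)

# Copy the layer-0 parameters of nn.RNN into the cell so both compute the same function.
with torch.no_grad():
    cell.weight_ih.copy_(rnn.weight_ih_l0)
    cell.weight_hh.copy_(rnn.weight_hh_l0)
    cell.bias_ih.copy_(rnn.bias_ih_l0)
    cell.bias_hh.copy_(rnn.bias_hh_l0)

x = torch.randn(seq_length, dim_input)  # one unbatched sequence

# nn.RNN consumes the whole sequence in a single call.
output, h_n = rnn(x, torch.zeros(1, dim_hidden))

# nn.RNNCell performs one time step per call, so we loop ourselves.
h = torch.zeros(dim_hidden)
steps = []
for t in range(seq_length):
    h = cell(x[t], h)
    steps.append(h)
stacked = torch.stack(steps)  # (seq_length, dim_hidden)

print(torch.allclose(output, stacked, atol=1e-5))  # per-step hidden states match
print(torch.allclose(h_n[0], h, atol=1e-5))        # final hidden states match
```

With shared weights, `output` from `nn.RNN` is the stack of hidden states that the `nn.RNNCell` loop produces, and `h_n` is the last of them.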