# Recurrent Neural Networks
- Vanilla RNN
- Gated Recurrent Units
- Long Short Term Memory

In [1]:
import numpy as np
import pandas as pd
import torch, torchvision
torch.__version__

'1.12.0+cu113'

In [2]:
import torch.nn as nn

## 1. Vanilla RNN
![vanilla_RNN](http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-SimpleRNN.png)


- Vanilla RNN can be implemented with ```torch.nn.RNN()``` 

- Key Parameters
  - ```input_size```:  number of expected features in the input (i.e., dimensionality of feature space)
  - ```hidden_size```: number of features in hidden state (i.e., dimensionality of output space)
  - ```num_layers```: number of recurrent layers (to create stacked RNN)
  - ```batch_first```: If ```True```, input and output tensor shapes are ```(batch_size, seq_len, dim_feature)```. If ```False``` (default), ```(seq_len, batch_size, dim_feature)```
  - ```bidirectional```: If ```True```, the RNN is bidirectional
  
- Unlike fully-connected or convolutional layers, RNNs take multiple inputs and return multiple outputs
  - In addition to the (sequential) input, an RNN takes a hidden state, which is what sets it apart from other layers
  - The hidden state carries information from the current time step to the next
  
- Inputs to RNN: ```(x0, h0)```
  - ```x0```: tensor that contains features of the input sequence
    - shape
      - ```(seq_len, batch_size, input_size)``` if ```batch_first == False``` (default)
      - ```(batch_size, seq_len, input_size)``` if ```batch_first == True```
  - ```h0```: tensor that contains the initial hidden state for each instance in the batch
    - shape
      - ```(num_layers * num_directions, batch_size, hidden_size)```
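
To make the role of the hidden state concrete, here is a minimal sketch that unrolls a single-layer ```nn.RNN``` by hand and compares the result with the module's own output. It assumes the default ```tanh``` nonlinearity and reads the layer's parameters through the attributes ```weight_ih_l0```, ```weight_hh_l0```, ```bias_ih_l0``` and ```bias_hh_l0```; the small batch size of 3 is an arbitrary choice for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

rnn = nn.RNN(input_size=10, hidden_size=5, num_layers=1)
x = torch.randn(12, 3, 10)   # (seq_len, batch_size, input_size)
h = torch.zeros(3, 5)        # hidden state of the single layer: (batch_size, hidden_size)

# Parameters of layer 0 as exposed by nn.RNN
W_ih, W_hh = rnn.weight_ih_l0, rnn.weight_hh_l0   # shapes (5, 10) and (5, 5)
b_ih, b_hh = rnn.bias_ih_l0, rnn.bias_hh_l0       # shapes (5,) and (5,)

steps = []
with torch.no_grad():
    for t in range(x.shape[0]):
        # the new hidden state depends on the current input and the previous hidden state
        h = torch.tanh(x[t] @ W_ih.T + b_ih + h @ W_hh.T + b_hh)
        steps.append(h)
    manual_out = torch.stack(steps)               # (seq_len, batch_size, hidden_size)
    ref_out, ref_h = rnn(x, torch.zeros(1, 3, 5))

matches = torch.allclose(manual_out, ref_out, atol=1e-5)
print(manual_out.shape, matches)
```

The loop is only meant to show what the module computes internally; in practice ```nn.RNN``` should be preferred, since it runs the whole sequence in one call.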

In [3]:
rnn = nn.RNN(input_size = 10, 
             hidden_size = 5, 
             num_layers = 1)

In [4]:
## inputs to RNN
# input data (seq_len, batch_size, input_size)
x0 = torch.from_numpy(np.random.randn(12, 64, 10)).float()     
# hidden state (num_layers * num_directions, batch_size, hidden_size)
h0 = torch.from_numpy(np.zeros((1, 64, 5))).float()            

print(x0.shape, h0.shape)

torch.Size([12, 64, 10]) torch.Size([1, 64, 5])


In [5]:
## outputs from RNN
# output (seq_len, batch_size, num_directions * hidden_size)
# hidden state (num_layers * num_directions, batch_size, hidden_size)
out, h1 = rnn(x0, h0)

print(out.shape, h1.shape)

torch.Size([12, 64, 5]) torch.Size([1, 64, 5])
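
The two outputs overlap: for a single-layer, unidirectional RNN, ```out``` holds the hidden state at every time step, while ```h1``` holds only the last one. A short check, written as a self-contained sketch with freshly sampled inputs:

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=5, num_layers=1)
x0 = torch.randn(12, 64, 10)   # (seq_len, batch_size, input_size)
h0 = torch.zeros(1, 64, 5)     # (num_layers * num_directions, batch_size, hidden_size)

out, h1 = rnn(x0, h0)

# the last time step of `out` is the final hidden state of the (only) layer
last_step_is_h1 = torch.allclose(out[-1], h1[0])
print(last_step_is_h1)
```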


In [6]:
# when batch_first = True
rnn = nn.RNN(input_size = 10, 
             hidden_size = 5, 
             num_layers = 2,     # stacked RNN (2 layers)
             batch_first = True)

In [7]:
## inputs to RNN
x0 = torch.from_numpy(np.random.randn(64, 12, 10)).float()     
# note that even when batch_first == True, the hidden state shape order does not change
h0 = torch.from_numpy(np.zeros((2, 64, 5))).float()            

print(x0.shape, h0.shape)

torch.Size([64, 12, 10]) torch.Size([2, 64, 5])


In [8]:
## outputs from RNN
out, h1 = rnn(x0, h0)

print(out.shape, h1.shape)

torch.Size([64, 12, 5]) torch.Size([2, 64, 5])
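
With stacked layers, ```out``` only contains the hidden states of the top layer, whereas ```h1``` contains the final hidden state of every layer. The sketch below checks this for the two-layer, ```batch_first = True``` case; it omits ```h0``` and so relies on the module's default of a zero initial hidden state.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=5, num_layers=2, batch_first=True)
x0 = torch.randn(64, 12, 10)   # (batch_size, seq_len, input_size)

out, h1 = rnn(x0)              # h0 omitted: defaults to zeros

# h1[-1] is the final hidden state of the top layer, h1[0] that of the bottom layer
top_matches = torch.allclose(out[:, -1, :], h1[-1])
bottom_matches = torch.allclose(out[:, -1, :], h1[0])
print(top_matches, bottom_matches)
```

Only the top layer is expected to match the last time step of ```out```; the bottom layer's state is visible solely through ```h1```.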


In [9]:
# bidirectional, stacked RNN
rnn = nn.RNN(input_size = 20, 
             hidden_size = 10, 
             num_layers = 4,     
             bidirectional = True)

x0 = torch.from_numpy(np.random.randn(5, 64, 20)).float()
h0 = torch.from_numpy(np.zeros((4 * 2, 64, 10))).float()  # first dimension is num_layers * num_directions
out, h1 = rnn(x0, h0)

print(out.shape, h1.shape)

torch.Size([5, 64, 20]) torch.Size([8, 64, 10])
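
In the bidirectional case the last dimension of ```out``` concatenates the forward and backward hidden states, and the first dimension of ```h1``` interleaves layers and directions. A minimal sketch of how the two directions can be separated, assuming the layout described in the PyTorch documentation (forward first, then backward):

```python
import torch
import torch.nn as nn

seq_len, batch_size, hidden_size, num_layers = 5, 64, 10, 4
rnn = nn.RNN(input_size=20, hidden_size=hidden_size,
             num_layers=num_layers, bidirectional=True)
x0 = torch.randn(seq_len, batch_size, 20)

out, h1 = rnn(x0)   # h0 omitted: defaults to zeros

# split the direction axis out of both tensors
out_dirs = out.reshape(seq_len, batch_size, 2, hidden_size)   # [..., 0, :] forward, [..., 1, :] backward
h_dirs = h1.reshape(num_layers, 2, batch_size, hidden_size)   # [layer, direction]

# the forward direction finishes at the last time step, the backward one at the first
fwd_ok = torch.allclose(out_dirs[-1, :, 0, :], h_dirs[-1, 0])
bwd_ok = torch.allclose(out_dirs[0, :, 1, :], h_dirs[-1, 1])
print(fwd_ok, bwd_ok)
```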


## 2. Gated Recurrent Units (GRU)

![](https://upload.wikimedia.org/wikipedia/commons/thumb/3/37/Gated_Recurrent_Unit%2C_base_type.svg/780px-Gated_Recurrent_Unit%2C_base_type.svg.png)

- GRU has a more complicated structure than vanilla RNN (see the figure above), but implementing it in PyTorch is largely the same as for RNN, using ```torch.nn.GRU```

In [10]:
gru = nn.GRU(input_size = 10, 
             hidden_size = 5, 
             num_layers = 1)

In [11]:
## inputs to GRU
# input data (seq_len, batch_size, input_size)
x0 = torch.from_numpy(np.random.randn(12, 64, 10)).float()     
# hidden state (num_layers * num_directions, batch_size, hidden_size)
h0 = torch.from_numpy(np.zeros((1, 64, 5))).float()            

print(x0.shape, h0.shape)

torch.Size([12, 64, 10]) torch.Size([1, 64, 5])


In [12]:
## outputs from GRU
# output (seq_len, batch_size, num_directions * hidden_size)
# hidden state (num_layers * num_directions, batch_size, hidden_size)
out, h1 = gru(x0, h0)

print(out.shape, h1.shape)

torch.Size([12, 64, 5]) torch.Size([1, 64, 5])
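
Although the interface is the same, the extra gates show up in the parameter count. As a rough illustration, the sketch below counts the parameters of an RNN and a GRU with identical sizes; ```n_params``` is a hypothetical helper defined here, not a PyTorch function.

```python
import torch.nn as nn

def n_params(module):
    """Total number of scalar parameters in a module (illustrative helper)."""
    return sum(p.numel() for p in module.parameters())

rnn = nn.RNN(input_size=10, hidden_size=5, num_layers=1)
gru = nn.GRU(input_size=10, hidden_size=5, num_layers=1)

rnn_params = n_params(rnn)   # 5*10 + 5*5 + 5 + 5 = 85
gru_params = n_params(gru)   # three weight/bias sets of that size: 3 * 85 = 255
print(rnn_params, gru_params)
```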


## 3. Long Short Term Memory (LSTM)

![](https://upload.wikimedia.org/wikipedia/commons/thumb/3/3b/The_LSTM_cell.png/1200px-The_LSTM_cell.png)

- LSTM is another widely used variant of the vanilla RNN. Although its structure differs, the main thing to attend to when implementing one is the additional cell state (c_t)
  - Inputs to LSTM: (x0, (h0, c0))
    - ```x0```: tensor that contains features of the input sequence
      - shape: ```(seq_len, batch_size, input_size)```
    - ```h0```: tensor that contains initial hidden state
      - shape: ```(num_layers * num_directions, batch_size, hidden_size)```
    - ```c0```: tensor that contains initial cell state
      - shape: ```(num_layers * num_directions, batch_size, hidden_size)``` (same as h0)
  - Outputs from LSTM: (xn, (hn, cn))
    - ```xn```: tensor that contains output features from the last layer
      - shape: ```(seq_len, batch_size, num_directions * hidden_size)```
    - ```hn```: tensor containing the final hidden state
      - shape: ```(num_layers * num_directions, batch_size, hidden_size)```
    - ```cn```: tensor containing the final cell state
      - shape: ```(num_layers * num_directions, batch_size, hidden_size)```
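
Because the returned ```(hn, cn)``` pair has the same layout as the ```(h0, c0)``` input, it can be passed straight into the next call. The sketch below processes one sequence in two chunks while carrying both states, and compares the result with a single pass over the full sequence; the sequence length of 8 and batch size of 4 are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

lstm = nn.LSTM(input_size=10, hidden_size=5, num_layers=1)
x = torch.randn(8, 4, 10)   # (seq_len, batch_size, input_size)

with torch.no_grad():
    # one pass over the whole sequence
    full_out, (full_h, full_c) = lstm(x)

    # two passes, handing the hidden AND cell state from the first chunk to the second
    out_a, state = lstm(x[:4])
    out_b, (hn, cn) = lstm(x[4:], state)

chunked_out = torch.cat([out_a, out_b], dim=0)
same_out = torch.allclose(chunked_out, full_out, atol=1e-6)
same_state = torch.allclose(hn, full_h, atol=1e-6) and torch.allclose(cn, full_c, atol=1e-6)
print(same_out, same_state)
```

Passing only the hidden state and dropping the cell state would not reproduce the full-sequence result, which is why both have to be tracked.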

In [13]:
lstm = nn.LSTM(input_size = 10, 
               hidden_size = 5, 
               num_layers = 1)

In [14]:
## inputs to LSTM
# input data (seq_len, batch_size, input_size)
x0 = torch.from_numpy(np.random.randn(1, 64, 10)).float()     

print(x0.shape)

torch.Size([1, 64, 10])


In [15]:
# outputs from LSTM
# when initial hidden & cell state are not given, they are regarded as zero
xn, (hn, cn) = lstm(x0)

print(xn.shape)               # (seq_len, batch_size, hidden_size)
print(hn.shape, cn.shape)     # (num_layers, batch_size, hidden_size)

torch.Size([1, 64, 5])
torch.Size([1, 64, 5]) torch.Size([1, 64, 5])


In [16]:
# when initial hidden & cell states are given
x0 = torch.from_numpy(np.random.randn(1, 64, 10)).float()     
h0 = torch.from_numpy(np.zeros((1, 64, 5))).float()   # initial hidden state
c0 = torch.from_numpy(np.zeros((1, 64, 5))).float()   # initial cell state

xn, (hn, cn) = lstm(x0, (h0, c0))

print(xn.shape)               # (seq_len, batch_size, hidden_size)
print(hn.shape, cn.shape)     # (num_layers, batch_size, hidden_size)

torch.Size([1, 64, 5])
torch.Size([1, 64, 5]) torch.Size([1, 64, 5])
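
The two calls above should be interchangeable: leaving out the initial states is equivalent to passing zeros. A quick self-contained check, using the same sizes as the cells above:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=5, num_layers=1)
x0 = torch.randn(1, 64, 10)    # (seq_len, batch_size, input_size)
zeros = torch.zeros(1, 64, 5)  # (num_layers, batch_size, hidden_size)

with torch.no_grad():
    out_default, (h_default, c_default) = lstm(x0)               # states omitted
    out_zero, (h_zero, c_zero) = lstm(x0, (zeros, zeros))        # explicit zero states

same_result = (torch.allclose(out_default, out_zero)
               and torch.allclose(h_default, h_zero)
               and torch.allclose(c_default, c_zero))
print(same_result)
```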


In [17]:
# stacked, bidirectional LSTM
lstm = nn.LSTM(input_size = 10, 
               hidden_size = 5, 
               num_layers = 2,
               bidirectional = True)

In [18]:
# inputs to LSTM
x0 = torch.from_numpy(np.random.randn(5, 64, 10)).float()
# first dimension is num_layers * num_directions = 2 * 2
h0 = torch.from_numpy(np.zeros((4, 64, 5))).float()
c0 = torch.from_numpy(np.zeros((4, 64, 5))).float()

xn, (hn, cn) = lstm(x0, (h0, c0))

print(xn.shape)
print(hn.shape, cn.shape)

torch.Size([5, 64, 10])
torch.Size([4, 64, 5]) torch.Size([4, 64, 5])
