# 56: The RNN Class in PyTorch

## 🎯 Objective
How does a neural network remember the past? In this notebook, we dissect the **Recurrent Neural Network (RNN)** layer in PyTorch. We will explore its parameters, inspect its weights, and understand the crucial flow of data (inputs, outputs, and hidden states) that allows these networks to process sequential information.

## 📚 Key Concepts
* **Recurrent Neural Network (RNN):** A type of neural network designed for sequential data (time series, text) where the output depends on both the current input and the previous hidden state.
* **Hidden State:** The network's internal "memory" that is updated at each time step.
* **Input vs. Hidden Dimensions:** `input_size` is the number of features in the data (e.g., 1 for a single stock price); `hidden_size` is the number of features in the memory (e.g., 64).
* **Sequence Length:** How many time steps are in the input data.
* **Batch First:** PyTorch RNNs default to `(SeqLen, Batch, InputSize)`. Setting `batch_first=True` changes this to `(Batch, SeqLen, InputSize)`, which is often more convenient.

## 1. Import Libraries

We import PyTorch and NumPy.

In [1]:
### import libraries
import torch
import torch.nn as nn
import numpy as np

## 2. Exploring the RNN Layer

We start by instantiating a raw `nn.RNN` layer to understand its structure.

### Parameters
* **input_size (2):** The number of expected features in the input `x`.
* **hidden_size (5):** The number of features in the hidden state `h`.
* **num_layers (1):** Number of stacked recurrent layers. For example, setting this to 2 stacks two RNNs, with the second taking the first one's outputs as its input.
* **bidirectional (False):** If True, becomes a Bidirectional RNN (processing sequence forwards and backwards).

### Weights
The RNN has two main weight matrices:
1.  **`weight_ih_l0`:** Weights connecting the **Input** to the **Hidden** state.
2.  **`weight_hh_l0`:** Weights connecting the previous **Hidden** state to the current **Hidden** state.

*(Note: `l0` refers to Layer 0)*

In [2]:
# set layer parameters
input_size  =  2 # number of features in the input sequence
hidden_size =  5 # number of features in the hidden state
num_layers  =  1 # number of stacked RNN layers
actfun      = 'tanh' # activation function (tanh or relu)
bias        = True   # whether to include bias terms

# create an RNN instance
rnn = nn.RNN(input_size,hidden_size,num_layers,nonlinearity=actfun,bias=bias)
print(rnn)


# check out the weight sizes
print( rnn.weight_ih_l0.shape )
print( rnn.weight_hh_l0.shape )
print( rnn.bias_ih_l0.shape )
print( rnn.bias_hh_l0.shape )

RNN(2, 5)
torch.Size([5, 2])
torch.Size([5, 5])
torch.Size([5])
torch.Size([5])
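
To see how `num_layers` and `bidirectional` change the parameter set, the sketch below lists every parameter of a second layer object. The configuration (two stacked, bidirectional layers) and the names `rnn2` and `param_shapes` are assumptions chosen for illustration; they are not used elsewhere in this notebook.

```python
import torch.nn as nn

# hypothetical configuration: two stacked layers, each run in both directions
rnn2 = nn.RNN(input_size=2, hidden_size=5, num_layers=2, bidirectional=True)

# collect every parameter name with its shape
param_shapes = {name: tuple(p.shape) for name, p in rnn2.named_parameters()}

for name, shape in param_shapes.items():
    print(f'{name:22s} {shape}')
```

Layer 1 names end in `l1`, and the backward direction adds a `_reverse` suffix. Because each direction contributes `hidden_size` features, the input-to-hidden weights of layer 1 have `2 * hidden_size` columns.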


## 3. The Forward Pass

Now we pass data through the RNN. 

### Tensor Dimensions
By default, PyTorch RNNs expect inputs in the format: **`(Sequence_Length, Batch_Size, Input_Size)`**.

### Inputs
* **`X`:** The sequence data.
* **`hidden`:** The initial hidden state. If not provided, it defaults to zeros.

### Outputs
* **`output`:** Contains the hidden state for *every* time step in the sequence. Shape: `(SeqLen, Batch, HiddenSize)`.
* **`hidden`:** Contains only the *final* hidden state (after the last time step). Shape: `(NumLayers, Batch, HiddenSize)`.

Notice that `output[-1]` (the last time step) is identical to `hidden[0]`, the final hidden state of the only layer. This holds for a single-layer, unidirectional RNN.

In [3]:
# make some data (SeqLen x Batch x InputSize)
seqlength = 5
batchsize = 2
X = torch.rand(seqlength,batchsize,input_size)

# create the initial hidden state (typically initialized as zeros)
hidden = torch.zeros(num_layers,batchsize,hidden_size)


# push some data through the model and check the output sizes
y,h = rnn(X,hidden)
print(f' Input shape: {list(X.shape)}')
print(f'Hidden shape: {list(h.shape)}')
print(f'Output shape: {list(y.shape)}')

# Default output is: (SeqLen, Batch, HiddenSize)
# but this can be changed using batch_first=True

 Input shape: [5, 2, 2]
Hidden shape: [1, 2, 5]
Output shape: [5, 2, 5]
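
The two options mentioned above, `batch_first=True` and omitting the initial hidden state, can be checked in a short sketch. The names `rnn_bf`, `X_bf`, and `h0` are introduced here for illustration only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seqlength, batchsize, input_size, hidden_size = 5, 2, 2, 5

# same layer type, but expecting (Batch, SeqLen, InputSize)
rnn_bf = nn.RNN(input_size, hidden_size, batch_first=True)
X_bf = torch.rand(batchsize, seqlength, input_size)

# omit the hidden state: PyTorch starts from zeros
y_default, h_default = rnn_bf(X_bf)

# pass explicit zeros for comparison
h0 = torch.zeros(1, batchsize, hidden_size)
y_zeros, h_zeros = rnn_bf(X_bf, h0)

print(f'Output shape: {list(y_default.shape)}')  # (Batch, SeqLen, HiddenSize)
print(f'Hidden shape: {list(h_default.shape)}')  # (NumLayers, Batch, HiddenSize)
print(torch.allclose(y_default, y_zeros))
```

Note that `batch_first` only affects the input and output tensors; the hidden state keeps its `(NumLayers, Batch, HiddenSize)` layout.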


In [4]:
# check that the last output is the same as the final hidden state
# Note: y[-1] equals h[0] only if num_layers=1; with stacked layers it equals h[-1] (the top layer)
print(y[-1])
print(h)

tensor([[-0.0068, -0.0510, -0.2577,  0.0450,  0.2966],
        [-0.0618,  0.1450, -0.3237,  0.0370,  0.2057]],
       grad_fn=<SelectBackward0>)
tensor([[[-0.0068, -0.0510, -0.2577,  0.0450,  0.2966],
         [-0.0618,  0.1450, -0.3237,  0.0370,  0.2057]]],
       grad_fn=<StackBackward0>)
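
To make the roles of the two weight matrices concrete, the sketch below recomputes the layer's output by hand with the update rule given in the PyTorch documentation, `h_t = tanh(x_t W_ih^T + b_ih + h_{t-1} W_hh^T + b_hh)`. It builds its own layer and data so that it runs on its own; `h_t`, `steps`, and `y_manual` are names introduced for this illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seqlength, batchsize, input_size, hidden_size = 5, 2, 2, 5

rnn = nn.RNN(input_size, hidden_size)  # one layer, tanh, with biases
X = torch.rand(seqlength, batchsize, input_size)

with torch.no_grad():
    y, h = rnn(X)  # reference result from PyTorch

    # manual recurrence, starting from a zero hidden state
    h_t = torch.zeros(batchsize, hidden_size)
    steps = []
    for t in range(seqlength):
        h_t = torch.tanh(X[t] @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
                         + h_t @ rnn.weight_hh_l0.T + rnn.bias_hh_l0)
        steps.append(h_t)
    y_manual = torch.stack(steps)  # (SeqLen, Batch, HiddenSize)

print(torch.allclose(y, y_manual, atol=1e-6))        # every time step matches
print(torch.allclose(h[0], y_manual[-1], atol=1e-6)) # final state matches
```

Each hidden state feeds into the next step through `weight_hh_l0`, which is how information from earlier inputs is carried forward.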


## 4. Building an RNN Model Class

We now wrap the RNN layer into a proper PyTorch model class. A typical RNN model consists of:
1.  **RNN Layer:** Processes the sequence.
2.  **Linear Layer (Head):** Transforms the RNN's hidden state (which has size `hidden_size`) into the desired prediction size (e.g., 1 value for regression, N classes for classification).

In this example, we apply the linear layer to the output of *every* time step, effectively making a prediction at each point in the sequence (Many-to-Many).

In [5]:
class RNNnet(nn.Module):
  def __init__(self,input_size,num_hidden,num_layers):
    super().__init__()

    # store model parameters
    self.input_size = input_size
    self.num_hidden = num_hidden
    self.num_layers = num_layers

    # RNN Layer
    self.rnn = nn.RNN(input_size,num_hidden,num_layers)

    # linear layer for output
    self.out = nn.Linear(num_hidden,1) # predicting 1 value

  def forward(self,x):

    print(f'Input: {list(x.shape)}')

    # initialize hidden state for first input
    # (batch size is read from x rather than from a global variable)
    hidden = torch.zeros(self.num_layers,x.shape[1],self.num_hidden)
    print(f'Hidden: {list(hidden.shape)}')

    # run through the RNN layer
    y,hidden = self.rnn(x,hidden)
    print(f'RNN out: {list(y.shape)}')
    print(f'RNN hid: {list(hidden.shape)}')

    # pass the RNN output through the linear output layer
    o = self.out(y)
    print(f'Output: {list(o.shape)}')

    return o,hidden
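
The many-to-many behavior relies on `nn.Linear` acting on the last dimension only, so the same weights are applied independently at each time step. The sketch below checks this with a random tensor standing in for the RNN output; `o_all`, `o_steps`, and `o_last` are illustrative names, not part of the class above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seqlength, batchsize, num_hidden = 5, 2, 16

out = nn.Linear(num_hidden, 1)
y = torch.rand(seqlength, batchsize, num_hidden)  # stand-in for the RNN output

with torch.no_grad():
    # whole sequence at once: (SeqLen, Batch, 1)
    o_all = out(y)

    # one time step at a time, then stacked back together
    o_steps = torch.stack([out(y[t]) for t in range(seqlength)])

    # using only the last time step would give one prediction per sequence
    o_last = out(y[-1])

print(list(o_all.shape))
print(list(o_last.shape))
print(torch.allclose(o_all, o_steps, atol=1e-6))
```

Applying the head to `y[-1]` alone, as in `o_last`, is one way to obtain a single prediction per sequence instead of one per time step.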

## 5. Testing the Model

We create an instance of our class and pass dummy data through it. We also define a loss function (`MSELoss`) to verify that the output shapes are compatible with standard PyTorch loss functions.

In [6]:
# create an instance of the model
input_size  =  3 # e.g., temperature, wind speed, pressure
hidden_size = 16 # internal memory capacity
num_layers  =  1 # number of stacked layers

net = RNNnet(input_size,hidden_size,num_layers)
print(net)

RNNnet(
  (rnn): RNN(3, 16)
  (out): Linear(in_features=16, out_features=1, bias=True)
)


In [8]:
# test the model with some data
# create some data
X = torch.rand(seqlength,batchsize,input_size)
y = torch.rand(seqlength,batchsize,1)
yHat,h = net(X)

# try a loss function
lossfun = nn.MSELoss()
lossfun(yHat,y)

Input: [5, 2, 3]
Hidden: [1, 2, 16]
RNN out: [5, 2, 16]
RNN hid: [1, 2, 16]
Output: [5, 2, 1]


tensor(0.4522, grad_fn=<MseLossBackward0>)
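
As a further sanity check, one can confirm that the loss is a scalar and that gradients reach the recurrent weights. The sketch below is a compact stand-in that uses a bare `nn.RNN` plus `nn.Linear` rather than the `RNNnet` class, so that it runs on its own; it is a shape check under those assumptions, not a training loop.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seqlength, batchsize, input_size, hidden_size = 5, 2, 3, 16

# compact stand-in for the model: RNN layer followed by a linear head
rnn = nn.RNN(input_size, hidden_size)
out = nn.Linear(hidden_size, 1)

X = torch.rand(seqlength, batchsize, input_size)
y = torch.rand(seqlength, batchsize, 1)

# forward pass: keep the per-step outputs, discard the final hidden state
yHat = out(rnn(X)[0])

# scalar loss, then backpropagate through all time steps
loss = nn.MSELoss()(yHat, y)
loss.backward()

print(f'Loss dims: {loss.ndim}')
print(f'Grad shape: {list(rnn.weight_hh_l0.grad.shape)}')
```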