# Understanding Recurrent Neural Network (RNN) Mechanics (Chapter 15 - Initial Focus)

---

This notebook serves as the introduction to **Chapter 15: Processing Sequential Data with Recurrent Neural Networks (RNNs)**. It focuses on breaking down the core mechanics of a simple RNN layer in PyTorch, establishing the foundation for sequence processing tasks like text generation and time series analysis.

### 1. The Core RNN Layer (`nn.RNN`)

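* **Initialization:** The notebook defines a single `nn.RNN` layer and sets these parameters:
    * **`input_size`:** The dimensionality of each time step's feature vector ($H_{\text{in}} = 5$ in the code below).
    * **`hidden_size`:** The number of units in the recurrent hidden state, which summarizes the part of the sequence seen so far ($H_{\text{out}} = 2$ in the code below).
    * **`num_layers=1`:** A single recurrent layer, which is why every parameter name ends in `_l0`.
    * **`batch_first=True`:** Tells PyTorch that the input and `output` tensors are laid out as **(Batch Size, Sequence Length, Feature Size)**, i.e. $(N, L, H_{\text{in}})$, instead of the default $(L, N, H_{\text{in}})$. The final hidden state `h_n` keeps the layout $(\text{num\_layers}, N, H_{\text{out}})$ either way.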
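To make the effect of `batch_first` concrete, the sketch below feeds the same data to a batch-first layer and to a default layer and prints the resulting shapes. The sizes (N=4, L=3) are arbitrary choices for illustration, and the two layers have independent random weights, so only the shapes are comparable.

```python
import torch
from torch import nn

torch.manual_seed(0)
N, L, H_in, H_out = 4, 3, 5, 2  # batch size, sequence length, input features, hidden units

rnn_bf = nn.RNN(input_size=H_in, hidden_size=H_out, batch_first=True)
rnn_sf = nn.RNN(input_size=H_in, hidden_size=H_out)  # default: batch_first=False

x_bf = torch.randn(N, L, H_in)      # (N, L, H_in)
out_bf, h_bf = rnn_bf(x_bf)

x_sf = x_bf.transpose(0, 1)         # (L, N, H_in)
out_sf, h_sf = rnn_sf(x_sf)

print(out_bf.shape)  # torch.Size([4, 3, 2]) -> (N, L, H_out)
print(out_sf.shape)  # torch.Size([3, 4, 2]) -> (L, N, H_out)
# batch_first does not change the hidden-state layout: (num_layers, N, H_out)
print(h_bf.shape, h_sf.shape)  # torch.Size([1, 4, 2]) torch.Size([1, 4, 2])
```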

### 2. Manual Inspection of RNN Weights

* **Parameter Extraction:** The notebook extracts and inspects the four parameter tensors that define the recurrent computation:
    * **`weight_ih_l0` (`w_xh`, $W_{xh}$):** Weights applied to the **input** $x_t$, shape `(hidden_size, input_size)`.
    * **`bias_ih_l0` (`b_xh`, $b_{xh}$):** Bias added to the input-to-hidden term, shape `(hidden_size,)`.
    * **`weight_hh_l0` (`w_hh`, $W_{hh}$):** Weights applied to the **previous hidden** state $h_{t-1}$, shape `(hidden_size, hidden_size)`.
    * **`bias_hh_l0` (`b_hh`, $b_{hh}$):** Bias added to the hidden-to-hidden term, shape `(hidden_size,)`.
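Because the two biases enter the update only through their sum $b_{xh} + b_{hh}$, one of them is redundant as far as the forward pass is concerned. The following is a minimal sketch, assuming a random input and the same layer configuration as the notebook, that folds the whole bias into `bias_ih_l0`, zeroes `bias_hh_l0`, and checks that the output does not change:

```python
import torch
from torch import nn

torch.manual_seed(28)
rnn = nn.RNN(input_size=5, hidden_size=2, num_layers=1, batch_first=True)
x = torch.randn(1, 3, 5)  # arbitrary input: (N=1, L=3, H_in=5)

with torch.no_grad():
    out_ref, _ = rnn(x)
    # Move the whole bias into bias_ih_l0 and zero out bias_hh_l0 (in place).
    rnn.bias_ih_l0.add_(rnn.bias_hh_l0)
    rnn.bias_hh_l0.zero_()
    out_folded, _ = rnn(x)

# Only the order of floating-point additions differs, hence allclose.
print(torch.allclose(out_ref, out_folded, atol=1e-6))  # True
```

PyTorch keeps both biases, so this redundancy only matters when reasoning about the model, not when using the layer.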

### 3. Demonstrating the Recurrent Operation

The notebook runs a forward pass over a whole sequence and then replicates it step by step, verifying the RNN's internal calculation:

* **Input Tensor:** A sequence tensor $X$ is created in the $(N, L, H_{\text{in}})$ format, where $L$ is the sequence length (number of time steps).
* **RNN Output:** Running the PyTorch RNN layer returns two tensors:
    1.  **`output`:** The hidden state at **every time step**, shape $(N, L, H_{\text{out}})$.
    2.  **`h_n`** (`hn` in the code): The **final hidden state** after processing the entire sequence, shape $(\text{num\_layers}, N, H_{\text{out}})$.
* **Manual Step-by-Step Calculation:** The core demonstration loops through the time steps ($t=0, 1, 2, \ldots$) and computes each new hidden state $h_t$ with the standard (Elman) RNN update, starting from $h_{-1} = \mathbf{0}$, which is what `nn.RNN` uses when no initial state is passed:

$$h_t = \tanh(x_t W_{xh}^T + b_{xh} + h_{t-1} W_{hh}^T + b_{hh})$$ 
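PyTorch also exposes this update for a single time step as `nn.RNNCell`, whose parameters are named `weight_ih`, `weight_hh`, `bias_ih` and `bias_hh`. The sketch below uses an arbitrary random $x_t$ and $h_{t-1}$ and checks that the formula written out term by term reproduces the cell's result:

```python
import torch
from torch import nn

torch.manual_seed(28)
cell = nn.RNNCell(input_size=5, hidden_size=2)  # nonlinearity='tanh' by default

x_t = torch.randn(1, 5)     # one time step, batch of 1
h_prev = torch.randn(1, 2)  # arbitrary previous hidden state h_{t-1}

with torch.no_grad():
    h_builtin = cell(x_t, h_prev)
    # h_t = tanh(x_t W_xh^T + b_xh + h_{t-1} W_hh^T + b_hh)
    h_formula = torch.tanh(
        x_t @ cell.weight_ih.T + cell.bias_ih
        + h_prev @ cell.weight_hh.T + cell.bias_hh
    )

print(torch.allclose(h_builtin, h_formula, atol=1e-6))  # True
```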

* **Verification:** For each time step, the manually computed $h_t$ is printed next to the corresponding slice of PyTorch's `output`. The values match, confirming **how the previous hidden state (memory) is combined with the current input** to produce the new state.
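Reading printed values works for three steps; a programmatic check scales better. The following sketch reuses the notebook's layer settings and input but compares with `torch.allclose` instead of by eye. It also confirms that, for a single-layer unidirectional RNN, `h_n` is simply the last time step of `output`:

```python
import torch
from torch import nn

torch.manual_seed(28)
rnn = nn.RNN(input_size=5, hidden_size=2, num_layers=1, batch_first=True)
x_seq = torch.tensor([[1.0] * 5, [2.0] * 5, [3.0] * 5])  # (L=3, H_in=5)

with torch.no_grad():
    output, h_n = rnn(x_seq.unsqueeze(0))  # add batch dim -> (1, 3, 5)

    h = torch.zeros(1, 2)  # h_{-1}: zeros, as nn.RNN uses when no h_0 is passed
    manual = []
    for t in range(x_seq.shape[0]):
        x_t = x_seq[t].unsqueeze(0)  # (1, 5)
        h = torch.tanh(x_t @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
                       + h @ rnn.weight_hh_l0.T + rnn.bias_hh_l0)
        manual.append(h)
    manual = torch.stack(manual, dim=1)  # (1, 3, 2), same layout as output

print(torch.allclose(manual, output, atol=1e-6))         # True
print(torch.allclose(manual[:, -1], h_n[0], atol=1e-6))  # True
```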

This notebook establishes the idea of **"unrolling"** the RNN over time: the same weights are applied at every step, and $h_t$ is fed back in at step $t+1$. Gated recurrent models such as LSTMs and GRUs are unrolled the same way; only the cell update changes.
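Unrolling extends directly to a batch of sequences and a non-zero initial state. Below is a sketch built around a hypothetical helper `unrolled_rnn` (not a PyTorch API) that applies the same weights at every step. The sizes N=4 and L=6 are arbitrary. Note that `nn.RNN` expects `h_0` shaped $(\text{num\_layers}, N, H_{\text{out}})$ even when `batch_first=True`:

```python
import torch
from torch import nn

def unrolled_rnn(x, h0, w_ih, w_hh, b_ih, b_hh):
    """Hypothetical helper: unroll a single-layer tanh RNN over time.

    x: (N, L, H_in) batch-first input; h0: (N, H_out) initial hidden state.
    Returns output (N, L, H_out) and h_n (1, N, H_out).
    """
    h = h0
    steps = []
    for t in range(x.shape[1]):  # the same weights are reused at every step
        h = torch.tanh(x[:, t] @ w_ih.T + b_ih + h @ w_hh.T + b_hh)
        steps.append(h)
    return torch.stack(steps, dim=1), h.unsqueeze(0)

torch.manual_seed(0)
rnn = nn.RNN(input_size=5, hidden_size=2, batch_first=True)
x = torch.randn(4, 6, 5)   # N=4 sequences of L=6 steps
h0 = torch.randn(1, 4, 2)  # (num_layers, N, H_out)

with torch.no_grad():
    out_ref, hn_ref = rnn(x, h0)
    out_man, hn_man = unrolled_rnn(x, h0[0], rnn.weight_ih_l0, rnn.weight_hh_l0,
                                   rnn.bias_ih_l0, rnn.bias_hh_l0)

print(torch.allclose(out_ref, out_man, atol=1e-6),
      torch.allclose(hn_ref, hn_man, atol=1e-6))  # True True
```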

```python
import torch
from torch import nn
```

```python
torch.manual_seed(28)  # fixed seed so the weight initialization is reproducible
rnn_layer = nn.RNN(input_size=5, hidden_size=2,
                   num_layers=1, batch_first=True)
```

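```python
# PyTorch names parameters <weight|bias>_<ih|hh>_l<layer index>.
w_xh = rnn_layer.weight_ih_l0  # input -> hidden weights, W_xh
w_hh = rnn_layer.weight_hh_l0  # hidden -> hidden weights, W_hh
b_xh = rnn_layer.bias_ih_l0    # input -> hidden bias, b_xh
b_hh = rnn_layer.bias_hh_l0    # hidden -> hidden bias, b_hh
```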

```python
print(f'W_xh shape: {w_xh.shape}')
print(f'W_hh shape: {w_hh.shape}')
print(f'b_xh shape: {b_xh.shape}')
print(f'b_hh shape: {b_hh.shape}')
```

```
W_xh shape: torch.Size([2, 5])
W_hh shape: torch.Size([2, 2])
b_xh shape: torch.Size([2])
b_hh shape: torch.Size([2])
```
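These shapes determine the layer's parameter count: $H_{\text{out}} H_{\text{in}} + H_{\text{out}}^2 + 2H_{\text{out}} = 10 + 4 + 2 + 2 = 18$. A small sketch with a hypothetical helper `rnn_param_count` cross-checks this arithmetic against PyTorch:

```python
import torch
from torch import nn

def rnn_param_count(input_size, hidden_size):
    # Hypothetical helper for a single-layer, unidirectional RNN with biases:
    # weight_ih (H_out, H_in) + weight_hh (H_out, H_out) + two biases of length H_out
    return hidden_size * input_size + hidden_size * hidden_size + 2 * hidden_size

rnn = nn.RNN(input_size=5, hidden_size=2, num_layers=1, batch_first=True)
n_torch = sum(p.numel() for p in rnn.parameters())
print(rnn_param_count(5, 2), n_torch)  # 18 18
```

The count grows quadratically in `hidden_size` because of $W_{hh}$, but it does not depend on the sequence length: the same weights are reused at every time step.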


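```python
# A toy sequence of L=3 time steps, each with 5 identical features.
x_seq = torch.tensor([[1.0] * 5, [2.0] * 5, [3.0] * 5])
# Reference result from PyTorch; add a batch dimension -> (N=1, L=3, H_in=5).
output, hn = rnn_layer(torch.reshape(x_seq, (1, 3, 5)))

out_man = []
for t in range(3):
    xt = torch.reshape(x_seq[t], (1, 5))
    print(f'Time Step {t} =>')
    print(f'    Input           :    {xt.numpy()}')

    # Input-to-hidden projection x_t W_xh^T + b_xh (pre-activation, not yet h_t).
    ht = torch.matmul(xt, torch.transpose(w_xh, 0, 1)) + b_xh
    print(f'    Hidden          :    {ht.detach().numpy()}')

    # Previous hidden state; h_{-1} is all zeros, matching nn.RNN's default h_0.
    if t > 0:
        h_prev = out_man[t - 1]
    else:
        h_prev = torch.zeros(ht.shape)
    # Add the recurrent term h_{t-1} W_hh^T + b_hh, then apply tanh to get h_t.
    ot = ht + torch.matmul(h_prev, torch.transpose(w_hh, 0, 1)) + b_hh
    ot = torch.tanh(ot)
    out_man.append(ot)
    print(f"    Output(Manually):    {ot.detach().numpy()}")
    print(f"    RNN output      :    {output[:, t].detach().numpy()}")
```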

```
Time Step 0 =>
    Input           :    [[1. 1. 1. 1. 1.]]
    Hidden          :    [[ 1.7883494 -1.2822826]]
    Output(Manually):    [[ 0.9557818 -0.9550318]]
    RNN output      :    [[ 0.9557818 -0.9550318]]
Time Step 1 =>
    Input           :    [[2. 2. 2. 2. 2.]]
    Hidden          :    [[ 3.5776963 -2.2393026]]
    Output(Manually):    [[ 0.9994815  -0.99265563]]
    RNN output      :    [[ 0.9994815  -0.99265563]]
Time Step 2 =>
    Input           :    [[3. 3. 3. 3. 3.]]
    Hidden          :    [[ 5.3670435 -3.1963227]]
    Output(Manually):    [[ 0.9999861 -0.9989048]]
    RNN output      :    [[ 0.9999861 -0.9989048]]
```
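The input projection $x_t W_{xh}^T + b_{xh}$ (the quantity printed as `Hidden` above) does not depend on $h_{t-1}$, so it can be computed for all time steps in a single matrix multiply; only the recurrent part has to run step by step. A sketch of that restructuring, assuming the same seed, layer configuration and input as the notebook:

```python
import torch
from torch import nn

torch.manual_seed(28)
rnn = nn.RNN(input_size=5, hidden_size=2, num_layers=1, batch_first=True)
x_seq = torch.tensor([[1.0] * 5, [2.0] * 5, [3.0] * 5])  # (L=3, H_in=5)

with torch.no_grad():
    output, _ = rnn(x_seq.unsqueeze(0))

    # Input projections for every step at once: (L, H_in) @ (H_in, H_out) -> (L, H_out).
    proj = x_seq @ rnn.weight_ih_l0.T + rnn.bias_ih_l0

    h = torch.zeros(2)  # h_{-1}
    states = []
    for t in range(proj.shape[0]):  # only the recurrence remains sequential
        h = torch.tanh(proj[t] + h @ rnn.weight_hh_l0.T + rnn.bias_hh_l0)
        states.append(h)
    manual = torch.stack(states)  # (L, H_out)

print(torch.allclose(manual, output[0], atol=1e-6))  # True
```

The loop still runs $L$ times, but each iteration now performs one matrix multiply instead of two.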