<a href="https://colab.research.google.com/github/deepakgarg08/llm-diary/blob/main/llm_chronicles_4_2_recurrent_neural_networks.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## LLM Chronicles 4.2 - Recurrent Neural Networks

In this notebook we'll look at how a recurrent layer of a neural network is implemented. Specifically, we'll focus on *hidden-to-hidden* recurrence, as this is the most commonly used form and the one that PyTorch's **nn.RNN** implements.


![picture](https://raw.githubusercontent.com/kyuz0/llm-chronicles/main/4.1%20-%20Lab%20-%20Recurrent%20Neural%20Networks/rnn.png)

In [None]:
import torch
import torch.nn as nn
import numpy as np

## 1. PyTorch RNN layer
In this cell, we're creating a simple recurrent layer for a neural network using PyTorch's **nn.RNN** module.

Here's a breakdown of the parameters:

* **input_size = 2**: This specifies that each input sequence element (or timestep) will have a feature size of 2. In other words, the input tensor to this layer will have a shape like (batch_size, sequence_length, 2) where 2 is the feature size for each element in the sequence.

* **hidden_size = 3**: This denotes that the hidden state of the RNN will have a size of 3. The hidden state is the internal state or memory of the RNN, and its size can be thought of as the 'width' or 'capacity' of the RNN's memory at each timestep. When the RNN processes an input, it updates its hidden state based on both the current input and the previous hidden state. This allows the RNN to maintain a form of 'memory' of past inputs in the sequence.

* **batch_first=True**: This argument makes the layer expect input tensors of shape (batch_size, sequence_length, input_size). By default, the RNN module in PyTorch expects inputs of shape (sequence_length, batch_size, input_size). Putting the batch dimension first matches the layout commonly used in other deep learning frameworks and is often easier to handle.
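As a quick illustration of the shapes described above, the sketch below runs the same kind of layer with and without `batch_first=True`. The batch size of 5 and the random inputs are arbitrary choices made only for this example.

```python
import torch
import torch.nn as nn

# Illustrative sizes; batch_size=5 is an arbitrary choice for this example
batch_size, seq_len, input_size, hidden_size = 5, 4, 2, 3

# batch_first=True: input is (batch_size, sequence_length, input_size)
rnn_bf = nn.RNN(input_size=input_size, hidden_size=hidden_size, batch_first=True)
out_bf, h_bf = rnn_bf(torch.randn(batch_size, seq_len, input_size))

# Default layout: input is (sequence_length, batch_size, input_size)
rnn_sf = nn.RNN(input_size=input_size, hidden_size=hidden_size)
out_sf, h_sf = rnn_sf(torch.randn(seq_len, batch_size, input_size))

print(out_bf.shape)  # torch.Size([5, 4, 3])
print(out_sf.shape)  # torch.Size([4, 5, 3])
print(h_bf.shape, h_sf.shape)  # torch.Size([1, 5, 3]) for both
```

Note that `batch_first` only changes the layout of the input and output tensors; the final hidden state keeps the shape (num_layers, batch_size, hidden_size) in both cases.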

In [None]:
rnn_layer = nn.RNN(input_size = 2, hidden_size=3, batch_first=True)
rnn_layer

RNN(2, 3, batch_first=True)

In [None]:
# Let's look at the parameters of this RNN layer
w_ih = rnn_layer.weight_ih_l0
w_hh = rnn_layer.weight_hh_l0
b_ih = rnn_layer.bias_ih_l0
b_hh = rnn_layer.bias_hh_l0

w_ih, b_ih, w_hh, b_hh

(Parameter containing:
 tensor([[ 0.5407, -0.1051],
         [ 0.1025,  0.3401],
         [-0.1288,  0.4972]], requires_grad=True),
 Parameter containing:
 tensor([-0.5645,  0.1087, -0.1245], requires_grad=True),
 Parameter containing:
 tensor([[-0.2017, -0.4285, -0.0833],
         [ 0.4595,  0.3113, -0.0283],
         [-0.5705, -0.0535, -0.1654]], requires_grad=True),
 Parameter containing:
 tensor([ 0.3875,  0.5334, -0.2906], requires_grad=True))
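The shapes of these parameters follow directly from `input_size` and `hidden_size`. A small sketch that lists them on a freshly created layer (so the values differ from the ones printed above, but the shapes are the same):

```python
import torch.nn as nn

rnn_layer = nn.RNN(input_size=2, hidden_size=3, batch_first=True)

# Collect the shape of every learnable parameter by name
shapes = {name: tuple(p.shape) for name, p in rnn_layer.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)

# weight_ih_l0 is (hidden_size, input_size)  = (3, 2)
# weight_hh_l0 is (hidden_size, hidden_size) = (3, 3)
# bias_ih_l0 and bias_hh_l0 are (hidden_size,) = (3,)
total = sum(p.numel() for p in rnn_layer.parameters())
print("Total parameters:", total)  # 6 + 9 + 3 + 3 = 21
```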

In [None]:
# Let's create a dummy input sequence with 4 time-steps, each having 2 features
input_seq = torch.tensor([[1.0]*2,[2.0]*2,[3.0]*2,[4.0]*2])
input_seq

tensor([[1., 1.],
        [2., 2.],
        [3., 3.],
        [4., 4.]])

In [None]:
input_seq.shape

torch.Size([4, 2])

In [None]:
# we unsqueeze to add a batch dimension
input_batch = input_seq.unsqueeze(0)
input_batch.shape

torch.Size([1, 4, 2])

In [None]:
# Forward pass
output, hh = rnn_layer(input_batch)
output, hh

(tensor([[[ 0.2529,  0.7949, -0.0466],
          [ 0.2971,  0.9556,  0.1416],
          [ 0.5706,  0.9837,  0.4186],
          [ 0.7589,  0.9947,  0.5448]]], grad_fn=<TransposeBackward1>),
 tensor([[[0.7589, 0.9947, 0.5448]]], grad_fn=<StackBackward0>))
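In the output above, the second returned tensor is identical to the last row of the first. The sketch below checks this relationship on a freshly initialised layer; it holds for a single-layer, unidirectional RNN like the one used here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # seed chosen only to make the example repeatable

rnn_layer = nn.RNN(input_size=2, hidden_size=3, batch_first=True)
input_batch = torch.tensor([[1.0] * 2, [2.0] * 2, [3.0] * 2, [4.0] * 2]).unsqueeze(0)

output, hh = rnn_layer(input_batch)

print(output.shape)  # torch.Size([1, 4, 3]): one hidden state per time step
print(hh.shape)      # torch.Size([1, 1, 3]): final hidden state only

# The final hidden state equals the output at the last time step
same = torch.allclose(output[:, -1, :], hh[0])
print(same)  # True
```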

## 2. RNN forward pass from scratch

Let's now implement the RNN computations from scratch.

![picture](https://raw.githubusercontent.com/kyuz0/llm-chronicles/main/4.1%20-%20Lab%20-%20Recurrent%20Neural%20Networks/rnn-layer.png)
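Written as a formula, the computation at each time step is the following. This is the update given in PyTorch's `nn.RNN` documentation, with the initial state $h_0$ set to zero because no initial hidden state is passed in this notebook:

$$
h_t = \tanh\left(x_t W_{ih}^{\top} + b_{ih} + h_{t-1} W_{hh}^{\top} + b_{hh}\right), \qquad h_0 = 0
$$

Here $x_t$ is the input at time step $t$ (2 features), $h_t$ is the hidden state (3 values), $W_{ih}$ and $W_{hh}$ are the `w_ih` and `w_hh` matrices inspected earlier, and $b_{ih}$, $b_{hh}$ are the two bias vectors.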

In [None]:
states = [torch.zeros(3)]                                                         # initialize hidden state to zero

for t in range(0,4):                                                              # iterate through time steps
  print("Time step: ", t)
  xt = input_seq[t]                                                               # extract the input at time step t
  ht = torch.matmul(xt, torch.transpose(w_ih, 0, 1)) + b_ih                       # weighted sum of input + bias
  ot = ht + torch.matmul(states[t], torch.transpose(w_hh, 0, 1)) + b_hh           # factor in hidden state from previous step
  ot = torch.tanh(ot)                                                             # final activation
  states.append(ot)
  print("   Weighted sum with input:", ht)
  print("   Output:", ot)

Time step:  0
   Weighted sum with input: tensor([-0.1290,  0.5513,  0.2439], grad_fn=<AddBackward0>)
   Output: tensor([ 0.2529,  0.7949, -0.0466], grad_fn=<TanhBackward0>)
Time step:  1
   Weighted sum with input: tensor([0.3066, 0.9938, 0.6123], grad_fn=<AddBackward0>)
   Output: tensor([0.2971, 0.9556, 0.1416], grad_fn=<TanhBackward0>)
Time step:  2
   Weighted sum with input: tensor([0.7421, 1.4364, 0.9806], grad_fn=<AddBackward0>)
   Output: tensor([0.5706, 0.9837, 0.4186], grad_fn=<TanhBackward0>)
Time step:  3
   Weighted sum with input: tensor([1.1777, 1.8790, 1.3490], grad_fn=<AddBackward0>)
   Output: tensor([0.7589, 0.9947, 0.5448], grad_fn=<TanhBackward0>)
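The printed outputs match the rows returned by the PyTorch layer earlier. Rather than comparing by eye, the check can be done programmatically. The sketch below wraps the same loop in a helper (the name `manual_rnn_forward` is made up for this example) and compares it with `nn.RNN` on a freshly initialised layer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # seed chosen only to make the example repeatable

rnn_layer = nn.RNN(input_size=2, hidden_size=3, batch_first=True)
input_seq = torch.tensor([[1.0] * 2, [2.0] * 2, [3.0] * 2, [4.0] * 2])

def manual_rnn_forward(x, w_ih, w_hh, b_ih, b_hh):
    """Hypothetical helper: hidden-to-hidden RNN over an unbatched sequence."""
    h = torch.zeros(w_hh.shape[0])  # initial hidden state is zero
    outputs = []
    for x_t in x:  # iterate over time steps
        h = torch.tanh(x_t @ w_ih.T + b_ih + h @ w_hh.T + b_hh)
        outputs.append(h)
    return torch.stack(outputs)  # (sequence_length, hidden_size)

with torch.no_grad():
    manual = manual_rnn_forward(
        input_seq,
        rnn_layer.weight_ih_l0, rnn_layer.weight_hh_l0,
        rnn_layer.bias_ih_l0, rnn_layer.bias_hh_l0,
    )
    reference, _ = rnn_layer(input_seq.unsqueeze(0))  # add a batch dimension

matches = torch.allclose(manual, reference[0], atol=1e-5)
print(matches)  # True
```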


## 3. PyTorch LSTM layer
In this cell, we're constructing a Long Short-Term Memory (LSTM) layer using PyTorch's nn.LSTM module. LSTMs are a special kind of recurrent neural network (RNN) that are particularly effective at learning and remembering over long sequences and are less susceptible to the vanishing gradient problem.

![picture](https://raw.githubusercontent.com/kyuz0/llm-chronicles/main/4.1%20-%20Lab%20-%20Recurrent%20Neural%20Networks/lstm.png)

The forward pass through the LSTM returns two items:

* **output**: This tensor contains the LSTM's hidden state at each timestep of the input sequence, for each sequence in the batch, exactly as with a regular RNN layer.

* **(hidden_state, cell_state)**: This is a tuple containing the final hidden and cell states for the LSTM. The hidden state can be thought of as the LSTM's memory of the most recent timestep, while the cell state is a longer-term memory.
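To make the two return values concrete, here is a small sketch using a batched input of shape (1, 4, 2). The random input is an arbitrary choice for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # seed chosen only to make the example repeatable

lstm = nn.LSTM(input_size=2, hidden_size=3, batch_first=True)
x = torch.randn(1, 4, 2)  # (batch_size, sequence_length, input_size)

output, (hidden_state, cell_state) = lstm(x)

print(output.shape)        # torch.Size([1, 4, 3]): one hidden state per time step
print(hidden_state.shape)  # torch.Size([1, 1, 3]): final hidden state
print(cell_state.shape)    # torch.Size([1, 1, 3]): final cell state

# As with nn.RNN, the final hidden state is the output at the last time step
same = torch.allclose(output[:, -1, :], hidden_state[0])
print(same)  # True
```

The cell state is only returned for the final time step; the per-step values in `output` are hidden states, not cell states.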

In [None]:
lstm_layer = nn.LSTM(input_size=2, hidden_size=3, batch_first=True)

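# Forward pass. input_seq has no batch dimension (shape [4, 2]); nn.LSTM accepts
# unbatched input, so the tensors returned here have no batch dimension either.
output, (hidden_state, cell_state) = lstm_layer(input_seq)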
output, (hidden_state, cell_state)

(tensor([[ 0.0439, -0.1310,  0.1267],
         [ 0.0249, -0.1867,  0.1401],
         [-0.0658, -0.1846,  0.1176],
         [-0.1995, -0.1537,  0.0968]], grad_fn=<SqueezeBackward1>),
 (tensor([[-0.1995, -0.1537,  0.0968]], grad_fn=<SqueezeBackward1>),
  tensor([[-0.5042, -0.1580,  0.4641]], grad_fn=<SqueezeBackward1>)))
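As with the RNN layer, the LSTM forward pass can be reproduced by hand. The sketch below assumes the layout described in PyTorch's `nn.LSTM` documentation, where `weight_ih_l0` and `weight_hh_l0` stack the input, forget, cell and output gate weights in that order. It uses a freshly initialised layer, so the numbers differ from the output above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # seed chosen only to make the example repeatable

hidden_size = 3
lstm_layer = nn.LSTM(input_size=2, hidden_size=hidden_size, batch_first=True)
input_seq = torch.tensor([[1.0] * 2, [2.0] * 2, [3.0] * 2, [4.0] * 2])

w_ih, w_hh = lstm_layer.weight_ih_l0, lstm_layer.weight_hh_l0  # (12, 2), (12, 3)
b_ih, b_hh = lstm_layer.bias_ih_l0, lstm_layer.bias_hh_l0      # (12,), (12,)

h = torch.zeros(hidden_size)  # initial hidden state
c = torch.zeros(hidden_size)  # initial cell state
outputs = []

with torch.no_grad():
    for x_t in input_seq:
        # All four gate pre-activations in one matrix product
        gates = x_t @ w_ih.T + b_ih + h @ w_hh.T + b_hh
        i, f, g, o = gates.chunk(4)  # input, forget, cell candidate, output
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        g = torch.tanh(g)
        c = f * c + i * g          # update the long-term cell state
        h = o * torch.tanh(c)      # new hidden state, also the step's output
        outputs.append(h)

    manual = torch.stack(outputs)
    reference, (h_n, c_n) = lstm_layer(input_seq.unsqueeze(0))

outputs_match = torch.allclose(manual, reference[0], atol=1e-5)
cell_matches = torch.allclose(c, c_n[0, 0], atol=1e-5)
print(outputs_match, cell_matches)  # True True
```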