### __Inside a Single RNN Layer (Vanilla RNN)__

A single RNN layer maintains temporal memory by combining the current input and the previous hidden state using three shared weight matrices, producing both a new hidden state and an output at every time step.

The machinery inside an RNN layer is a bit more complex than in a feedforward layer. A single recurrent layer has `3 weight matrices`, `2 input tensors`, and `2 output tensors`.



At each time step $t$, a basic RNN cell computes the new hidden state $h_t$ and the output $y_t$:

$$
h_t = \phi\left(W_{xh} x_t + W_{hh} h_{t-1} + b_h\right)
$$

$$
y_t = W_{hy} h_t + b_y
$$


![](https://i.sstatic.net/xFs0V.jpg)

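- $\phi$ is a non-linear activation function such as tanh or ReLU
- $x_t$ is the input at time step $t$
- $h_{t-1}$ is the previous hidden state
- $b_h$ and $b_y$ are the hidden and output bias vectors
- $y_t$ is the output at time step $t$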
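The two equations can be sketched as a single NumPy function that takes the two inputs ($x_t$, $h_{t-1}$) and returns the two outputs ($h_t$, $y_t$). This is a minimal illustration, assuming tanh for $\phi$ and 1-D vectors; the helper name `rnn_step` and the sizes are hypothetical, chosen only to show the shapes.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    """One vanilla RNN time step (hypothetical helper, tanh assumed for phi)."""
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)  # new hidden state
    y_t = W_hy @ h_t + b_y                            # output read from h_t
    return h_t, y_t

# Illustrative sizes (assumptions, not from the text)
d_input, d_hidden, d_output = 4, 3, 2
rng = np.random.default_rng(0)
W_xh = rng.standard_normal((d_hidden, d_input))
W_hh = rng.standard_normal((d_hidden, d_hidden))
W_hy = rng.standard_normal((d_output, d_hidden))
b_h = np.zeros(d_hidden)
b_y = np.zeros(d_output)

x_t = rng.standard_normal(d_input)
h_prev = np.zeros(d_hidden)
h_t, y_t = rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y)
print(h_t.shape, y_t.shape)  # (3,) (2,)
```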

__Weight Matrices (3 Total)__


1. Input → Hidden
   
$$
W_{xh} \in \mathbb{R}^{d_{\text{hidden}} \times d_{\text{input}}}
$$

2. Hidden → Hidden (Recurrent Weight)

$$
W_{hh} \in \mathbb{R}^{d_{\text{hidden}} \times d_{\text{hidden}}}
$$


3. Hidden → Output

$$
W_{hy} \in \mathbb{R}^{d_{\text{output}} \times d_{\text{hidden}}}
$$

These weight matrices are shared across all time steps.
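Because the same three matrices are reused at every step, the number of trainable parameters depends only on the layer sizes, never on the sequence length. A quick count, assuming one bias vector each for $b_h$ and $b_y$ as in the equations above (the helper name and the sizes are illustrative):

```python
def vanilla_rnn_param_count(d_input: int, d_hidden: int, d_output: int) -> int:
    """Count parameters of the vanilla RNN cell defined above (hypothetical helper)."""
    w_xh = d_hidden * d_input     # W_xh
    w_hh = d_hidden * d_hidden    # W_hh (recurrent)
    w_hy = d_output * d_hidden    # W_hy
    biases = d_hidden + d_output  # b_h and b_y
    return w_xh + w_hh + w_hy + biases

# Sequence length never appears: a 10-step and a 10,000-step sequence
# are processed with exactly these parameters.
print(vanilla_rnn_param_count(d_input=10, d_hidden=20, d_output=5))  # 725
```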

__Input Tensors (2)__

At time step t:

1. Current input $x_t$
2. Previous hidden state $h_{t-1}$


__Output Tensors (2)__

1. Current hidden state $h_t$
2. Current output $y_t$

![](https://contenthub-static.grammarly.com/blog/wp-content/uploads/2024/09/158474-6180-Blog-Visuals_-RNNs_V1.png)

Recurrent nets introduce a new concept called the “hidden state”, which is simply an extra input built from the layer's own output at the previous time step. But if it depends on the previous step, what do we use on the very first step? Simple: start it with zeros.

RNNs are fed differently from feedforward networks. Because we are working with sequences, the order in which we input the data matters, so each time we feed the net we input a single item of the sequence. For example, for stock prices we input the price for each day in order; for text we input a single letter or word at a time.
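The feeding pattern described above (one item per step, with the hidden state starting at zeros) can be sketched as a loop over the sequence. This NumPy sketch assumes tanh and randomly initialised weights; the helper name `run_rnn`, the sizes, and the toy price series are hypothetical. Feeding the same items in reverse order gives a different final hidden state, which is why the order matters.

```python
import numpy as np

def run_rnn(xs, W_xh, W_hh, W_hy, b_h, b_y):
    """Unroll a vanilla RNN over a sequence (hypothetical helper, tanh assumed).

    xs has shape (seq_len, d_input); one item is fed per time step.
    """
    h = np.zeros(W_hh.shape[0])  # no previous step yet, so start from zeros
    outputs = []
    for x_t in xs:               # the same weights are reused at every step
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        outputs.append(W_hy @ h + b_y)
    return np.stack(outputs), h

rng = np.random.default_rng(42)
d_input, d_hidden, d_output = 1, 8, 1
W_xh = rng.standard_normal((d_hidden, d_input))
W_hh = rng.standard_normal((d_hidden, d_hidden)) * 0.5
W_hy = rng.standard_normal((d_output, d_hidden))
b_h, b_y = np.zeros(d_hidden), np.zeros(d_output)

prices = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])  # toy series, one value per day
ys, h_last = run_rnn(prices, W_xh, W_hh, W_hy, b_h, b_y)
_, h_last_rev = run_rnn(prices[::-1], W_xh, W_hh, W_hy, b_h, b_y)
print(ys.shape)                         # (5, 1)
print(np.allclose(h_last, h_last_rev))  # False: the final state depends on the order
```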

__Character RNN__

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import numpy as np
import random


print(torch.__version__)
```

```
2.8.0
```


```python
# Pick the best available accelerator: Apple MPS, then CUDA, then CPU
device = (
    'mps' if torch.backends.mps.is_available()
    else 'cuda' if torch.cuda.is_available()
    else 'cpu'
)

device
```

```
'mps'
```

```python
set('I am Pawan')
```

```
{' ', 'I', 'P', 'a', 'm', 'n', 'w'}
```
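The set of unique characters is the vocabulary. Sorting it gives a stable order, so each character can be mapped to an integer index and back; `TextDataset` uses the same idea. A plain-Python sketch on the same example string (the variable names here are illustrative):

```python
text = 'I am Pawan'

# Sorted unique characters give a deterministic vocabulary order
chars = sorted(set(text))
char_to_idx = {ch: i for i, ch in enumerate(chars)}
idx_to_char = {i: ch for i, ch in enumerate(chars)}

# Encode the string as indices, then decode it back
encoded = [char_to_idx[ch] for ch in text]
decoded = ''.join(idx_to_char[i] for i in encoded)

print(chars)            # [' ', 'I', 'P', 'a', 'm', 'n', 'w']
print(encoded)          # [1, 0, 3, 4, 0, 2, 3, 6, 3, 5]
print(decoded == text)  # True
```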

```python
class TextDataset(Dataset):
    """Splits a text into fixed-length windows of character indices for next-character prediction."""

    def __init__(self, text_data: str, seq_length: int = 25) -> None:

        # Vocabulary: sorted unique characters and the mappings in both directions
        self.chars = sorted(set(text_data))
        self.data_size, self.vocab_size = len(text_data), len(self.chars)

        self.inx_to_char = {i: ch for i, ch in enumerate(self.chars)}
        self.char_to_idx = {ch: i for i, ch in enumerate(self.chars)}
        self.seq_length = seq_length
        # The whole text encoded as a list of integer indices
        self.X = self.string_to_vector(text_data)

    @property
    def X_string(self) -> str:
        return self.vector_to_string(self.X)

    def __len__(self) -> int:
        # Number of windows; the "- 1" leaves room for the shifted target window
        return int(len(self.X) / self.seq_length - 1)

    def __getitem__(self, index) -> tuple[torch.Tensor, torch.Tensor]:
        start_idx = index * self.seq_length
        end_idx = (index + 1) * self.seq_length

        # Input window and target window shifted one character to the right
        X = torch.tensor(self.X[start_idx:end_idx]).float()
        y = torch.tensor(self.X[start_idx + 1:end_idx + 1]).float()
        return X, y

    def string_to_vector(self, name: str) -> list[int]:
        # Characters -> integer indices
        return [self.char_to_idx[s] for s in name]

    def vector_to_string(self, vector: list[int]) -> str:
        # Integer indices -> characters
        return ''.join(self.inx_to_char[i] for i in vector)
```
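For next-character prediction, `__getitem__` returns an input window and a target window shifted one position to the right, so the target at each step is the character that follows. The `- 1` in `__len__` keeps the last target window inside the data. A plain-Python sketch of that slicing on a toy string, assuming `seq_length = 4` (the helper name `make_windows` is hypothetical):

```python
def make_windows(encoded: list[int], seq_length: int) -> list[tuple[list[int], list[int]]]:
    """Mirror TextDataset's slicing: input window and target shifted by one (hypothetical helper)."""
    n_windows = int(len(encoded) / seq_length - 1)  # same count as __len__
    windows = []
    for index in range(n_windows):
        start, end = index * seq_length, (index + 1) * seq_length
        x = encoded[start:end]
        y = encoded[start + 1:end + 1]  # the next character at every position
        windows.append((x, y))
    return windows

text = 'hello world!'
chars = sorted(set(text))
char_to_idx = {ch: i for i, ch in enumerate(chars)}
encoded = [char_to_idx[ch] for ch in text]

for x, y in make_windows(encoded, seq_length=4):
    print(''.join(chars[i] for i in x), '->', ''.join(chars[i] for i in y))
# hell -> ello
# o wo ->  wor
```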

```python
# RNN Class

class RNN(nn.Module):

    def __init__(self,
                 input_size: int,
                 hidden_size: int,
                 output_size: int) -> None:
        super().__init__()

        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size

        # W_xh: no bias here, because h2h already carries b_h
        self.i2h = nn.Linear(input_size, hidden_size, bias=False)
        # W_hh and b_h (recurrent weights)
        self.h2h = nn.Linear(hidden_size, hidden_size)
        # W_hy and b_y (output projection)
        self.h2o = nn.Linear(hidden_size, output_size)

    def forward(self, X, hidden_state) -> tuple[torch.Tensor, torch.Tensor]:
        # h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)
        X = self.i2h(X)
        hidden_state = self.h2h(hidden_state)
        hidden_state = torch.tanh(X + hidden_state)
        # y_t = W_hy h_t + b_y; return the output and the new hidden state
        return self.h2o(hidden_state), hidden_state

    def init_zero_hidden(self, batch_size=1) -> torch.Tensor:
        # The first step has no previous hidden state, so start from zeros
        return torch.zeros(batch_size, self.hidden_size, requires_grad=False)
```
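In this class, `i2h` is built with `bias=False` and `h2h` carries the bias, so `h2o(tanh(i2h(x) + h2h(h)))` matches the two equations at the top with a single $b_h$. PyTorch's built-in `nn.RNN` instead keeps two hidden biases ($b_{ih}$ and $b_{hh}$ in its documentation), which simply add up to one effective bias. A small NumPy check of that equivalence, with arbitrary illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
d_input, d_hidden = 3, 4
W_xh = rng.standard_normal((d_hidden, d_input))
W_hh = rng.standard_normal((d_hidden, d_hidden))
x_t = rng.standard_normal(d_input)
h_prev = rng.standard_normal(d_hidden)

# Two-bias form (as in nn.RNN): b_ih on the input path, b_hh on the recurrent path
b_ih = rng.standard_normal(d_hidden)
b_hh = rng.standard_normal(d_hidden)
h_two_bias = np.tanh(W_xh @ x_t + b_ih + W_hh @ h_prev + b_hh)

# Single-bias form (as in the RNN class): bias only on the recurrent path
b_h = b_ih + b_hh
h_one_bias = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

print(np.allclose(h_two_bias, h_one_bias))  # True
```

Keeping only one bias saves `hidden_size` parameters without changing what the layer can represent.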

def generate_text(model: RNN, dataset: TextDataset, prediction_lengeth:int = 100) -> str:

    model.eval()
    predicted = dataset.vector_to_string([random.randint(0, len(dataset.chars) -1)])
    hidden = model.init_zero_hidden()

    for i in range(prediction_lengeth -1):
        