### __Inside a Single RNN Layer (Vanilla RNN)__

A single RNN layer maintains temporal memory by combining the current input and the previous hidden state using three shared weight matrices, producing both a new hidden state and an output at every time step.

An RNN's machinery is a bit more complex than that of a feedforward layer. Inside a single recurrent layer there are `3 weight matrices`, `2 input tensors`, and `2 output tensors`.



At each time step $t$, a basic RNN cell computes the hidden state $h_t$ and the output $y_t$:

$$
h_t = \phi\left(W_{xh} x_t + W_{hh} h_{t-1} + b_h\right)
$$

$$
y_t = W_{hy} h_t + b_y
$$


![](https://i.sstatic.net/xFs0V.jpg)

- $\phi$ is a non-linear activation function such as tanh or ReLU
- $x_t$ is the input at time step $t$
- $h_{t-1}$ is the previous hidden state
- $b_h$ and $b_y$ are bias vectors
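
To make the two equations concrete, here is a minimal sketch of one time step in plain Python, written without PyTorch so the arithmetic stays visible. The helper names (`matvec`, `rnn_step`) and the toy weight values are assumptions chosen for illustration; tanh is used for $\phi$.

```python
import math

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    """One vanilla RNN step: returns (h_t, y_t)."""
    # h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)
    pre = [a + b + c for a, b, c in zip(matvec(W_xh, x_t), matvec(W_hh, h_prev), b_h)]
    h_t = [math.tanh(p) for p in pre]
    # y_t = W_hy h_t + b_y
    y_t = [a + b for a, b in zip(matvec(W_hy, h_t), b_y)]
    return h_t, y_t

# Toy sizes (assumed): d_input = 3, d_hidden = 2, d_output = 1
W_xh = [[0.1, 0.0, -0.1], [0.2, 0.1, 0.0]]
W_hh = [[0.5, 0.0], [0.0, 0.5]]
W_hy = [[1.0, -1.0]]
b_h = [0.0, 0.0]
b_y = [0.0]

h_t, y_t = rnn_step([1.0, 0.0, 0.0], [0.0, 0.0], W_xh, W_hh, W_hy, b_h, b_y)
print(len(h_t), len(y_t))  # hidden state has d_hidden entries, output has d_output
```

The hidden state returned here would be passed back in as `h_prev` on the next step.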

__Weight Matrices (3 Total)__


1. Input → Hidden
   
$$
W_{xh} \in \mathbb{R}^{d_{\text{hidden}} \times d_{\text{input}}}
$$

2. Hidden → Hidden (Recurrent Weight)

$$
W_{hh} \in \mathbb{R}^{d_{\text{hidden}} \times d_{\text{hidden}}}
$$


3. Hidden → Output

$$
W_{hy} \in \mathbb{R}^{d_{\text{output}} \times d_{\text{hidden}}}
$$

These weight matrices are shared across all time steps.
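
One consequence of weight sharing is that the number of parameters depends only on the layer sizes, not on the sequence length. The sketch below counts them from the shapes given above; the function name and the example sizes are assumptions for illustration.

```python
def rnn_param_count(d_input: int, d_hidden: int, d_output: int) -> int:
    """Count the parameters of a vanilla RNN layer with biases b_h and b_y."""
    w_xh = d_hidden * d_input    # input -> hidden
    w_hh = d_hidden * d_hidden   # hidden -> hidden (recurrent)
    w_hy = d_output * d_hidden   # hidden -> output
    biases = d_hidden + d_output
    return w_xh + w_hh + w_hy + biases

# Example sizes (assumed): 27 input symbols, 256 hidden units, 27 outputs
count = rnn_param_count(d_input=27, d_hidden=256, d_output=27)
print(count)  # the same number whether the sequence has 5 steps or 500
```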

__Input Tensors (2)__

At time step t:

1. Current input $x_t$
2. Previous hidden state $h_{t-1}$


__Output Tensors (2)__

1. Current hidden state $h_t$
2. Current output $y_t$

![](https://contenthub-static.grammarly.com/blog/wp-content/uploads/2024/09/158474-6180-Blog-Visuals_-RNNs_V1.png)

Recurrent networks introduce a new concept called the “hidden state”, which is simply another input, computed at the previous time step. But if it depends on the previous step, where does it come from on the very first step? The usual answer is simple: initialize it with zeros.

RNNs are fed differently from feedforward networks. Because we are working with sequences, the order of the inputs matters, so we feed the network one item of the sequence at a time. For example, for stock prices we input one day's price per step; for text we input a single character or word per step.
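
A small sketch can show both points: the hidden state starts at zero, and the order of the inputs changes the result. It uses a scalar RNN (one input, one hidden unit) in plain Python; the function name and the weight values are assumptions chosen for illustration.

```python
import math

def run_scalar_rnn(sequence, w_xh=0.8, w_hh=0.5, b_h=0.0):
    """Feed a sequence one item at a time and return every hidden state."""
    h = 0.0  # initial hidden state: zeros
    states = []
    for x_t in sequence:
        h = math.tanh(w_xh * x_t + w_hh * h + b_h)
        states.append(h)
    return states

# Same items, different order
early_spike = run_scalar_rnn([1.0, 0.0, 0.0])
late_spike = run_scalar_rnn([0.0, 0.0, 1.0])
print(early_spike[-1], late_spike[-1])  # the final hidden states differ
```

In the first sequence the effect of the `1.0` has decayed over two further steps, while in the second it has just arrived, so the final hidden states are not the same.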

### __Character RNN__

A Character-level RNN (Char-RNN) is a recurrent neural network that models text one character at a time instead of word by word.

Suppose your text is:

"hello"


Training pairs look like:

| Input (chars) | Target (next char) |
| ------------- | ------------------ |
| h             | e                  |
| e             | l                  |
| l             | l                  |
| l             | o                  |
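
The table can be produced by pairing each character with the one that follows it. A minimal sketch, with `next_char_pairs` as a hypothetical helper name:

```python
def next_char_pairs(text: str) -> list[tuple[str, str]]:
    """Pair every character with its successor: (input, target)."""
    return list(zip(text[:-1], text[1:]))

pairs = next_char_pairs("hello")
print(pairs)  # [('h', 'e'), ('e', 'l'), ('l', 'l'), ('l', 'o')]
```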

At each time step t:


$$
h_t = \phi\left(W_{xh} x_t + W_{hh} h_{t-1} + b_h\right)
$$

$$
y_t = W_{hy} h_t + b_y
$$


- x_t → one-hot or embedding of current character
- h_t → hidden state (memory)
- y_t → logits over all possible characters
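
As a rough illustration of the two ends of this pipeline, the sketch below one-hot encodes a character and turns a vector of logits into probabilities with a softmax. The vocabulary is built from "hello" and the logit values are made up for the example; neither comes from a trained model.

```python
import math

vocab = sorted(set("hello"))  # ['e', 'h', 'l', 'o']
char_to_idx = {ch: i for i, ch in enumerate(vocab)}

def one_hot(ch: str) -> list[float]:
    """Vector with a single 1.0 at the character's index."""
    vec = [0.0] * len(vocab)
    vec[char_to_idx[ch]] = 1.0
    return vec

def softmax(logits: list[float]) -> list[float]:
    """Convert logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

x_t = one_hot("h")
probs = softmax([2.0, 0.5, 0.1, -1.0])  # made-up logits, one per vocabulary entry
print(x_t, vocab[probs.index(max(probs))])
```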


__Text generation (sampling)__

- Start with a seed character (e.g. "h")
- Predict next char probabilities
- Sample one char
- Feed it back as input
- Repeat 
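
The loop described above can be sketched without a neural network by replacing the model with a lookup table of next-character probabilities. `TOY_MODEL` is a hypothetical stand-in for a trained RNN, and its probabilities are invented for the example.

```python
import random

# Hypothetical stand-in for a trained model: next-character probabilities
TOY_MODEL = {
    "h": {"e": 1.0},
    "e": {"l": 1.0},
    "l": {"l": 0.5, "o": 0.5},
    "o": {"h": 1.0},
}

def generate(seed: str, length: int, rng: random.Random) -> str:
    """Sample one character at a time and feed it back as the next input."""
    text = seed
    for _ in range(length - 1):
        probs = TOY_MODEL[text[-1]]          # predict next-char probabilities
        chars = list(probs)
        weights = list(probs.values())
        next_char = rng.choices(chars, weights=weights, k=1)[0]  # sample one char
        text += next_char                    # feed it back
    return text

sample = generate("h", 8, random.Random(0))
print(sample)
```

A real Char-RNN differs in that the probabilities depend on the whole history through the hidden state, not only on the last character.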

----

In [16]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader 
import numpy as np
import random
import requests 


print(torch.__version__)

2.8.0


In [25]:
device = (
    'mps' if torch.backends.mps.is_available()
    else 'cuda' if torch.cuda.is_available()
    else 'cpu'
)

device

'mps'

In [27]:
set('I am Pawan')

{' ', 'I', 'P', 'a', 'm', 'n', 'w'}

In [106]:
class TextDataset(Dataset):

    def __init__(self, text_data: str, seq_length:int = 25) -> None:

        # Vocabulary: every distinct character, sorted so the indices are reproducible
        self.chars = sorted(set(text_data))
        self.data_size, self.vocab_size = len(text_data), len(self.chars)
            
        self.inx_to_char = {i:ch for i, ch in enumerate(self.chars)}
        self.char_to_idx = {ch:i for i, ch in enumerate(self.chars)}
        self.seq_length = seq_length
        self.X = self.string_to_vector(text_data)

    @property
    def X_string(self) -> str:
        return self.vector_to_string(self.X)

    def __len__(self) -> int:
        # Number of seq_length windows, minus one so the shifted target window stays in range
        return int(len(self.X) / self.seq_length - 1)

    def __getitem__(self, index) -> tuple[torch.Tensor, torch.Tensor]:
        start_idx = index * self.seq_length
        end_idx = (index + 1) * self.seq_length

        X = torch.tensor(self.X[start_idx:end_idx], dtype=torch.long)
        y = torch.tensor(self.X[start_idx+1:end_idx+1], dtype=torch.long)
        
        return X, y

    def string_to_vector(self, name:str) -> list[int]:
        vector = list()
        for s in name:
            vector.append(self.char_to_idx[s])
        return vector

    def vector_to_string(self, vector: list[int]) -> str:
        vector_string =  ''
        for i in vector:
            vector_string += self.inx_to_char[i]
        return vector_string
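
The slicing in `__getitem__` above can be checked on plain integers: the target window is the input window shifted one position to the right. A minimal sketch, with `window` as a hypothetical helper that mirrors that indexing:

```python
def window(ids: list[int], index: int, seq_length: int) -> tuple[list[int], list[int]]:
    """Return (input, target) for the index-th non-overlapping window."""
    start = index * seq_length
    end = start + seq_length
    return ids[start:end], ids[start + 1:end + 1]

ids = list(range(10))      # stand-in for encoded characters
x, y = window(ids, 1, 4)
print(x, y)  # [4, 5, 6, 7] [5, 6, 7, 8]
```

Each target element is the character that follows the corresponding input element, which is why the dataset must leave at least one extra character after the last window.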

In [92]:
# RNN Class 

class RNN(nn.Module):

    def __init__(self, 
                 vocab_size: int,
                 hidden_size: int,
                 output_size: int,
                 batch_size: int) -> None:
        super().__init__()
        
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.batch_size  = batch_size

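        # The embedding lookup plays the role of W_xh x_t for a one-hot character input
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        # self.i2h = nn.Linear(input_size, hidden_size, bias=False) 
        self.h2h = nn.Linear(hidden_size, hidden_size)  # W_hh h_{t-1} + b_h
        self.h2o = nn.Linear(hidden_size, output_size)  # W_hy h_t + b_y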


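    def forward(self, X, hidden_state) -> tuple[torch.Tensor, torch.Tensor]:
        # X: (batch,) character indices; hidden_state: (batch, hidden_size)
        X = self.embedding(X)
        hidden_state = self.h2h(hidden_state)
        hidden_state = torch.tanh(X + hidden_state)
        # Returns logits of shape (batch, output_size) and the new hidden state
        return self.h2o(hidden_state), hidden_state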

    def init_zero_hidden(self, batch_size=1) -> torch.Tensor:
        # The first time step has no previous state, so start from zeros
        return torch.zeros(batch_size, self.hidden_size, requires_grad=False)

In [116]:
def sample_with_temperature(out, temperature=0.7):
    out = out / temperature
    probs = torch.softmax(out, dim=1)
    return torch.multinomial(probs, 1).item()
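
Dividing the logits by a temperature below 1 sharpens the distribution (the most likely character is picked more often), while a temperature above 1 flattens it. The sketch below shows this effect in plain Python; the function name and the logit values are assumptions for illustration.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Softmax over logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # made-up values
sharp = softmax_with_temperature(logits, 0.5)
neutral = softmax_with_temperature(logits, 1.0)
flat = softmax_with_temperature(logits, 2.0)
print(sharp[0], neutral[0], flat[0])  # probability of the top choice shrinks as temperature grows
```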

In [118]:
def generate_text(model: RNN, dataset: TextDataset, prediction_length: int = 100) -> str:

    model.eval()
    # Seed the sequence with a random character from the vocabulary
    predicted = dataset.vector_to_string([random.randint(0, len(dataset.chars) - 1)])
    hidden = model.init_zero_hidden()

    # No gradients are needed while sampling
    with torch.no_grad():
        for i in range(prediction_length - 1):
            last_char = torch.tensor([dataset.char_to_idx[predicted[-1]]])
            X, hidden = last_char.to(device), hidden.to(device)
            out, hidden = model(X, hidden)
            # result = torch.multinomial(nn.functional.softmax(out,1),1).item()
            result = sample_with_temperature(out, temperature=0.7)
            predicted += dataset.inx_to_char[result]

    return predicted

In [112]:
def train(model:RNN, 
          data: DataLoader,
          epochs: int,
          optimizer: optim.Optimizer,
          loss_fn: nn.Module) -> None:

    train_losses = []
    model.to(device)

    for epoch in range(epochs):
        # generate_text() switches the model to eval mode, so re-enable training mode each epoch
        model.train()
        epoch_losses = list()

        for X, y in data:
            if X.shape[0] != model.batch_size:
                continue
            hidden = model.init_zero_hidden(batch_size=model.batch_size)

            X, y, hidden  = X.to(device), y.to(device), hidden.to(device)

            model.zero_grad()

            loss = 0

            for c in range(X.shape[1]):
                # out, hidden  = model(X[:, c].reshape(X.shape[0],1), hidden)
                out, hidden = model(X[:, c], hidden)
                loss += loss_fn(out, y[:,c].long())
                
            loss.backward()

            nn.utils.clip_grad_norm_(model.parameters(), 3)
            optimizer.step()

            epoch_losses.append(loss.detach().item() / X.shape[1])


        train_losses.append(torch.tensor(epoch_losses).mean())

        print(f'epoch : {epoch + 1}, loss {train_losses[epoch]}')
        print(generate_text(model, data.dataset))        
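
The call to `nn.utils.clip_grad_norm_(model.parameters(), 3)` in the loop above guards against exploding gradients, which are common when backpropagating through many time steps. The idea can be sketched in plain Python as follows. This is a simplified illustration (L2 norm only, no epsilon term), not PyTorch's implementation, and `clip_by_global_norm` is a hypothetical name.

```python
import math

def clip_by_global_norm(grads: list[float], max_norm: float) -> tuple[list[float], float]:
    """Rescale all gradients together if their combined L2 norm exceeds max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads, total_norm

# Made-up gradient values with norm 13, clipped down to norm 3
clipped, norm_before = clip_by_global_norm([3.0, 4.0, 12.0], max_norm=3.0)
norm_after = math.sqrt(sum(g * g for g in clipped))
print(norm_before, norm_after)
```

Because every gradient is multiplied by the same factor, the update direction is preserved and only its size is limited.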

In [46]:
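from pathlib import Path

data_path = Path('datasets/')
data_path.mkdir(parents=True, exist_ok=True)  # create the folder if it does not exist yet

response = requests.get('https://raw.githubusercontent.com/brunoklein99/deep-learning-notes/refs/heads/master/dinos.txt')
response.raise_for_status()  # stop early if the download failed

with open(data_path / 'dinos.txt', 'wb') as f:
    f.write(response.content)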

In [120]:
if __name__ ==  "__main__":

    # Read the corpus and lowercase it to keep the vocabulary small
    with open('datasets/dinos.txt', 'r') as f:
        data = f.read().lower()

    seq_length  = 25
    batch_size = 64
    hidden_size = 256

    text_dataset = TextDataset(data, seq_length=seq_length)
    text_dataloader = DataLoader(text_dataset, batch_size)

    rnnModel = RNN(text_dataset.vocab_size, hidden_size, text_dataset.vocab_size, batch_size)

    epochs = 100
    loss = nn.CrossEntropyLoss()
    optimizer = optim.RMSprop(rnnModel.parameters(), lr=0.001)

    train(rnnModel, text_dataloader, epochs, optimizer, loss)

epoch : 1, loss 2.292391300201416
hiobot
ronosaurus
lennteosaurus
tinnitorosaurus
sankonocaurus
thinatoruvs
uelopronges
kilisaurus
mni
epoch : 2, loss 1.9377340078353882
ysaurus
churinn
hhanimus
saurososaurus
teratops
torathiwelt
rus
ptyraptor
itarosaurus
telitor
niando
epoch : 3, loss 1.8498417139053345
inasaurus
saondia
tiodan
sixa
locavthn
terhalodos
singinasaurus
tritops
tagoyan
saurus
palichus
qini
epoch : 4, loss 1.8019977807998657
x
kelacosaurus
kasaurs
mauropyptomalonogacomanngor
framtorataptrosaurus
sugisaurus
palianosaurus
she
epoch : 5, loss 1.7568279504776
x
tricomantor
qchuansaurus
meleosaurus
roekanosaurus
gdenanosaurus
terachainos
uracosaurus
telanosau
epoch : 6, loss 1.730265736579895
onyx
ptoria
ranthosaurus
tatipandyts
protor
kutheinasaurus
sigtops
tenatop
liasaurus
lyonosaurus
tom
epoch : 7, loss 1.7053937911987305
x
beneronamores
aurasaurus
cojisaurosaurus
sineteosaurus
richangianosaurus
tiranenalia
cplopongosaur
epoch : 8, loss 1.6854609251022339
yx
auhualosaurus
t

----