# Introduction to Neural Networks and Machine Learning - Recurrent Networks


---

**Instructors:** Piotr Baryczkowski, Jakub Bednarek<br>
**Contact:** piotr.baryczkowski@put.poznan.pl<br>

---

## Lab objectives:
- getting familiar with recurrent neural networks,
- building a network model with recurrent layers for the MNIST dataset,
- writing your own implementations of recurrent layers.

In [1]:
import numpy as np
import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

PyTorch version: 2.5.1+cu121
CUDA available: True


In [2]:
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

In [3]:
training_data = datasets.MNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.MNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

train_dataloader = DataLoader(training_data, batch_size=32, shuffle=True)
# Shuffling matters only for training; accuracy and average loss do not depend on batch order
test_dataloader = DataLoader(test_data, batch_size=32, shuffle=False)

## Recurrent networks
http://colah.github.io/posts/2015-08-Understanding-LSTMs/

https://pytorch.org/docs/stable/generated/torch.nn.RNN.html

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/

An example model with a recurrent layer for the MNIST data:

In [4]:
class RecurrentModel(nn.Module):
    def __init__(self, num_classes=10):
        super(RecurrentModel, self).__init__()
        self.num_classes = num_classes
        # Define your layers here.
        self.lstm_1 = nn.LSTM(input_size=28, hidden_size=128, batch_first=True)
        self.relu_1 = nn.ReLU()
        self.dense_1 = nn.Linear(128, num_classes)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, inputs):
        if inputs.dim() == 4:
            # Example: (batch_size, channels, sequence_length, features)
            inputs = inputs.squeeze(1)  # Remove the channels dimension if it's 1
        elif inputs.dim() != 3:
            raise ValueError(f"Expected input to be 3D, got {inputs.dim()}D instead.")

        lstm_out, _ = self.lstm_1(inputs)
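        # Every MNIST image yields a sequence of the same length (28 rows),
        # so the last time step is simply index -1
        x = lstm_out[:, -1, :]  # Output of the last time step, shape (batch_size, 128)
        x = self.relu_1(x)
        x = self.dense_1(x)
        # Returns class probabilities; note that nn.CrossEntropyLoss expects raw logits
        # and applies log-softmax internally
        return self.softmax(x)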

model = RecurrentModel(num_classes=10)
model

RecurrentModel(
  (lstm_1): LSTM(28, 128, batch_first=True)
  (relu_1): ReLU()
  (dense_1): Linear(in_features=128, out_features=10, bias=True)
  (softmax): Softmax(dim=1)
)

In [5]:
learning_rate = 1e-3
batch_size = 32
epochs = 5

def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    # Set the model to training mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

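        if batch % 100 == 0:
            # Samples seen so far in this epoch; read the batch size from the dataloader
            # rather than from a global variable
            loss, current = loss.item(), batch * dataloader.batch_size + len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")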


def test_loop(dataloader, model, loss_fn):
    # Set the model to evaluation mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.eval()
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    # Evaluating the model with torch.no_grad() ensures that no gradients are computed during test mode
    # also serves to reduce unnecessary gradient computations and memory usage for tensors with requires_grad=True
    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

In [6]:
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)

epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.304932  [   32/60000]
loss: 1.896042  [ 3232/60000]
loss: 2.089072  [ 6432/60000]
loss: 1.877084  [ 9632/60000]
loss: 1.762566  [12832/60000]
loss: 1.632664  [16032/60000]
loss: 1.704073  [19232/60000]
loss: 1.753190  [22432/60000]
loss: 1.852072  [25632/60000]
loss: 1.625864  [28832/60000]
loss: 1.623577  [32032/60000]
loss: 1.636896  [35232/60000]
loss: 1.689724  [38432/60000]
loss: 1.597557  [41632/60000]
loss: 1.639373  [44832/60000]
loss: 1.581354  [48032/60000]
loss: 1.562089  [51232/60000]
loss: 1.594318  [54432/60000]
loss: 1.553415  [57632/60000]
Test Error: 
 Accuracy: 93.1%, Avg loss: 1.532177 

Epoch 2
-------------------------------
loss: 1.519638  [   32/60000]
loss: 1.627392  [ 3232/60000]
loss: 1.531672  [ 6432/60000]
loss: 1.642433  [ 9632/60000]
loss: 1.581258  [12832/60000]
loss: 1.466676  [16032/60000]
loss: 1.540027  [19232/60000]
loss: 1.546480  [22432/60000]
loss: 1.491116  [25632/60000]
loss: 1.526742  [28832/60000
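
A note on the loss values logged above. The model ends with `nn.Softmax`, while `nn.CrossEntropyLoss` applies log-softmax to whatever it receives, so the softmax is effectively applied twice. Because the model's outputs are confined to [0, 1], the loss cannot fall below roughly 1.46 for 10 classes, which is consistent with the level the log settles around. The pure-Python sketch below reproduces that bound; `cross_entropy` is an illustrative helper written for this note, not a PyTorch function.

```python
import math


def cross_entropy(scores, target):
    """Cross-entropy of one sample, treating `scores` as raw logits
    (log-softmax followed by negative log-likelihood)."""
    log_sum_exp = math.log(sum(math.exp(s) for s in scores))
    return log_sum_exp - scores[target]


num_classes = 10

# Untrained model: softmax output is close to uniform, 0.1 per class.
uniform = [1.0 / num_classes] * num_classes
start_loss = cross_entropy(uniform, target=0)

# Best case: softmax output is exactly one-hot on the correct class.
perfect = [1.0] + [0.0] * (num_classes - 1)
floor_loss = cross_entropy(perfect, target=0)

print(f"loss for a uniform softmax output: {start_loss:.4f}")  # about 2.3026 = ln(10)
print(f"lowest reachable loss:             {floor_loss:.4f}")  # about 1.4612 = ln(1 + 9/e)
```

Returning the raw output of the dense layer (the logits) from `forward` would remove this floor. The predictions made with `argmax` in `test_loop` would stay the same, because softmax does not change which class scores highest.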

### Task 1
Extend the model from the example above with another recurrent layer before the dense output layer.

Standard neural networks produce one output from one input.
Recurrent networks, in contrast, process data sequentially, combining at each step the result of the previous step with the current input. This is why a recurrent layer takes a 3-dimensional tensor as input; with `batch_first=True`, as used here, its shape is [batch_size, sequence_size, sample_size].
By default, recurrent layers in PyTorch return the sequence of outputs from all processing steps of the layer. If you want only the output of the last step, you have to select it yourself, e.g. `x = lstm_out[:, -1, :]`.
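
To make these shapes concrete, here is a minimal sketch. It assumes a random tensor in place of a real MNIST batch (to avoid downloading anything), and the variable names are illustrative. It also checks that, for a single-layer, unidirectional LSTM, the last time step of the output sequence is the same tensor as the final hidden state returned by the layer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for one MNIST batch: 32 images of shape (1, 28, 28).
# Dropping the channel axis gives 28 time steps (rows) of 28 features (pixels).
batch = torch.rand(32, 1, 28, 28).squeeze(1)

lstm = nn.LSTM(input_size=28, hidden_size=128, batch_first=True)
lstm_out, (h_n, c_n) = lstm(batch)

# Output for every time step vs. output of the last time step only.
last_step = lstm_out[:, -1, :]

print(batch.shape)      # torch.Size([32, 28, 28])
print(lstm_out.shape)   # torch.Size([32, 28, 128])
print(last_step.shape)  # torch.Size([32, 128])
print(h_n.shape)        # torch.Size([1, 32, 128])

# For one unidirectional layer the two ways of getting the last output agree.
same = torch.allclose(last_step, h_n[-1])
print(same)
```

When stacking two recurrent layers, the first one has to pass on the full sequence (`lstm_out`), and only the last layer's output is reduced to its final step.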


In [7]:
class RecurrentModel2(nn.Module):
    def __init__(self, num_classes=10):
        super(RecurrentModel2, self).__init__()
        self.num_classes = num_classes
        self.lstm1 = nn.LSTM(input_size=28, hidden_size=128, batch_first=True)
        self.relu = nn.ReLU()
        self.lstm2 = nn.LSTM(input_size=128, hidden_size=128, batch_first=True)
        self.dense = nn.Linear(in_features=128, out_features=num_classes)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, inputs):
        if inputs.dim() == 4:
            # Example: (batch_size, channels, sequence_length, features)
            inputs = inputs.squeeze(1)  # Remove the channels dimension if it's 1
        elif inputs.dim() != 3:
            raise ValueError(f"Expected input to be 3D, got {inputs.dim()}D instead.")

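        # First LSTM layer: keep the full output sequence, since the next LSTM consumes it step by step
        lstm1_out, _ = self.lstm1(inputs)
        x = self.relu(lstm1_out)
        # Second LSTM layer: only its last time step feeds the classifier
        lstm2_out, _ = self.lstm2(x)
        x = lstm2_out[:, -1, :]
        x = self.relu(x)
        out = self.dense(x)
        out = self.softmax(out)

        return out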

model = RecurrentModel2(num_classes=10)
model

RecurrentModel2(
  (lstm1): LSTM(28, 128, batch_first=True)
  (relu): ReLU()
  (lstm2): LSTM(128, 128, batch_first=True)
  (dense): Linear(in_features=128, out_features=10, bias=True)
  (softmax): Softmax(dim=1)
)

In [8]:
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)

epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.301903  [   32/60000]
loss: 2.141454  [ 3232/60000]
loss: 1.767656  [ 6432/60000]
loss: 1.733031  [ 9632/60000]
loss: 1.748597  [12832/60000]
loss: 1.753097  [16032/60000]
loss: 1.683070  [19232/60000]
loss: 1.606607  [22432/60000]
loss: 1.686070  [25632/60000]
loss: 1.560992  [28832/60000]
loss: 1.533781  [32032/60000]
loss: 1.641106  [35232/60000]
loss: 1.576625  [38432/60000]
loss: 1.631911  [41632/60000]
loss: 1.580217  [44832/60000]
loss: 1.525512  [48032/60000]
loss: 1.514617  [51232/60000]
loss: 1.496857  [54432/60000]
loss: 1.563026  [57632/60000]
Test Error: 
 Accuracy: 90.4%, Avg loss: 1.558206 

Epoch 2
-------------------------------
loss: 1.538977  [   32/60000]
loss: 1.597736  [ 3232/60000]
loss: 1.506887  [ 6432/60000]
loss: 1.526548  [ 9632/60000]
loss: 1.497256  [12832/60000]
loss: 1.495361  [16032/60000]
loss: 1.587017  [19232/60000]
loss: 1.493157  [22432/60000]
loss: 1.703901  [25632/60000]
loss: 1.461652  [28832/60000

### Task 2
Using the model from the example, write a recurrent network based on RNNCell.

RNNCell implements only the operations that a recurrent layer performs
for a single step. At each step, recurrent layers combine the result of
the previous step with the current input.
Use a for loop to call the RNNCell repeatedly (the number of steps equals the number of elements in the sequence).

Calling an initialized recurrent cell requires passing the current input and the hidden state from the previous step (RNNCell has a single state, passed as a tensor).

The hidden state of the layer has to be initialized with starting values (random values can be used - torch.rand).
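
As a hedged illustration of the relationship described above, the sketch below copies the weights of an `nn.RNN` layer into an `nn.RNNCell` and checks that calling the cell in a for loop reproduces the layer's output. The sizes, the random input and the variable names are assumptions made for this example, and the hidden state starts from zeros, which is also what `nn.RNN` uses when no initial state is passed.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

input_size, hidden_size = 28, 16
rnn = nn.RNN(input_size=input_size, hidden_size=hidden_size, batch_first=True)
cell = nn.RNNCell(input_size=input_size, hidden_size=hidden_size)

# Give the cell exactly the same parameters as the full layer.
with torch.no_grad():
    cell.weight_ih.copy_(rnn.weight_ih_l0)
    cell.weight_hh.copy_(rnn.weight_hh_l0)
    cell.bias_ih.copy_(rnn.bias_ih_l0)
    cell.bias_hh.copy_(rnn.bias_hh_l0)

x = torch.rand(4, 28, input_size)   # (batch, sequence, features)
h = torch.zeros(4, hidden_size)     # initial hidden state

steps = []
with torch.no_grad():
    for t in range(x.size(1)):
        # One step: combine the current input with the previous hidden state.
        h = cell(x[:, t, :], h)
        steps.append(h)
    manual_out = torch.stack(steps, dim=1)   # (batch, sequence, hidden)
    layer_out, h_n = rnn(x)

matches = torch.allclose(manual_out, layer_out, atol=1e-5)
print(manual_out.shape)  # torch.Size([4, 28, 16])
print(matches)
```

The loop makes explicit what the layer does internally: the hidden state returned at step `t` becomes the second argument of the cell at step `t + 1`.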

In [9]:
class RecurrentModel3(nn.Module):
    def __init__(self, input_size=28, hidden_size=128, num_classes=10):
        super(RecurrentModel3, self).__init__()
        self.hidden_size = hidden_size
        self.num_classes = num_classes
        self.rnnCell = nn.RNNCell(input_size=input_size, hidden_size=hidden_size)
        self.dense = nn.Linear(in_features=hidden_size, out_features=num_classes)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, inputs):
        if inputs.dim() == 4:
            # Example: (batch_size, channels, sequence_length, features)
            inputs = inputs.squeeze(1)  # Remove the channels dimension if it's 1
        elif inputs.dim() != 3:
            raise ValueError(f"Expected input to be 3D, got {inputs.dim()}D instead.")

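        batch_size, sequence_length, features = inputs.size()
        # Initial hidden state: zeros, created directly on the same device as the input
        hidden_state = torch.zeros(batch_size, self.hidden_size, device=inputs.device)
        # Feed the sequence one time step (image row) at a time, carrying the hidden state forward
        for t in range(sequence_length):
            input_t = inputs[:, t, :]
            hidden_state = self.rnnCell(input_t, hidden_state)
        # Classify from the hidden state after the last step
        out = self.dense(hidden_state)
        out = self.softmax(out)
        return out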

model = RecurrentModel3(input_size=28, hidden_size=128, num_classes=10)


In [10]:
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)

epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.303452  [   32/60000]
loss: 2.322311  [ 3232/60000]
loss: 2.197619  [ 6432/60000]
loss: 2.147569  [ 9632/60000]
loss: 2.077339  [12832/60000]
loss: 2.183924  [16032/60000]
loss: 2.231068  [19232/60000]
loss: 1.962229  [22432/60000]
loss: 2.066641  [25632/60000]
loss: 2.072934  [28832/60000]
loss: 1.846014  [32032/60000]
loss: 1.976134  [35232/60000]
loss: 2.195160  [38432/60000]
loss: 1.955630  [41632/60000]
loss: 1.879447  [44832/60000]
loss: 1.852863  [48032/60000]
loss: 1.834392  [51232/60000]
loss: 1.888616  [54432/60000]
loss: 1.896469  [57632/60000]
Test Error: 
 Accuracy: 67.8%, Avg loss: 1.794687 

Epoch 2
-------------------------------
loss: 1.858550  [   32/60000]
loss: 1.907042  [ 3232/60000]
loss: 1.783405  [ 6432/60000]
loss: 1.925390  [ 9632/60000]
loss: 1.669199  [12832/60000]
loss: 1.876804  [16032/60000]
loss: 1.813879  [19232/60000]
loss: 1.793387  [22432/60000]
loss: 1.710211  [25632/60000]
loss: 1.740881  [28832/60000

### Task 3
Replace the recurrent cell from the previous task with LSTMCell.

In [11]:
class RecurrentModel4(nn.Module):
    def __init__(self, input_size=28, hidden_size=128, num_classes=10):
        super(RecurrentModel4, self).__init__()
        self.hidden_size = hidden_size
        self.num_classes = num_classes
        self.LSTMcell = nn.LSTMCell(input_size=input_size, hidden_size=hidden_size)
        self.dense = nn.Linear(in_features=hidden_size, out_features=num_classes)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, inputs):
        if inputs.dim() == 4:
            # Example: (batch_size, channels, sequence_length, features)
            inputs = inputs.squeeze(1)  # Remove the channels dimension if it's 1
        elif inputs.dim() != 3:
            raise ValueError(f"Expected input to be 3D, got {inputs.dim()}D instead.")

        batch_size, sequence_length, features = inputs.size()
        # An LSTM cell carries two states: the hidden state and the cell state
        hidden_state = torch.zeros(batch_size, self.hidden_size, device=inputs.device)
        cell_state = torch.zeros(batch_size, self.hidden_size, device=inputs.device)
        for t in range(sequence_length):
            input_t = inputs[:, t, :]
            # LSTMCell takes the states as a tuple (h, c) and returns the updated pair
            hidden_state, cell_state = self.LSTMcell(input_t, (hidden_state, cell_state))
        out = self.dense(hidden_state)
        out = self.softmax(out)
        return out

model = RecurrentModel4(input_size=28, hidden_size=128, num_classes=10)
model

RecurrentModel4(
  (LSTMcell): LSTMCell(28, 128)
  (dense): Linear(in_features=128, out_features=10, bias=True)
  (softmax): Softmax(dim=1)
)

In [12]:
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)

epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.302838  [   32/60000]
loss: 2.058228  [ 3232/60000]
loss: 2.024737  [ 6432/60000]
loss: 1.752483  [ 9632/60000]
loss: 1.808285  [12832/60000]
loss: 1.799852  [16032/60000]
loss: 1.911342  [19232/60000]
loss: 1.803631  [22432/60000]
loss: 1.687700  [25632/60000]
loss: 1.601555  [28832/60000]
loss: 1.585752  [32032/60000]
loss: 1.610985  [35232/60000]
loss: 1.564435  [38432/60000]
loss: 1.526325  [41632/60000]
loss: 1.579266  [44832/60000]
loss: 1.584216  [48032/60000]
loss: 1.647131  [51232/60000]
loss: 1.633472  [54432/60000]
loss: 1.585946  [57632/60000]
Test Error: 
 Accuracy: 90.1%, Avg loss: 1.564147 

Epoch 2
-------------------------------
loss: 1.582101  [   32/60000]
loss: 1.588389  [ 3232/60000]
loss: 1.579099  [ 6432/60000]
loss: 1.584901  [ 9632/60000]
loss: 1.495583  [12832/60000]
loss: 1.504992  [16032/60000]
loss: 1.622121  [19232/60000]
loss: 1.527628  [22432/60000]
loss: 1.532826  [25632/60000]
loss: 1.622876  [28832/60000

### Task 4
Using the model from the previous task, create a neural network
model with your own implementation of a simple recurrent layer.
- in `forward`, replace the call to the LSTM cell, e.g. self.lstm_cell_layer(x), with a call to your own method, e.g. self.cell(x),
- in the model constructor, remove the initialization of the LSTM cell and replace it with the initialization of the layers needed to build your own recurrent cell,
- create a cell() method that performs the operations of the recurrent layer,
- a simple recurrent layer concatenates the previous result with the current input and then passes the combined tensor through a dense layer (`nn.Linear` in PyTorch).
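
Written as a formula, the step described in the last bullet can be sketched as follows. The $\tanh$ activation is an assumption taken from the solution below, and the symbols $W$, $W_x$, $W_h$ and $b$ are introduced here only for illustration:

$$
h_t = \tanh\left(W\,[x_t;\, h_{t-1}] + b\right) = \tanh\left(W_x x_t + W_h h_{t-1} + b\right), \qquad W = [\,W_x \;\; W_h\,]
$$

With the sizes used in this notebook, $x_t \in \mathbb{R}^{28}$, $h_{t-1} \in \mathbb{R}^{128}$ and $W \in \mathbb{R}^{128 \times 156}$. Splitting $W$ column-wise shows that one dense layer on the concatenated vector is equivalent to separate input-to-hidden and hidden-to-hidden weight matrices, which is how `nn.RNNCell` stores its parameters (there with two bias vectors that add up to a single $b$).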

In [13]:
class RecurrentModel5(nn.Module):
    def __init__(self, input_size=28, hidden_size=128, num_classes=10):
        super(RecurrentModel5, self).__init__()
        self.hidden_size = hidden_size
        self.num_classes = num_classes
        self.denseRNN = nn.Linear(in_features=input_size+hidden_size, out_features=hidden_size)
        self.tanh = nn.Tanh()
        self.dense = nn.Linear(in_features=hidden_size, out_features=num_classes)
        self.softmax = nn.Softmax(dim=1)

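    def cell(self, x, h):
        # One recurrent step: concatenate the current input with the previous hidden state,
        # pass the result through the dense layer, and squash it with tanh
        combined = torch.cat((x, h), dim=1)
        new_hidden = self.denseRNN(combined)
        return self.tanh(new_hidden)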

    def forward(self, inputs):
        if inputs.dim() == 4:
            # Example: (batch_size, channels, sequence_length, features)
            inputs = inputs.squeeze(1)  # Remove the channels dimension if it's 1
        elif inputs.dim() != 3:
            raise ValueError(f"Expected input to be 3D, got {inputs.dim()}D instead.")

        batch_size, sequence_length, features = inputs.size()
        hidden_state = torch.zeros(batch_size, self.hidden_size, device=inputs.device)
        for t in range(sequence_length):
            input_t = inputs[:, t, :]
            hidden_state = self.cell(input_t, hidden_state)
        out = self.dense(hidden_state)
        out = self.softmax(out)

        return out

model = RecurrentModel5(input_size=28, hidden_size=128, num_classes=10)
model

RecurrentModel5(
  (denseRNN): Linear(in_features=156, out_features=128, bias=True)
  (tanh): Tanh()
  (dense): Linear(in_features=128, out_features=10, bias=True)
  (softmax): Softmax(dim=1)
)

In [14]:
epochs = 5

loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)

for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.304146  [   32/60000]
loss: 2.319217  [ 3232/60000]
loss: 2.163495  [ 6432/60000]
loss: 2.123345  [ 9632/60000]
loss: 2.363585  [12832/60000]
loss: 2.241209  [16032/60000]
loss: 1.876307  [19232/60000]
loss: 1.942649  [22432/60000]
loss: 2.064069  [25632/60000]
loss: 1.777883  [28832/60000]
loss: 1.900605  [32032/60000]
loss: 1.866296  [35232/60000]
loss: 1.959191  [38432/60000]
loss: 1.941128  [41632/60000]
loss: 1.787942  [44832/60000]
loss: 1.752817  [48032/60000]
loss: 1.860800  [51232/60000]
loss: 1.884494  [54432/60000]
loss: 1.719516  [57632/60000]
Test Error: 
 Accuracy: 70.7%, Avg loss: 1.759172 

Epoch 2
-------------------------------
loss: 1.785138  [   32/60000]
loss: 1.867904  [ 3232/60000]
loss: 1.742783  [ 6432/60000]
loss: 1.811731  [ 9632/60000]
loss: 1.676606  [12832/60000]
loss: 1.848999  [16032/60000]
loss: 1.826753  [19232/60000]
loss: 1.670986  [22432/60000]
loss: 1.914139  [25632/60000]
loss: 1.731224  [28832/60000

### Task 5

Based on the model from the previous task, create a model with your own implementation of an LSTM layer. A thorough and accessible description of how an LSTM layer works can be found on [this page](http://colah.github.io/posts/2015-08-Understanding-LSTMs/).
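
For reference, one step of a standard LSTM cell can be summarized by the equations below. The notation is an assumption chosen to match the concatenation order used in this notebook, $[x_t; h_{t-1}]$; the linked article writes the same concatenation as $[h_{t-1}, x_t]$, which only reorders the columns of the weight matrices.

$$
\begin{aligned}
f_t &= \sigma\left(W_f\,[x_t;\, h_{t-1}] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i\,[x_t;\, h_{t-1}] + b_i\right) && \text{(input gate)} \\
\tilde{c}_t &= \tanh\left(W_c\,[x_t;\, h_{t-1}] + b_c\right) && \text{(candidate cell state)} \\
o_t &= \sigma\left(W_o\,[x_t;\, h_{t-1}] + b_o\right) && \text{(output gate)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

Each of the four gates has its own weights and bias, so a hand-written cell needs four separate dense layers on the concatenated input.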

In [15]:
class RecurrentModel6(nn.Module):
    def __init__(self, input_size=28, hidden_size=128, num_classes=10):
        super(RecurrentModel6, self).__init__()
        self.hidden_size = hidden_size
        self.num_classes = num_classes
        self.dense_forget = nn.Linear(in_features=input_size+hidden_size, out_features=hidden_size)
        self.dense_input = nn.Linear(in_features=input_size+hidden_size, out_features=hidden_size)
        self.dense_cell = nn.Linear(in_features=input_size+hidden_size, out_features=hidden_size)
        self.dense_output = nn.Linear(in_features=input_size+hidden_size, out_features=hidden_size)
        self.sigmoid = nn.Sigmoid()
        self.tanh = nn.Tanh()
        self.dense = nn.Linear(in_features=hidden_size, out_features=num_classes)
        self.softmax = nn.Softmax(dim=1)

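    def cell(self, x, h, c):
        # One LSTM step: all four gates read the concatenated [input, previous hidden state]
        combined = torch.cat((x, h), dim=1)
        forget_gate = self.sigmoid(self.dense_forget(combined))
        input_gate = self.sigmoid(self.dense_input(combined))
        candidate = self.tanh(self.dense_cell(combined))
        # The output gate uses its own layer (dense_output), not the forget gate's
        output_gate = self.sigmoid(self.dense_output(combined))

        # New cell state: keep part of the old state, add part of the candidate
        new_c = forget_gate * c + input_gate * candidate
        new_h = output_gate * self.tanh(new_c)

        return new_h, new_c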

    def forward(self, inputs):
        if inputs.dim() == 4:
            # Example: (batch_size, channels, sequence_length, features)
            inputs = inputs.squeeze(1)  # Remove the channels dimension if it's 1
        elif inputs.dim() != 3:
            raise ValueError(f"Expected input to be 3D, got {inputs.dim()}D instead.")

        batch_size, sequence_length, features = inputs.size()
        hidden_state = torch.zeros(batch_size, self.hidden_size, device=inputs.device)
        cell_state = torch.zeros(batch_size, self.hidden_size, device=inputs.device)
        for t in range(sequence_length):
            input_t = inputs[:, t, :]
            hidden_state, cell_state = self.cell(input_t, hidden_state, cell_state)
        out = self.dense(hidden_state)
        out = self.softmax(out)

        return out

model = RecurrentModel6(input_size=28, hidden_size=128, num_classes=10)
model

RecurrentModel6(
  (dense_forget): Linear(in_features=156, out_features=128, bias=True)
  (dense_input): Linear(in_features=156, out_features=128, bias=True)
  (dense_cell): Linear(in_features=156, out_features=128, bias=True)
  (dense_output): Linear(in_features=156, out_features=128, bias=True)
  (sigmoid): Sigmoid()
  (tanh): Tanh()
  (dense): Linear(in_features=128, out_features=10, bias=True)
  (softmax): Softmax(dim=1)
)

In [16]:
epochs = 5
learning_rate = 0.001
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)

for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.301377  [   32/60000]
loss: 2.128654  [ 3232/60000]
loss: 1.822023  [ 6432/60000]
loss: 1.907557  [ 9632/60000]
loss: 1.669791  [12832/60000]
loss: 1.715496  [16032/60000]
loss: 1.748798  [19232/60000]
loss: 1.639746  [22432/60000]
loss: 1.660211  [25632/60000]
loss: 1.702534  [28832/60000]
loss: 1.548938  [32032/60000]
loss: 1.636398  [35232/60000]
loss: 1.660517  [38432/60000]
loss: 1.523199  [41632/60000]
loss: 1.640519  [44832/60000]
loss: 1.561658  [48032/60000]
loss: 1.627221  [51232/60000]
loss: 1.538973  [54432/60000]
loss: 1.491424  [57632/60000]
Test Error: 
 Accuracy: 91.9%, Avg loss: 1.544109 

Epoch 2
-------------------------------
loss: 1.623171  [   32/60000]
loss: 1.503903  [ 3232/60000]
loss: 1.583715  [ 6432/60000]
loss: 1.621796  [ 9632/60000]
loss: 1.474361  [12832/60000]
loss: 1.561415  [16032/60000]
loss: 1.492489  [19232/60000]
loss: 1.525066  [22432/60000]
loss: 1.553975  [25632/60000]
loss: 1.528589  [28832/60000