# Introduction to Neural Networks and Machine Learning - Recurrent Networks


---

**Instructors:** Piotr Baryczkowski, Jakub Bednarek<br>
**Contact:** piotr.baryczkowski@put.poznan.pl<br>

---

Tasks completed by \\
Wojciech Kot 151879 \\
Julia Samp 151775 \\

---

## Lab objectives:
- become familiar with recurrent neural networks,
- build a model with recurrent layers for the MNIST dataset,
- write custom implementations of neural network layers.

In [1]:
import numpy as np
import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

PyTorch version: 2.5.1+cu121
CUDA available: False


In [2]:
import torch.nn as nn
import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

In [3]:
training_data = datasets.MNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.MNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

train_dataloader = DataLoader(training_data, batch_size=32, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=32, shuffle=True)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz to data/MNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 9.91M/9.91M [00:00<00:00, 52.5MB/s]


Extracting data/MNIST/raw/train-images-idx3-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz to data/MNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 28.9k/28.9k [00:00<00:00, 2.01MB/s]

Extracting data/MNIST/raw/train-labels-idx1-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz to data/MNIST/raw/t10k-images-idx3-ubyte.gz


100%|██████████| 1.65M/1.65M [00:00<00:00, 14.5MB/s]


Extracting data/MNIST/raw/t10k-images-idx3-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz to data/MNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 4.54k/4.54k [00:00<00:00, 3.60MB/s]

Extracting data/MNIST/raw/t10k-labels-idx1-ubyte.gz to data/MNIST/raw






## Recurrent networks
http://colah.github.io/posts/2015-08-Understanding-LSTMs/

https://pytorch.org/docs/stable/generated/torch.nn.RNN.html

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/

An example model with a recurrent layer for the MNIST data:

In [4]:
class RecurrentModel(nn.Module):
    def __init__(self, num_classes=10):
        super(RecurrentModel, self).__init__()
        self.num_classes = num_classes
        # Define your layers here.
        self.lstm_1 = nn.LSTM(input_size=28, hidden_size=128, batch_first=True)
        self.relu_1 = nn.ReLU()
        self.dense_1 = nn.Linear(128, num_classes)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, inputs):
        if inputs.dim() == 4:
            # Example: (batch_size, channels, sequence_length, features)
            inputs = inputs.squeeze(1)  # Remove the channels dimension if it's 1
        elif inputs.dim() != 3:
            raise ValueError(f"Expected input to be 3D, got {inputs.dim()}D instead.")

        lstm_out, _ = self.lstm_1(inputs)
        # Take the last output from the sequence (assume inputs are padded appropriately or have consistent lengths)
        x = lstm_out[:, -1, :]  # Get the output of the last time step
        x = self.relu_1(x)
        x = self.dense_1(x)
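        # Note: nn.CrossEntropyLoss (used for training below) applies log-softmax itself and
        # expects raw logits, so passing probabilities keeps the loss from dropping below ~1.46.
        return self.softmax(x)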

model = RecurrentModel(num_classes=10)
model

RecurrentModel(
  (lstm_1): LSTM(28, 128, batch_first=True)
  (relu_1): ReLU()
  (dense_1): Linear(in_features=128, out_features=10, bias=True)
  (softmax): Softmax(dim=1)
)

In [5]:
learning_rate = 1e-3
batch_size = 32
epochs = 5

def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    # Set the model to training mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * batch_size + len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")


def test_loop(dataloader, model, loss_fn):
    # Set the model to evaluation mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.eval()
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    # Evaluating the model with torch.no_grad() ensures that no gradients are computed during test mode
    # also serves to reduce unnecessary gradient computations and memory usage for tensors with requires_grad=True
    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

In [6]:
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)

epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.302975  [   32/60000]
loss: 2.174590  [ 3232/60000]
loss: 1.936720  [ 6432/60000]
loss: 1.930533  [ 9632/60000]
loss: 1.624184  [12832/60000]
loss: 1.725207  [16032/60000]
loss: 1.613878  [19232/60000]
loss: 1.673230  [22432/60000]
loss: 1.705237  [25632/60000]
loss: 1.677566  [28832/60000]
loss: 1.567096  [32032/60000]
loss: 1.705653  [35232/60000]
loss: 1.812220  [38432/60000]
loss: 1.567553  [41632/60000]
loss: 1.550892  [44832/60000]
loss: 1.571253  [48032/60000]
loss: 1.505547  [51232/60000]
loss: 1.604810  [54432/60000]
loss: 1.539813  [57632/60000]
Test Error: 
 Accuracy: 92.5%, Avg loss: 1.539122 

Epoch 2
-------------------------------
loss: 1.528945  [   32/60000]
loss: 1.549627  [ 3232/60000]
loss: 1.549197  [ 6432/60000]
loss: 1.556629  [ 9632/60000]
loss: 1.600389  [12832/60000]
loss: 1.572724  [16032/60000]
loss: 1.493337  [19232/60000]
loss: 1.523466  [22432/60000]
loss: 1.553421  [25632/60000]
loss: 1.576199  [28832/60000
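
The losses logged above settle around 1.5 even though test accuracy is already above 90%. A likely cause is that the model returns `Softmax` probabilities while `nn.CrossEntropyLoss` expects raw logits and applies log-softmax internally. The sketch below is an illustration with hand-made tensors (the values are assumptions, not outputs of the model above). It compares the loss on confident logits with the loss on a perfect probability vector.

```python
import math

import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()
target = torch.tensor([3])  # true class index for a single sample

# Raw logits that strongly favour the correct class.
logits = torch.full((1, 10), -10.0)
logits[0, 3] = 10.0

# The best probability vector a Softmax layer could ever produce.
probs = torch.zeros(1, 10)
probs[0, 3] = 1.0

loss_on_logits = loss_fn(logits, target).item()
loss_on_probs = loss_fn(probs, target).item()

# Smallest loss reachable when probabilities are passed in: log(e + 9) - 1.
floor = math.log(math.e + 9) - 1

print(f"loss on logits:        {loss_on_logits:.4f}")
print(f"loss on probabilities: {loss_on_probs:.4f} (floor {floor:.4f})")
```

With 10 classes the second value cannot go below roughly 1.46, which matches the plateau in the training logs. Returning the output of the final `nn.Linear` layer directly would remove that floor; `argmax` predictions are unaffected either way.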

### Task 1
Extend the model from the example above with another recurrent layer before the dense output layer.

Standard neural networks produce one output from one input.
Recurrent networks, by contrast, process data sequentially, combining at each step the result of the previous step with the current input. This is why a recurrent layer takes a 3-dimensional tensor as input; with `batch_first=True`, as in the models here, its shape is `[batch_size, sequence_size, sample_size]` (without that flag PyTorch expects the sequence dimension first).
By default, recurrent layers in PyTorch return the sequence of outputs from all processing steps. If you only want the result of the last step, you have to select it yourself, e.g. `x = lstm_out[:, -1, :]`.
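
The shapes described above can be checked on a random tensor. This is a minimal sketch assuming a fake batch of 32 MNIST-sized images (the `images` tensor is made up for illustration); each image row is treated as one time step.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A fake MNIST batch: 32 images, 1 channel, 28 rows, 28 columns.
images = torch.rand(32, 1, 28, 28)

# Drop the channel dimension: (batch, sequence, features).
sequences = images.squeeze(1)

lstm = nn.LSTM(input_size=28, hidden_size=128, batch_first=True)
outputs, (h_n, c_n) = lstm(sequences)

# Keep only the output of the final time step.
last_step = outputs[:, -1, :]

print(sequences.shape)  # torch.Size([32, 28, 28])
print(outputs.shape)    # torch.Size([32, 28, 128])
print(last_step.shape)  # torch.Size([32, 128])
print(h_n.shape)        # torch.Size([1, 32, 128])
```

For a single-layer, unidirectional LSTM the last-step output equals the final hidden state `h_n[-1]`, so either can be passed to the dense layer.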


In [7]:
class RecurrentModel2(nn.Module):
    def __init__(self, num_classes=10):
        super(RecurrentModel2, self).__init__()
        self.num_classes = num_classes
        # Define your layers here.
        self.lstm_1 = nn.LSTM(input_size=28, hidden_size=128, batch_first=True)
        self.relu_1 = nn.ReLU()
        self.lstm_2 = nn.LSTM(input_size=128, hidden_size=128, batch_first=True)
        self.relu_2 = nn.ReLU()
        self.dense_1 = nn.Linear(128, num_classes)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, inputs):
        if inputs.dim() == 4:
            # Example: (batch_size, channels, sequence_length, features)
            inputs = inputs.squeeze(1)  # Remove the channels dimension if it's 1
        elif inputs.dim() != 3:
            raise ValueError(f"Expected input to be 3D, got {inputs.dim()}D instead.")

        # Stack the two LSTM layers: the second one consumes the full output sequence of the first
        lstm_out_1, _ = self.lstm_1(inputs)
        lstm_out_2, _ = self.lstm_2(lstm_out_1)
        x = lstm_out_2[:, -1, :]  # Get the output of the last time step
        x = self.relu_1(x)
        x = self.dense_1(x)
        return self.softmax(x)

model = RecurrentModel2(num_classes=10)
model

RecurrentModel2(
  (lstm_1): LSTM(28, 128, batch_first=True)
  (relu_1): ReLU()
  (lstm_2): LSTM(128, 128, batch_first=True)
  (relu_2): ReLU()
  (dense_1): Linear(in_features=128, out_features=10, bias=True)
  (softmax): Softmax(dim=1)
)

In [8]:
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)

epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.302902  [   32/60000]
loss: 1.948869  [ 3232/60000]
loss: 1.778781  [ 6432/60000]
loss: 1.735029  [ 9632/60000]
loss: 1.681669  [12832/60000]
loss: 1.619261  [16032/60000]
loss: 1.733459  [19232/60000]
loss: 1.657458  [22432/60000]
loss: 1.647344  [25632/60000]
loss: 1.555269  [28832/60000]
loss: 1.509418  [32032/60000]
loss: 1.603885  [35232/60000]
loss: 1.540100  [38432/60000]
loss: 1.625782  [41632/60000]
loss: 1.525551  [44832/60000]
loss: 1.577795  [48032/60000]
loss: 1.514037  [51232/60000]
loss: 1.561574  [54432/60000]
loss: 1.508332  [57632/60000]
Test Error: 
 Accuracy: 94.3%, Avg loss: 1.519068 

Epoch 2
-------------------------------
loss: 1.536653  [   32/60000]
loss: 1.462890  [ 3232/60000]
loss: 1.493048  [ 6432/60000]
loss: 1.498760  [ 9632/60000]
loss: 1.493448  [12832/60000]
loss: 1.534859  [16032/60000]
loss: 1.585794  [19232/60000]
loss: 1.501599  [22432/60000]
loss: 1.494671  [25632/60000]
loss: 1.492572  [28832/60000

### Task 2
Using the model from the example, write a recurrent network with RNNCell.

RNNCell implements only the operations that a recurrent layer performs
for a single step. At each step, recurrent layers
combine the result of the previous step with the current input.
Use a for loop to call the RNNCell repeatedly (the number of steps equals the number of elements in the sequence).

Calling an initialized recurrent cell requires the current input and the hidden state from the previous step (RNNCell has a single state tensor).

The layer's hidden state has to be initialized with starting values (random values can be used, e.g. torch.rand).
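
To illustrate the step-by-step loop, the sketch below runs an `nn.RNNCell` over a sequence and compares the result with a full `nn.RNN` layer that shares the same weights. The sizes are arbitrary, and the initial state is set to zeros (an assumption made here so that both paths start from the same state; the task text suggests random values).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

batch_size, seq_len, input_size, hidden_size = 4, 28, 28, 16

rnn = nn.RNN(input_size, hidden_size, batch_first=True)
cell = nn.RNNCell(input_size, hidden_size)

# Copy the layer's weights into the cell so both compute the same function.
with torch.no_grad():
    cell.weight_ih.copy_(rnn.weight_ih_l0)
    cell.weight_hh.copy_(rnn.weight_hh_l0)
    cell.bias_ih.copy_(rnn.bias_ih_l0)
    cell.bias_hh.copy_(rnn.bias_hh_l0)

x = torch.rand(batch_size, seq_len, input_size)
h0 = torch.zeros(batch_size, hidden_size)

# Manual unrolling: one cell call per element of the sequence.
h = h0
for t in range(seq_len):
    h = cell(x[:, t, :], h)

# The full layer expects the initial state as (num_layers, batch, hidden).
_, h_n = rnn(x, h0.unsqueeze(0))

max_diff = (h - h_n[0]).abs().max().item()
print(f"max difference between loop and nn.RNN: {max_diff:.2e}")
```

The difference should be at the level of floating-point noise, which shows that the layer is the cell applied once per time step.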

In [9]:
import torch
import torch.nn as nn

class RecurrentModel3(nn.Module):
    def __init__(self, input_size=28, hidden_size=128, num_classes=10):
        super(RecurrentModel3, self).__init__()
        self.hidden_size = hidden_size
        self.num_classes = num_classes

        # Define the RNN cell
        self.rnn_cell = nn.RNNCell(input_size=input_size, hidden_size=hidden_size)
        self.dense = nn.Linear(hidden_size, num_classes)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, inputs):
        if inputs.dim() == 4:
            # Example: (batch_size, channels, sequence_length, features)
            inputs = inputs.squeeze(1)  # Remove the channels dimension if it's 1
        elif inputs.dim() != 3:
            raise ValueError(f"Expected input to be 3D, got {inputs.dim()}D instead.")

        batch_size, sequence_length, input_size = inputs.size()
        # Initialize the hidden state with random values, then unroll the cell over the sequence
        hidden_state = torch.rand(batch_size, self.hidden_size, device=inputs.device)

        for t in range(sequence_length):
            current_input = inputs[:, t, :]
            hidden_state = self.rnn_cell(current_input, hidden_state)

        output = self.dense(hidden_state)
        return self.softmax(output)

model = RecurrentModel3(input_size=28, hidden_size=128, num_classes=10)
model

RecurrentModel3(
  (rnn_cell): RNNCell(28, 128)
  (dense): Linear(in_features=128, out_features=10, bias=True)
  (softmax): Softmax(dim=1)
)

In [10]:
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)

epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.303405  [   32/60000]
loss: 2.042854  [ 3232/60000]
loss: 2.041989  [ 6432/60000]
loss: 1.976282  [ 9632/60000]
loss: 1.920557  [12832/60000]
loss: 1.778276  [16032/60000]
loss: 1.990311  [19232/60000]
loss: 1.817736  [22432/60000]
loss: 1.771817  [25632/60000]
loss: 1.879548  [28832/60000]
loss: 1.831595  [32032/60000]
loss: 1.800080  [35232/60000]
loss: 1.669052  [38432/60000]
loss: 1.848550  [41632/60000]
loss: 1.908566  [44832/60000]
loss: 1.893922  [48032/60000]
loss: 1.835011  [51232/60000]
loss: 1.735883  [54432/60000]
loss: 1.749113  [57632/60000]
Test Error: 
 Accuracy: 73.9%, Avg loss: 1.730358 

Epoch 2
-------------------------------
loss: 1.716027  [   32/60000]
loss: 1.752628  [ 3232/60000]
loss: 1.782749  [ 6432/60000]
loss: 1.804466  [ 9632/60000]
loss: 1.789232  [12832/60000]
loss: 1.805423  [16032/60000]
loss: 1.717397  [19232/60000]
loss: 1.720725  [22432/60000]
loss: 1.557810  [25632/60000]
loss: 1.620712  [28832/60000

### Task 3
Replace the recurrent cell from the previous task with LSTMCell.

In [11]:
import torch
import torch.nn as nn

class RecurrentModel4(nn.Module):
    def __init__(self, input_size=28, hidden_size=128, num_classes=10):
        super(RecurrentModel4, self).__init__()
        self.hidden_size = hidden_size
        self.num_classes = num_classes

        # Define the RNN -> LSTM cell
        self.rnn_cell = nn.LSTMCell(input_size=input_size, hidden_size=hidden_size)
        self.dense = nn.Linear(hidden_size, num_classes)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, inputs):
        if inputs.dim() == 4:
            # Example: (batch_size, channels, sequence_length, features)
            inputs = inputs.squeeze(1)  # Remove the channels dimension if it's 1
        elif inputs.dim() != 3:
            raise ValueError(f"Expected input to be 3D, got {inputs.dim()}D instead.")

        batch_size, sequence_length, input_size = inputs.size()

        # Initialize hidden state and cell state with random values
        hidden_state = torch.rand(batch_size, self.hidden_size, device=inputs.device)
        cell_state = torch.rand(batch_size, self.hidden_size, device=inputs.device)

        for t in range(sequence_length):
            current_input = inputs[:, t, :]
            hidden_state, cell_state = self.rnn_cell(current_input, (hidden_state, cell_state))

        output = self.dense(hidden_state)
        return self.softmax(output)

model = RecurrentModel4(input_size=28, hidden_size=128, num_classes=10)
model

RecurrentModel4(
  (rnn_cell): LSTMCell(28, 128)
  (dense): Linear(in_features=128, out_features=10, bias=True)
  (softmax): Softmax(dim=1)
)

In [12]:
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)

epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.302319  [   32/60000]
loss: 2.031943  [ 3232/60000]
loss: 1.802835  [ 6432/60000]
loss: 1.580184  [ 9632/60000]
loss: 1.676719  [12832/60000]
loss: 1.620912  [16032/60000]
loss: 1.589483  [19232/60000]
loss: 1.635175  [22432/60000]
loss: 1.609765  [25632/60000]
loss: 1.545296  [28832/60000]
loss: 1.610026  [32032/60000]
loss: 1.499369  [35232/60000]
loss: 1.542769  [38432/60000]
loss: 1.540112  [41632/60000]
loss: 1.532936  [44832/60000]
loss: 1.547606  [48032/60000]
loss: 1.559061  [51232/60000]
loss: 1.496145  [54432/60000]
loss: 1.562528  [57632/60000]
Test Error: 
 Accuracy: 94.3%, Avg loss: 1.519935 

Epoch 2
-------------------------------
loss: 1.553204  [   32/60000]
loss: 1.547702  [ 3232/60000]
loss: 1.593671  [ 6432/60000]
loss: 1.500484  [ 9632/60000]
loss: 1.517636  [12832/60000]
loss: 1.557179  [16032/60000]
loss: 1.499529  [19232/60000]
loss: 1.578616  [22432/60000]
loss: 1.512796  [25632/60000]
loss: 1.463974  [28832/60000

### Task 4
Using the model from the previous task, create a neural network model
with your own implementation of a simple recurrent layer.
- in `call` (`forward` in the PyTorch models here), replace `self.lstm_cell_layer(x)` with a call to your own method, e.g. `self.cell(x)`,
- in the model constructor, remove the LSTM cell initialization and replace it with the initialization of the layers needed to build your own recurrent cell,
- create a `cell()` method that performs the operations of the recurrent layer,
- a simple recurrent layer concatenates the previous result with the current input and then passes the combined tensor through a dense layer (Dense).
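
The concatenate-then-dense step from the last bullet can be written as $h_t = \tanh(W\,[x_t; h_{t-1}] + b)$. The sketch below (sizes and tensor names are assumptions for illustration) shows that this equals the usual two-matrix form $\tanh(W_x x_t + W_h h_{t-1} + b)$, where $W_x$ and $W_h$ are the column blocks of $W$.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

input_size, hidden_size, batch_size = 28, 128, 4

dense = nn.Linear(input_size + hidden_size, hidden_size)

x_t = torch.rand(batch_size, input_size)       # current input
h_prev = torch.rand(batch_size, hidden_size)   # previous hidden state

# Version 1: concatenate, then apply one dense layer.
h_concat = torch.tanh(dense(torch.cat((x_t, h_prev), dim=1)))

# Version 2: split the weight matrix into input and recurrent blocks.
w_x = dense.weight[:, :input_size]
w_h = dense.weight[:, input_size:]
h_split = torch.tanh(x_t @ w_x.T + h_prev @ w_h.T + dense.bias)

max_diff = (h_concat - h_split).abs().max().item()
print(f"max difference: {max_diff:.2e}")
```

The order of concatenation only has to match the order of the weight blocks; concatenating `(hidden_state, current_input)` instead would simply swap the two slices.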

In [13]:
class RecurrentModel5(nn.Module):
    def __init__(self, input_size=28, hidden_size=128, num_classes=10):
        super(RecurrentModel5, self).__init__()
        self.hidden_size = hidden_size
        self.num_classes = num_classes

        self.input_dense = nn.Linear(input_size + hidden_size, hidden_size)
        self.activation = nn.Tanh()
        self.output_dense = nn.Linear(hidden_size, num_classes)
        self.softmax = nn.Softmax(dim=1)

    def cell(self, current_input, hidden_state):
        combined = torch.cat((current_input, hidden_state), dim=1)
        new_hidden_state = self.activation(self.input_dense(combined))
        return new_hidden_state

    def forward(self, inputs):
        if inputs.dim() == 4:
            # Example: (batch_size, channels, sequence_length, features)
            inputs = inputs.squeeze(1)  # Remove the channels dimension if it's 1
        elif inputs.dim() != 3:
            raise ValueError(f"Expected input to be 3D, got {inputs.dim()}D instead.")

        batch_size, sequence_length, input_size = inputs.size()
        # Initialize the hidden state with random values, then apply the custom cell step by step
        hidden_state = torch.rand(batch_size, self.hidden_size, device=inputs.device)

        for t in range(sequence_length):
            current_input = inputs[:, t, :]
            hidden_state = self.cell(current_input, hidden_state)

        output = self.output_dense(hidden_state)
        return self.softmax(output)

model = RecurrentModel5(input_size=28, hidden_size=128, num_classes=10)
model

RecurrentModel5(
  (input_dense): Linear(in_features=156, out_features=128, bias=True)
  (activation): Tanh()
  (output_dense): Linear(in_features=128, out_features=10, bias=True)
  (softmax): Softmax(dim=1)
)

In [14]:
epochs = 5

loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)

for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.302666  [   32/60000]
loss: 2.281349  [ 3232/60000]
loss: 2.288476  [ 6432/60000]
loss: 2.324862  [ 9632/60000]
loss: 2.086435  [12832/60000]
loss: 2.066480  [16032/60000]
loss: 2.033692  [19232/60000]
loss: 1.926751  [22432/60000]
loss: 2.014091  [25632/60000]
loss: 1.939826  [28832/60000]
loss: 1.939030  [32032/60000]
loss: 2.028589  [35232/60000]
loss: 1.956050  [38432/60000]
loss: 2.121966  [41632/60000]
loss: 1.825640  [44832/60000]
loss: 1.834677  [48032/60000]
loss: 1.880324  [51232/60000]
loss: 1.808635  [54432/60000]
loss: 2.132646  [57632/60000]
Test Error: 
 Accuracy: 62.2%, Avg loss: 1.842621 

Epoch 2
-------------------------------
loss: 1.765394  [   32/60000]
loss: 1.929018  [ 3232/60000]
loss: 1.702716  [ 6432/60000]
loss: 1.862526  [ 9632/60000]
loss: 1.918984  [12832/60000]
loss: 1.894815  [16032/60000]
loss: 1.813027  [19232/60000]
loss: 1.844542  [22432/60000]
loss: 1.745414  [25632/60000]
loss: 1.901624  [28832/60000

### Task 5

Based on the model from the previous task, create a model with your own implementation of an LSTM layer. A detailed and clear description of how an LSTM layer works can be found on [this page](http://colah.github.io/posts/2015-08-Understanding-LSTMs/).
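
As a point of reference, here is a minimal sketch of what a hand-written LSTM cell could look like, following the gate equations from the linked post. The class name `ManualLSTMCell` and the choice to compute all four gates with a single `nn.Linear` layer are assumptions made for illustration, not the authors' solution. The cell is checked against `nn.LSTMCell` by copying its weights.

```python
import torch
import torch.nn as nn


class ManualLSTMCell(nn.Module):
    """Hypothetical hand-written LSTM cell (name and layout are assumptions)."""

    def __init__(self, input_size=28, hidden_size=128):
        super().__init__()
        self.hidden_size = hidden_size
        # One dense layer yields all four gate pre-activations at once.
        self.gates = nn.Linear(input_size + hidden_size, 4 * hidden_size)

    def forward(self, current_input, state):
        hidden_state, cell_state = state
        combined = torch.cat((current_input, hidden_state), dim=1)
        i, f, g, o = self.gates(combined).chunk(4, dim=1)
        i = torch.sigmoid(i)  # input gate: how much new information to write
        f = torch.sigmoid(f)  # forget gate: how much old cell state to keep
        g = torch.tanh(g)     # candidate values for the cell state
        o = torch.sigmoid(o)  # output gate: how much cell state to expose
        new_cell = f * cell_state + i * g
        new_hidden = o * torch.tanh(new_cell)
        return new_hidden, new_cell


torch.manual_seed(0)
manual = ManualLSTMCell(28, 128)
reference = nn.LSTMCell(28, 128)

# Load the reference weights so the two cells can be compared directly.
with torch.no_grad():
    manual.gates.weight.copy_(
        torch.cat((reference.weight_ih, reference.weight_hh), dim=1)
    )
    manual.gates.bias.copy_(reference.bias_ih + reference.bias_hh)

x = torch.rand(4, 28, 28)
h = c = torch.zeros(4, 128)
h_ref, c_ref = h, c
for t in range(x.size(1)):
    h, c = manual(x[:, t, :], (h, c))
    h_ref, c_ref = reference(x[:, t, :], (h_ref, c_ref))

max_diff = (h - h_ref).abs().max().item()
print(f"max difference to nn.LSTMCell: {max_diff:.2e}")
```

Unlike the simple cell from Task 4, this cell carries two states, so the loop in `forward` has to pass and receive a `(hidden_state, cell_state)` pair, as in the `LSTMCell` model from Task 3.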

In [21]:
from torch.nn.modules.activation import Sigmoid, Tanh

class RecurrentModel6(nn.Module):
    def __init__(self, input_size=28, hidden_size=128, num_classes=10):
        super(RecurrentModel6, self).__init__()
        self.hidden_size = hidden_size
        self.num_classes = num_classes

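        # Note: these are the same layers as the simple recurrent cell in RecurrentModel5
        # (one dense layer + tanh); no LSTM gates or cell state are defined here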
        self.input_dense = nn.Linear(input_size + hidden_size, hidden_size)
        self.activation = nn.Tanh()
        self.output_dense = nn.Linear(hidden_size, num_classes)
        self.softmax = nn.Softmax(dim=1)

    def cell(self, current_input, hidden_state):
        combined = torch.cat((current_input, hidden_state), dim=1)
        new_hidden_state = self.activation(self.input_dense(combined))
        return new_hidden_state

    def forward(self, inputs):
        if inputs.dim() == 4:
            # Example: (batch_size, channels, sequence_length, features)
            inputs = inputs.squeeze(1)  # Remove the channels dimension if it's 1
        elif inputs.dim() != 3:
            raise ValueError(f"Expected input to be 3D, got {inputs.dim()}D instead.")

        batch_size, sequence_length, input_size = inputs.size()
        # Initialize the hidden state with random values, then apply the cell step by step
        hidden_state = torch.rand(batch_size, self.hidden_size, device=inputs.device)

        for t in range(sequence_length):
            current_input = inputs[:, t, :]
            hidden_state = self.cell(current_input, hidden_state)

        output = self.output_dense(hidden_state)
        return self.softmax(output)

model = RecurrentModel6(input_size=28, hidden_size=128, num_classes=10)
model

RecurrentModel6(
  (input_dense): Linear(in_features=156, out_features=128, bias=True)
  (activation): Tanh()
  (output_dense): Linear(in_features=128, out_features=10, bias=True)
  (softmax): Softmax(dim=1)
)

In [23]:
epochs = 5
learning_rate = 0.001
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)

for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.034476  [   32/60000]
loss: 2.127038  [ 3232/60000]
loss: 1.972545  [ 6432/60000]
loss: 2.008631  [ 9632/60000]
loss: 1.939014  [12832/60000]
loss: 1.806337  [16032/60000]
loss: 1.938730  [19232/60000]
loss: 1.752054  [22432/60000]
loss: 1.537331  [25632/60000]
loss: 1.775865  [28832/60000]
loss: 1.776768  [32032/60000]
loss: 1.783341  [35232/60000]
loss: 1.834061  [38432/60000]
loss: 1.649384  [41632/60000]
loss: 1.864385  [44832/60000]
loss: 1.718671  [48032/60000]
loss: 1.667906  [51232/60000]
loss: 2.151249  [54432/60000]
loss: 1.727632  [57632/60000]
Test Error: 
 Accuracy: 76.4%, Avg loss: 1.702773 

Epoch 2
-------------------------------
loss: 1.835730  [   32/60000]
loss: 1.745147  [ 3232/60000]
loss: 1.605351  [ 6432/60000]
loss: 1.805675  [ 9632/60000]
loss: 1.707496  [12832/60000]
loss: 1.668104  [16032/60000]
loss: 1.727616  [19232/60000]
loss: 1.745713  [22432/60000]
loss: 1.739569  [25632/60000]
loss: 1.718007  [28832/60000