**Challenge: Implement a Multiclass Classification Neural Network using PyTorch**

Objective:
Build a neural network using PyTorch to predict handwritten digits of MNIST.

Steps:

1. **Data Preparation**: Load the MNIST dataset using ```torchvision.datasets.MNIST```. Standardize/normalize the features. Split the dataset into training and testing sets using, for example, ```sklearn.model_selection.train_test_split()```. **Bonus scores**: *use PyTorch's built-* ```DataLoader``` *to split the dataset*.

2. **Neural Network Architecture**: Define a simple feedforward neural network using PyTorch's ```nn.Module```. Design the input layer to match the number of features in the MNIST dataset and the output layer to have as many neurons as there are classes (10). You can experiment with the number of hidden layers and neurons to optimize the performance. **Bonus scores**: *Make your architecture flexibile to have as many hidden layers as the user wants, and use hyperparameter optimization to select the best number of hidden layeres.*

3. **Loss Function and Optimizer**: Choose an appropriate loss function for multiclass classification. Select an optimizer, like SGD (Stochastic Gradient Descent) or Adam.

4. **Training**: Write a training loop to iterate over the dataset.
Forward pass the input through the network, calculate the loss, and perform backpropagation. Update the weights of the network using the chosen optimizer.

5. **Testing**: Evaluate the trained model on the test set. Calculate the accuracy of the model.

6. **Optimization**: Experiment with hyperparameters (learning rate, number of epochs, etc.) to optimize the model's performance. Consider adjusting the neural network architecture for better results. **Notice that you can't use the optimization algorithms from scikit-learn that we saw in lab1: e.g.,** ```GridSearchCV```.


In [3]:
# insert code here
import torch.nn as nn
import torch.optim as optim
import torchvision
import torch

In [4]:
def load_data():
    transform = torchvision.transforms.Compose([torchvision.transforms.ToTensor()])

    # get the training and test set of MNIST
    trainset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
    testset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)
    return trainset, testset

# get the training and test data
trainset, testset = load_data()

batch_size = 32
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=True)

In [5]:
class MnistModel(nn.Module):
  def __init__(self):
    super().__init__()
    self.conv1 = nn.Conv2d(1, 16, kernel_size=(5,5), stride=1, padding=0)
    self.act1 = nn.ReLU()
    self.pool1 = nn.MaxPool2d(kernel_size=(2, 2))
    self.drop1 = nn.Dropout(0.3)

    self.conv2 = nn.Conv2d(16, 32, kernel_size=(5,5), stride=1, padding=0)
    self.act2 = nn.ReLU()
    self.pool2 = nn.MaxPool2d(kernel_size=(2, 2))
    self.drop2 = nn.Dropout(0.3)


    self.flat = nn.Flatten()

    self.fc3 = nn.Linear(512, 128)
    self.act3 = nn.ReLU()
    self.drop3 = nn.Dropout(0.5)

    self.fc4 = nn.Linear(128, 10)

  def forward(self, x):
    # input 1x28x28, output 16x24x24
    x = self.act1(self.conv1(x))
    # input 16x24x24 output 16x12x12
    x = self.pool1(x)
    x = self.drop1(x)
    # input 16x12x12, output 32x8x8
    x = self.act2(self.conv2(x))
    # input 32x8x8, output 32x4x4
    x = self.pool2(x)
    x = self.drop2(x)
    # input 32x4x4, output 512
    x = self.flat(x)
    # input 512, output 128
    x = self.act3(self.fc3(x))
    x = self.drop3(x)
    # input 128, output 10
    x = self.fc4(x)
    return x

In [9]:
import numpy as np

model = MnistModel()
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
torch.manual_seed(42)

n_epochs = 30
for epoch in range(n_epochs):
  losses = []
  for inputs, labels in trainloader:
    # forward, backward, and then weight update
    y_pred = model(inputs)
    loss = loss_fn(y_pred, labels)
    losses.append(loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

  print(f'Epoch {epoch + 1} --> loss = {np.mean(losses)}')

  acc = 0
  count = 0
  for inputs, labels in testloader:
    y_pred = model(inputs)
    acc += (torch.argmax(y_pred, 1) == labels).float().sum()
    count += len(labels)
  acc /= count
  print("Epoch %d: model accuracy %.2f%%" % (epoch+1, acc*100))

torch.save(model.state_dict(), "Mnistmodel.pth")

Epoch 1 --> loss = 1.1014271545171737
Epoch 1: model accuracy 86.70%
Epoch 2 --> loss = 0.3414322280963262
Epoch 2: model accuracy 91.98%
Epoch 3 --> loss = 0.24576735883752504
Epoch 3: model accuracy 93.67%
Epoch 4 --> loss = 0.20111183825234571
Epoch 4: model accuracy 94.96%
Epoch 5 --> loss = 0.17453749780307212
Epoch 5: model accuracy 95.36%
Epoch 6 --> loss = 0.15684192607154449
Epoch 6: model accuracy 95.99%
Epoch 7 --> loss = 0.1434239830593268
Epoch 7: model accuracy 96.05%
Epoch 8 --> loss = 0.13106127521197
Epoch 8: model accuracy 96.38%
Epoch 9 --> loss = 0.1219132648701469
Epoch 9: model accuracy 96.63%
Epoch 10 --> loss = 0.11871466893802086
Epoch 10: model accuracy 96.64%
Epoch 11 --> loss = 0.11139947862563034
Epoch 11: model accuracy 97.22%
Epoch 12 --> loss = 0.10683415232989937
Epoch 12: model accuracy 97.20%
Epoch 13 --> loss = 0.10037438258503874
Epoch 13: model accuracy 97.07%
Epoch 14 --> loss = 0.09608800989923377
Epoch 14: model accuracy 97.06%
Epoch 15 --> loss

In [10]:
# I tried to use Ray Tune for hyperparameter optimization as suggested by pytorch team on the website: https://pytorch.org/tutorials/beginner/hyperparameter_tuning_tutorial.html
# but unfortunately didn't manage to use it correctly and ran into the error which I couldn't overcome
import os
import tempfile
from ray import train, tune
from ray.tune.schedulers import ASHAScheduler

def train_mnist(config):
    model = MnistModel()

    loss_fn = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=float(config["lr"]), momentum=0.9)

    checkpoint = train.get_checkpoint()

    if checkpoint:
        with checkpoint.as_directory() as checkpoint_dir:
            checkpoint_dict = torch.load(os.path.join(checkpoint_dir, "checkpoint.pt"))
        start_epoch = checkpoint_dict["epoch"]
        model.load_state_dict(checkpoint_dict["model_state_dict"])
        optimizer.load_state_dict(checkpoint_dict["optimizer_state_dict"])
    else:
        start_epoch = 0

    trainset, testset = load_data()

    trainloader = torch.utils.data.DataLoader(
        trainset, batch_size=int(config["batch_size"]), shuffle=True)
    testloader = torch.utils.data.DataLoader(
        testset, batch_size=int(config["batch_size"]), shuffle=True)
    
    for epoch in range(start_epoch, 30):
        losses = []
        for inputs, labels in trainloader:
            # forward, backward, and then weight update
            y_pred = model(inputs)
            loss = loss_fn(y_pred, labels)
            losses.append(loss.item())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        print(f'Epoch {epoch + 1} --> loss = {np.mean(losses)}')

        acc = 0
        count = 0
        for inputs, labels in testloader:
            y_pred = model(inputs)
            acc += (torch.argmax(y_pred, 1) == labels).float().sum()
            count += len(labels)
        acc /= count
        print("Epoch %d: model accuracy %.2f%%" % (epoch+1, acc*100))
        metrics = {"loss": np.mean(losses), "accuracy": acc*100}
        with tempfile.TemporaryDirectory() as tempdir:
            torch.save(
                {"epoch": epoch,
                "model_state_dict": model.state_dict(),
                "optimizer_state_dict": optimizer.state_dict()},
                os.path.join(tempdir, "checkpoint.pt"),
            )
            train.report(
                metrics=metrics,
                checkpoint=train.Checkpoint.from_directory(tempdir)
            )
    print("Finished Training")


In [11]:
config = {
"lr": tune.loguniform(1e-4, 1e-1),
"batch_size": tune.choice([2, 4, 8, 16, 32])
}

tuner = tune.Tuner(train_mnist, param_space={"lr": tune.loguniform(1e-4, 1e-1), "batch_size": tune.choice([2, 4, 8, 16, 32])})
result_grid = tuner.fit()

0,1
Current time:,2023-12-21 14:40:40
Running for:,00:00:47.10
Memory:,9.0/15.3 GiB

Trial name,# failures,error file
train_mnist_6bda9_00000,1,"/home/szymon/ray_results/train_mnist_2023-12-21_14-39-53/train_mnist_6bda9_00000_0_batch_size=8,lr=0.0008_2023-12-21_14-39-53/error.txt"

Trial name,status,loc,batch_size,lr
train_mnist_6bda9_00000,ERROR,192.168.169.223:11756,8,0.000804061


[36m(train_mnist pid=11756)[0m Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
[36m(train_mnist pid=11756)[0m Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz


  0%|          | 0/9912422 [00:00<?, ?it/s]
  1%|          | 65536/9912422 [00:00<00:32, 299521.32it/s]
  2%|▏         | 163840/9912422 [00:00<00:21, 460706.38it/s]
  2%|▏         | 229376/9912422 [00:00<00:28, 340699.02it/s]
  3%|▎         | 327680/9912422 [00:00<00:20, 460045.41it/s]
  4%|▍         | 393216/9912422 [00:00<00:24, 393187.99it/s]
  5%|▍         | 491520/9912422 [00:01<00:20, 462899.35it/s]
  6%|▋         | 622592/9912422 [00:01<00:14, 633448.29it/s]
  8%|▊         | 753664/9912422 [00:01<00:12, 761883.09it/s]
  9%|▉         | 884736/9912422 [00:01<00:10, 840199.39it/s]
 11%|█         | 1048576/9912422 [00:01<00:08, 988017.27it/s]
 12%|█▏        | 1179648/9912422 [00:02<00:14, 605906.03it/s]
 13%|█▎        | 1277952/9912422 [00:02<00:13, 663819.73it/s]
 14%|█▍        | 1376256/9912422 [00:02<00:12, 693449.10it/s]
 15%|█▌        | 1507328/9912422 [00:02<00:12, 699402.60it/s]
 18%|█▊        | 1736704/9912422 [00:02<00:08, 1015882.78it/s]
 19%|█▉        | 1867776/9912422 [0

[36m(train_mnist pid=11756)[0m Extracting ./data/MNIST/raw/train-images-idx3-ubyte.gz to ./data/MNIST/raw
[36m(train_mnist pid=11756)[0m 
[36m(train_mnist pid=11756)[0m Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
[36m(train_mnist pid=11756)[0m Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./data/MNIST/raw/train-labels-idx1-ubyte.gz


  0%|          | 0/28881 [00:00<?, ?it/s]
100%|██████████| 28881/28881 [00:00<00:00, 372763.06it/s]


[36m(train_mnist pid=11756)[0m Extracting ./data/MNIST/raw/train-labels-idx1-ubyte.gz to ./data/MNIST/raw
[36m(train_mnist pid=11756)[0m 
[36m(train_mnist pid=11756)[0m Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
[36m(train_mnist pid=11756)[0m Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw/t10k-images-idx3-ubyte.gz


  0%|          | 0/1648877 [00:00<?, ?it/s]
  2%|▏         | 32768/1648877 [00:00<00:05, 310995.00it/s]
  4%|▍         | 65536/1648877 [00:00<00:06, 228264.19it/s]
 10%|▉         | 163840/1648877 [00:00<00:03, 438226.73it/s]
 14%|█▍        | 229376/1648877 [00:00<00:02, 504727.24it/s]
 18%|█▊        | 294912/1648877 [00:00<00:04, 300121.57it/s]
 24%|██▍       | 393216/1648877 [00:01<00:03, 400206.33it/s]
 28%|██▊       | 458752/1648877 [00:01<00:02, 405807.26it/s]
 32%|███▏      | 524288/1648877 [00:01<00:02, 449043.83it/s]
 38%|███▊      | 622592/1648877 [00:01<00:01, 539444.92it/s]
 46%|████▌     | 753664/1648877 [00:01<00:01, 680788.84it/s]
 58%|█████▊    | 950272/1648877 [00:01<00:00, 946009.60it/s]
 72%|███████▏  | 1179648/1648877 [00:01<00:00, 1242782.26it/s]
 83%|████████▎ | 1376256/1648877 [00:02<00:00, 930608.69it/s] 
100%|██████████| 1648877/1648877 [00:02<00:00, 740925.36it/s] 


[36m(train_mnist pid=11756)[0m Extracting ./data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw
[36m(train_mnist pid=11756)[0m 
[36m(train_mnist pid=11756)[0m Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
[36m(train_mnist pid=11756)[0m Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 4542/4542 [00:00<00:00, 50666299.91it/s]


[36m(train_mnist pid=11756)[0m Extracting ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw
[36m(train_mnist pid=11756)[0m 
[36m(train_mnist pid=11756)[0m Epoch 1 --> loss = 0.5652521514050352


2023-12-21 14:40:40,584	ERROR tune_controller.py:1383 -- Trial task failed for trial train_mnist_6bda9_00000
Traceback (most recent call last):
  File "/home/szymon/.local/lib/python3.10/site-packages/ray/air/execution/_internal/event_manager.py", line 110, in resolve_future
    result = ray.get(future)
  File "/home/szymon/.local/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/home/szymon/.local/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/home/szymon/.local/lib/python3.10/site-packages/ray/_private/worker.py", line 2563, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): [36mray::ImplicitFunc.train()[39m (pid=11756, ip=192.168.169.223, actor_id=adbf5b7ed6bd2b1201df18e401000000, repr=train_mnist)
  File "/home/szymon/.local/lib/python3.10/site-packages/ray/tune/trainable/trainable.py",

[36m(train_mnist pid=11756)[0m Epoch 1: model accuracy 93.57%


2023-12-21 14:40:40,591	ERROR tune.py:1043 -- Trials did not complete: [train_mnist_6bda9_00000]
2023-12-21 14:40:40,592	INFO tune.py:1047 -- Total run time: 47.12 seconds (47.10 seconds for the tuning loop).


In [8]:
# Therefore I implemented a simple search for the best learning rate and batch size
import numpy as np


def train_mnist(n_epochs, lr, batch_size):
    model = MnistModel()

    loss_fn = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    torch.manual_seed(42)


    trainset, testset = load_data()

    trainloader = torch.utils.data.DataLoader(
        trainset, batch_size=batch_size, shuffle=True)
    testloader = torch.utils.data.DataLoader(
        testset, batch_size=batch_size, shuffle=True)
    
    for epoch in range(n_epochs):
        losses = []
        for inputs, labels in trainloader:
            # forward, backward, and then weight update
            y_pred = model(inputs)
            loss = loss_fn(y_pred, labels)
            losses.append(loss.item())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        print(f'Epoch {epoch + 1} --> loss = {np.mean(losses)}')

        acc = 0
        count = 0
        for inputs, labels in testloader:
            y_pred = model(inputs)
            acc += (torch.argmax(y_pred, 1) == labels).float().sum()
            count += len(labels)
        acc /= count
        print("Epoch %d: model accuracy %.2f%%" % (epoch+1, acc*100))
    print("Finished Training")
    return {"lr": lr, "batch_size": batch_size, "accuracy": acc*100}

In [9]:
# I chose a lower number of epochs due to the limited resources and time for training
n_epochs = 15
results = []
for learning_rate in [0.001, 0.005, 0.01]:
    for batch_size in [32, 16]:
        results.append(train_mnist(n_epochs=n_epochs, lr=learning_rate, batch_size=batch_size))

Epoch 1 --> loss = 1.0345506687482198
Epoch 1: model accuracy 88.56%
Epoch 2 --> loss = 0.3172469078083833
Epoch 2: model accuracy 92.52%
Epoch 3 --> loss = 0.23005055907169977
Epoch 3: model accuracy 93.75%
Epoch 4 --> loss = 0.19536977801024913
Epoch 4: model accuracy 94.82%
Epoch 5 --> loss = 0.1703737028516829
Epoch 5: model accuracy 95.44%
Epoch 6 --> loss = 0.1544893087312579
Epoch 6: model accuracy 95.52%
Epoch 7 --> loss = 0.1433250789185365
Epoch 7: model accuracy 96.02%
Epoch 8 --> loss = 0.1314623482055962
Epoch 8: model accuracy 96.26%
Epoch 9 --> loss = 0.12311351753920317
Epoch 9: model accuracy 96.62%
Epoch 10 --> loss = 0.11994809426863988
Epoch 10: model accuracy 96.68%
Epoch 11 --> loss = 0.11243394967572143
Epoch 11: model accuracy 96.84%
Epoch 12 --> loss = 0.10849212315777938
Epoch 12: model accuracy 97.15%
Epoch 13 --> loss = 0.1035498722984145
Epoch 13: model accuracy 97.05%
Epoch 14 --> loss = 0.0984101459559674
Epoch 14: model accuracy 97.04%
Epoch 15 --> loss 

In [10]:
for result in results:
    print(f'Hyperparamters: learning_rate = {result["lr"]}, batch_size = {result["batch_size"]}')
    print(f'Accuracy: {result["accuracy"]}')

Hyperparamters: learning_rate = 0.001, batch_size = 32
Accuracy: 97.3499984741211
Hyperparamters: learning_rate = 0.001, batch_size = 16
Accuracy: 97.8699951171875
Hyperparamters: learning_rate = 0.005, batch_size = 32
Accuracy: 98.29999542236328
Hyperparamters: learning_rate = 0.005, batch_size = 16
Accuracy: 98.18999481201172
Hyperparamters: learning_rate = 0.01, batch_size = 32
Accuracy: 98.07999420166016
Hyperparamters: learning_rate = 0.01, batch_size = 16
Accuracy: 97.75999450683594


As we can see the models are quite similar in terms of accuracy but the best-performing one has the hyperparameters: learning_rate = 0.005 and batch_size = 32
