**Challenge: Implement a Multiclass Classification Neural Network using PyTorch**

Objective:
Build a neural network using PyTorch to predict handwritten digits of MNIST.

Steps:

1. **Data Preparation**: Load the MNIST dataset using ```torchvision.datasets.MNIST```. Standardize/normalize the features. Split the dataset into training and testing sets using, for example, ```sklearn.model_selection.train_test_split()```. **Bonus scores**: *use PyTorch's built-in* ```DataLoader``` *to split the dataset*.

2. **Neural Network Architecture**: Define a simple feedforward neural network using PyTorch's ```nn.Module```. Design the input layer to match the number of features in the MNIST dataset and the output layer to have as many neurons as there are classes (10). You can experiment with the number of hidden layers and neurons to optimize the performance. **Bonus scores**: *Make your architecture flexible to have as many hidden layers as the user wants, and use hyperparameter optimization to select the best number of hidden layers.*

3. **Loss Function and Optimizer**: Choose an appropriate loss function for multiclass classification. Select an optimizer, like SGD (Stochastic Gradient Descent) or Adam.

4. **Training**: Write a training loop to iterate over the dataset. Pass the input forward through the network, calculate the loss, and perform backpropagation. Update the weights of the network using the chosen optimizer.

5. **Testing**: Evaluate the trained model on the test set. Calculate the accuracy of the model.

6. **Optimization**: Experiment with hyperparameters (learning rate, number of epochs, etc.) to optimize the model's performance. Consider adjusting the neural network architecture for better results. **Note that you cannot use the optimization algorithms from scikit-learn that we saw in lab1, e.g.,** ```GridSearchCV```.


# >>STEP 1<<

In [1]:
#Import libraries
import torch.nn as nn
import torch.optim as optim
import torchvision
import numpy as np
import torch
from torch.utils.data import DataLoader
from torchvision import transforms

In [2]:
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.8,))])
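To make the effect of this transform concrete: `ToTensor()` scales pixel values to the range [0, 1], and `Normalize((0.5,), (0.8,))` then computes `(x - 0.5) / 0.8` for every pixel. The sketch below reproduces that arithmetic on plain tensors. The sample pixel values are made up for illustration; with these settings the inputs end up in roughly [-0.625, 0.625].

```python
import torch

# Illustrative pixel intensities after ToTensor(): black, mid-gray, white.
pixels = torch.tensor([0.0, 0.5, 1.0])

mean, std = 0.5, 0.8  # the values passed to transforms.Normalize in this notebook
normalized = (pixels - mean) / std

print(normalized.tolist())  # approximately [-0.625, 0.0, 0.625]
```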

In [3]:
trainset = torchvision.datasets.MNIST(root='./data', train=True, download= True, transform=transform)
testset = torchvision.datasets.MNIST(root='./data', train=False, download= True, transform=transform)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 9912422/9912422 [00:00<00:00, 121903602.49it/s]


Extracting ./data/MNIST/raw/train-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./data/MNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 28881/28881 [00:00<00:00, 43985364.50it/s]


Extracting ./data/MNIST/raw/train-labels-idx1-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw/t10k-images-idx3-ubyte.gz


100%|██████████| 1648877/1648877 [00:00<00:00, 32138983.29it/s]


Extracting ./data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 4542/4542 [00:00<00:00, 2777854.88it/s]


Extracting ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw



In [4]:
print(f'train {trainset}')
print(f'test {testset}')

train Dataset MNIST
    Number of datapoints: 60000
    Root location: ./data
    Split: Train
    StandardTransform
Transform: Compose(
               ToTensor()
               Normalize(mean=(0.5,), std=(0.8,))
           )
test Dataset MNIST
    Number of datapoints: 10000
    Root location: ./data
    Split: Test
    StandardTransform
Transform: Compose(
               ToTensor()
               Normalize(mean=(0.5,), std=(0.8,))
           )


In [5]:
batch_size = 32

In [6]:
trainloader = DataLoader(trainset, batch_size=batch_size, shuffle=True)
testloader = DataLoader(testset, batch_size=batch_size, shuffle=False)  # shuffling is not needed for evaluation
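MNIST already ships with a train/test split, which the cells above use directly. If a separate validation set is wanted (for example, for the hyperparameter search in Step 6), one option is `torch.utils.data.random_split`. The sketch below uses synthetic tensors in place of `trainset` so that it runs without a download; the 80/20 ratio, the seed, and the variable names are arbitrary assumptions.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Synthetic stand-in for MNIST: 100 fake 1x28x28 images with labels 0-9.
images = torch.randn(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))
dataset = TensorDataset(images, labels)

# Hold out 20% for validation; the seeded generator makes the split repeatable.
generator = torch.Generator().manual_seed(0)
train_part, val_part = random_split(dataset, [80, 20], generator=generator)

train_loader = DataLoader(train_part, batch_size=32, shuffle=True)
val_loader = DataLoader(val_part, batch_size=32, shuffle=False)

print(len(train_part), len(val_part))  # 80 20
```

With the real data, `dataset` would be replaced by `trainset` and the lengths by, say, `[50000, 10000]`.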

# >>STEP 2<<

In [7]:
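class MNISTConvNet(nn.Module):
  # Note: despite the name, this is a fully connected (feedforward) network,
  # not a convolutional one.

  def __init__(self, hidden_layers):
    super().__init__()

    # First layer: flatten each 28x28 image into 784 features
    self.f1 = nn.Flatten()
    self.fc1 = nn.Linear(784, 300)
    self.act1 = nn.Sigmoid()

    # Hidden layers: nn.ModuleList registers each layer as a submodule
    self.fcX = nn.ModuleList([nn.Linear(300,300) for i in range(hidden_layers)])

    self.act2 = nn.Sigmoid()
    # Last layers (this second Flatten is a no-op on already flat input)
    self.flat = nn.Flatten()
    self.fc2 = nn.Linear(300, 100)
    self.act3 = nn.Sigmoid()
    self.fc3 = nn.Linear(100,10)

  def forward(self,x):

    # First layer
    x = self.f1(x)
    x = self.fc1(x)
    x = self.act1(x)

    # Hidden layers: each step adds the output of layer i // 2 to the output
    # of layer i; no activation is applied between these steps
    for i, l in enumerate(self.fcX):
      x = self.fcX[i // 2](x) + l(x)

    x = self.act2(x)
    x = self.flat(x)
    x = self.fc2(x)
    x = self.act3(x)
    x = self.fc3(x)  # raw logits, one per class

    return x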
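The forward pass above adds the output of hidden layer `i // 2` to the output of hidden layer `i` at each step and applies the activation only once, after the loop. For comparison, here is a minimal sketch of a more conventional layout in which every hidden layer is followed by its own activation. The class name `FlexibleMLP` and the `width` parameter are hypothetical, not part of the original notebook, and the results reported in this notebook were not produced with it.

```python
import torch
import torch.nn as nn


class FlexibleMLP(nn.Module):
    """Hypothetical variant: hidden_layers blocks of Linear + Sigmoid in sequence."""

    def __init__(self, hidden_layers, width=300):
        super().__init__()
        self.flatten = nn.Flatten()
        self.input_layer = nn.Linear(784, width)
        # nn.ModuleList registers every layer so its weights are trained.
        self.hidden = nn.ModuleList(
            [nn.Linear(width, width) for _ in range(hidden_layers)]
        )
        self.act = nn.Sigmoid()
        self.output_layer = nn.Linear(width, 10)

    def forward(self, x):
        x = self.act(self.input_layer(self.flatten(x)))
        for layer in self.hidden:
            x = self.act(layer(x))  # activation after every hidden layer
        return self.output_layer(x)  # raw logits, one per digit class


model_sketch = FlexibleMLP(hidden_layers=3)
logits = model_sketch(torch.randn(4, 1, 28, 28))
print(logits.shape)  # torch.Size([4, 10])
```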

In [8]:
#insert the number of hidden layers
hidden_layers = 5

In [9]:
model = MNISTConvNet(hidden_layers)
model

MNISTConvNet(
  (f1): Flatten(start_dim=1, end_dim=-1)
  (fc1): Linear(in_features=784, out_features=300, bias=True)
  (act1): Sigmoid()
  (fcX): ModuleList(
    (0-4): 5 x Linear(in_features=300, out_features=300, bias=True)
  )
  (act2): Sigmoid()
  (flat): Flatten(start_dim=1, end_dim=-1)
  (fc2): Linear(in_features=300, out_features=100, bias=True)
  (act3): Sigmoid()
  (fc3): Linear(in_features=100, out_features=10, bias=True)
)

In [16]:
class MNISTConvNet(nn.Module):

  def __init__(self, hidden_layers):
    super().__init__()

    # The first layer must be here

    self.first = nn.Sequential(
        nn.Flatten(),
        nn.Linear(784, 300),
        nn.Sigmoid()
    )

    # Plain Python list: the blocks stored here are NOT registered as submodules
    self.list_modules = []
    for i in range(hidden_layers):
      self.list_modules.append(nn.Sequential(
          nn.Linear(300,300),
          nn.Sigmoid()
      ))

    self.last = nn.Sequential(
        nn.Flatten(),
        nn.Linear(300,100),
        nn.Sigmoid(),
        nn.Linear(100,10)
    )


  def forward(self,x):
    x = self.first(x)
    for i in range(len(self.list_modules)):  # do not rely on the global hidden_layers
      x = self.list_modules[i](x)
    x = self.last(x)

    return x

In [17]:
model = MNISTConvNet(hidden_layers)
model

0
1
2
3
4


MNISTConvNet(
  (first): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=784, out_features=300, bias=True)
    (2): Sigmoid()
  )
  (last): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=300, out_features=100, bias=True)
    (2): Sigmoid()
    (3): Linear(in_features=100, out_features=10, bias=True)
  )
)

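Because a plain Python list of `nn.Sequential` blocks is not registered as a submodule, the hidden layers do not appear in the model (and their parameters are not returned by `model.parameters()`), so I will use `nn.ModuleList` instead. I leave the output of the list-based model above to show the difference.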
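To see the difference concretely, the following sketch counts the parameters that each variant exposes to an optimizer. The class names `WithList` and `WithModuleList` and the helper `count_params` are hypothetical and exist only for this illustration.

```python
import torch.nn as nn


class WithList(nn.Module):
    def __init__(self):
        super().__init__()
        # Plain Python list: the layers are NOT registered as submodules.
        self.layers = [nn.Linear(300, 300) for _ in range(2)]


class WithModuleList(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList: every layer is registered and visible to the optimizer.
        self.layers = nn.ModuleList([nn.Linear(300, 300) for _ in range(2)])


def count_params(module):
    """Total number of scalar parameters returned by module.parameters()."""
    return sum(p.numel() for p in module.parameters())


list_params = count_params(WithList())
module_list_params = count_params(WithModuleList())
print(list_params, module_list_params)  # 0 180600
```

Each `nn.Linear(300, 300)` holds 300 * 300 weights plus 300 biases, so two registered layers give 180600 parameters, while the list-based variant exposes none and would therefore never be updated by the optimizer.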

# STEP 3

In [10]:
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9) #SGD = Stochastic gradient descent
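`nn.CrossEntropyLoss` expects raw logits (it applies log-softmax internally) and integer class labels, which is why the network ends with a plain `nn.Linear` and no softmax. As a sanity check, a model that assigns equal probability to all 10 digits has a loss of ln(10) ≈ 2.3026, which is close to the loss of about 2.303 reported in the first training epochs of this notebook. The variable names below are illustrative.

```python
import math

import torch
import torch.nn as nn

loss_fn_demo = nn.CrossEntropyLoss()

# All-zero logits give a uniform softmax: probability 1/10 for every digit.
logits = torch.zeros(4, 10)
targets = torch.tensor([3, 1, 4, 1])  # arbitrary class indices

uniform_loss = loss_fn_demo(logits, targets).item()
print(uniform_loss, math.log(10))  # both about 2.3026
```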

# STEP 4 & STEP 5

In [11]:
n_epochs = 10

In [12]:
for epoch in range(n_epochs):
  losses = []
  # Training
  for inputs, labels in trainloader:

    y_pred = model(inputs)
    loss = loss_fn(y_pred, labels)

    losses.append(loss.item())

    optimizer.zero_grad()

    loss.backward()

    optimizer.step()

  print(f'Epoch {epoch + 1} --> loss = {np.mean(losses)}')

  # Evaluation on the test set (no gradients are needed here)
  acc = 0
  count = 0
  with torch.no_grad():
    for inputs, labels in testloader:

      y_pred = model(inputs)

      acc += (torch.argmax(y_pred,1) == labels).float().sum()
      count += len(labels)

  acc /= count

  print(f'Epoch {epoch + 1} --> model accuracy = {acc * 100}')

Epoch 1 --> loss = 2.303351174799601
Epoch 1 --> model accuracy = 10.279999732971191
Epoch 2 --> loss = 2.3019251496632895
Epoch 2 --> model accuracy = 11.350000381469727
Epoch 3 --> loss = 2.300648187637329
Epoch 3 --> model accuracy = 20.989999771118164
Epoch 4 --> loss = 2.2936515225728353
Epoch 4 --> model accuracy = 21.81999969482422
Epoch 5 --> loss = 2.000452421506246
Epoch 5 --> model accuracy = 29.660001754760742
Epoch 6 --> loss = 1.593509429359436
Epoch 6 --> model accuracy = 49.209999084472656
Epoch 7 --> loss = 1.174327398777008
Epoch 7 --> model accuracy = 61.43000030517578
Epoch 8 --> loss = 0.9982069915771484
Epoch 8 --> model accuracy = 71.17000579833984
Epoch 9 --> loss = 0.8350652021884918
Epoch 9 --> model accuracy = 76.9800033569336
Epoch 10 --> loss = 0.7072107039928436
Epoch 10 --> model accuracy = 81.20999908447266
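The accuracy computation can also be factored into a reusable helper. The sketch below is one possible version, not the notebook's own code: the function name `evaluate` and the tiny synthetic model and loader are assumptions made so the example runs on its own.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader):
    """Return the fraction of correctly classified samples in loader."""
    model.eval()  # switch layers such as dropout/batch norm to inference mode
    correct, total = 0, 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for inputs, labels in loader:
            predictions = model(inputs).argmax(dim=1)
            correct += (predictions == labels).sum().item()
            total += labels.size(0)
    model.train()  # restore training mode for the next epoch
    return correct / total


# Tiny synthetic check: an untrained linear model on random data.
demo_model = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))
demo_loader = DataLoader(
    TensorDataset(torch.randn(64, 1, 28, 28), torch.randint(0, 10, (64,))),
    batch_size=32,
)
accuracy = evaluate(demo_model, demo_loader)
print(f"accuracy = {accuracy:.2%}")
```

In the notebook, the same function could be called as `evaluate(model, testloader)` at the end of each epoch.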


# STEP 6

In [13]:
!pip install optuna
import optuna

Collecting optuna
  Downloading optuna-3.5.0-py3-none-any.whl (413 kB)
Collecting alembic>=1.5.0 (from optuna)
  Downloading alembic-1.13.1-py3-none-any.whl (233 kB)
Collecting colorlog (from optuna)
  Downloading colorlog-6.8.0-py3-none-any.whl (11 kB)
Collecting Mako (from alembic>=1.5.0->optuna)
  Downloading Mako-1.3.0-py3-none-any.whl (78 kB)
Installing collected packages: Mako, colorlog, alembic, optuna
Successfully installed Mako-1.3.0 alembic-1.13.1 colorlog-6.8.0 optuna-3.5.0


In [14]:
def objective(trial):

  # Hyperparameters to tune
  hidden_layers_num = trial.suggest_int('hidden_layers_num', 5, 8)
  number_epochs = trial.suggest_int('number_epochs', 10, 15)

  model = MNISTConvNet(hidden_layers_num)
  optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

  for epoch in range(number_epochs):

    losses = 0

    for inputs, labels in trainloader:
      y_pred = model(inputs)

      loss = loss_fn(y_pred, labels)

      losses += (loss.item())

      optimizer.zero_grad()

      loss.backward()

      optimizer.step()

    # Average training loss of the current epoch
    res = losses / len(trainloader)

  # The study minimizes the average training loss of the last epoch
  return res

In [15]:
study = optuna.create_study()
study.optimize(objective, n_trials=10, n_jobs=-1)


[I 2023-12-21 14:32:37,714] A new study created in memory with name: no-name-10fa4310-0238-4798-9a2e-0a844113f089
[I 2023-12-21 14:43:25,015] Trial 0 finished with value: 0.7195630326588949 and parameters: {'hidden_layers_num': 7, 'number_epochs': 10}. Best is trial 0 with value: 0.7195630326588949.
[I 2023-12-21 14:45:15,282] Trial 1 finished with value: 0.6679180945158005 and parameters: {'hidden_layers_num': 8, 'number_epochs': 11}. Best is trial 1 with value: 0.6679180945158005.
[I 2023-12-21 14:54:16,851] Trial 2 finished with value: 0.5590140865246455 and parameters: {'hidden_layers_num': 6, 'number_epochs': 11}. Best is trial 2 with value: 0.5590140865246455.
[I 2023-12-21 15:01:07,707] Trial 3 finished with value: 0.30409573155641556 and parameters: {'hidden_layers_num': 7, 'number_epochs': 15}. Best is trial 3 with value: 0.30409573155641556.
[I 2023-12-21 15:06:11,103] Trial 4 finished with value: 0.4979361178557078 and parameters: {'hidden_layers_num': 6, 'number_epochs': 12

The best hyperparameters found by the study are:
hidden_layers_num = 7
number_epochs = 15

In [16]:
print(study.best_params)

{'hidden_layers_num': 7, 'number_epochs': 15}
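The objective above minimizes the training loss, so it tends to favor whichever configuration trains longest. A common alternative is to score each candidate on data held out from training. The following is a minimal hand-written search over the number of hidden layers, assuming a validation split is acceptable for this exercise. `build_mlp`, `train_and_score`, the layer width, and the synthetic data are all hypothetical stand-ins, not the notebook's code.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, random_split


def build_mlp(hidden_layers, width=32):
    """Hypothetical helper: a small MLP with a configurable number of hidden layers."""
    layers = [nn.Flatten(), nn.Linear(784, width), nn.Sigmoid()]
    for _ in range(hidden_layers):
        layers += [nn.Linear(width, width), nn.Sigmoid()]
    layers.append(nn.Linear(width, 10))
    return nn.Sequential(*layers)


def train_and_score(hidden_layers, train_loader, val_loader, epochs=1):
    """Train briefly, then return accuracy on the validation loader."""
    model = build_mlp(hidden_layers)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for inputs, labels in train_loader:
            optimizer.zero_grad()
            loss_fn(model(inputs), labels).backward()
            optimizer.step()
    correct, total = 0, 0
    with torch.no_grad():
        for inputs, labels in val_loader:
            correct += (model(inputs).argmax(dim=1) == labels).sum().item()
            total += labels.size(0)
    return correct / total


# Synthetic stand-in for MNIST so the sketch runs without a download.
torch.manual_seed(0)
data = TensorDataset(torch.randn(200, 1, 28, 28), torch.randint(0, 10, (200,)))
train_part, val_part = random_split(data, [160, 40])
train_loader = DataLoader(train_part, batch_size=32, shuffle=True)
val_loader = DataLoader(val_part, batch_size=32)

# Try each candidate depth and keep the one with the best validation accuracy.
scores = {n: train_and_score(n, train_loader, val_loader) for n in (1, 2, 3)}
best_depth = max(scores, key=scores.get)
print(scores, best_depth)
```

The value returned by `train_and_score` could equally be returned from an Optuna objective, in a study created with `optuna.create_study(direction='maximize')`.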


In [17]:
from optuna.visualization import plot_optimization_history

plot_optimization_history(study)

Now we build the model with the best parameters found by Optuna.

In [18]:
best_model = MNISTConvNet(7)
best_model

MNISTConvNet(
  (f1): Flatten(start_dim=1, end_dim=-1)
  (fc1): Linear(in_features=784, out_features=300, bias=True)
  (act1): Sigmoid()
  (fcX): ModuleList(
    (0-6): 7 x Linear(in_features=300, out_features=300, bias=True)
  )
  (act2): Sigmoid()
  (flat): Flatten(start_dim=1, end_dim=-1)
  (fc2): Linear(in_features=300, out_features=100, bias=True)
  (act3): Sigmoid()
  (fc3): Linear(in_features=100, out_features=10, bias=True)
)

Then we train this model again and evaluate it with the best parameters.

In [19]:
n_epochs = 15

loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(best_model.parameters(), lr=0.001, momentum=0.9) #SGD = Stochastic gradient descent

for epoch in range(n_epochs):

  losses = []
  #training
  for inputs, labels in trainloader:

    y_pred = best_model(inputs)
    loss = loss_fn(y_pred, labels)

    losses.append(loss.item())

    optimizer.zero_grad()

    loss.backward()

    optimizer.step()

  print(f'Epoch {epoch + 1} --> loss = {np.mean(losses)}')

  # Evaluation on the test set (no gradients are needed here)
  acc = 0
  count = 0
  with torch.no_grad():
    for inputs, labels in testloader:

      y_pred = best_model(inputs)

      acc += (torch.argmax(y_pred,1) == labels).float().sum()
      count += len(labels)

  acc /= count

  print(f'Epoch {epoch + 1} --> model accuracy = {acc * 100}')

Epoch 1 --> loss = 2.303490075302124
Epoch 1 --> model accuracy = 11.350000381469727
Epoch 2 --> loss = 2.3026676475524903
Epoch 2 --> model accuracy = 11.350000381469727
Epoch 3 --> loss = 2.3017494482676186
Epoch 3 --> model accuracy = 11.350000381469727
Epoch 4 --> loss = 2.29985101331075
Epoch 4 --> model accuracy = 19.799999237060547
Epoch 5 --> loss = 2.194058504041036
Epoch 5 --> model accuracy = 29.96000099182129
Epoch 6 --> loss = 1.6027606295903525
Epoch 6 --> model accuracy = 45.33000183105469
Epoch 7 --> loss = 1.2425674817403158
Epoch 7 --> model accuracy = 57.459999084472656
Epoch 8 --> loss = 1.079195855553945
Epoch 8 --> model accuracy = 66.40999603271484
Epoch 9 --> loss = 0.8911442278067271
Epoch 9 --> model accuracy = 74.16999816894531
Epoch 10 --> loss = 0.7742028901576996
Epoch 10 --> model accuracy = 75.51000213623047
Epoch 11 --> loss = 0.7008344919204712
Epoch 11 --> model accuracy = 78.33999633789062
Epoch 12 --> loss = 0.6165890524148941
Epoch 12 --> model acc

In the end, we can see that with these optimized hyperparameters the accuracy reaches 91.21%!