**Challenge: Implement a Multiclass Classification Neural Network using PyTorch**

Objective:
Build a feedforward neural network using PyTorch to predict the species of iris flowers in a multiclass classification problem. The dataset used for this challenge is the Iris dataset, which consists of features like sepal length, sepal width, petal length, and petal width.

Steps:

1. **Data Preparation**: Load the MNIST dataset using ```torchvision.datasets.MNIST```. Standardize/normalize the features. Split the dataset into training and testing sets using, for example, ```sklearn.model_selection.train_test_split()```. **Bonus scores**: *use PyTorch's built-* ```DataLoader``` *to split the dataset*.

2. **Neural Network Architecture**: Define a simple feedforward neural network using PyTorch's ```nn.Module```. Design the input layer to match the number of features in the MNIST dataset and the output layer to have as many neurons as there are classes (10). You can experiment with the number of hidden layers and neurons to optimize the performance. **Bonus scores**: *Make your architecture flexibile to have as many hidden layers as the user wants, and use hyperparameter optimization to select the best number of hidden layeres.*

3. **Loss Function and Optimizer**: Choose an appropriate loss function for multiclass classification. Select an optimizer, like SGD (Stochastic Gradient Descent) or Adam.

4. **Training**: Write a training loop to iterate over the dataset.
Forward pass the input through the network, calculate the loss, and perform backpropagation. Update the weights of the network using the chosen optimizer.

5. **Testing**: Evaluate the trained model on the test set. Calculate the accuracy of the model.

6. **Optimization**: Experiment with hyperparameters (learning rate, number of epochs, etc.) to optimize the model's performance. Consider adjusting the neural network architecture for better results. **Notice that you can't use the optimization algorithms from scikit-learn that we saw in lab1: e.g.,** ```GridSearchCV```.


In [1]:
import numpy as np
import torch
import torchvision
import torch.nn as nn
import torch.optim as optim

transform=torchvision.transforms.Compose([torchvision.transforms.ToTensor(), torchvision.transforms.Normalize((0.1307,), (0.3081,))])
trainset=torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
testset=torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 9912422/9912422 [00:00<00:00, 66851867.55it/s]


Extracting ./data/MNIST/raw/train-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./data/MNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 28881/28881 [00:00<00:00, 77601341.34it/s]

Extracting ./data/MNIST/raw/train-labels-idx1-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz





Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw/t10k-images-idx3-ubyte.gz


100%|██████████| 1648877/1648877 [00:00<00:00, 28780717.93it/s]

Extracting ./data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw






Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 4542/4542 [00:00<00:00, 13003773.90it/s]

Extracting ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw






In [2]:
from torch.utils.data import DataLoader

trainloader = DataLoader(trainset, batch_size=32, shuffle=True)
testloader = DataLoader(testset, batch_size=32, shuffle=True)

#Let's see what are the dimensions of our data
for x,y in trainloader:
   print(f"x.shape: {x.shape}")
   print(f"y.shape: {y.shape}")
   break

x.shape: torch.Size([32, 1, 28, 28])
y.shape: torch.Size([32])


In [3]:
class irisNN(nn.Module):
  def __init__(self,n_hidden_layers):
    super().__init__()

    #First thing to do is to flatten the input matrix to a vector
    self.flat=nn.Flatten()

    self.first_layer=nn.Linear(28*28,300)
    self.act1=nn.Sigmoid()

    #the number of hidden layers is chosen by the user
    self.hidden_layers=[]
    self.hidden_acts=[]
    self.last_layer_neurons=300
    for layer in range(n_hidden_layers):
      self.hidden_layers.append(nn.Linear(self.last_layer_neurons,max(10,self.last_layer_neurons//2)))
      self.hidden_acts.append(nn.Sigmoid())
      self.last_layer_neurons=max(10,self.last_layer_neurons//2)

    self.output_layer=nn.Linear(self.last_layer_neurons,10)
    self.output_act=nn.Sigmoid()

  def forward(self,x):
    x=self.flat(x)
    x=self.first_layer(x)
    x=self.act1(x)
    for layer in range(len(self.hidden_layers)):
      x=self.hidden_layers[layer](x)
      x=self.hidden_acts[layer](x)
    x=self.output_layer(x)
    return self.output_act(x)

In [4]:
n_hidden_layers=2

model=irisNN(n_hidden_layers)
loss_fn=nn.CrossEntropyLoss()
optimizer=optim.Adam(model.parameters(), lr=0.001)

In [72]:
n_epochs=20

for epoch in range(n_epochs):
  losses = []
  for inputs, labels in trainloader:
    y_pred = model(inputs)
    loss = loss_fn(y_pred, labels)
    losses.append(loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

  print(f'Epoch {epoch + 1} --> loss = {np.mean(losses)}')

  #To compute model's accuracy
  acc = 0
  count = 0
  for inputs, labels in testloader:
    y_pred = model(inputs)
    acc += (torch.argmax(y_pred, 1) == labels).float().sum()
    count += len(labels)
  acc /= count
  print("\t\tmodel accuracy = %.2f%%" % (acc*100))

Epoch 1 --> loss = 1.9784926114400228
		model accuracy = 87.86%
Epoch 2 --> loss = 1.9409148801167806
		model accuracy = 83.34%
Epoch 3 --> loss = 1.9077621259053548
		model accuracy = 89.04%
Epoch 4 --> loss = 1.878575371805827
		model accuracy = 89.51%
Epoch 5 --> loss = 1.8524909580230713
		model accuracy = 89.18%
Epoch 6 --> loss = 1.828866345278422
		model accuracy = 89.29%
Epoch 7 --> loss = 1.8074119473775228
		model accuracy = 89.59%
Epoch 8 --> loss = 1.7879344846089682
		model accuracy = 90.13%
Epoch 9 --> loss = 1.770194287745158
		model accuracy = 90.46%
Epoch 10 --> loss = 1.7539946964263915
		model accuracy = 90.70%
Epoch 11 --> loss = 1.7392739086786906
		model accuracy = 90.96%
Epoch 12 --> loss = 1.7258897915522258
		model accuracy = 90.66%
Epoch 13 --> loss = 1.7135099859873453
		model accuracy = 90.84%
Epoch 14 --> loss = 1.70234063650767
		model accuracy = 91.15%
Epoch 15 --> loss = 1.6920346621195475
		model accuracy = 90.25%
Epoch 16 --> loss = 1.6827329060236613


In [8]:
#I used optuna for hyperparameters' optimization
!pip install optuna

Collecting optuna
  Downloading optuna-3.5.0-py3-none-any.whl (413 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m413.4/413.4 kB[0m [31m6.1 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting alembic>=1.5.0 (from optuna)
  Downloading alembic-1.13.0-py3-none-any.whl (230 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m230.6/230.6 kB[0m [31m13.0 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting colorlog (from optuna)
  Downloading colorlog-6.8.0-py3-none-any.whl (11 kB)
Collecting Mako (from alembic>=1.5.0->optuna)
  Downloading Mako-1.3.0-py3-none-any.whl (78 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m78.6/78.6 kB[0m [31m10.3 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: Mako, colorlog, alembic, optuna
Successfully installed Mako-1.3.0 alembic-1.13.0 colorlog-6.8.0 optuna-3.5.0


In [76]:
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

#I define the objective function to minimize (the Mean Squared Error)
def objective(trial):

  #these are the hyperparameters to test
  hidden_layers = int(trial.suggest_float("hidden_layers",0,3))
  learning_rate = trial.suggest_float("learning_rate",1e-5,1e-1)
  n_epochs=int(trial.suggest_float("n_epochs",1,30))

  model=irisNN(hidden_layers)

  for epoch in range(n_epochs):
    running_loss = 0.0

    for inputs, labels in trainloader:
        outputs = model(inputs)
        loss = loss_fn(outputs, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    mse = running_loss / len(trainloader)
    trial.report(mse, epoch+1)
    if trial.should_prune():
      raise optuna.TrialPruned()
  return mse

In [77]:
import optuna

#I make a random search of hyperparameters
study = optuna.create_study(study_name="IrisNN Optimization")
study.optimize(objective, n_trials=10)

#I cast float values of hyperparameters to integers when necessary (e.g. with number of layers)
res=study.best_params.copy()
for k, v in res.items():
    if k!='learning_rate':
      res[k]=int(round(v))

print("Best Params : {}".format(res))
print("\nBest error : {}".format(study.best_value))

[I 2023-12-19 18:29:01,780] A new study created in memory with name: IrisNN Optimization
[I 2023-12-19 18:35:37,025] Trial 0 finished with value: 2.305823643112183 and parameters: {'hidden_layers': 1.1729636855241017, 'learning_rate': 0.07661609597441187, 'n_epochs': 19.20327941177493}. Best is trial 0 with value: 2.305823643112183.
[I 2023-12-19 18:43:56,378] Trial 1 finished with value: 2.3062245498657226 and parameters: {'hidden_layers': 2.8325814152067097, 'learning_rate': 0.002305211690795002, 'n_epochs': 24.57496808639848}. Best is trial 0 with value: 2.305823643112183.
[I 2023-12-19 18:49:45,818] Trial 2 finished with value: 2.3053088305155436 and parameters: {'hidden_layers': 0.9069737194274639, 'learning_rate': 0.09644093962046002, 'n_epochs': 18.86840098944614}. Best is trial 2 with value: 2.3053088305155436.
[I 2023-12-19 18:58:56,421] Trial 3 finished with value: 2.304901186879476 and parameters: {'hidden_layers': 0.5863202361351276, 'learning_rate': 0.06602236925473373, 'n

Best Params : {'hidden_layers': 3, 'learning_rate': 0.02255578191301981, 'n_epochs': 20}

Best error : 2.3037887411753335


In [78]:
#Now I plot some interesting informations about the optimization, starting from the optimization history

from optuna.visualization import plot_contour, plot_edf, plot_optimization_history,\
  plot_parallel_coordinate, plot_param_importances, plot_slice

plot_optimization_history(study)

In [79]:
#contour plot of the study's parameter relationships
plot_contour(study)

In [80]:
#The hyperparameters' importances
plot_param_importances(study)

In [81]:
#Finally, the emprical distribution function plot
plot_edf(study)