**Challenge: Implement a Multiclass Classification Neural Network using PyTorch**

Objective:
Build a neural network using PyTorch to predict handwritten digits of MNIST.

Steps:

1. **Data Preparation**: Load the MNIST dataset using ```torchvision.datasets.MNIST```. Standardize/normalize the features. Split the dataset into training and testing sets using, for example, ```sklearn.model_selection.train_test_split()```. **Bonus scores**: *use PyTorch's built-in* ```DataLoader``` *to split the dataset*.

2. **Neural Network Architecture**: Define a simple feedforward neural network using PyTorch's ```nn.Module```. Design the input layer to match the number of features in the MNIST dataset and the output layer to have as many neurons as there are classes (10). You can experiment with the number of hidden layers and neurons to optimize the performance. **Bonus scores**: *Make your architecture flexible enough to have as many hidden layers as the user wants, and use hyperparameter optimization to select the best number of hidden layers.*

3. **Loss Function and Optimizer**: Choose an appropriate loss function for multiclass classification. Select an optimizer, like SGD (Stochastic Gradient Descent) or Adam.

4. **Training**: Write a training loop to iterate over the dataset.
Forward pass the input through the network, calculate the loss, and perform backpropagation. Update the weights of the network using the chosen optimizer.

5. **Testing**: Evaluate the trained model on the test set. Calculate the accuracy of the model.

6. **Optimization**: Experiment with hyperparameters (learning rate, number of epochs, etc.) to optimize the model's performance. Consider adjusting the neural network architecture for better results. **Notice that you can't use the optimization algorithms from scikit-learn that we saw in lab1: e.g.,** ```GridSearchCV```.


# **1. Data Preparation**

In [None]:
# insert code here
import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader, SubsetRandomSampler
import numpy as np

# Define the transforms (including normalization).
# transforms.Normalize((0.5,), (0.5,)) subtracts 0.5 from every pixel and divides the result by 0.5,
# so the [0, 1] values produced by ToTensor() are rescaled to the range [-1, 1].
# Note that 0.5 and 0.5 are the constants used for the rescaling, not the mean and standard deviation of the result.
# This is a common data preparation step, especially for image data, that tends to improve
# the stability and performance of neural network training.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

# Load the MNIST dataset
dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)

# Compute the indices for the split
dataset_size = len(dataset)
indices = list(range(dataset_size))
split = int(np.floor(0.2 * dataset_size))  # for example, 20% for validation
np.random.shuffle(indices)

train_indices, val_indices = indices[split:], indices[:split]

# Create samplers for training and validation
train_sampler = SubsetRandomSampler(train_indices)
validation_sampler = SubsetRandomSampler(val_indices)

# Create DataLoaders for training and validation
train_loader = DataLoader(dataset, batch_size=64, sampler=train_sampler)
validation_loader = DataLoader(dataset, batch_size=64, sampler=validation_sampler)
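
As a quick check of what the normalization and the split do, here is a minimal sketch on synthetic data, so nothing has to be downloaded. It assumes pixel values already in [0, 1], as produced by `ToTensor()`, and it uses `torch.utils.data.random_split` as one possible alternative to the index-based samplers; the tensor names, the sizes, and the fixed seed are illustrative assumptions, not part of the original solution.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Synthetic stand-in for MNIST (assumption): 100 grayscale 28x28 images
# with values in [0, 1] and integer labels in 0..9.
images = torch.rand(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))

# Same arithmetic as Normalize((0.5,), (0.5,)): (x - mean) / std
normalized = (images - 0.5) / 0.5

fake_dataset = TensorDataset(normalized, labels)

# 80/20 split; the seeded generator makes the split reproducible
generator = torch.Generator().manual_seed(0)
train_set, val_set = random_split(fake_dataset, [80, 20], generator=generator)

train_loader_demo = DataLoader(train_set, batch_size=16, shuffle=True)
val_loader_demo = DataLoader(val_set, batch_size=16, shuffle=False)

# Values now lie in [-1, 1]; the two subsets do not overlap
print(normalized.min().item() >= -1.0, normalized.max().item() <= 1.0)
print(len(train_set), len(val_set))
```

With `random_split` each subset is its own dataset, so the training loader can simply use `shuffle=True` instead of a sampler.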








# **2. Neural Network Architecture**


In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class FlexibleFeedforwardNN(nn.Module):
    def __init__(self, input_size, hidden_sizes, output_size):
        super(FlexibleFeedforwardNN, self).__init__()

        # Create a list to hold the layers
        self.layers = nn.ModuleList()

        # Add the first layer (input layer)
        self.layers.append(nn.Linear(input_size, hidden_sizes[0]))

        # Add hidden layers, if more than one size is given
        for i in range(1, len(hidden_sizes)):
            self.layers.append(nn.Linear(hidden_sizes[i - 1], hidden_sizes[i]))

        # Add the output layer
        self.layers.append(nn.Linear(hidden_sizes[-1], output_size))

    def forward(self, x):
        # Flatten the input tensor
        # x.view(x.size(0), -1) reshapes the tensor x so that every element of the batch becomes a one-dimensional vector.
        # For example, if x is a batch of 64 MNIST images loaded with ToTensor(), each image has 1 channel
        # and 28x28 pixels, so x initially has shape [64, 1, 28, 28].
        # After x.view(x.size(0), -1), the shape of x becomes [64, 784],
        # where 784 = 1 * 28 * 28 is the result of flattening each image.
        x = x.view(x.size(0), -1)

        # Pass through each layer except the last one, applying the activation function.
        # In this loop, every layer (except the last) performs a linear transformation followed by a ReLU activation.
        # layer(x) is a linear transformation because layer is an nn.Linear object,
        # and calling layer(x) invokes the forward method of nn.Linear.
        for layer in self.layers[:-1]:
            x = F.relu(layer(x))

        # No activation function for the last layer (output layer)
        x = self.layers[-1](x)
        return x


In [None]:
# Number of features for the MNIST dataset
input_size = 784  # 28x28 images
# Example hidden layer sizes
# This approach makes the network flexible: the number and sizes of the hidden layers
# can be changed simply by editing this list.
# For example, for a network with three hidden layers of sizes 128, 64 and 32,
# set hidden_sizes = [128, 64, 32].
hidden_sizes = [128, 64]
# Number of classes in the MNIST dataset
output_size = 10

# Create the neural network
net = FlexibleFeedforwardNN(input_size, hidden_sizes, output_size)
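
The same flexible architecture can also be expressed with `nn.Sequential`, which makes it easy to verify the output shape on a dummy batch. The sketch below is an alternative formulation, not the class defined above; `build_mlp` is a hypothetical helper name, and the dummy batch of shape [64, 1, 28, 28] assumes images loaded with `ToTensor()`.

```python
import torch
import torch.nn as nn

def build_mlp(input_size, hidden_sizes, output_size):
    """Hypothetical helper: Flatten, then Linear+ReLU pairs, then a final Linear."""
    layers = [nn.Flatten()]
    previous = input_size
    for size in hidden_sizes:
        layers.append(nn.Linear(previous, size))
        layers.append(nn.ReLU())
        previous = size
    # Final layer returns raw logits (no softmax), as CrossEntropyLoss expects
    layers.append(nn.Linear(previous, output_size))
    return nn.Sequential(*layers)

model = build_mlp(784, [128, 64], 10)

# Dummy batch with the shape of 64 MNIST images: [batch, channel, height, width]
dummy_batch = torch.randn(64, 1, 28, 28)
logits = model(dummy_batch)
print(logits.shape)

# Number of trainable parameters (weights and biases of the three Linear layers)
n_params = sum(p.numel() for p in model.parameters())
print(n_params)
```

Unlike the class above, this variant also accepts an empty `hidden_sizes` list, in which case it reduces to a single linear layer.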


# **3. Loss Function and Optimizer**

In [None]:
import torch.optim as optim
import torch.nn as nn

In [None]:
# Assume 'net' is the instance of your neural network
# For example: net = FlexibleFeedforwardNN(input_size, hidden_sizes, output_size)

# Define the loss function (Cross-Entropy Loss)
criterion = nn.CrossEntropyLoss()

# Define the optimizer (You can choose between SGD, Adam, etc.)
# Using Adam optimizer
optimizer = optim.Adam(net.parameters(), lr=0.001)

# Or, using SGD optimizer
# optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)
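
To see why the network's last layer has no activation, the following minimal sketch checks that `nn.CrossEntropyLoss` applied to raw logits matches `log_softmax` followed by the negative log-likelihood loss. The logits and targets are made-up values used only for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # raw scores: 4 samples, 10 classes
targets = torch.tensor([3, 0, 9, 1])   # class indices, not one-hot vectors

# CrossEntropyLoss takes raw logits and applies log-softmax internally
loss_ce = nn.CrossEntropyLoss()(logits, targets)

# Equivalent two-step computation
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(loss_ce, loss_manual))  # True
```

Adding a softmax layer before this loss would therefore apply the normalization twice.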


# **4. Training**

In [None]:
# Assuming the following have been defined:
# net - instance of the neural network
# criterion - loss function
# optimizer - optimization algorithm
# train_loader - DataLoader with training data

num_epochs = 5  # Number of training epochs

for epoch in range(num_epochs):  # loop over the dataset multiple times
    running_loss = 0.0

    for i, data in enumerate(train_loader, 0):
        # get the inputs; data is a tuple of (inputs, labels)
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward pass
        outputs = net(inputs)

        # calculate the loss
        loss = criterion(outputs, labels)

        # backward pass (backpropagation)
        loss.backward()

        # update the weights
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 200 == 199:    # print every 200 mini-batches
            print(f'[{epoch + 1}, {i + 1}] loss: {running_loss / 200:.3f}')
            running_loss = 0.0

print('Finished Training')


[1, 200] loss: 0.735
[1, 400] loss: 0.369
[1, 600] loss: 0.310
[2, 200] loss: 0.230
[2, 400] loss: 0.223
[2, 600] loss: 0.204
[3, 200] loss: 0.161
[3, 400] loss: 0.169
[3, 600] loss: 0.148
[4, 200] loss: 0.130
[4, 400] loss: 0.119
[4, 600] loss: 0.126
[5, 200] loss: 0.104
[5, 400] loss: 0.093
[5, 600] loss: 0.109
Finished Training
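
Since the same loop is needed again during hyperparameter search, it can be convenient to wrap one epoch in a function. The sketch below is one possible way to do that, run on a small synthetic two-class problem rather than MNIST; `train_one_epoch`, the toy data, and the tiny model are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train_one_epoch(model, loader, criterion, optimizer):
    """Hypothetical helper: one pass over the loader, returns the mean batch loss."""
    model.train()
    total_loss = 0.0
    for inputs, labels in loader:
        optimizer.zero_grad()                    # reset accumulated gradients
        loss = criterion(model(inputs), labels)  # forward pass and loss
        loss.backward()                          # backpropagation
        optimizer.step()                         # weight update
        total_loss += loss.item()
    return total_loss / len(loader)

torch.manual_seed(0)
# Toy data (assumption): 256 samples, 20 features, label = sign of the first feature
x = torch.randn(256, 20)
y = (x[:, 0] > 0).long()
loader = DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)

model = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 2))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

losses = [train_one_epoch(model, loader, criterion, optimizer) for _ in range(5)]
print([round(value, 3) for value in losses])
```

Returning the mean loss per epoch gives a single number to compare across runs, instead of the running averages printed every 200 mini-batches.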


# **5. Testing**

In [None]:
# Set the model to evaluation mode
net.eval()

# Initialize the counters
correct = 0
total = 0

# Disable gradient calculations
with torch.no_grad():
    for data in validation_loader:
        images, labels = data

        # Forward pass
        outputs = net(images)

        # Get the predicted class with the highest score
        # (inside torch.no_grad() the outputs can be used directly, without .data)
        _, predicted = torch.max(outputs, 1)

        # Update the counters
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

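# Calculate the accuracy
# Note: validation_loader holds the 20% split held out from the training data,
# not the official MNIST test set (which is loaded with train=False).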
accuracy = 100 * correct / total
print(f'Accuracy of the model on the test images: {accuracy:.2f}%')


Accuracy of the model on the test images: 96.54%
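
The evaluation steps can likewise be collected in a small function. This is a minimal sketch under stated assumptions: `evaluate` is a hypothetical helper name, and the check uses `nn.Identity()` as a stand-in model on hand-made data so that the expected accuracy is known in advance.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader):
    """Hypothetical helper: percentage of correctly classified samples."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for inputs, labels in loader:
            predicted = model(inputs).argmax(dim=1)  # index of the highest score
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    return 100.0 * correct / total

# Deterministic check: the "model" returns its input as logits,
# so sample i is predicted as class i.
inputs = torch.eye(4)
labels = torch.tensor([0, 1, 2, 0])  # the last label is deliberately wrong
loader = DataLoader(TensorDataset(inputs, labels), batch_size=2)

acc = evaluate(nn.Identity(), loader)
print(acc)  # 75.0
```

The same function could be called with `net` and `validation_loader` from the cells above, or with a loader built from the official test split.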


# **6. Optimization**

In [None]:
import torch.optim as optim

# Define a list of learning rates and epochs to try
learning_rates = [0.001, 0.01, 0.1]
num_epochs_options = [5, 10, 15]

# Store the best accuracy found
best_accuracy = 0
best_lr = None
best_epoch = None

for lr in learning_rates:
    for num_epochs in num_epochs_options:
        # Initialize the neural network
        net = FlexibleFeedforwardNN(input_size, hidden_sizes, output_size)

        # Define the loss function and optimizer
        criterion = nn.CrossEntropyLoss()
        optimizer = optim.Adam(net.parameters(), lr=lr)

        # Training loop

        for epoch in range(num_epochs):  # loop over the dataset multiple times
            running_loss = 0.0

            for i, data in enumerate(train_loader, 0):
                # get the inputs; data is a tuple of (inputs, labels)
                inputs, labels = data

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward pass
                outputs = net(inputs)

                # calculate the loss
                loss = criterion(outputs, labels)

                # backward pass (backpropagation)
                loss.backward()

                # update the weights
                optimizer.step()

                # print statistics
                running_loss += loss.item()
                if i % 200 == 199:    # print every 200 mini-batches
                    print(f'[{epoch + 1}, {i + 1}] loss: {running_loss / 200:.3f}')
                    running_loss = 0.0

        print('Finished Training')


        # Testing the model on the validation split
        net.eval()
        correct = 0
        total = 0
        with torch.no_grad():
            for data in validation_loader:
                images, labels = data
                outputs = net(images)
                _, predicted = torch.max(outputs, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

        accuracy = 100 * correct / total
        print(f'LR: {lr}, Epochs: {num_epochs}, Accuracy: {accuracy:.2f}%')

        # Record the best hyperparameters found so far
        if accuracy > best_accuracy:
            best_accuracy = accuracy
            best_lr = lr
            best_epoch = num_epochs

# Report the best configuration once the whole grid has been explored
print(f'Best Accuracy: {best_accuracy:.2f}%, Learning Rate: {best_lr}, Epochs: {best_epoch}')


[1, 200] loss: 0.774
[1, 400] loss: 0.364
[1, 600] loss: 0.310
[2, 200] loss: 0.238
[2, 400] loss: 0.216
[2, 600] loss: 0.206
[3, 200] loss: 0.170
[3, 400] loss: 0.165
[3, 600] loss: 0.156
[4, 200] loss: 0.139
[4, 400] loss: 0.126
[4, 600] loss: 0.129
[5, 200] loss: 0.108
[5, 400] loss: 0.105
[5, 600] loss: 0.112
Finished Training
LR: 0.001, Epochs: 5, Accuracy: 95.50%
[1, 200] loss: 0.724
[1, 400] loss: 0.354
[1, 600] loss: 0.333
[2, 200] loss: 0.244
[2, 400] loss: 0.219
[2, 600] loss: 0.193
[3, 200] loss: 0.168
[3, 400] loss: 0.145
[3, 600] loss: 0.158
[4, 200] loss: 0.119
[4, 400] loss: 0.121
[4, 600] loss: 0.122
[5, 200] loss: 0.103
[5, 400] loss: 0.104
[5, 600] loss: 0.106
[6, 200] loss: 0.093
[6, 400] loss: 0.094
[6, 600] loss: 0.084
[7, 200] loss: 0.077
[7, 400] loss: 0.078
[7, 600] loss: 0.083
[8, 200] loss: 0.063
[8, 400] loss: 0.076
[8, 600] loss: 0.071
[9, 200] loss: 0.071
[9, 400] loss: 0.062
[9, 600] loss: 0.067
[10, 200] loss: 0.051
[10, 400] loss: 0.058
[10, 600] loss: 0

To iterate over the number of layers as well, the following code would be needed.
To save computation time, this step is not executed here; it is a straightforward extension of the search above.

In [None]:
import torch.optim as optim

# Define a list of hidden-layer configurations
hidden_layers_configurations = [
    [128, 64],  # Network with 2 hidden layers (128 and 64 neurons)
    [128, 128, 64],  # Network with 3 hidden layers
    [256, 128, 64, 32]  # Network with 4 hidden layers
    # ... other configurations ...
]

# Define a list of learning rates and epoch counts to try
learning_rates = [0.001, 0.01, 0.1]
num_epochs_options = [5, 10, 15]

# Initialize the variables that track the best configuration
best_accuracy = 0
best_config = None

for hidden_sizes in hidden_layers_configurations:
    for lr in learning_rates:
        for num_epochs in num_epochs_options:
            # Initialize the neural network with the current configuration
            net = FlexibleFeedforwardNN(input_size, hidden_sizes, output_size)

            # Define the loss function and the optimizer
            criterion = nn.CrossEntropyLoss()
            optimizer = optim.Adam(net.parameters(), lr=lr)

            # Training loop (same steps as in section 4)
            net.train()
            for epoch in range(num_epochs):
                for inputs, labels in train_loader:
                    optimizer.zero_grad()
                    loss = criterion(net(inputs), labels)
                    loss.backward()
                    optimizer.step()

            # Model evaluation on the validation split (same steps as in section 5)
            net.eval()
            correct = 0
            total = 0
            with torch.no_grad():
                for images, labels in validation_loader:
                    predicted = net(images).argmax(dim=1)
                    total += labels.size(0)
                    correct += (predicted == labels).sum().item()
            accuracy = 100 * correct / total

            # Update the best configuration if this one is better
            if accuracy > best_accuracy:
                best_accuracy = accuracy
                best_config = (hidden_sizes, lr, num_epochs)

print(f'Best Accuracy: {best_accuracy:.2f}%, Configuration: {best_config}')
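
The three nested loops can be flattened with `itertools.product`, and the weights of the best model can be kept alongside its hyperparameters. The sketch below illustrates the idea on small synthetic data with full-batch training so that it runs in a moment; `make_net`, the toy data, and the reduced search space are assumptions for illustration, not the configuration used above.

```python
import copy
import itertools

import torch
import torch.nn as nn

torch.manual_seed(0)
# Toy data standing in for MNIST (assumption): label = sign of the first feature
x_train = torch.randn(200, 20)
y_train = (x_train[:, 0] > 0).long()
x_val = torch.randn(100, 20)
y_val = (x_val[:, 0] > 0).long()

def make_net(hidden_sizes):
    """Hypothetical helper: small MLP with the given hidden layer sizes."""
    layers, previous = [], 20
    for size in hidden_sizes:
        layers += [nn.Linear(previous, size), nn.ReLU()]
        previous = size
    layers.append(nn.Linear(previous, 2))
    return nn.Sequential(*layers)

search_space = {
    "hidden_sizes": [[16], [32, 16]],
    "lr": [0.001, 0.01],
    "num_epochs": [5, 10],
}

best = {"accuracy": -1.0, "config": None, "state_dict": None}
for hidden_sizes, lr, num_epochs in itertools.product(*search_space.values()):
    net = make_net(hidden_sizes)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)

    net.train()
    for _ in range(num_epochs):  # full-batch steps keep the sketch short
        optimizer.zero_grad()
        criterion(net(x_train), y_train).backward()
        optimizer.step()

    net.eval()
    with torch.no_grad():
        hits = (net(x_val).argmax(dim=1) == y_val).float().mean().item()
    accuracy = 100.0 * hits

    if accuracy > best["accuracy"]:
        best = {
            "accuracy": accuracy,
            "config": (hidden_sizes, lr, num_epochs),
            # deepcopy so that later training runs cannot modify the saved weights
            "state_dict": copy.deepcopy(net.state_dict()),
        }

print(best["config"], f"{best['accuracy']:.2f}%")
```

The saved weights can be restored with `make_net(best["config"][0]).load_state_dict(best["state_dict"])`, which avoids retraining the winning configuration.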
