**Challenge: Implement a Multiclass Classification Neural Network using PyTorch**

Objective:
Build a neural network using PyTorch to predict handwritten digits of MNIST.

Steps:

1. **Data Preparation**: Load the MNIST dataset using ```torchvision.datasets.MNIST```. Standardize/normalize the features. Split the dataset into training and testing sets using, for example, ```sklearn.model_selection.train_test_split()```. **Bonus scores**: *use PyTorch's built-in* ```DataLoader``` *to split the dataset*.

2. **Neural Network Architecture**: Define a simple feedforward neural network using PyTorch's ```nn.Module```. Design the input layer to match the number of features in the MNIST dataset and the output layer to have as many neurons as there are classes (10). You can experiment with the number of hidden layers and neurons to optimize the performance. **Bonus scores**: *Make your architecture flexible enough to have as many hidden layers as the user wants, and use hyperparameter optimization to select the best number of hidden layers.*

3. **Loss Function and Optimizer**: Choose an appropriate loss function for multiclass classification. Select an optimizer, like SGD (Stochastic Gradient Descent) or Adam.

4. **Training**: Write a training loop to iterate over the dataset. Forward pass the input through the network, calculate the loss, and perform backpropagation. Update the weights of the network using the chosen optimizer.

5. **Testing**: Evaluate the trained model on the test set. Calculate the accuracy of the model.

6. **Optimization**: Experiment with hyperparameters (learning rate, number of epochs, etc.) to optimize the model's performance. Consider adjusting the neural network architecture for better results. **Notice that you can't use the optimization algorithms from scikit-learn that we saw in lab1: e.g.,** ```GridSearchCV```.
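
Since step 6 rules out ```GridSearchCV```, the search over hyperparameters has to be written by hand. The following is a minimal sketch of how a grid could be enumerated in plain Python. The `search_space` dictionary and the `build_grid` helper are hypothetical names with illustrative values, not part of the assignment; each resulting dictionary has the same keys as the `params` entries used later in this notebook.

```python
from itertools import product

# Hypothetical search space; the values are illustrative only.
search_space = {
    "lr": [0.01, 0.001],
    "epochs": [15, 20],
    "hidden_layers": [[64, 32], [128, 64, 32]],
}

def build_grid(space):
    """Return one dict per combination of the values in `space`."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*space.values())]

grid = build_grid(search_space)
print(len(grid))  # 2 * 2 * 2 = 8 combinations
print(grid[0])    # {'lr': 0.01, 'epochs': 15, 'hidden_layers': [64, 32]}
```

Each entry of `grid` can then be passed to a training run, keeping the configuration with the best score.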


**1. Data Preparation**

In [15]:
import torch
import torch.nn as nn
import torchvision
from torchvision import transforms
from sklearn.model_selection import train_test_split

# Step 1: Data preparation

transform = transforms.Compose([
    transforms.ToTensor(),
    # (x - 0.5) / 0.5 maps pixel values from [0, 1] to [-1, 1]
    transforms.Normalize((0.5,), (0.5,))
])

# Load the MNIST dataset
train_dataset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)

# Split the training data into training and validation sets.
# Note: .data and .targets are the raw tensors, so `transform` is not applied to this split.
train_data, val_data, train_labels, val_labels = train_test_split(train_dataset.data, train_dataset.targets, test_size=0.2, random_state=42)
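
The bonus in step 1 asks for a PyTorch-native split. One possible approach, sketched here under assumptions, is `torch.utils.data.random_split` followed by one `DataLoader` per subset. To keep the snippet runnable without a download, it uses a synthetic `TensorDataset` as a stand-in for MNIST; the names `full_train`, `train_subset`, and `val_subset` are illustrative and do not appear in the notebook.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Synthetic stand-in for MNIST so the snippet runs without a download:
# 100 fake 1x28x28 images with labels in 0..9.
images = torch.randn(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))
full_train = TensorDataset(images, labels)

# 80/20 split; the seeded generator makes the split reproducible.
generator = torch.Generator().manual_seed(42)
train_subset, val_subset = random_split(full_train, [80, 20], generator=generator)

train_loader = DataLoader(train_subset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_subset, batch_size=32, shuffle=False)

batch_images, batch_labels = next(iter(train_loader))
print(len(train_subset), len(val_subset))  # 80 20
print(batch_images.shape)                  # torch.Size([32, 1, 28, 28])
```

Because each subset indexes into the wrapped dataset, the same call applied to `train_dataset` would keep its `transform` pipeline, unlike slicing `train_dataset.data` directly.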






**2. Neural Network Architecture**

In [20]:
# Neural network definition
class NeuralNetwork(nn.Module):
    def __init__(self, input_size, output_size, hidden_layers):
        super(NeuralNetwork, self).__init__()
        self.input_size = input_size
        self.output_size = output_size
        self.hidden_layers = hidden_layers

        # Create the linear modules for the hidden layers
        self.hidden = nn.ModuleList([nn.Linear(self.input_size, self.hidden_layers[0])])
        layer_sizes = zip(self.hidden_layers[:-1], self.hidden_layers[1:])
        self.hidden.extend([nn.Linear(h1, h2) for h1, h2 in layer_sizes])

        # Output layer
        self.output = nn.Linear(self.hidden_layers[-1], self.output_size)

        # Activation function
        self.relu = nn.ReLU()

    def forward(self, x):
        for linear in self.hidden:
            x = self.relu(linear(x))
        x = self.output(x)
        return x

params = [
    {"lr": 0.01, "epochs": 15, "hidden_layers": [64, 32]},
    {"lr": 0.001, "epochs": 15, "hidden_layers": [64, 32]},
    {"lr": 0.001, "epochs": 20, "hidden_layers": [64, 32]},
    {"lr": 0.0001, "epochs": 20, "hidden_layers": [64, 32]}
]
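
When comparing `hidden_layers` settings, it can help to know how many trainable parameters each one implies. The sketch below counts them by hand for a stack of fully connected layers; `count_parameters` is a hypothetical helper, not part of the notebook.

```python
def count_parameters(input_size, hidden_layers, output_size):
    """Count weights and biases of a fully connected stack of Linear layers."""
    sizes = [input_size, *hidden_layers, output_size]
    # Each Linear(n_in, n_out) holds n_in * n_out weights plus n_out biases.
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))

print(count_parameters(28 * 28, [64, 32], 10))  # 52650
```

The same number can be cross-checked on an instantiated model with `sum(p.numel() for p in model.parameters())`.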

**3-4-5. Loss Function, Optimizer, Training and Testing**
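
Step 3 asks for a loss suited to multiclass classification. `nn.CrossEntropyLoss` takes raw logits and integer class labels and applies log-softmax internally, which is why the network's `forward` ends without a softmax layer. A small check of that equivalence, using made-up logits (illustrative values, not taken from the notebook):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Made-up logits for a batch of 2 samples and 3 classes.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])  # integer class indices, not one-hot vectors

loss = nn.CrossEntropyLoss()(logits, targets)

# Same quantity computed by hand: mean negative log-softmax of the true class.
manual = -F.log_softmax(logits, dim=1)[torch.arange(2), targets].mean()

print(torch.allclose(loss, manual))  # True
```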


In [22]:
for idx, param_set in enumerate(params, start=1):
    print(f"Running Test {idx}:")
    
    # Create the model instance
    input_size = 28 * 28
    output_size = 10
    model = NeuralNetwork(input_size, output_size, param_set["hidden_layers"])

    # Define the loss criterion and the optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=param_set["lr"])

    # Define the number of epochs and the batch size
    num_epochs = param_set["epochs"]
    batch_size = 32

    # Create the DataLoader for the training set
    train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)

    # Train the model
    for epoch in range(num_epochs):
        running_loss = 0.0
        for i, (inputs, labels) in enumerate(train_loader, 0):
            inputs = inputs.view(-1, 28*28)

            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()

            if (i+1) % 1000 == 0:
                print(f'Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{len(train_loader)}], Loss: {running_loss/1000:.4f}')
                running_loss = 0.0
                
    # Evaluate accuracy on the test set
    test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)
    model.eval()

    correct = 0
    total = 0

    with torch.no_grad():
        for inputs, labels in test_loader:
            inputs = inputs.view(-1, 28*28)
            outputs = model(inputs)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    accuracy = 100 * correct / total
    print(f'Accuracy on the test set: {accuracy:.2f}%')
    print("=" * 30)
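
The step counter printed by the loop, `len(train_loader)`, is the number of batches per epoch. A quick arithmetic check, assuming the standard 60,000-image MNIST training split and the `DataLoader` default of keeping the last partial batch:

```python
import math

train_size = 60_000  # images in the MNIST training split
batch_size = 32      # value used in the cell above

steps_per_epoch = math.ceil(train_size / batch_size)
print(steps_per_epoch)  # 1875

# An 80% subset would give fewer steps, so a counter of 1875 indicates
# that the full training set is being iterated.
subset_size = train_size * 4 // 5
print(math.ceil(subset_size / batch_size))  # 1500
```

The output of the original cell follows.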


Running Test 1:
Epoch [1/15], Step [1000/1875], Loss: 0.5031
Epoch [2/15], Step [1000/1875], Loss: 0.3080
Epoch [3/15], Step [1000/1875], Loss: 0.2934
Epoch [4/15], Step [1000/1875], Loss: 0.2912
Epoch [5/15], Step [1000/1875], Loss: 0.2752
Epoch [6/15], Step [1000/1875], Loss: 0.2679
Epoch [7/15], Step [1000/1875], Loss: 0.2598
Epoch [8/15], Step [1000/1875], Loss: 0.2648
Epoch [9/15], Step [1000/1875], Loss: 0.2669
Epoch [10/15], Step [1000/1875], Loss: 0.2559
Epoch [11/15], Step [1000/1875], Loss: 0.2592
Epoch [12/15], Step [1000/1875], Loss: 0.2508
Epoch [13/15], Step [1000/1875], Loss: 0.2442
Epoch [14/15], Step [1000/1875], Loss: 0.2478
Epoch [15/15], Step [1000/1875], Loss: 0.2373
Accuracy on the test set: 92.61%
Running Test 2:
Epoch [1/15], Step [1000/1875], Loss: 0.5095
Epoch [2/15], Step [1000/1875], Loss: 0.2301
Epoch [3/15], Step [1000/1875], Loss: 0.1721
Epoch [4/15], Step [1000/1875], Loss: 0.1446
Epoch [5/15], Step [1000/1875], Loss: 0.1199
Epoch [6/15], Step [1000/1875