**Challenge: Implement a Multiclass Classification Neural Network using PyTorch**

Objective:
Build a neural network using PyTorch to predict handwritten digits of MNIST.

Steps:

1. **Data Preparation**: Load the MNIST dataset using ```torchvision.datasets.MNIST```. Standardize/normalize the features. Split the dataset into training and testing sets using, for example, ```sklearn.model_selection.train_test_split()```. **Bonus scores**: *use PyTorch's built-in* ```DataLoader``` *to split the dataset*.

2. **Neural Network Architecture**: Define a simple feedforward neural network using PyTorch's ```nn.Module```. Design the input layer to match the number of features in the MNIST dataset and the output layer to have as many neurons as there are classes (10). You can experiment with the number of hidden layers and neurons to optimize the performance. **Bonus scores**: *Make your architecture flexible enough to have as many hidden layers as the user wants, and use hyperparameter optimization to select the best number of hidden layers.*

3. **Loss Function and Optimizer**: Choose an appropriate loss function for multiclass classification. Select an optimizer, like SGD (Stochastic Gradient Descent) or Adam.

4. **Training**: Write a training loop to iterate over the dataset.
Forward pass the input through the network, calculate the loss, and perform backpropagation. Update the weights of the network using the chosen optimizer.

5. **Testing**: Evaluate the trained model on the test set. Calculate the accuracy of the model.

6. **Optimization**: Experiment with hyperparameters (learning rate, number of epochs, etc.) to optimize the model's performance. Consider adjusting the neural network architecture for better results. **Notice that you can't use the optimization algorithms from scikit-learn that we saw in lab1: e.g.,** ```GridSearchCV```.


## 1. Data Preparation: Load the MNIST dataset using torchvision.datasets.MNIST. Standardize/normalize the features. Split the dataset into training and testing sets using, for example, sklearn.model_selection.train_test_split(). Bonus scores: use PyTorch's built-in DataLoader to split the dataset.

In [2]:
import torch
from torchvision import datasets, transforms

# Define a transform to normalize the data
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])
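
To make the effect of the transform concrete: `ToTensor` scales pixel values to the range [0, 1], and `Normalize((0.5,), (0.5,))` then computes `(x - 0.5) / 0.5`, which maps them to [-1, 1]. The sketch below reproduces that arithmetic with plain tensor operations; the sample pixel values are made up for illustration.

```python
import torch

# Pixel intensities as ToTensor would produce them: floats in [0, 1].
pixels = torch.tensor([0.0, 0.25, 0.5, 1.0])

# Normalize(mean, std) computes (x - mean) / std for each channel.
mean, std = 0.5, 0.5
normalized = (pixels - mean) / std

print(normalized.tolist())  # [-1.0, -0.5, 0.0, 1.0]
```

Note that this is a rescaling rather than a true standardization. The values 0.1307 and 0.3081 are commonly quoted as the MNIST training mean and standard deviation and could be used instead, but that is an assumption that is not part of this notebook.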


In [3]:
# Download and load the training data
trainset = datasets.MNIST('/content/MNIST_data/', download=True, train=True, transform=transform)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to /content/MNIST_data/MNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 9912422/9912422 [00:00<00:00, 71885523.28it/s]


Extracting /content/MNIST_data/MNIST/raw/train-images-idx3-ubyte.gz to /content/MNIST_data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to /content/MNIST_data/MNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 28881/28881 [00:00<00:00, 15867919.02it/s]

Extracting /content/MNIST_data/MNIST/raw/train-labels-idx1-ubyte.gz to /content/MNIST_data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to /content/MNIST_data/MNIST/raw/t10k-images-idx3-ubyte.gz



100%|██████████| 1648877/1648877 [00:00<00:00, 24091558.40it/s]


Extracting /content/MNIST_data/MNIST/raw/t10k-images-idx3-ubyte.gz to /content/MNIST_data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to /content/MNIST_data/MNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 4542/4542 [00:00<00:00, 4701512.53it/s]

Extracting /content/MNIST_data/MNIST/raw/t10k-labels-idx1-ubyte.gz to /content/MNIST_data/MNIST/raw






In [4]:
from torch.utils.data.sampler import SubsetRandomSampler
import numpy as np

# Define the size of the validation set
validation_size = 0.2

# Get the indices for training and validation
num_train = len(trainset)
indices = list(range(num_train))
np.random.shuffle(indices)
split = int(np.floor(validation_size * num_train))
train_idx, valid_idx = indices[split:], indices[:split]

# Define samplers for obtaining training and validation batches
train_sampler = SubsetRandomSampler(train_idx)
valid_sampler = SubsetRandomSampler(valid_idx)
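
An alternative to shuffling indices by hand is `torch.utils.data.random_split`, which returns two `Subset` objects directly. The sketch below is only an illustration under assumptions: it uses a small synthetic `TensorDataset` as a stand-in for `trainset` so that it runs without downloading anything, and the seed value is arbitrary.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in for trainset: 100 fake 1x28x28 images with labels 0-9.
images = torch.randn(100, 1, 28, 28)
labels = torch.arange(100) % 10
dataset = TensorDataset(images, labels)

# 80/20 split; the seeded generator makes the split reproducible.
generator = torch.Generator().manual_seed(0)
train_subset, valid_subset = random_split(dataset, [80, 20], generator=generator)

print(len(train_subset), len(valid_subset))  # 80 20
```

The resulting subsets can be passed to `DataLoader` with `shuffle=True` instead of a sampler. MNIST also ships with a separate official test split, which `datasets.MNIST` returns when called with `train=False`.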


In [5]:
from torch.utils.data import DataLoader

# How many samples per batch to load
batch_size = 20

# Prepare data loaders
trainloader = DataLoader(trainset, batch_size=batch_size, sampler=train_sampler)
validloader = DataLoader(trainset, batch_size=batch_size, sampler=valid_sampler)
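
It can help to check what a single batch looks like before building the model. The sketch below uses a synthetic dataset (an assumption, so that it runs without the MNIST download) with the same image shape and the same `batch_size` of 20.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset: 100 fake grayscale 28x28 images with labels 0-9.
dataset = TensorDataset(torch.randn(100, 1, 28, 28), torch.arange(100) % 10)
loader = DataLoader(dataset, batch_size=20, shuffle=True)

# Pull one batch and inspect its shapes.
inputs, labels = next(iter(loader))
print(inputs.shape)  # torch.Size([20, 1, 28, 28])
print(labels.shape)  # torch.Size([20])
print(len(loader))   # 5 batches per epoch
```

Each image has 1 x 28 x 28 = 784 values, which is why the network's input layer has 784 features.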


## 2. Neural Network Architecture: Define a simple feedforward neural network using PyTorch's nn.Module. Design the input layer to match the number of features in the MNIST dataset and the output layer to have as many neurons as there are classes (10). You can experiment with the number of hidden layers and neurons to optimize the performance. Bonus scores: Make your architecture flexible enough to have as many hidden layers as the user wants, and use hyperparameter optimization to select the best number of hidden layers.

In [20]:
import torch.nn as nn
import torch.nn.functional as F

class MNIST_Net(nn.Module):

    def __init__(self):
        super().__init__()

        # Define layers
        self.layer1 = nn.Linear(28*28, 128)  # First hidden layer with 128 neurons
        self.act1 = nn.ReLU()                # Apply layer with ReLU activation for hidden layers
        self.layer2 = nn.Linear(128, 64)     # Second hidden layer with 64 neurons
        self.act2 = nn.ReLU()                # Apply layer with ReLU activation for hidden layers
        self.output = nn.Linear(64, 10)      # Output layer with 10 neurons

    def forward(self, x):
        # Flatten the input tensor
        x = x.view(-1, 28*28)
        x = self.act1(self.layer1(x))
        x = self.act2(self.layer2(x))
        x = self.output(x)
        return x

class MNIST_Net_Bonus(nn.Module):

    def __init__(self, input_size, hidden_sizes, output_size):
        super().__init__()

        # Create a list of all layer sizes: input, hidden, and output
        all_sizes = [input_size] + hidden_sizes + [output_size]

        # Create layers dynamically
        self.layers = nn.ModuleList()
        for i in range(len(all_sizes) - 1):
            self.layers.append(nn.Linear(all_sizes[i], all_sizes[i + 1]))

    def forward(self, x):
        # Flatten the input tensor
        x = x.view(-1, 28*28)

        # Pass data through all layers except for the last one using ReLU
        for layer in self.layers[:-1]:
            x = F.relu(layer(x))

        # No activation function for the last layer (output layer)
        x = self.layers[-1](x)

        return x


In [21]:
# model = MNIST_Net()
model_bonus = MNIST_Net_Bonus(784, [128, 64], 10)
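
As a sanity check on the 784 → 128 → 64 → 10 layout, the sketch below builds an equivalent network with `nn.Sequential`, runs one fake batch through it, and counts the trainable parameters. The helper `make_mlp` is hypothetical and only stands in for the classes above so that the example is self-contained.

```python
import torch
import torch.nn as nn

def make_mlp(sizes):
    """Hypothetical helper: Linear layers with ReLU in between, none after the last."""
    layers = []
    for i in range(len(sizes) - 1):
        layers.append(nn.Linear(sizes[i], sizes[i + 1]))
        if i < len(sizes) - 2:
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)

net = make_mlp([784, 128, 64, 10])

# One fake batch of 20 images, flattened to 784 features each.
batch = torch.randn(20, 1, 28, 28)
logits = net(batch.view(-1, 28 * 28))

# (784*128 + 128) + (128*64 + 64) + (64*10 + 10) weights and biases.
n_params = sum(p.numel() for p in net.parameters())
print(logits.shape)  # torch.Size([20, 10])
print(n_params)      # 109386
```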

## 3. Loss Function and Optimizer: Choose an appropriate loss function for multiclass classification. Select an optimizer, like SGD (Stochastic Gradient Descent) or Adam.

In [22]:
# Loss function
loss_function = nn.CrossEntropyLoss()
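
`nn.CrossEntropyLoss` expects raw logits and integer class labels. It applies log-softmax internally, which is why the networks above have no activation on the output layer. The sketch below checks that equivalence on random logits; the seed and the target values are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)             # 4 samples, 10 classes, no softmax applied
targets = torch.tensor([3, 0, 9, 1])    # integer class indices

# CrossEntropyLoss on logits ...
loss_a = nn.CrossEntropyLoss()(logits, targets)
# ... matches log-softmax followed by negative log-likelihood.
loss_b = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(loss_a, loss_b))  # True
```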

In [23]:
import torch.optim as optim

# Optimizer - SGD
# optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Optimizer - Adam
optimizer = optim.Adam(model_bonus.parameters(), lr=0.001)

## 4. Training: Write a training loop to iterate over the dataset. Forward pass the input through the network, calculate the loss, and perform backpropagation. Update the weights of the network using the chosen optimizer.

In [24]:
# Number of epochs
num_epochs = 5

for epoch in range(num_epochs):

    # Loop over the dataset in batches
    for inputs, labels in trainloader:  # trainloader is our DataLoader
        # Forward pass
        outputs = model_bonus(inputs)
        loss = loss_function(outputs, labels)

        # Backward pass and optimization
        optimizer.zero_grad()  # Clear existing gradients
        loss.backward()        # Compute gradients of all variables wrt loss
        optimizer.step()       # Perform updates using calculated gradients

    # Note: this is the loss of the final batch only, so it can fluctuate between epochs
    print(f"Epoch {epoch+1}/{num_epochs}, Loss: {loss.item()}")


Epoch 1/5, Loss: 0.16604934632778168
Epoch 2/5, Loss: 0.5756732225418091
Epoch 3/5, Loss: 0.017475536093115807
Epoch 4/5, Loss: 0.03351627662777901
Epoch 5/5, Loss: 0.33252060413360596
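
The printed loss jumps around because it is the loss of the last batch in each epoch, not an average. A smoother signal is the mean loss over the whole epoch. The sketch below shows one way to track it; the synthetic data, the small network, and the variable names are assumptions chosen so that the example runs quickly on its own.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
# Synthetic stand-in data: 200 fake images with random labels.
dataset = TensorDataset(torch.randn(200, 1, 28, 28), torch.randint(0, 10, (200,)))
loader = DataLoader(dataset, batch_size=20, shuffle=True)

net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 32), nn.ReLU(), nn.Linear(32, 10))
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)

epoch_losses = []
for epoch in range(3):
    running_loss, n_samples = 0.0, 0
    for inputs, labels in loader:
        optimizer.zero_grad()
        loss = criterion(net(inputs), labels)
        loss.backward()
        optimizer.step()
        # Weight each batch loss by its size so the epoch mean is exact.
        running_loss += loss.item() * inputs.size(0)
        n_samples += inputs.size(0)
    epoch_losses.append(running_loss / n_samples)
    print(f"Epoch {epoch + 1}/3, mean loss: {epoch_losses[-1]:.4f}")
```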


## 5. Testing: Evaluate the trained model on the test set. Calculate the accuracy of the model.

In [25]:
# Track the number of correct predictions
correct = 0
total = 0

# Disable gradient computation; we don't need it for evaluation
with torch.no_grad():
    for inputs, labels in validloader:  # validloader serves the held-out 20% validation split (no separate test set is loaded)
        # Forward pass
        outputs = model_bonus(inputs)

        # Get indexes of predictions from the maximum value
        _, predicted = torch.max(outputs.data, 1)

        # Total number of labels
        total += labels.size(0)

        # Total correct predictions
        correct += (predicted == labels).sum().item()

# Calculate the accuracy
accuracy = 100 * correct / total
print(f'Accuracy of the model on the test set: {accuracy}%')


Accuracy of the model on the test set: 96.09166666666667%
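
The accuracy computation can be traced by hand on a tiny example. In the sketch below, the logits and labels are made-up values; `argmax(dim=1)` picks the predicted class per row and gives the same indices as the second return value of `torch.max(outputs, 1)`.

```python
import torch

# Hand-made logits for 4 samples and 3 classes (illustrative values).
outputs = torch.tensor([[2.0, 0.1, 0.3],
                        [0.2, 1.5, 0.1],
                        [0.3, 0.2, 0.9],
                        [1.2, 0.4, 0.1]])
labels = torch.tensor([0, 1, 2, 1])

# Index of the largest logit in each row is the predicted class.
predicted = outputs.argmax(dim=1)
_, predicted_max = torch.max(outputs, 1)

correct = (predicted == labels).sum().item()
accuracy = 100 * correct / labels.size(0)

print(predicted.tolist())  # [0, 1, 2, 0]
print(accuracy)            # 75.0
```

The last sample is misclassified (predicted 0, labeled 1), so 3 of 4 predictions are correct.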


## 6. Optimization: Experiment with hyperparameters (learning rate, number of epochs, etc.) to optimize the model's performance. Consider adjusting the neural network architecture for better results. Notice that you can't use the optimization algorithms from scikit-learn that we saw in lab1: e.g., GridSearchCV.

In [26]:
!pip install optuna

Collecting optuna
  Downloading optuna-3.5.0-py3-none-any.whl (413 kB)
Collecting alembic>=1.5.0 (from optuna)
  Downloading alembic-1.13.1-py3-none-any.whl (233 kB)
Collecting colorlog (from optuna)
  Downloading colorlog-6.8.0-py3-none-any.whl (11 kB)
Collecting Mako (from alembic>=1.5.0->optuna)
  Downloading Mako-1.3.0-py3-none-any.whl (78 kB)
Installing collected packages: Mako, colorlog, alembic, optuna
Successfully installed Mako-1.3.0 alembic-1.13.1 colorlog-6.8.0 optuna-3.5.0


In [28]:
import optuna

def objective(trial):

    # Hyperparameters to optimize
    learning_rate = trial.suggest_categorical('learning_rate', [0.0001, 0.001, 0.01])
    num_epochs = trial.suggest_categorical('num_epochs', [5, 10, 15, 20])
    hidden_size = trial.suggest_categorical('hidden_size', [[128, 64], [128], [128, 64, 16]])
    batch_size = trial.suggest_categorical('batch_size', [64, 256, 512, 1024, 2048])

    trainloader = DataLoader(trainset, batch_size=batch_size, sampler=train_sampler)
    validloader = DataLoader(trainset, batch_size=batch_size, sampler=valid_sampler)
    input_size = 28*28
    output_size = 10
    model_optimize = MNIST_Net_Bonus(input_size, hidden_size, output_size)
    optimizer = optim.Adam(model_optimize.parameters(), lr=learning_rate)

    # Training loop
    model_optimize.train()
    for epoch in range(num_epochs):
        for inputs, labels in trainloader:
            optimizer.zero_grad()
            outputs = model_optimize(inputs)
            loss = loss_function(outputs, labels)
            loss.backward()
            optimizer.step()

    # evaluation
    model_optimize.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for inputs, labels in validloader:
            outputs = model_optimize(inputs)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    accuracy = 100 * correct / total

    return accuracy

# Create a study object and optimize
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=10)

# Get the best hyperparameters
best_params = study.best_params
best_learning_rate = best_params['learning_rate']
best_num_epochs = best_params['num_epochs']
best_hidden_size = best_params['hidden_size']
best_batch_size = best_params['batch_size']


print("Best Hyperparameters:")
print(f"Learning Rate: {best_learning_rate}")
print(f"Number of Epochs: {best_num_epochs}")
print(f"Hidden Size: {best_hidden_size}")
print(f"Batch Size: {best_batch_size}")

[I 2023-12-21 16:03:32,994] A new study created in memory with name: no-name-facf2b60-7068-439e-b818-a663fa7b4c3c
[I 2023-12-21 16:08:35,669] Trial 0 finished with value: 94.49166666666666 and parameters: {'learning_rate': 0.01, 'num_epochs': 15, 'hidden_size': [128, 64, 16], 'batch_size': 64}. Best is trial 0 with value: 94.49166666666666.
[I 2023-12-21 16:14:12,282] Trial 1 finished with value: 93.44166666666666 and parameters: {'learning_rate': 0.0001, 'num_epochs': 20, 'hidden_size': [128, 64], 'batch_size': 512}. Best is trial 0 with value: 94.49166666666666.
[I 2023-12-21 16:18:33,531] Trial 2 finished with value: 97.175 and parameters: {'learning_rate': 0.001, 'num_epochs': 15, 'hidden_size': [128, 64], 'batch_size': 256}. Best is trial 2 with value: 97.175.
[I 2023-12-21 16:20:12,917] Trial 3 finished with value: 92.80833333333334 and parameters: {'learning_rate': 0.0001, 'num_epochs': 5, 'hidden_size': [128], 'batch_size': 64}. Best is trial 2 with value: 97.175.
[I 2023-12-21

Best Hyperparameters:
Learning Rate: 0.001
Number of Epochs: 20
Hidden Size: [128, 64, 16]
Batch Size: 64


This required a lot of time (44 min) but I was in no rush.