**Challenge: Implement a Multiclass Classification Neural Network using PyTorch**

Objective:
Build a neural network using PyTorch to predict handwritten digits of MNIST.

Steps:

1. **Data Preparation**: Load the MNIST dataset using ```torchvision.datasets.MNIST```. Standardize/normalize the features. Split the dataset into training and testing sets using, for example, ```sklearn.model_selection.train_test_split()```. **Bonus scores**: *use PyTorch's built-* ```DataLoader``` *to split the dataset*.

2. **Neural Network Architecture**: Define a simple feedforward neural network using PyTorch's ```nn.Module```. Design the input layer to match the number of features in the MNIST dataset and the output layer to have as many neurons as there are classes (10). You can experiment with the number of hidden layers and neurons to optimize the performance. **Bonus scores**: *Make your architecture flexibile to have as many hidden layers as the user wants, and use hyperparameter optimization to select the best number of hidden layeres.*

3. **Loss Function and Optimizer**: Choose an appropriate loss function for multiclass classification. Select an optimizer, like SGD (Stochastic Gradient Descent) or Adam.

4. **Training**: Write a training loop to iterate over the dataset.
Forward pass the input through the network, calculate the loss, and perform backpropagation. Update the weights of the network using the chosen optimizer.

5. **Testing**: Evaluate the trained model on the test set. Calculate the accuracy of the model.

6. **Optimization**: Experiment with hyperparameters (learning rate, number of epochs, etc.) to optimize the model's performance. Consider adjusting the neural network architecture for better results. **Notice that you can't use the optimization algorithms from scikit-learn that we saw in lab1: e.g.,** ```GridSearchCV```.


In [27]:
# insert code here
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader, Subset

In [28]:
transform = transforms.Compose([transforms.ToTensor()])

In [29]:
trainset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
testset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

In [30]:
batch_size = 32

trainloader = DataLoader(trainset, batch_size=batch_size, shuffle=True)
testloader = DataLoader(testset, batch_size=batch_size, shuffle=True)


In [31]:
class NeuralNetwork(nn.Module):
    def __init__(self, input_size, output_size, hidden_layers):
        super(NeuralNetwork, self).__init__()
        layers = []
        self.input_size = input_size
        self.output_size = output_size
        
        # input layer
        layers.append(nn.Linear(input_size, hidden_layers[0]))
        layers.append(nn.ReLU())
        
        # hidden layers
        for i in range(len(hidden_layers) - 1):
            layers.append(nn.Linear(hidden_layers[i], hidden_layers[i+1]))
            layers.append(nn.ReLU())
        
        # output layer
        layers.append(nn.Linear(hidden_layers[-1], output_size))
        self.model = nn.Sequential(*layers)
    
    def forward(self, x):
        x = x.view(-1, self.input_size)  
        return self.model(x)


In [32]:
# Hyperparameters
input_size = 28 * 28
output_size = 10
hidden_layers = [128, 64]

In [33]:
model = NeuralNetwork(input_size, output_size, hidden_layers)


In [34]:
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

In [35]:
# Training Loop with accuracy

n_epochs = 20
for epoch in range(n_epochs):
  losses = []
  for inputs, labels in trainloader:
    # forward, backward, and then weight update
    y_pred = model(inputs)
    loss = loss_fn(y_pred, labels)
    losses.append(loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

  print(f'Epoch {epoch + 1} --> loss = {np.mean(losses)}')

  acc = 0
  count = 0
  for inputs, labels in testloader:
    y_pred = model(inputs)
    acc += (torch.argmax(y_pred, 1) == labels).float().sum()
    count += len(labels)
  acc /= count
  print("Epoch %d: model accuracy %.2f%%" % (epoch, acc*100))

torch.save(model.state_dict(), "cifar10model.pth")

Epoch 1 --> loss = 1.1600477184295654
Epoch 0: model accuracy 88.17%
Epoch 2 --> loss = 0.37913040666977565
Epoch 1: model accuracy 90.75%
Epoch 3 --> loss = 0.31166078842282297
Epoch 2: model accuracy 92.02%
Epoch 4 --> loss = 0.2747464917043845
Epoch 3: model accuracy 92.90%
Epoch 5 --> loss = 0.2458111511905988
Epoch 4: model accuracy 93.45%
Epoch 6 --> loss = 0.22104715250929197
Epoch 5: model accuracy 94.03%
Epoch 7 --> loss = 0.19925018932620683
Epoch 6: model accuracy 94.48%
Epoch 8 --> loss = 0.18024860512316226
Epoch 7: model accuracy 95.05%
Epoch 9 --> loss = 0.16365376180261373
Epoch 8: model accuracy 95.19%
Epoch 10 --> loss = 0.14939023410975932
Epoch 9: model accuracy 95.76%
Epoch 11 --> loss = 0.1367811261107524
Epoch 10: model accuracy 96.03%
Epoch 12 --> loss = 0.12597361603528262
Epoch 11: model accuracy 96.23%
Epoch 13 --> loss = 0.11655820522233844
Epoch 12: model accuracy 96.25%
Epoch 14 --> loss = 0.10849703327616056
Epoch 13: model accuracy 96.58%
Epoch 15 --> lo

In [38]:
model.eval()
correct = 0
total = 0

In [39]:
# Accuracy

with torch.no_grad():
    for images, labels in testloader:
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = correct / total
print(f"Accuracy on test set: {accuracy * 100:.2f}%")

Accuracy on test set: 97.20%


In [40]:
# Optimizing

learning_rates = [0.001, 0.01, 0.1]
num_epochs_list = [5, 10, 15]
hidden_layer_sizes = [[128, 64], [256, 128, 64], [64, 32]]
batch_sizes = [32, 64, 128]

best_accuracy = 0.0
best_hyperparameters = {}

# Iterate through hyperparameters
for lr in learning_rates:
    for num_epochs in num_epochs_list:
        for hidden_layers in hidden_layer_sizes:
            for batch_size in batch_sizes:
                # DataLoader 
                trainloader = DataLoader(trainset, batch_size=batch_size, shuffle=True)
                testloader = DataLoader(testset,batch_size=batch_size, shuffle=False)
                
                # create the model
                model = NeuralNetwork(input_size, output_size, hidden_layers)
                optimizer = optim.SGD(model.parameters(), lr=lr)
                
                # train the model and evaluate
                n_epochs = 20
                for epoch in range(n_epochs):
                  losses = []
                  for inputs, labels in trainloader:
                    # forward, backward and weight update
                    y_pred = model(inputs)
                    loss = loss_fn(y_pred, labels)
                    losses.append(loss.item())
                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()
                  print(f'Epoch {epoch + 1} --> loss = {np.mean(losses)}')
                  acc = 0
                  count = 0
                  for inputs, labels in testloader:
                    y_pred = model(inputs)
                    acc += (torch.argmax(y_pred, 1) == labels).float().sum()
                    count += len(labels)
                  acc /= count
                  print("Epoch %d: model accuracy %.2f%%" % (epoch, acc*100))
                
                # Check if hyperparameters give a better accuracy
                if accuracy > best_accuracy:
                    best_accuracy = accuracy
                    best_hyperparameters = {
                        'learning_rate': lr,
                        'num_epochs': num_epochs,
                        'hidden_layers': hidden_layers,
                        'batch_size': batch_size
                    }

print("Best Accuracy:", best_accuracy)
print("Best Hyperparameters:", best_hyperparameters)


Epoch 1 --> loss = 2.279031266784668
Epoch 0: model accuracy 30.22%
Epoch 2 --> loss = 2.1683673913319907
Epoch 1: model accuracy 63.50%
Epoch 3 --> loss = 1.8736804149627686
Epoch 2: model accuracy 71.16%
Epoch 4 --> loss = 1.3527560563087464
Epoch 3: model accuracy 77.36%
Epoch 5 --> loss = 0.9267922252496084
Epoch 4: model accuracy 81.15%
Epoch 6 --> loss = 0.7155301062742869
Epoch 5: model accuracy 83.87%
Epoch 7 --> loss = 0.6030998376687368
Epoch 6: model accuracy 85.45%
Epoch 8 --> loss = 0.533059915359815
Epoch 7: model accuracy 86.68%
Epoch 9 --> loss = 0.48524137518405913
Epoch 8: model accuracy 87.74%
Epoch 10 --> loss = 0.45091147241592405
Epoch 9: model accuracy 88.30%
Epoch 11 --> loss = 0.42497333874702453
Epoch 10: model accuracy 88.68%
Epoch 12 --> loss = 0.404754472275575
Epoch 11: model accuracy 89.26%
Epoch 13 --> loss = 0.38868677516380945
Epoch 12: model accuracy 89.79%
Epoch 14 --> loss = 0.37558669815460843
Epoch 13: model accuracy 90.04%
Epoch 15 --> loss = 0.3