**Challenge: Implement a Multiclass Classification Neural Network using PyTorch**

Objective:
Build a feedforward neural network using PyTorch to classify handwritten digits in a multiclass classification problem. The dataset used for this challenge is MNIST, which consists of 28×28 grayscale images of the digits 0 through 9.

Steps:

1. **Data Preparation**: Load the MNIST dataset using ```torchvision.datasets.MNIST```. Standardize/normalize the features. Split the dataset into training and testing sets using, for example, ```sklearn.model_selection.train_test_split()```. **Bonus scores**: *use PyTorch's built-in* ```DataLoader``` *to split the dataset*.

2. **Neural Network Architecture**: Define a simple feedforward neural network using PyTorch's ```nn.Module```. Design the input layer to match the number of features in the MNIST dataset and the output layer to have as many neurons as there are classes (10). You can experiment with the number of hidden layers and neurons to optimize the performance. **Bonus scores**: *Make your architecture flexible enough to have as many hidden layers as the user wants, and use hyperparameter optimization to select the best number of hidden layers.*

3. **Loss Function and Optimizer**: Choose an appropriate loss function for multiclass classification. Select an optimizer, like SGD (Stochastic Gradient Descent) or Adam.

4. **Training**: Write a training loop to iterate over the dataset.
Forward pass the input through the network, calculate the loss, and perform backpropagation. Update the weights of the network using the chosen optimizer.

5. **Testing**: Evaluate the trained model on the test set. Calculate the accuracy of the model.

6. **Optimization**: Experiment with hyperparameters (learning rate, number of epochs, etc.) to optimize the model's performance. Consider adjusting the neural network architecture for better results. **Notice that you can't use the optimization algorithms from scikit-learn that we saw in lab1: e.g.,** ```GridSearchCV```.


In [1]:
import torch
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader, random_split
import torch.nn as nn

# Load the MNIST dataset
mnist = MNIST(root='data/', train=True, download=True)

# Get the features and labels
features = mnist.data
labels = mnist.targets

# Normalize the features
features = features.float() / 255.0

# Flatten each 28x28 image into a 784-element vector (features is already a tensor)
features = features.view(-1, 28*28)

# mnist.targets is already a tensor; clone it so the dataset does not share storage with mnist
labels = labels.clone()

# Combine features and labels into a dataset
dataset = torch.utils.data.TensorDataset(features, labels)
# Split the dataset into training (80%) and testing (20%) subsets using random_split
train_size = int(0.8 * len(dataset))
test_size = len(dataset) - train_size
train_dataset, test_dataset = random_split(dataset, [train_size, test_size])

# Create DataLoader for training and testing sets
batch_size = 64
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)
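
The `random_split` call above draws from the global random state, so the train/test split changes on every run. Below is a minimal sketch of a reproducible split, assuming a fixed seed is acceptable. The helper name `split_dataset`, the seed value `0`, and the synthetic stand-in tensors are illustrative and not part of the original notebook.

```python
import torch
from torch.utils.data import TensorDataset, random_split


def split_dataset(dataset, train_fraction=0.8, seed=0):
    """Split a dataset into train/test subsets using a seeded generator (hypothetical helper)."""
    train_size = int(train_fraction * len(dataset))
    test_size = len(dataset) - train_size
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [train_size, test_size], generator=generator)


# Synthetic stand-in for the flattened MNIST tensors (100 samples, 784 features)
demo_features = torch.rand(100, 28 * 28)
demo_labels = torch.randint(0, 10, (100,))
demo_dataset = TensorDataset(demo_features, demo_labels)

train_subset, test_subset = split_dataset(demo_dataset)
print(len(train_subset), len(test_subset))
```

With the notebook's objects, the same helper could be called as `split_dataset(dataset)`. Calling it twice with the same seed yields the same indices.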


2. **Neural Network Architecture**: Define a simple feedforward neural network using PyTorch's ```nn.Module```. Design the input layer to match the number of features in the MNIST dataset and the output layer to have as many neurons as there are classes (10). You can experiment with the number of hidden layers and neurons to optimize the performance. **Bonus scores**: *Make your architecture flexible enough to have as many hidden layers as the user wants, and use hyperparameter optimization to select the best number of hidden layers.*

In [2]:
print(features[0:2].shape)

torch.Size([2, 784])


In [3]:
# class MNISTNeuralNetwork(nn.Module):

#   def __init__(self, input_layer_size = 28*28, output_layer_size = 10):
#     super().__init__()
    
#     self.first_layer = nn.Linear(input_layer_size,256)
#     self.act1 = nn.ReLU()
#     self.second_layer = nn.Linear(256, 128)
#     self.act2 = nn.ReLU()

#     self.third_layer = nn.Linear(128, 64)
#     self.act3 = nn.ReLU()

#     self.output = nn.Linear(64, output_layer_size)
#     #self.output_act = nn.Softmax(dim=1) #we will use CrossEntropyLoss so we don't need softmax, because CE includes softmax

#   def forward(self, x):
#     x = self.act1(self.first_layer(x))
#     x = self.act2(self.second_layer(x))
#     x = self.act3(self.third_layer(x))
#     x = self.output(x)
#     return x

In [4]:
class MNISTNeuralNetwork(nn.Module):

  # hidden_layers is a tuple so the default value cannot be mutated between instances
  def __init__(self, input_layer_size = 28*28, hidden_layers = (256, 128, 64), output_layer_size = 10):
    super().__init__()

    self.layers = nn.ModuleList()

    # Add the first layer
    self.layers.append(nn.Linear(input_layer_size, hidden_layers[0]))
    self.layers.append(nn.ReLU())

    # Add the hidden layers
    for i in range(len(hidden_layers) - 1):
      self.layers.append(nn.Linear(hidden_layers[i], hidden_layers[i+1]))
      self.layers.append(nn.ReLU())

    # Add the output layer
    self.layers.append(nn.Linear(hidden_layers[-1], output_layer_size))

  def forward(self, x):
    for layer in self.layers:
      x = layer(x)
    return x
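
Because the hidden layer sizes are passed in as a sequence, the depth of the network becomes an ordinary hyperparameter. The sketch below shows how the parameter count and output shape change with depth. To stay self-contained it uses a compact `nn.Sequential` equivalent of the class above; the names `make_mlp` and `count_parameters` are hypothetical helpers, not part of the notebook.

```python
import torch
import torch.nn as nn


def make_mlp(input_size=28 * 28, hidden_layers=(256, 128, 64), output_size=10):
    """Build a Linear+ReLU pair for each hidden size, then a final Linear layer."""
    sizes = [input_size, *hidden_layers]
    layers = []
    for in_features, out_features in zip(sizes[:-1], sizes[1:]):
        layers.append(nn.Linear(in_features, out_features))
        layers.append(nn.ReLU())
    layers.append(nn.Linear(sizes[-1], output_size))  # raw logits, no softmax
    return nn.Sequential(*layers)


def count_parameters(model):
    """Count the trainable parameters of a model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


# Try a few depths on a dummy batch of two flattened images
for hidden in [(64,), (128, 64), (256, 128, 64)]:
    net = make_mlp(hidden_layers=hidden)
    logits = net(torch.rand(2, 28 * 28))
    print(hidden, tuple(logits.shape), count_parameters(net))
```

Every configuration maps a `(2, 784)` batch to `(2, 10)` logits; only the number of trainable parameters differs.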

3. **Loss Function and Optimizer**: Choose an appropriate loss function for multiclass classification. Select an optimizer, like SGD (Stochastic Gradient Descent) or Adam.

In [5]:
model = MNISTNeuralNetwork()

In [6]:
model

MNISTNeuralNetwork(
  (layers): ModuleList(
    (0): Linear(in_features=784, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=128, bias=True)
    (3): ReLU()
    (4): Linear(in_features=128, out_features=64, bias=True)
    (5): ReLU()
    (6): Linear(in_features=64, out_features=10, bias=True)
  )
)

In [7]:
import torch.optim as optim
loss_fn = nn.CrossEntropyLoss()  # cross entropy
optimizer = optim.Adam(model.parameters(), lr=0.001)
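
`nn.CrossEntropyLoss` expects raw logits and integer class labels, which is why the network ends in a plain `Linear` layer with no softmax. The short check below, on random illustrative tensors rather than MNIST data, shows that the loss matches a log-softmax followed by the negative log-likelihood loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # a batch of 4 samples, 10 classes
targets = torch.tensor([3, 0, 9, 1])  # integer class labels

# Built-in cross entropy applied directly to the logits
ce = nn.CrossEntropyLoss()(logits, targets)

# The same quantity computed in two explicit steps
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(ce, manual))
```

Adding an explicit `nn.Softmax` before this loss would apply the normalization twice.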

4. **Training**: Write a training loop to iterate over the dataset.
Forward pass the input through the network, calculate the loss, and perform backpropagation. Update the weights of the network using the chosen optimizer.

In [8]:
def train(model, epochs, train_loader, loss_fn, optimizer):
    model.train()  # Set the model to training mode
    total_loss = 0
    for epoch in range(epochs):
        for batch in train_loader:
            # Assuming your batch is a tuple (data, targets)
            data, targets = batch

            # Forward pass
            output = model(data)

            # Compute the loss
            loss = loss_fn(output, targets)

            # Backward pass and optimize
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            total_loss += loss.item()
        print(f'Finished epoch {epoch + 1}, latest loss {loss}')
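    # total_loss accumulates over every epoch, so this value is the sum of the
    # per-epoch mean losses; divide by `epochs` as well to get a per-batch mean
    return total_loss / len(train_loader)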

In [9]:
average_loss = train(model, 10, train_loader, loss_fn, optimizer)

print("The average loss is: ", average_loss)

Finished epoch 1, latest loss 0.18443210422992706
Finished epoch 2, latest loss 0.1391804814338684
Finished epoch 3, latest loss 0.1525273621082306
Finished epoch 4, latest loss 0.017943955957889557
Finished epoch 5, latest loss 0.018885096535086632
Finished epoch 6, latest loss 0.11280129104852676
Finished epoch 7, latest loss 0.04102528840303421
Finished epoch 8, latest loss 0.02543696202337742
Finished epoch 9, latest loss 0.030377862975001335
Finished epoch 10, latest loss 0.0013342633610591292
The average loss is:  0.8162300818092771
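
The "latest loss" printed above is the loss of the last batch in each epoch only, so it fluctuates (epoch 6 is higher than epoch 5, for instance). A sketch of a variant that records the mean batch loss per epoch is shown below, which gives a smoother picture of convergence. The function name `train_with_history` and the small synthetic dataset are assumptions for illustration.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train_with_history(model, epochs, loader, loss_fn, optimizer):
    """Train the model and return a list with the mean batch loss of each epoch."""
    model.train()
    history = []
    for _ in range(epochs):
        running_loss = 0.0
        for data, targets in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(data), targets)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        # Average over the batches of this epoch only
        history.append(running_loss / len(loader))
    return history


torch.manual_seed(0)
# Synthetic stand-in data: 256 samples, 20 features, 3 classes
demo_x = torch.rand(256, 20)
demo_y = torch.randint(0, 3, (256,))
demo_loader = DataLoader(TensorDataset(demo_x, demo_y), batch_size=32, shuffle=True)
demo_model = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 3))
demo_optimizer = torch.optim.Adam(demo_model.parameters(), lr=0.001)

history = train_with_history(demo_model, 3, demo_loader, nn.CrossEntropyLoss(), demo_optimizer)
for epoch, mean_loss in enumerate(history, start=1):
    print(f"epoch {epoch}: mean loss {mean_loss:.4f}")
```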


5. **Testing**: Evaluate the trained model on the test set. Calculate the accuracy of the model.

In [10]:
def predict(model, dataloader, num_predictions=10):
    model.eval()  # Set the model to evaluation mode
    all_predictions = []
    all_actuals = []

    with torch.no_grad():  # Disable gradient computations
        for i, batch in enumerate(dataloader):
            # Assuming your batch is a tuple (data, targets)
            data, targets = batch

            output = model(data)
            _, predicted = torch.max(output, dim=1)  # Get the predicted class

            all_predictions.extend(predicted.tolist())
            all_actuals.extend(targets.tolist())

            # Stop as soon as enough predictions have been collected
            if len(all_predictions) >= num_predictions:
                break

    return all_predictions[:num_predictions], all_actuals[:num_predictions]

In [11]:
predictions, actuals = predict(model, test_loader, 10)
print('Predictions:', predictions[:9])
print('Actuals:', actuals[:9])

Predictions: [6, 6, 9, 9, 0, 2, 0, 5, 0]
Actuals: [6, 6, 9, 9, 0, 2, 0, 5, 0]
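
The step also asks for the accuracy over the whole test set, which the sample of predictions above does not provide. Below is a minimal sketch of such a metric. The function name `accuracy` is an assumption, and the demo uses one-hot inputs with an identity "model" purely as a sanity check.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def accuracy(model, dataloader):
    """Return the fraction of samples whose arg-max logit matches the label."""
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for data, targets in dataloader:
            predicted = model(data).argmax(dim=1)
            correct += (predicted == targets).sum().item()
            total += targets.size(0)
    return correct / total


# Sanity check: with one-hot inputs, an identity "model" predicts every label correctly
demo_labels = torch.arange(10).repeat(5)  # 50 labels, 0..9 repeated
demo_inputs = nn.functional.one_hot(demo_labels, 10).float()
demo_loader = DataLoader(TensorDataset(demo_inputs, demo_labels), batch_size=16)
print(accuracy(nn.Identity(), demo_loader))
```

On the notebook's objects this would be called as `accuracy(model, test_loader)`.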


6. **Optimization**: Experiment with hyperparameters (learning rate, number of epochs, etc.) to optimize the model's performance. Consider adjusting the neural network architecture for better results. **Notice that you can't use the optimization algorithms from scikit-learn that we saw in lab1: e.g.,** ```GridSearchCV```.

I tried several values for the number of epochs and the batch size. With more than 10 epochs the model already starts to overfit, so 10 is a good choice. The same applies to the batch size.
I also tried a larger learning rate, but training became inconsistent: it did not reach the minimum loss on every run.
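
Since `GridSearchCV` is off limits, the manual experiments described above could be automated with a plain loop over a small grid. The sketch below is one possible way to do that, not the method actually used here. It scores each configuration on a held-out validation split so the test set stays untouched. The helper names, the search space, and the synthetic data are all assumptions chosen to keep the example fast.

```python
import itertools

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, random_split


def build_model(input_size, hidden_layers, output_size):
    """Feedforward network with one Linear+ReLU pair per hidden size."""
    sizes = [input_size, *hidden_layers]
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        layers += [nn.Linear(n_in, n_out), nn.ReLU()]
    layers.append(nn.Linear(sizes[-1], output_size))
    return nn.Sequential(*layers)


def fit_and_score(hidden_layers, lr, train_loader, val_loader,
                  input_size, num_classes, epochs=3):
    """Train one configuration and return its validation accuracy."""
    model = build_model(input_size, hidden_layers, num_classes)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for data, targets in train_loader:
            optimizer.zero_grad()
            loss_fn(model(data), targets).backward()
            optimizer.step()
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for data, targets in val_loader:
            correct += (model(data).argmax(dim=1) == targets).sum().item()
            total += targets.size(0)
    return correct / total


torch.manual_seed(0)
# Synthetic stand-in data: the class is the index of the largest of the first 3 features
demo_x = torch.rand(600, 20)
demo_y = demo_x[:, :3].argmax(dim=1)
train_set, val_set = random_split(TensorDataset(demo_x, demo_y), [480, 120])
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = DataLoader(val_set, batch_size=32)

# Hypothetical search space; extend with epochs or batch size as needed
search_space = {
    "hidden_layers": [(32,), (64, 32)],
    "lr": [1e-3, 1e-2],
}
results = []
for hidden_layers, lr in itertools.product(search_space["hidden_layers"], search_space["lr"]):
    score = fit_and_score(hidden_layers, lr, train_loader, val_loader,
                          input_size=20, num_classes=3)
    results.append({"hidden_layers": hidden_layers, "lr": lr, "val_accuracy": score})

best = max(results, key=lambda r: r["val_accuracy"])
print(best)
```

For MNIST, `input_size` would be 784 and `num_classes` 10, with the validation split carved out of the training subset.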