**Challenge: Implement a Multiclass Classification Neural Network using PyTorch**

Objective:
Build a feedforward neural network using PyTorch to predict the species of iris flowers in a multiclass classification problem. The dataset used for this challenge is the Iris dataset, which consists of features like sepal length, sepal width, petal length, and petal width.

Steps:

1. **Data Preparation**: Load the MNIST dataset using ```torchvision.datasets.MNIST```. Standardize/normalize the features. Split the dataset into training and testing sets using, for example, ```sklearn.model_selection.train_test_split()```. **Bonus scores**: *use PyTorch's built-* ```DataLoader``` *to split the dataset*.

2. **Neural Network Architecture**: Define a simple feedforward neural network using PyTorch's ```nn.Module```. Design the input layer to match the number of features in the MNIST dataset and the output layer to have as many neurons as there are classes (10). You can experiment with the number of hidden layers and neurons to optimize the performance. **Bonus scores**: *Make your architecture flexibile to have as many hidden layers as the user wants, and use hyperparameter optimization to select the best number of hidden layeres.*

3. **Loss Function and Optimizer**: Choose an appropriate loss function for multiclass classification. Select an optimizer, like SGD (Stochastic Gradient Descent) or Adam.

4. **Training**: Write a training loop to iterate over the dataset.
Forward pass the input through the network, calculate the loss, and perform backpropagation. Update the weights of the network using the chosen optimizer.

5. **Testing**: Evaluate the trained model on the test set. Calculate the accuracy of the model.

6. **Optimization**: Experiment with hyperparameters (learning rate, number of epochs, etc.) to optimize the model's performance. Consider adjusting the neural network architecture for better results. **Notice that you can't use the optimization algorithms from scikit-learn that we saw in lab1: e.g.,** ```GridSearchCV```.


In [1]:
import torch
import torch.nn as nn
import torchvision
from torchvision import transforms
from torch.utils.data import DataLoader
import torch.optim as optim
import numpy as np

### 1) Data Preparation

In [2]:
# Define a transform to normalize the data
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])

batch_size = 16

train_loader = DataLoader(
  torchvision.datasets.MNIST(root='./data', train=True, download=True,transform=transform),
  batch_size=batch_size, shuffle=True)

test_loader = DataLoader(
  torchvision.datasets.MNIST(root='./data', train=False, download=True,
                             transform=transform),
  batch_size=batch_size, shuffle=True)

### 2) Neural Network Architecture

In [7]:
class MyNN(nn.Module):
    def __init__(self,nb_hidden_layers):
        super(MyNN,self).__init__()
        self.fc1 = nn.Linear(28*28, 512)
        self.hidden_list = nn.ModuleList([nn.Linear(512, 512) for _ in range (nb_hidden_layers)])
        self.fc3 = nn.Linear(512, 10)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.2)

    def forward(self, img): #convert + flatten
        x = img.view(-1, 28*28)
        x = self.relu(self.fc1(x))
        x = self.dropout(x)
        for layer in self.hidden_list: 
            x = self.relu(layer(x))
            x = self.dropout(x)
        x = self.fc3(x)
        return x
    
model = MyNN(nb_hidden_layers=2)

In [8]:
model

MyNN(
  (fc1): Linear(in_features=784, out_features=512, bias=True)
  (hidden_list): ModuleList(
    (0-1): 2 x Linear(in_features=512, out_features=512, bias=True)
  )
  (fc3): Linear(in_features=512, out_features=10, bias=True)
  (relu): ReLU()
  (dropout): Dropout(p=0.2, inplace=False)
)

### 3) Loss Function and Optimizer

In [9]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

### 4) Training

In [10]:
n_epochs = 20

model.train()

# Training loop over a specified number of epochs
for epoch in range(n_epochs):
    
    for data,target in train_loader:
        
        # Zero the gradients accumulated in previous iterations
        optimizer.zero_grad()

        # Forward pass
        output = model(data)

        # Calculate the loss
        loss = criterion(output, target)

        # Backward pass
        loss.backward()

        # Update the model parameters
        optimizer.step()

    # Print information about the current epoch
    print(f'Finished epoch {epoch}, latest loss {loss}')

Finished epoch 0, latest loss 0.35597026348114014
Finished epoch 1, latest loss 0.08412156999111176
Finished epoch 2, latest loss 0.3455660939216614
Finished epoch 3, latest loss 0.0978328287601471
Finished epoch 4, latest loss 0.018433762714266777
Finished epoch 5, latest loss 0.13362887501716614
Finished epoch 6, latest loss 0.07205747067928314
Finished epoch 7, latest loss 0.016014179214835167
Finished epoch 8, latest loss 0.03405395522713661
Finished epoch 9, latest loss 0.12341000884771347
Finished epoch 10, latest loss 0.003646095981821418
Finished epoch 11, latest loss 0.06788351386785507
Finished epoch 12, latest loss 0.004085300024598837
Finished epoch 13, latest loss 0.0982888713479042
Finished epoch 14, latest loss 0.1406562626361847
Finished epoch 15, latest loss 0.012005725875496864
Finished epoch 16, latest loss 0.0329718217253685
Finished epoch 17, latest loss 0.0033584078773856163
Finished epoch 18, latest loss 0.0011928463354706764
Finished epoch 19, latest loss 0.0067

### 5) Testing

In [11]:
class_correct = [0 for i in range(10)] 
class_total = [0 for i in range(10)]

with torch.no_grad():

    for data, target in test_loader:
        
        # Forward pass
        output = model(data)
        
        # Calculate the loss
        loss = criterion(output, target)
        
        # Convert output probabilities to predicted class
        _, pred = torch.max(output, 1)
        
        # Compare predictions to true label
        correct = np.squeeze(pred.eq(target.data.view_as(pred)))
        
        # Calculate test accuracy for each object class
        for i in range(batch_size):
            label = target.data[i]
            class_correct[label] += correct[i].item()
            class_total[label] += 1

for i in range(10):
    print(f'Test Accuracy of {i}: {round(100 * class_correct[i] / class_total[i])}% ({np.sum(class_correct[i])}/{np.sum(class_total[i])})') 
    
print(f'Test Accuracy Overall: {round(100 * np.sum(class_correct) / np.sum(class_total))}% ({np.sum(class_correct)}/{np.sum(class_total)})') 

Test Accuracy of 0: 99% (970/980)
Test Accuracy of 1: 99% (1125/1135)
Test Accuracy of 2: 97% (1003/1032)
Test Accuracy of 3: 98% (992/1010)
Test Accuracy of 4: 99% (971/982)
Test Accuracy of 5: 97% (865/892)
Test Accuracy of 6: 98% (937/958)
Test Accuracy of 7: 98% (1005/1028)
Test Accuracy of 8: 96% (934/974)
Test Accuracy of 9: 97% (976/1009)
Test Accuracy Overall: 98% (9778/10000)


### 6) Optimization

We have already a good accuracy on this model, I got those hyperparameters by test and retry, and it takes too much effort to do optimization only to get 2-3% (It's also long to train the model).