**Challenge: Implement a Multiclass Classification Neural Network using PyTorch**

Objective:
Build a feedforward neural network using PyTorch to classify handwritten digits in a multiclass classification problem. The dataset used for this challenge is the MNIST dataset, which consists of 28×28 grayscale images of the digits 0 through 9.

Steps:

1. **Data Preparation**: Load the MNIST dataset using ```torchvision.datasets.MNIST```. Standardize/normalize the features. Split the dataset into training and testing sets using, for example, ```sklearn.model_selection.train_test_split()```. **Bonus scores**: *use PyTorch's built-in* ```DataLoader``` *to split the dataset*.

2. **Neural Network Architecture**: Define a simple feedforward neural network using PyTorch's ```nn.Module```. Design the input layer to match the number of features in the MNIST dataset (28 × 28 = 784 pixels) and the output layer to have as many neurons as there are classes (10). You can experiment with the number of hidden layers and neurons to optimize the performance. **Bonus scores**: *Make your architecture flexible enough to have as many hidden layers as the user wants, and use hyperparameter optimization to select the best number of hidden layers.*

3. **Loss Function and Optimizer**: Choose an appropriate loss function for multiclass classification. Select an optimizer, like SGD (Stochastic Gradient Descent) or Adam.

4. **Training**: Write a training loop to iterate over the dataset. Pass each batch forward through the network, calculate the loss, and perform backpropagation. Update the weights of the network using the chosen optimizer.

5. **Testing**: Evaluate the trained model on the test set. Calculate the accuracy of the model.

6. **Optimization**: Experiment with hyperparameters (learning rate, number of epochs, etc.) to optimize the model's performance. Consider adjusting the neural network architecture for better results. **Notice that you can't use the optimization algorithms from scikit-learn that we saw in lab1: e.g.,** ```GridSearchCV```.
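One way to approach the split in step 1 and the tuning in step 6 together is to hold out part of the training data as a validation set, so that hyperparameters are not selected on the same test data that is later used to report accuracy. The sketch below is only an illustration under stated assumptions: it uses a synthetic `TensorDataset` as a stand-in for MNIST so that it runs without a download, and the 80/20 ratio, the seed, and the names `full_train`, `train_subset`, and `val_subset` are hypothetical choices, not part of the assignment.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Synthetic stand-in for MNIST (assumption): 1,000 fake images of shape
# (1, 28, 28) with integer labels in 0..9.
features = torch.randn(1000, 1, 28, 28)
labels = torch.randint(0, 10, (1000,))
full_train = TensorDataset(features, labels)

# Hold out 20% of the training data for validation. The seeded generator
# makes the split reproducible.
val_size = len(full_train) // 5
train_size = len(full_train) - val_size
train_subset, val_subset = random_split(
    full_train,
    [train_size, val_size],
    generator=torch.Generator().manual_seed(0),
)

# Shuffle only the training data; evaluation order does not matter.
train_loader = DataLoader(train_subset, batch_size=64, shuffle=True)
val_loader = DataLoader(val_subset, batch_size=64, shuffle=False)

print(len(train_subset), len(val_subset))  # 800 200
```

With a real dataset, the same `random_split` call could be applied to the object returned by ```torchvision.datasets.MNIST(train=True, ...)```, leaving the official test set untouched until the final evaluation.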


In [1]:

import torch.nn as nn
import torch.optim as optim
import torchvision
import torch

# ToTensor scales pixels to [0, 1]; Normalize((0.5,), (0.5,)) then maps them to [-1, 1]
transform = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize((0.5,), (0.5,)),
])

trainset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)

testset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)

# Get a sample image
sample_image, _ = trainset[0]

# Print the shape of the image
print("Image shape:", sample_image.shape)

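batch_size = 64
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, drop_last=True, shuffle=True)

# Note: drop_last=True also applies here, so the final partial batch (16 of the
# 10,000 test images) is skipped during evaluation.
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size, drop_last=True, shuffle=True)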

Image shape: torch.Size([1, 28, 28])
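The printed shape `[1, 28, 28]` (one channel, 28 rows, 28 columns) is why a fully connected network needs each image flattened into a vector of 784 values before the first `nn.Linear` layer. As a small illustration, the sketch below flattens a stand-in batch of zeros (an assumption, used so the example runs without the dataset) in two equivalent ways: with `Tensor.view` and with the `nn.Flatten` module.

```python
import torch
import torch.nn as nn

# Stand-in batch with the same per-image shape as MNIST (assumption: 64 images).
batch = torch.zeros(64, 1, 28, 28)

# Keep the batch dimension and merge channel, height, and width into one axis.
flat_view = batch.view(batch.size(0), -1)

# nn.Flatten does the same thing and can be placed inside a model as a layer.
flat_module = nn.Flatten()(batch)

print(flat_view.shape)  # torch.Size([64, 784])
```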


In [2]:
class MNISTNN(nn.Module):
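    def __init__(self, hidden_layers=()):
        # hidden_layers: sequence of hidden layer sizes, e.g. [128, 64, 32]
        # (an immutable default avoids Python's shared mutable default pitfall)
        super().__init__()

        # List to store hidden layer modules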
        self.hidden_layers = nn.ModuleList()

        # Input layer
        input_size = 28 * 28
        current_size = input_size

        # Add hidden layers
        for hidden_size in hidden_layers:
            self.hidden_layers.append(nn.Linear(current_size, hidden_size))
            self.hidden_layers.append(nn.ReLU())
            current_size = hidden_size

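        # Output layer: one unit per digit class
        self.output_layer = nn.Linear(current_size, 10)
        # Note: nn.CrossEntropyLoss applies log-softmax internally and expects raw
        # logits, so this Sigmoid is not required. It is kept because the outputs
        # recorded in this notebook were produced with it in place.
        self.output_act = nn.Sigmoid()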

    def forward(self, x):
        # Apply hidden layers
        for layer in self.hidden_layers:
            x = layer(x)

        # Output layer
        x = self.output_layer(x)
        x = self.output_act(x)
        return x

model = MNISTNN(hidden_layers=[128, 64, 32])
print(model)

MNISTNN(
  (hidden_layers): ModuleList(
    (0): Linear(in_features=784, out_features=128, bias=True)
    (1): ReLU()
    (2): Linear(in_features=128, out_features=64, bias=True)
    (3): ReLU()
    (4): Linear(in_features=64, out_features=32, bias=True)
    (5): ReLU()
  )
  (output_layer): Linear(in_features=32, out_features=10, bias=True)
  (output_act): Sigmoid()
)


In [3]:
# Loss function and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
num_epochs = 5  # Adjust as needed

for epoch in range(num_epochs):
    model.train()  # Set the model to training mode

    for inputs, labels in trainloader:
        optimizer.zero_grad()  # Zero the gradients
        outputs = model(inputs.view(inputs.size(0), -1))  # Forward pass
        loss = loss_fn(outputs, labels)  # Compute the loss
        loss.backward()  # Backward pass
        optimizer.step()  # Update weights

    # Note: this reports the loss of the last mini-batch only, not an epoch average
    print(f"Epoch {epoch + 1}/{num_epochs}, Loss: {loss.item()}")

Epoch 1/5, Loss: 1.5343281030654907
Epoch 2/5, Loss: 1.548176646232605
Epoch 3/5, Loss: 1.528997778892517
Epoch 4/5, Loss: 1.5280919075012207
Epoch 5/5, Loss: 1.5000522136688232
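The losses above stay near 1.5 rather than approaching zero. A likely explanation is the `Sigmoid` on the output layer: `nn.CrossEntropyLoss` applies log-softmax to whatever it receives, and when every score is confined to the interval (0, 1), the best achievable case (true class at 1, all nine others at 0) still gives a loss of log(1 + 9/e), about 1.4612. The sketch below checks that lower bound numerically and contrasts it with unbounded logits. The target class 3 and the logit value 20.0 are arbitrary illustrative choices.

```python
import math
import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()
target = torch.tensor([3])  # arbitrary true class for a single example

# Best case after a Sigmoid: the true class scores 1, every other class 0.
best_sigmoid_scores = torch.zeros(1, 10)
best_sigmoid_scores[0, 3] = 1.0
floor = loss_fn(best_sigmoid_scores, target).item()

# Closed form of the same quantity: -log(e / (e + 9)) = log(1 + 9 / e).
closed_form = math.log(1 + 9 / math.e)

# Raw logits are unbounded, so the loss can get arbitrarily close to zero.
confident_logits = torch.zeros(1, 10)
confident_logits[0, 3] = 20.0
unbounded = loss_fn(confident_logits, target).item()

print(f"floor with Sigmoid outputs: {floor:.4f}")
print(f"closed form:                {closed_form:.4f}")
print(f"loss with raw logits:       {unbounded:.6f}")
```

Under this reading, returning `self.output_layer(x)` directly from `forward` would let the loss decrease below this bound. The numbers recorded in this notebook were produced with the `Sigmoid` in place, so they would not carry over to that variant.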


In [4]:
# Testing the model on the test set
model.eval()  # Set the model to evaluation mode
correct = 0
total = 0

with torch.no_grad():  # Disable gradient computation for evaluation
    for inputs, labels in testloader:
        outputs = model(inputs.view(inputs.size(0), -1))
        _, predicted = torch.max(outputs, 1)  # Index of the largest output score = predicted class
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = correct / total
print(f"Accuracy on the test set: {100 * accuracy:.2f}%")

Accuracy on the test set: 95.42%
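Accuracy can be high even though the training loss plateaus, because the prediction depends only on which output is largest. `Sigmoid` is strictly increasing, so it does not change the position of the maximum. The small check below uses hand-picked scores (assumed values, for illustration only) to show that `torch.max(..., 1)` on the squashed outputs selects the same classes as `argmax` on the raw scores.

```python
import torch

# Hand-picked raw scores for two examples and three classes (illustrative values).
logits = torch.tensor([[0.2, -1.0, 3.0],
                       [1.5, 0.1, -0.3]])
squashed = torch.sigmoid(logits)  # element-wise, order-preserving

# torch.max along dim 1 returns (values, indices); the indices are the predicted classes.
_, pred_from_squashed = torch.max(squashed, 1)
pred_from_logits = logits.argmax(dim=1)

print(pred_from_squashed.tolist())                         # [2, 0]
print(torch.equal(pred_from_squashed, pred_from_logits))   # True
```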


In [5]:
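def train_and_evaluate(hidden_layers, learning_rate, num_epochs):
    """Train a fresh MNISTNN and return its accuracy (between 0 and 1) on testloader.

    Relies on the module-level trainloader and testloader defined above.
    """
    # Create the model
    model = MNISTNN(hidden_layers)
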
    # Loss function and optimizer
    loss_fn = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)

    # Training loop
    for epoch in range(num_epochs):
        model.train()
        for inputs, labels in trainloader:
            optimizer.zero_grad()
            outputs = model(inputs.view(inputs.size(0), -1))
            loss = loss_fn(outputs, labels)
            loss.backward()
            optimizer.step()

    # Testing the model on the test set
    model.eval()
    correct = 0
    total = 0

    with torch.no_grad():
        for inputs, labels in testloader:
            outputs = model(inputs.view(inputs.size(0), -1))
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    accuracy = correct / total
    return accuracy

# Hyperparameter optimization
hidden_layers_options = [[128, 64, 32], [256, 128], [64, 64, 64]]
learning_rate_options = [0.001, 0.01]
num_epochs_options = [5, 10]

best_accuracy = 0
best_hyperparameters = None

# Iterate over hyperparameter options
for hidden_layers in hidden_layers_options:
    for learning_rate in learning_rate_options:
        for num_epochs in num_epochs_options:
            accuracy = train_and_evaluate(hidden_layers, learning_rate, num_epochs)
            print(f"Hyperparameters: {hidden_layers}, Learning Rate: {learning_rate}, Epochs: {num_epochs}, Accuracy: {100 * accuracy:.2f}%")

            # Update best hyperparameters if the current model has higher accuracy
            if accuracy > best_accuracy:
                best_accuracy = accuracy
                best_hyperparameters = {
                    'hidden_layers': hidden_layers,
                    'learning_rate': learning_rate,
                    'num_epochs': num_epochs
                }

print(f"\nBest Hyperparameters: {best_hyperparameters}, Best Accuracy: {100 * best_accuracy:.2f}%")

Hyperparameters: [128, 64, 32], Learning Rate: 0.001, Epochs: 5, Accuracy: 95.19%
Hyperparameters: [128, 64, 32], Learning Rate: 0.001, Epochs: 10, Accuracy: 96.04%
Hyperparameters: [128, 64, 32], Learning Rate: 0.01, Epochs: 5, Accuracy: 12.43%
Hyperparameters: [128, 64, 32], Learning Rate: 0.01, Epochs: 10, Accuracy: 21.38%
Hyperparameters: [256, 128], Learning Rate: 0.001, Epochs: 5, Accuracy: 96.31%
Hyperparameters: [256, 128], Learning Rate: 0.001, Epochs: 10, Accuracy: 97.18%
Hyperparameters: [256, 128], Learning Rate: 0.01, Epochs: 5, Accuracy: 10.11%
Hyperparameters: [256, 128], Learning Rate: 0.01, Epochs: 10, Accuracy: 10.34%
Hyperparameters: [64, 64, 64], Learning Rate: 0.001, Epochs: 5, Accuracy: 94.01%
Hyperparameters: [64, 64, 64], Learning Rate: 0.001, Epochs: 10, Accuracy: 95.97%
Hyperparameters: [64, 64, 64], Learning Rate: 0.01, Epochs: 5, Accuracy: 14.00%
Hyperparameters: [64, 64, 64], Learning Rate: 0.01, Epochs: 10, Accuracy: 9.77%

Best Hyperparameters: {'hidden_l