<a href="https://colab.research.google.com/github/JaredKeithAveritt/AI_methods_in_advanced_materials_research/blob/main/pytorch_non_linear_classifier.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Homework: Train a Nonlinear Classifier

This homework uses the [`NonlinearClassifier` introduced at the end of week 4](https://github.com/JaredKeithAveritt/AI_methods_in_advanced_materials_research/blob/main/Week_4/01_pytorch_mnist.ipynb).


Train the `NonlinearClassifier` and evaluate its performance on test data using the following steps:

1. **Initialize the Nonlinear Classifier Model.**
2. **Create Data Loaders for Training and Test Data.**
3. **Define a Loss Function and an Optimizer.**
4. **Train the Model.**
5. **Evaluate the Model on Test Data.**

### Experimenting with Improvements

To improve the model, consider experimenting with:
- **Increasing Model Complexity**: Adding more layers or increasing the number of neurons in existing layers.
- **Changing Activation Functions**: Experimenting with different activation functions like LeakyReLU or ELU.
- **Adjusting the Learning Rate**: Tuning the learning rate or using learning rate schedulers.
- **Using Different Optimizers**: Trying out optimizers like Adam or RMSprop instead of SGD.
- **Implementing Regularization**: Adding dropout layers or using L2 regularization to prevent overfitting.
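
As a rough illustration of how several of these options look in PyTorch, the sketch below combines wider layers, `LeakyReLU`, dropout, the Adam optimizer with weight decay (an L2 penalty), and a step learning rate scheduler. The class name `WiderClassifier` is hypothetical, and the hidden width, dropout rate, learning rate, weight decay, and schedule are illustrative assumptions rather than tuned or recommended values.

```python
import torch
from torch import nn


class WiderClassifier(nn.Module):
    """Hypothetical variant of the classifier, for illustration only."""

    def __init__(self, hidden=128, dropout=0.2):
        super().__init__()
        self.flatten = nn.Flatten()
        self.layers_stack = nn.Sequential(
            nn.Linear(28 * 28, hidden),  # wider first layer (assumed width)
            nn.LeakyReLU(),              # alternative activation function
            nn.Dropout(dropout),         # dropout regularization
            nn.Linear(hidden, hidden),
            nn.LeakyReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden, 10),       # 10 output classes, raw logits
        )

    def forward(self, x):
        return self.layers_stack(self.flatten(x))


model = WiderClassifier()

# Adam instead of SGD; weight_decay adds an L2 penalty on the parameters
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

# Multiply the learning rate by gamma every step_size calls to scheduler.step()
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)

# Shape check on a random batch shaped like MNIST images
dummy_batch = torch.rand(4, 1, 28, 28)
logits = model(dummy_batch)
print(logits.shape)
```

When a scheduler is used, `scheduler.step()` is typically called once per epoch, after the optimizer updates for that epoch.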

Remember to compare models using the training and validation data. Use the test data only as a final check of generalization capability.



---

# Step 0: Dataset Loading and Preprocessing of Input Features

This is the same as in the week 4 notebook.

In [None]:
%matplotlib inline
# The line above is a magic command for Jupyter notebooks and IPython environments.
# It ensures that plots are displayed inline, directly below the code cell that
# produced them, and stored in the notebook document.

import torch
import torchvision
from torch import nn

import numpy
import matplotlib.pyplot as plt
import time

In [None]:


# Load and transform the training data
training_data = torchvision.datasets.MNIST(
    root="data",
    train=True,
    download=True,
    transform=torchvision.transforms.ToTensor()
)

# Load and transform the test data
# Same as the training data above, but with 'train=False' to load the test portion
# of the MNIST dataset. This data is used to evaluate the model's performance on
# unseen data, providing an estimate of its generalization ability.
test_data = torchvision.datasets.MNIST(
    root="data",
    train=False,
    download=True,
    transform=torchvision.transforms.ToTensor()
)

total_size = len(training_data)  # Total size of the dataset

# Calculate split sizes
train_size = int(total_size * 0.8)  # 80% of the dataset for training
validation_size = total_size - train_size  # Remaining 20% for validation

# Split the dataset into training and validation sets
training_data, validation_data = torch.utils.data.random_split(
    training_data, [train_size, validation_size],
    generator=torch.Generator().manual_seed(55)  # Ensure reproducibility
)

# 'training_data' now contains the training subset,
# 'validation_data' contains the validation subset.

# Print the size of training, validation, and test datasets
print('MNIST data loaded: train:', len(training_data), 'examples,',
      'validation:', len(validation_data), 'examples,',
      'test:', len(test_data), 'examples')

# Print the shape of the input data by accessing the first example in the training dataset
# Note: training_data[0][0] accesses the first image tensor, and .shape retrieves its dimensions
print('Input shape:', training_data[0][0].shape)

pltsize = 1
# Initialize figure with dimensions proportional to the number of images
plt.figure(figsize=(10*pltsize, pltsize))

# Display the first 10 images from the training dataset
for i in range(10):
    plt.subplot(1, 10, i+1)  # Prepare subplot for the ith image
    plt.axis('off')  # Hide the axis for a cleaner look
    # Display the image, reshaping it to 28x28 pixels, in grayscale
    plt.imshow(numpy.reshape(training_data[i][0], (28, 28)), cmap="gray")
    # Add a title with the class of the digit
    plt.title('Class: '+str(training_data[i][1]))

---
# Step 1: Define Classifier and initialize it
I already did this for you.

In [None]:
class NonlinearClassifier(nn.Module):
    def __init__(self):
        super(NonlinearClassifier, self).__init__()
        # Flatten the input image to a vector
        self.flatten = nn.Flatten()

        # Define a stack of layers: linear transformations followed by ReLU activations
        self.layers_stack = nn.Sequential(
            nn.Linear(28*28, 50),  # First layer with 784 inputs and 50 outputs
            nn.ReLU(),  # Nonlinear activation function
            # nn.Dropout(0.2),  # Optional dropout for regularization (commented out)
            nn.Linear(50, 50),  # Second layer, from 50 to 50 neurons
            nn.ReLU(),  # Another ReLU activation
            # nn.Dropout(0.2),  # Another optional dropout layer (commented out)
            nn.Linear(50, 10)   # Final layer that outputs to the 10 classes
        )

    def forward(self, x):
        # Flatten and then pass the data through the layers stack
        x = self.flatten(x)
        x = self.layers_stack(x)
        return x


# Initialize the Nonlinear Classifier Model
nonlinear_model = NonlinearClassifier()

---

# Step 2: Create Data Loaders for Training and Test Data
This step assumes that `training_data`, `validation_data`, and `test_data` have already been defined in the dataset loading cell above.

Hint: [see the documentation on PyTorch site](https://pytorch.org/docs/stable/data.html)
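
To show the general shape of the `DataLoader` API without using the homework data, here is a minimal sketch on a synthetic dataset. The names `fake_dataset`, `fake_train_loader`, and `fake_eval_loader`, as well as the batch size of 32, are assumptions made for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in dataset: 100 random "images" with random labels (assumption)
fake_images = torch.rand(100, 1, 28, 28)
fake_labels = torch.randint(0, 10, (100,))
fake_dataset = TensorDataset(fake_images, fake_labels)

batch_size = 32  # illustrative value

# shuffle=True reorders the samples every epoch, which is typical for training
fake_train_loader = DataLoader(fake_dataset, batch_size=batch_size, shuffle=True)

# shuffle=False keeps a fixed order, which is typical for validation and test data
fake_eval_loader = DataLoader(fake_dataset, batch_size=batch_size, shuffle=False)

# Each iteration yields one batch of inputs and the matching labels
images, labels = next(iter(fake_train_loader))
print(images.shape, labels.shape)

# len() of a loader is the number of batches, not the number of samples
print(len(fake_train_loader))
```

The last batch is smaller than `batch_size` when the dataset size is not a multiple of it, unless `drop_last=True` is passed.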

In [None]:
from torch.utils.data import DataLoader

# Code goes here






---

# Step 3: Define a Loss Function and an Optimizer
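For a multi-class classifier that outputs raw scores (logits), one common pairing is `nn.CrossEntropyLoss` with `torch.optim.SGD`. The sketch below uses a hypothetical `toy_model` with 4 inputs and 3 classes; the learning rate is an illustrative assumption, not a recommended setting for the homework model.

```python
import torch
from torch import nn

# Hypothetical stand-in model: 4 input features, 3 classes
toy_model = nn.Linear(4, 3)

# CrossEntropyLoss expects raw logits and integer class labels;
# it applies log-softmax internally, so the model needs no softmax layer
loss_fn = nn.CrossEntropyLoss()

# The optimizer is given the parameters it should update
optimizer = torch.optim.SGD(toy_model.parameters(), lr=0.05)

# Evaluate the loss on a random batch of 8 samples
inputs = torch.rand(8, 4)
targets = torch.randint(0, 3, (8,))
loss = loss_fn(toy_model(inputs), targets)
print(loss.item())
```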


In [None]:
# Code goes here


---

# Step 4: Train your model

Hint: A training step consists of:
- A forward pass: the input is passed through the network and the loss $J$ is computed.
- Backpropagation: a backward pass computes the gradient $\frac{\partial J}{\partial \mathbf{W}}$ of the loss function with respect to the parameters of the network.
- A weight update: $\mathbf{W} \leftarrow \mathbf{W} - \alpha \frac{\partial J}{\partial \mathbf{W}}$, where $\alpha$ is the learning rate.
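
The three parts of a training step can be sketched on a toy problem as follows. The example assumes plain SGD without momentum, in which case `optimizer.step()` applies exactly the update rule above; `toy_model`, the random data, and the value of `alpha` are illustrative assumptions.

```python
import torch
from torch import nn

torch.manual_seed(0)

toy_model = nn.Linear(4, 3)  # hypothetical stand-in model
loss_fn = nn.CrossEntropyLoss()
alpha = 0.1                  # learning rate (illustrative value)
optimizer = torch.optim.SGD(toy_model.parameters(), lr=alpha)

inputs = torch.rand(8, 4)
targets = torch.randint(0, 3, (8,))

# 1. Forward pass: compute predictions and the loss J
loss = loss_fn(toy_model(inputs), targets)

# 2. Backpropagation: clear old gradients, then compute dJ/dW
optimizer.zero_grad()
loss.backward()

# For comparison, compute W - alpha * dJ/dW by hand before the update
expected_weight = toy_model.weight.detach() - alpha * toy_model.weight.grad

# 3. Weight update: the optimizer modifies the parameters in place
optimizer.step()

# Check that the optimizer applied the same update as the formula
matches = torch.allclose(toy_model.weight.detach(), expected_weight)
print(matches)
```

Calling `optimizer.zero_grad()` before `loss.backward()` matters because PyTorch accumulates gradients across backward passes by default.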

In [None]:
# code goes here








#### Train the model for a certain number of epochs:
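One epoch is one full pass over the training loader. A minimal sketch of an epoch loop that also tracks the validation loss is shown below on synthetic data. The helper `mean_loss`, the toy model, the data, and all hyperparameters are assumptions for illustration, not the homework solution.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic two-class data (assumption): label is 1 when the features sum above 2
features = torch.rand(200, 4)
labels = (features.sum(dim=1) > 2).long()
toy_train_loader = DataLoader(TensorDataset(features[:160], labels[:160]),
                              batch_size=32, shuffle=True)
toy_val_loader = DataLoader(TensorDataset(features[160:], labels[160:]),
                            batch_size=32)

toy_model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(toy_model.parameters(), lr=0.1)


def mean_loss(model, loader):
    """Average per-sample loss over a loader, without tracking gradients."""
    model.eval()
    total, count = 0.0, 0
    with torch.no_grad():
        for x, y in loader:
            total += loss_fn(model(x), y).item() * len(y)
            count += len(y)
    return total / count


epochs = 5  # illustrative value
history = []
for epoch in range(epochs):
    toy_model.train()  # enable training behavior (matters for dropout layers)
    for x, y in toy_train_loader:
        optimizer.zero_grad()
        loss = loss_fn(toy_model(x), y)
        loss.backward()
        optimizer.step()

    # Record both losses once per epoch so the two curves can be compared
    train_loss = mean_loss(toy_model, toy_train_loader)
    val_loss = mean_loss(toy_model, toy_val_loader)
    history.append((train_loss, val_loss))
    print(f"epoch {epoch}: train loss {train_loss:.4f}, validation loss {val_loss:.4f}")
```

A validation loss that rises while the training loss keeps falling is a common sign of overfitting.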



In [None]:
# code goes here








---

# Step 5: Evaluate the Model on Test Data
Now, let's evaluate the model's performance on the test data:
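
Classification accuracy is a common metric for this kind of evaluation. The sketch below shows one way to compute it over a data loader. The helper name `accuracy`, the untrained `toy_model`, and the synthetic data are assumptions for illustration.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def accuracy(model, loader):
    """Fraction of samples whose highest-scoring class matches the label."""
    model.eval()  # switch off training-only behavior such as dropout
    correct, total = 0, 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for x, y in loader:
            predictions = model(x).argmax(dim=1)  # predicted class per sample
            correct += (predictions == y).sum().item()
            total += len(y)
    return correct / total


# Synthetic stand-in for a held-out set (assumption)
torch.manual_seed(0)
toy_inputs = torch.rand(50, 4)
toy_labels = torch.randint(0, 3, (50,))
toy_loader = DataLoader(TensorDataset(toy_inputs, toy_labels), batch_size=16)

toy_model = nn.Linear(4, 3)  # untrained, so the accuracy is not meaningful
toy_accuracy = accuracy(toy_model, toy_loader)
print(f"accuracy: {toy_accuracy:.2%}")
```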

In [None]:
# code goes here






# EXTRA: Experimenting with Improvements

Try one or more of the improvements listed under "Experimenting with Improvements" at the top of this notebook, and compare the resulting models on the validation data.

In [None]:
# code goes here
