# Introduction

In this project, you will build a neural network of your own design to evaluate the MNIST dataset.

Some benchmark results on MNIST can be found [on Yann LeCun's page](https://webcache.googleusercontent.com/search?q=cache:stAVPik6onEJ:yann.lecun.com/exdb/mnist) and include:

88% [Lecun et al., 1998](https://hal.science/hal-03926082/document)

95.3% [Lecun et al., 1998](https://hal.science/hal-03926082v1/document)

99.65% [Ciresan et al., 2011](http://people.idsia.ch/~juergen/ijcai2011.pdf)


MNIST is a great dataset for sanity checking your models, since the accuracy levels achieved by large convolutional neural networks and small linear models are both quite high. This makes it important to be familiar with the data.

## Installation

In [None]:
# Update the PATH to include the user installation directory. 
import os
os.environ['PATH'] = f"{os.environ['PATH']}:/root/.local/bin"

# Restart the Kernel before you move on to the next step.

#### Important: Restart the Kernel before you move on to the next step.

In [None]:
# Install requirements
!python -m pip install -r requirements.txt

## Imports

In [None]:
## This cell contains the essential imports you will need – DO NOT CHANGE THE CONTENTS! ##
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
from torch.utils.data import DataLoader

## Load the Dataset

Specify your transforms as a list if you intend to use any.
The transforms module is already loaded as `transforms`.

Fortunately, MNIST is included in the torchvision module.
You can create your dataset using the `MNIST` class from `torchvision.datasets` ([the documentation is available here](https://pytorch.org/vision/stable/datasets.html#mnist)).
Make sure to specify `download=True`! 

Once your dataset is created, you'll also need to define a `DataLoader` from the `torch.utils.data` module for both the train and the test set.

In [None]:
# Update Jupyter
!pip install --upgrade jupyter

# Update ipywidgets
!pip install --upgrade ipywidgets

# Enable ipywidgets extension
!jupyter nbextension enable --py widgetsnbextension --sys-prefix

# Install widgetsnbextension
!pip install widgetsnbextension


In [None]:
# Define transforms
transform = transforms.Compose([
    transforms.ToTensor(),  # Convert images to PyTorch tensors
    transforms.Normalize((0.5,), (0.5,))  # Subtract 0.5 and divide by 0.5, mapping pixels from [0, 1] to [-1, 1]
])

# Create the training set; the training and validation dataloaders are defined after the split below
# Load the MNIST training dataset
train_dataset = torchvision.datasets.MNIST(root='data', train=True, download=True, transform=transform)

# Split the training dataset into training and validation sets
validation_split = 0.2
shuffle_dataset = True
random_seed = 42

# Creating data indices for training and validation splits:
dataset_size = len(train_dataset)
indices = list(range(dataset_size))
split = int(np.floor(validation_split * dataset_size))
if shuffle_dataset:
    np.random.seed(random_seed)
    np.random.shuffle(indices)
train_indices, val_indices = indices[split:], indices[:split]

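# Create PyTorch samplers that each draw only from their own subset of indices
from torch.utils.data import SubsetRandomSampler  # not included in the fixed imports cell above
train_sampler = SubsetRandomSampler(train_indices)
valid_sampler = SubsetRandomSampler(val_indices)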

train_loader = DataLoader(train_dataset, batch_size=64, sampler=train_sampler)
val_loader = DataLoader(train_dataset, batch_size=64, sampler=valid_sampler)

# Create test set and define test dataloader
# Load the MNIST test dataset
test_dataset = torchvision.datasets.MNIST(root='data', train=False, download=True, transform=transform)
# Create DataLoader for the test dataset
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)
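
As a rough check on the split above, the sketch below repeats the same sampler pattern on a small stand-in dataset (the `TensorDataset` of 100 random tensors is an assumption made purely for illustration). It suggests that `len(loader.dataset)` still reports the full dataset, while `len(loader.sampler)` reports the subset a loader actually draws from, which matters when dividing by a sample count to compute accuracy.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler, TensorDataset

# Hypothetical stand-in for MNIST: 100 random "images" with random labels.
dataset = TensorDataset(torch.randn(100, 1, 28, 28), torch.randint(0, 10, (100,)))

# Shuffle the indices with a fixed seed and hold out 20% for validation.
indices = np.random.RandomState(42).permutation(len(dataset)).tolist()
split = int(0.2 * len(dataset))
train_idx, val_idx = indices[split:], indices[:split]

train_loader = DataLoader(dataset, batch_size=16, sampler=SubsetRandomSampler(train_idx))
val_loader = DataLoader(dataset, batch_size=16, sampler=SubsetRandomSampler(val_idx))

print(len(train_loader.dataset))  # size of the full dataset
print(len(train_loader.sampler))  # number of samples the training loader draws
print(len(val_loader.sampler))    # number of samples the validation loader draws
```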

## Justify your preprocessing

In your own words, why did you choose the transforms you chose? If you didn't use any preprocessing steps, why not?

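**`ToTensor()` converts each image into a float tensor with pixel values in [0, 1], and `Normalize((0.5,), (0.5,))` then rescales those values to [-1, 1] so the inputs are centered around zero.**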
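
To make the effect of `Normalize((0.5,), (0.5,))` concrete, the following sketch applies the same arithmetic by hand to a few made-up pixel values. The `pixels` tensor is illustrative and not taken from MNIST.

```python
import torch

# Pixel values as ToTensor() would produce them: floats in [0, 1].
pixels = torch.tensor([0.0, 0.25, 0.5, 1.0])

# The mean and std passed to Normalize in this notebook.
mean, std = 0.5, 0.5

# Normalize computes (x - mean) / std for every pixel in a channel.
normalized = (pixels - mean) / std
print(normalized)  # values now span [-1, 1]
```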

## Explore the Dataset
Using matplotlib, numpy, and torch, explore the dimensions of your data.

You can view images using the `show5` function defined below – it takes a data loader as an argument.
Remember that normalized images will look really weird to you! You may want to try changing your transforms to view images.
Typically, using no transforms other than `ToTensor()` works well for viewing, but not as well for training your network.
If `show5` doesn't work, go back and check your code for creating your data loaders and your training/test sets.
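
If you would rather keep the normalizing transform, one option is to undo it just for display. The sketch below assumes a batch shaped like the ones a `DataLoader` with `batch_size=64` would yield and normalized with mean 0.5 and std 0.5; the random `images` tensor is a stand-in, not real MNIST data.

```python
import torch

# Stand-in for one normalized batch: 64 single-channel 28x28 images in [-1, 1].
images = torch.rand(64, 1, 28, 28) * 2 - 1
labels = torch.randint(0, 10, (64,))

print(images.shape)  # batch, channels, height, width
print(labels.shape)  # one label per image

# Undo Normalize((0.5,), (0.5,)) and drop the channel axis,
# leaving a 2-D array in [0, 1] that plt.imshow can draw directly.
viewable = (images[0] * 0.5 + 0.5).squeeze(0)
print(viewable.shape)
```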

In [None]:
## This cell contains a function for showing 5 images from a dataloader – DO NOT CHANGE THE CONTENTS! ##
def show5(img_loader):
    dataiter = iter(img_loader)
    
    batch = next(dataiter)
    labels = batch[1][0:5]
    images = batch[0][0:5]
    for i in range(5):
        print(int(labels[i].detach()))
    
        image = images[i].numpy()
        plt.imshow(image.T.squeeze().T)
        plt.show()

In [None]:
# Explore data
print("No. of MNIST train data examples: {}".format(len(train_dataset)))
print("No. of MNIST test data examples: {}".format(len(test_dataset)))
dataiter = iter(train_loader)
images, labels = next(dataiter)
print("Shape of one image batch: {}".format(images.shape))
print("Shape of one label batch: {}".format(labels.shape))
show5(train_loader)

## Build your Neural Network
Using the layers in `torch.nn` (which has been imported as `nn`) and the `torch.nn.functional` module (imported as `F`), construct a neural network based on the parameters of the dataset.
Use any architecture you like. 

*Note*: If you did not flatten your tensors in your transforms or as part of your preprocessing and you are using only `Linear` layers, make sure to use the `Flatten` layer in your network!
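
As an illustration of the note above, here is a minimal sketch of a `Linear`-only model that starts with `nn.Flatten`. The layer sizes are arbitrary choices for the example, and the random `batch` merely mimics the shape of MNIST images.

```python
import torch
import torch.nn as nn

# nn.Flatten collapses every dimension after the batch dimension,
# turning (N, 1, 28, 28) into (N, 784) before the first Linear layer.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

batch = torch.randn(4, 1, 28, 28)  # stand-in for four MNIST images
logits = model(batch)
print(logits.shape)  # one row of 10 class scores per image
```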

In [None]:
# Define a simple neural network
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 128)  # First fully connected layer
        self.fc2 = nn.Linear(128, 64)       # Second fully connected layer
        self.fc3 = nn.Linear(64, 10)        # Output layer

    def forward(self, x):
        x = x.view(-1, 28 * 28)            # Flatten the image
        x = F.relu(self.fc1(x))        # Apply ReLU activation
        x = F.relu(self.fc2(x))        # Apply ReLU activation
        x = self.fc3(x)                    # Output layer
        return x

net = Net()

Specify a loss function and an optimizer, and instantiate the model.

If you use a less common loss function, please note why you chose that loss function in a comment.
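
When choosing a loss, it may help to recall that `nn.CrossEntropyLoss` expects raw logits, not probabilities, because it applies log-softmax internally. The sketch below compares it with the equivalent two-step computation on random values; the tensors are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 10)           # raw, unnormalized scores for 8 samples
targets = torch.randint(0, 10, (8,))  # integer class labels

# CrossEntropyLoss applied directly to the logits.
loss_ce = nn.CrossEntropyLoss()(logits, targets)

# The same quantity computed in two steps: log-softmax, then negative log-likelihood.
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(loss_ce.item(), loss_manual.item())
```

Because of this, a network trained with `nn.CrossEntropyLoss` should not end in a softmax layer.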

In [None]:
# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()  # Cross-entropy loss for classification
optimizer = optim.Adam(net.parameters(), lr=0.001)  # Adam optimizer

## Running your Neural Network
Use whatever method you like to train your neural network, and ensure you record the average loss at each epoch. 
Don't forget to use `torch.device()` and the `.to()` method for both your model and your data if you are using GPU!

If you want to print your loss **during** each epoch, you can use the `enumerate` function and print the loss after a set number of batches. 250 batches works well for most people!
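
The two tips above can be sketched as follows. This is a minimal example under stated assumptions: the random `TensorDataset`, the tiny one-layer model, and the `print_every` name are placeholders chosen so the snippet runs on its own, and the interval is set to 4 only because the stand-in data has few batches.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

torch.manual_seed(0)
data = TensorDataset(torch.randn(256, 1, 28, 28), torch.randint(0, 10, (256,)))
loader = DataLoader(data, batch_size=32, shuffle=True)

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

print_every = 4  # hypothetical interval; 250 is the suggestion for full MNIST
epoch_losses = []
for epoch in range(2):
    model.train()
    running = 0.0
    for i, (inputs, labels) in enumerate(loader):
        # The data must live on the same device as the model.
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()
        running += loss.item()
        if (i + 1) % print_every == 0:
            print(f"epoch {epoch + 1} batch {i + 1} mean loss so far: {running / (i + 1):.4f}")
    epoch_losses.append(running / len(loader))  # average loss for this epoch
```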

In [None]:
# Establish lists for loss history
train_loss_history = []
val_loss_history = []

num_epochs = 5  # Number of epochs

for epoch in range(num_epochs):
    net.train()  # Set the network to training mode
    train_loss = 0.0
    train_correct = 0
    for i, data in enumerate(train_loader):
        inputs, labels = data
        
        optimizer.zero_grad()        # Zero the parameter gradients
        outputs = net(inputs)        # Forward pass
        loss = criterion(outputs, labels)  # Compute loss
        loss.backward()              # Backward pass
        optimizer.step()             # Optimize the weights

        _, preds = torch.max(outputs.data, 1)  # Get the index of the largest logit (the predicted class)
        train_correct += (preds == labels).sum().item()
        train_loss += loss.item()

    train_loss /= len(train_loader)  # Calculate average training loss
    train_accuracy = train_correct / len(train_loader.sampler)  # Divide by the samples actually drawn, not the full dataset
    train_loss_history.append(train_loss)

    print(f'Epoch {epoch + 1} training accuracy: {train_accuracy * 100:.2f}% training loss: {train_loss:.5f}')

    # Validation step
    net.eval()  # Set the network to evaluation mode
    val_loss = 0.0
    val_correct = 0
    with torch.no_grad():  # No need to track gradients during evaluation
        for data in val_loader:
            inputs, labels = data
            
            outputs = net(inputs)
            loss = criterion(outputs, labels)

            _, preds = torch.max(outputs.data, 1)
            val_correct += (preds == labels).sum().item()
            val_loss += loss.item()

    val_loss /= len(val_loader)  # Calculate average validation loss
    val_accuracy = val_correct / len(val_loader.sampler)  # Divide by the validation subset size, not the full dataset
    val_loss_history.append(val_loss)

    print(f'Epoch {epoch + 1} validation accuracy: {val_accuracy * 100:.2f}% validation loss: {val_loss:.5f}')

Plot the training loss (and validation loss/accuracy, if recorded).

In [None]:
# Plot the training and validation loss history
plt.plot(train_loss_history, label="Training Loss")
plt.plot(val_loss_history, label="Validation Loss")
plt.legend()
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training and Validation Loss History')
plt.show()

## Testing your model
Using the previously created `DataLoader` for the test set, compute the percentage of correct predictions using the highest probability prediction. 

If your accuracy is over 90%, great work, but see if you can push a bit further! 
If your accuracy is under 90%, you'll need to make improvements.
Go back and check your model architecture, loss function, and optimizer to make sure they're appropriate for an image classification task.
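
For reference, "the highest probability prediction" is simply the index of the largest output per row. The following worked example uses four made-up rows of logits and labels to show the arithmetic; none of the numbers come from a trained model.

```python
import torch

# Four hypothetical samples with three class scores each.
logits = torch.tensor([[2.0, 0.1, -1.0],
                       [0.2, 0.3, 3.0],
                       [1.5, 1.0, 0.0],
                       [0.0, 4.0, 0.5]])
labels = torch.tensor([0, 2, 1, 1])

preds = logits.argmax(dim=1)              # predicted classes: 0, 2, 0, 1
correct = (preds == labels).sum().item()  # 3 of the 4 predictions match
accuracy = 100.0 * correct / labels.size(0)
print(f"{accuracy:.2f}%")  # 75.00%
```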

In [None]:
def test():
    net.eval()  # Set the network to evaluation mode
    test_loss = 0.0
    test_correct = 0
    with torch.no_grad():  # No need to track gradients during evaluation
        for i, data in enumerate(test_loader):
            inputs, labels = data
            
            outputs = net(inputs)  # Forward pass
            loss = criterion(outputs, labels)  # Compute loss

            _, preds = torch.max(outputs.data, 1)  # Get the index of the largest logit (the predicted class)
            test_correct += (preds == labels).sum().item()
            test_loss += loss.item()

    test_loss /= len(test_loader)  # Calculate average test loss
    test_accuracy = test_correct / len(test_loader.dataset)  # Calculate test accuracy
    print(f'Test accuracy: {test_accuracy * 100:.2f}% test loss: {test_loss:.5f}')

# The model does not change between calls, so one evaluation pass is enough
test()


## Improving your model

Once your model is done training, try tweaking your hyperparameters and training again below to improve your accuracy on the test set!

In [None]:
# Define a modified neural network with dropout
class Net2(nn.Module):
    def __init__(self):
        super(Net2, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 128)  # First fully connected layer
        self.fc2 = nn.Linear(128, 64)       # Second fully connected layer
        self.fc3 = nn.Linear(64, 10)        # Output layer
        self.dropout = nn.Dropout(0.2)      # Dropout layer with 0.2 probability

    def forward(self, x):
        x = x.view(-1, 28 * 28)             # Flatten the image
        x = F.relu(self.fc1(x))             # Apply ReLU activation
        x = self.dropout(x)                 # Apply dropout
        x = F.relu(self.fc2(x))             # Apply ReLU activation
        x = self.dropout(x)                 # Apply dropout
        x = self.fc3(x)                     # Output layer
        return x

# Create the modified neural network
net2 = Net2()
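
Since `Net2` relies on dropout, it is worth seeing why the `train()` and `eval()` calls matter. The sketch below applies a standalone `nn.Dropout(0.2)` to a vector of ones (an artificial input chosen so the effect is easy to read): in training mode roughly 20% of the entries are zeroed and the rest are scaled by 1 / (1 - 0.2) = 1.25, while in evaluation mode the layer passes its input through unchanged.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.2)
x = torch.ones(1000)

dropout.train()
y_train = dropout(x)  # random entries zeroed, survivors scaled up by 1.25

dropout.eval()
y_eval = dropout(x)   # dropout is disabled, so the output equals the input

print((y_train == 0).float().mean().item())  # fraction dropped, close to 0.2
print(torch.equal(y_eval, x))
```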


In [None]:
# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()  # Cross-entropy loss for classification
optimizer = optim.Adam(net2.parameters(), lr=0.001)  # Adam optimizer

In [None]:
# Establish lists for loss history
train_loss_history = []
val_loss_history = []

num_epochs = 5  # Number of epochs

for epoch in range(num_epochs):
    net2.train()  # Set the network to training mode
    train_loss = 0.0
    train_correct = 0
    for i, data in enumerate(train_loader):
        inputs, labels = data
        
        optimizer.zero_grad()        # Zero the parameter gradients
        outputs = net2(inputs)        # Forward pass
        loss = criterion(outputs, labels)  # Compute loss
        loss.backward()              # Backward pass
        optimizer.step()             # Optimize the weights

        _, preds = torch.max(outputs.data, 1)  # Get the index of the largest logit (the predicted class)
        train_correct += (preds == labels).sum().item()
        train_loss += loss.item()

    train_loss /= len(train_loader)  # Calculate average training loss
    train_accuracy = train_correct / len(train_loader.sampler)  # Divide by the samples actually drawn, not the full dataset
    train_loss_history.append(train_loss)

    print(f'Epoch {epoch + 1} training accuracy: {train_accuracy * 100:.2f}% training loss: {train_loss:.5f}')

    # Validation step
    net2.eval()  # Set the network to evaluation mode
    val_loss = 0.0
    val_correct = 0
    with torch.no_grad():  # No need to track gradients during evaluation
        for data in val_loader:
            inputs, labels = data
            
            outputs = net2(inputs)
            loss = criterion(outputs, labels)

            _, preds = torch.max(outputs.data, 1)
            val_correct += (preds == labels).sum().item()
            val_loss += loss.item()

    val_loss /= len(val_loader)  # Calculate average validation loss
    val_accuracy = val_correct / len(val_loader.sampler)  # Divide by the validation subset size, not the full dataset
    val_loss_history.append(val_loss)

    print(f'Epoch {epoch + 1} validation accuracy: {val_accuracy * 100:.2f}% validation loss: {val_loss:.5f}')

In [None]:
# Plot the training and validation loss history
plt.plot(train_loss_history, label="Training Loss")
plt.plot(val_loss_history, label="Validation Loss")
plt.legend()
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training and Validation Loss History')
plt.show()

In [None]:
def test2():
    net2.eval()  # Set the network to evaluation mode
    test_loss = 0.0
    test_correct = 0
    with torch.no_grad():  # No need to track gradients during evaluation
        for i, data in enumerate(test_loader):
            inputs, labels = data
            
            outputs = net2(inputs)  # Forward pass
            loss = criterion(outputs, labels)  # Compute loss

            _, preds = torch.max(outputs.data, 1)  # Get the index of the largest logit (the predicted class)
            test_correct += (preds == labels).sum().item()
            test_loss += loss.item()

    test_loss /= len(test_loader)  # Calculate average test loss
    test_accuracy = test_correct / len(test_loader.dataset)  # Calculate test accuracy
    print(f'Test accuracy: {test_accuracy * 100:.2f}% test loss: {test_loss:.5f}')

# The model does not change between calls, so one evaluation pass is enough
test2()


## Saving your model
Using `torch.save`, save your model for future loading.

In [None]:
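# Save the trained weights of the model instances (net, net2), not the class definitions
torch.save(net.state_dict(), "originalmodel.pth")
torch.save(net2.state_dict(), "improvedmodel.pth")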
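
A common pattern for future loading is to save a model's `state_dict()` and later load it into a freshly constructed model of the same architecture. The sketch below shows one such round trip; the small `nn.Sequential` model is a placeholder, and an in-memory `io.BytesIO` buffer stands in for a file path so the example leaves nothing on disk.

```python
import io

import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

# Save only the learned parameters; a file path string would work the same way.
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)

# Rebuild the same architecture, then load the saved parameters into it.
buffer.seek(0)
restored = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
restored.load_state_dict(torch.load(buffer))
restored.eval()  # switch to evaluation mode before running inference

sample = torch.randn(2, 1, 28, 28)
print(torch.allclose(model(sample), restored(sample)))
```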