In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("D8.ipynb")

# Discussion 8: Neural Networks


Neural networks are a type of machine learning model that is loosely inspired by the structure of the human brain. They consist of interconnected nodes, or "neurons," organized into layers. Each neuron receives input from the neurons in the previous layer, processes that input, and then sends output to the neurons in the next layer.

In a supervised machine learning context, the goal of a neural network is to learn a function that maps inputs to outputs based on a set of labeled training examples. During training, the network adjusts the weights on the connections between neurons in order to minimize the difference between its predicted outputs and the true outputs for the training examples.
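Here is what "adjusting the weights to minimize the difference" looks like in practice: a minimal NumPy sketch of gradient descent on a single linear neuron with a mean-squared-error loss. The toy data, learning rate, and the helper name `train_linear_neuron` are hypothetical choices made for illustration. The lab itself relies on PyTorch's automatic differentiation and the Adam optimizer, not hand-written gradients.

```python
import numpy as np

def train_linear_neuron(x, y, lr=0.1, steps=200):
    """Fit y ≈ w * x + b by gradient descent on mean squared error.

    Returns the learned (w, b) and the loss recorded at every step.
    """
    w, b = 0.0, 0.0
    losses = []
    for _ in range(steps):
        pred = w * x + b                  # forward pass: current predictions
        err = pred - y                    # prediction error for each example
        losses.append(np.mean(err ** 2))  # MSE loss before this update
        # Gradients of the MSE loss with respect to w and b
        grad_w = 2.0 * np.mean(err * x)
        grad_b = 2.0 * np.mean(err)
        # Move each weight a small step against its gradient to reduce the loss
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, losses

# Toy labeled training data generated from the "true" function y = 3x + 1
x = np.linspace(-1.0, 1.0, 20)
y = 3.0 * x + 1.0
w, b, losses = train_linear_neuron(x, y)
print(f"w={w:.3f}, b={b:.3f}, first loss={losses[0]:.3f}, last loss={losses[-1]:.6f}")
```

A neural network applies the same idea to millions of weights at once. PyTorch computes the gradients automatically through `loss.backward()`.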

There are many different types of neural networks, each with its own architecture and training algorithm. Some common types include feedforward neural networks, convolutional neural networks (CNNs), and recurrent neural networks (RNNs). Each type is suited to different types of problems and data.

Neural networks are a powerful tool for machine learning, and have been used successfully in many applications, including image and speech recognition, natural language processing, and more. However, they can also be computationally intensive and require careful tuning to achieve good performance.


## Instructions: 

In this lab, you will practice implementing a simple neural network from scratch using **PyTorch**. 


Please note that there are **no hidden tests** for this discussion lab. In other words, if you pass the public test cases, you will gain full points for the questions. 

Read the markdown cells carefully, as they'll provide hints towards writing your solutions. For each question, you'll be expected to import the required packages and implement based upon the description. All necessary information will be provided, so again, read carefully!



## Task Description: Classifying digits (0-9) using a feedforward Neural Network

In this lab, we will focus on creating a simple **feed-forward classification neural network** to classify handwritten digits between 0-9 from the MNIST dataset into their respective classes. 


A feed-forward neural network (used here for classification) consists of many perceptron-like units, or neurons, organized in layers. In a fully connected network like the one we build here, each unit in a layer is connected to every unit in the previous layer. These connections are not all equal: each one has its own strength, or weight, and these weights encode the knowledge of the network.

Data enters at the input layer and passes through the network layer by layer until it reaches the output layer, with no feedback connections between layers along the way. This one-directional flow is why these networks are called feedforward neural networks.
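You can sketch this one-directional flow as a chain of layer functions applied in order. Each layer's output becomes the next layer's input, and nothing is fed back. The NumPy sketch below assumes the 784 -> 500 -> 10 layer sizes used later in the lab. It uses random weights purely for illustration, and the helper names (`dense`, `feedforward`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(in_features, out_features):
    """Return a fully connected layer: every output unit sees every input unit."""
    W = rng.normal(scale=0.01, size=(out_features, in_features))  # one weight per connection
    b = np.zeros(out_features)
    return lambda x: x @ W.T + b

def relu(x):
    return np.maximum(x, 0.0)

# Layers are applied strictly in order: input -> hidden -> output, with no feedback loops
layers = [dense(784, 500), relu, dense(500, 10)]

def feedforward(x):
    for layer in layers:
        x = layer(x)  # each layer only consumes the previous layer's output
    return x

batch = rng.random((4, 784))  # 4 fake flattened 28x28 images
out = feedforward(batch)
print(out.shape)  # (4, 10): one score per class for each image
```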

Before we get started, we need to install **PyTorch**. 

About PyTorch - https://pytorch.org/

PyTorch is an open-source machine learning and deep learning framework widely used in applications such as natural language processing and computer vision (for example, image classification). It was developed by Facebook's AI Research lab (FAIR) and later adopted by companies such as Uber, Twitter, Salesforce, and NVIDIA. 

PyTorch comes with companion libraries such as torchtext and torchvision, along with modules and classes such as `torch.nn`, `torch.optim`, `Dataset`, and `DataLoader`. These help you create and train neural networks across different machine learning and deep learning domains. 


In [None]:
# !pip install torch==1.12.0
# !pip install torchvision

**Note: Please restart your kernel after installing the above packages with pip. Click Kernel > Restart.** 


Then run the following import cell. 

In [None]:
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
%config InlineBackend.figure_format='retina'

Cool! Now that we imported torch and torchvision, let's focus on loading our dataset. 

### About the Dataset


Torchvision provides many built-in datasets in the torchvision.datasets module, as well as utility classes for building your own datasets. - https://pytorch.org/vision/stable/datasets.html


We'll be using one of these built-in datasets for our classification task. 



The **MNIST** dataset, also known as the Modified National Institute of Standards and Technology dataset, consists of 70,000 small, square 28×28 grayscale images of handwritten digits from 0 to 9, divided into ten classes. It is a standard benchmark for image classification with deep learning models.

The MNIST database contains **60,000 training images** and **10,000 testing images**. 


### Load the MNIST dataset with torchvision

The following cell downloads MNIST (if it is not already in the `data` folder) and uses `transforms.ToTensor()` to convert each image into a PyTorch tensor. 
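`transforms.ToTensor()` converts a grayscale image such as an MNIST digit from an H×W array of 8-bit pixel values in [0, 255] into a float tensor of shape [1, H, W], with values in [0.0, 1.0]. The NumPy function below is a hypothetical stand-in that mimics this for one single-channel image, so you can see the effect without downloading the dataset. It is not torchvision's actual implementation.

```python
import numpy as np

def to_tensor_like(img_uint8):
    """Mimic ToTensor() for one grayscale image: add a channel axis and scale to [0, 1]."""
    img = np.asarray(img_uint8, dtype=np.float32) / 255.0  # [0, 255] -> [0.0, 1.0]
    return img[np.newaxis, :, :]                           # [H, W] -> [1, H, W]

# A fake 28x28 "digit": black background with a white vertical stroke
fake_digit = np.zeros((28, 28), dtype=np.uint8)
fake_digit[4:24, 13:15] = 255

x = to_tensor_like(fake_digit)
print(x.shape, x.dtype, x.min(), x.max())
```

The leading dimension of size 1 is the channel axis. This explains why each MNIST sample later has shape [1, 28, 28].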

In [None]:
from torch.utils.data import Subset

# Read in train_data and test_data from the built-in MNIST dataset,
# converting each image into a PyTorch tensor

train_data = torchvision.datasets.MNIST(
    root='data',
    train=True,
    transform=transforms.ToTensor(),
    download=True
)

test_data = torchvision.datasets.MNIST(
    root='data',
    train=False,
    transform=transforms.ToTensor(),
    download=True
)

**Question 1.1:** Store the sizes of our training and testing datasets in the following variables.

In [None]:
train_size = ...
test_size = ...

In [None]:
grader.check("Q1.1 dataset_size")

The full dataset has around 70k images, and training on all of them would take a TON of time on a CPU. Instead, we will use only the first 10% of the images for our train and test sets. Don't worry if you don't fully understand the details; just run the code provided! If you are curious about the purpose and function of each line, feel free to ask on Piazza.
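`torch.utils.data.Subset(dataset, indices)` is a thin wrapper. Item `i` of the subset is item `indices[i]` of the original dataset, and its length is `len(indices)`. The class below is a simplified, hypothetical re-implementation of that idea, not PyTorch's actual source. It shows what `indices=range(len(data) // 10)` selects, using plain lists in place of the MNIST splits.

```python
class SubsetSketch:
    """Simplified stand-in for torch.utils.data.Subset."""

    def __init__(self, dataset, indices):
        self.dataset = dataset
        self.indices = indices

    def __getitem__(self, i):
        # Map position i in the subset to a position in the original dataset
        return self.dataset[self.indices[i]]

    def __len__(self):
        return len(self.indices)

# Stand-ins for the MNIST splits: only their lengths matter here
train_like = list(range(60_000))
test_like = list(range(10_000))

train_subset = SubsetSketch(train_like, range(len(train_like) // 10))
test_subset = SubsetSketch(test_like, range(len(test_like) // 10))

print(len(train_subset), len(test_subset))  # 6000 1000
print(train_subset[0], train_subset[-1])    # 0 5999
```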

In [None]:
# The full training set has 60k images,
# which would take a TON of time to train on a CPU.
# So we subset each split and keep the first 10% for train and test.
train_dataset = Subset(train_data, indices=range(len(train_data) // 10))
# The test subset must come from test_data (not train_data);
# otherwise we would evaluate on images the model was trained on
test_dataset = Subset(test_data, indices=range(len(test_data) // 10))

PyTorch's `DataLoader` takes a dataset object as input; the dataset is responsible for loading and returning individual data samples. The `DataLoader` then takes care of batching, shuffling, and (optionally) multi-process loading of those samples, making it easy to feed them into a deep learning model.
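Conceptually, a `DataLoader` with `shuffle=True` reorders the sample indices on each pass (epoch) and then yields them in consecutive chunks of `batch_size`. With the default `drop_last=False`, the last batch may be smaller, so the number of batches is ceil(N / batch_size). The generator below is a simplified, hypothetical sketch of that behaviour in plain Python. It ignores collating samples into tensors and multi-process loading.

```python
import math
import random

def iterate_batches(n_samples, batch_size, shuffle, seed=None):
    """Yield lists of sample indices, one list per batch."""
    order = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(order)  # reorder the indices before batching
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]

train_batches = list(iterate_batches(6000, 100, shuffle=True, seed=0))
test_batches = list(iterate_batches(1000, 100, shuffle=False))

print(len(train_batches), len(test_batches))         # 60 10
print(math.ceil(6000 / 100), math.ceil(1050 / 100))  # 60 11: a partial last batch still counts
```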

In [None]:
# Just run this cell to use Pytorch dataloader to load train and test sets 
# Note that we specify batch size = 100. 
# This means that we will have 60 batches in total 
# - and each batch contains 100 images for the train set 

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
                                           batch_size=100, 
                                           shuffle=True)

# Note that we specify batch size = 100. 
# This means that we will have 10 batches in total 
# - and each batch contains 100 images for the test set 
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, 
                                          batch_size=100, 
                                          shuffle=False)

In [None]:
# A quick check to ensure our train_loader loaded our 6000 train images properly 
print('For the train set:')
print('Total number of batches:', len(train_loader))
print('Number of images in each batch in train set:', train_loader.batch_size)
print('Total number of images in train set:', len(train_loader.dataset))
print()


# A quick check to ensure our test_loader loaded our 1000 test images properly 
print('For the test set:')
print('Total number of batches:', len(test_loader))
print('Number of images in each batch in test set:', test_loader.batch_size)
print('Total number of images in test set:', len(test_loader.dataset))

Great! Now that we have successfully loaded our train and test sets, let's take a quick look at what our test set images look like. 


Please run the following code cell to look at 10 random images from our test set. 

The ground truth labels are displayed in blue as the title of the plots. 

In [None]:
import numpy as np

# Style plot titles in blue and tick labels in black
params = {"text.color" : "blue",
          "xtick.color" : "black",
          "ytick.color" : "black"}
plt.rcParams.update(params)

# Pick a random position within each of the 10 test batches
# (len(test_loader) is the number of batches, not the batch size)
indices = np.random.randint(0, test_loader.batch_size, size=10)

fig, axs = plt.subplots(2, 5, figsize=(10, 5))
axs = axs.flatten()
examples = iter(test_loader)

for i, index in enumerate(indices):
    # Get the next batch, then one image and its ground truth label from it
    example_data, example_targets = next(examples)
    image, label = example_data[index][0], example_targets[index].item()

    # Plot the image with its ground truth
    axs[i].imshow(image.reshape(28, 28), cmap='gray')
    axs[i].set_title(f'GT: {label}')
    axs[i].axis('off')

plt.tight_layout()
plt.show()

Now, let's focus on building our fully connected neural network, which will classify these test images into one of 10 classes, i.e., the digits 0-9. 

## Creating our Fully Connected Network with one hidden layer

We will first define our hyperparameters for our neural network. 

**Question 1.2:** Given a handwritten-digit image as the input to our model, what should the input size of our fully connected network be?

In [None]:
examples = iter(test_loader)
example_data, example_targets = next(examples)
# Take the first image (and its label) from the first test batch
image, label = example_data[0][0], example_targets[0].item()

# Take a look at our input: image
print(image)

# Based on that input, what should be our network's input size?
input_size = ...

In [None]:
grader.check("Q1.2 Input Size")

- We then set our hidden layer size to 500 units.

- num_classes is set to 10 since this is a multiclass classification problem with 10 classes. 

- As mentioned before, our batch_size is set to 100. 

- The learning rate of the network is set to 0.001.

- We train for 3 epochs (num_epochs), i.e., 3 full passes over the training set.

In [None]:
# Our hidden layer will have 500 units (neurons)
hidden_size = 500 

# num_classes = 10, since we want to classify digits into one of 10 classes 
num_classes = 10

# Number of full passes over the training set
num_epochs = 3

batch_size = 100

learning_rate = 0.001

## Question 1.3: Linear Layers 

The code cell below defines a fully connected neural network with a single hidden layer. Your job is to fill in the lines for the first and second linear layers. 

You can accomplish this using the **nn.Linear** module and setting the appropriate input and output sizes for each layer - https://pytorch.org/docs/stable/generated/torch.nn.Linear.html

**Hint**: Use the hyperparameters we defined above! 
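According to the PyTorch documentation, `nn.Linear(in_features, out_features)` computes y = xAᵀ + b. It stores a weight matrix A of shape (out_features, in_features) and a bias b of length out_features. The NumPy sketch below reproduces that arithmetic by hand on a tiny hypothetical layer with 3 inputs and 2 outputs. It shows why a layer's input size must match the length of the vector fed into it.

```python
import numpy as np

# Hypothetical tiny layer: 3 input features -> 2 output features
A = np.array([[1.0, 0.0, -1.0],   # weights of output unit 0
              [0.5, 0.5,  0.5]])  # weights of output unit 1; shape (out=2, in=3)
b = np.array([0.1, -0.1])         # one bias per output unit

def linear(x, A, b):
    """Same formula as nn.Linear: y = x A^T + b, applied to each row of x."""
    return x @ A.T + b

x = np.array([[1.0, 2.0, 3.0]])   # a batch containing one 3-dimensional input
y = linear(x, A, b)
print(y.tolist())  # [[-1.9, 2.9]]
```

If `x` had 4 features instead of 3, `x @ A.T` would raise a shape-mismatch error. `nn.Linear` fails the same way when `in_features` doesn't match the incoming data.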




In [None]:
# Fully connected neural network with one hidden layer

# The neural network is defined as a class called NeuralNet, which inherits from the nn.Module class in PyTorch. 
# This allows the network to take advantage of the built-in functionality of PyTorch for training and optimization.

class NeuralNet(nn.Module):
    
    # initializes the neural network and sets its parameters. 
    # It takes three arguments - input_size, hidden_size, and num_classes 
    # input_size - the size of the input layer, 
    # hidden_size - the number of neurons in the hidden layer, 
    # num_classes - the number of output classes
    
    def __init__(self, input_size, hidden_size, num_classes):
        super(NeuralNet, self).__init__()
        self.input_size = input_size

        # Your task: Create the first linear layer using nn.Linear 
        # Hint: Think about the input size of the first layer. 
        # Since the hidden layer is next, what should the output size of this first linear layer be?
        self.l1 = ...
        
        
        # a Rectified Linear Unit (ReLU) activation function, applied to the output of the first layer 
        self.relu = nn.ReLU()
        
        # Your task: Create the second linear layer using nn.Linear 
        # Hint: Think about the input size of the second layer (This layer is connected to the hidden layer!)
        # This layer produces the final output of the network so what should the output size be?
        self.l2 = ...
 

    # defines how the input data is processed through the neural network. 
    # connect the layers together as follows:  l1 -> relu -> l2
    def forward(self, x):
        ...
        return out
    

# Create an instance of NeuralNet and store it in model
model = NeuralNet(input_size, hidden_size, num_classes)

In [None]:
grader.check("Q1.3 Linear Layers")

Great! Now that we've created an instance of NeuralNet called model, let's take a look at our model architecture. 

Run the following code cell provided. 

In [None]:
model

What do you observe? Does your network have two linear layers? Make sure your network architecture is defined correctly before moving on to the next part. 
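Counting trainable parameters is another way to check the architecture. Assuming the intended layer sizes of 784 -> 500 -> 10, each `nn.Linear(in, out)` layer holds in × out weights plus out biases, and ReLU has no parameters. The arithmetic is sketched below with a hypothetical helper. To compare against your own model, `sum(p.numel() for p in model.parameters())` counts the parameters PyTorch actually allocated.

```python
def linear_params(in_features, out_features):
    """Weights (in_features * out_features) plus one bias per output unit."""
    return in_features * out_features + out_features

l1_params = linear_params(784, 500)  # input -> hidden
l2_params = linear_params(500, 10)   # hidden -> output classes
total = l1_params + l2_params
print(l1_params, l2_params, total)   # 392500 5010 397510
```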


## Testing the network - before Training 


Let's compare the accuracy of our network on the test images **before** and **after** training. 

We expect the network to have really poor accuracy, roughly at chance level (about 10% for 10 classes), since all the weights are randomly initialized and no learning or weight updates have occurred yet. 
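The test cell scores the model by taking the class with the largest output score (the argmax) for each image and comparing it with the label. The NumPy sketch here applies that rule to randomly generated scores and labels, a hypothetical stand-in for an untrained network. The result lands near 10%, the chance level for 10 balanced classes. A real untrained network is not perfectly random (it often favours one class), but its accuracy usually stays near chance too.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, num_classes = 1000, 10

logits = rng.normal(size=(n_samples, num_classes))     # stand-in for untrained outputs
labels = rng.integers(0, num_classes, size=n_samples)  # stand-in for the true digits

predicted = logits.argmax(axis=1)                      # index of the highest score in each row
acc = 100.0 * (predicted == labels).mean()
print(f"Accuracy on {n_samples} random samples: {acc:.1f} %")
```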


Run the following code cell to see the accuracy of our un-trained network on 1000 test images. 

In [None]:
# Test the model
# In the test phase we don't need to compute gradients (saves memory and computation)
with torch.no_grad():
    n_correct = 0
    n_samples = 0
    for images, labels in test_loader:
        # Flatten each [1, 28, 28] image into a 784-dim vector
        images = images.reshape(-1, 28*28)
        outputs = model(images)
        # torch.max over dim 1 returns (max values, indices); the index is the predicted class
        _, predicted = torch.max(outputs, 1)
        n_samples += labels.size(0)
        n_correct += (predicted == labels).sum().item()

    acc = 100.0 * n_correct / n_samples
    print(f'Accuracy of the network on the {n_samples} test images: {acc} %')

As expected, our network has really poor accuracy! Let's take a quick look at our predictions on the images. 


In [None]:
# Plot 10 random images from the MNIST test set with ground truth and predicted labels

import numpy as np
# Pick a random position within each of the 10 test batches
# (len(test_loader) is the number of batches, not the batch size)
indices = np.random.randint(0, test_loader.batch_size, size=10)

fig, axs = plt.subplots(2, 5, figsize=(10, 5))
axs = axs.flatten()
examples = iter(test_loader)

for i, index in enumerate(indices):
    # Get the next batch, then one image and its ground truth label from it
    example_data, example_targets = next(examples)
    image, label = example_data[index][0], example_targets[index].item()

    # Make a prediction with the model (no gradients needed)
    with torch.no_grad():
        image = image.reshape(-1, 28*28)  # [28, 28] -> [1, 784]: a batch of one image
        prediction = model(image)
        predicted_label = torch.argmax(prediction, dim=1).item()

    # Plot the image with its ground truth and predicted labels
    axs[i].imshow(image.reshape(28, 28), cmap='gray')
    axs[i].set_title(f'GT: {label}, Pred: {predicted_label}')
    axs[i].axis('off')

plt.tight_layout()
plt.show()

You'll notice that the network does a REALLY bad job at classifying the digits, often predicting the same digit for many images. These predictions are essentially **arbitrary and mostly incorrect**. 

Because the weights are initialized randomly when the network is created, its outputs are effectively random as well. Until training updates those weights, the network has not learned any meaningful patterns in the input data, so its predictions are no better than guessing.

## Question 2.1: Training the network


In the following code cells, we will set up the training loop for the network. 

Your task is to fill in the forward pass and the loss computation inside the loop. The loss function measures the error between the predicted outputs and the actual labels.
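`nn.CrossEntropyLoss` expects raw, unnormalized scores (logits) of shape [batch, num_classes] and integer class labels of shape [batch]. According to its documentation, it applies log-softmax to the logits and takes the negative log-probability of the correct class, averaged over the batch by default. The NumPy sketch below reproduces that formula for a hypothetical two-sample batch. It is meant for intuition only and skips options such as class weights and label smoothing.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean of -log softmax(logits)[label] over the batch."""
    # Subtract each row's max for numerical stability (softmax is unchanged)
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick out the log-probability of the correct class for each sample
    correct = log_probs[np.arange(len(labels)), labels]
    return -correct.mean()

logits = np.array([[2.0, 0.5, -1.0],   # sample 0: confident in class 0 (correct)
                   [0.0, 0.0,  0.0]])  # sample 1: uniform scores, true class 2
labels = np.array([0, 2])

print(round(cross_entropy(logits, labels), 4))
```

With uniform scores over 10 classes, the loss is ln(10) ≈ 2.303. The first batch losses of an untrained MNIST model are often in that neighbourhood.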

In [None]:
# Specify loss function
loss_func = nn.CrossEntropyLoss()

# Specify optimization algorithm to be used 
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

In [None]:
n_total_batches = len(train_loader)
print('Total Batches in train set:', n_total_batches)
losses = []

# Outer loop - runs over the number of epochs 
for epoch in range(num_epochs):
    # Inner loop - runs over each batch of images and labels 
    for i, (images, labels) in enumerate(train_loader):  
        
        # original shape: [100, 1, 28, 28]
        # resized: [100, 784] so it can be passed into the network
        images = images.reshape(-1, 28*28)

        # Your task: Fill in the forward pass
        # Forward pass - pass the batch of images through the network 
        outputs = ...
        
        # Your task: Fill in the loss function 
        # that calculates the error between the predicted output and the actual labels
        # Hint: We already defined this above. Think about what arguments a loss function should take. 
        loss = ...
 
        # Backward pass and optimization step:
        # clear old gradients, backpropagate the loss, then update the weights
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Record this batch's loss
        losses.append(loss.item())

        # Print progress every 10 batches
        if (i+1) % 10 == 0:
            print(f'Epoch [{epoch+1}/{num_epochs}], Batch [{i+1}/{n_total_batches}], Loss: {loss.item():.4f}')

In [None]:
grader.check("Q2.1 Training Loop")

Great! We have successfully trained our network. Each loss value is computed on a single batch, and we only train for 3 epochs, so you may see the loss fluctuate from batch to batch. Overall, however, it should trend downward, and it generally keeps decreasing as you train for more epochs. 
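Each entry in `losses` comes from a single batch of 100 images, so the per-batch curve is noisy. A moving average over a window of batches (an optional extra, not part of the graded lab) is one common way to reveal the trend. The sketch below uses a hypothetical noisy, decreasing sequence in place of the real `losses` list. In the notebook, you could pass `losses` directly.

```python
import numpy as np

def moving_average(values, window):
    """Average each run of `window` consecutive values (output is shorter by window - 1)."""
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

# Hypothetical per-batch losses: a decaying trend plus batch-to-batch noise
rng = np.random.default_rng(0)
steps = np.arange(180)  # 3 epochs x 60 batches
noisy_losses = 2.3 * np.exp(-steps / 40) + 0.1 + rng.normal(scale=0.1, size=steps.size)

smoothed = moving_average(noisy_losses, window=20)
print(len(noisy_losses), len(smoothed))  # 180 161
print(smoothed[0] > smoothed[-1])        # True: the smoothed trend decreases
```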

## Testing the network - After Training 


Let's compare the accuracy of our network on the test images now **after** training. 

We expect the accuracy to improve: during training, the weights were updated to reduce the loss, so the network should have learned visual features that distinguish between the input images.


Run the following code cell to see the accuracy of our **trained network** on 1000 test images. 

In [None]:
# Test the model
# In the test phase we don't need to compute gradients (saves memory and computation)
with torch.no_grad():
    n_correct = 0
    n_samples = 0
    for images, labels in test_loader:
        # Flatten each [1, 28, 28] image into a 784-dim vector
        images = images.reshape(-1, 28*28)
        outputs = model(images)
        # torch.max over dim 1 returns (max values, indices); the index is the predicted class
        _, predicted = torch.max(outputs, 1)
        n_samples += labels.size(0)
        n_correct += (predicted == labels).sum().item()

    acc = 100.0 * n_correct / n_samples
    print(f'Accuracy of the network on the {n_samples} test images: {acc} %')

Whew, that's a huge jump in accuracy! Training has adjusted the weights in our network to minimize the error between the predicted and true labels, and the network has learned to recognize and classify the digits in the test images. 


Let's compare the ground truth and predictions for 10 random images in our test set. 

In [None]:
# Plot 10 random images from the MNIST test set with ground truth and predicted labels

import numpy as np
# Pick a random position within each of the 10 test batches
# (len(test_loader) is the number of batches, not the batch size)
indices = np.random.randint(0, test_loader.batch_size, size=10)

fig, axs = plt.subplots(2, 5, figsize=(10, 5))
axs = axs.flatten()
examples = iter(test_loader)

for i, index in enumerate(indices):
    # Get the next batch, then one image and its ground truth label from it
    example_data, example_targets = next(examples)
    image, label = example_data[index][0], example_targets[index].item()

    # Make a prediction with the model (no gradients needed)
    with torch.no_grad():
        image = image.reshape(-1, 28*28)  # [28, 28] -> [1, 784]: a batch of one image
        prediction = model(image)
        predicted_label = torch.argmax(prediction, dim=1).item()

    # Plot the image with its ground truth and predicted labels
    axs[i].imshow(image.reshape(28, 28), cmap='gray')
    axs[i].set_title(f'GT: {label}, Pred: {predicted_label}')
    axs[i].axis('off')

plt.tight_layout()
plt.show()

We can see that the predictions now match our ground truth for most images. 


It's important to note that the network's performance can depend on several factors, such as the network architecture, the hyperparameters, and the size and complexity of the dataset. Additionally, as with other supervised ML algorithms, the network may fail to generalize to new, unseen data if it has overfit the training data.





Congratulations! You have successfully trained your first neural network and used it to classify 1,000 test images from MNIST. 

# Submission Guidelines
## DO NOT USE `shutil`. Please Directly Submit this `D8.ipynb`

Have a look back over your answers, and also make sure to `Restart & Run All` from the kernel menu to double check that everything is working properly. This restarts everything and runs your code from top to bottom.

When you are ready to submit your assignment, you can click `Validate` at the top. Note that in some assignments the code will take too long to run and validation may fail. Validation is just a final check that all the asserts are passing without failing.

Once you're happy with your work, click the disk icon to save, and submit `D8.ipynb` to Gradescope. **You MUST submit all the required components to receive credit.**

Note that you can submit at any time, but **we grade your most recent submission**. This means that **if you submit an updated notebook after the submission deadline, it will be marked as late**.