<a href="https://colab.research.google.com/github/MarinaWolters/Coding-Tracker/blob/master/R11_FCNN_CNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CIS-545 Recitation 9 - Neural Networks in PyTorch

There are various Python libraries for deep learning, such as [PyTorch](https://pytorch.org/docs/stable/index.html), [TensorFlow](https://www.tensorflow.org/api_docs/python/tf/all_symbols) and [MXNet](https://mxnet.apache.org/versions/1.9.1/api). We can also use other libraries with built-in automatic differentiation, such as [JAX](https://jax.readthedocs.io/en/latest/index.html) and its deep learning counterpart [Flax](https://flax.readthedocs.io/en/latest/)!

In this course, we will be working with PyTorch, an open-source, easy-to-use library.

## Loading Dependencies

In [None]:
!pip install torchinfo

In [None]:
import torch
import torchvision
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
from torch.utils.data import DataLoader
from torchinfo import summary

# Set the device to do computations on; GPUs are generally faster!
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(torch.__version__)
print(device)

## Torch Tensors and Computational Graphs

PyTorch has its own data structure, the tensor, which supports computation on both CPU and GPU as well as automatic differentiation for backpropagation.

In [None]:
# Initialize a 2D torch tensor of ones with size 5 x 5
ones_matrix = torch.ones((5, 5))

# Initialize a column vector of ones with size 5 x 1
ones_vector = torch.ones((5, 1))


In [None]:
# Matrix-vector multiplication with PyTorch
torch.matmul(ones_matrix, ones_vector)

In [None]:
ones_matrix @ ones_vector

For people familiar with NumPy: almost every NumPy function that operates on N-d arrays has a similar PyTorch function that operates on torch tensors.
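
As a small illustration of this similarity, the sketch below sums the columns of the same data once with NumPy and once with PyTorch. The variable names are made up for this example; note that PyTorch usually calls the axis argument `dim`.

```python
import numpy as np
import torch

# A NumPy array and a torch tensor built from it (from_numpy shares memory)
array = np.arange(6, dtype=np.float32).reshape(2, 3)
tensor = torch.from_numpy(array)

# The same reduction in both libraries: sum over rows, one value per column
numpy_result = np.sum(array, axis=0)
torch_result = torch.sum(tensor, dim=0)

# Convert the tensor back to NumPy to compare the two results side by side
print(numpy_result)
print(torch_result.numpy())
```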

However, apart from GPU support, PyTorch also offers automatic differentiation (autograd), which is extremely important when designing and training neural networks.

Let's consider a 5 x 5 matrix $A$ and a 5 x 1 vector $b$. We will compute $l$ as follows:

\begin{gather*}
  z = A \cdot b \\
  l = \sum_{i=1}^{5} z_{i}
\end{gather*}

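Since $z_{j} = \sum_{i=1}^{5} A_{ji} b_{i}$, each $b_{i}$ contributes to $l$ once through every row of $A$, so we can work out the gradient of $l$ with respect to $b$ by hand: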

\begin{gather*}
  \left( \frac{\partial l}{\partial b} \right)_{i} = \sum_{j=1}^{5} A_{ji}
\end{gather*}

In [None]:
tensor_A = torch.rand((5, 5))
tensor_A.requires_grad = True
tensor_b = torch.rand((5, 1))
tensor_b.requires_grad = True

l_var = torch.sum(tensor_A @ tensor_b) 

In [None]:
l_var.backward()

In [None]:
# Torch autograd
tensor_b.grad

In [None]:
# Expected gradient
torch.sum(tensor_A, axis=0).detach().reshape((-1, 1))
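
Rather than comparing the two printouts by eye, the check can be done programmatically. The following is a minimal self-contained sketch (it creates its own tensors, and the variable names are chosen for this example) that also checks the gradient with respect to $A$, which by the same reasoning is $\partial l / \partial A_{ji} = b_{i}$.

```python
import torch

torch.manual_seed(0)
A = torch.rand((5, 5), requires_grad=True)
b = torch.rand((5, 1), requires_grad=True)

# Same computation as above: l is the sum of the entries of A @ b
l_value = torch.sum(A @ b)
l_value.backward()

# Hand-derived gradients
expected_grad_b = A.detach().sum(dim=0).reshape(-1, 1)  # column sums of A
expected_grad_A = b.detach().T.expand(5, 5)             # every row equals b transposed

grads_match = (torch.allclose(b.grad, expected_grad_b)
               and torch.allclose(A.grad, expected_grad_A))
print(grads_match)
```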

## Load the Dataset

PyTorch supports many built-in datasets to play around with. In this recitation, we will be working with the famous [MNIST](https://en.wikipedia.org/wiki/MNIST_database) dataset.


Documentation to load the MNIST dataset from torchvision can be found [here](https://pytorch.org/vision/stable/datasets.html).

<img src='https://upload.wikimedia.org/wikipedia/commons/2/27/MnistExamples.png'>

First, we load the dataset from torchvision.

In [None]:
mnist_dataset = torchvision.datasets.MNIST('./', download=True, train=True)

## Visualising the dataset


In [None]:
# Look at all the class labels
mnist_dataset.classes

In [None]:
# Looking at the images
idx = 1000

datapoint = mnist_dataset[idx]
datapoint[0].resize([128, 128]).show()
print('Label:', datapoint[1])

In [None]:
# Size and shape of our training data
mnist_dataset.data.shape

## Transforms and Dataloaders

Torchvision Transforms are a neat way to preprocess image data before passing it to our neural networks. We will be using transforms for normalizing our data and converting them to torch tensors.

Transforms are also often used for data augmentation in PyTorch, which you may find very useful when training neural networks in practice, such as in your final projects!

You can find the Torchvision documentation on transforms [here](https://pytorch.org/vision/0.9/transforms.html).

In [None]:
# We could compute the mean and variance of our data and scale accordingly.
# However, we don't really need to worry too much about this here! Any idea why?

# Named `transform` so that the imported `transforms` module is not overwritten
transform = transforms.Compose([# Fill transforms, which ones do we need?
                                ])

In [None]:
# Load these datasets as we did earlier, but do remember to add the transforms above
train_data = 
test_data = 

Dataloaders control how our network consumes the data during training. Optimization algorithms in deep learning, such as mini-batch gradient descent, often process the training data in batches. We can also choose whether to reshuffle the data at every epoch.
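
To make the batching behaviour concrete, here is a minimal sketch on synthetic data (random tensors standing in for images and labels, with hypothetical names such as `toy_loader`). With 200 samples and a batch size of 64, the loader yields three full batches and one smaller final batch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for an image dataset: 200 samples of shape 1 x 28 x 28
features = torch.randn(200, 1, 28, 28)
targets = torch.randint(0, 10, (200,))
toy_dataset = TensorDataset(features, targets)

# shuffle=True draws a new random order every time we iterate over the loader
toy_loader = DataLoader(toy_dataset, batch_size=64, shuffle=True)

batch_sizes = [inputs.shape[0] for inputs, labels in toy_loader]
print(len(toy_loader))
print(batch_sizes)
```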

In [None]:
# Batch-size - a hyperparameter
batch = 64

# Use DataLoader class to make dataloader objects that we will use to train our models
# use the batch size given above, and set the shuffle parameter
train_loader = 
test_loader = 

## A basic fully-connected Neural Network

To define models in PyTorch we make use of the torch.nn module. It has all the common layers and activation functions built in. 
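
As a hedged illustration of how these building blocks fit together, the sketch below stacks `nn.Flatten`, `nn.Linear` and `nn.ReLU` into a small network for 28 x 28 single-channel inputs with 10 output classes. It is one possible layout using the layer widths mentioned in the exercise, written with `nn.Sequential` rather than a custom class, and `sketch_fcnn` is a name invented for this example.

```python
import torch
import torch.nn as nn

sketch_fcnn = nn.Sequential(
    nn.Flatten(),             # (N, 1, 28, 28) -> (N, 784)
    nn.Linear(28 * 28, 256),  # first hidden layer
    nn.ReLU(),
    nn.Linear(256, 64),       # second hidden layer
    nn.ReLU(),
    nn.Linear(64, 10),        # one logit per class
)

# Push a random batch through the network to check the output shape
dummy_batch = torch.randn(4, 1, 28, 28)
logits = sketch_fcnn(dummy_batch)
num_params = sum(p.numel() for p in sketch_fcnn.parameters())
print(logits.shape, num_params)
```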

In [None]:
class FCNN(nn.Module):
    def __init__(self):
        super().__init__()

        # To flatten your images as vectors so that FCNN can read them
        self.flatten = nn.Flatten()

        # We will be using activation functions
        self.relu = nn.ReLU()
        
        # Hidden and output layers

        # Add the first linear layer. What is the number of input dimensions?
        # Set the number of output dimensions to 256
        self.hidden1 = 

        # Add the second linear layer. What is the number of input dimensions?
        # Set the number of output dimensions to 64
        self.hidden2 = 

        # Add the final output layer. What is the number of input dimensions?
        # What is the number of output dimensions for this layer?
        self.output = 

    def forward(self, x):
        outputs = nn.Sequential(self.flatten, self.hidden1, self.relu, self.hidden2, self.relu, self.output)(x)
        return outputs

In [None]:
# Initialize model and check out its summary
model_fcnn = FCNN().to(device)
summary(model_fcnn, (batch, 1, 28, 28), device=device)

Training the model:

In order to train the model, we need to decide on a loss function and set an optimizer. For multi-class classification, we typically use the [Cross-Entropy loss](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html).
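
One detail worth seeing in code: `nn.CrossEntropyLoss` expects raw, unnormalized scores (logits) and integer class labels, and applies the log-softmax internally. The sketch below, on random data with made-up variable names, compares it against a manual computation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 10)          # raw model outputs for 8 samples, 10 classes
labels = torch.randint(0, 10, (8,))  # integer class indices, not one-hot vectors

# Built-in loss: takes logits directly and averages over the batch by default
loss_builtin = nn.CrossEntropyLoss()(logits, labels)

# Manual version: negative log-probability of the correct class, averaged
log_probs = F.log_softmax(logits, dim=1)
loss_manual = -log_probs[torch.arange(8), labels].mean()

print(loss_builtin.item(), loss_manual.item())
```

This is also why the network's final layer has no softmax of its own when it is trained with this loss.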

In [None]:
criterion = 

Setting the Optimizer: We will use the [Stochastic Gradient Descent](https://pytorch.org/docs/stable/generated/torch.optim.SGD.html) optimizer to minimize our loss function during training.
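
To show what a single optimizer step does, here is a minimal sketch with one parameter vector and a toy quadratic loss (both invented for this example). Plain SGD without momentum updates each parameter as `w <- w - lr * grad`.

```python
import torch
import torch.optim as optim

# A toy parameter and a toy loss; the gradient of sum(w^2) is 2 * w
weight = torch.tensor([1.0, -2.0, 3.0], requires_grad=True)
optimizer = optim.SGD([weight], lr=0.1)

loss = (weight ** 2).sum()
optimizer.zero_grad()  # clear any stale gradients
loss.backward()        # weight.grad is now [2, -4, 6]

# What the update rule predicts, computed before the step is taken
expected = weight.detach() - 0.1 * weight.grad
optimizer.step()

print(weight.detach(), expected)
```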

In [None]:
# Learning rate - another hyperparameter

lr_fcnn = 0.001
optimizer_fcnn = 

Training loop:

In [None]:
num_epochs_fcnn = 10 # hyperparameter
losses_fcnn = []
accuracies_fcnn = []
train_size = len(train_data)

for epoch in range(num_epochs_fcnn): # Loop over epochs
  total_loss = 0 
  correct = 0
  for inputs, labels in train_loader:
    inputs = inputs.to(device) 
    labels = labels.to(device)

    # Apply our model to get outputs
    outputs = model_fcnn(inputs)

    # Apply criterion to get loss
    loss = criterion(outputs, labels)

    preds = torch.argmax(outputs, axis=1) # Predictions from the neural network output
    correct += (preds == labels).sum().item()
    total_loss += loss.item() * len(labels)

    # Extremely important! - to clear out gradients from previous iterations
    optimizer_fcnn.zero_grad()
    loss.backward() # Calculate all the gradients
    optimizer_fcnn.step() # Take an update step

  total_loss = total_loss / train_size
  accuracy = correct / train_size
  losses_fcnn.append(total_loss)
  accuracies_fcnn.append(accuracy)
  print("Epoch:", epoch+1, ", Loss:", round(total_loss, 4), ", Training Accuracy:", round(accuracy, 3))

In [None]:
plt.plot(range(1, num_epochs_fcnn + 1), losses_fcnn)
plt.title('Learning Curve')
plt.xlabel('Epochs')
plt.ylabel('Training Loss')

In [None]:
plt.plot(accuracies_fcnn)

In [None]:
# Testing accuracy
test_size = len(test_data)
correct = 0

for inputs, labels in test_loader:
  # Move inputs and labels to the correct device
  inputs = inputs.to(device) 
  labels = labels.to(device)

  # Get model outputs
  outputs = model_fcnn(inputs)

  # Calculate accuracy
  # preds = torch.argmax(outputs, axis=1) # Predictions from the neural network output
  # correct += (preds == labels).sum().item()
  
  
  

test_accuracy = 

In [None]:
print(test_accuracy)
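
When evaluating, gradients are not needed, so it is common to wrap the loop in `torch.no_grad()` and switch the model to evaluation mode with `model.eval()`. The sketch below shows this pattern on a toy model and synthetic data; `evaluate_accuracy`, `toy_model` and `toy_loader` are hypothetical names for this example and not part of the exercise.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate_accuracy(model, loader, device):
    """Return the fraction of samples in `loader` that `model` classifies correctly."""
    model.eval()           # disables training-only behaviour such as dropout
    correct, total = 0, 0
    with torch.no_grad():  # no computational graph is built inside this block
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            preds = torch.argmax(model(inputs), dim=1)
            correct += (preds == labels).sum().item()
            total += len(labels)
    return correct / total

# Toy setup: an untrained linear classifier on random data
device = torch.device("cpu")
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).to(device)
toy_data = TensorDataset(torch.randn(100, 1, 28, 28), torch.randint(0, 10, (100,)))
toy_loader = DataLoader(toy_data, batch_size=32)

toy_accuracy = evaluate_accuracy(toy_model, toy_loader, device)
print(toy_accuracy)
```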


## Convolutional Neural Networks

Convolutional Neural Networks work by applying convolutional filters to the images. The output dimensions of these layers depend on parameters such as filter size, padding and stride.

In each dimension of the image, i.e. x or y, with input size $i$, padding $p$, filter size $f$ and stride $s$, the output size $o$ is:

\begin{gather*}
  o_{x} = \left \lfloor \frac{i_{x} + 2p_{x} - f_{x}}{s_{x}} \right \rfloor + 1
\end{gather*}

Similarly,

\begin{gather*}
  o_{y} = \left \lfloor \frac{i_{y} + 2p_{y} - f_{y}}{s_{y}} \right \rfloor + 1
\end{gather*}

Normally, we work with odd-sized, square-shaped kernels and use the same padding and the same stride in both dimensions.

The depth of a convolutional layer's output, i.e. its number of channels, is given by the number of filters used in that layer.
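
The output-size formula is easy to turn into a small helper. The sketch below (the function name is made up for this example) applies it to a 28 x 28 input passed through a 5x5 convolution with padding 2, a 2x2 pooling with stride 2, a 3x3 convolution with padding 1, and another 2x2 pooling. Pooling layers follow the same formula as convolutions.

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """Output size along one dimension: floor((i + 2p - f) / s) + 1."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

after_conv1 = conv_output_size(28, kernel_size=5, padding=2, stride=1)
after_pool1 = conv_output_size(after_conv1, kernel_size=2, padding=0, stride=2)
after_conv2 = conv_output_size(after_pool1, kernel_size=3, padding=1, stride=1)
after_pool2 = conv_output_size(after_conv2, kernel_size=2, padding=0, stride=2)

print(after_conv1, after_pool1, after_conv2, after_pool2)
```

For an odd kernel size $f$ and a stride of 1, choosing $p = (f - 1) / 2$ keeps the output the same size as the input, which is what "same" padding refers to.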

## Defining a CNN model in PyTorch

You will need the documentation for [2D-convolutional layers](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html).
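
As a quick shape check before building the full model, the sketch below runs a random batch through a single `nn.Conv2d` followed by `nn.MaxPool2d`. The layer settings are one choice consistent with a 5x5 kernel and "same" padding (an explicit `padding=2` is used here), and the variable names are invented for this example.

```python
import torch
import torch.nn as nn

# 1 input channel (grayscale), 16 filters of size 5x5, padding 2 keeps 28 x 28
conv = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=1, padding=2)
pool = nn.MaxPool2d(kernel_size=2, stride=2)  # halves height and width

dummy_batch = torch.randn(4, 1, 28, 28)  # PyTorch uses (N, C, H, W) ordering
conv_out = conv(dummy_batch)
pool_out = pool(torch.relu(conv_out))

print(conv_out.shape)
print(pool_out.shape)
```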

In [None]:
class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        
        # Activation function
        self.relu = nn.ReLU()

        # Convolutional layer-1, how many input channels? (input size: 28 x 28 x 1)
        # Use 16 kernels of size 5x5, with same padding and a stride of 1
        self.conv1 = 
        
        # Pooling layer, use a 2x2 pooling layer with a stride of 2 (input size: 28 x 28 x 16)
        self.maxpool = 

        # Output size: 16 x 14 x 14

        # Convolutional layer-2: use 32, 3x3 filters, same convolution, stride of 1
        self.conv2 = 

        # Maxpool
        # Output size: 32 x 7 x 7
        
        # Fully connected layers

        # Flatten your output
        self.flatten = 

        # A final output layer, how many input and output dimensions should it have?
        self.output = 
        
    
    def forward(self, x):
        outputs = nn.Sequential(self.conv1, self.relu, self.maxpool,
                                # Fill the rest of the layers here
                                )(x)
        return outputs

In [None]:
# Initialize our CNN
cnn = CNN().to(device)
summary(cnn, (batch, 1, 28, 28))

## Train the CNN

In [None]:
# Learning rate, loss function and optimizer for the CNN
lr_cnn = 1e-4
criterion_cnn = 
optimizer_cnn = 

In [None]:
num_epochs_cnn = 10 # hyperparameter
losses_cnn = []
accuracies_cnn = []
train_size = len(train_data)

for epoch in range(num_epochs_cnn):
  total_loss = 0
  correct = 0

  for inputs, labels in train_loader:
    # Get model outputs and predictions

    # Update steps for Gradient Descent
    
  total_loss = 
  accuracy = 
  losses_cnn.append(total_loss)
  accuracies_cnn.append(accuracy)
  print("Epoch:", epoch+1, ", Loss:", round(total_loss, 4), ", Training Accuracy:", round(accuracy, 3))

In [None]:
plt.plot(losses_cnn)

In [None]:
plt.plot(accuracies_cnn)

## Calculating the Accuracy

In [None]:
# Testing accuracy
test_size = len(test_data)
correct = 0

for inputs, labels in test_loader:
  # Move inputs and labels to the correct device

  # Get model outputs
  outputs = 

  # Calculate accuracy
  

test_accuracy = 

In [None]:
print(test_accuracy)

## Pre-defined Models in TorchVision

Torchvision provides predefined model architectures such as VGG-16, ResNet-34 and ResNet-50. You can use these architectures and train them on new data, or you can even use pretrained weights to do tasks on similar data.

You can also use the pretrained models as templates and tweak them for your problem. This is called transfer learning. It is extremely useful when we have limited data, and it also helps train models faster.

[Here](https://github.com/vsa1920/Facial-Recognition-with-Masks-using-One-shot-learning) is an example of it from my CIS-581 project, in case you are interested in learning more on how to use pretrained models and weights. 