In this notebook, we will implement a Convolutional Neural Network (CNN) using PyTorch for MNIST classification.

Expectations: Please provide solutions to the questions in the cells at the end of the notebook.

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt
from torchvision import models,transforms
from torchvision.utils import make_grid
from torchvision.datasets import MNIST
from torch.utils.data.sampler import SubsetRandomSampler
from torch.utils.tensorboard import SummaryWriter
from torchsummary import summary
from torch.utils.data import Dataset, DataLoader

We will be using the MNIST Digits dataset. It consists of 70000 28x28 grayscale images of the digits 0 to 9, with roughly 7000 images per class. There are 60000 training images and 10000 test images. <br>

The following are some random samples from the dataset.

![MNIST Samples](https://www.yunzhew.com/project/mnist-digit-net/featured_hudee2c27f78ea2485e0d3aa44abbfc53c_218555_720x2500_fit_q75_h2_lanczos_3.webp)

We will use PyTorch datasets to fetch the MNIST Digits dataset, as they provide a handy way to download and use it. More information about PyTorch datasets is available [here](https://pytorch.org/vision/stable/datasets.html).

In [None]:
batch_sz = 64  # batch size, i.e. the number of samples in each batch of data

train_dataset = MNIST(root='./datasets', train=True, download=True, transform = transforms.ToTensor())
test_dataset = MNIST(root='./datasets', train=False, download=True, transform = transforms.ToTensor())

train_loader = DataLoader(train_dataset, batch_size = batch_sz)
test_loader = DataLoader(test_dataset, batch_size = batch_sz)


In [None]:
len(train_loader)

938
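The value 938 can be checked by hand. The sketch below assumes the `DataLoader` default of keeping the final partial batch (`drop_last=False`); the variable names other than `batch_sz` are illustrative.

```python
import math

num_train_samples = 60000  # size of the MNIST training split
batch_sz = 64              # same batch size as in the DataLoader cell

# With drop_last=False (the default), the last partial batch is kept,
# so the number of batches is the ceiling of samples / batch size.
num_batches = math.ceil(num_train_samples / batch_sz)
last_batch_size = num_train_samples - (num_batches - 1) * batch_sz

print(num_batches)      # 938
print(last_batch_size)  # 32
```

So every batch holds 64 images except the last one, which holds the remaining 32.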

## Convolutional Neural Networks

Now, we will construct a Convolutional Neural Network. Convolutional neural networks are a type of neural network typically applied to image data. They work by convolving a filter over an image. The filters act as the weights of the CNN, and we learn them during training to extract useful information from an image. Filters are also sometimes called kernels.
<br>

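We multiply a filter element-wise with a patch of the input data and then sum the result. For an $m \times m$ filter $W$ and the input patch $x_{ij}$ whose top-left corner is at position $(i, j)$, the output is:<br>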
$$z_{ij} = W \star x_{ij} = \sum^{m-1}_{a=0}\sum^{m-1}_{b=0} W_{ab} \: x_{(i+a)(j+b)}$$
![convolution](https://upload.wikimedia.org/wikipedia/commons/1/19/2D_Convolution_Animation.gif) [source](https://commons.wikimedia.org/wiki/File:2D_Convolution_Animation.gif)
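The formula above can be written out as a double loop. The following is a minimal sketch, assuming a single-channel input, no padding and a stride of 1; `conv2d_naive` is a hypothetical helper name, and the result is compared against PyTorch's `F.conv2d`, which computes the same sliding sum of products.

```python
import torch
import torch.nn.functional as F

def conv2d_naive(x, W):
    """Slide an m x m filter W over a 2D input x (no padding, stride 1)."""
    m = W.shape[0]
    out_h = x.shape[0] - m + 1
    out_w = x.shape[1] - m + 1
    z = torch.zeros(out_h, out_w)
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i:i + m, j:j + m]   # input patch with top-left corner (i, j)
            z[i, j] = (W * patch).sum()   # element-wise multiply, then sum
    return z

torch.manual_seed(0)
x = torch.randn(5, 5)  # toy "image"
W = torch.randn(3, 3)  # toy filter

z = conv2d_naive(x, W)

# F.conv2d expects input of shape (batch, channels, height, width)
# and weights of shape (out_channels, in_channels, kH, kW).
z_ref = F.conv2d(x[None, None], W[None, None])[0, 0]

print(z.shape)                              # torch.Size([3, 3])
print(torch.allclose(z, z_ref, atol=1e-5))  # True
```

The loop version is only meant to mirror the equation; in practice the built-in layer is far faster.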

<br>

A convolutional layer in a CNN applies a number of such filters to the input. The outputs of these filters are stacked as multiple channels (just like the 3 channels in an RGB image). When the input itself has multiple channels, each filter also becomes 3-dimensional, spanning all input channels.

![2d conv](https://miro.medium.com/v2/resize:fit:1100/format:webp/1*oFVlkvZp848nh-QoD3pREw.png) [source](https://towardsdatascience.com/a-comprehensive-introduction-to-different-types-of-convolutions-in-deep-learning-669281e58215)
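To make the channel stacking concrete, here is a small sketch using `nn.Conv2d`. The sizes (3 input channels, 8 filters, 3x3 kernels, a 28x28 input) are assumptions picked for illustration, not values used elsewhere in this notebook.

```python
import torch
import torch.nn as nn

# Illustrative sizes only: 3 input channels, 8 filters of size 3x3
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)

x = torch.randn(1, 3, 28, 28)  # (batch, channels, height, width)
y = conv(x)

# Each of the 8 filters is 3-dimensional: (in_channels, kH, kW) = (3, 3, 3)
print(conv.weight.shape)  # torch.Size([8, 3, 3, 3])

# One output channel per filter; spatial size shrinks from 28 to 26 without padding
print(y.shape)            # torch.Size([1, 8, 26, 26])
```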

### Stride

When we are dealing with convolution on very large images, it is not always necessary to convolve over each and every pixel. Instead, we can shift successive filter positions by more than one pixel along the vertical or horizontal axis. This shift between successive convolutions is called the stride.

![stride](https://miro.medium.com/v2/resize:fit:1400/1*BNLPHcNxLCgtwlJHnSs9oA.gif) [source](https://medium.com/swlh/convolutional-neural-networks-part-2-padding-and-strided-convolutions-c63c25026eaa)
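The effect of the stride on the output size can be sketched as follows. The helper `conv_output_size` is a hypothetical name; it implements the standard size formula $\lfloor (n + 2p - k) / s \rfloor + 1$ for input size $n$, kernel size $k$, padding $p$ and stride $s$, and the loop compares it against what `nn.Conv2d` actually produces on a 28x28 input.

```python
import torch
import torch.nn as nn

def conv_output_size(n, k, stride=1, padding=0):
    """Output length along one axis for input size n and kernel size k."""
    return (n + 2 * padding - k) // stride + 1

x = torch.randn(1, 1, 28, 28)  # one single-channel 28x28 image

for stride in (1, 2, 3):
    conv = nn.Conv2d(1, 1, kernel_size=3, stride=stride)
    out = conv(x)
    expected = conv_output_size(28, 3, stride)
    # Spatial size of the layer output next to the formula's prediction
    print(stride, tuple(out.shape[-2:]), expected)
```

For strides of 1, 2 and 3 the output side length is 26, 13 and 9 respectively, so a larger stride shrinks the feature map faster.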

### Pooling

The pooling operation involves sliding a two-dimensional filter over each channel of a feature map and summarising the features lying within the region covered by the filter.

Pooling layers are used to reduce the spatial dimensions of the feature maps while summarising their content.

Types of Pooling Layers:
1.   Max Pooling: <br>
Max pooling is a pooling operation that selects the maximum element from the region of the feature map covered by the filter. This is the most commonly used pooling layer.
![Max Pooling](https://media.geeksforgeeks.org/wp-content/uploads/20190721025744/Screenshot-2019-07-21-at-2.57.13-AM.png) [source](https://www.geeksforgeeks.org/cnn-introduction-to-pooling-layer/)


2.   Average Pooling: <br>
Average pooling computes the average of the elements present in the region of the feature map covered by the filter.
![Avg Pooling](https://media.geeksforgeeks.org/wp-content/uploads/20190721030705/Screenshot-2019-07-21-at-3.05.56-AM.png) [source](https://www.geeksforgeeks.org/cnn-introduction-to-pooling-layer/)
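Both pooling types can be tried on a small hand-made feature map. This is a sketch with an arbitrary 4x4 input and a 2x2 window; in `F.max_pool2d` and `F.avg_pool2d` the stride defaults to the kernel size, so the windows do not overlap.

```python
import torch
import torch.nn.functional as F

# Arbitrary 4x4 feature map, chosen so the results are easy to check by hand
x = torch.tensor([[1., 3., 2., 4.],
                  [5., 7., 6., 8.],
                  [9., 2., 1., 0.],
                  [3., 4., 5., 6.]])

x4 = x[None, None]  # add batch and channel dimensions: (1, 1, 4, 4)

max_pooled = F.max_pool2d(x4, kernel_size=2)[0, 0]
avg_pooled = F.avg_pool2d(x4, kernel_size=2)[0, 0]

print(max_pooled)  # values: [[7, 8], [9, 6]]
print(avg_pooled)  # values: [[4.0, 5.0], [4.5, 3.0]]
```

Each 2x2 block of the input is reduced to a single number, so the 4x4 map becomes 2x2 in both cases.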



### Convolutional Neural Network
A common CNN model architecture is to have a number of convolution and pooling layers stacked one after the other.
![cnn](https://indiantechwarrior.com/wp-content/uploads/2021/04/LeNet-1024x393.png) [source](https://indiantechwarrior.com/convolutional-neural-network-architecture/)



We will use the categorical cross-entropy loss here, which is the standard loss for classification.

$$\mathcal{L} = \frac{1}{N} \sum^{N}_{i=1} \left ( \sum^{C}_{j=1} -y_{i,j}\: \log(\hat{y}_{i,j}) \right )$$

where $y_{i,j}$ is the ground truth (1 if sample $i$ belongs to class $j$, 0 otherwise), $\hat{y}_{i,j}$ is the predicted probability of class $j$ for sample $i$, $C$ is the number of classes and $N$ is the number of data samples.
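As a sanity check, the loss formula can be evaluated by hand and compared with PyTorch's built-in version. This is a minimal sketch on random data: the logits and labels are made up, and it assumes the predictions $\hat{y}$ come from a softmax over raw network outputs, which is what `F.cross_entropy` applies internally.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, C = 4, 10                          # 4 samples, 10 classes (as in MNIST)
logits = torch.randn(N, C)            # made-up raw network outputs
targets = torch.tensor([3, 0, 7, 1])  # made-up class labels

# Manual version of the formula: one-hot ground truth and softmax predictions
y = F.one_hot(targets, num_classes=C).float()
y_hat = F.softmax(logits, dim=1)
manual_loss = (-(y * torch.log(y_hat)).sum(dim=1)).mean()

# Built-in version: takes raw logits and integer class labels
builtin_loss = F.cross_entropy(logits, targets)

print(torch.allclose(manual_loss, builtin_loss, atol=1e-5))  # True
```

Note that `F.cross_entropy` and `nn.CrossEntropyLoss` expect raw logits, so a model trained with them should not apply a softmax in its last layer.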

In [None]:
#Q1: Define a simple 2-layer NN for MNIST digit classification

In [None]:
#Q2: Define a CNN with 2 conv layers and 2 linear layers for MNIST digit classification

In [None]:
#Q3: Train both networks for 10 epochs and compare their performance 

In [None]:
#Q4: Compare the accuracy of both networks on the test set

In [None]:
#Q5: Go through the test set and plot some samples of incorrect results

In [None]:
#Q6: show the output of the intermediate layers

In [None]:
#Q7: Compare the time of training on CPU and GPU