A convolutional neural network (CNN) builds feature maps with convolution layers, shrinks them into pooled feature maps, flattens the result into a 1D vector (the flattened layer) and then passes it on to fully connected neurons.
To understand, train and deploy our CNN we use the MNIST dataset.
Each image here is 28x28 pixels, which makes it a 28x28 array --> The background is all 0.
Then for the digit itself ---> The most intense shade will be 1 (once the pixels are scaled to the 0-1 range) and all the other shades will be in relation to this. Picking out patterns such as the borders of the digit is feature mapping.
Basic skeleton of a CNN:
Input ---> (Convolution --> Pooling) x n ---> Flattened layer ---> Fully connected neurons ---> Output

Image filters and kernels:
A kernel is a small matrix that slides over the bigger image matrix (values from 0 to 1). At each position it multiplies the overlapping values element by element and adds the products into one number.
Suppose you have:
Input image: 6 × 6
Kernel: 3 × 3
Steps:
Place the kernel on the top-left of the image
Multiply overlapping values
Add them --> one number
Slide the kernel --> repeat
The result is a feature map. Each of these kernels will learn to recognise a kind of pattern.
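The steps above can be sketched in plain NumPy. This is only an illustration: the image and the vertical-edge kernel are made-up values (in a real CNN the kernel values are learned), and `convolve2d_valid` is a hypothetical helper name, not a library function. Like deep learning libraries, it does not flip the kernel before sliding it.

```python
import numpy as np

def convolve2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image` with no padding and return the feature map."""
    k_h, k_w = kernel.shape
    out_h = (image.shape[0] - k_h) // stride + 1
    out_w = (image.shape[1] - k_w) // stride + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            patch = image[top:top + k_h, left:left + k_w]
            # Multiply overlapping values, then add them --> one number
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

# Made-up 6x6 image: left half 0, right half 1
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-picked vertical-edge kernel (assumption, for illustration only)
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

feature_map = convolve2d_valid(image, kernel)
print(feature_map.shape)  # (4, 4)
print(feature_map[0])     # [0. 3. 3. 0.]
```

The feature map is non-zero only where the window straddles the 0-to-1 boundary, which is what "recognising a kind of pattern" means for this kernel.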
Stride --> How many pixels the kernel moves on each slide
Padding --> How the border is handled: Valid (no padding, shrunken o/p), Same (zero padding so that, at stride 1, the o/p keeps the i/p size)
Filter --> This is an operation done on an image using an appropriate kernel.
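How stride and padding change the output size can be sketched with the usual formula floor((n + 2p - k) / s) + 1, applied to one dimension. `conv_output_size` is a hypothetical helper name used only for this example.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length along one dimension: floor((n + 2p - k) / s) + 1."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

# 6x6 image with a 3x3 kernel, as in the example above
print(conv_output_size(6, 3))              # 4  (Valid: output shrinks)
print(conv_output_size(6, 3, padding=1))   # 6  (Same: size is kept)
print(conv_output_size(6, 3, stride=2))    # 2  (bigger stride, smaller output)
```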

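We use a CNN because each neuron is connected only to a small local patch of the image and the same kernel weights are reused at every position. That needs far fewer parameters than a fully connected network, making it a lot more efficient and faster. The output of each kernel is condensed by pooling and then passed on to the next layer.
Each colour picture is a 3D tensor of: height, width and colour channels i.e. 3 layers
and each of R, G and B is stored as a separate H x W plane i.e. HWR, HWG, HWB. MNIST is greyscale, so it has only 1 channel.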

Pooling:
Downsampling a feature map (the output of a convolution) even more. Methods like max pooling and average pooling are used.
A pooling window moves along the feature map and only the max value inside each window is kept (DANGERRR: the other values are discarded, so losing data is a possibility)
Avg pooling takes the avg of the elements in each window.
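A small sketch of both pooling methods on a made-up 4x4 feature map, using `F.max_pool2d` and `F.avg_pool2d` with a 2x2 window and stride 2. The numbers are arbitrary and only there to make the arithmetic easy to follow.

```python
import torch
import torch.nn.functional as F

# Made-up 4x4 feature map, reshaped to (N, C, H, W) as PyTorch expects
fmap = torch.tensor([[1.0, 3.0, 2.0, 0.0],
                     [5.0, 6.0, 1.0, 2.0],
                     [0.0, 2.0, 4.0, 4.0],
                     [1.0, 1.0, 8.0, 0.0]]).view(1, 1, 4, 4)

max_pooled = F.max_pool2d(fmap, kernel_size=2, stride=2)
avg_pooled = F.avg_pool2d(fmap, kernel_size=2, stride=2)

print(max_pooled.view(2, 2))  # [[6., 2.], [2., 8.]]
print(avg_pooled.view(2, 2))  # [[3.75, 1.25], [1.00, 4.00]]
```

Each 2x2 window collapses to one number, so the 4x4 map becomes 2x2 and three of every four values are dropped by max pooling.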
torch / nn → brain
datasets / DataLoader → eyes
transforms → glasses
optimizer / loss → learning
matplotlib / sklearn → self-reflection

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from torchvision.utils import make_grid
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
%matplotlib inline

We need to convert the 2D MNIST images into 4D tensors for CNN operations: (number of images, colour channels, height, width). Note that PyTorch puts the channel dimension before height and width.

In [2]:
data_dir = "Data/mnist"

Now, we will transform the data

In [3]:
transform = transforms.Compose([transforms.ToTensor(),
                                 transforms.Normalize((0.5,), (0.5,))])

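Normalize shifts and rescales the data around 0 before the model learns from it. The parameters are mean and std. Here 0.5 and 0.5 are a simple choice that maps the 0-1 pixel range to -1 to 1. The actual MNIST statistics are roughly mean 0.1307 and std 0.3081, and they can ofc be calculated from the training set:
mean = dataset.data.float().mean() / 255
std = dataset.data.float().std() / 255
Also, to normalise a pixel x means x_norm = (x - mean) / std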
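The arithmetic can be checked by hand on a few pixels. This sketch uses a tiny made-up uint8 tensor instead of MNIST (an assumption, so it runs without a download) and repeats the same maths that ToTensor and Normalize((0.5,), (0.5,)) apply.

```python
import torch

# Stand-in for raw pixel data (0-255); made-up values, not MNIST
raw = torch.tensor([[0, 128, 255],
                    [0, 0, 255]], dtype=torch.uint8)

scaled = raw.float() / 255           # 0-255 --> 0-1, the range ToTensor produces
mean, std = 0.5, 0.5
normalised = (scaled - mean) / std   # x_norm = (x - mean) / std

print(normalised.min().item(), normalised.max().item())  # -1.0 1.0

# Data-driven statistics, computed the same way as in the two lines above
data_mean = raw.float().mean() / 255
data_std = raw.float().std() / 255
print(data_mean.item(), data_std.item())
```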
Now load test and train datasets

In [4]:
train_dataset = datasets.MNIST(root=data_dir, train=True, transform=transform, download=True)
test_dataset = datasets.MNIST(root=data_dir, train=False, transform=transform, download=True)



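Now wrap these datasets in DataLoaders. Basically a DataLoader splits the dataset into batches so that the data moves through the model one batch at a time.
The batch_size describes how many images are loaded at once. It is set to 64 here, a common default; it is a tunable choice rather than a proven optimum.
Shuffling reorders the training data every epoch so that the model does not learn from the order of the samples. Order does not matter for evaluation, therefore we don't need it on the test dataset.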

In [5]:
train_loader = DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)

Now verify that it's loading correctly (a sanity check). The same can be done in a for loop, but next(iter(...)) shortens the code and grabs a single batch faster
NOTE:
A DataLoader is NOT a list nor a dataset. It is an iterable, so indexing (e.g. train_loader[0]) will fail here
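A quick way to see the difference between iterating and indexing. The sketch uses a tiny synthetic TensorDataset as a stand-in for MNIST (an assumption, so nothing has to be downloaded); the batch size of 4 is arbitrary.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 10 fake 1x28x28 images with labels 0..9, standing in for MNIST
fake_images = torch.zeros(10, 1, 28, 28)
fake_labels = torch.arange(10)
loader = DataLoader(TensorDataset(fake_images, fake_labels), batch_size=4, shuffle=False)

print(len(loader))  # 3 batches: 4 + 4 + 2

# Iterating works
for images, labels in loader:
    print(images.shape, labels.tolist())

# Indexing does not: DataLoader defines no __getitem__
try:
    loader[0]
    indexing_failed = False
except TypeError:
    indexing_failed = True
print(indexing_failed)  # True
```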

In [6]:
images, labels = next(iter(train_loader))
print(images.shape)
print(labels.shape)

torch.Size([64, 1, 28, 28])
torch.Size([64])


Now we build the convolutional layers and the pooling layers, layer by layer.
nn.Conv2d(in_channels, out_channels, kernel_size, stride) -- padding can also be given here.
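As a sketch of what the optional padding argument does, here are two layers with the same channel and kernel settings, with and without padding=1, applied to a fake all-zero image. The variable names are made up for this example.

```python
import torch
import torch.nn as nn

x = torch.zeros(1, 1, 28, 28)  # one fake greyscale image, (N, C, H, W)

conv_valid = nn.Conv2d(1, 6, kernel_size=3, stride=1)             # no padding
conv_same = nn.Conv2d(1, 6, kernel_size=3, stride=1, padding=1)   # 1-pixel zero border

print(conv_valid(x).shape)  # torch.Size([1, 6, 26, 26])
print(conv_same(x).shape)   # torch.Size([1, 6, 28, 28])
```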

In [7]:
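conv1 = nn.Conv2d(1, 6, 3, 1)   # 1 input channel (greyscale) -> 6 feature maps, 3x3 kernel, stride 1
conv2 = nn.Conv2d(6, 16, 3, 1)  # 6 input channels (from conv1) -> 16 feature maps, 3x3 kernel, stride 1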

Now we can take a single MNIST record and carry our operation forward

In [10]:
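# Grab only the first batch from the loader
for i, (X_train, y_train) in enumerate(train_loader):
    break
# Reshape the first image into a batch of one: (N, C, H, W)
x = X_train[0].view(1, 1, 28, 28)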

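Now push it through the first convolution and apply ReLU (rectified linear unit) as the activation. We can pass it without specifying padding because in MNIST data the digit sits in the middle of the image, so the border pixels that get trimmed carry little information. With a 3x3 kernel and no padding, 28 becomes 28 - 3 + 1 = 26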

In [11]:
x = F.relu(conv1(x))
print(x.shape)

torch.Size([1, 6, 26, 26])


After the convolutional layer we can pass it through the pooling layer: F.max_pool2d(x, 2, 2), where 2, 2 are the kernel size and the stride. This halves the height and width (26 --> 13)

In [12]:
x = F.max_pool2d(x, 2, 2)
print(x.shape)

torch.Size([1, 6, 13, 13])


Now the next convolutional layer and pooling layer, because we want to build 2 of these. Make sure to reuse the same x variable because it's all sequential i.e. one after another. The second pooling turns 11 into 5 because the size is rounded down

In [13]:
x = F.relu(conv2(x))
print(x.shape)
x = F.max_pool2d(x, 2, 2)
print(x.shape)

torch.Size([1, 16, 11, 11])
torch.Size([1, 16, 5, 5])
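Following the skeleton at the top, the next step would be the flattened layer. A minimal sketch, assuming the 16 x 5 x 5 output above: the all-zero tensor is a stand-in for the real x, and the Linear layer with 10 outputs (one per digit) is a hypothetical choice for illustration, not something defined in this notebook so far.

```python
import torch
import torch.nn as nn

x = torch.zeros(1, 16, 5, 5)      # stand-in with the same shape as the pooled output

flat = x.view(-1, 16 * 5 * 5)     # flatten to (batch, 400)
print(flat.shape)                 # torch.Size([1, 400])

fc = nn.Linear(16 * 5 * 5, 10)    # hypothetical fully connected layer, 10 digit classes
print(fc(flat).shape)             # torch.Size([1, 10])
```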
