# Introduction to PyTorch Convolutional Neural Networks

Welcome! In this workshop you will learn how to build convolutional neural networks using PyTorch.


You've already learned how to create a neural network with PyTorch. Now that you have mastered the basics, you will have the chance to dig deeper into the subject with convolutions and probabilistic prediction.

Fully connected neural networks aren't the answer to every problem. Some problems, such as object detection, are handled poorly by fully connected networks, which is why most object detection models rely on convolutional neural networks.

In this exercise you will learn what convolutional neural networks are and how to create them. The end goal of the workshop is to build an AI that identifies handwritten digits.


The first step is to install and import the libraries.

In [None]:
!pip3 install torch
!pip3 install torchvision

In [None]:
import torch
import torchvision
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import matplotlib.pyplot as plt

from torchvision import transforms

### Getting the data for the AI

Before actually learning about convolution, let's start by creating a fully connected neural network to identify numbers.\
This will be useful to compare the performance of both models.


Let's start by downloading our dataset.

In [None]:
train_set = torchvision.datasets.MNIST(
  root="./data/MNIST",
  train=True,
  download=True,
  transform=transforms.Compose([
    transforms.ToTensor()
  ])
)

test_set = torchvision.datasets.MNIST(
  root="./data/MNIST",
  train=False,
  download=True,
  transform=transforms.Compose([
    transforms.ToTensor()
  ])
)

Let's take a look at a few samples.

In [None]:
fig = plt.figure()
for i in range(6):
  plt.subplot(2,3,i+1)
  # each sample is (image, label); the image has shape (1, 28, 28)
  plt.imshow(train_set[i][0].reshape(28, 28), cmap='gray', interpolation='none')
  plt.xticks([])
  plt.yticks([])
plt.tight_layout()
plt.show()

In [None]:
image, label = train_set[0]
print("total images :", len(train_set)) # Number of images in the training set
print("shape :", image.shape) # (channels, height, width)
print("label :", label) # Number represented in the image

As you can see, each image is 28 pixels high and 28 pixels wide, with one channel (grayscale). A color image has 3 channels, one for each primary color (RGB: red, green, and blue).

Each image represents a number from 0 to 9, so we have 10 different labels (10 possible outputs). The first image represents a 5, so its label is 5.

### Using Batches

60000 images is a lot to process one by one. To make this data easier for our model to process while training, we are going to use batches.

The `batch size` is a hyperparameter that defines the number of samples to work through before updating the internal model parameters. In other words, instead of calculating the error and applying backpropagation after each image, with a batch size of 64 we go through 64 images before doing so. This speeds up training and makes each update less noisy, because backpropagation is applied to the error averaged over the batch.

In [None]:
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64)

print("number of batches :", len(train_loader))

batch = next(iter(train_loader)) #take the first batch
images, labels = batch
print("shape :", images.shape)
print("labels :", labels.shape)


We now have 938 batches of 64 images each (with their matching labels), except the last one, which holds the remaining 32 images. This will drastically decrease our training time because 64 images are processed in a single pass.

PyTorch is built to work with batches, so it is quite simple to use them in our code.
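The batch count can be checked by hand. The short sketch below is plain Python and assumes the 60000 training images and the batch size of 64 used in this notebook; it is only meant to show where the number 938 comes from.

```python
import math

dataset_size = 60000  # number of MNIST training images
batch_size = 64

# The DataLoader keeps the final, incomplete batch by default,
# so the number of batches is rounded up.
num_batches = math.ceil(dataset_size / batch_size)
last_batch_size = dataset_size - (num_batches - 1) * batch_size

print("number of batches :", num_batches)
print("size of last batch:", last_batch_size)
```

Passing `drop_last=True` to the `DataLoader` would discard that smaller final batch instead.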

Here is a simple example:

In [None]:
class MyNetwork(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = torch.nn.Linear(1, 10)
        self.linear2 = torch.nn.Linear(10, 1)
        
    def forward(self, x):
        x = torch.nn.functional.relu(self.linear1(x))
        x = torch.nn.functional.relu(self.linear2(x))
        return x

model = MyNetwork()
inputs = torch.ones((64, 1), dtype=torch.float) # Batch of 64 samples with 1 feature each

mse = torch.nn.MSELoss() # Loss function
expected = torch.ones((64, 1), dtype=torch.float)

output = model(inputs) # Calling the model runs forward()
loss = mse(output, expected)

print("Input  :", inputs.shape)
print("Output :", output.shape)
print("loss :", loss.item()) # A single value for the whole batch

As you can see, the model accepts a batch of inputs and returns a batch of predictions. This also applies to every loss function: we can pass the batch of predictions and the batch of expected values. By default, the final loss is the average of the individual losses in the batch.
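To make the averaging concrete, here is a small sketch with made-up predictions and targets. It relies on the `reduction` argument of `torch.nn.MSELoss`, whose default value is `"mean"`; the tensor values are arbitrary and only serve the illustration.

```python
import torch

predictions = torch.tensor([[0.0], [1.0], [3.0], [2.0]])  # made-up batch of 4 outputs
targets = torch.tensor([[1.0], [1.0], [1.0], [1.0]])

# One loss value per sample: (prediction - target) ** 2
per_sample = torch.nn.MSELoss(reduction="none")(predictions, targets)

# Default behaviour: a single value, averaged over the batch
batch_loss = torch.nn.MSELoss()(predictions, targets)

print("per-sample losses:", per_sample.flatten().tolist())
print("batch loss       :", batch_loss.item())
print("manual average   :", per_sample.mean().item())
```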

### Probabilistic Prediction

A key difference is going to be the output of our model. In this exercise, we are using a probabilistic approach. This means that our model will output a "probability" for each label. We have 10 labels, so our model should output 10 values. The label with the highest value will be the label predicted by the model.

The mean squared error isn't meant to deal with this type of output. To calculate the loss we are going to need the cross entropy loss.

In [None]:
loss_fonct = torch.nn.CrossEntropyLoss() # Loss function

output = torch.rand((64, 10), dtype=torch.float) # theoretical output of the model
expected = torch.ones((64), dtype=torch.long) # expected labels

loss_fonct(output, expected)

Cross entropy loss takes the raw output of the model and the expected labels. It applies the softmax internally, so you don't need to convert the output into probabilities or compute the predicted label yourself.
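For the curious, the sketch below reproduces the cross entropy value step by step. The scores and labels are made up, and the manual computation is only there to illustrate what the loss function does with the 10 outputs; in practice you would simply call the loss function.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
scores = torch.randn(4, 10)          # made-up raw outputs for 4 images, 10 labels
labels = torch.tensor([3, 0, 9, 5])  # made-up expected labels

# What the loss function returns
loss = F.cross_entropy(scores, labels)

# The same value computed step by step:
# softmax turns the scores into probabilities, then we take -log of the
# probability given to the correct label and average over the batch.
probabilities = F.softmax(scores, dim=1)
correct_label_prob = probabilities[torch.arange(4), labels]
manual_loss = -torch.log(correct_label_prob).mean()

# The predicted label is the index of the highest score
predicted = scores.argmax(dim=1)

print("cross entropy :", loss.item())
print("manual value  :", manual_loss.item())
print("predictions   :", predicted.tolist())
```

The loss gets smaller as the probability given to the correct label gets closer to 1.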

### Neural Network with Probabilistic Prediction

**Exercise :**\
With all this information, it's your turn to build a simple model (without convolution) to identify the number in the given images:

In [None]:
class MyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # code
        
    def forward(self, x):
        # code
        pass

In [None]:
EPOCH = 2

model = MyModel()
loss_fonct = torch.nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

for epoch in range(EPOCH): #training
    for batch in train_loader:
        # code
        pass


total, correct = 0, 0
for image, label in test_set: #testing
    output = model.forward(image.reshape(1, 28 * 28))
    if (output.argmax(dim=1).item() == label):
        correct += 1
    total += 1

print("Accuracy:", correct / total)

**Expected :**\
`An accuracy above 75% is acceptable`

### Introduction to Convolution

Now that we have built a fully connected model to identify numbers in images, we will try to do the same with a convolutional model.

Convolution can be described as taking a filter (kernel) and sliding it over a given image. This might not make sense yet, so let me illustrate it:

<div>
    <center>
    <img src="./.img/conv.gif" width="600" style="padding-left: 20px;"/>
    </center>
</div>

In this example we have our image (5 x 5) and a filter (3 x 3).
We take the first (3 x 3) square in our image, multiply it element by element with our filter, and sum the results:\
`7 * 1 + 2 * 0 + 3 * -1 +`\
`4 * 1 + 5 * 0 + 3 * -1 +`\
`3 * 1 + 3 * 0 + 2 * -1 = 6`\
After that we move to the right by 1 column, and repeat. Once we reach the far right, we go down one row and start again from the left.
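The sliding procedure described above can be sketched in plain Python. In the image below, only the top-left (3 x 3) square comes from the example; the other values are made up so that the loop has something to slide over. The `convolve` helper is a hypothetical name, and it assumes a square image, a square kernel, a stride of 1 and no padding.

```python
# Image: the top-left 3 x 3 square matches the example above;
# the remaining values are made up for illustration.
image = [
    [7, 2, 3, 3, 8],
    [4, 5, 3, 8, 4],
    [3, 3, 2, 8, 4],
    [2, 8, 7, 2, 7],
    [5, 4, 4, 5, 4],
]
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

def convolve(image, kernel):
    """Slide a square kernel over a square image (stride 1, no padding)."""
    k = len(kernel)
    out_size = len(image) - k + 1
    output = []
    for row in range(out_size):
        output_row = []
        for col in range(out_size):
            # Multiply the current square by the kernel, element by element, and sum
            total = sum(
                image[row + i][col + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            output_row.append(total)
        output.append(output_row)
    return output

result = convolve(image, kernel)
for row in result:
    print(row)
```

The first value of the first row is the 6 computed by hand above, and the (5 x 5) image shrinks to a (3 x 3) output.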

This is what happens when an image is passed through a convolution layer.
Here are concrete examples:

<div>
    <center>
    <img src="./.img/conv_exemple.png" width="600" style="padding-left: 20px;"/>
    </center>
</div>

In a fully connected network, the weights and biases are updated to improve the prediction. In a convolutional network, the values of the filters are the weights that get updated during each backward pass.

Each convolution layer has its own filters and is configured with several parameters, such as:\
`kernel size:` size of the filter (kernel)\
`stride:` the number of columns / rows the filter moves to the right / down at each step\
`padding:` columns / rows of zeros added around the edge of the image

<div>
    <center>
    <img src="./.img/params_exemple.png" width="600" style="padding-left: 20px;"/>
    </center>
</div>

The shape of the output can be calculated with the following formula:

## CNN output size formula (square)
- we have an $n \times n$ input
- we have an $f \times f$ filter
- we have a padding $p$
- we have a stride $s$
- we have an output size $O$

## $O = \left\lfloor \frac{n - f + 2p}{s} \right\rfloor + 1$

## CNN output size formula (non square)
- we have an $N_h \times N_w$ input
- we have an $F_h \times F_w$ filter
- we have a padding $p$
- we have a stride $s$
- we have an output size $O_h \times O_w$

## $O_h = \left\lfloor \frac{N_h - F_h + 2p}{s} \right\rfloor + 1$
## $O_w = \left\lfloor \frac{N_w - F_w + 2p}{s} \right\rfloor + 1$

Now that you know how convolutions work, let's try to use them.
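The formula is easy to turn into a small helper for checking your layer sizes. The sketch below is plain Python; `conv_output_size` is a hypothetical name, and the settings used in the calls (other than the 5 x 5 image with a 3 x 3 filter from the earlier example) are arbitrary.

```python
def conv_output_size(n, f, p=0, s=1):
    """Output size along one dimension for input n, filter f, padding p, stride s."""
    # Integer division rounds down when the filter does not fit a whole number of times
    return (n - f + 2 * p) // s + 1

# The 5 x 5 image and 3 x 3 filter from the earlier example
print(conv_output_size(5, 3))            # 3

# Arbitrary settings, for illustration only
print(conv_output_size(32, 5, p=2))      # 32
print(conv_output_size(7, 3, p=1, s=2))  # 4

# For a non square case, apply the formula once per dimension
height = conv_output_size(28, 5)  # filter height of 5
width = conv_output_size(28, 3)   # filter width of 3
print(height, width)              # 24 26
```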

**Exercise :**\
By changing the convolution parameters, try to match the expected output size

In [None]:
image = torch.randn(1, 1, 28, 28)
filter = torch.randn(1, 1, 3, 3)

out_feat_F = F.conv2d()# code
out_feat_F.shape

**Expected :** `torch.Size([1, 1, 26, 26])`

In [None]:
image = torch.randn(1, 1, 28, 28)
filter = torch.randn(1, 1, 3, 3)

out_feat_F = F.conv2d()# code
out_feat_F.shape

**Expected :** `torch.Size([1, 1, 28, 28])`

In [None]:
image = torch.randn(1, 1, 28, 28)
filter = torch.randn(1, 1, 2, 4)

out_feat_F = F.conv2d()# code
out_feat_F.shape

**Expected :** `torch.Size([1, 1, 15, 14])`

### Introduction to Pooling Layers

Multiple convolution layers can be stacked one after the other, but it can be useful to add a `Pooling` layer after some of them.

A limiting factor of convolutional layers is that they record the precise position of features (objects you're trying to detect) in the input. This means that small movements in the position of the feature in the input image will result in a different prediction. This can happen with re-cropping, rotation, shifting, and other minor changes to the input image.

A common approach to addressing this problem from signal processing is called down sampling. This is where a lower resolution version of an input signal is created that still contains the large or important structural elements, without the fine detail that may not be as useful to the task. This can be achieved by using a `Pooling Layer`.

The most common pooling layers are `Max Pooling` and `Average Pooling`.

They work similarly to convolution layers, but instead of being multiplied by a kernel, the values in each window are reduced to a single value.

Here is an example of both pooling layers with a (2 x 2) filter and a stride of 2.

<div>
    <center>
    <img src="./.img/pooling.png" width="400" style="padding-left: 20px;"/>
    </center>
</div>

`Max pooling` will take the pixel with the highest value in the window.\
`Average pooling` will take the average of all the pixels in the window.
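Both operations are available in `torch.nn.functional`. The sketch below applies them to a made-up (4 x 4) image with a (2 x 2) window and a stride of 2; the pixel values are arbitrary and are not the ones shown in the picture.

```python
import torch
import torch.nn.functional as F

# Made-up 4 x 4 image, shaped (batch, channels, height, width)
image = torch.tensor([[[[1., 3., 2., 4.],
                        [5., 6., 7., 8.],
                        [3., 2., 1., 0.],
                        [1., 2., 3., 4.]]]])

max_pooled = F.max_pool2d(image, kernel_size=2, stride=2)
avg_pooled = F.avg_pool2d(image, kernel_size=2, stride=2)

print(max_pooled)        # values: 6, 8 / 3, 4
print(avg_pooled)        # values: 3.75, 5.25 / 2, 2
print(max_pooled.shape)  # height and width are halved
```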


Convolution and pooling layers are used to highlight features in an image. These highlighted features must then go through fully connected layers to produce an output. After your convolution and pooling layers, you must flatten your data so that it can be passed to the fully connected layer(s), which output one value for each of the 10 labels.

<div>
    <center>
    <img src="./.img/full_model.png" width="800" style="padding-left: 20px;"/>
    </center>
</div>
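The flattening step is where shapes most often go wrong, so here is a minimal sketch of it in isolation. The feature map size (8 channels of 5 x 5) is hypothetical; in your own model it depends on the convolution and pooling layers you choose.

```python
import torch
import torch.nn as nn

# Hypothetical feature maps: batch of 64, 8 channels, 5 x 5 each
features = torch.randn(64, 8, 5, 5)

# Keep the batch dimension, merge everything else into one vector per image
flat = torch.flatten(features, start_dim=1)

# The fully connected layer must expect 8 * 5 * 5 = 200 inputs
fc = nn.Linear(8 * 5 * 5, 10)
scores = fc(flat)

print("features :", features.shape)
print("flattened:", flat.shape)
print("scores   :", scores.shape)
```

If the input size of the linear layer does not match the flattened size, PyTorch raises a shape mismatch error at this point.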

**Exercise :**\
With all of this knowledge, it's your turn to create a convolution model.

### Your turn

In [None]:
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        # code
        
    def forward(self, x):
        # code
        pass

In [None]:
EPOCH = 5

model = MyModel()
loss_fonct = torch.nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

for epoch in range(EPOCH): #training
    for batch in train_loader:
        # code
        pass

total, correct = 0, 0
for image, label in test_set: #testing
    output = model.forward(image.reshape(1, 1, 28, 28))
    if (output.argmax(dim=1).item() == label):
        correct += 1
    total += 1
print("Accuracy:", correct / total)

**Expected :**\
`An accuracy above 85% is acceptable`

# Congratz

Congratulations on reaching the end of this workshop!\
You have been able to create your own convolutional neural network using PyTorch.

See you for the next topic!