# Part 3

## Lecture 3

Deep learning uses representation learning: the network learns the parameters (weights) that both extract the features and act as the classifier.

<img src="./image/CNN_average.png" height="250" />

A convolution that computes a neighbourhood average.
- Pixels are lost at the border
    - Solution: add padding
        - If the kernel size is even, it has no center pixel, so the padding is uneven
        - Ex. 100x100 px image -> **5x5 filter without padding**
            - Lose 2 px on every side, so the output is 96x96.
                - **784 border pixels lost** (100 * 100 - 96 * 96 = 784).
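
The border arithmetic above can be checked with a small sketch. It uses plain Python lists instead of a tensor library, and `box_filter_valid` is a hypothetical helper name, not something from the lecture.

```python
def box_filter_valid(image, k):
    """Average every k x k neighbourhood; no padding, so the output shrinks."""
    h, w = len(image), len(image[0])
    out_h, out_w = h - k + 1, w - k + 1
    return [
        [
            sum(image[r + i][c + j] for i in range(k) for j in range(k)) / (k * k)
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# 100x100 image of ones: averaging leaves every value at 1.0
image = [[1.0] * 100 for _ in range(100)]
smoothed = box_filter_valid(image, 5)

out_h, out_w = len(smoothed), len(smoothed[0])
lost = 100 * 100 - out_h * out_w
print(out_h, out_w, lost)  # 96 96 784
```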

<img src="./image/Kernel_size.png" height="250" />

**Small kernel**
- more focus in back

**Big kernel**
- more focus in front

**Spatial pooling**
- Images are down-sampled 
    - faster to compute
    - the same kernel can detect larger features in the next layer

**Feed forward network vs Convolutional network**

A feed-forward layer whose weight matrix is a Toeplitz (diagonal-constant) matrix is equivalent to a convolution
- This matrix is sparse, localized and shares parameters
- A CNN is thus a parameter-limited version of an FFN
    - increases the receptive field _**linearly**_
    - A convolution can be written as a **matrix multiplication**
- Fewer parameters is a good thing because of the curse of dimensionality
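
As a minimal sketch of this equivalence, the snippet below builds a Toeplitz matrix from a 1D kernel and compares the matrix product with a sliding-window filter. The helper names are hypothetical, and the filter is applied without flipping the kernel (cross-correlation, as deep learning libraries do), which makes no difference for this symmetric kernel.

```python
def toeplitz_from_kernel(kernel, n):
    """Build the (n - k + 1) x n matrix whose rows are shifted copies of the kernel."""
    k = len(kernel)
    rows = []
    for r in range(n - k + 1):
        row = [0] * n
        row[r:r + k] = kernel  # same weights on every row, shifted by one column
        rows.append(row)
    return rows

def sliding_window(signal, kernel):
    """Apply the kernel at every valid position (no padding)."""
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, signal[i:i + k]))
        for i in range(len(signal) - k + 1)
    ]

signal = [30, 20, 25, 32, 60, 90, 19]
kernel = [-1, 1, -1]

T = toeplitz_from_kernel(kernel, len(signal))
as_matmul = [sum(t * x for t, x in zip(row, signal)) for row in T]
as_conv = sliding_window(signal, kernel)
print(as_matmul)  # [-35, -27, -53, -62, 11]

dense_params = len(T) * len(T[0])  # a full 5 x 7 weight matrix: 35 weights
conv_params = len(kernel)          # the convolution only learns 3 weights
```

The matrix has 35 entries, but only 3 distinct learnable values; the rest are zeros or repeats.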

**Equivariance**:
If the input shifts left, the output shifts left as well
- $f()$ is equivariant to $g()$: $f(g(x)) = g(f(x))$
    - so if $f()$ is a convolution and $g()$ a translation, the order in which they are applied does not matter.
- Camera position is accidental, objects may appear anywhere
    - Add prior knowledge (convolution) to deep nets
        - saves params and compute
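
The property $f(g(x)) = g(f(x))$ can be illustrated with a small sketch. It assumes wrap-around (circular) borders so that the equality is exact; with zero padding it only holds away from the border. The function names are hypothetical.

```python
def circular_conv(signal, kernel):
    """Sliding-window filter with wrap-around borders (output length = input length)."""
    n = len(signal)
    return [
        sum(w * signal[(i + j) % n] for j, w in enumerate(kernel))
        for i in range(n)
    ]

def shift_left(signal, steps=1):
    """Translate the signal to the left, wrapping around at the border."""
    return signal[steps:] + signal[:steps]

x = [0, 0, 1, 3, 1, 0, 0, 0]
kernel = [1, -2, 1]

conv_then_shift = shift_left(circular_conv(x, kernel))
shift_then_conv = circular_conv(shift_left(x), kernel)
print(conv_then_shift == shift_then_conv)  # True
```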

<img src="./image/Pooling.png" height="150" />

**Pooling**:
- Summarises the outputs over a region
- increases the receptive field _**multiplicatively**_
- Is (approximately) invariant to local translations
- **Feature presence** is _more_ important than feature location
- Reduced memory usage is an advantage of pooling
- Sub-sampling (pooling) allows the network to quickly 'see' more of the image.
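
A small sketch of why the invariance is only approximate: a one-pixel shift inside a pooling window leaves the pooled output unchanged, while a shift that crosses a window boundary does change it. `max_pool_1d` is a hypothetical helper and the feature maps are made-up values.

```python
def max_pool_1d(values, size):
    """Non-overlapping max pooling (stride equal to the window size)."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]

feature_map = [0, 9, 0, 0, 0, 0, 0, 0]
shifted_within = [9, 0, 0, 0, 0, 0, 0, 0]  # feature moved one pixel left, same window
shifted_across = [0, 0, 9, 0, 0, 0, 0, 0]  # feature moved one pixel right, next window

pooled = max_pool_1d(feature_map, 2)
pooled_within = max_pool_1d(shifted_within, 2)
pooled_across = max_pool_1d(shifted_across, 2)

print(pooled)         # [9, 0, 0, 0]
print(pooled_within)  # [9, 0, 0, 0]
print(pooled_across)  # [0, 9, 0, 0]
```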

<img src="./image/Stride.png" height="150" />

**Stride**:
- the kernel moves `stride` pixels per step during a convolution
- Convolution increases the receptive field (RF) linearly, pooling increases the RF multiplicatively
- Ex:
    - two stacked size-3 convolutions have a 5-pixel **receptive field** at stride 1, and a 7-pixel one if the first convolution uses stride 2.
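
The receptive field of a stack of layers can be computed with the usual recurrence: each layer adds `(kernel_size - 1)` times the current spacing between neighbouring units, and each stride multiplies that spacing. The sketch below is a hypothetical helper, not code from the lecture. Under this recurrence, two size-3 convolutions reach 7 pixels when the first one has stride 2, and 5 pixels when both use stride 1.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one output unit.

    `layers` is a list of (kernel_size, stride) pairs, first layer first.
    """
    rf, jump = 1, 1  # jump = distance in input pixels between neighbouring units
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf

rf_two_convs = receptive_field([(3, 1), (3, 1)])            # 5
rf_strided = receptive_field([(3, 2), (3, 1)])              # 7
rf_three_convs = receptive_field([(3, 1), (3, 1), (3, 1)])  # 7
rf_with_pool = receptive_field([(3, 1), (2, 2), (3, 1)])    # 8
print(rf_two_convs, rf_strided, rf_three_convs, rf_with_pool)
```

Stride-1 convolutions add a fixed amount per layer (linear growth), while every stride-2 layer doubles what each later layer adds (multiplicative growth).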



<img src="./image/Network_with_CNN.png" height="250" />

Below we perform two convolutions with subsampling. The image above shows an example of such a network.

<img src="./image/receptive_field_calc.jpg" height="250" />

Above is the calculation of the receptive field for the given example, which is 16x16.

## [Assignment 3: Convolution](https://colab.research.google.com/drive/154rVpzuR3MlEUIRIYmYlZ3uy7Jczv0CX)

A Linear (fully connected) layer treats pixels **far apart and close together equally**. In a CNN, each hidden neuron
- only has a **receptive field**
    - the region of the input that the hidden neuron can observe.
- only learns the weights connected to that region.
- shares its weights with the other neurons of the same filter.

<img src="./image/Hidden_neuron_receptive_field.png" height="250" />

Properties:
- **Stride**: number of pixels the kernel shifts between neighbouring receptive fields
- **Padding**: pixels added around the border to prevent the output from becoming smaller
- **Kernel**: size of the receptive field
- **Pooling**: simplifies the information in the CNN output; used to reduce the dimensions of the feature maps, thus reducing the number of parameters to learn and the amount of computation.
    - max pooling: condenses information by taking the maximum value. Usage: e.g. images with a dark background
    - average pooling: condenses information by taking the average value. Limitation: cannot deal with sharp features
    - L2 pooling: condenses information by taking the square root of the sum of squares. Usage: smooths features whilst retaining the property of max pooling
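
The three pooling variants differ only in how they reduce one window to a single number. A minimal sketch on one made-up 2x2 window, flattened to a list:

```python
import math

window = [1.0, 2.0, 2.0, 4.0]  # values inside one 2x2 pooling window (illustrative)

max_pooled = max(window)                            # 4.0
avg_pooled = sum(window) / len(window)              # 2.25
l2_pooled = math.sqrt(sum(v * v for v in window))   # sqrt(1 + 4 + 4 + 16) = 5.0

print(max_pooled, avg_pooled, l2_pooled)  # 4.0 2.25 5.0
```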

Network:
- training? `net.train()`
- evaluating? `net.eval()`

<img src="./image/shallow_cnn_ex.jpeg" height="250" />

In [None]:
import torch
import torch.nn as nn
from torchinfo import summary

batch_size = 16
width_image = 32
height_image = 32
layers_image = 3  # number of input channels, e.g. 3 -> RGB
input_image = (width_image, height_image, layers_image)
wh_1 = 5  # 1st layer: kernel size (width and height)
f_1 = 5  # 1st layer: nr of filters
wh_2 = 3  # 2nd layer: kernel size (width and height)
f_2 = 2  # 2nd layer: nr of filters
bias = False

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(layers_image, f_1, wh_1, bias=bias)
        self.conv2 = nn.Conv2d(f_1, f_2, wh_2, bias=bias)

    def forward(self, x):
        # Two convolutions back to back, without padding or activation
        return self.conv2(self.conv1(x))


model_output = summary(
    Net(),
    (batch_size, layers_image, height_image, width_image),  # (batch, channels, height, width)
    verbose=2,
    col_width=16,
    col_names=["kernel_size", "input_size", "output_size", "num_params"],)

Layer (type:depth-idx)                   Kernel Shape     Input Shape      Output Shape     Param #
├─Conv2d: 1-1                            [3, 5, 5, 5]     [16, 3, 32, 32]  [16, 5, 28, 28]  375
├─Conv2d: 1-2                            [5, 2, 3, 3]     [16, 5, 28, 28]  [16, 2, 26, 26]  90
Total params: 465
Trainable params: 465
Non-trainable params: 0
Total mult-adds (M): 0.35
Input size (MB): 0.19
Forward/backward pass size (MB): 0.64
Params size (MB): 0.00
Estimated Total Size (MB): 0.83
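
The parameter counts in this summary can be verified by hand: a 2D convolution without bias has `in_channels * out_channels * kernel * kernel` weights. The sketch below uses a hypothetical helper `conv2d_params` and, for comparison, counts the weights a fully connected layer would need to map the same input to the same first-layer output.

```python
def conv2d_params(in_channels, out_channels, kernel_size, bias=False):
    """Number of learnable parameters of a square 2D convolution."""
    weights = in_channels * out_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)

conv1 = conv2d_params(3, 5, 5)  # 375
conv2 = conv2d_params(5, 2, 3)  # 90
total = conv1 + conv2           # 465

# A dense layer from 3x32x32 inputs to 5x28x28 outputs, for comparison
dense = (3 * 32 * 32) * (5 * 28 * 28)  # 12042240

print(conv1, conv2, total, dense)
```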


### Example but with stride set to 2.
When no padding is added, a kernel of size k removes k - 1 pixels per dimension:
- k = odd: k // 2 = px removed on each side
- k = even: k // 2 = px removed left and top, (k // 2) - 1 = px removed right and bottom (or the reverse)

Stride (for the odd example) is taken into account after these pixels are removed; the result is rounded up:
- ( height - (2 * (k // 2) )) / stride = new height
- ( width - (2 * (k // 2) )) / stride = new width
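
A more general way to get the same numbers is the standard output-size formula `floor((size + 2 * padding - kernel) / stride) + 1`, which also covers padding and even kernels. The sketch below assumes no dilation; `conv_output_size` is a hypothetical helper. It reproduces the output shapes of the layer summaries in these notes.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Output length along one axis of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

first_conv = conv_output_size(32, 5)                         # 28
second_conv = conv_output_size(28, 3)                        # 26
strided_conv = conv_output_size(226, 7, stride=2)            # 110
max_pool = conv_output_size(200, 2, stride=3, padding=1)     # 67

print(first_conv, second_conv, strided_conv, max_pool)
```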

In [None]:
import torch
import torch.nn as nn
from torchinfo import summary

batch_size = 16
width_image = 226
height_image = 226
layers_image = 1  # number of input channels: 1 -> grayscale, 3 -> RGB
input_image = (width_image, height_image, layers_image)
wh_1 = 7  # 1st layer: kernel size (width and height)
f_1 = 7  # 1st layer: nr of filters
stride = 2
bias = False

model_output = summary(
    nn.Conv2d(layers_image, f_1, wh_1, stride=stride, bias=bias),
    (batch_size, layers_image, height_image, width_image),  # (batch, channels, height, width)
    verbose=2,
    col_width=16,
    col_names=["kernel_size", "input_size", "output_size", "num_params"],)

Layer (type:depth-idx)                   Kernel Shape     Input Shape      Output Shape     Param #
└─Conv2d: 0-1                            [1, 7, 7, 7]     [16, 1, 226, 226] [16, 7, 110, 110] 343
Total params: 343
Trainable params: 343
Non-trainable params: 0
Total mult-adds (M): 66.40
Input size (MB): 3.27
Forward/backward pass size (MB): 10.84
Params size (MB): 0.00
Estimated Total Size (MB): 14.11


### Example but with stride set to 3.
Below we use max pooling with stride 3, which shrinks each spatial dimension by roughly a factor of 3 (200 -> 67).

In [None]:
import torch
import torch.nn as nn
from torchinfo import summary

batch_size = 16
width_image = 200
height_image = 200
layers_image = 1  # number of input channels
input_image = (width_image, height_image, layers_image)
k = 2  # pooling window size (width and height)
stride = 3
padding = 1  # k // 2
bias = False

model_output = summary(
    nn.MaxPool2d((k, k), padding=padding, stride=stride),
    (batch_size, layers_image, height_image, width_image),  # (batch, channels, height, width)
    verbose=2,
    col_width=16,
    col_names=["kernel_size", "input_size", "output_size", "num_params"],)

Layer (type:depth-idx)                   Kernel Shape     Input Shape      Output Shape     Param #
└─MaxPool2d: 0-1                         --               [16, 1, 200, 200] [16, 1, 67, 67]  --
Total params: 0
Trainable params: 0
Non-trainable params: 0
Total mult-adds (M): 0.00
Input size (MB): 2.56
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 2.56


## Convolution as a matrix multiplication
- We make use of a Toeplitz matrix built from the \[-1, 1, -1] kernel
- The result is then max pooled with:
    - Kernel of 2
    - Padding of 1
In [None]:
input_tensor = torch.tensor([30,20,25,32,60,90,19])

conv = torch.tensor([[-1., 0., 0., 0., 0.],
        [1., -1., 0., 0., 0.],
        [-1., 1., -1., 0., 0.],
        [0.,-1., 1., -1., 0.],
        [0., 0.,-1., 1., -1.],
        [0., 0., 0., -1., 1.],
        [0., 0., 0., 0., -1.]])

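convolve = torch.matmul(input_tensor, conv.long())  # (7,) @ (7, 5) -> (5,)
convolve = convolve.view(1, convolve.shape[0]).float()  # shape (1, 5): one channel of length 5

print(convolve)

# Perform max pooling
mp = nn.MaxPool1d(2, padding=1)

# the input has shape (channels, length); no batch dimension is added
print(mp(convolve))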

tensor([[-35., -27., -53., -62.,  11.]])
tensor([[-35., -27.,  11.]])

