# Convolutions

In [None]:
import torch
from torch import nn

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path
from torch import tensor

from torch.utils.data import DataLoader

In the context of an image, a feature is a visually distinctive attribute. For example, the number 7 is characterized by a horizontal edge near the top of the digit, and a top-right to bottom-left diagonal edge underneath that.

It turns out that finding the edges in an image is a very common task in computer vision, and it is surprisingly straightforward. To do it, we use a *convolution*. A convolution requires nothing more than multiplication and addition.
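
To make the "multiplication and addition" claim concrete, here is a minimal sketch in NumPy. The helper `apply_kernel`, the 5×5 `image`, and the `top_edge` kernel are all made up for illustration. The helper slides the kernel without flipping it, which is also what `nn.Conv2d` does.

```python
import numpy as np

def apply_kernel(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for row in range(out_h):
        for col in range(out_w):
            patch = image[row:row + kh, col:col + kw]
            # Elementwise multiply, then add everything up: that is the whole operation.
            out[row, col] = (patch * kernel).sum()
    return out

# Hypothetical 5x5 image: dark (0) in the top two rows, bright (1) below.
image = np.array([
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
], dtype=float)

# Hypothetical kernel that responds to a dark-to-bright change going downwards.
top_edge = np.array([
    [-1, -1, -1],
    [ 0,  0,  0],
    [ 1,  1,  1],
], dtype=float)

edges = apply_kernel(image, top_edge)
print(edges)
```

The output is large wherever the 3×3 window straddles the horizontal edge and zero where the window covers a uniform region.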

## Creating the CNN

In [None]:
def conv(input_filters, output_filters, kernel_size=3, stride=2):
    res = nn.Conv2d(input_filters, output_filters, stride=stride, kernel_size=kernel_size, padding=kernel_size//2)
    return res

### Understanding Convolution Arithmetic

In an input of size `64x1x28x28` the axes are `batch,channel,height,width`. This is often represented as `NCHW` (where `N` refers to batch size).

We have 1 input channel, 4 output channels, and a 3×3 kernel.
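
The spatial size of the output follows from the kernel size, stride, and padding. The sketch below uses the standard output-size formula for a convolution with dilation 1. The helper name `conv_output_size` is hypothetical, and its defaults mirror the `conv` function above (`stride=2`, `padding=kernel_size//2`).

```python
def conv_output_size(size, kernel_size=3, stride=2, padding=None):
    """Output height (or width) of a convolution with dilation 1."""
    if padding is None:
        # Same default as the conv() helper in this notebook.
        padding = kernel_size // 2
    return (size + 2 * padding - kernel_size) // stride + 1

stride2_size = conv_output_size(28)            # 28 -> 14
stride1_size = conv_output_size(28, stride=1)  # 28 -> 28
print(stride2_size, stride1_size)
```

With these defaults, each stride-2 layer halves the height and width of a 28×28 input, while a stride-1 layer with the same padding preserves them.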

In [None]:
simple_cnn = conv(1, 4)
simple_cnn

In [None]:
simple_cnn.weight.shape

In [None]:
simple_cnn.bias.shape
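
As a quick check of these shapes, the following sketch builds the same kind of layer directly with `nn.Conv2d` and passes a dummy batch through it. The all-zeros `batch` is a stand-in for real data, not part of the original notebook.

```python
import torch
from torch import nn

# Same configuration as conv(1, 4): 3x3 kernel, stride 2, padding 1.
layer = nn.Conv2d(1, 4, kernel_size=3, stride=2, padding=1)

# Dummy NCHW batch: 64 single-channel 28x28 images.
batch = torch.zeros(64, 1, 28, 28)
out = layer(batch)

print(layer.weight.shape)  # (out_channels, in_channels, kernel_h, kernel_w)
print(layer.bias.shape)    # one bias per output channel
print(out.shape)           # (batch, out_channels, height, width)
```

The weight tensor holds one 3×3 kernel per (output channel, input channel) pair, and the stride of 2 halves the spatial dimensions from 28 to 14.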

The *receptive field* is the area of the input image that is involved in the calculation of a single activation in a layer. The figure below shows which cells one output cell depends on after a stack of convolutions.

<img width="308" src="notebook_ims/receptive_field.png">

The blue highlighted cells are its *precedents*—that is, the cells used to calculate its value. These cells are the corresponding 3×3 area of cells from the input layer, and the cells from the filter.

In this example, we have just two convolutional layers. We can see that a 5×5 area of cells in the input layer is used to calculate a single cell in the Conv2 layer. This 5×5 area is the *receptive field* of that cell.

The deeper we are in the network (specifically, the more stride-2 convs we have before a layer), the larger the receptive field for an activation in that layer.
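
The growth of the receptive field can be computed layer by layer. The sketch below uses the usual recurrence for stacked convolutions without dilation: each layer widens the field by `(kernel_size - 1)` times the product of all earlier strides. The function name `receptive_field` is hypothetical.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one activation after `layers`.

    `layers` is a list of (kernel_size, stride) pairs, ordered from input to output.
    """
    size = 1  # a single input pixel sees only itself
    jump = 1  # distance in input pixels between neighbouring activations
    for kernel_size, stride in layers:
        size += (kernel_size - 1) * jump
        jump *= stride
    return size

two_stride1 = receptive_field([(3, 1), (3, 1)])          # 5
two_stride2 = receptive_field([(3, 2), (3, 2)])          # 7
three_stride2 = receptive_field([(3, 2), (3, 2), (3, 2)])  # 15
print(two_stride1, two_stride2, three_stride2)
```

Two stride-1 layers with 3×3 kernels give the 5×5 area described above, and switching to stride 2 makes the field grow faster with depth.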

## Color Images

A color picture is a rank-3 tensor, with axes `channel,height,width`:

In [None]:
from torchvision.io import read_image

In [None]:
im = read_image('../homework/corgi.png')
im.shape

In [None]:
plt.imshow(im.permute(1,2,0))

In [None]:
_,axs = plt.subplots(1,3)
for corgi, ax, color in zip(im,axs,('Reds','Greens','Blues')):
    ax.imshow(255-corgi, cmap=color)

<img src="notebook_ims/chapter9_rgbconv.svg" id="rgbconv" caption="Convolution over an RGB image" alt="Convolution over an RGB image" width="550">

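Each filter has one kernel per input channel, and each kernel is applied to its own channel. The per-channel results are then all added together to produce a single number for each grid location, for each output feature.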

<img src="notebook_ims/chapter9_rgb_conv_stack.svg" id="rgbconv2" caption="Adding the RGB filters" alt="Adding the RGB filters" width="500">

We have `ch_out` filters like this, so in the end, the result of our convolutional layer will be a batch of images with `ch_out` channels.
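
The summation over channels can be sketched for a single grid location and a single output filter as follows. The random `patch` and `kernel` values are placeholders, and the bias term that `nn.Conv2d` adds per output filter is left out to keep the focus on the channel sum.

```python
import numpy as np

rng = np.random.default_rng(0)

# One 3x3 window of an RGB image, laid out as (channel, height, width).
patch = rng.random((3, 3, 3))
# One filter: a separate 3x3 kernel for each of the 3 input channels.
kernel = rng.random((3, 3, 3))

# Multiply-and-add within each channel gives one number per channel...
per_channel = (patch * kernel).sum(axis=(1, 2))
# ...and those three numbers are added to give one activation.
activation = per_channel.sum()

print(per_channel.shape, activation)
```

Repeating this at every grid location gives one output channel, and repeating it with `ch_out` different filters gives `ch_out` output channels.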

For RGB color images, we define the convolutional layer with 3 input channels, like so:

In [None]:
simple_cnn = conv(3, 16)
simple_cnn

In this example, all 3 RGB channels of each image are used to compute every output. The layer has 16 filters, and each filter has one 3×3 kernel per input channel.

In [None]:
simple_cnn.weight.shape
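
The weight shape `(ch_out, ch_in, kernel_size, kernel_size)` also determines how many learnable parameters the layer has. Here is a small sketch of that arithmetic, assuming a square kernel and one bias per output channel, with a hypothetical helper name `conv2d_param_count`.

```python
def conv2d_param_count(ch_in, ch_out, kernel_size=3, bias=True):
    """Number of learnable parameters in a 2D convolution with a square kernel."""
    weights = ch_out * ch_in * kernel_size * kernel_size
    biases = ch_out if bias else 0
    return weights + biases

rgb_params = conv2d_param_count(3, 16)   # 16*3*3*3 weights + 16 biases
gray_params = conv2d_param_count(1, 4)   # 4*1*3*3 weights + 4 biases
print(rgb_params, gray_params)
```

The count does not depend on the image height or width, since the same kernels are reused at every grid location.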