<a href="https://colab.research.google.com/github/Angus-Eastell/Intro_to_AI/blob/main/9_1_convolutions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Convolutions in PyTorch

One of the key things to understand about convolutional networks is what output size you get for a particular kernel size, stride and padding.

This notebook will challenge you to predict the sizes of tensors after various convolutional / pooling layers, and to design layers with particular output sizes.

As a reference, you can use the lecture notes, and also the PyTorch docs:
* [Conv2d](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html)
* [MaxPool2d](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html#maxpool2d)
* [AvgPool2d](https://pytorch.org/docs/stable/generated/torch.nn.AvgPool2d.html#avgpool2d)
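The shape rule given in the docs linked above can be sketched in plain Python. This is only a sketch for one spatial dimension: the helper name `conv_output_size` is made up for this notebook and is not part of PyTorch, and the sizes in the example calls are hypothetical.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension (height or width).

    Hypothetical helper, not a PyTorch function. It follows the shape
    formula in the Conv2d docs, using floor division.
    """
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# Hypothetical sizes, chosen only for illustration.
print(conv_output_size(10, kernel_size=3))                       # 8
print(conv_output_size(10, kernel_size=3, padding=1))            # 10
print(conv_output_size(10, kernel_size=3, stride=2, padding=1))  # 5
```

Apply it once to the height and once to the width to check your predictions for a 2D layer.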

First, let's define an input tensor, which has one channel and the spatial size set in the code below.

In [1]:
import torch as t
import torch.nn as nn

N = 1  # batch size
C = 1  # channels
H = 7  # height
W = 5  # width

# PyTorch image tensors are laid out as (N, C, H, W)
x = t.randn(N, C, H, W)
x.shape

torch.Size([1, 1, 7, 5])

#### Q1

First, let's consider a "default" convolution with `in_channels=1`, `out_channels=1` and `kernel_size=3`. Notice that I haven't explicitly set the stride or the padding. According to the [Conv2d](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html) docs, the defaults are `padding=0` and `stride=1`.

In [2]:
conv2d = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3)
# What is:
conv2d(x).shape
# try to answer before evaluating the code!
# if you're not sure, try drawing a picture like the ones in the notes!

torch.Size([1, 1, 5, 3])
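
As a check on the result above, here is the arithmetic written out. With the default `dilation=1`, the shape formula in the Conv2d docs reduces to $\lfloor (\text{in} + 2 \cdot \text{padding} - \text{kernel\_size}) / \text{stride} \rfloor + 1$ per spatial dimension. Applied to the two spatial sizes of `x` (7 and 5):

$$
\left\lfloor \frac{7 + 2 \cdot 0 - 3}{1} \right\rfloor + 1 = 5,
\qquad
\left\lfloor \frac{5 + 2 \cdot 0 - 3}{1} \right\rfloor + 1 = 3
$$

So an unpadded 3x3 kernel with stride 1 removes one pixel from each edge.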


#### Q2

Notice that in the previous question, the output was smaller than the input!

Write out a convolutional layer with padding that gives an output that's the same size as the input.

Again, if you're not sure, try drawing a picture!

In [4]:
#Replace ???? with the right amount of padding
conv2d = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1)
conv2d(x).shape

torch.Size([1, 1, 7, 5])

In [None]:
# @title Answer
conv2d = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1)
conv2d(x).shape

#### Q3

The amount of padding you need depends on the kernel size.  So how much padding do you need with `kernel_size=5`?

In [5]:
#Replace ???? with the right amount of padding
conv2d = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=5, padding=2)
conv2d(x).shape

torch.Size([1, 1, 7, 5])

In [None]:
# @title Answer
conv2d = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=5, padding=2)
conv2d(x).shape
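
The pattern in Q2 and Q3 can be written as a rule. With `stride=1` and the default dilation, the output size is `in + 2 * padding - kernel_size + 1`, so the size is preserved when `padding = (kernel_size - 1) / 2`. A minimal sketch, where `same_padding` is a hypothetical helper name and not a PyTorch function:

```python
def same_padding(kernel_size):
    """Padding that preserves size for stride=1, dilation=1 and an odd kernel.

    Hypothetical helper for illustration only.
    """
    if kernel_size % 2 == 0:
        # (kernel_size - 1) / 2 is not an integer for even kernels
        raise ValueError("symmetric size-preserving padding needs an odd kernel_size")
    return (kernel_size - 1) // 2


for k in (1, 3, 5, 7):
    print(k, same_padding(k))
# 1 0
# 3 1
# 5 2
# 7 3
```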

#### Q4

Now let's think about strides.

In [6]:
conv2d = nn.Conv2d(in_channels=1, out_channels=1, stride=2, kernel_size=3)
# What is:
conv2d(x).shape
# try to answer before evaluating the code!

torch.Size([1, 1, 3, 2])

In [7]:
conv2d = nn.Conv2d(in_channels=1, out_channels=1, stride=2, kernel_size=3, padding=1)
# What is:
conv2d(x).shape
# try to answer before evaluating the code!

torch.Size([1, 1, 4, 3])
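
One way to "draw the picture" for strides is to list where each kernel window starts along one dimension of the padded input. The sketch below does that in plain Python. `window_starts` is a hypothetical helper, and the indices it returns are positions in the padded input, not in the original one.

```python
def window_starts(size, kernel_size, stride=1, padding=0):
    """Start indices of each kernel window along one padded dimension."""
    padded = size + 2 * padding
    # A window starting at i covers i .. i + kernel_size - 1, so the last
    # valid start is padded - kernel_size.
    return list(range(0, padded - kernel_size + 1, stride))


# Spatial sizes 7 and 5, as in the shape of x printed earlier.
for padding in (0, 1):
    rows = window_starts(7, kernel_size=3, stride=2, padding=padding)
    cols = window_starts(5, kernel_size=3, stride=2, padding=padding)
    print(padding, rows, cols, (len(rows), len(cols)))
# 0 [0, 2, 4] [0, 2] (3, 2)
# 1 [0, 2, 4, 6] [0, 2, 4] (4, 3)
```

The number of window starts in each dimension is the output size, which matches the two shapes printed above.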

#### Q5

`stride`, `padding` and `kernel_size` should work the same way for pooling layers (specifically `nn.AvgPool2d` and `nn.MaxPool2d`).

Here, I confirm that in one very simple setting:

In [None]:
conv2d = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, stride=2, padding=0)
pool = nn.AvgPool2d(kernel_size=3, stride=2, padding=0)

print(conv2d(x).shape)
print(pool(x).shape)

torch.Size([1, 1, 3, 2])
torch.Size([1, 1, 3, 2])


Write more examples to show that `Conv2d` and pooling give the same output size for:
* `padding=1`
* `kernel_size=5`
* `stride=1`

In [8]:
conv2d = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=5, stride=1, padding=1)
pool = nn.AvgPool2d(kernel_size=5, stride=1, padding=1)

print(conv2d(x).shape)
print(pool(x).shape)

torch.Size([1, 1, 5, 3])
torch.Size([1, 1, 5, 3])
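
Why do the sizes agree? Both layers slide the same window over the padded input and differ only in what they compute inside each window. The toy 1-D sketch below illustrates this in plain Python. It is not how PyTorch implements either layer, and the function names, weights and data are all made up for illustration.

```python
def windows(values, kernel_size, stride=1, padding=0):
    """Zero-pad a 1-D list and return every kernel-sized window."""
    padded = [0.0] * padding + list(values) + [0.0] * padding
    last_start = len(padded) - kernel_size
    return [padded[i:i + kernel_size] for i in range(0, last_start + 1, stride)]


def conv1d(values, weights, stride=1, padding=0):
    """Weighted sum over each window (cross-correlation, no bias)."""
    return [sum(w * v for w, v in zip(weights, win))
            for win in windows(values, len(weights), stride, padding)]


def avg_pool1d(values, kernel_size, stride=1, padding=0):
    """Mean over each window, counting the zero padding."""
    return [sum(win) / kernel_size
            for win in windows(values, kernel_size, stride, padding)]


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]  # hypothetical data
conv_out = conv1d(signal, weights=[0.5, 0.0, -0.5], stride=2, padding=1)
pool_out = avg_pool1d(signal, kernel_size=3, stride=2, padding=1)
print(len(conv_out), len(pool_out))  # 4 4
```

One difference to watch for when writing your own examples: PyTorch's pooling layers only accept padding of at most half the kernel size, a restriction this toy code does not enforce.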
