In [1]:
import torch
import torchvision as vision
import torch.nn as nn

Note the following:

1. The **depth** of every filter **must** be equal to the number of input channels
1. A filter has separate weights for each input channel
1. The convolution operation sums the contributions from all the channels
1. Therefore the number of output channels is exactly equal to the number of filters
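One way to check these points is to inspect a convolution layer's weight tensor. PyTorch stores `nn.Conv2d` weights with shape `(out_channels, in_channels, kernel_height, kernel_width)`, so the second dimension is the filter depth and the first is the number of filters. The minimal sketch below uses arbitrary layer and input sizes chosen only for illustration:

```python
import torch
import torch.nn as nn

# A layer with 2 input channels, 3 filters, and 2x2 kernels (sizes chosen for illustration)
conv = nn.Conv2d(in_channels=2, out_channels=3, kernel_size=2)

# Weight layout: (number of filters, filter depth = input channels, height, width)
print(list(conv.weight.shape))  # [3, 2, 2, 2]

# One input sample with 2 channels of size 5x5
x = torch.randn(1, 2, 5, 5)
y = conv(x)

# Each filter produces exactly one output channel
print(list(y.shape))  # [1, 3, 4, 4]
```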

In [14]:
# a and b each have 2 channels of size 3x3
a=torch.ones([2,3,3])
a[1,:,:]=3.
b=torch.ones([2,3,3])
# stack a and b together to create a batch of 2 samples
x=torch.stack([a,b])

print("X's size={}".format(list(x.size())))
print("a[0]=\n{}".format(a[0].numpy()))
print("a[1]=\n{}".format(a[1].numpy()))

X's size=[2, 2, 3, 3]
a[0]=
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
a[1]=
[[3. 3. 3.]
 [3. 3. 3.]
 [3. 3. 3.]]


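Recall the dimensions used in the convolution operation.
Let $s, f, c, i, j$ be the sample index, filter index, input channel index, output height index, and output width index, respectively, and let $m, n$ index the kernel height and width. The convolution operation (with stride 1 and no padding) is defined as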
\begin{align*}
O_{s,f,i,j}=b_f+ \sum_c\sum_{m,n}X_{s,c,i+m,j+n}*K_{f,c,m,n}
\end{align*}
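To make the indices concrete, here is a minimal sketch that evaluates the formula above with explicit loops and compares it with `torch.nn.functional.conv2d`. Like the formula, PyTorch's convolution does not flip the kernel (it is a cross-correlation). The helper name `conv2d_naive` and the random tensor sizes are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def conv2d_naive(X, K, b):
    """Hypothetical helper: O[s,f,i,j] = b[f] + sum_c sum_{m,n} X[s,c,i+m,j+n] * K[f,c,m,n]
    with stride 1 and no padding."""
    S, C, H, W = X.shape
    F_out, C_k, kh, kw = K.shape
    assert C == C_k, "filter depth must equal the number of input channels"
    H_out, W_out = H - kh + 1, W - kw + 1
    O = torch.zeros(S, F_out, H_out, W_out)
    for s in range(S):
        for f in range(F_out):
            for i in range(H_out):
                for j in range(W_out):
                    # the elementwise product and sum cover c, m and n at once
                    O[s, f, i, j] = b[f] + (X[s, :, i:i + kh, j:j + kw] * K[f]).sum()
    return O

torch.manual_seed(0)
X = torch.randn(2, 2, 3, 3)   # 2 samples, 2 channels, 3x3
K = torch.randn(3, 2, 2, 2)   # 3 filters of depth 2, size 2x2
b = torch.randn(3)

O = conv2d_naive(X, K, b)
# PyTorch's conv2d computes the same quantity
print(torch.allclose(O, F.conv2d(X, K, b), atol=1e-5))  # True
```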

- The ```c``` dimension is fixed by the number of channels in the input
- In the case of ```a```, ```b```, and by extension ```x```, the number of channels is 2
- In this example we choose 3 filters, each with a 2x2 kernel
- In PyTorch a convolution layer is created as follows:

```nn.Conv2d(in_channels=2,out_channels=3,kernel_size=2,bias=False)```
- Note that we chose to omit the bias for simplicity

In [25]:
# the in_channels must be 2 to match # channels of a and b
l=nn.Conv2d(in_channels=2,out_channels=3,kernel_size=2,bias=False)
with torch.no_grad():
    # set every weight to 1, then set the second input channel of every filter to 3
    l.weight.fill_(1.)
    l.weight[:,1,:,:].fill_(3.)
p=l.parameters()
w=next(p)
s=w.size()
print("filters={},channels={},height={},width={}".format(s[0],s[1],s[2],s[3]))

filters=3,channels=2,height=2,width=2


In [27]:
d=l(x)
s=d.size()
print("output samples={},channels={},height={},width={}".format(s[0],s[1],s[2],s[3]))

output samples=2,channels=3,height=2,width=2
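The 2x2 output size follows from the usual size rule $\lfloor (H + 2p - k)/s \rfloor + 1$, assuming dilation 1, where $H$ is the input size, $k$ the kernel size, $s$ the stride and $p$ the padding. The sketch below uses a hypothetical helper `conv_output_size` and cross-checks it against `nn.Conv2d` for a few stride and padding settings chosen for illustration:

```python
import torch
import torch.nn as nn

def conv_output_size(size, kernel_size, stride=1, padding=0):
    # floor((size + 2*padding - kernel_size) / stride) + 1, assuming dilation=1
    return (size + 2 * padding - kernel_size) // stride + 1

print(conv_output_size(3, kernel_size=2))  # 2, matching the 2x2 output above

# Cross-check against PyTorch for a few settings
for stride, padding in [(1, 0), (2, 0), (1, 1), (2, 1)]:
    layer = nn.Conv2d(2, 3, kernel_size=2, stride=stride, padding=padding, bias=False)
    out = layer(torch.ones(2, 2, 3, 3))
    assert out.shape[-1] == conv_output_size(3, 2, stride, padding)
```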


Consider the output for the first sample (the convolution of ```a```). Since all three filters are identical, all output channels are identical too, so we inspect only one of them: ```d[0,0,:,:]```.

```a``` has two input channels: the first is all ones and the second is all threes.
Each filter is all ones on its first channel and all threes on its second. Since the receptive field is 2x2, every output value is:
\begin{align*}
(1\times 1+1\times 1+1\times 1+1\times 1)+(3\times 3+3\times 3+3\times 3+3\times 3)\\
=40
\end{align*}
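The same arithmetic can be reproduced by hand on a single receptive field. This sketch rebuilds ```a``` and one filter, takes the top-left 2x2 patch across both channels, and shows the per-channel contributions before they are summed:

```python
import torch

# Recreate the first sample a (channel 0 all ones, channel 1 all threes)
a = torch.ones(2, 3, 3)
a[1] = 3.0
# Recreate one filter: ones on channel 0, threes on channel 1
k = torch.ones(2, 2, 2)
k[1] = 3.0

# Top-left 2x2 receptive field across both channels, multiplied elementwise
patch = a[:, 0:2, 0:2]
per_channel = (patch * k).sum(dim=(1, 2))
print(per_channel.tolist())      # [4.0, 36.0]
print(per_channel.sum().item())  # 40.0
```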

In [31]:
# convolution of a
print(d[0,0,:,:].detach().numpy())

[[40. 40.]
 [40. 40.]]


The output of the second sample (the convolution of ```b```) is computed the same way. Both channels of ```b``` are all ones, so:
\begin{align*}
(1\times 1+1\times 1+1\times 1+1\times 1)+(1\times 3+1\times 3+1\times 3+1\times 3)\\
=16
\end{align*}

In [33]:
# convolution of b
print(d[1,0,:,:].detach().numpy())

[[16. 16.]
 [16. 16.]]
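Because the convolution sums the contributions from all channels, convolving each input channel separately with the matching slice of the filters and adding the results should reproduce the full output. The minimal sketch below rebuilds the same ```x``` and weights as above and checks this with `torch.nn.functional.conv2d`:

```python
import torch
import torch.nn.functional as F

# Same setup as above: a (ones, threes), b (ones, ones), three identical filters
a = torch.ones(2, 3, 3)
a[1] = 3.0
b = torch.ones(2, 3, 3)
x = torch.stack([a, b])   # shape [2, 2, 3, 3]
w = torch.ones(3, 2, 2, 2)
w[:, 1] = 3.0

# Convolve each input channel with its slice of every filter, separately
contributions = [F.conv2d(x[:, c:c + 1], w[:, c:c + 1]) for c in range(2)]
total = contributions[0] + contributions[1]

print(contributions[0][1, 0].tolist())  # channel 0 of b: [[4.0, 4.0], [4.0, 4.0]]
print(contributions[1][1, 0].tolist())  # channel 1 of b: [[12.0, 12.0], [12.0, 12.0]]
print(torch.allclose(total, F.conv2d(x, w)))  # True
```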


### Max pooling

- Recall that max pooling with kernel size $h\times w$ and stride $s$ computes the maximum value of the input within an $h\times w$ window
- It then "slides" that window by $s$ and repeats
- Max pooling is performed in PyTorch using ```nn.MaxPool2d```
- Example:
    - The input stacks two 3x3 matrices, ```a``` and ```b```; ```nn.MaxPool2d``` treats this 3D tensor as (channels, height, width) and pools each channel independently
    - By default the kernel is square, so specifying 2 means 2x2
    - By default the stride is the same as the kernel size
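The sliding-window description can be sketched directly with loops. The helper `max_pool2d_naive` is hypothetical and assumes a `(C, H, W)` input with no padding; windows that would extend past the input are dropped, which matches the default behaviour of `nn.MaxPool2d`:

```python
import torch
import torch.nn as nn

def max_pool2d_naive(x, kernel_size, stride=None):
    """Hypothetical helper: max pooling over the last two dims of a (C, H, W) tensor,
    no padding, partial windows at the edges dropped."""
    stride = kernel_size if stride is None else stride  # default stride = kernel size
    C, H, W = x.shape
    H_out = (H - kernel_size) // stride + 1
    W_out = (W - kernel_size) // stride + 1
    out = torch.empty(C, H_out, W_out)
    for i in range(H_out):
        for j in range(W_out):
            r0, c0 = i * stride, j * stride
            window = x[:, r0:r0 + kernel_size, c0:c0 + kernel_size]
            out[:, i, j] = window.amax(dim=(1, 2))  # maximum within each channel's window
    return out

x = torch.arange(1, 10, dtype=torch.float32).reshape(1, 3, 3)
for stride in (None, 1):
    assert torch.equal(max_pool2d_naive(x, 2, stride), nn.MaxPool2d(2, stride=stride)(x))
print(max_pool2d_naive(x, 2, stride=1).tolist())  # [[[5.0, 6.0], [8.0, 9.0]]]
```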


In [52]:
a=torch.tensor([[1,2,3],[4,5,6],[7,8,9]],dtype=torch.float32)
b=torch.tensor([[10,11,12],[13,14,15],[16,17,18]],dtype=torch.float32)
# stack a and b together into a tensor of shape [2, 3, 3]
x=torch.stack([a,b])

# By default the kernel is square, so 2 is the same as 2x2
# If the stride is not specified it defaults to the kernel size
maxpool=nn.MaxPool2d(2)
y=maxpool(x)
print(y.numpy())

[[[ 5.]]

 [[14.]]]


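- Notice that each output channel above contains a single value
- This is because, with a stride of 2, the next window would start at index 2 and "overshoot" the 3x3 input (it would need index 3)
- By default such partial windows are dropped, so only a single window is computed per channel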
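If the partial windows at the edge should be kept instead, `nn.MaxPool2d` accepts `ceil_mode=True`, which rounds the output size up rather than down. A minimal sketch on the single channel ```a```:

```python
import torch
import torch.nn as nn

a = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=torch.float32)
x = a.unsqueeze(0)  # shape [1, 3, 3]: one channel

# Default (ceil_mode=False): floor((3 - 2) / 2) + 1 = 1 window per dimension
print(nn.MaxPool2d(2)(x).tolist())  # [[[5.0]]]

# ceil_mode=True: ceil((3 - 2) / 2) + 1 = 2, so the partial edge windows are kept
print(nn.MaxPool2d(2, ceil_mode=True)(x).tolist())  # [[[5.0, 6.0], [8.0, 9.0]]]
```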

In [50]:
# stride 1: windows overlap, giving a 2x2 output per channel
maxpool=nn.MaxPool2d(kernel_size=2,stride=1)
y=maxpool(x)
print(y)

tensor([[[ 5.,  6.],
         [ 8.,  9.]],

        [[14., 15.],
         [17., 18.]]])
