## Pooling 
- a pooling layer reduces the spatial dimensions (height and width) of the input volume. Without it, large feature maps are carried through every layer, which makes deeper networks more expensive to compute (and inflates the parameter count of any fully connected layers that follow).
- similar to cross-correlation (a window slides over the input), but there is no kernel and there are no learnable parameters
- Max and average pooling are the most common pooling techniques.
  - Max pooling: takes the maximum value in each window of the input tensor. Often preferred over average pooling because it keeps the strongest activation in each window.
  - Average pooling: takes the mean of the values in each window of the input tensor (akin to downsampling).

In [1]:
import torch
from torch import nn
from d2l import torch as d2l

In [6]:
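# Pooling: slides a window like cross-correlation, but needs no kernel K
def pool2d(X: torch.Tensor, pool_size: tuple, mode='max'):
    # Reject unknown modes instead of silently returning zeros
    if mode not in ('max', 'avg'):
        raise ValueError(f"mode must be 'max' or 'avg', got {mode!r}")
    p_h, p_w = pool_size
    # Stride 1, no padding: each dimension shrinks by (window size - 1)
    Y = torch.zeros((X.shape[0] - p_h + 1, X.shape[1] - p_w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            window = X[i: i + p_h, j: j + p_w]
            Y[i, j] = window.max() if mode == 'max' else window.mean()
    return Y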

In [8]:
X = torch.tensor([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])
print(pool2d(X, (2, 2)))
print(pool2d(X, (2, 2), "avg"))


tensor([[4., 5.],
        [7., 8.]])
tensor([[2., 3.],
        [5., 6.]])
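
As a sanity check, the same numbers can be reproduced with PyTorch's functional API. This is a minimal sketch, assuming a stride of 1 and no padding (which is what the hand-written loop does); the names `X4d`, `max_out`, and `avg_out` are illustrative only.

```python
import torch
import torch.nn.functional as F

X = torch.tensor([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])
# The built-in pooling functions expect (batch, channels, height, width)
X4d = X.reshape(1, 1, 3, 3)

# stride=1 must be passed explicitly, because the default stride is the window size
max_out = F.max_pool2d(X4d, kernel_size=2, stride=1).reshape(2, 2)
avg_out = F.avg_pool2d(X4d, kernel_size=2, stride=1).reshape(2, 2)
print(max_out)
print(avg_out)
```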


In [9]:

X = torch.arange(16, dtype=torch.float32).reshape((1, 1, 4, 4))
X

tensor([[[[ 0.,  1.,  2.,  3.],
          [ 4.,  5.,  6.,  7.],
          [ 8.,  9., 10., 11.],
          [12., 13., 14., 15.]]]])

In [10]:
# Padding and stride: deep learning frameworks default the stride to the pooling window size
# e.g. a (3x3) window gives a stride of (3x3)
pool2d = nn.MaxPool2d(3)
# Pooling has no model parameters, hence it needs no initialization
pool2d(X)


tensor([[[[10.]]]])
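
The two comments in the cell above (default stride, no parameters) can be checked directly. A small sketch; `default_pool` and `explicit_pool` are names made up for this check.

```python
import torch
from torch import nn

X = torch.arange(16, dtype=torch.float32).reshape((1, 1, 4, 4))

default_pool = nn.MaxPool2d(3)              # stride not given
explicit_pool = nn.MaxPool2d(3, stride=3)   # stride set by hand

# When stride is omitted, the layer stores the window size as its stride
print(default_pool.stride)
# Both layers give the same output on X
same_output = torch.equal(default_pool(X), explicit_pool(X))
# A pooling layer registers no learnable parameters
num_params = sum(p.numel() for p in default_pool.parameters())
print(same_output, num_params)
```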

In [11]:
# The default stride and padding can be overridden
pool2d = nn.MaxPool2d(3, padding=1, stride=2)
pool2d(X)


tensor([[[[ 5.,  7.],
          [13., 15.]]]])
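
One detail worth knowing about padding in max pooling: `nn.MaxPool2d` pads implicitly with negative infinity rather than zero, so padded positions never win the max. The sketch below illustrates this with an all-negative input (`X_neg` is a made-up test tensor); if the padding were zeros, the output would be 0 everywhere.

```python
import torch
from torch import nn

# Every real value is -1, so any zero padding would show up as a max of 0
X_neg = -torch.ones((1, 1, 4, 4))
pool = nn.MaxPool2d(3, padding=1, stride=2)
out = pool(X_neg)
print(out)
```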

In [12]:
pool2d = nn.MaxPool2d((2, 3), stride=(2, 3), padding=(0, 1))
pool2d(X)

tensor([[[[ 5.,  7.],
          [13., 15.]]]])
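
The output shapes of the three configurations above follow from the usual size formula, `floor((n + 2p - k) / s) + 1` per dimension. The sketch below assumes the PyTorch defaults `ceil_mode=False` and `dilation=1`; `pool_output_size` and `as_pair` are hypothetical helpers written for this note, not part of any library.

```python
import torch
from torch import nn

def pool_output_size(n, k, s=None, p=0):
    """Hypothetical helper: output length along one dimension."""
    if s is None:
        s = k  # default stride equals the window size
    return (n + 2 * p - k) // s + 1

def as_pair(v):
    """Hypothetical helper: turn an int into an (h, w) pair."""
    return v if isinstance(v, tuple) else (v, v)

X = torch.arange(16, dtype=torch.float32).reshape((1, 1, 4, 4))
configs = [
    dict(kernel_size=3),
    dict(kernel_size=3, padding=1, stride=2),
    dict(kernel_size=(2, 3), stride=(2, 3), padding=(0, 1)),
]

results = []
for cfg in configs:
    k = as_pair(cfg["kernel_size"])
    s = as_pair(cfg.get("stride", cfg["kernel_size"]))
    p = as_pair(cfg.get("padding", 0))
    predicted = tuple(pool_output_size(4, k[i], s[i], p[i]) for i in range(2))
    actual = tuple(nn.MaxPool2d(**cfg)(X).shape[-2:])
    results.append((predicted, actual))
    print(cfg, predicted, actual)
```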

In [13]:
# Multi-channel: pooling is applied to each input channel separately,
# so the number of output channels equals the number of input channels
X = torch.cat((X, X + 1), 1)
X

tensor([[[[ 0.,  1.,  2.,  3.],
          [ 4.,  5.,  6.,  7.],
          [ 8.,  9., 10., 11.],
          [12., 13., 14., 15.]],

         [[ 1.,  2.,  3.,  4.],
          [ 5.,  6.,  7.,  8.],
          [ 9., 10., 11., 12.],
          [13., 14., 15., 16.]]]])

In [14]:
pool2d = nn.MaxPool2d(3, padding=1, stride=2)
pool2d(X)

tensor([[[[ 5.,  7.],
          [13., 15.]],

         [[ 6.,  8.],
          [14., 16.]]]])
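
To confirm that channels are pooled independently, one can pool each channel on its own and compare against pooling the stacked tensor. A minimal sketch; `X2`, `pooled_together`, and `pooled_per_channel` are illustrative names.

```python
import torch
from torch import nn

X = torch.arange(16, dtype=torch.float32).reshape((1, 1, 4, 4))
X2 = torch.cat((X, X + 1), 1)  # shape (1, 2, 4, 4): two channels
pool = nn.MaxPool2d(3, padding=1, stride=2)

# Pool both channels in one call
pooled_together = pool(X2)
# Pool each channel separately, then stack along the channel dimension
pooled_per_channel = torch.cat(
    [pool(X2[:, c:c + 1]) for c in range(X2.shape[1])], 1
)

print(pooled_together.shape)
print(torch.equal(pooled_together, pooled_per_channel))
```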