
## This notebook explains pooling in convolutional neural networks

#### There are three main pooling techniques:
1. Max Pooling
2. Mean (Average) Pooling
3. Min Pooling

Pooling works by selecting a small area (a window) of the input tensor and computing the max, mean, or min of all the values in that area. The window then moves to the next area and repeats the same operation; how far the window moves at each step is set by the stride. This continues until the window has covered the entire tensor. Each result is written into a new tensor, which produces a downsampled (lower-resolution) version of the input.
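The sliding-window procedure above can be sketched directly with Python loops. This is a minimal illustration, assuming a single 2D tensor and no padding; the helper name `pool2d_manual` is hypothetical. The results are cross-checked against PyTorch's built-in `nn.MaxPool2d` and `nn.AvgPool2d`. PyTorch has no dedicated min-pooling layer, so min pooling is compared against `-maxpool(-x)`.

```python
import torch
import torch.nn as nn

def pool2d_manual(x, kernel_size, stride, reduce):
    """Hypothetical helper: slide a kernel_size x kernel_size window over a
    2D tensor x (no padding) and reduce each window with `reduce`."""
    h, w = x.shape
    out_h = (h - kernel_size) // stride + 1
    out_w = (w - kernel_size) // stride + 1
    out = torch.empty(out_h, out_w)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride  # top-left corner of the window
            window = x[r:r + kernel_size, c:c + kernel_size]
            out[i, j] = reduce(window)     # max, mean, or min of the window
    return out

torch.manual_seed(0)
x = torch.rand(6, 6)

max_out = pool2d_manual(x, 2, 2, torch.max)
mean_out = pool2d_manual(x, 2, 2, torch.mean)
min_out = pool2d_manual(x, 2, 2, torch.min)

# The built-in layers expect (N, C, H, W), so add batch and channel axes
x4 = x.unsqueeze(0).unsqueeze(0)
print(torch.allclose(max_out, nn.MaxPool2d(2, 2)(x4)[0, 0]))    # True
print(torch.allclose(mean_out, nn.AvgPool2d(2, 2)(x4)[0, 0]))   # True
# Min pooling expressed as negated max pooling of the negated input
print(torch.allclose(min_out, -nn.MaxPool2d(2, 2)(-x4)[0, 0]))  # True
print(max_out.shape)  # torch.Size([3, 3])
```

With a 2×2 window and stride 2 the windows do not overlap, so a 6×6 input shrinks to 3×3.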

- ##### Pooling a 3D image with a 2D kernel results in a 3D output. The 2D kernel is applied separately to each slice along the third axis (the channels) of the input, so the number of slices is preserved.

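- ##### Pooling a 3D image with a 3D kernel also pools along the third axis, so that axis shrinks as well. With only 3 slices, a kernel size and stride of 2 leave a single slice, which is why the output can look 2D.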
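Both points can be checked with a small sketch. It assumes an unbatched `(C, H, W)` tensor of arbitrary size for `MaxPool2d`; for `MaxPool3d` a leading axis is added, and PyTorch then reads the tensor as unbatched `(C, D, H, W)`, so the 3 slices become the depth axis.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
img = torch.rand(3, 8, 8)  # a "3D image": 3 slices of 8x8

# 2D kernel: every slice along the first axis is pooled independently
pool2d = nn.MaxPool2d(kernel_size=2, stride=2)
out2d = pool2d(img)
per_slice = torch.stack([pool2d(img[c:c + 1])[0] for c in range(img.shape[0])])
print(out2d.shape)                    # torch.Size([3, 4, 4])
print(torch.equal(out2d, per_slice))  # True

# 3D kernel: the window also spans the slice axis, so that axis shrinks too.
# A 4D input is read as unbatched (C, D, H, W): here C=1, D=3.
pool3d = nn.MaxPool3d(kernel_size=2, stride=2)
out3d = pool3d(img.unsqueeze(0))
print(out3d.shape)                    # torch.Size([1, 1, 4, 4])
# The remaining slice is the max over slices 0 and 1; slice 2 fits no full window
print(torch.equal(out3d[0, 0], out2d[:2].max(dim=0).values))  # True
```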

In [51]:
# Import the necessary libraries
import torch
import torch.nn as nn

In [52]:
# Create a maxpool instance

# parameters
stride = 2
kernel_size = 2

# Create an instance of maxpooling

p2 = nn.MaxPool2d(stride=stride,
                  kernel_size=kernel_size)

p3 = nn.MaxPool3d(stride=stride,
                  kernel_size=kernel_size)


# Visualize the pooling layers
print(p2)
print(p3)


MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
MaxPool3d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
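In PyTorch's pooling layers, `stride` defaults to `kernel_size` when it is omitted, which is why the printed layers show `stride=2`. The sketch below checks this and contrasts it with overlapping windows (`stride=1`), which downsample much less; the input size is arbitrary.

```python
import torch
import torch.nn as nn

x = torch.rand(1, 1, 64, 64)

default_pool = nn.MaxPool2d(kernel_size=2)            # stride falls back to kernel_size
explicit_pool = nn.MaxPool2d(kernel_size=2, stride=2)
overlap_pool = nn.MaxPool2d(kernel_size=2, stride=1)  # neighbouring windows overlap

print(default_pool.stride)                             # 2
print(torch.equal(default_pool(x), explicit_pool(x)))  # True
print(default_pool(x).shape)   # torch.Size([1, 1, 32, 32])
print(overlap_pool(x).shape)   # torch.Size([1, 1, 63, 63])
```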


In [53]:
# Create a sample 2D image (1 channel) and a 3D image (3 channels)
img2 = torch.rand(1, 1, 64, 64)
img3 = torch.rand(1, 3, 64, 64)

# Apply pooling
img2pool2 = p2(img2)
print(f"2D image 2D maxpool: {img2pool2.shape}")

# The commented lines below raise an error: MaxPool3d reads this 4D tensor
# as unbatched (C, D, H, W), and its depth D=1 is smaller than the kernel size 2
# img2pool3 = p3(img2)
# print(f"2D image 3D maxpool: {img2pool3.shape}")

img3pool2 = p2(img3)
print(f"3D image 2D maxpool: {img3pool2.shape}")

img3pool3 = p3(img3)
print(f"3D image 3D maxpool: {img3pool3.shape}")

2D image 2D maxpool: torch.Size([1, 1, 32, 32])
3D image 2D maxpool: torch.Size([1, 3, 32, 32])
3D image 3D maxpool: torch.Size([1, 1, 32, 32])
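All of these shapes follow the output-size rule given in the PyTorch pooling documentation: with dilation 1 and `ceil_mode=False`, each pooled axis of length n becomes floor((n + 2·padding − kernel_size) / stride) + 1. The helper below (`pooled_size`, a hypothetical name) applies that rule and also shows why `p3(img2)` fails: the depth axis of length 1 admits no window of size 2.

```python
import torch
import torch.nn as nn

def pooled_size(n, kernel_size, stride, padding=0):
    """Hypothetical helper: output length of one pooled axis
    (dilation=1, ceil_mode=False)."""
    return (n + 2 * padding - kernel_size) // stride + 1

print(pooled_size(64, 2, 2))  # 32: H and W of every pooled tensor above
print(pooled_size(3, 2, 2))   # 1:  MaxPool3d on (1, 3, 64, 64) collapses D=3 to 1
print(pooled_size(1, 2, 2))   # 0:  MaxPool3d on (1, 1, 64, 64) has no valid window

p3 = nn.MaxPool3d(kernel_size=2, stride=2)
try:
    p3(torch.rand(1, 1, 64, 64))
    failed = False
except RuntimeError:
    failed = True
print(failed)  # True
```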


### Create a simple CNN

In [54]:
littlenet = nn.Sequential(
    nn.Conv2d(3, 10, 5, 3, 2),              # (1, 3, 128, 128) -> (1, 10, 43, 43)
    nn.ReLU(),
    nn.AvgPool3d(kernel_size=3, stride=3),  # -> (1, 3, 14, 14)
    nn.Flatten(),                           # -> (1, 588)
    nn.Linear(588, 1),
    nn.Sigmoid()
)

In [55]:
img = torch.rand(1, 3, 128, 128)

In [56]:
littlenet(img)

tensor([[0.5019]], grad_fn=<SigmoidBackward0>)
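Because `nn.AvgPool3d` receives the 4D output of `nn.Conv2d`, it reads `(N, 10, 43, 43)` as unbatched `(C, D, H, W)`: the batch axis plays the role of channels and the 10 feature maps play the role of depth. Channels are pooled independently, so samples in a batch still do not mix. The sketch below, assuming the same layer sizes as `littlenet` with fresh random weights, checks that a batch of 4 images yields 4 outputs and that the first one matches running the first image alone.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(
    nn.Conv2d(3, 10, 5, 3, 2),              # (N, 3, 128, 128) -> (N, 10, 43, 43)
    nn.ReLU(),
    nn.AvgPool3d(kernel_size=3, stride=3),  # read as (C=N, D=10, 43, 43) -> (N, 3, 14, 14)
    nn.Flatten(),                           # -> (N, 588)
    nn.Linear(588, 1),
    nn.Sigmoid(),
)

batch = torch.rand(4, 3, 128, 128)
with torch.no_grad():
    out_batch = net(batch)
    out_first = net(batch[:1])

print(out_batch.shape)                                      # torch.Size([4, 1])
print(torch.allclose(out_batch[:1], out_first, atol=1e-5))  # True
```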

#### Breaking down how we got 588 input features for the linear layer

In [57]:

A = nn.Conv2d(3, 10, 5, 3, 2) # Conv2d instance
B = nn.AvgPool3d(3, 3) # Average Pool 3d instance
C = nn.Flatten() # Flatten instance

In [58]:
x = A(img) # Perform convolution
print(x.shape)

torch.Size([1, 10, 43, 43])


In [59]:
x = B(x) # Apply 3D average pooling
print(x.shape)

torch.Size([1, 3, 14, 14])
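The shapes above follow from applying the output-size rule floor((n + 2·padding − kernel_size) / stride) + 1 to each axis. `nn.AvgPool3d` reads the 4D tensor `(1, 10, 43, 43)` as unbatched `(C, D, H, W)`, so its kernel of 3 with stride 3 also pools the 10 feature maps as a depth axis:

```latex
\begin{aligned}
H_{\text{conv}} = W_{\text{conv}} &= \left\lfloor \frac{128 + 2 \cdot 2 - 5}{3} \right\rfloor + 1 = 43 \\
D_{\text{pool}} &= \left\lfloor \frac{10 - 3}{3} \right\rfloor + 1 = 3 \\
H_{\text{pool}} = W_{\text{pool}} &= \left\lfloor \frac{43 - 3}{3} \right\rfloor + 1 = 14 \\
\text{features} &= 3 \times 14 \times 14 = 588
\end{aligned}
```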


In [60]:
print(3 * 14 * 14)

588


In [61]:
x = C(x) # Flatten all non-batch dimensions into one
print(x.shape) # the second dimension should equal the product printed above

torch.Size([1, 588])
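Instead of working out 588 by hand, the flattened size can be measured by passing a dummy tensor through the layers that come before `nn.Linear`. This is a minimal sketch assuming a fixed 3×128×128 input; `flattened_size` is a hypothetical helper name. PyTorch's `nn.LazyLinear`, which infers `in_features` on the first forward pass, is another option.

```python
import torch
import torch.nn as nn

def flattened_size(features, input_shape):
    """Hypothetical helper: run one dummy sample through `features`
    and return the length of the flattened output."""
    with torch.no_grad():
        out = features(torch.zeros(1, *input_shape))
    return out.shape[1]

features = nn.Sequential(
    nn.Conv2d(3, 10, 5, 3, 2),
    nn.ReLU(),
    nn.AvgPool3d(kernel_size=3, stride=3),
    nn.Flatten(),
)

n = flattened_size(features, (3, 128, 128))
print(n)  # 588

net = nn.Sequential(features, nn.Linear(n, 1), nn.Sigmoid())
print(net(torch.rand(1, 3, 128, 128)).shape)  # torch.Size([1, 1])
```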


### Another example to test our understanding

In [93]:
# Create a 3D image
img2 = torch.rand(1, 3, 256, 256)

# Parameters to use
in_channels = 3
out_channels = 12
kernel_size = 9
stride = 1
padding = 1

# Create a conv2D instance
c2 = nn.Conv2d(in_channels = in_channels,
               out_channels = out_channels,
               kernel_size=kernel_size,
               stride=stride,
               padding=padding)

# Create average pooling instance
avp3 = nn.AvgPool3d(kernel_size=3,
                    stride=3)

# Create Flatten instance
flat = nn.Flatten()


In [94]:
# Perform convolution
x = c2(img2)
x = nn.ReLU()(x)
print(f"The shape of the convolved image: {x.shape}")

The shape of the convolved image: torch.Size([1, 12, 250, 250])


In [95]:
# Perform average pooling
x = avp3(x) # floor((250 + 2 * 0 - 3) / 3) + 1 = 83
print(f"The shape after performing pooling operation: {x.shape}")


The shape after performing pooling operation: torch.Size([1, 4, 83, 83])


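`AvgPool3d` also pools along the channel axis, which it treats as a depth axis. With 12 channels, a kernel size of 3, and a stride of 3, that axis becomes floor((12 - 3) / 3) + 1 = 4, so the 12 channels are reduced to 4.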
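The reduction from 12 channels to 4 can be made concrete. With no padding, each 3×3×3 average-pooling window is the mean of 27 values, which equals the mean over 3 consecutive channels of their 3×3 spatial averages. The sketch below checks this equivalence on a smaller, arbitrary tensor (12 channels of 30×30) so that it runs quickly.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.rand(1, 12, 30, 30)  # read by AvgPool3d as (C=1, D=12, H=30, W=30)

pooled3d = nn.AvgPool3d(kernel_size=3, stride=3)(x)   # (1, 4, 10, 10)

# Same result in two steps: 3x3 spatial averaging per channel,
# then averaging each group of 3 consecutive channels
spatial = nn.AvgPool2d(kernel_size=3, stride=3)(x)    # (1, 12, 10, 10)
grouped = spatial.view(1, 4, 3, 10, 10).mean(dim=2)   # (1, 4, 10, 10)

print(pooled3d.shape)                                # torch.Size([1, 4, 10, 10])
print(torch.allclose(pooled3d, grouped, atol=1e-6))  # True
```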

In [81]:
# Finally flatten the tensor
x = flat(x)
print(f"The shape after flattening is: {x.shape}")

The shape after flattening is: torch.Size([1, 27556])


In [73]:
4 * 83 * 83 == x.shape[1]

True

In [86]:
import numpy as np

82.33333333333333

253