As we generalized in Section 6.2, if the input shape is $n_h \times n_w$ and the convolution kernel shape is $k_h \times k_w$, then the output shape is $(n_h - k_h + 1) \times (n_w - k_w + 1)$.
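
As a quick check of this rule, here is a minimal sketch in plain Python. The helper name `conv_output_shape` is introduced only for illustration, and the sketch assumes a stride of 1 and no padding.

```python
def conv_output_shape(n_h, n_w, k_h, k_w):
    """Output (height, width) of a convolution with stride 1 and no padding."""
    return (n_h - k_h + 1, n_w - k_w + 1)


# An 8x8 input with a 3x3 kernel loses one pixel on each side.
print(conv_output_shape(8, 8, 3, 3))  # (6, 6)
# A non-square 5x3 kernel shrinks height and width by different amounts.
print(conv_output_shape(8, 8, 5, 3))  # (4, 6)
```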

In several cases, we incorporate techniques, including padding and strided convolutions, that affect the size of the output. As motivation, note that since kernels generally have width and height greater than $1$, after applying many successive convolutions, we tend to wind up with outputs that are considerably smaller than our input. If we start with a $240 \times 240$ pixel image, $10$ layers of $5 \times 5$ convolutions reduce the image to $200 \times 200$ pixels, slicing off roughly 30% of the image and with it obliterating any interesting information on the boundaries of the original image. Padding is the most popular tool for handling this issue.
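
The arithmetic behind this example can be reproduced with a short sketch. The helper `size_after_layers` is a hypothetical name, and the sketch assumes every layer uses stride 1 and no padding.

```python
def size_after_layers(n, k, num_layers):
    """Side length after num_layers unpadded, stride-1 convolutions with a k x k kernel."""
    for _ in range(num_layers):
        n = n - k + 1  # each layer removes k - 1 pixels per dimension
    return n


start = 240
end = size_after_layers(start, k=5, num_layers=10)
lost_fraction = 1 - (end * end) / (start * start)  # share of the pixel area lost

print(end)                      # 200
print(round(lost_fraction, 3))  # 0.306
```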



In general, if we add a total of $p_h$ rows of padding (roughly half on top and half on bottom) and a total of $p_w$ columns of padding (roughly half on the left and half on the right), the output shape will be

$$(n_h - k_h + p_h + 1) \times (n_w - k_w + p_w + 1). \tag{6.3.1}$$
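
One consequence of this formula is that choosing a total padding of one less than the kernel size in each dimension keeps the output the same size as the input. A minimal sketch follows; `padded_output_shape` is an illustrative name, and its padding arguments are the total padding per dimension, matching the formula.

```python
def padded_output_shape(n_h, n_w, k_h, k_w, p_h=0, p_w=0):
    """Output shape at stride 1, where p_h and p_w are TOTAL padding rows/columns."""
    return (n_h - k_h + p_h + 1, n_w - k_w + p_w + 1)


# Total padding of k - 1 in each dimension preserves the input shape.
print(padded_output_shape(8, 8, 3, 3, p_h=2, p_w=2))  # (8, 8)
print(padded_output_shape(8, 8, 5, 3, p_h=4, p_w=2))  # (8, 8)
# Without padding, the output shrinks as before.
print(padded_output_shape(8, 8, 3, 3))                # (6, 6)
```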

# Padding

In [2]:
import torch
from torch import nn

def comp_conv2d(conv2d, X):
    # Prepend batch and channel dimensions of size 1: (h, w) -> (1, 1, h, w)
    X = X.reshape((1, 1) + X.shape)
    Y = conv2d(X)
    # Drop the batch and channel dimensions again, leaving (h_out, w_out)
    return Y.reshape(Y.shape[2:])

# padding=1 adds one row/column on each side, i.e. 2 in total per dimension
conv2d = nn.Conv2d(1, 1, kernel_size=3, padding=1)
X = torch.rand(size=(8, 8))
comp_conv2d(conv2d, X).shape

torch.Size([8, 8])

When the height and width of the convolution kernel differ, we can still make the output have the same height and width as the input by setting different padding amounts for the height and the width.

In [3]:
# Kernel of height 5 and width 3; padding of 2 rows and 1 column on each side
conv2d = nn.Conv2d(1, 1, kernel_size=(5, 3), padding=(2, 1))
comp_conv2d(conv2d, X).shape

torch.Size([8, 8])
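
The padding values used in this example can be derived rather than guessed. `nn.Conv2d` applies its `padding` argument to both sides of each dimension, so for an odd kernel size $k$ at stride 1, a per-side padding of $(k - 1) / 2$ preserves the size. The sketch below uses a hypothetical helper, `same_padding_per_side`, and assumes odd kernel sizes.

```python
def same_padding_per_side(k):
    """Per-side padding that keeps the size unchanged for an odd kernel size k at stride 1."""
    if k % 2 == 0:
        raise ValueError("symmetric size-preserving padding needs an odd kernel size")
    return (k - 1) // 2


kernel_size = (5, 3)
padding = tuple(same_padding_per_side(k) for k in kernel_size)
print(padding)  # (2, 1)

# Check against the output-shape formula, with total padding = 2 * per-side padding.
n = 8
print(tuple(n - k + 2 * p + 1 for k, p in zip(kernel_size, padding)))  # (8, 8)
```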

# Stride

![Screenshot 2021-09-10 9.57.14 AM](attachment:%E1%84%89%E1%85%B3%E1%84%8F%E1%85%B3%E1%84%85%E1%85%B5%E1%86%AB%E1%84%89%E1%85%A3%E1%86%BA%202021-09-10%20%E1%84%8B%E1%85%A9%E1%84%8C%E1%85%A5%E1%86%AB%209.57.14.png)

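From the formula, we can compute the shape of the output, where:

n_h = height of the input
k_h = height of the kernel
p_h = total rows of padding (top plus bottom)
s_h = stride along the height (vertical stride)

The subscript w denotes the corresponding quantities along the width.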
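
The figure is an image attachment and is not reproduced as text here. Assuming it shows the standard output-shape formula for a strided convolution, it reads as follows in the notation defined above, where $\lfloor \cdot \rfloor$ rounds down to the nearest integer:

```latex
\left\lfloor \frac{n_h - k_h + p_h + s_h}{s_h} \right\rfloor
\times
\left\lfloor \frac{n_w - k_w + p_w + s_w}{s_w} \right\rfloor
```

With $s_h = s_w = 1$, this reduces to $(n_h - k_h + p_h + 1) \times (n_w - k_w + p_w + 1)$, the padded output shape for a stride of 1.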

In [4]:
# padding is given per side; stride is (vertical, horizontal)
conv2d = nn.Conv2d(1, 1, kernel_size=(3, 5), padding=(0, 1), stride=(3, 4))
comp_conv2d(conv2d, X).shape

torch.Size([2, 2])
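
This result can be checked by hand. The sketch below applies the standard strided output-shape rule, floor((n - k + p + s) / s) per dimension, in plain Python. The helper `strided_output_shape` is an illustrative name, and `p` is the total padding per dimension, so the per-side values passed to `nn.Conv2d` are doubled first.

```python
def strided_output_shape(n, k, p, s):
    """n, k, p, s are (height, width) pairs; p is the TOTAL padding per dimension."""
    return tuple(
        (n_i - k_i + p_i + s_i) // s_i  # integer division implements the floor
        for n_i, k_i, p_i, s_i in zip(n, k, p, s)
    )


per_side_padding = (0, 1)                               # as passed to nn.Conv2d
total_padding = tuple(2 * x for x in per_side_padding)  # (0, 2)

# Height: (8 - 3 + 0 + 3) // 3 = 2, width: (8 - 5 + 2 + 4) // 4 = 2
print(strided_output_shape((8, 8), (3, 5), total_padding, (3, 4)))  # (2, 2)
```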