# Padding and Stride

Padding and stride give us more control over the output size of our convolutions. As motivation, note that since kernels generally have width and height greater than 1, after applying many successive convolutions we tend to wind up with outputs that are considerably smaller than our input. If we start with a 240 x 240 pixel image, 10 layers of 5 x 5 convolutions reduce the image to 200 x 200 pixels, slicing off about 30% of the image and, with it, any interesting information on the boundaries of the original image. __Padding__ is the most popular tool for handling this issue. In other cases, we may want to reduce the dimensionality drastically, e.g., if we find the original input resolution to be unwieldy. __Strided convolutions__ are a popular technique that can help in these instances.
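The shrinkage figures quoted above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes unpadded convolutions with a stride of 1; the helper name `shrink` is made up for this illustration.

```python
def shrink(size, kernel, layers):
    """Side length after `layers` unpadded, stride-1 convolutions."""
    for _ in range(layers):
        # Each layer removes (kernel - 1) pixels from the side length
        size = size - kernel + 1
    return size

side = shrink(240, kernel=5, layers=10)
# Fraction of the original pixel count that is lost
lost = 1 - (side * side) / (240 * 240)
print(side, f"{lost:.1%}")
```

Each 5 x 5 layer trims 4 pixels from the height and from the width, so 10 layers trim 40 in total.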

## Padding

Since we typically use small kernels, any single convolution might only lose a few pixels, but this adds up as we apply many successive convolutional layers. One straightforward solution to this problem is to add extra pixels of filler around the boundary of our input image, thus increasing the effective size of the image. Typically, we set the values of the extra pixels to zero.
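To make the idea concrete, here is a small sketch of zero padding on a plain nested list. The function `zero_pad` is a hypothetical helper written for this illustration, not part of PyTorch; in practice the `padding` argument of the convolutional layer does this for us.

```python
def zero_pad(image, pad):
    """Return a copy of a 2-D list with `pad` rows/columns of zeros on every side."""
    width = len(image[0]) + 2 * pad
    # Rows of zeros above the image
    padded = [[0] * width for _ in range(pad)]
    # Original rows, with zeros added on the left and right
    for row in image:
        padded.append([0] * pad + list(row) + [0] * pad)
    # Rows of zeros below the image
    padded.extend([0] * width for _ in range(pad))
    return padded

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
padded = zero_pad(image, 1)
for row in padded:
    print(row)
```

With one pixel of padding on every side, the 3 x 3 input becomes 5 x 5, and the original values sit in the center.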

### Mathematical relationship between padding and the shape of the output (feature map)


In general, for an input of height nh and width nw and a kernel of height kh and width kw, if we add a total of ph rows of padding (roughly half on top and half on bottom) and a total of pw columns of padding (roughly half on the left and half on the right), the output shape will be:

            (nh - kh + ph + 1) x (nw - kw + pw + 1)

In many cases, we will want to set ph = kh - 1 and pw = kw - 1 to give the input and output the same height and width. This makes it easier to predict the output shape of each layer when constructing the network.
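The formula can be turned into a small helper to check the claim that ph = kh - 1 and pw = kw - 1 preserve the input shape. This is a sketch for a stride of 1; `conv_output_shape` and its argument names are chosen for this illustration, with `p_h` and `p_w` standing for the total padding ph and pw.

```python
def conv_output_shape(n_h, n_w, k_h, k_w, p_h=0, p_w=0):
    """Output (height, width) of a stride-1 convolution.

    p_h and p_w are the TOTAL rows/columns of padding, as in the text.
    """
    return (n_h - k_h + p_h + 1, n_w - k_w + p_w + 1)

# No padding: a 3 x 3 kernel shrinks an 8 x 8 input to 6 x 6
no_pad = conv_output_shape(8, 8, 3, 3)
# p = k - 1 in each dimension keeps the input shape
same = conv_output_shape(8, 8, 3, 3, p_h=2, p_w=2)
# The same rule holds for a non-square 5 x 3 kernel
rect = conv_output_shape(8, 8, 5, 3, p_h=4, p_w=2)
print(no_pad, same, rect)
```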



### NOTE: 
CNNs commonly use convolution kernels with odd height and width values, such as 1, 3, 5, or 7. Choosing odd kernel sizes has the benefit that we can preserve the dimensionality while padding with the same number of rows on top and bottom, and the same number of columns on left and right.
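The sketch below shows why odd kernel sizes are convenient: the total padding kh - 1 is even and splits equally between the two sides. For even kernel sizes the total is odd, so one side gets an extra row. Putting the extra row on top is an assumption made here for illustration, and `split_padding` is a hypothetical helper name.

```python
def split_padding(kernel_size):
    """Split the shape-preserving total padding (kernel_size - 1) across two sides.

    Returns (before, after). When the total is odd, the extra row or column
    goes on the 'before' side (an assumed convention, not the only option).
    """
    total = kernel_size - 1
    before = (total + 1) // 2  # ceiling of total / 2
    after = total // 2         # floor of total / 2
    return before, after

for k in (1, 2, 3, 4, 5, 7):
    print(k, split_padding(k))
```

Odd sizes such as 3, 5, and 7 give symmetric pairs like (1, 1), (2, 2), and (3, 3), while an even size such as 4 gives the asymmetric pair (2, 1).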

In [2]:
import torch
from torch import nn
# Helper that applies a convolutional layer to a single 2-D tensor. It adds
# the batch and channel dimensions the layer expects (the lazy layer
# initializes its weights on the first call) and strips them from the output
def comp_conv2d(conv2d, X):
    # (1, 1) indicates that batch size and the number of channels are both 1
    X = X.reshape((1, 1) + X.shape)
    Y = conv2d(X)
    # Strip the first two dimensions: examples and channels
    return Y.reshape(Y.shape[2:])

# 1 row and column is padded on either side, so a total of 2 rows or columns
# are added
conv2d = nn.LazyConv2d(1, kernel_size=3, padding=1)
X = torch.rand(size=(8, 8))
comp_conv2d(conv2d, X).shape



torch.Size([8, 8])

## Stride

When computing the cross-correlation, we start with the convolution window at the upper-left corner of the input tensor, and then slide it over all locations both down and to the right. In the previous examples, we defaulted to sliding one element at a time. However, sometimes, either for computational efficiency or because we wish to downsample, we move our window more than one element at a time, skipping the intermediate locations. This is particularly useful if the convolution kernel is large since it captures a large area of the underlying image.

We refer to the number of rows and columns traversed per slide as the __stride__. So far, we have used strides of 1, both for height and width. Sometimes, we may want to use a larger stride.
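The padding formula extends to strides as follows. Here s_h and s_w are names assumed for the stride in height and width, and n_h, k_h, p_h (and their width counterparts) correspond to nh, kh, ph in the padding section. The output shape is:

```latex
\left\lfloor \frac{n_h - k_h + p_h + s_h}{s_h} \right\rfloor
\times
\left\lfloor \frac{n_w - k_w + p_w + s_w}{s_w} \right\rfloor
```

With a stride of 1 in both dimensions, this reduces to the earlier expression (nh - kh + ph + 1) x (nw - kw + pw + 1). For an 8 x 8 input, a 3 x 3 kernel, one row or column of padding on each side (so ph = pw = 2), and a stride of 2, each dimension becomes floor((8 - 3 + 2 + 2) / 2) = floor(9 / 2) = 4.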

In [3]:
# A stride of 2 in both dimensions halves the input height and width
conv2d = nn.LazyConv2d(1, kernel_size=3, padding=1, stride=2)
comp_conv2d(conv2d, X).shape

torch.Size([4, 4])

In [4]:
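# A slightly more complicated example: kernel size, padding, and stride
# differ between height and width
conv2d = nn.LazyConv2d(1, kernel_size=(3, 5), padding=(0, 1), stride=(3, 4))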
comp_conv2d(conv2d, X).shape

torch.Size([2, 2])
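The two shapes printed above can be checked by hand with the strided output formula floor((n - k + p + s) / s), applied separately to height and width. The sketch below uses a hypothetical helper, `strided_output_shape`, that takes the total padding per dimension. Note that `padding=1` in PyTorch means one row or column on each side, so the total is 2.

```python
def strided_output_shape(n, k, p, s):
    """Output (height, width) of a strided convolution.

    n, k, p, s are (height, width) pairs for input size, kernel size,
    TOTAL padding, and stride.
    """
    return tuple(
        (n_i - k_i + p_i + s_i) // s_i
        for n_i, k_i, p_i, s_i in zip(n, k, p, s)
    )

# kernel_size=3, padding=1 (total 2), stride=2
first = strided_output_shape((8, 8), (3, 3), (2, 2), (2, 2))
# kernel_size=(3, 5), padding=(0, 1) (total (0, 2)), stride=(3, 4)
second = strided_output_shape((8, 8), (3, 5), (0, 2), (3, 4))
print(first, second)
```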