<!--TITLE: Moving Windows-->
# Introduction #

Last lesson, we looked at the first two operations in the feature extraction process: filtering with a convolution and detection with the ReLU activation. We're going to continue our discussion in this lesson by looking at a few more parameters affecting convolutional layers: **stride** and **padding**. With the addition of these parameters, defining a convolutional layer becomes:

In [None]:
import tensorflow.keras as keras
import tensorflow.keras.layers as layers

model = keras.Sequential([
    layers.Conv2D(filters=64,
                  kernel_size=(3, 3),
                  strides=(1, 1),
                  padding='same'), # activation is None
    # More layers follow
])

# The Moving Window #

The stride and padding parameters are related to a kind of computation called a **moving window**. The maximum pooling layer we mentioned in the last lesson also uses a moving window computation. It is these moving windows that give convnets most of their characteristic structure. By putting the kernel into motion, we get the complete convolution computation:

<!--TODO: 2D moving window-->
<figure>
<img src="" width=400 alt="A 2D moving window.">
</figure>

By scanning over an image with a kernel in this way, the convolution extracts a kind of "local summary" of the image data. How fine or how coarse this summary is depends on the size of the kernel.

<!--TODO: local summary-->
<figure>
<img src="" width=400 alt="Features produced by a 3x3 kernel and a 9x9 kernel, respectively.">
</figure>

Small kernels filter for features on a small scale. Large kernels filter for features on a large scale.
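To make the moving window concrete, here is a minimal sketch in plain Python that slides a kernel over a small image one pixel at a time and records the weighted sum at each position. The function name `slide_kernel` and the toy image are made up for illustration; this is not how TensorFlow implements the operation, only the same idea written out as loops. Like the convolution in deep learning libraries, it does not flip the kernel.

```python
def slide_kernel(image, kernel):
    """Slide `kernel` over `image` one pixel at a time and return the weighted sums.

    Both arguments are plain nested lists (a list of rows). The window stays
    entirely inside the image, so the output is smaller than the input.
    """
    image_h, image_w = len(image), len(image[0])
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    output = []
    for top in range(image_h - kernel_h + 1):
        row = []
        for left in range(image_w - kernel_w + 1):
            # Weighted sum of the pixels currently under the window
            total = 0
            for i in range(kernel_h):
                for j in range(kernel_w):
                    total += image[top + i][left + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Edge-detection kernel used in this lesson
edge_kernel = [
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1],
]

# Hypothetical 5x5 image: one bright pixel on a dark background
toy_image = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

feature_map = slide_kernel(toy_image, edge_kernel)
for row in feature_map:
    print(row)
```

The 5x5 input produces a 3x3 feature map, because a 3x3 window fits in only three positions along each dimension.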

# Stride #

The distance the window moves at each step is called the **stride**. We need to specify the stride in both dimensions of the image: one for moving left to right and one for moving top to bottom.

<!--TODO: stride-->
<figure>
<img src="" width=400 alt="(1, 1) stride compared to (2, 2) stride.">
</figure>

TensorFlow orders the spatial dimensions of image data as `(height, width)`, and the stride is given in the same order. So in the figure above, the window on the left would have `strides=(1, 1)`, and the window on the right would have `strides=(2, 2)`.

Whenever the stride in either direction is greater than 1, the moving window will skip over some of the pixels in the input at each step. In other words, the stride parameter is a way for the convolution to perform *subsampling*. Small values for `strides` produce high-resolution feature maps. Large values for `strides` produce low-resolution feature maps.

<!--TODO: resolution-->
<figure>
<img src="" width=400 alt="Large stride compared to small stride in feature maps.">
</figure>
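The subsampling effect can be checked with a little arithmetic. The sketch below lists where the window starts along a single dimension for several strides, assuming the window stays entirely inside the input. The helper name `window_positions` and the sizes are chosen only for illustration.

```python
def window_positions(input_size, kernel_size, stride):
    """Return the starting index of every window along one dimension.

    The window stays entirely inside the input (no padding).
    """
    return list(range(0, input_size - kernel_size + 1, stride))


input_size, kernel_size = 9, 3
for stride in (1, 2, 3):
    starts = window_positions(input_size, kernel_size, stride)
    print(f"stride={stride}: {len(starts)} windows starting at {starts}")
```

Each window contributes one value to the feature map, so going from a stride of 1 to a stride of 3 shrinks this dimension of the output from 7 values to 3.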

Because we want high-quality features to use for classification, convolutional layers will most often have `strides=(1, 1)`. Increasing the stride means that we miss out on potentially valuable information in our summary. Maximum pooling layers, however, will almost always have stride values greater than 1, often `(2, 2)` or `(3, 3)`. We'll discuss the role of pooling in the next lesson.

# Padding #

When performing the moving window computation, there is a question as to what to do at the boundaries of the input. Staying entirely inside the input means the window will never sit squarely over these boundary pixels like it does for every other pixel in the input. Since we aren't treating all the pixels exactly the same, could there be a problem?

What the convolution does with these boundary values is determined by its `padding` parameter. In TensorFlow, you have two choices: either `padding='same'` or `padding='valid'`. There are trade-offs with each.

<figure>
<img src="" width=400 alt="Illustration of valid and same padding.">
</figure>

When we set `padding='valid'`, the convolution window will stay entirely inside the input. The drawback is that the output is smaller than the input: each layer trims a few pixels from the edges in each dimension, and the loss accumulates from layer to layer. For modern convnets with 50+ layers, this can be a real disadvantage.

The alternative is to use `padding='same'`. The trick here is just to *pad* the edges of the input with 0's. The potential drawback here is that we're using values in the convolution that aren't in the original image. As we'll see in the next section, when followed by maximum pooling, 0-padding the input turns out not to affect the features produced by the convolution. For this reason, `padding='same'` tends to be the most common choice.
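The trade-off between the two padding modes can be sketched with the output-size rules TensorFlow documents for them: `ceil((input - kernel + 1) / stride)` for `'valid'` and `ceil(input / stride)` for `'same'`, ignoring dilation. The helper `output_size`, the 224-pixel input, and the stack of 50 layers are assumptions made for illustration.

```python
import math


def output_size(input_size, kernel_size, stride=1, padding="valid"):
    """Output length along one dimension, following TensorFlow's conventions."""
    if padding == "valid":
        # The window stays entirely inside the input
        return math.ceil((input_size - kernel_size + 1) / stride)
    if padding == "same":
        # The input is zero-padded, so only the stride shrinks the output
        return math.ceil(input_size / stride)
    raise ValueError(f"unknown padding: {padding!r}")


size_valid = size_same = 224  # hypothetical input width
for _ in range(50):  # 50 stacked 3x3 convolutions with stride 1
    size_valid = output_size(size_valid, kernel_size=3, padding="valid")
    size_same = output_size(size_same, kernel_size=3, padding="same")

print("valid:", size_valid)  # each 'valid' layer trims 2 pixels
print("same: ", size_same)
```

With `'valid'`, each 3x3 layer removes two pixels per dimension, so 50 layers remove 100 of the 224; with `'same'`, the size is unchanged.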

# Example - Stride #

In this example, we'll see how the stride affects the resolution of the features a convolutional layer will produce. Instead of applying the functions one-by-one (like in the last lesson), we'll use a helper function we've included to just see the result.

We'll continue with the kernel and image from Lesson 2.

In [None]:
#$HIDE_INPUT$
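import tensorflow as tf
import matplotlib.pyplot as plt
from matplotlib import gridspec
# Course helper module; this import path is an assumption
import learntools.computer_vision.visiontools as visiontools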
plt.rc('figure', autolayout=True)
plt.rc('axes', labelweight='bold', labelsize='large',
       titleweight='bold', titlesize=18, titlepad=10)
plt.rc('image', cmap='magma')

# Show kernel and input image

# Edge detector
kernel = tf.constant([
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1],
])

image_path = '/kaggle/input/computer-vision-resources/car_feature.jpg'
image = tf.io.read_file(image_path)
image = tf.io.decode_jpeg(image)

plt.figure(figsize=(12, 8))
gs = gridspec.GridSpec(1, 2, width_ratios=[0.75, 1], wspace=0.2) 
plt.subplot(gs[0])
visiontools.show_kernel(kernel)
plt.title("Kernel")
plt.subplot(gs[1])
plt.imshow(tf.squeeze(image), cmap='gray')
plt.axis('off')
plt.title("Input")
plt.show()

# Conclusion #

The moving windows we looked at in this lesson are not unique to images: they are also widely used in models for time series and natural language.

The key points are:
- the size of the kernel determines the size of the feature extracted: `kernel_size=3` is a good default
- the stride determines the resolution of the feature: for convolution, stick with the default value, `strides=1`
- the padding method determines what happens with boundary pixels: when followed by a maximum pooling layer, `padding='same'` is a good choice