<!--TITLE: Moving Windows-->
# Introduction #

In the previous two lessons, we learned about the three operations that carry out feature extraction from an image:
1. *filter* with a **convolution** layer
2. *detect* with **ReLU** activation
3. *condense* with a **maximum pooling** layer

The convolution and pooling operations share a common feature: they are both performed over a **moving window**. With convolution, this "window" is given by the dimensions of the kernel, the parameter `kernel_size`. With pooling, it is the pooling window, given by `pool_size`.

<figure>
<img src="https://i.imgur.com/LueNK6b.gif" width=400 alt="A 2D moving window.">
</figure>

There are two additional parameters affecting both convolution and pooling layers -- these are the `strides` of the window and whether to use `padding` at the image edges. The `strides` parameter says how far the window should move at each step, and the `padding` parameter describes how we handle the pixels at the edges of the input.

With these two parameters, defining the two layers becomes:

In [None]:
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(filters=64,
                  kernel_size=3,
                  strides=1,
                  padding='same',
                  activation='relu'),
    layers.MaxPool2D(pool_size=2,
                     strides=1,
                     padding='same')
    # More layers follow
])

# Stride #

The distance the window moves at each step is called the **stride**. We need to specify the stride in both dimensions of the image: one for moving left to right and one for moving top to bottom. This animation shows `strides=(2, 2)`, a movement of 2 pixels each step.

<figure>
<img src="https://i.imgur.com/Tlptsvt.gif" width=400 alt="Moving window with a stride of (2, 2).">
</figure>

What effect does the stride have? Whenever the stride in either direction is greater than 1, the moving window will skip over some of the pixels in the input at each step.

Because we want high-quality features to use for classification, convolutional layers will most often have `strides=(1, 1)`. Increasing the stride means that we miss out on potentially valuable information in our summary. Maximum pooling layers, however, will almost always have stride values greater than 1, like `(2, 2)` or `(3, 3)`, but not larger than the window itself.

Finally, note that when the value of the `strides` is the same number in both directions, you only need to set that number; for instance, instead of `strides=(2, 2)`, you could use `strides=2` for the parameter setting.
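
To make the effect of the stride concrete, here is a minimal sketch in plain Python (no TensorFlow needed) that lists where a window starts along one dimension of the input. The helper `window_starts` is a hypothetical name used only for illustration, and it assumes the window stays entirely inside the input.

```python
def window_starts(input_size, window_size, stride):
    """Start indices of a moving window along one dimension.

    Assumes the window stays entirely inside the input (no padding).
    """
    return list(range(0, input_size - window_size + 1, stride))


# A 3-pixel window sliding over an 8-pixel row
print(window_starts(8, 3, 1))  # [0, 1, 2, 3, 4, 5]
print(window_starts(8, 3, 2))  # [0, 2, 4]
```

With a stride of 2 the window visits half as many positions, so the output along that dimension is about half as long.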

# Padding #

When performing the moving window computation, there is a question as to what to do at the boundaries of the input. Staying entirely inside the input image means the window will never sit squarely over these boundary pixels like it does for every other pixel in the input. Since we aren't treating all the pixels exactly the same, could there be a problem?

What the convolution does with these boundary values is determined by its `padding` parameter. In TensorFlow, you have two choices: either `padding='same'` or `padding='valid'`. There are trade-offs with each.

When we set `padding='valid'`, the convolution window will stay entirely inside the input. The drawback is that the output shrinks (loses pixels), and shrinks more for larger kernels. This will limit the number of layers the network can contain, especially when inputs are small in size.
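
The shrinkage can be worked out by hand. The sketch below is plain Python with hypothetical helper names (`valid_output_size`, `sizes_after_stacking`); it assumes square inputs, a stride of 1 when stacking, and an input at least as large as the kernel.

```python
def valid_output_size(input_size, kernel_size, stride=1):
    """Output length along one dimension with padding='valid'."""
    return (input_size - kernel_size) // stride + 1


def sizes_after_stacking(input_size, kernel_size, num_layers):
    """Feature-map size after each of `num_layers` stride-1 'valid' convolutions."""
    sizes = []
    size = input_size
    for _ in range(num_layers):
        size = valid_output_size(size, kernel_size)
        sizes.append(size)
    return sizes


print(sizes_after_stacking(32, kernel_size=3, num_layers=4))  # [30, 28, 26, 24]
print(sizes_after_stacking(32, kernel_size=7, num_layers=4))  # [26, 20, 14, 8]
```

Each layer removes `kernel_size - 1` pixels per dimension, so a 7-pixel kernel uses up a 32-pixel input three times faster than a 3-pixel kernel does.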

The alternative is to use `padding='same'`. The trick here is to **pad** the input with 0's around its borders, using just enough 0's to make the size of the output the *same* as the size of the input. This can, however, dilute the influence of pixels at the borders. The animation below shows a moving window with `'same'` padding.

<figure>
<img src="https://i.imgur.com/RvGM2xb.gif" width=400 alt="Illustration of zero (same) padding.">
</figure>
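
How many zeros is "just enough"? The following sketch computes the amount along one dimension, following the convention TensorFlow documents for `'same'` padding. The function name `same_padding` is hypothetical and the code is an illustration, not the library's implementation.

```python
import math


def same_padding(input_size, kernel_size, stride=1):
    """Return (output_size, pad_before, pad_after) along one dimension.

    The output has ceil(input_size / stride) entries, and when the total
    number of zeros is odd, the extra zero goes on the end (bottom or right).
    """
    output_size = math.ceil(input_size / stride)
    total_pad = max((output_size - 1) * stride + kernel_size - input_size, 0)
    pad_before = total_pad // 2
    pad_after = total_pad - pad_before
    return output_size, pad_before, pad_after


print(same_padding(64, kernel_size=3, stride=1))  # (64, 1, 1)
print(same_padding(64, kernel_size=5, stride=1))  # (64, 2, 2)
print(same_padding(64, kernel_size=3, stride=2))  # (32, 0, 1)
```

Note that the output matches the input size only when the stride is 1. With a larger stride, `'same'` padding gives an output of size `ceil(input_size / stride)`.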

The VGG model we've been looking at uses `'same'` padding for all of its convolutional layers. Most modern convnets will use some combination of the two. (Another parameter to tune!)

# Example - Exploring Moving Windows #

Modern convnet architectures have used a variety...

We'll look at AlexNet and VGG16 here, and you'll have a chance to explore ResNet50 and LeNet in the exercises.

For this example, we'll just use a single kernel -- a horizontal line detector. This next hidden cell sets everything up and loads the image we'll use.

In [None]:
#$HIDE_INPUT$
import tensorflow as tf
import matplotlib.pyplot as plt
from matplotlib import gridspec
import visiontools
import warnings

plt.rc('figure', autolayout=True)
plt.rc('axes', labelweight='bold', labelsize='large',
       titleweight='bold', titlesize=18, titlepad=10)
plt.rc('image', cmap='magma')
warnings.filterwarnings("ignore") # to clean up output cells

# Define kernel
kernel = visiontools.bottom_sobel

# Define input images
SIZE = [64, 64]
circle = visiontools.circle(SIZE, val=1.0, r_shrink=3)
circle = tf.reshape(circle, [*circle.shape, 1])

IMAGE_PATH = '/kaggle/input/computer-vision-resources/car_feature.jpg'
SIZE = [300, 300]
car = visiontools.read_image(IMAGE_PATH)
car = tf.image.resize(car, size=SIZE, method='nearest')
car = tf.image.convert_image_dtype(car, dtype=tf.float32)

# Show kernel, circle, car
plt.figure(figsize=(16, 8))
gs = gridspec.GridSpec(1, 3, width_ratios=[0.75, 1, 1], wspace=0.2) 
plt.subplot(gs[0])
visiontools.show_kernel(kernel)
plt.subplot(gs[1])
plt.imshow(tf.squeeze(circle))
plt.axis('off')
plt.subplot(gs[2])
plt.imshow(tf.squeeze(car), cmap='gray')
plt.axis('off')
plt.show();

## Explore Pooling Size ##

We've been using a pooling window with dimensions `(2, 2)`. Let's see what happens when we enlarge it to `(4, 4)`. The stride is the same as before, so now the pooling windows overlap.

In [None]:
visiontools.show_extraction(
    car, subplot_shape=(1, 2), figsize=(8, 3),
    kernel=kernel,
    pool_size=(4, 4),
    pool_stride=(2, 2),
    ops=['detect', 'condense'],
)
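
To see what overlapping windows do to individual values, here is a one-dimensional sketch in plain Python. The helper `max_pool_1d` is a hypothetical stand-in for a pooling layer, written only for illustration, and it keeps every window inside the input.

```python
def max_pool_1d(values, pool_size, stride):
    """Max pooling along one dimension, keeping windows inside the input."""
    last_start = len(values) - pool_size
    return [max(values[i:i + pool_size])
            for i in range(0, last_start + 1, stride)]


row = [0, 0, 9, 0, 0, 0, 3, 0]

# Non-overlapping windows: each input value is read exactly once
print(max_pool_1d(row, pool_size=2, stride=2))  # [0, 9, 0, 3]

# Overlapping windows: the 9 falls inside two windows
print(max_pool_1d(row, pool_size=4, stride=2))  # [9, 9, 3]
```

Because neighboring windows share pixels, a single strong activation can show up in more than one output position.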

## Explore Pooling Stride ##

Now let's increase the stride to `(4, 4)` but keep the window at `(2, 2)`. This means the pooling window will skip over 2 pixels from one step to the next.

In [None]:
visiontools.show_extraction(
    circle, subplot_shape=(1, 2), figsize=(8, 3),
    kernel=kernel,
    pool_size=(2, 2),
    pool_stride=(4, 4),
    ops=['detect', 'condense'],
)
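
The skipped pixels can be listed explicitly. This plain-Python sketch works along one dimension and assumes no padding; `unread_indices` is a hypothetical helper name used only for illustration.

```python
def unread_indices(input_size, pool_size, stride):
    """Input indices that no pooling window ever covers (no padding)."""
    covered = set()
    for start in range(0, input_size - pool_size + 1, stride):
        covered.update(range(start, start + pool_size))
    return [i for i in range(input_size) if i not in covered]


# Stride equal to the window size: every pixel is read
print(unread_indices(12, pool_size=2, stride=2))  # []

# Stride larger than the window: 2 pixels are skipped between windows
print(unread_indices(12, pool_size=2, stride=4))  # [2, 3, 6, 7, 10, 11]
```

Any activation that falls on one of the unread positions cannot influence the pooled output at all.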

# Conclusion #

In this lesson, we looked at a characteristic computation common to both convolution and pooling: the **moving window** and the parameters affecting its behavior in these layers. This style of windowed computation contributes much of what is characteristic of convolutional networks and is an essential part of their functioning.

Move on to the exercise, where you'll explore the effect of `strides` and `padding`. You'll also learn how *stacking* convolutional layers can increase the effective window size, and how convolution can be used with *one-dimensional* data, like time series.