# Introduction #

In the last lesson, we learned how the activations in the network represent features and how the weights define filters. We saw how these features developed as the array of activations flowed through a convolutional block.

Convolution and pooling -- implemented in the layers of the network -- are what give a convnet its characteristic structure. Our goal in this lesson is to understand *why* this structure produces features favorable to solving the image classification problem. We want to understand why convolution and pooling are so good at visual feature extraction.

# Moving Windows #
 
Convolution and pooling both make use of a device called a **moving window**.

You can think about a moving window as a box that scans over an image -- left to right, top to bottom -- summarizing pixel values along the way.

For one-dimensional data (like a sequence or time series), it looks like this:

<!--TODO: 1D moving window-->
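
As a rough illustration of the one-dimensional case, here is a minimal sketch in plain Python. The function name `moving_window_1d` and the sample series are made up for this example. The window summarizes each group of neighboring values with whatever function is passed in.

```python
def moving_window_1d(values, size, stride=1, summarize=sum):
    """Slide a window of `size` over `values` and summarize each position."""
    return [
        summarize(values[start:start + size])
        for start in range(0, len(values) - size + 1, stride)
    ]


# Hypothetical sample data, chosen only for illustration.
series = [1, 3, 2, 5, 4, 6]

window_sums = moving_window_1d(series, size=3)                    # sum of each window
window_maxes = moving_window_1d(series, size=3, summarize=max)    # max of each window

print(window_sums)   # [6, 10, 11, 15]
print(window_maxes)  # [3, 5, 5, 6]
```

Summing is closer in spirit to what a convolution does, and taking the maximum is closer to what maximum pooling does.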

For two-dimensional data (like images), it looks like this:

<!--TODO: 2D moving window-->
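
The two-dimensional scan can be sketched the same way. This is only an illustrative sketch under simple assumptions (a square window, no padding, an image stored as a list of rows); `moving_window_2d` and the sample image are hypothetical names and data.

```python
def moving_window_2d(image, size, stride=1, summarize=max):
    """Scan a square window over a 2D list, left to right, top to bottom."""
    rows, cols = len(image), len(image[0])
    output = []
    for top in range(0, rows - size + 1, stride):
        output_row = []
        for left in range(0, cols - size + 1, stride):
            # Collect the pixel values currently under the window.
            patch = [
                image[top + r][left + c]
                for r in range(size)
                for c in range(size)
            ]
            output_row.append(summarize(patch))
        output.append(output_row)
    return output


# Hypothetical 4x4 "image", chosen only for illustration.
image = [
    [0, 1, 0, 0],
    [1, 5, 1, 0],
    [0, 1, 0, 2],
    [0, 0, 2, 9],
]

window_maxes = moving_window_2d(image, size=2)
for row in window_maxes:
    print(row)
# [5, 5, 1]
# [5, 5, 2]
# [1, 2, 9]
```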

A moving window is the right sort of device to use for feature extraction because it makes use of the *position* of the pixels. A computation like the mean of all the pixel values would be less informative because it ignores location.
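
To make the contrast with a global mean concrete, the sketch below compares two made-up images that contain the same bright patch in different corners. The helper names `global_mean` and `window_sums` are hypothetical. The global mean cannot tell the images apart, while the windowed summary can.

```python
def global_mean(image):
    """Mean of all pixel values, ignoring where they are."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)


def window_sums(image, size=2):
    """Sum of each non-overlapping size x size window."""
    rows, cols = len(image), len(image[0])
    return [
        [
            sum(image[top + r][left + c] for r in range(size) for c in range(size))
            for left in range(0, cols - size + 1, size)
        ]
        for top in range(0, rows - size + 1, size)
    ]


# Same 2x2 bright patch, placed in opposite corners.
top_left = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
bottom_right = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

print(global_mean(top_left), global_mean(bottom_right))  # 0.25 0.25
print(window_sums(top_left))      # [[4, 0], [0, 0]]
print(window_sums(bottom_right))  # [[0, 0], [0, 4]]
```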

(In fact, their use of moving windows is one thing that makes convnets useful on other kinds of ordered data, like time series or natural language texts.)

Dense layers also use positional information, but the moving window computation gives convolutional layers an advantage over them. In natural images, information tends to be highly *local*: pixels close together tend to be more strongly related than pixels far apart. Because a window only ever looks at a small neighborhood of pixels, a convolutional layer can concentrate on these local patterns, which makes it much more efficient than a dense layer at feature extraction.
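
One way to get a feel for this efficiency is to count weights. The figures below are illustrative assumptions, not numbers from this lesson: a 128x128 grayscale image, a dense layer with 64 units, and a convolutional layer with 64 filters of size 3x3, with biases ignored. The helper names are hypothetical.

```python
def dense_weight_count(height, width, channels, units):
    """Every unit connects to every input pixel."""
    return height * width * channels * units


def conv_weight_count(kernel_size, in_channels, filters):
    """Each filter has one small kernel that is reused at every window position."""
    return kernel_size * kernel_size * in_channels * filters


dense_weights = dense_weight_count(height=128, width=128, channels=1, units=64)
conv_weights = conv_weight_count(kernel_size=3, in_channels=1, filters=64)

print(dense_weights)  # 1048576
print(conv_weights)   # 576
```

The convolutional layer needs far fewer weights because it only has to describe a local pattern once, and the moving window then applies that pattern everywhere in the image.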

## Stride ##

The distance the window moves at each step is called the **stride**.

A convolutional layer will usually use a stride of 1, so that the windows overlap from step to step.

<!--TODO: convolution stride-->

Typically, a convolutional layer will also use **zero padding** so that the output has the same dimensions as the input. Otherwise, the output would shrink: with a 3x3 kernel, for instance, the image would lose a pixel on each side after the convolution.

<!--TODO: padding -->
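
The effect of padding on the output size can be checked with a little arithmetic. The sketch below uses the standard output-size formula for a window of size `kernel_size` moved with a given `stride` over an input padded by `padding` zeros on each side. The function names are hypothetical, and `same_padding` assumes an odd kernel size and a stride of 1.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Number of window positions along one dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


def same_padding(kernel_size):
    """Zeros to add on each side so a stride-1 output matches the input size.

    Assumes an odd kernel size.
    """
    return (kernel_size - 1) // 2


# 3x3 kernel on a 28-pixel-wide input.
print(conv_output_size(28, kernel_size=3))                           # 26: one pixel lost per side
print(conv_output_size(28, kernel_size=3, padding=same_padding(3)))  # 28: size preserved

# A larger 5x5 kernel loses two pixels per side without padding.
print(conv_output_size(28, kernel_size=5))                           # 24
print(conv_output_size(28, kernel_size=5, padding=same_padding(5)))  # 28
```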

A pooling layer will usually set its stride so that there is no overlap in windows from step to step. In other words, it will set the stride equal to its window size.

<!--TODO: pooling stride-->

Whenever the stride is greater than 1, each dimension of the output will be reduced by roughly that factor. For instance, a stride of 2 will reduce each dimension by a factor of 2, for a total reduction in pixels by a factor of 4.

<!--TODO: dimension reduction-->
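
The reduction can be worked out directly from the shapes. This is a small sketch assuming a square window, no padding, and a stride equal to the window size unless stated otherwise; `pooled_shape` is a hypothetical helper name.

```python
def pooled_shape(height, width, window, stride=None):
    """Output shape of a pooling step with no padding."""
    stride = window if stride is None else stride  # default: no overlap
    out_height = (height - window) // stride + 1
    out_width = (width - window) // stride + 1
    return out_height, out_width


height, width = 8, 8
out_height, out_width = pooled_shape(height, width, window=2)

pixels_before = height * width
pixels_after = out_height * out_width

print((out_height, out_width))        # (4, 4)
print(pixels_before // pixels_after)  # 4
```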

This is how the pooling layer is able to condense features.

## Example ##

Let's observe the effect of using different strides and window sizes.
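
Here is one possible sketch in plain Python rather than a deep learning library. The `max_pool` function and the 6x6 feature map (filled with increasing numbers so the results are easy to read) are made up for this example.

```python
def max_pool(feature_map, window, stride):
    """Maximum pooling over a 2D list with a square window and no padding."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[top + r][left + c]
                for r in range(window)
                for c in range(window)
            )
            for left in range(0, cols - window + 1, stride)
        ]
        for top in range(0, rows - window + 1, stride)
    ]


# Hypothetical 6x6 feature map holding the values 0..35, row by row.
feature_map = [[r * 6 + c for c in range(6)] for r in range(6)]

results = {}
for window, stride in [(2, 1), (2, 2), (3, 3)]:
    pooled = max_pool(feature_map, window, stride)
    results[(window, stride)] = pooled
    print(f"window={window}, stride={stride}: {len(pooled)}x{len(pooled[0])}")
# window=2, stride=1: 5x5
# window=2, stride=2: 3x3
# window=3, stride=3: 2x2
```

With a stride of 1 the windows overlap and the map barely shrinks. Once the stride matches the window size, each dimension shrinks by that factor.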

# Translation Invariance #

The pooling operation gives a convnet a property called **translation invariance**. This just means that it tends not to distinguish features by their *location* in the image.

<!-- TODO: position of features in an image -->

A convnet with translation invariance will treat these the same.

Why is that? Watch what happens when we repeatedly apply maximum pooling to the following feature maps.

<!-- TODO: repeated maxpool-->

The two circles differed only by their position. Maximum pooling caused them to lose that information, which means the network would treat the original images as being the same.

This is a nice property to have for a convnet used as a classifier, and it is why pooling layers are often used in such networks. For a problem where the location of a feature matters, such as finding where an object sits in an image, it might not be appropriate.

## Example ##

Let's see what happens when we repeatedly apply maximum pooling to a feature map.
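
The following is a minimal sketch of that experiment in plain Python. It assumes 2x2 maximum pooling with a stride of 2 and uses two made-up 8x8 feature maps that each contain a single activation at a different position. The names `max_pool_2x2` and `single_activation` are hypothetical.

```python
def max_pool_2x2(feature_map):
    """2x2 maximum pooling with stride 2 (assumes even height and width)."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


def single_activation(size, row, col):
    """A size x size map of zeros with one active pixel."""
    grid = [[0] * size for _ in range(size)]
    grid[row][col] = 1
    return grid


# The same feature at two nearby positions.
map_a = single_activation(8, row=1, col=1)
map_b = single_activation(8, row=2, col=3)

steps_until_equal = 0
while map_a != map_b:
    map_a, map_b = max_pool_2x2(map_a), max_pool_2x2(map_b)
    steps_until_equal += 1

print(steps_until_equal)  # 2
print(map_a)              # [[1, 0], [0, 0]]
```

After one pooling step the two maps still differ, but after the second they are identical, so later layers can no longer tell the two positions apart. Features that start farther apart would need more pooling steps before they merge.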

# Conclusion #

The properties of convolution and pooling we looked at in this lesson mean that the convnet will tend to detect features at a small scale (like the texture of fur or the shape of an ear), but ignore features at a large scale. For image classification, this is usually a good choice.