# Introduction #

In the last lesson we saw that a convnet performs feature extraction through a sequence of three operations: filter, detect, and condense. In this lesson, we'll describe these operations in terms of the **activations** and **weights** you learned about in *Introduction to Deep Learning*. This will also serve as an introduction to **convolution**, **ReLU**, and **pooling**, which we will continue in Lesson 4.

# Filter #

Recall that a **convolutional layer** will typically carry out the filtering step. The **weights** a network learns during training are primarily contained in its convolutional layers. We can represent them as small arrays called **kernels**.

<!--TODO: a kernel-->

A convolutional layer will usually contain many kernels. They are what determine the kind of filtering that occurs.

The pattern of numbers in the array defines the effect of a kernel. You can think about a kernel as a kind of polarized lens, letting through only a certain pattern of information. The kernel above will filter for vertical lines.
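To make the "lens" idea concrete, here is a minimal NumPy sketch of a kernel being slid over a tiny image. The kernel values and the helper `filter2d` are hypothetical, chosen only for illustration; they are not the kernel from the figure or the operation Keras uses internally.

```python
import numpy as np

def filter2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the image patch currently under the kernel
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical 3x3 kernel that responds to vertical lines
vertical_kernel = np.array([[-1, 2, -1],
                            [-1, 2, -1],
                            [-1, 2, -1]])

# Toy 5x5 image containing one vertical line (middle column)
vertical_line = np.zeros((5, 5))
vertical_line[:, 2] = 1.0

# Toy 5x5 image containing one horizontal line (middle row)
horizontal_line = np.zeros((5, 5))
horizontal_line[2, :] = 1.0

print(filter2d(vertical_line, vertical_kernel))    # strong response on the line
print(filter2d(horizontal_line, vertical_kernel))  # all zeros: the line is filtered out
```

The vertical line produces a large positive value wherever the kernel's center column sits on it, while the horizontal line produces no response at all.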

We call the **activations** in the network **feature maps**. They are the result of applying a filter to an image; they are the visual features the network extracts.

The first feature maps are the color channels of the image. A grayscale image has one channel.

<!--TODO: grayscale channel-->

A color image will have three channels.

<!--TODO: color channel -->

As more extraction operations are applied, the feature maps become increasingly refined.

<!--TODO: feature maps simple to complex-->

Here are some examples of kernels applied to feature maps:

<!--TODO: kernels applied to feature maps-->

## Channels: The Depth Dimension ##

*(This section makes some terminology we'll be using a bit more precise. It's not essential to anything that follows, so feel free to skip it if you want. The important thing is just to understand that a convolutional layer contains many kernels.)*

When an image first enters a network, it exists as a set of channels. We usually think about the channels as being the *depth* dimension, with the width and height as the two spatial dimensions. In the language of TensorFlow, an image is a tensor with shape `[height, width, channels]`.
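As a quick illustration of the `[height, width, channels]` layout, the sketch below builds two placeholder arrays with NumPy. The image size is an arbitrary assumption, and NumPy arrays stand in for TensorFlow tensors here.

```python
import numpy as np

# Hypothetical image size, chosen only for illustration
height, width = 4, 6

grayscale = np.zeros((height, width, 1))  # one channel
color = np.zeros((height, width, 3))      # three channels, typically red, green, blue

print(grayscale.shape)  # (4, 6, 1)
print(color.shape)      # (4, 6, 3)

# Slicing along the depth dimension gives one 2D channel
first_channel = color[:, :, 0]
print(first_channel.shape)  # (4, 6)
```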

<!--TODO: shape of an image-->

A convolutional layer will apply a kernel to each channel in the input, and the collection of these kernels is what we call a **filter**. One filter produces one feature map.

<!--TODO: filter to feature map-->

A convolutional layer may produce many feature maps. So, if a convolutional layer producing 16 feature maps is applied to an image with 3 channels, it will contain 16*3=48 kernels.
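The count can be checked with a small sketch. The array below mimics the layout Keras uses for the weights of a `Conv2D` layer, `[kernel_height, kernel_width, in_channels, filters]`; the 3x3 kernel size is an assumption made for the example.

```python
import numpy as np

in_channels = 3       # channels in the input image
n_filters = 16        # feature maps the layer produces
kernel_size = (3, 3)  # assumed kernel size, for illustration only

# One 2D kernel per (input channel, filter) pair
weights = np.zeros(kernel_size + (in_channels, n_filters))

n_kernels = weights.shape[2] * weights.shape[3]
print(weights.shape)  # (3, 3, 3, 16)
print(n_kernels)      # 48

# A single filter is the stack of kernels that produces one feature map
one_filter = weights[:, :, :, 0]
print(one_filter.shape)  # (3, 3, 3)
```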

# Detect #

After filtering, the feature maps pass through the activation function. The **ReLU activation** has a graph like this:

<!--TODO: ReLU -->

(*ReLU* stands for *Rectified Linear Unit*.)

You could think about the activation function as normalizing the pixel values according to some measure of importance. The ReLU function says that negative values are not important and so sets them to 0. ("Everything unimportant is equally unimportant.")
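In other words, ReLU computes `max(0, x)` for every value. A minimal NumPy sketch, using made-up feature map values:

```python
import numpy as np

def relu(x):
    """Rectified linear unit: keep positive values, set negative values to 0."""
    return np.maximum(x, 0)

# Hypothetical feature map values, for illustration only
feature_map = np.array([[-3.0,  6.0, -3.0],
                        [ 0.5, -2.0,  4.0]])

print(relu(feature_map).tolist())  # [[0.0, 6.0, 0.0], [0.5, 0.0, 4.0]]
```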

Like other activation functions used in neural networks, the ReLU function is *nonlinear*. Essentially, this means that the total effect of all the layers in the network is different from what we would get by just adding their effects together. Without a nonlinearity, a stack of layers would collapse into the equivalent of a network with only one layer.
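A small numerical sketch can illustrate this. Below, two hypothetical "layers" are plain matrix multiplications (the weights `W1`, `W2` and the input `x` are made up for the example). Without an activation between them they are equivalent to a single layer; with ReLU in between they are not.

```python
import numpy as np

# Two hypothetical layers with no activation function: just matrix multiplies
W1 = np.array([[1.0, -1.0],
               [2.0,  0.0]])
W2 = np.array([[1.0,  1.0],
               [0.0, -1.0]])
x = np.array([1.0, 3.0])

# Without an activation, the two layers collapse into the single layer W2 @ W1
two_layers = W2 @ (W1 @ x)
one_layer = (W2 @ W1) @ x
print(np.allclose(two_layers, one_layer))  # True

# With ReLU between the layers, the stack no longer equals that single layer
with_relu = W2 @ np.maximum(W1 @ x, 0)
print(np.allclose(with_relu, one_layer))  # False
```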

The ReLU function ensures that only pixels with positive activation remain in the feature map. This is desirable because we don't want any negative activations destroying the features we detect deeper in the network, which is what would happen if we simply added them together.

Here is ReLU applied to some feature maps. Notice how it succeeds at isolating the feature of interest.

# Condense #

Notice that after applying the ReLU function the feature map ends up with a lot of "dead space," that is, large areas containing only 0's.

<!--TODO: feature map with 0s-->

Carrying these 0 activations through the entire network would waste a lot of memory. Instead, we would like to **condense** the feature map to retain only the actual feature.

This is what the pooling operation does, in particular, the **maximum pooling** operation. Max pooling will take a block of activations in the original feature map and replace them with the maximum activation in that block.
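Here is a minimal NumPy sketch of max pooling over non-overlapping blocks. The helper `max_pool2d`, the 2x2 block size, and the feature map values are assumptions made for illustration, not the Keras implementation.

```python
import numpy as np

def max_pool2d(feature_map, size=2):
    """Replace each non-overlapping size x size block with its maximum."""
    h, w = feature_map.shape
    # Trim the edges so the map divides evenly into blocks
    h, w = h - h % size, w - w % size
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

# Hypothetical feature map after ReLU: mostly zeros, a few activations
feature_map = np.array([[0, 0, 0, 0],
                        [0, 6, 0, 0],
                        [0, 0, 0, 3],
                        [0, 0, 1, 0]])

print(max_pool2d(feature_map).tolist())  # [[6, 0], [0, 3]]
```

The 4x4 map shrinks to 2x2, yet the strongest activation in each block survives.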

<!--TODO: max pooling-->

Altogether, the extraction operation looks like this:

# Example - Apply Operations #

Let's apply these operations to an image to get a feel for what they do.

Here is the image we'll use for this example:

In [None]:
# img =

We'll start by importing the extraction operations from the Keras backend. These are the primitive functions the layers of the network apply to their inputs.

In [2]:
from tensorflow.keras.backend import conv2d, relu, pool2d

For the filtering step, we'll define a kernel and then apply it to the image by convolution. The kernel in this case is an "edge detection" kernel.

In [None]:
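import tensorflow as tf

# Edge-detection kernel; conv2d needs floating-point weights
kernel = tf.constant([[-1, -1, -1],
                      [-1,  8, -1],
                      [-1, -1, -1]], dtype=tf.float32)
# conv2d expects a kernel of shape [height, width, in_channels, out_channels]
kernel = tf.reshape(kernel, [3, 3, 1, 1])

# Assumes img is a float32 grayscale tensor of shape [1, height, width, 1]
img = conv2d(img, kernel=kernel)

display(img)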

Next is the detection step with the ReLU function.

In [None]:
img = relu(img)

display(img)

And last is condensing with maximum pooling.

In [None]:
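# pool2d requires a pool size; 2x2 blocks with stride 2 halve each spatial dimension
img = pool2d(img, pool_size=(2, 2), strides=(2, 2), pool_mode='max')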

display(img)

# Example - Visualize Activations #

It can be instructive to look at the activations an image produces in a network throughout its layers. We've included a function in a utility script that will plot them for you. Let's look at some activations in the network from Lesson 2 (using the same image as before).

In [None]:
from visiontools import plot_activations

plot_activations(img, model=convolutional_classifier)

You can see how the features extracted become more and more refined as the activations flow deeper into the network.

# Conclusion #