<!-- TITLE: Filter, Detect, Condense -->

# Introduction #

In the last lesson we saw that a convnet performs feature extraction through a sequence of three operations -- filter, detect, condense. In this lesson, we'll describe these operations in terms of **activations** and **weights**, which you learned about in *Introduction to Deep Learning*. This will also be an introduction to **convolution**, **ReLU**, and **pooling**, which we will continue in Lesson 4.

# Filter #

Recall that a **convolutional layer** carries out the filtering step. The **weights** a convnet learns during training are primarily contained in its convolutional layers. These weights we call **kernels**. We can represent them as small arrays:

<figure>
<!-- <img src="./images/3-kernel.png" width="600" alt="A 2x2 kernel."> -->
<img src="https://i.imgur.com/Hxboo61.png" width="150" alt="A 2x2 kernel.">
</figure>

A convolutional layer will usually contain many kernels -- often hundreds or thousands. They are what determine the kind of filtering that occurs. They do this through the pattern of numbers they contain. You can think about a kernel as a kind of polarized lens, letting through only a certain pattern of information. 

<figure>
<!-- <img src="./images/3-kernel-lens.png" width="400" alt="A kernel acts as a kind of lens."> -->
<img src="https://i.imgur.com/j3lk26U.png" width="250" alt="A kernel acts as a kind of lens.">
</figure>

The **activations** in the network we call **feature maps**. They are what result when we apply a filter to an image; they are the visual features the network extracts. Here are a few kernels pictured with the feature maps they produced when applied to an image.

<figure>
<!-- <img src="./images/3-kernels-and-maps.png" width="600" alt="The channels of a color image."> -->
<img src="https://i.imgur.com/JxBwchH.png" width="800" alt="An embossing kernel and the feature map it produces.">
</figure>
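
To make the filtering step concrete, here is a minimal sketch in plain Python of how a single kernel could produce a feature map: slide the kernel over the image and, at each position, take the weighted sum of the pixels beneath it. The helper `apply_kernel`, the toy image, and the kernel values are illustrative assumptions rather than part of this lesson's code, and the sketch leaves out the padding, strides, and channels that a real convolutional layer handles.

```python
def apply_kernel(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    feature_map = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            # Weighted sum of the image patch currently under the kernel.
            total = sum(
                kernel[a][b] * image[i + a][j + b]
                for a in range(k_rows)
                for b in range(k_cols)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Toy 4x5 "image" with a vertical edge: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

# Hypothetical kernel that responds to left-to-right increases in brightness.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = apply_kernel(image, kernel)
print(feature_map)  # [[3, 3, 0], [3, 3, 0]]
```

The feature map is large wherever the window covers the dark-to-bright edge and 0 over the uniformly bright region, which is the sense in which the kernel "lets through" only one pattern.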

# Detect #

After filtering, the feature maps pass through the activation function. The **ReLU activation** has a graph like this:

<figure>
<!-- <img src="./images/3-channels-stack.png" width="300" alt="Channels form the depth dimension."> -->
<img src="https://i.imgur.com/3Ud5xhK.png" width="300" alt="Graph of the ReLU activation function.">
</figure>

(*ReLU* stands for *Rectified Linear Unit*.)

You could think about the activation function as normalizing the pixel values according to some measure of importance. The ReLU function says that negative values are not important and so sets them to 0. ("Everything unimportant is equally unimportant.")
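
As a small illustration of that rule, the sketch below applies ReLU element by element to a made-up feature map. The values and the helper name `relu` are assumptions chosen for the example.

```python
def relu(x):
    """Rectified Linear Unit: keep positive values, replace negatives with 0."""
    return max(0.0, x)


# Hypothetical 2x2 feature map containing positive and negative activations.
feature_map = [
    [-3.0, 1.5],
    [0.0, -0.5],
]

detected = [[relu(value) for value in row] for row in feature_map]
print(detected)  # [[0.0, 1.5], [0.0, 0.0]]
```

Both negative entries end up at exactly 0, regardless of how negative they were.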

Like other activation functions used in neural networks, the ReLU function is *nonlinear*. Essentially, this means that the total effect of all the layers in the network is different from what you would get by just adding their effects together. Without a nonlinearity, the stacked layers would collapse into something no more expressive than a single layer.
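
A one-dimensional sketch may help here. It assumes each "layer" is just multiplication by a single weight (the names `w1`, `w2` and their values are made up for illustration). Two such layers stacked directly behave exactly like one layer, while putting ReLU between them gives a function that no single weight can reproduce.

```python
def relu(x):
    return max(0.0, x)


w1, w2 = 2.0, 3.0  # hypothetical weights of two one-unit layers


def two_linear_layers(x):
    # Layer 2 applied to layer 1, with no activation in between.
    return w2 * (w1 * x)


def single_layer(x):
    # One layer whose weight is the product of the two.
    return (w2 * w1) * x


def with_relu(x):
    # The same two layers with ReLU applied after the first.
    return w2 * relu(w1 * x)


inputs = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Without an activation, the stack is indistinguishable from one layer.
print(all(two_linear_layers(x) == single_layer(x) for x in inputs))  # True

# With ReLU, the response to -1 is not the negative of the response to 1,
# so the stack is no longer a single linear layer.
print(with_relu(1.0), with_relu(-1.0))  # 6.0 0.0
```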

The ReLU function ensures that only pixels with positive activation remain in the feature map. This is desirable because we don't want any negative activations destroying the features we detect deeper in the network, which is what would happen if we simply added them together.

Here is ReLU applied to the feature maps above. Notice how it succeeds at isolating the feature of interest.

<figure>
<!-- <img src="./images/3-relu-and-maps.png" width="800" alt="ReLU applied to feature maps."> -->
<img src="https://i.imgur.com/dKtwzPY.png" width="800" alt="ReLU applied to feature maps.">
</figure>

# Condense #

Notice that after applying the ReLU function the feature map ends up with a lot of "dead space," that is, large areas containing only 0's (the black areas in the image). Carrying these 0 activations through the entire network would increase the size of the model without adding much useful information. Instead, we would like to **condense** the feature map to retain only the information of interest -- the feature itself.

This is what the pooling operation known as **maximum pooling** does. Max pooling will take a block of activations in the original feature map and replace them with the maximum activation in that block.

<figure>
<!-- <img src="./images/3-max-pooling.png" width="600" alt="Maximum pooling replaces a block with the maximum value in that block."> -->
<img src="https://i.imgur.com/5V5z7lP.png" width="400" alt="Maximum pooling replaces a block with the maximum value in that block.">
</figure>
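
The block-wise maximum can be sketched in a few lines of plain Python. This is an illustrative version only: the helper `max_pool` and the sample values are assumptions, it uses non-overlapping blocks, and it ignores any leftover rows or columns that do not fill a whole block.

```python
def max_pool(feature_map, size=2):
    """Replace each non-overlapping `size` x `size` block with its maximum value."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[i + a][j + b]
                for a in range(size)
                for b in range(size)
            )
            for j in range(0, cols - size + 1, size)
        ]
        for i in range(0, rows - size + 1, size)
    ]


# Hypothetical 4x4 feature map after ReLU: mostly 0's with a few activations.
feature_map = [
    [0, 0, 2, 0],
    [0, 5, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 3],
]

pooled = max_pool(feature_map)
print(pooled)  # [[5, 2], [1, 3]]
```

The 4x4 map shrinks to 2x2, and every surviving value is the strongest activation from its block, so most of the 0's disappear.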

You can see that pooling reduces the dimensions of the image. In this sense, pooling is a kind of downsampling applied to the feature map. When applied after the ReLU activation, it has the effect of "intensifying" the feature.

<figure>
<!-- <img src="./images/3-pool-and-maps.png" width="800" alt="Maximum pooling replaces a block with the maximum value in that block."> -->
<img src="https://i.imgur.com/rl0Lejy.png" width="800" alt="Maximum pooling applied to feature maps.">
</figure>
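
To see how much pooling shrinks a feature map, the sketch below computes the output length along one dimension. It assumes the stride equals the pool size (the default for Keras `MaxPool2D`) and uses the usual TensorFlow conventions for `'same'` and `'valid'` padding; the helper name `pooled_size` and the example sizes are made up for illustration.

```python
import math


def pooled_size(n, pool_size=2, padding="same"):
    """Output length along one dimension after pooling an input of length n."""
    stride = pool_size  # assumption: stride equals the pool size
    if padding == "same":
        # 'same' pads the edge so that no input position is dropped.
        return math.ceil(n / stride)
    # 'valid' keeps only the windows that fit entirely inside the input.
    return (n - pool_size) // stride + 1


print(pooled_size(400))                   # 200
print(pooled_size(401, padding="same"))   # 201
print(pooled_size(401, padding="valid"))  # 200
```

With a pool size of 2, each dimension is roughly halved, so the feature map keeps about a quarter of its original activations.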

# Example - Apply Operations #

Let's apply these operations to an image to get a feel for what they do.

Here is the image we'll use for this example:

In [None]:
import tensorflow as tf
import matplotlib.pyplot as plt
plt.rc('figure', autolayout=True)
plt.rc('axes', labelweight='bold', labelsize='large',
       titleweight='bold', titlesize=18, titlepad=10)
plt.rc('image', cmap='magma')

img_path = '/kaggle/input/computer-vision-resources/car_feature.jpg'
img = tf.io.read_file(img_path)
img = tf.io.decode_jpeg(img)

plt.figure(figsize=(6, 6))
plt.imshow(tf.squeeze(img), cmap='gray')
plt.axis('off')
plt.show();

We'll start by creating a simple model that performs the feature extraction. Then we'll pull out the layers so that we can apply them one by one.

In [None]:
from tensorflow import keras
from tensorflow.keras import layers

# Create Layers
model = keras.Sequential([
    layers.Conv2D(filters=1,
                  kernel_size=3,
                  padding='same',
                  use_bias=False,
                  input_shape=img.shape),
    layers.Activation('relu'),
    layers.MaxPool2D(pool_size=2,
                     padding='same'),
])

conv2d, relu, maxpool2d = model.layers

Notice that this is essentially the convolutional block you learned about in Lesson 2.

Now for the filtering step, we'll define a kernel and then apply it with the convolution. The kernel in this case is an "edge detection" kernel.

In [None]:
import visiontools

krn = tf.constant([
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1],
])

visiontools.show_kernel(krn)
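
One way to see why this kernel picks out edges: its weights sum to 0, so any patch of uniform brightness gives a response of 0, while a patch whose center differs from its surroundings gives a nonzero response. The plain-Python sketch below checks this on two made-up 3x3 patches; the helper `response` and the patch values are assumptions for illustration.

```python
# The same edge-detection weights as `krn`, written as a plain nested list.
kernel = [
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1],
]


def response(patch):
    """Weighted sum of a 3x3 patch using the edge-detection kernel."""
    return sum(
        k * p
        for kernel_row, patch_row in zip(kernel, patch)
        for k, p in zip(kernel_row, patch_row)
    )


# Uniformly bright patch: no edge present.
flat_patch = [
    [1, 1, 1],
    [1, 1, 1],
    [1, 1, 1],
]

# Patch centered on the corner of a bright region.
corner_patch = [
    [0, 0, 0],
    [0, 1, 1],
    [0, 1, 1],
]

print(response(flat_patch))    # 0
print(response(corner_patch))  # 5
```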

Remember that a kernel is a set of weights for a convolutional layer. We'll give the `conv2d` layer we've created these weights with its `set_weights` method.

In [None]:
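# Reformat for batch compatibility.
image = tf.image.convert_image_dtype(img, dtype=tf.float32)
image = tf.expand_dims(image, axis=0)  # add a batch dimension
# Conv2D kernels have shape (height, width, input channels, filters).
kernel = tf.reshape(krn, [*krn.shape, 1, 1])
kernel = tf.cast(kernel, dtype=tf.float32)  # match the layer's float weights

# Assign the kernel to the layer. `set_weights` takes a list of weight
# arrays; with `use_bias=False`, the kernel is the only entry.
conv2d.set_weights([kernel])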

# You can call the layer on the image just like a function.
image_filter = conv2d(image)

plt.figure(figsize=(6, 6))
plt.imshow(tf.squeeze(image_filter))
plt.axis('off')
plt.show();

Next is the detection step with the ReLU function.

In [None]:
image_detect = relu(image_filter)

plt.figure(figsize=(6, 6))
plt.imshow(tf.squeeze(image_detect))
plt.axis('off')
plt.show();

Like `relu`, the `maxpool2d` layer has no trainable weights, so there is nothing to set; we can simply call each layer on the feature map.

And last is condensing with maximum pooling.

In [None]:
image_condense = maxpool2d(image_detect)

plt.figure(figsize=(6, 6))
plt.imshow(tf.squeeze(image_condense))
plt.axis('off')
plt.show();

Let's look at the whole process start-to-finish.

In [None]:
visiontools.show_extraction(img, krn)

# Conclusion #

In this lesson, we saw how **kernels** represent the weights in a convnet, while **feature maps** represent activations. We saw how a convolutional network can engineer the complex features it needs to solve a classification problem through the application of convolution, ReLU, and pooling. In the exercises, you'll explore the operations more on your own!