<!--TITLE: Convolution and ReLU-->

# Introduction #

In the last lesson, we saw that a convolutional classifier has two parts: a convolutional **base** and a **head** of dense layers. We learned that the job of the base is to extract visual features from an image, which the head would then use to classify the image.

Over the next few lessons, we're going to learn about the two most important types of layers that you'll usually find in a convolutional image classifier. These are: the **convolutional layer** with **ReLU activation**, and the **maximum pooling layer**. In Lesson 5, you'll learn how these layers are usually composed into **blocks** that perform the feature extraction.

This lesson is about the convolutional layer with its ReLU activation function.

# Feature Extraction #

The **feature extraction** performed by the base consists of **three basic operations** common to convolutional image classifiers.
1. **Filter** an image for a particular feature (convolution)
2. **Detect** that feature within the filtered image (ReLU)
3. **Condense** to isolate the feature (maximum pooling)

Here is what this process looks like. You can see how these three operations are able to isolate some particular characteristic of the original image.

<figure>
<!-- <img src="./images/2-show-extraction.png" width="1000" alt="An example of the feature extraction process."> -->
<img src="https://i.imgur.com/GDKuk6m.png" width="1000" alt="An example of the feature extraction process.">
</figure>

Typically, the network will perform several extractions in parallel on a single image. By the time the data reaches the classifier, a network might be producing over 1000 features! It is extracted features like these that the head of dense layers uses to predict a class.

# Filter with Convolution #

A convolutional layer carries out the filtering step. Typically, you would define a convolutional layer within a model like this:

In [None]:
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(filters=64, kernel_size=3),  # activation is None
    # More layers follow
])

The **weights** a convnet learns during training are primarily contained in its convolutional layers. These weights we call **kernels**. We can represent them as small arrays:

<figure>
<!-- <img src="./images/3-kernel.png" width="150" alt="A 3x3 kernel."> -->
<img src="https://i.imgur.com/uJfD9r9.png" width="150" alt="A 3x3 kernel.">
</figure>

A convolutional layer will usually contain many kernels -- often hundreds or thousands. They are what determine the kinds of features produced. You can think about a kernel as a kind of polarized lens, letting through only a certain pattern of information. More specifically, a kernel produces a "weighted sum" of a pixel and its neighbors. 
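
To make the "weighted sum" concrete, here is a minimal sketch in plain NumPy rather than Keras. The patch and kernel values are made up for illustration; the point is only that each pixel in the neighborhood is multiplied by the matching kernel weight and the results are added together.

```python
import numpy as np

# A hypothetical 3x3 neighborhood of pixel values; the center pixel is 5.
patch = np.array([
    [1, 2, 1],
    [0, 5, 0],
    [1, 2, 1],
], dtype=np.float32)

# A hypothetical 3x3 kernel: one weight for each pixel in the neighborhood.
example_kernel = np.array([
    [0, 1, 0],
    [1, 2, 1],
    [0, 1, 0],
], dtype=np.float32)

# Multiply each pixel by its weight, then add everything up.
weighted_sum = float(np.sum(patch * example_kernel))
print(weighted_sum)  # 14.0
```

A convolution repeats this calculation at every position in the image, producing one output value per position.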

<figure>
<!-- <img src="./images/3-kernel-lens.png" width="400" alt="A kernel acts as a kind of lens."> -->
<img src="https://i.imgur.com/j3lk26U.png" width="250" alt="A kernel acts as a kind of lens.">
</figure>

The **activations** in the network we call **feature maps**. They are what result when we apply a filter to an image; they are the visual features the network extracts. Here are a few kernels pictured with the feature maps they produced when applied to an image.

<figure>
<!-- <img src="./images/3-kernels-and-maps.png" width="600" alt="The channels of a color image."> -->
<img src="https://i.imgur.com/JxBwchH.png" width="800" alt="An embossing kernel and the feature map it produces.">
</figure>

From the pattern of numbers in the kernel, you can tell what kind of feature maps it will produce. Generally, what a convolution accentuates in its inputs will match the shape of the *positive* numbers in the kernel. The left and middle kernels above will both filter for horizontal shapes.
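
As a rough illustration of this idea, the sketch below slides two small kernels over a toy image that contains a single horizontal line. Everything here is an assumption made for the example: `filter2d` is a hypothetical helper written in plain NumPy (it is not part of Keras or TensorFlow), and the image and kernels are invented.

```python
import numpy as np

def filter2d(image, kernel):
    """Slide `kernel` over `image` (no padding), taking a weighted sum at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 5x5 toy image containing one bright horizontal line.
toy_image = np.zeros((5, 5), dtype=np.float32)
toy_image[2, :] = 1.0

# Positive weights arranged in a horizontal row ...
horizontal = np.array([
    [-1, -1, -1],
    [ 2,  2,  2],
    [-1, -1, -1],
], dtype=np.float32)
# ... and the same weights arranged in a vertical column.
vertical = horizontal.T

h_map = filter2d(toy_image, horizontal)
v_map = filter2d(toy_image, vertical)
print(h_map)  # strongest response (6) along the middle row, where the line is
print(v_map)  # all zeros: the vertical kernel does not respond to this line
```

The kernel whose positive weights line up with the horizontal line gives a strong positive response there, while the vertical kernel gives none.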

# Detect with ReLU #

After filtering, the feature maps pass through the activation function. The **ReLU function** has a graph like this:

<figure>
<!-- <img src="./images/3-channels-stack.png" width="300" alt="Channels form the depth dimension."> -->
<img src="https://i.imgur.com/3Ud5xhK.png" width="300" alt="Graph of the ReLU activation function.">
</figure>

(*ReLU* stands for *Rectified Linear Unit*.)

Typically, you'll include it as the activation function of the convolutional layers.

In [None]:
model = keras.Sequential([
    layers.Conv2D(filters=64, kernel_size=3, activation='relu'),
    # More layers follow
])

You could think of the activation function as scoring pixel values according to some measure of importance. The ReLU function says that negative values are not important and so sets them to 0. ("Everything unimportant is equally unimportant.")
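
In code, ReLU amounts to taking the elementwise maximum of each value and 0. Here is a minimal NumPy sketch; the `relu` helper and the feature map values are made up for illustration.

```python
import numpy as np

def relu(x):
    """Keep positive values unchanged and set negative values to 0."""
    return np.maximum(x, 0)

# A hypothetical feature map with a mix of positive and negative responses.
feature_map = np.array([
    [-3.0, 6.0, -3.0],
    [ 0.5, -1.0, 2.0],
])

detected = relu(feature_map)
print(detected)
# [[0.  6.  0. ]
#  [0.5 0.  2. ]]
```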

Like other activation functions used in neural networks, the ReLU function is *nonlinear*. Essentially, this means that the total effect of all the layers in the network is different from what we would get by just adding the effects together -- which would be no different from what a single layer could do on its own.
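
A small numerical sketch may help here. It uses plain matrix multiplication as a stand-in for layers, with hypothetical weights `W1` and `W2` and input `x` chosen by hand. Without an activation, two layers collapse into a single layer; with ReLU in between, they do not.

```python
import numpy as np

# Hypothetical weights for two stacked layers and a hypothetical input.
W1 = np.array([[1.0, 2.0],
               [3.0, 1.0]])
W2 = np.array([[1.0, 1.0]])
x = np.array([1.0, -1.0])

# No activation: applying W1 then W2 is the same as one layer with weights W2 @ W1.
two_linear = W2 @ (W1 @ x)
one_linear = (W2 @ W1) @ x
print(two_linear, one_linear)  # [1.] [1.]

# ReLU between the layers: the result no longer matches any such single layer.
with_relu = W2 @ np.maximum(W1 @ x, 0)
print(with_relu)  # [2.]
```

Here the first layer produces `[-1, 2]`. ReLU zeroes the negative entry before the second layer sees it, which is exactly what a single combined linear layer cannot do.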

The ReLU function ensures that only pixels with positive activation remain in the feature map. This is desirable because we don't want any negative activations destroying the features we detect deeper in the network, which is what would happen if we simply added them together.

Here is ReLU applied to the feature maps above. Notice how it succeeds at isolating the feature of interest.

<figure>
<!-- <img src="./images/3-relu-and-maps.png" width="800" alt="ReLU applied to feature maps."> -->
<img src="https://i.imgur.com/dKtwzPY.png" width="800" alt="ReLU applied to feature maps.">
</figure>


# Example - Apply Convolution and ReLU #

Let's apply these operations to an image to get a feel for what they do.

Here is the image we'll use for this example:

In [None]:
#$HIDE_INPUT$
import tensorflow as tf
import matplotlib.pyplot as plt
plt.rc('figure', autolayout=True)
plt.rc('axes', labelweight='bold', labelsize='large',
       titleweight='bold', titlesize=18, titlepad=10)
plt.rc('image', cmap='magma')

image_path = '/kaggle/input/computer-vision-resources/car_feature.jpg'
image = tf.io.read_file(image_path)
image = tf.io.decode_jpeg(image)

plt.figure(figsize=(6, 6))
plt.imshow(tf.squeeze(image), cmap='gray')
plt.axis('off')
plt.show();

For the filtering step, we'll define a kernel and then apply it with the convolution. The kernel in this case is an "edge detection" kernel. We'll define it as an array just like in NumPy.

In [None]:
import tensorflow as tf
import visiontools

kernel = tf.constant([
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1],
])

plt.figure(figsize=(3, 3))
visiontools.show_kernel(kernel)

TensorFlow includes many common operations performed by neural networks in its `tf.nn` [module](https://www.tensorflow.org/api_docs/python/tf/nn). This is what we'll use to apply convolution and ReLU for this example, but remember that when you're building models, you'll use layers in Keras as usual.

This next hidden cell does some reformatting to make things compatible with TensorFlow. The details aren't important for this example.

In [None]:
#$HIDE$
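# Reformat for batch compatibility.
image = tf.image.convert_image_dtype(image, dtype=tf.float32)
image = tf.expand_dims(image, axis=0)
kernel = tf.reshape(kernel, [*kernel.shape, 1, 1])
# tf.nn.conv2d needs the filter to have the same dtype as the image.
kernel = tf.cast(kernel, dtype=image.dtype)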
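
If you are curious what the reformatting accomplishes, the sketch below mirrors the shape changes in plain NumPy. By default `tf.nn.conv2d` expects its input laid out as `[batch, height, width, channels]` and its filters as `[height, width, in_channels, out_channels]`, with both sharing one dtype. The array sizes and the names `toy_image` and `toy_kernel` are assumptions made for this illustration.

```python
import numpy as np

# Hypothetical stand-in for a decoded grayscale JPEG: height x width x 1 channel, uint8.
toy_image = np.zeros((4, 5, 1), dtype=np.uint8)
toy_kernel = np.array([
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1],
])

# Scale integer pixel values into floats in the range [0, 1].
toy_image = toy_image.astype(np.float32) / 255.0
# Add a leading batch dimension: [batch, height, width, channels].
toy_image = np.expand_dims(toy_image, axis=0)
# Add input-channel and output-channel dimensions, and match the image dtype.
toy_kernel = toy_kernel.reshape(*toy_kernel.shape, 1, 1).astype(toy_image.dtype)

print(toy_image.shape)   # (1, 4, 5, 1)
print(toy_kernel.shape)  # (3, 3, 1, 1)
```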

Now let's apply our kernel and see what happens!

In [None]:
image_filter = tf.nn.conv2d(
    input=image,
    filters=kernel,
    # we'll talk about these two in the next lesson!
    strides=1,
    padding='SAME'
)

plt.figure(figsize=(6, 6))
plt.imshow(tf.squeeze(image_filter))
plt.axis('off')
plt.show();
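
To see why this kernel picks out edges, it can help to evaluate it by hand on a few small patches. The sketch below uses plain NumPy rather than TensorFlow, and the patch values are invented for illustration. Because the kernel's weights sum to zero, a patch of uniform brightness produces a response of 0, while a patch that straddles a boundary does not.

```python
import numpy as np

edge_kernel = np.array([
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1],
], dtype=np.float32)

# Uniform brightness: no edge anywhere in the patch.
flat_patch = np.full((3, 3), 0.5, dtype=np.float32)

# A vertical boundary, with the center pixel on the bright side.
bright_side = np.array([
    [0, 1, 1],
    [0, 1, 1],
    [0, 1, 1],
], dtype=np.float32)

# The same kind of boundary, with the center pixel on the dark side.
dark_side = np.array([
    [0, 0, 1],
    [0, 0, 1],
    [0, 0, 1],
], dtype=np.float32)

flat_response = float(np.sum(flat_patch * edge_kernel))
bright_response = float(np.sum(bright_side * edge_kernel))
dark_response = float(np.sum(dark_side * edge_kernel))
print(flat_response, bright_response, dark_response)  # 0.0 3.0 -3.0
```

The negative response on the dark side of the boundary is the kind of value the ReLU step sets to 0.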

Next is the detection step with the ReLU function. This function is much simpler than the convolution, as it doesn't have any parameters to set.

In [None]:
image_detect = tf.nn.relu(image_filter)

plt.figure(figsize=(6, 6))
plt.imshow(tf.squeeze(image_detect))
plt.axis('off')
plt.show();

**TODO** Discussion

# Conclusion #

**TODO**