# Introduction #

In these exercises, you'll explore the operations a couple of popular convnet architectures use for feature extraction, learn how convnets can capture large-scale visual features by stacking layers, and finally see how convolution can be applied to one-dimensional data, in this case a time series.

Run the cell below to set everything up.

In [None]:
# Setup feedback system
from learntools.core import binder
binder.bind(globals())
from learntools.computer_vision.ex4 import *

from cv_prelude import *

# Exploring Feature Extraction #

In the tutorial, we looked at the operations VGG16 uses to perform feature extraction. Now let's take a look at the parameters used in a couple of more advanced models: [ResNet50](https://keras.io/api/applications/resnet/#resnet50-function) and [InceptionV3](https://keras.io/api/applications/inceptionv3/).


### 1a) ResNet50

Run this next cell to load the model and an example image.

In [None]:
import tensorflow as tf
import matplotlib.pyplot as plt
from matplotlib import gridspec
import visiontools
import warnings

plt.rc('figure', autolayout=True)
plt.rc('axes', labelweight='bold', labelsize='large',
       titleweight='bold', titlesize=18, titlepad=10)
plt.rc('image', cmap='magma')
warnings.filterwarnings("ignore") # to clean up output cells

IMAGE_PATH = '/kaggle/input/computer-vision-resources/car_illus.jpg'
image = visiontools.read_image(IMAGE_PATH, channels=1)
image = tf.image.resize(image, size=[224, 224], method='nearest')

resnet50 = tf.keras.models.load_model(
    '/kaggle/input/cv-course-models/cv-course-models/resnet50',
)

Now run this cell to get a random kernel from the first convolutional layer of ResNet50. It should return a `(7, 7)` kernel. ResNet50 follows this convolution with a pooling layer that has a `(3, 3)` window and strides of `(2, 2)`. Run the cell several times if you'd like to see different examples.

In [None]:
kernel = visiontools.random_kernel(model=resnet50, layer='conv1_conv')

visiontools.show_kernel(kernel, label=False)

visiontools.show_extraction(
    image, kernel,
    conv_stride=2,
    conv_padding='same',
    activation='relu',
    pool_size=3,
    pool_stride=2,
    pool_padding='same',
    subplot_shape=(1, 4),
    figsize=(16, 6),
);
q_1.a.check()
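
As a rough check on how much this stage shrinks the image, the sketch below computes the side length of the output after the strided convolution and after the pooling layer. It assumes the 224-pixel input and the `'same'` padding used in the cell above; `same_output_size` is a hypothetical helper written for this illustration, not part of `visiontools`.

```python
import math

def same_output_size(input_size, stride):
    """Output length along one dimension when padding='same'.

    With 'same' padding, TensorFlow produces ceil(input_size / stride)
    outputs, independent of the kernel or window size.
    """
    return math.ceil(input_size / stride)

input_size = 224                                      # side length of the resized image
after_conv = same_output_size(input_size, stride=2)   # (7, 7) convolution, stride 2
after_pool = same_output_size(after_conv, stride=2)   # (3, 3) pooling, stride 2

print(after_conv, after_pool)  # 112 56
```

Under these assumptions, each of the two stride-2 steps halves the side length, so the feature map leaving this block is a quarter of the input size along each dimension.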

### 1b) InceptionV3

The InceptionV3 architecture takes a different approach to reducing parameters. Though many of its layers use the traditional `(3, 3)` kernels, it also introduced the idea of an asymmetric convolution. Instead of a single `(7, 7)` kernel, for instance, it sometimes uses a `(7, 1)` kernel followed by a `(1, 7)` kernel. This allows it to get the same range of connectivity, but with far fewer parameters. Let's take a look at one of these kinds of kernels.
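
To make the parameter savings concrete, here is a minimal sketch that counts the weights for a single input/output channel pair, ignoring biases. The example weights in `col` and `row` are made up for illustration. The sketch also shows that, with no activation in between, a `(7, 1)` kernel followed by a `(1, 7)` kernel acts like one `(7, 7)` kernel given by their outer product.

```python
# Weights for one input/output channel pair, ignoring biases.
full_params = 7 * 7                # a single (7, 7) kernel
factored_params = 7 * 1 + 1 * 7    # a (7, 1) kernel followed by a (1, 7) kernel

# Hypothetical example weights, chosen only for illustration.
col = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0]   # entries of the (7, 1) kernel
row = [0.5, 0.5, 1.0, 1.0, 1.0, 0.5, 0.5]   # entries of the (1, 7) kernel

# Applying the two kernels back to back (no activation in between) is
# equivalent to applying their outer product, which has shape (7, 7).
combined = [[c * r for r in row] for c in col]

print(full_params, factored_params)      # 49 14
print(len(combined), len(combined[0]))   # 7 7
```

The factored pair covers the same `(7, 7)` window with 14 weights instead of 49, at the cost of only being able to represent kernels of this outer-product form.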

First run this cell to load the model.

In [None]:
inceptionv3 = tf.keras.models.load_model(
    '/kaggle/input/cv-course-models/cv-course-models/inceptionv3',
)

Now run this cell to see an example.

In [None]:
# Change the layer to 'conv2d_39' if you'd like to see a "long" kernel instead
kernel = visiontools.random_kernel(model=inceptionv3, layer='conv2d_38')

visiontools.show_kernel(kernel, label=False)

visiontools.show_extraction(
    image, kernel,
    conv_stride=1,
    conv_padding='same',
    activation='relu',
    pool_size=2,
    pool_stride=2,
    pool_padding='same',
    subplot_shape=(1, 4),
    figsize=(16, 6),
);
q_1.b.check()

Hopefully, these two examples illustrated the range of possibilities for feature extraction just in the choice of parameter values. The ResNet and Inception architectures in fact introduced other innovations as well. If you're interested in how convnets are built, they're worth reading more about!

# The Receptive Field #

The **receptive field** of a neuron is the part of the input image it can receive information from, that is, the patch of input pixels it is ultimately connected to. A neuron in a single convolutional layer with a small kernel sees only a small patch of the input. We would like our base, however, to recognize large-scale features, like wheels or windows for the cars and trucks. Let's see how we can "grow" the receptive field by stacking layers, so that the outputs can get information from large patches of the input image.

### 2) How the Receptive Field Grows

This next picture illustrates two stacked convolutional layers both with `(3, 3)` kernels. The bottom layer represents the input. Each of the neurons in the first (middle) layer has a $3 \times 3$ receptive field. Following the path of connections, we can see that each of the neurons in the second (top) layer has a $5 \times 5$ receptive field.

<figure>
<img src="https://i.imgur.com/HmwQm2S.png" alt="Illustration of the receptive field of two stacked convolutions." width=250>
</figure>
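
The growth shown in the figure can be sketched with a short calculation. The helper below, `receptive_field`, is a hypothetical function written for this illustration. It assumes square kernels and tracks two numbers per layer: the receptive field size and the "jump", the distance in input pixels between neighboring neurons of the current layer.

```python
def receptive_field(layers):
    """Side length of the input patch seen by one neuron in the last layer.

    `layers` is a list of (kernel_size, stride) pairs, ordered from the
    layer closest to the input to the layer closest to the output.
    """
    size = 1   # a single input pixel "sees" only itself
    jump = 1   # spacing, in input pixels, between adjacent neurons
    for kernel_size, stride in layers:
        size += (kernel_size - 1) * jump   # each extra kernel tap adds one jump
        jump *= stride                     # strides spread neurons further apart
    return size

print(receptive_field([(3, 1)]))           # 3
print(receptive_field([(3, 1), (3, 1)]))   # 5
```

The two printed values match the $3 \times 3$ and $5 \times 5$ receptive fields in the figure. You could use the same function to check your answers to the questions below after working them out by hand.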

If you added a *third* convolutional layer with a `(3, 3)` kernel, each of its neurons would have a receptive field of:

In [None]:
# Lines below will give you a hint or solution
#_COMMENT_IF(PROD)_
q_2.a.hint()
#_COMMENT_IF(PROD)_
q_2.a.solution()

Now say you add a `(2, 2)` maximum pooling layer with `strides=2` after the third convolution. What receptive field do the outputs have now? (This is harder. Try the hint if you need help.)

In [None]:
# Lines below will give you a hint or solution
#_COMMENT_IF(PROD)_
q_2.b.hint()
#_COMMENT_IF(PROD)_
q_2.b.solution()

# One-Dimensional Convolution #

Though we've been using convolutional networks on two-dimensional data, they can also be useful on *one*-dimensional data, like time series or natural language text. In fact, convolutional networks tend to be successful on any kind of data with a strong **local topological structure**, meaning that the information about a point tends to be concentrated in nearby points. You can best predict the value of a pixel by looking at nearby pixels, and you can better predict today's weather by looking at yesterday's weather than at the weather a month ago.

### 3) Apply a 1D Convolution

In this exercise, we'll see how a convolution can be used on a **time series**. The time series we'll use is from [Google Trends](https://trends.google.com/trends/); it measures the weekly popularity of the search term "machine learning" from January 25, 2015 to January 15, 2020.

In [None]:
import pandas as pd

# Load the time series as a Pandas dataframe
machinelearning = pd.read_csv(
    '/kaggle/input/computer-vision-resources/machinelearning.csv',
    parse_dates=['Week'],
    index_col='Week',
)

machinelearning.plot();

Because our data is one-dimensional, the kernel needs to be one-dimensional as well. Define a one-dimensional kernel. Though not required, you'll get better results if the entries sum to 1, since the output then stays on the same scale as the input.

In [None]:
# YOUR CODE HERE: Define a 1D kernel. 
kernel = tf.constant([____])
q_3.check()

In [None]:
#%%RM_IF(PROD)%%
kernel = tf.constant([0.1, 0.2, 0.3, 0.4])
q_3.assert_check_passed()

Now run the next cell to apply the kernel with a convolution and see what effect it had on the time series.

In [None]:
# Reformat for TensorFlow
ts_data = machinelearning.to_numpy()
ts_data = tf.expand_dims(ts_data, axis=0)
ts_data = tf.cast(ts_data, dtype=tf.float32)
kern = tf.reshape(kernel, shape=(*kernel.shape, 1, 1))

ts_filter = tf.nn.conv1d(
    input=ts_data,
    filters=kern,
    stride=1,
    padding='VALID',
)

# Format as Pandas Series
machinelearning_filtered = pd.Series(tf.squeeze(ts_filter).numpy())

machinelearning_filtered.plot();
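
To see what the convolution in the cell above computes, here is a minimal pure-Python sketch of a one-dimensional convolution with `'VALID'` padding. Like `tf.nn.conv1d`, it slides the kernel along the series without flipping it and takes a weighted sum at each position. The function `conv1d_valid` and the example numbers are hypothetical, chosen only for illustration.

```python
def conv1d_valid(series, kernel):
    """Slide `kernel` along `series`, keeping only full-overlap positions.

    The output has len(series) - len(kernel) + 1 entries.
    """
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, series[i:i + k]))
        for i in range(len(series) - k + 1)
    ]

series = [10.0, 20.0, 30.0, 40.0, 50.0]   # made-up example data
kernel = [0.25, 0.25, 0.5]                # entries sum to 1

smoothed = conv1d_valid(series, kernel)
print(smoothed)  # [22.5, 32.5, 42.5]
```

Because the kernel entries are nonnegative and sum to 1, each output is a weighted average of neighboring inputs. Note also that the output is shorter than the input by `len(kernel) - 1` values, which is the effect of `'VALID'` padding.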

# Conclusion #

This lesson ends our discussion of feature extraction. Hopefully, having completed these lessons, you've gained some intuition about how the process works and why the usual choices for its implementation are often the best ones.

# Keep Going #

In the next lesson, [**Lesson 5**](#$NEXT_NOTEBOOK_URL$), you'll learn how to compose the `Conv2D` and `MaxPool2D` layers to build your own convolutional networks from scratch.