# Introduction #

In the last lesson, you used transfer learning with a pre-trained model. But what exactly did our model learn? Let's open up the "black box" and find out!

<!-- header illustration -->

In this lesson, you'll see how a convnet extracts features from an image, and you'll visualize the filters it learns.

# Convnets #

A **convolutional neural network** (also called a **CNN** or **convnet**) is a kind of neural network designed to learn from **spatial information** such as images, text, and video. Its characteristic operation, the convolution, makes a convnet very efficient at processing this kind of information.

## How Convnets Work ##

A convnet used for classifying images will consist of two parts: a **convolutional base** and a **dense head**.

<!-- TODO: diagram -->

The convolutional base acts as a visual **feature extractor**, while the dense head acts as a classifier. A convnet classifies an image like this:

1. The base extracts the visual features.
2. The head decides the class using those features.
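
The two steps above can be sketched in Keras. This is only a minimal illustration, not the model used later in the lesson: the layer sizes, the 32x32 input, and the names `base` and `head` are assumptions chosen to keep the example small.

```python
import tensorflow as tf

# Convolutional base: extracts visual features from the image.
base = tf.keras.Sequential([
    tf.keras.layers.Conv2D(filters=8, kernel_size=3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=2),
], name='base')

# Dense head: turns the extracted features into class probabilities.
head = tf.keras.Sequential([
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation='softmax'),
], name='head')

# A batch of two random stand-in "images" (32x32 pixels, 3 color channels).
images = tf.random.uniform([2, 32, 32, 3])

features = base(images)  # step 1: the base extracts features
probs = head(features)   # step 2: the head decides the class

print(features.shape)  # one 16x16 feature map per filter, per image
print(probs.shape)     # one probability per class, per image
```

Because the weights here are untrained, the probabilities are meaningless. The point is only the flow of information: image, then features, then class.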

<!-- TODO: diagram -->

# Weights and Activations / Filters and Features #

You might remember **weights** and **activations** from *Introduction to Deep Learning*. An input to a neural network layer produces activations, and the weights in the layer determine what those activations are.

Convnets have special layers called **convolutional layers** that they use for feature extraction. The weights in a convolutional layer define **filters**, and every filter will extract a particular kind of visual feature.
<!-- blue box -->
- an input produces activations determined by the weights
- an image produces features determined by the filter
<!-- end blue box -->

Here are the features produced by a filter for extracting vertical lines:
<!-- TODO: image -->
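
To make the idea concrete, here is a small NumPy sketch of a vertical-line filter applied to a toy image. The 6x6 image, the hand-written kernel values, and the helper `apply_filter` are all assumptions made for illustration; in a real convnet the filter values are learned during training.

```python
import numpy as np

def apply_filter(image, kernel):
    """Slide the kernel over the image and sum the elementwise products.

    This is the sliding-window operation used by convolutional layers,
    written with plain loops for clarity rather than speed.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy image with a single bright vertical line in column 2.
vertical = np.zeros((6, 6))
vertical[:, 2] = 1.0

# Toy image with a single bright horizontal line in row 2.
horizontal = np.zeros((6, 6))
horizontal[2, :] = 1.0

# A hand-made filter that responds to vertical lines.
kernel = np.array([[-1.0, 2.0, -1.0],
                   [-1.0, 2.0, -1.0],
                   [-1.0, 2.0, -1.0]])

# Apply the filter, then a ReLU to keep only the positive responses.
features_vertical = np.maximum(apply_filter(vertical, kernel), 0)
features_horizontal = np.maximum(apply_filter(horizontal, kernel), 0)

print(features_vertical)    # strong response where the vertical line is
print(features_horizontal)  # no response to the horizontal line
```

The feature map lights up only where the filter's pattern matches the image, which is what "extracting a feature" means here.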

When training a network, the goal is to learn weights which produce activations with minimal loss. The goal when training a convnet is to learn filters which produce features with minimal classification error.

# Simple to Complex / General to Specific #

A convnet repeats the feature extraction process many times as information flows through its layers. The first set of extracted features feeds into the next convolutional layer, which extracts another, more complex, set of features.

<!-- TODO: image -->

The result is that the filters learned near the input are the simplest and most general, while the filters learned near the output are the most complex and specific.

You can see this in the following illustration. The filters in the beginning layers extract things like straight lines or curves, while the filters at the end extract complex things like wheels and windshields.

<!-- TODO: illustration -->

This is what makes transfer learning work! If you were to train a convnet on images of dogs, the ending layers would contain filters specific to dogs, likely for things like eyes, snouts, and ears. But the beginning layers would contain filters for things like lines and curves, which are useful for images of all kinds. The more similar the new dataset is to the original, the more layers are likely to be useful.
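
One way this shows up in practice is in choosing how many of the early layers to reuse unchanged. The sketch below freezes the first few layers of a base by setting `trainable = False`. The small untrained `pretrained_base` is only a stand-in for a real pre-trained model, and the cutoff `NUM_GENERAL_LAYERS` is a hypothetical choice, not a recommendation.

```python
import tensorflow as tf

# Stand-in for a base that was pre-trained on the original dataset.
# (Assumption: a real pre-trained base would be loaded here instead.)
pretrained_base = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),
    tf.keras.layers.Conv2D(filters=16, kernel_size=3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=2),
    tf.keras.layers.Conv2D(filters=32, kernel_size=3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=2),
    tf.keras.layers.Conv2D(filters=64, kernel_size=3, padding='same', activation='relu'),
], name='pretrained_base')

# Hypothetical cutoff: treat the first three layers as "general" and reuse
# them as they are. Later layers stay trainable so they can adapt to the
# new dataset.
NUM_GENERAL_LAYERS = 3
for layer in pretrained_base.layers[:NUM_GENERAL_LAYERS]:
    layer.trainable = False

# Attach a new head for the new dataset's classes.
model = tf.keras.Sequential([
    pretrained_base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation='softmax'),
])

for layer in pretrained_base.layers:
    print(layer.name, 'trainable:', layer.trainable)
```

With a new dataset very similar to the original, a larger cutoff may make sense; with a very different one, a smaller cutoff leaves more layers free to adapt.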

# Making Visualizations #

Let's train a simple convnet on the cars dataset. Then we'll see how we can look at the filters it learned and at the activations they produce.

In [None]:
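import tensorflow as tf

# IMAGE_SIZE is assumed to be defined earlier in the notebook as [height, width].
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(kernel_size=3, filters=16, padding='same', activation='relu', input_shape=[*IMAGE_SIZE, 3]),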
    tf.keras.layers.Conv2D(kernel_size=3, filters=30, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=2),
    tf.keras.layers.Conv2D(kernel_size=3, filters=60, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=2),
    tf.keras.layers.Conv2D(kernel_size=3, filters=90, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=2),
    tf.keras.layers.Conv2D(kernel_size=3, filters=110, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=2),
    tf.keras.layers.Conv2D(kernel_size=3, filters=130, padding='same', activation='relu'),
    tf.keras.layers.Conv2D(kernel_size=1, filters=40, padding='same', activation='relu'),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation='softmax')
])
# 'categorical_crossentropy' expects one-hot encoded labels for the 5 classes.
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy'])

model.summary()

Once the model has been trained, we can get its filters like this:
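
A minimal sketch of one way to do this uses the layer method `get_weights()`. So that the example runs on its own, it builds a small untrained stand-in model with an assumed 32x32 input; with the trained model from this lesson, you would index into that model's layers instead.

```python
import tensorflow as tf

# Small stand-in model (assumption: the lesson's trained model would be used instead).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(kernel_size=3, filters=16, padding='same', activation='relu'),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation='softmax'),
])

# The first layer of the model is the convolutional layer.
conv_layer = model.layers[0]

# A Conv2D layer stores two weight arrays: the kernel (the filters) and the bias.
kernel, bias = conv_layer.get_weights()

# The kernel has shape (height, width, input_channels, number_of_filters).
print(kernel.shape)

# Select a single filter: a 3x3 patch of weights for each of the 3 color channels.
first_filter = kernel[:, :, :, 0]
print(first_filter.shape)
```

Each slice along the last axis is one filter, so this layer holds 16 filters that could each be plotted as a small image.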

And we can see the features it extracts from an image like this:
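
One common approach, sketched here under assumptions, is to build a second model that shares the original's input but outputs the activations of an intermediate layer. The small functional model, the layer names `conv_1` and `conv_2`, and the random stand-in image are all hypothetical; with the lesson's trained model you would pick one of its own convolutional layers and pass in a real image.

```python
import tensorflow as tf

# Small stand-in model built with the functional API (assumption: untrained).
inputs = tf.keras.Input(shape=(32, 32, 3))
x = tf.keras.layers.Conv2D(kernel_size=3, filters=16, padding='same',
                           activation='relu', name='conv_1')(inputs)
x = tf.keras.layers.MaxPooling2D(pool_size=2)(x)
x = tf.keras.layers.Conv2D(kernel_size=3, filters=30, padding='same',
                           activation='relu', name='conv_2')(x)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(5, activation='softmax')(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

# A model with the same input, whose output is the chosen layer's activations.
feature_extractor = tf.keras.Model(
    inputs=model.input,
    outputs=model.get_layer('conv_2').output,
)

# A random stand-in "image" with a batch dimension of 1.
image = tf.random.uniform([1, 32, 32, 3])

# The result has shape (batch, height, width, number_of_filters).
features = feature_extractor(image)
print(features.shape)

# The feature map produced by a single filter is a 2D array that can be plotted.
first_feature_map = features[0, :, :, 0]
print(first_feature_map.shape)
```

The extractor shares its weights with `model`, so once the original model is trained, the extractor shows the features the trained filters produce.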

Let's look at some filter/feature pairs as we go deeper and deeper down the network.

<!-- TODO: filter/feature pairs -->

# Conclusion #

# Your Turn #

<!-- # Why a New Layer? # -->

<!-- In this lesson, you'll learn about convolutional neural networks, sometimes called **CNNs** or **convnets**. These neural networks use **convolutional layers**, which are specially designed to learn images. -->

<!-- The kind of network layers you've seen so far have mostly been *dense layers*. Because dense layers are fully-connected, they are very flexible. <\!-- TODO: advantages of dense layers -\-> -->

<!-- But this flexibility comes at a price. Because dense layers are fully-connected they will have very many parameters that must be trained. The VGG16 model we saw in the last tutorial has <\!-- X -\-> convolutional layers and <\!-- X -\-> parameters to train. An equivalent number of dense layers would give <\!-- very many -\-> parameters! <\!-- TODO: disadvantages of dense layers -\-> -->

<!-- <\!-- TODO: diagram of dense vs conv layer (maybe) -\-> -->

<!-- A model whose design reflects the structure of the problem will make the most efficient use of your time and data. What is best is if your model can vary in just the way your data can vary, but no more. -->

<!-- The design of a deep learning model is primarily reflected in its layers and in the way those layers are connected. -->

<!-- Convolutional layers learn images. -->

<!-- Convolutional networks use two operations in alternation: **convolution** and **pooling**. A convolution is a kind of weighted average that in effect divides spatial information into its components. The pooling operation condenses the result of a convolution, keeping the important part. -->

<!-- As information flows through a convolutional network,  -->

<!-- In fact, convolutional layers are well adapted to learning *any* kind of spatial data. This includes data that changes through time! Convnets that learn images use 2D convolutions. But with a 1D layer, a convnet can also learn text and audio. And a 3D layer can learn video! -->

<!-- In this tutorial, we'll explore convnets by looking at the **feature maps** its layers produce. We'll see the kinds of features a layer has learned and also what parts of an image the network thinks are most important to identifying its class. -->

<!-- # Convnet Architecture # -->

<!-- Here is a diagram of a small convolutional network. -->

<!-- <\!-- TODO: diagram of convnet -\-> -->
<!-- ```python -->
<!-- tf.keras.utils.plot_model(model, show_layer_names=False, show_shapes=True) -->
<!-- ``` -->

<!-- You'll learn more about how a convnet constructs these feature maps in the next lesson. -->

<!-- # Looking at Feature Maps # -->


<!-- # Conclusion # -->

<!-- There are many ways of visualizing convnets. -->

<!-- The ability to "open up" a convnet is a great advantage in interpretability. -->
