<!--Filters and Features-->
# Introduction #

In the last lesson, we saw how the structure of convolution and pooling layers was adapted to solving classification problems with spatial features.

In this lesson, we'll look at another consequence of this structure: the **hierarchy of visual features**. We'll explore this hierarchy by looking at the features the network is most sensitive to as we travel deeper into its layers.

# The Receptive Field #

Each neuron in a convnet is connected to the neurons in a previous layer (or to the inputs) according to the "windows" we saw in the previous lesson -- either a kernel or a pooling block. Since our windows are usually very small ($3 \times 3$ or $2 \times 2$), we might worry that a convnet wouldn't be able to detect "large" features that could be important in a classification problem.

Fortunately, because these windows "stack", the number of pixels a neuron receives inputs from grows the deeper the neuron sits in the network. Stacking layers widens the **receptive field**: the neurons in the last convolutional layers might ultimately be receiving inputs from large parts of the original image.
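
To make the widening concrete, here is a minimal sketch that tracks how far one neuron "sees" as windows stack. The function name `receptive_field` and the `(kernel_size, stride)` description of each layer are illustrative choices for this example, not part of any library, and the sketch only measures the width of the field along one axis.

```python
def receptive_field(layers):
    """Return the width, in input pixels, seen by one unit of the last layer.

    `layers` is a list of (kernel_size, stride) pairs, ordered from the
    input toward the output. Both names are hypothetical, for illustration.
    """
    size = 1  # a single input pixel sees only itself
    jump = 1  # spacing, in input pixels, between neighboring units
    for kernel_size, stride in layers:
        # Each window reaches (kernel_size - 1) extra units, spaced `jump` apart
        size += (kernel_size - 1) * jump
        # Striding spreads the next layer's units further apart
        jump *= stride
    return size


# Three stacked 3x3 convolutions with stride 1
print(receptive_field([(3, 1)] * 3))

# A 3x3 convolution, a 2x2 pooling block with stride 2, then a 3x3 convolution
print(receptive_field([(3, 1), (2, 2), (3, 1)]))
```

Three stacked $3 \times 3$ kernels already cover a $7 \times 7$ patch of the input, and placing a pooling block between two convolutions widens the patch to $8 \times 8$, since every window after the pooling step spans units that are twice as far apart.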

<figure>
<!-- <img src="5-receptive-field.png" alt="The convolution induces a widening pattern of connections." width=400> -->
<img src="https://i.imgur.com/HmwQm2S.png" alt="The convolution induces a widening pattern of connections." width=200>
</figure>

The consequence of this expanding receptive field is that filters in the shallow layers will be most sensitive to the simplest features, those at the smallest scale, while filters in the deepest layers will be most sensitive to the most complex features, those at a large scale.

If we look at features from layer to layer, we can see how the deep structure of the network produces features that are more and more complex and refined. By the time an image has reached the classifier, it has undergone the extraction process many times, its simple features being combined and recombined in complex ways.

An image is worth a thousand words, though -- so let's open up a network and take a look!

# Example - Looking Inside Convnets #

We saw in Lesson 3 how a network performs the feature extraction.

We're going to create a new model essentially by "chopping off" some layers. We do this by just rerouting the layer outputs into new model outputs.


```python
import tensorflow as tf

# New inputs are the same as the original (`model` is an already-built convnet)
inputs = model.inputs

# The new output is now a single layer's output
layer = model.get_layer('conv1')
outputs = layer.output

# And now we can create a new model!
viz_model = tf.keras.Model(inputs=inputs, outputs=outputs)
```

The following hidden cell will prepare the image we'll use for this example.

```python
#$HIDE_INPUT$
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.applications.vgg16 import preprocess_input
from visiontools import read_image, show_image

img = read_image('/kaggle/input/computer-vision-resources/ys.jpg')

plt.figure(figsize=(8, 8))
show_image(img)
plt.show()

# Resize the image, then apply the VGG16 input preprocessing
SIZE = [1500, 1500]
img = tf.image.resize(img, size=SIZE, method='nearest')
img = preprocess_input(img, data_format='channels_last')
img = tf.image.convert_image_dtype(img, dtype=tf.float32)
```

We'll continue looking at the VGG16 model together in this tutorial. In the exercises, you'll look inside two much more powerful models: *ResNet50V2* and *Xception*.

```python
vgg16 = tf.keras.models.load_model(
    '/kaggle/input/cv-course-models/cv-course-models/vgg16-pretrained-base',
)
vgg16.summary()
```

## Filters and Features ##



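```python
# `show_filters` is assumed to live in `visiontools` alongside `show_feature_maps`
from visiontools import show_feature_maps, show_filters

LAYER_NAME = 'block1_conv1'

# Feature maps this layer produces for the example image
show_feature_maps(
    img,
    model=vgg16,
    layer_name=LAYER_NAME,
    rows=1, cols=6,
)
# Filters of the same layer
show_filters(
    model=vgg16,
    layer_name=LAYER_NAME,
    rows=1, cols=6,
)
```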

```python
LAYER_NAME = 'block3_conv2'

show_feature_maps(
    img,
    model=vgg16,
    layer_name=LAYER_NAME,
    rows=1, cols=6,
)
show_filters(
    model=vgg16,
    layer_name=LAYER_NAME,
    rows=1, cols=6,
)
```
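
To put rough numbers on the two layers compared above, the sketch below estimates how wide a patch of the input image one unit in each layer can see. It assumes the pretrained base follows the standard VGG16 layout ($3 \times 3$ convolutions with stride 1 and $2 \times 2$ max pooling with stride 2). The `VGG16_LAYERS` table and the `receptive_fields` helper are hypothetical names written for this illustration, so check them against the output of `vgg16.summary()`.

```python
# Assumed layout of the first three VGG16 blocks: (layer name, kernel size, stride)
VGG16_LAYERS = [
    ("block1_conv1", 3, 1), ("block1_conv2", 3, 1), ("block1_pool", 2, 2),
    ("block2_conv1", 3, 1), ("block2_conv2", 3, 1), ("block2_pool", 2, 2),
    ("block3_conv1", 3, 1), ("block3_conv2", 3, 1), ("block3_conv3", 3, 1),
]


def receptive_fields(layers):
    """Map each layer name to the width, in input pixels, seen by one of its units."""
    size, jump = 1, 1  # field width so far, and spacing between neighboring units
    fields = {}
    for name, kernel_size, stride in layers:
        size += (kernel_size - 1) * jump  # the window reaches further into the input
        jump *= stride                    # striding spreads later units apart
        fields[name] = size
    return fields


fields = receptive_fields(VGG16_LAYERS)
for name in ("block1_conv1", "block3_conv2"):
    print(f"{name}: {fields[name]} x {fields[name]} pixels")
```

Under these assumptions, a unit in `block1_conv1` sees a $3 \times 3$ patch of the image, while a unit in `block3_conv2` sees a $32 \times 32$ patch, which is consistent with the deeper layer responding to larger and more complex patterns.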

# Conclusion #