# Part 3: Pretrained Convolutional Neural Networks

In [None]:
%matplotlib inline

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# for elementary image manipulation
import imageio as iio

# specifies the default figure size for this notebook
plt.rcParams['figure.figsize'] = (10, 10)

# specifies the default color map
plt.rcParams['image.cmap'] = 'gray'

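# we'll use keras to build our networks
# (import everything from tensorflow.keras so that models, layers and
# optimizers all come from the same package)
from tensorflow import keras
from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import Flatten, Dense, Dropout
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.layers import ZeroPadding2D

from tensorflow.keras.optimizers import SGD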

# Working with a Pre-Trained Network

Now let's load an already trained network into our environment. This network (VGG-16) has been trained on the ImageNet dataset, where the goal is to classify pictures into one of one thousand categories. 
When it came out in 2014, it was one of the top entries in the annual ImageNet recognition challenge, with the correct label among its top five guesses for about 93% of the images in the test set. 
For comparison, humans can achieve around 95% accuracy. 
It's also very simple: it only uses 3x3 convolutions (like the ones you have used before)! 

It is however rather deep and it takes **2 to 3 weeks with 4 GPUs** to train it...

To load the model, you must first define its architecture. 
You're going to do this step by step as you learn the components of convolutional neural networks. 

**IMPORTANT NOTE**: it is extremely difficult to come up with an architecture "that works". So while people successfully adapt existing neural nets such as VGG16 to their needs (and, in fact, a lot of people use it without its last layer as an *image-embedding operator*), architecture design is the realm of research (and much head scratching).

Recently, the Google Brain team has applied brute-force style search to try to find the best architectures for given problems, and also to learn which activation function to use, which step-size mechanism, etc. (a learn-everything approach). 
This required an obscene amount of resources and the results were far from intuitive (*search for "Google Brain 2017 year review" for a discussion of this and many other interesting results*). 

### Implementing a convolutional layer

You are going to define the first convolutional layer of the network. But first, you will add some padding to the image so that the convolutions can also be applied at the outer edges.
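
To make the effect of the padding concrete, here is a minimal NumPy sketch (not part of the Keras model). The helper `conv2d_valid`, the toy 6x6 image and the averaging kernel are all illustrative assumptions. It shows that a 3x3 convolution shrinks the image by two pixels in each direction unless a one-pixel frame of zeros is added first.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 2-D sliding-window product without padding (illustrative helper)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)   # toy single-channel image
kernel = np.ones((3, 3)) / 9.0                     # hypothetical 3x3 averaging filter

unpadded = conv2d_valid(image, kernel)             # no padding: output shrinks
padded = np.pad(image, 1, mode='constant')         # one-pixel frame of zeros on every side
same_size = conv2d_valid(padded, kernel)           # output matches the input size again

print(unpadded.shape, same_size.shape)             # (4, 4) (6, 6)
```

The same arithmetic applies to the 224x224 input: padding to 226x226 and applying a 3x3 filter gives back a 224x224 output.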

In what follows you don't have to modify the cells but just run them making sure you understand what is being done. Do not tune the parameters as we will load pre-trained weights on the architecture!

In [None]:
# Create the model, it's a Sequential model (stack of layers one after the other)
vgg_model = Sequential()

# On the very first layer, you must specify the input shape
# ZeroPadding2D adds a frame of 0 (column left and right, row top and bottom)
# the tuple (1, 1) indicates it's one pixel and symmetric.
vgg_model.add(ZeroPadding2D((1, 1), input_shape=(224, 224, 3))) 

# Your first convolutional layer will have 64 3x3 filters, 
# and will use a relu activation function
vgg_model.add(Conv2D(64, (3, 3), activation='relu', name='conv1_1'))

### Stacking layers

Now you're going to stack another convolutional layer. Remember, the output of a convolutional layer is a 3-D tensor, just like the input image, although it has a much greater depth!

In [None]:
# Once again you must add padding
vgg_model.add(ZeroPadding2D((1, 1)))
vgg_model.add(Conv2D(64, (3, 3), activation='relu', name='conv1_2'))

### Adding pooling layers

Now let's add your first pooling layer. Pooling reduces the width and height of the input by aggregating adjacent cells together.
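
As a rough illustration of what max pooling computes, the sketch below applies a 2x2 window with stride 2 to a small array using plain NumPy. The helper `max_pool_2x2` and the example values are assumptions made for this sketch; Keras' `MaxPooling2D` does the equivalent per channel.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a 2-D array with even height and width."""
    h, w = x.shape
    # Split the array into non-overlapping 2x2 blocks, then take the max of each block
    blocks = x.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 1],
              [0, 0, 5, 6],
              [1, 2, 7, 8]])

pooled = max_pool_2x2(x)
print(pooled)
# [[4 2]
#  [2 8]]
```

Each output cell keeps only the largest value of its 2x2 block, so a 4x4 input becomes 2x2.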


In [None]:
# Add a pooling layer with window size 2x2
# The stride indicates the distance between each pooled window
vgg_model.add(MaxPooling2D((2, 2), strides=(2, 2)))

### Adding more convolutions for VGG

Now you can stack many more of these! Remember not to change the parameters as we are about to load the weights of an already trained version of this network.

Also, as you will quickly realise, Keras for practitioners usually means a lot of copy-pasting...

In [None]:
# second set of Padding - Conv - Padding - Conv - Pooling
vgg_model.add(ZeroPadding2D((1, 1)))
vgg_model.add(Conv2D(128, (3, 3), activation='relu', name='conv2_1'))
vgg_model.add(ZeroPadding2D((1, 1)))
vgg_model.add(Conv2D(128, (3, 3), activation='relu', name='conv2_2'))
vgg_model.add(MaxPooling2D((2, 2), strides=(2, 2)))

# third set
vgg_model.add(ZeroPadding2D((1, 1)))
vgg_model.add(Conv2D(256, (3, 3), activation='relu', name='conv3_1'))
vgg_model.add(ZeroPadding2D((1, 1)))
vgg_model.add(Conv2D(256, (3, 3), activation='relu', name='conv3_2'))
vgg_model.add(ZeroPadding2D((1, 1)))
vgg_model.add(Conv2D(256, (3, 3), activation='relu', name='conv3_3'))
vgg_model.add(MaxPooling2D((2, 2), strides=(2, 2)))

# fourth set
vgg_model.add(ZeroPadding2D((1, 1)))
vgg_model.add(Conv2D(512, (3, 3), activation='relu', name='conv4_1'))
vgg_model.add(ZeroPadding2D((1,1)))
vgg_model.add(Conv2D(512, (3, 3), activation='relu', name='conv4_2'))
vgg_model.add(ZeroPadding2D((1, 1)))
vgg_model.add(Conv2D(512, (3, 3), activation='relu', name='conv4_3'))
vgg_model.add(MaxPooling2D((2, 2), strides=(2, 2)))

# fifth set
vgg_model.add(ZeroPadding2D((1, 1)))
vgg_model.add(Conv2D(512, (3, 3), activation='relu', name='conv5_1'))
vgg_model.add(ZeroPadding2D((1, 1)))
vgg_model.add(Conv2D(512, (3, 3), activation='relu', name='conv5_2'))
vgg_model.add(ZeroPadding2D((1, 1)))
vgg_model.add(Conv2D(512, (3, 3), activation='relu', name='conv5_3'))
vgg_model.add(MaxPooling2D((2, 2), strides=(2, 2)))

As you can see, the depth of the layers gets progressively larger, up to 512 for the last layers. 
This means that, as we go along, each layer detects a greater number of features. 

On the other hand, each max-pooling layer halves the height and width of the layer outputs. 
Starting from images of dimensions 224x224, the final outputs are only of size 7x7.
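
The shapes quoted above can be checked with a few lines of plain Python. This is only a bookkeeping sketch: the `blocks` list is a hand-written summary of the five blocks defined earlier, not something read from the Keras model.

```python
# (number of filters, number of conv layers) for the five blocks defined above
blocks = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]

size = 224      # input height and width
shapes = []
for filters, n_convs in blocks:
    # The padded 3x3 convolutions keep the spatial size; only the depth changes.
    # The 2x2 max pooling with stride 2 then halves the height and width.
    size //= 2
    shapes.append((size, size, filters))

n_conv_layers = sum(n_convs for _, n_convs in blocks)

for shape in shapes:
    print(shape)
print('convolutional layers:', n_conv_layers)
```

The last entry is `(7, 7, 512)`, which is the tensor that gets flattened in the next step.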

Now you're about to add some fully connected layers, which can learn the more abstract features of the image. 
But first you must change the layout of the input so it looks like a 1-D tensor (vector).

In [None]:
# Flatten the output
vgg_model.add(Flatten())

# Add a fully connected layer with 4096 neurons
vgg_model.add(Dense(4096, activation='relu'))

The `Flatten` layer removes the spatial dimensions of the layer output: it is now a simple 1-D vector of numbers. This means we can no longer apply 2D convolution layers as before, but we can apply fully connected layers like the ones of the perceptron from the previous module.

`Dense` layers are fully connected layers. You used them in the previous module.
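
To get a feel for how large this first fully connected layer is, the arithmetic below counts its parameters. It assumes the flattened input comes from the 7x7x512 output described earlier; the variable names are just for illustration.

```python
flat_size = 7 * 7 * 512             # length of the vector produced by Flatten
n_units = 4096                      # neurons in the first Dense layer

n_weights = flat_size * n_units     # one weight per (input, neuron) pair
n_biases = n_units                  # one bias per neuron
n_params = n_weights + n_biases

print(flat_size, n_params)          # 25088 102764544
```

So this single layer holds roughly 100 million parameters, far more than any of the convolutional layers before it.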

### Preventing overfitting with Dropout

`Dropout` is a method used at training time to prevent overfitting. As a layer, it randomly sets some of its inputs to zero
so that the neural network learns not to rely on any single one of them. Although you won't actually use it
now (it does nothing at prediction time), you must define it to reproduce the original network, as it was part of its architecture.

In [None]:
# Add a dropout layer
vgg_model.add(Dropout(0.5))

The number 0.5 is the dropout rate: the fraction of inputs that are set to zero at each training step. A rate of 0.0 means nothing is dropped, and a rate close to 1.0 means almost everything is.
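
For intuition, here is a small NumPy sketch of what a dropout layer does. It is a simplified stand-in, not Keras' implementation: the function `dropout`, its arguments and the fixed seed are assumptions for this example. Like Keras, it rescales the surviving inputs by `1 / (1 - rate)` so that their expected sum is unchanged.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Zero a random fraction `rate` of the inputs and rescale the rest (training only)."""
    if not training or rate == 0.0:
        return x
    keep_mask = rng.random(x.shape) >= rate     # True for the inputs that are kept
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)                  # fixed seed, only to make the sketch repeatable
x = np.ones(10)

train_out = dropout(x, rate=0.5, training=True, rng=rng)
test_out = dropout(x, rate=0.5, training=False, rng=rng)

print(train_out)    # each entry is either 0.0 (dropped) or 2.0 (kept and rescaled)
print(test_out)     # unchanged at prediction time
```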


Add one more fully connected layer:

In [None]:
vgg_model.add(Dense(4096, activation='relu'))
vgg_model.add(Dropout(0.5))

Finally, add a softmax layer to predict the categories. There are 1000 categories and hence 1000 neurons.
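
As a reminder of what the softmax activation does, the sketch below turns a vector of raw scores into probabilities with NumPy. The function name `softmax` and the three example scores are illustrative assumptions; the real layer does the same over 1000 values.

```python
import numpy as np

def softmax(scores):
    """Turn a vector of raw scores into positive values that sum to 1."""
    shifted = scores - np.max(scores)   # subtracting the max avoids overflow in exp
    exps = np.exp(shifted)
    return exps / exps.sum()

scores = np.array([2.0, 1.0, 0.1])      # hypothetical raw scores for three categories
probs = softmax(scores)

print(probs, probs.sum())
```

The largest score gets the largest probability, and the outputs can be read as the network's confidence in each category.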

In [None]:
vgg_model.add(Dense(1000, activation='softmax'))

### Loading the weights

And you're all set with the architecture! Let's load the weights of the network. For this use 

```python
vgg_model.load_weights(path)
```

where `path` is the path to `vgg16_weights_tf_dim_ordering_tf_kernels.h5`. Those are the weights for the pre-trained VGG16. They can be downloaded online, so you don't need to re-train the model yourself.

The weights file needs to be downloaded to the `data` folder as follows:

In [None]:
import urllib.request

url = 'https://info.cambridgespark.com/hubfs/Curriculum%20Team%20Folder/L7%20AI/Neural%20Networks/vgg16_weights_tf_dim_ordering_tf_kernels.h5'
output_path = 'data/vgg16_weights_tf_dim_ordering_tf_kernels.h5'
urllib.request.urlretrieve(url, output_path)

In [None]:
weight_loc = 'data/vgg16_weights_tf_dim_ordering_tf_kernels.h5'
vgg_model.load_weights(weight_loc)

Compile the network (no need to worry about this for now):

In [None]:
sgd = SGD()
vgg_model.compile(optimizer=sgd, loss='categorical_crossentropy')

### Preprocessing the data

Let's feed an image to your model. The model does not take the raw image as input but a slightly transformed version of it. In the VGG network, the only transformation applied to the pixel values is zero centering. 

In [None]:
# Load an image of a cat and a puppy (feel free to test with another one of your images)
img = iio.imread('data/cat.jpg')
img2 = iio.imread('data/puppy.jpg')

# Plot the images to check them
plt.imshow(img)
plt.axis('off')
plt.figure()
plt.imshow(img2)
plt.axis('off')
plt.show()

The VGG16 network assumes that the input it will receive is an array of 224 by 224 colour images that have been *centered* in each of their channels. These pre-trained weights also expect the channels in blue-green-red (BGR) order rather than RGB. 

The function below reorders the channels, applies the centering and makes sure the dimensions are right. 

In [None]:
# This transformation performs the 0-centering
def transform_image(image):
    image_t = np.copy(image)                    # Avoids modifying the original
    image_t = keras.preprocessing.image.smart_resize(
        image_t, (224, 224), interpolation='bilinear'
    )

    # Reverse the channel axis (RGB -> BGR) and convert to float
    image_t = np.ascontiguousarray(image_t[:, :, ::-1], dtype=np.float32)
    image_t[:, :, 0] -= 103.939                 # Subtracts mean Blue
    image_t[:, :, 1] -= 116.779                 # Subtracts mean Green
    image_t[:, :, 2] -= 123.68                  # Subtracts mean Red
    image_t = np.expand_dims(image_t, axis=0)   # The network takes batches of images as input
    return image_t

img_t = transform_image(img)
img2_t = transform_image(img2)

print(img_t.shape)

The first dimension is a "dummy" dimension: the network expects an array (a batch) of images as input. 

The three subsequent dimensions are the image dimensions with the three colour channels at the end. 
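
The batch dimension is easier to see on dummy data. The sketch below uses zero-filled arrays as stand-ins for preprocessed images (an assumption for illustration only) and shows how one image, or several, end up in a single 4-D array.

```python
import numpy as np

image = np.zeros((224, 224, 3), dtype=np.float32)   # stand-in for one preprocessed image
batch_of_one = np.expand_dims(image, axis=0)        # adds the leading batch dimension

images = [image, image, image]                      # three stand-in images
batch_of_three = np.stack(images, axis=0)           # stacks them along a new first axis

print(batch_of_one.shape, batch_of_three.shape)     # (1, 224, 224, 3) (3, 224, 224, 3)
```

Stacking several images this way lets a single `predict` call process them together.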

### Getting an output from the network

Let's push the images through the network and see what happens!

In [None]:
# Push each image through the network using vgg_model.predict and store the result
out = vgg_model.predict(img_t)
out2 = vgg_model.predict(img2_t)

In [None]:
# the output is for a batch of images but you only gave one so extract the first element
out = out[0]                   
out2 = out2[0]

In [None]:
# now plot the output, xlabel=Categories, ylabel=Probabilities
plt.figure()
plt.ylabel('Probabilities')
plt.xlabel('Categories')
cats = np.arange(out.shape[0])
plt.vlines(cats, [0], out, label='out', color='C0')
plt.vlines(cats, [0], out2, label='out2', color='C1')
plt.legend()
plt.show()

The network seems pretty confident! Let's look at its top 5 guesses for each of the two images. 

1. load the labels from `data/synset_words.txt`
2. find the 5 highest probabilities
3. display the corresponding categories with their predicted probabilities

In [None]:
# Load labels
imagenet_labels_filename = 'data/synset_words.txt'
labels = np.loadtxt(imagenet_labels_filename, str, delimiter='\t')

In [None]:
top_5 = out.argsort()[-5:][::-1]
top_5_values = np.sort(out)[-5:][::-1]
print('Image 1')
for label, prob in zip(labels[top_5], top_5_values):
    print('label: {} with probability: {:0.3f}'.format(label, prob))

top2_5 = out2.argsort()[-5:][::-1]
top2_5_values = np.sort(out2)[-5:][::-1]

print('\nImage 2')
for label, prob in zip(labels[top2_5], top2_5_values):
    print('label: {} with probability: {:0.3f}'.format(label, prob))

Not too bad...
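
The top-5 lookup can also be wrapped in a small helper that sorts only once. This is a sketch on toy data: the function `top_k`, the four probabilities and the four labels are made up for illustration and are not ImageNet outputs.

```python
import numpy as np

def top_k(probs, labels, k=5):
    """Return the k most probable (label, probability) pairs, most probable first."""
    order = np.argsort(probs)[::-1][:k]     # indices sorted by decreasing probability
    return [(str(labels[i]), float(probs[i])) for i in order]

# toy stand-ins for the network output and the label file
probs = np.array([0.05, 0.60, 0.10, 0.25])
labels = np.array(['cat', 'dog', 'fox', 'owl'])

best = top_k(probs, labels, k=2)
for label, prob in best:
    print('label: {} with probability: {:0.3f}'.format(label, prob))
```

Using the sorted indices for both the labels and the probabilities avoids sorting the output twice.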

### Looking inside the network

In a convolutional neural network, there's an easy way to visualise the filters learned at the very first layer. We can print each filter to show which colours it responds to.

In [None]:
# This is a helper function to let you visualise what goes on inside the network

def vis_square(weights, padsize=1, padval=np.nan, activation=False,
               normalize=True, diverging=True, cmap=None):
    # Avoids modifying the network weights
    data = np.copy(weights)
    
    if normalize:
        # Normalise the inputs - this makes sense for interpreting colour
        # -- we are interested in relative difference between patches.
        # However, as with activations, we may sometimes want to
        # understand the magnitude of the values (i.e. un-normalized)
        data -= data.min()
        data /= data.max()
    
    
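    # Symmetric colour limits around zero (used when diverging=True).
    # Kept as a float so that values below 1 do not get truncated to 0.
    vmax = float(
        max(
            np.abs(np.max(data)),
            np.abs(np.min(data))
        )
    )
    vmin = -vmax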
    
    # Lets tile the inputs
    # How many inputs per row (e.g.: if 64 filters then 8x8)
    
    n = int(np.ceil(np.sqrt(data.shape[0]))) 
    
    # add padding between inputs (you can safely ignore this)
    padding = ((0, n**2 - data.shape[0]), (0, padsize), (0, padsize)) + ((0, 0),) * (data.ndim - 3)
    data = np.pad(data, padding, mode='constant', constant_values=(padval, padval))
    
    # place the filters on an n by n grid
    data = data.reshape((n, n) + data.shape[1:])
    
    # merge the filters contents onto a single image
    data = data.transpose((0, 2, 1, 3) + tuple(range(4, data.ndim)))
    data = data.reshape((n * data.shape[1], n * data.shape[3]) + data.shape[4:])
    
    # complete the border
    data = np.pad(data, [(padsize, 0), (padsize, 0), (0, 0)], mode='constant', 
                  constant_values=(padval))
    
    # show the filter. In the activation case we don't have a colour channel (see further)
    plt.figure()
    if activation:
        data = data[:, :, 0]
    
    if cmap is None:
        cmap = plt.get_cmap()
    
    # Ensure that zero is in the middle of the colourmap
    if diverging:
        plt.imshow(data, vmin=vmin, vmax=vmax, cmap=cmap)
    else:
        plt.imshow(data, cmap=cmap)
    plt.axis('off')
    

In [None]:
# Get the weights of the first convolutional layer 
# (so that's the second layer in the stack, just after the first zero-padding layer)
first_layer_weights = vgg_model.layers[1].get_weights()
# first_layer_weights[0] stores the connection weights of the layer
# first_layer_weights[1] stores the biases of the layer
# For now we're just interested in the connections
filters = first_layer_weights[0]

# Visualise the filters 
# (move the filter axis to the front so the ordering of dimensions becomes
# (filters, height, width, channels) instead of (height, width, channels, filters))
vis_square(np.moveaxis(filters, 3, 0))
plt.show()

You can see how each filter detects a different property of the input image. Some have learned to respond to certain colours, while others (the greyscale-looking ones) detect changes in brightness such as edges. 

Another way of visualising the network is to see which neurons get activated as an image traverses the network. A neuron outputting a high value means the pattern it has learnt to detect has been observed. 
Let's apply this to our kitten image.

In [None]:
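def get_layer_output(model, image, layer):
    # Build a model that maps the network input to the output of the chosen layer
    output_fn = keras.Model(inputs=model.layers[0].input, outputs=model.layers[layer].output)
    # Convert the resulting tensor to a NumPy array for plotting
    return output_fn(image).numpy()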

# retrieve the activation of the first convolutional layer
layer_output = get_layer_output(vgg_model, img_t, 1)

# visualise
cmap = plt.get_cmap('bwr')
cmap.set_bad('black', alpha=1.)  # set np.nan values on border to be black

print(layer_output.shape)

vis_square(np.swapaxes(layer_output, 0, 3), padsize=20, 
           activation=True, diverging=True, normalize=False, cmap=cmap)
plt.colorbar(fraction=.045)
plt.show()

**It's worth spending a moment to understand what is going on here. Do you understand why all these activations are positive?**

**Each pixel in this image is a different neuron in the neural network. Neurons in the same tile share the same weights and therefore detect the same feature. You can compare the visualised filters above with their corresponding tile. Which filter helps detect grass?**



Using this method, it is possible to visualise the deeper parts of the neural network, although they become much harder to interpret. You can visualise the output of the second convolutional layer:

In [None]:
layer_output = get_layer_output(vgg_model, img_t, 3)
vis_square(np.swapaxes(layer_output, 0, 3), padsize=15,
           activation=True, diverging=True, normalize=False, cmap=cmap)
plt.colorbar(fraction=.045)
plt.show()

And the output of the second pooling layer (index 9 in the stack, counting the zero-padding layers):

In [None]:
layer_output = get_layer_output(vgg_model, img_t, 9)
vis_square(np.swapaxes(layer_output, 0, 3)[0:64], padsize=5,
           activation=True, diverging=True, normalize=False, cmap=cmap)
plt.colorbar(fraction=.045)
plt.show()

As we get further down the network, the representations become smaller in their spatial dimensions thanks to the pooling layers. The outputs of the final convolutional layers are only 14 by 14 in height and width.
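
Because the zero-padding layers also count towards the layer indices, it is easy to lose track of which layer an index refers to. The plain-Python sketch below rebuilds the order of the stack defined earlier. The names `zero_padding` and `max_pooling` are placeholders chosen for this sketch, not the names Keras assigns.

```python
# (number of filters, number of conv layers) for the five blocks of the model above
blocks = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]

layer_names = []
for block_id, (filters, n_convs) in enumerate(blocks, start=1):
    for conv_id in range(1, n_convs + 1):
        layer_names.append('zero_padding')                        # placeholder name
        layer_names.append('conv{}_{}'.format(block_id, conv_id))
    layer_names.append('max_pooling')                             # placeholder name

for index in (1, 3, 9, 29):
    print(index, layer_names[index])
# 1 conv1_1
# 3 conv1_2
# 9 max_pooling
# 29 conv5_3
```

With the real model, the same check can be done by looking at `vgg_model.layers[index].name` or at the output of `vgg_model.summary()`.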

In [None]:
# visualize the final convolutional layer
layer_output = get_layer_output(vgg_model, img_t, 29)
vis_square(np.swapaxes(layer_output, 0, 3)[0:64], padsize=1,
           activation=True, diverging=True, normalize=False, cmap=cmap)
plt.colorbar(fraction=.045)
plt.show()
