# Week 10: Convolutional Neural Networks: an Introduction

## Setup

In [None]:
# Python ≥ 3.8 is required
import sys
assert sys.version_info >= (3, 8)

# Scikit-Learn ≥ 1.0 is required
import sklearn
# compare the major version as an integer; comparing version strings is unreliable
assert int(sklearn.__version__.split(".")[0]) >= 1

# Common imports
import numpy as np
import pandas as pd
import os

# To plot pretty figures
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

# Ignore useless warnings (see SciPy issue #5998)
import warnings
warnings.filterwarnings(action="ignore", message="^internal gelsd")

In [None]:
## Import tensorflow
import tensorflow as tf
from tensorflow import keras
print('Tensorflow version', tf.__version__)
print('Keras version', keras.__version__)

## 1. A quick overview of convolutions

To explain how convolutions work, we will load a couple of sample images:
* an image of a Chinese temple
* an image of a flower

In [None]:
from sklearn.datasets import load_sample_image

# Load sample images
china = load_sample_image("china.jpg") / 255
flower = load_sample_image("flower.jpg") / 255
images = np.array([china, flower])
batch_size, height, width, channels = images.shape

In [None]:
height

In [None]:
width

In [None]:
channels

In [None]:
batch_size

Now we can display our images and see what they look like:

In [None]:
_ = plt.imshow(flower)

In [None]:
_ = plt.imshow(china)

We will now create three 2D convolutional filters.

Each filter will be $7 \times 7$ in size and applied to all the channels in the image (our images are RGB, so we have three channels).

The filters are:
* a vertical bar filter
* a horizontal bar filter
* a filter consisting of a single positive central pixel surrounded by negative pixels. This, as we will see, behaves as an edge detector.
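To see why the third filter responds to edges, here is a minimal NumPy sketch. It assumes the same weights the notebook uses for that filter (48 at the centre, -1 elsewhere) and applies them to two hypothetical 7x7 patches, `flat_patch` and `edge_patch`, which are made up for illustration.

```python
import numpy as np

# Same weights as the notebook's third filter: -1 everywhere, 48 at the centre.
edge_filter = -np.ones((7, 7))
edge_filter[3, 3] = 48

# Hypothetical flat patch: every pixel has the same intensity.
flat_patch = np.full((7, 7), 0.5)

# Hypothetical patch with a vertical step edge: columns 0-2 are dark (0),
# columns 3-6 are bright (1), so the centre pixel sits on the bright side.
edge_patch = np.zeros((7, 7))
edge_patch[:, 3:] = 1.0

# The filter response at the patch centre is the sum of element-wise products.
flat_response = float(np.sum(edge_filter * flat_patch))
edge_response = float(np.sum(edge_filter * edge_patch))

print(edge_filter.sum())  # 0.0  (48 - 48: the weights cancel out)
print(flat_response)      # 0.0  (uniform regions produce no response)
print(edge_response)      # 21.0 (48 for the centre minus 27 other bright pixels)
```

Because the weights sum to zero, any region of constant intensity maps to zero, and only changes in intensity survive in the output.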

In [None]:
# Create three 7 x 7 x n_channels filters: a vertical line,
# a horizontal line, and an edge filter

# first we create a 4D array of shape
# 7 x 7 x number of channels x number of filters
filters = np.zeros(shape=(7, 7, channels, 3), dtype=np.float32)

# our first filter is a vertical line
filters[:, 3, :, 0] = 1  # vertical line

# our second filter is a horizontal line 
filters[3, :, :, 1] = 1  # horizontal line

# our third filter is -1 everywhere except in the centre,
# where it is 48, so that its 49 weights sum to zero.
# This will behave as an "edge filter"
edge_filter = -1.0 * np.ones((7, 7), dtype=np.float32)
edge_filter[3, 3] = 48
# copy the same 7x7 kernel across every input channel
edge_filter = np.repeat(edge_filter[:, :, np.newaxis], channels, axis=2)

filters[:, :, :, 2] = edge_filter  # edge filter


Now we can display our three convolutional filters and see what they look like:

In [None]:
_ = plt.imshow(filters[:, :, 0, 0], cmap='gray')

In [None]:
_ = plt.imshow(filters[:, :, 0, 1], cmap='gray')

In [None]:
_ = plt.imshow(filters[:, :, 0, 2], cmap='gray')

In [None]:
filters[:, :, 0, 2]

Now, let's perform the 2D convolution:

In [None]:
outputs = tf.nn.conv2d(
    images,
    filters,
    strides=1,
    padding="SAME"
)
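The call above can be unpacked with a naive NumPy sketch. It is a simplification under stated assumptions: a single-channel image, stride 1, an odd-sized kernel, and zero ("SAME") padding. The helper `conv2d_same` and the toy image are hypothetical names introduced here, not part of TensorFlow. Like `tf.nn.conv2d`, it computes a cross-correlation (the kernel is not flipped).

```python
import numpy as np

def conv2d_same(image, kernel):
    """Naive stride-1 cross-correlation with zero ("SAME") padding.

    image: 2D array (height, width); kernel: 2D array with odd side lengths.
    """
    kh, kw = kernel.shape
    pad_h, pad_w = kh // 2, kw // 2
    # Surround the image with zeros so the output keeps the input size.
    padded = np.pad(image, ((pad_h, pad_h), (pad_w, pad_w)))
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            window = padded[i:i + kh, j:j + kw]
            out[i, j] = np.sum(window * kernel)
    return out

# Toy 9x9 image containing a single bright vertical line in column 4.
toy = np.zeros((9, 9))
toy[:, 4] = 1.0

vertical = np.zeros((7, 7))
vertical[:, 3] = 1.0     # vertical bar filter
horizontal = np.zeros((7, 7))
horizontal[3, :] = 1.0   # horizontal bar filter

v_map = conv2d_same(toy, vertical)
h_map = conv2d_same(toy, horizontal)

print(v_map.shape)  # (9, 9): same spatial size as the input
print(v_map.max())  # 7.0: the vertical filter overlaps the whole line
print(h_map.max())  # 1.0: the horizontal filter crosses it at one pixel only
```

With several input channels, the same sum simply runs over the channel axis as well, which is why each filter in the notebook has shape 7 x 7 x n_channels.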

In [None]:
# A couple of functions to plot greyscale and colour images
def plot_image(image):
    plt.imshow(image, cmap="gray", interpolation="nearest")
    plt.axis("off")

def plot_color_image(image):
    plt.imshow(image, interpolation="nearest")
    plt.axis("off")

# crop images
def crop(images):
    return images[150:220, 130:250]

In [None]:
for image_index in (0, 1):
    for feature_map_index in (0, 1, 2):
        plt.subplot(2, 3, image_index * 3 + feature_map_index + 1)
        plot_image(outputs[image_index, :, :, feature_map_index])

plt.show()

### Filter 1: Vertical line detection

See this detail of the Chinese temple picture:

In [None]:
plot_image(crop(outputs[0, :, :, 0]))

### Filter 2: Horizontal line detection

See this detail of the Chinese temple picture:

In [None]:
plot_image(crop(outputs[0, :, :, 1]))

### Filter 3: Edge detection

This is closely analogous to the behaviour of some cells in the primary visual cortex (area V1) of the primate brain.

In [None]:
plot_image(outputs[0, :, :, 2])

In [None]:
plot_image(outputs[1, :, :, 2])

## 2. Convolutional and pooling layers in Keras

### 2.1 Convolutional Layer

In convolutional layers, the filters are not pre-defined or hard-coded: they are learned during training.

In [None]:
conv = keras.layers.Conv2D(
    filters=32,
    kernel_size=3,
    strides=1,
    padding="SAME",
    activation="relu"
)
conv
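Since the filters are learned, each one contributes trainable parameters. As a rough sketch, a standard 2D convolution has one weight per kernel position and input channel, plus one bias per filter. The helper `conv2d_param_count` is a hypothetical name, and the input channel counts (3 for RGB, 1 for greyscale) are assumptions, because the layer above has not been built on an input yet.

```python
def conv2d_param_count(kernel_size, in_channels, filters):
    """Weights plus one bias per filter for a standard 2D convolution."""
    weights_per_filter = kernel_size * kernel_size * in_channels
    return (weights_per_filter + 1) * filters

# The layer above (32 filters, 3x3 kernels) applied to an RGB input...
rgb_params = conv2d_param_count(kernel_size=3, in_channels=3, filters=32)
# ...and to a single-channel (greyscale) input.
grey_params = conv2d_param_count(kernel_size=3, in_channels=1, filters=32)

print(rgb_params)   # 896
print(grey_params)  # 320
```

Note that the count does not depend on the image height or width, because the same kernel is reused at every spatial position.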

### 2.2 Pooling Layer

In [None]:
max_pool = keras.layers.MaxPool2D(pool_size=2)
max_pool
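To illustrate what a 2x2 max-pooling step does, here is a small NumPy sketch on a hypothetical 4x4 feature map. It assumes non-overlapping 2x2 windows, which matches the default behaviour of `MaxPool2D(pool_size=2)`, where the stride equals the pool size.

```python
import numpy as np

# Hypothetical 4x4 feature map with values 0..15, row by row.
fmap = np.arange(16, dtype=float).reshape(4, 4)

# Split each spatial axis into (block index, position inside block),
# then take the maximum inside every 2x2 block.
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))

print(fmap.shape)    # (4, 4)
print(pooled.shape)  # (2, 2): each spatial dimension is halved
print(pooled)        # [[ 5.  7.]
                     #  [13. 15.]]
```

Pooling has no trainable parameters: it only keeps the strongest activation in each window.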

## 3. A Convolutional Network for the Fashion MNIST dataset (image classification)

In [None]:
### Load the data; create training, test and validation sets
(
    X_train_full, y_train_full
), (
    X_test, y_test
) = keras.datasets.fashion_mnist.load_data()
class_names = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"
]
X_train, X_valid = X_train_full[:-5000], X_train_full[-5000:]
y_train, y_valid = y_train_full[:-5000], y_train_full[-5000:]

# apply standard scaling to our data
X_mean = X_train.mean(axis=0, keepdims=True)
X_std = X_train.std(axis=0, keepdims=True) + 1e-7
X_train = (X_train - X_mean) / X_std
X_valid = (X_valid - X_mean) / X_std
X_test = (X_test - X_mean) / X_std

# add a trailing channel dimension to the dataset so that it becomes 4-dimensional
# 2D convolutional layers take 4D tensors as inputs with shape (n_samples, height, width, n_channels)
X_train = X_train[..., np.newaxis]
X_valid = X_valid[..., np.newaxis]
X_test = X_test[..., np.newaxis]

In [None]:
X_train.shape
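The scaling and reshaping steps can be tried in isolation on a small array. The sketch below uses a hypothetical stand-in for the dataset (six random 28x28 greyscale images) rather than the real Fashion MNIST data, so only the shapes and the arithmetic carry over.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for Fashion MNIST: 6 greyscale 28x28 images in 0..255.
fake_images = rng.integers(0, 256, size=(6, 28, 28)).astype(np.float64)

# Per-pixel mean and standard deviation, computed over the sample axis.
mean = fake_images.mean(axis=0, keepdims=True)       # shape (1, 28, 28)
std = fake_images.std(axis=0, keepdims=True) + 1e-7  # epsilon avoids division by zero
scaled = (fake_images - mean) / std

# Append a channel axis of size 1: each image becomes (height, width, channels).
scaled_4d = scaled[..., np.newaxis]

print(mean.shape)       # (1, 28, 28)
print(scaled_4d.shape)  # (6, 28, 28, 1)
```

The statistics come from the training set only and are then reused for the validation and test sets, as in the cell above.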

**Exercise 01:** Let us implement a simple CNN to tackle the Fashion MNIST dataset. We will implement a CNN with these requirements:
  * a first 2-D convolutional layer with 64 7x7 kernels, ReLU activation, and zero padding. Remember that the input shape must be the size of a single input image ($height \times width \times n_{channels}$)
  * a max-pooling layer with pool size of 2
  * a second 2-D convolutional layer with 128 3x3 kernels, ReLU activation, and zero padding.
  * a third 2-D convolutional layer with 128 3x3 kernels, ReLU activation, and zero padding.
  * a max-pooling layer with pool size of 2
  * Now we repeat the same structure again: two more 2-D convolutional layers with 256 3x3 kernels, ReLU activation, and zero padding, followed by one more max-pooling layer with pool size of 2
  * After that, stack a fully connected network, composed of two hidden dense layers and a dense output layer. You must flatten the inputs to the first dense layer, since a dense network expects a 1D array of features for each instance. Furthermore, add two dropout layers, one after each hidden dense layer, with a dropout rate of 50% each, to reduce overfitting. The first hidden dense layer will have 128 neurons, the second will have 64.

The number of filters grows as we move up the CNN towards the output layer: it is initially 64, then 128, then 256. It makes sense for it to grow: the number of low-level visual features is generally fairly low (e.g., edges, small circles...), but there are many ways to combine them into higher-level features. Doubling the number of filters after each pooling layer is common practice. A pooling layer divides each spatial dimension by a factor of 2, so we can afford to double the number of feature maps in the next layer without an explosion in the number of parameters, memory usage, or computational load.
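The trade-off between spatial size and number of feature maps can be checked with plain arithmetic. The sketch below traces the convolutional part of the architecture described above for one 28x28 greyscale image. It assumes stride 1 with zero padding (so convolutions keep the spatial size) and pooling that rounds odd sizes down, which is the default "valid" behaviour of Keras pooling layers. The variable names are made up for illustration.

```python
# (layer kind, number of filters) for the convolutional part of Exercise 01.
layers = [
    ("conv", 64), ("pool", None),
    ("conv", 128), ("conv", 128), ("pool", None),
    ("conv", 256), ("conv", 256), ("pool", None),
]

side, n_maps = 28, 1  # one 28x28 greyscale input image
trace = []
for kind, n_filters in layers:
    if kind == "conv":
        n_maps = n_filters  # zero padding, stride 1: spatial size unchanged
    else:
        side //= 2          # pool size 2: each spatial dimension is halved
    trace.append((kind, side, n_maps, side * side * n_maps))

for kind, s, m, activations in trace:
    print(f"{kind}: {s}x{s}x{m} -> {activations} activations")
# conv: 28x28x64 -> 50176 activations
# pool: 14x14x64 -> 12544 activations
# conv: 14x14x128 -> 25088 activations
# conv: 14x14x128 -> 25088 activations
# pool: 7x7x128 -> 6272 activations
# conv: 7x7x256 -> 12544 activations
# conv: 7x7x256 -> 12544 activations
# pool: 3x3x256 -> 2304 activations
```

Each pooling step divides the number of spatial positions by four while the filter count only doubles, so the activations per image shrink from one stage to the next even though the number of feature maps grows.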

In [None]:
# write your solution here
n_net = ...

In [None]:
# print out the summary of your model
n_net.summary()

**Exercise 02:** Now let's compile and train the model. Use the correct loss function, the Nadam optimizer, and "accuracy" as a metric. Ideally, train the network for at least 10 epochs.
If the model takes too long to train, consider increasing the stride or reducing the number of filters, although this will likely affect your accuracy. On my 2019 MacBook Pro each epoch takes about 6 minutes, so expect a full training session of 10 epochs to last 1 hour or more. Once you have successfully trained your model, you can save it for later use or deployment.

In [None]:
# compile your model here:


In [None]:
# train your model here and return the history object:


Finally, if we are happy with the training result, we can evaluate the performance on the test set.

In [None]:
# this cell should work as it is if the previous ones have been coded correctly
score = n_net.evaluate(X_test, y_test)

In [None]:
X_new = X_test[:10] # pretend we have new images
y_pred = n_net.predict(X_new)

In [None]:
y_pred

**Exercise 03:** How can you print out the predicted classes? Do it here:

In [None]:
# write your solution here
