# Pretrained networks and transfer learning (very optional)

This notebook is not part of the course, but you can study it if you wish. It is provided as is, with few comments and without a Japanese translation.

In [3]:
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import os

## Using pretrained neural networks

In this session, we will learn how to use __pretrained neural networks__. These networks were already trained on a large dataset and the weights of the trained network are available.

The `keras` library contains some pretrained models that are easy to access.
<br>
These models are in the `keras.applications` package.

Let us try the `MobileNetV2` network that was pre-trained on the ImageNet database:
- `MobileNetV2` is a specific architecture of CNN for image classification
- ImageNet is a large database of images (14 million annotated images)

Creating the model is as simple as:

In [4]:
# import the mobilenet_v2 functions
from tensorflow.keras.applications import mobilenet_v2

# Create a pre-trained model
model_mobilenet = mobilenet_v2.MobileNetV2()



We can have a look at the model layout:

In [None]:
model_mobilenet.summary()

We can see that:
- There are many layers.
- There are `3,504,872` trainable parameters.
- The shape of the input placeholder indicates that the images must be of size `224x224` and have `3` channels; that is, `mobilenet_v2` works on color images (RGB channels).
- The output shape of the last layer (called `Logits`) indicates that `mobilenet_v2` can recognize `1000` categories of objects. The network was trained using the data of the ImageNet Large Scale Visual Recognition Challenge that uses a subset of ImageNet with `1000` categories. 

Let us try `mobilenet_v2` on a few examples:

|<img src='./img/ballpen.png' width='200' align="left">|<img src='./img/backpack.png' width='200'>|<img src='./img/cup.png' width='200'>|<img src='./img/keyboard.png' width='200'>|
|---|---|---|---|
|./img/ballpen.png|./img/backpack.png|./img/cup.png|./img/keyboard.png|

The `keras.preprocessing` package provides tools for loading and formatting images:

In [None]:
from tensorflow.keras.preprocessing import image

In [None]:
!ls img

It is possible to load an image, and if necessary resize it.

In [None]:
img_path = './img/ballpen.png'
img = image.load_img(img_path, target_size=(224, 224))

Let us plot the image:

In [None]:
plt.imshow(img);

The `image` tool loads images as a `PIL.Image.Image` object.
<br>
We need to transform it to a `numpy` array for the network.
<br>
The `image` module provides a function `img_to_array` that does this transform:

In [None]:
print(type(img))
x = image.img_to_array(img)
print(type(x))

The `numpy` array `x` has a shape of:

In [None]:
print(x.shape)

The `mobilenet_v2` network expects an array of shape `(None, 224, 224, 3)`.
<br>
This means that it is a block of at least one image of size `224x224` with `3` channels.
<br>
Let us create a block of one image from our image:

In [None]:
X = np.expand_dims(x, axis=0) 
# There are other ways to do this conversion:
#X = x[np.newaxis, :, :, :]
#X = x.reshape((1,224,224,3))
print(X.shape)

`keras` provides a function to preprocess input images for `mobilenet_v2`.
<br>
For `mobilenet_v2`, the preprocessing scales the pixel values from the range `[0, 255]` to the range `[-1, 1]`:

In [None]:
X = mobilenet_v2.preprocess_input(X)
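
As a rough illustration of what this preprocessing step does, the sketch below reproduces the scaling with plain `numpy`, assuming the `[0, 255]` to `[-1, 1]` convention used for MobileNetV2 inputs. The helper name `scale_to_unit_range` is hypothetical and is not part of `keras`; in practice, keep using `mobilenet_v2.preprocess_input`.

```python
import numpy as np

def scale_to_unit_range(batch):
    """Map pixel values from [0, 255] to [-1, 1] (hypothetical helper)."""
    return batch.astype("float32") / 127.5 - 1.0

# A dummy batch of one 224x224 RGB image containing both extreme pixel values
dummy = np.zeros((1, 224, 224, 3), dtype="float32")
dummy[0, 0, 0, :] = 255.0

scaled = scale_to_unit_range(dummy)
print(scaled.shape, scaled.min(), scaled.max())
```

Other networks in `keras.applications` may use a different convention, so each module ships its own `preprocess_input`.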

Now that the image is preprocessed, we can use the `mobilenet_v2` network to get the prediction:

In [None]:
predictions = model_mobilenet.predict(X)
print(predictions.shape)

As expected, the prediction is an array of shape `(1, 1000)`, since we used an input of shape `(1, 224, 224, 3)`.

`keras` also provides a function to display the predictions.
The parameter `top` limits the number of predictions to consider; here, we only access the top three:

In [None]:
decoded_predictions = mobilenet_v2.decode_predictions(predictions, top=3)

The output of `decode_predictions` is a list containing a list of elements for each of the input images.
<br>
Each element has 3 fields:
- An id from the ImageNet database for the predicted object
- A human readable name for the object
- The value of the output neuron corresponding to that object

Here we gave only one input image, so `decoded_predictions[0]` contains the list of elements for that image:

In [None]:
for obj in decoded_predictions[0]:
    print(obj)
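
To make the idea of a "top" selection concrete, here is a minimal sketch of how the most probable classes can be picked from an output vector with `numpy`. The function `top_k`, the five class names, and the probabilities are all hypothetical; the real `decode_predictions` also returns the ImageNet id, which this sketch omits.

```python
import numpy as np

def top_k(probabilities, class_names, k=3):
    """Return the k most probable (name, score) pairs, best first."""
    best = np.argsort(probabilities)[::-1][:k]
    return [(class_names[i], float(probabilities[i])) for i in best]

# Hypothetical 5-class output vector and labels, for illustration only
class_names = ["ballpoint", "backpack", "cup", "keyboard", "mouse"]
probabilities = np.array([0.70, 0.05, 0.15, 0.02, 0.08])

top3 = top_k(probabilities, class_names, k=3)
for name, score in top3:
    print(name, score)
```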

## Try it

Check the predictions of `mobilenet_v2` for the other 3 example images.

## Try it

Process the 4 example images together in a single call to `model.predict`

The package `keras.applications` contains many different models.
<br>
For example, the following models have the same interface as `mobilenet_v2`:
- the `VGG19` model from the module `vgg19`
- the `ResNet50` model from the module `resnet50`


## Transfer learning

In this section, we will learn to re-use a part of a trained network for solving another classification problem.

#### Trying pretrained models on our data

Most of the time, the pre-trained deep networks like "VGG19", "ResNet50" and "MobileNetV2" are trained on a dataset (here ImageNet subset) that does not correspond to the classification task we want to do on our own data.
<br>
For example, if we want to create an image classifier that distinguishes images of hard discs from images of RAM modules, the pre-trained classifier may not be the best choice.

In the `./data/` folder, there are two subfolders `hd/` and `ram/` each containing various images of hard discs and ram modules.
<br>
Let us see what kind of results `mobilenet_v2` gives.

To access the files easily, we use the `glob` package:

In [None]:
import glob
image_list = glob.glob('./data/hd/hd_*.jpg')

Then we can apply `mobilenet_v2` on all images:

In [None]:
pred_list = []
for img_path in image_list:
    try:
        X = image.img_to_array(image.load_img(img_path, target_size=(224, 224))).reshape((1,224,224,3))
        X = mobilenet_v2.preprocess_input(X)
        predictions = model_mobilenet.predict(X)
        decoded_predictions = mobilenet_v2.decode_predictions(predictions, top=1)
        pred_list.append(decoded_predictions[0][0][1])#Just keep the readable name of the class
    except OSError as e:
        print(str(e))

# Show one example
plt.imshow(image.load_img(img_path, target_size=(224, 224)))
plt.axis("off")

# Keep one example for later use
X_hd = X

In [None]:
objects_set = set(pred_list)
labels = []
counts = []
for obj in objects_set:
    labels += [obj]
    counts += [pred_list.count(obj)]

plt.barh(2*np.arange(len(counts)),counts,1.5)
plt.yticks(2*np.arange(len(labels)), labels, rotation='horizontal')
plt.ylim(-1, 2*(len(counts)-0.5))
plt.xlabel("counts");
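
The counting loop above can also be written with `collections.Counter` from the standard library. The sketch below uses a short, made-up list of labels in place of `pred_list`; the variable names prefixed with `example_` are hypothetical.

```python
from collections import Counter

# Hypothetical predicted labels, standing in for `pred_list`
example_pred_list = ["hard_disc", "hard_disc", "radio", "hard_disc", "modem"]

example_counter = Counter(example_pred_list)
example_labels = list(example_counter.keys())
example_counts = list(example_counter.values())

# Fraction of the images that were labelled 'hard_disc'
example_fraction = example_counter["hard_disc"] / len(example_pred_list)

print(example_counter.most_common())
print("fraction labelled hard_disc:", example_fraction)
```

The lists `example_labels` and `example_counts` can be passed to `plt.barh` and `plt.yticks` in the same way as `labels` and `counts`.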

`mobilenet_v2` is able to recognize the hard disc images.

In [None]:
print('mobilenet_v2 accuracy:', pred_list.count('hard_disc') / len(pred_list))

In [None]:
import glob
image_list = glob.glob('./data/ram/ram_*.jpg')

pred_list = []
for img_path in image_list:
    try:
        X = image.img_to_array(image.load_img(img_path, target_size=(224, 224))).reshape((1,224,224,3))
        X = mobilenet_v2.preprocess_input(X)
        predictions = model_mobilenet.predict(X)
        decoded_predictions = mobilenet_v2.decode_predictions(predictions, top=1)
        pred_list.append(decoded_predictions[0][0][1])
    except OSError as e:
        print(str(e))

# Show one example
plt.imshow(image.load_img(img_path, target_size=(224, 224)))
plt.axis("off")

# Keep one example for later use
X_ram = X

In [None]:
objects_set = set(pred_list)
labels = []
counts = []
for obj in objects_set:
    labels += [obj]
    counts += [pred_list.count(obj)]

plt.barh(2*np.arange(len(counts)),counts,1.5)
plt.yticks(2*np.arange(len(labels)), labels, rotation='horizontal')
plt.ylim(-1, 2*(len(counts)-0.5))
plt.xlabel("counts");

`mobilenet_v2` does not know about RAM! (The reason is that the `1000` categories used for training do not include RAM modules.)

#### Truncated pretrained network

The models in the `keras.applications` package have an `include_top` parameter.
If set to `True`, the model includes the classification part; otherwise, only the feature part is loaded.
<br>
Let us load a `mobilenet_v2` without the classification part:

In [None]:
model_mobilenet_no_top = mobilenet_v2.MobileNetV2(include_top=False)
model_mobilenet_no_top.summary()

Compared to the full `mobilenet_v2`, the last few layers are missing.

In addition to the block size that was already `None` in the full `mobilenet_v2`, the image width and height are also set to `None` in the truncated `mobilenet_v2`.
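
Since the width and height are left open, the spatial size of the output depends on the size of the input image. The following is a minimal sketch of that relation, assuming an overall downsampling factor of `32` for MobileNetV2 and input sizes that are multiples of it. The helper `feature_map_size` is hypothetical and only does the arithmetic; it does not call the network.

```python
def feature_map_size(height, width, stride=32):
    """Spatial size of the truncated network output (hypothetical helper).

    Assumes the input size is a multiple of the overall stride.
    """
    if height % stride or width % stride:
        raise ValueError("this sketch only handles multiples of the stride")
    return height // stride, width // stride

size_224 = feature_map_size(224, 224)
size_160 = feature_map_size(160, 160)
print(size_224, size_160)
```

The actual shape can always be checked by printing the shape of the array returned by `predict`.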

When applied to an image of size `224x224` with `3` channels, the `predict` function outputs a feature map of size `7x7` with `1280` channels:

In [None]:
features_hd = model_mobilenet_no_top.predict(X_hd)
print(features_hd.shape)

To visualize this feature map, let us tile its channels into one large image.
<br>
We create a large image of size `(7x32)x(7x33)` by having `32` rows and `33` columns of small `7x7` images, which shows the first `1056` channels.

In [None]:
I_hd = np.zeros((7*32, 7 *33))
for i in range(32):
    for j in range(33):
        I_hd[i*7 : (i+1)*7, j*7 : (j+1)*7] = features_hd[0,:,:,i*33+j].reshape((7,7))
plt.imshow(I_hd)
plt.axis("off");
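
The double loop above can also be expressed with `reshape` and `transpose`. The sketch below compares both versions on a random stand-in for `features_hd`; the stand-in array and its `1280` channels are assumptions made for illustration, and only the first `32 * 33` channels are used.

```python
import numpy as np

rows, cols, size = 32, 33, 7

# Random stand-in for `features_hd`: a batch of one 7x7 map with enough channels
rng = np.random.default_rng(0)
features = rng.random((1, size, size, 1280))

# Loop version, following the cell above
tiled_loop = np.zeros((size * rows, size * cols))
for i in range(rows):
    for j in range(cols):
        tiled_loop[i*size:(i+1)*size, j*size:(j+1)*size] = features[0, :, :, i*cols + j]

# Vectorized version: (7, 7, C) -> (rows, cols, 7, 7) -> (rows*7, cols*7)
selected = features[0, :, :, :rows * cols]
grid = selected.transpose(2, 0, 1).reshape(rows, cols, size, size)
tiled_vec = grid.transpose(0, 2, 1, 3).reshape(rows * size, cols * size)

print(tiled_vec.shape, np.array_equal(tiled_loop, tiled_vec))
```

The loop is easier to read; the vectorized form is mostly useful when many images have to be tiled.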

This image shows the representation of the input hard disc image obtained from the convolutional part of the trained `mobilenet_v2`.

We can do the same thing for the ram image:

In [None]:
features_ram = model_mobilenet_no_top.predict(X_ram)

I_ram = np.zeros((7*32, 7 *33))
for i in range(32):
    for j in range(33):
        I_ram[i*7 : (i+1)*7, j*7 : (j+1)*7] = features_ram[0,:,:,i*33+j].reshape((7,7))
plt.imshow(I_ram)
plt.axis("off");


This second image is the representation of the input RAM image obtained from the convolutional part of the trained `mobilenet_v2`.

The idea of transfer learning is to train a classifier not on the original images but on the representations (__features__) obtained from the truncated network.

#### Create an intermediary feature dataset

The first approach is to transform our image dataset into a feature dataset by applying the truncated network to all the images.

We preprocess all the hard disc and RAM images using the truncated MobileNetV2. We obtain a dataset of features.

In [None]:
image_list = glob.glob('./data/hd/hd_*.jpg')
w = 224
img_list_hd = []
X_list_hd = []
for img_path in image_list:
    try:
        img = image.load_img(img_path, target_size=(w, w))

        X = image.img_to_array(img).reshape((1,w,w,3))
        img_list_hd.append(X.copy()/255.0)

        X = mobilenet_v2.preprocess_input(X)

        F = model_mobilenet_no_top.predict(X)
        X_list_hd.append(F)
    except OSError:
        pass

imgs_hd = np.concatenate(img_list_hd, axis=0)
X_hd = np.concatenate(X_list_hd, axis=0)
y_hd = np.zeros(X_hd.shape[0])

In [None]:
image_list = glob.glob('./data/ram/ram_*.jpg')

img_list_ram = []
X_list_ram = []
for img_path in image_list:
    try:
        img = image.load_img(img_path, target_size=(w, w))

        X = image.img_to_array(img).reshape((1,w,w,3))
        img_list_ram.append(X.copy()/255.0)

        X = mobilenet_v2.preprocess_input(X)

        F = model_mobilenet_no_top.predict(X)
        X_list_ram.append(F)
    except OSError:
        pass
imgs_ram = np.concatenate(img_list_ram, axis=0)
X_ram = np.concatenate(X_list_ram, axis=0)
y_ram = np.ones(X_ram.shape[0])

In [None]:
imgs = np.concatenate((imgs_hd, imgs_ram), axis=0)
X = np.concatenate((X_hd, X_ram), axis=0)
y = np.concatenate((y_hd, y_ram), axis=0)

We will split the features dataset into training and testing parts.
<br>
(We do it manually so that we can keep track of the corresponding original images.)

In [None]:
# Create a permutation of the indices
number_samples = X.shape[0]
shuffle_index = np.random.permutation(number_samples)

# use the permuted list as indices
imgs = imgs[shuffle_index, :,:,:]
X = X[shuffle_index,:,:,:]
y = y[shuffle_index]

# Split the data in training and testing
testing_training_ratio = 0.5
test_samples = int(testing_training_ratio * number_samples)

# from 0 to test_samples-1
imgs_test = imgs[:test_samples]
X_test = X[:test_samples]
y_test = y[:test_samples]

# From test_samples to end
imgs_train = imgs[test_samples:]
X_train = X[test_samples:]
y_train = y[test_samples:]

print("Training set size:", X_train.shape[0])
print("Testing set size:", X_test.shape[0])

#### Training a network on the features

We now train a small neural network to recognize hard discs and ram based on the features.

We will use a 3-layer fully connected neural network.

In [None]:
from tensorflow.keras.layers import Input, Flatten, Dense, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adagrad
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.utils import to_categorical

input_shape = X[0].shape
feature_input = Input(shape=input_shape, name="feature_input")

In [None]:
fl = Flatten()(feature_input)

In [None]:
fc1 = Dense(64, activation='relu', name='fc1')(fl)
dp1 = Dropout(0.1)(fc1)
fc2 = Dense(64, activation='relu', name='fc2')(dp1)
dp2 = Dropout(0.1)(fc2)

In [None]:
fc3 = Dense(2, activation='softmax', name='fc3')(dp2)

In [None]:
model_feature = Model(feature_input, fc3, name='hd_or_ram')

In [None]:
model_feature.summary()
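
To help read the summary, the parameter counts of the three `Dense` layers can be worked out by hand. The sketch below does this arithmetic, assuming a feature shape of `(7, 7, 1280)` for `224x224` inputs; `dense_params` is a hypothetical helper, and the actual shape should be checked with `X[0].shape`.

```python
def dense_params(n_inputs, n_units):
    """Number of weights plus biases in a fully connected layer."""
    return n_inputs * n_units + n_units

# Assumed feature shape for a 224x224 input; check `X[0].shape` in practice
feature_shape = (7, 7, 1280)
flat = feature_shape[0] * feature_shape[1] * feature_shape[2]

params = {
    "fc1": dense_params(flat, 64),
    "fc2": dense_params(64, 64),
    "fc3": dense_params(64, 2),
}
total = sum(params.values())
print(params, total)
```

Almost all parameters sit in `fc1`, because it is connected to every value of the flattened feature map. The `Flatten` and `Dropout` layers have no parameters.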

Let's train.
<br>
(Note: we use the _Adagrad_ optimizer, as it worked better than Adam for this problem.)

In [None]:
y_train_one_hot = to_categorical(y_train)

model_feature.compile(loss='binary_crossentropy', optimizer=Adagrad(learning_rate=0.01), metrics=['acc'])

# Save the full model each time the validation loss improves
model_checkpoint_cb = ModelCheckpoint("model_feature_weights.hdf5", monitor='val_loss', verbose=1, save_best_only=True, save_weights_only=False, mode='auto')
early_stopping_cb = EarlyStopping(monitor='val_loss', min_delta=0.1, patience=10, verbose=0, mode='auto')
H = model_feature.fit(X_train, y_train_one_hot, batch_size=16, epochs=50, validation_split=0.25 , shuffle=True, callbacks=[model_checkpoint_cb, early_stopping_cb])
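
The interplay of `min_delta` and `patience` in the early stopping callback can be hard to picture. The following is a simplified re-implementation of the stopping rule in plain Python, not the actual `keras` code: a validation loss counts as an improvement only if it beats the best value so far by more than `min_delta`. The function name and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, min_delta=0.1, patience=10):
    """Return the 1-based epoch at which training would stop, or None.

    Simplified logic: a loss only counts as an improvement if it is lower
    than the best value so far by more than `min_delta`.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses: quick improvement, then a plateau
losses = [0.9, 0.6, 0.4] + [0.39] * 12
stop_epoch = early_stopping_epoch(losses)
print(stop_epoch)
```

With `min_delta=0.1`, the small gains on the plateau do not count as improvements, so the patience counter keeps running from the last large drop.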

In [None]:
plt.plot(H.history['loss'], label="loss")
plt.plot(H.history['val_loss'], label="val_loss")
plt.xlabel("epochs")
plt.ylabel("loss")
plt.title("loss vs epochs")
plt.legend();

In [None]:
plt.plot(H.history['acc'], label="acc")
plt.plot(H.history['val_acc'], label="val_acc")
plt.xlabel("epochs")
plt.ylabel("acc")
plt.title("Accuracy vs epochs")
plt.legend();

#### Performance

Let's reload the best model and test the performance.

In [None]:
model_feature.load_weights("model_feature_weights.hdf5")

In [None]:
y_test_pred_one_hot = model_feature.predict(X_test)

In [None]:
y_test_pred = np.argmax(y_test_pred_one_hot, axis=1)
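
The `argmax` step is the counterpart of the one-hot encoding applied to the training labels. The sketch below shows both directions with `numpy` on made-up values; the labels and the softmax outputs are hypothetical, and `np.eye` is used here as a stand-in for `to_categorical`.

```python
import numpy as np

# Hypothetical labels: 0 for hard disc, 1 for RAM
y_example = np.array([0.0, 1.0, 1.0, 0.0])

# One-hot encoding: row i of the identity matrix encodes class i
y_one_hot = np.eye(2)[y_example.astype(int)]

# Hypothetical softmax outputs of a two-class model
y_prob = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.7, 0.3]])

# The predicted class is the index of the largest output
y_pred = np.argmax(y_prob, axis=1)

print(y_one_hot)
print(y_pred)
```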

In [None]:
from sklearn.metrics import confusion_matrix
CM = confusion_matrix(y_test, y_test_pred)
print(CM)

In [None]:
A = np.sum(np.diag(CM)) / np.sum(CM)
print("Accuracy = {:.02f}".format(A))

In [None]:
P = np.diag(CM) / np.sum(CM, axis = 0)
R = np.diag(CM) / np.sum(CM, axis = 1)
for i in range(2):
    print("Class '{}' : P = {:.02f} R = {:.02f}".format(i, P[i], R[i]))
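
As a worked example of these formulas, the sketch below applies them to a small, made-up confusion matrix. It assumes the `sklearn` convention, in which rows are true classes and columns are predicted classes; the numbers are hypothetical and are not results from this notebook.

```python
import numpy as np

# Hypothetical confusion matrix: rows are true classes, columns are predictions
CM_example = np.array([[8, 2],
                       [1, 9]])

accuracy = np.trace(CM_example) / CM_example.sum()
precision = np.diag(CM_example) / CM_example.sum(axis=0)  # per predicted class
recall = np.diag(CM_example) / CM_example.sum(axis=1)     # per true class

print("accuracy:", accuracy)
print("precision:", precision)
print("recall:", recall)
```

Here class `0` has precision `8/9` (one of the nine images predicted as class `0` is wrong) and recall `8/10` (two of the ten true class `0` images are missed).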

Finally, we plot a few examples of correct classifications and all the incorrect classifications.

In [None]:
correct_indices = np.where(y_test_pred == y_test)[0]
for i in correct_indices[:10]:
    plt.figure()
    plt.imshow(imgs_test[i, :, :, :])
    plt.axis('off')
    if y_test[i] == 0:
        title = "True HD Pred HD"
    else:
        title = "True RAM Pred RAM"
    plt.title(title)
    

In [None]:
error_indices = np.where(y_test_pred != y_test)[0]
for i in error_indices:
    plt.figure()
    plt.imshow(imgs_test[i, :, :, :])
    plt.axis('off')
    if y_test[i] == 0:
        title = "True HD Pred RAM"
    else:
        title = "True RAM Pred HD"
    plt.title(title)
    

## Try it
Use transfer learning to create a classifier for two (or more) classes that are not in the `1000` default classes.