## Purpose

The goal of this notebook is to load a model pretrained on the MNIST dataset (loaded below from MNIST_pretrained_model.h5) and apply transfer learning so that it classifies the USPS dataset. We are going to follow two approaches:
* Freeze the convolutional part and train only the dense layers
* Fine-tune the whole model, leaving every layer trainable

In [1]:
from keras.utils import to_categorical
from keras import layers
from keras import models
from keras.models import load_model
import h5py
import cv2
import numpy as np

import warnings
warnings.filterwarnings('ignore')

Using TensorFlow backend.


## Data

Let us now load the USPS dataset.

In [2]:
with h5py.File('usps_dataset.h5', 'r') as hf:
    # Each split is a group holding 'data' (flattened images) and 'target' (labels)
    train = hf.get('train')
    train_images = train.get('data')[:]
    train_labels = train.get('target')[:]
    test = hf.get('test')
    test_images = test.get('data')[:]
    test_labels = test.get('target')[:]

In [3]:
print(train_images.shape)
print(train_labels.shape)
print(test_images.shape)
print(test_labels.shape)

(7291, 256)
(7291,)
(2007, 256)
(2007,)


Therefore, our training set consists of 7291 images of size 16x16, each stored as a flat vector of 256 values, and our test set consists of 2007 images of the same size. Let us now look at the labels of the first five training images to see how they are expressed.

In [4]:
for i in range(5):
    print(train_labels[i])

6
5
4
7
3


We see that the label gives us the number corresponding to the image. Recall that our images show digits from 0 to 9, so the labels go from 0 to 9. Therefore, we will need to convert them to one-hot encoding when preprocessing the data.
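
As a minimal sketch of what one-hot encoding does to labels like the ones printed above, the NumPy snippet below builds the same kind of matrix that to_categorical is used for later in this notebook. The helper name one_hot and the fixed num_classes=10 are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a (n_samples, num_classes) float32 matrix with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    # Row k of the identity matrix is the one-hot vector for class k.
    return np.eye(num_classes, dtype="float32")[labels]

# The first five training labels shown above.
encoded = one_hot([6, 5, 4, 7, 3])
print(encoded.shape)
print(encoded[0])
```

Each row contains a single 1 at the index of its digit, which is the target format expected by the categorical_crossentropy loss used below.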

## Preprocessing

We begin by resizing the images from 16x16 to 28x28, which is the input size expected by our model pretrained on the MNIST dataset.

In [5]:
train_images_new = []
test_images_new = []

# Upsample every flat 256-vector from 16x16 to 28x28 with bicubic interpolation
for flat in train_images:
    img = flat.reshape(16, 16)
    img = cv2.resize(img, dsize=(28, 28), interpolation=cv2.INTER_CUBIC)
    train_images_new.append(img.flatten())

for flat in test_images:
    img = flat.reshape(16, 16)
    img = cv2.resize(img, dsize=(28, 28), interpolation=cv2.INTER_CUBIC)
    test_images_new.append(img.flatten())

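Let us now reshape the images to (28, 28, 1), cast them to float32 and rescale their pixel values, and convert the labels to one-hot vectors. Note that the cell below takes the minimum and maximum from the first row of each image (train_images_new[i][0]) rather than from the whole image, so only that row is guaranteed to end up in the range [0, 1].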

In [6]:
#convert list to numpy arrays
train_images_new = np.asarray(train_images_new)
test_images_new = np.asarray(test_images_new)

#train images
train_images_new = train_images_new.reshape((7291, 28, 28, 1))
train_images_new = train_images_new.astype('float32')
for i in range(7291):
    min_aux = min(train_images_new[i][0])
    max_aux = max(train_images_new[i][0]-min_aux)
    train_images_new[i] = (train_images_new[i]-min_aux)/max_aux

#test images
test_images_new = test_images_new.reshape((2007, 28, 28, 1))
test_images_new = test_images_new.astype('float32')
for i in range(2007):
    min_aux = min(test_images_new[i][0])
    max_aux = max(test_images_new[i][0]-min_aux)
    test_images_new[i] = (test_images_new[i]-min_aux)/max_aux

#labels
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

In [7]:
print(max(train_images_new[0][0]))
print(min(train_images_new[0][0]))

[1.]
[0.]
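
A per-image min-max rescaling that uses every pixel of the image, rather than only the row indexed by [i][0] in the cell above, can be sketched as follows. This is an illustrative alternative and not the code that produced the results in this notebook; the function name minmax_per_image, the eps guard and the random stand-in data are assumptions.

```python
import numpy as np

def minmax_per_image(images, eps=1e-8):
    """Rescale each image in a (n, h, w, c) batch to [0, 1] using its own min and max."""
    images = images.astype("float32")
    # Reduce over height, width and channel; keepdims lets the result broadcast.
    mins = images.min(axis=(1, 2, 3), keepdims=True)
    maxs = images.max(axis=(1, 2, 3), keepdims=True)
    # eps avoids a division by zero for constant images (hypothetical guard).
    return (images - mins) / np.maximum(maxs - mins, eps)

# Random stand-in data with the same layout as train_images_new.
rng = np.random.default_rng(0)
batch = rng.uniform(-1.0, 1.0, size=(4, 28, 28, 1))
scaled = minmax_per_image(batch)
print(scaled.min(), scaled.max())
```

Because the reduction is vectorized over the whole batch, no Python loop over images or hard-coded dataset size is needed.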


## Model: freezing convolutional part

We will now load the pretrained model.

In [8]:
model = load_model('MNIST_pretrained_model.h5')
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 3, 3, 64)          36928     
_________________________________________________________________
flatten_1 (Flatten)          (None, 576)               0         
_________________________________________
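
The Param # column can be checked by hand. The sketch below assumes 3x3 kernels with biases, which is consistent with each convolution shrinking the feature map by 2 (28 to 26, 13 to 11, 5 to 3) but is not stated in the summary itself; the helper name conv2d_params is illustrative.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Number of weights in a Conv2D layer: one kernel plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

# Assumed 3x3 kernels; channel counts are read from the summary above.
conv_1 = conv2d_params(3, 3, 1, 32)
conv_2 = conv2d_params(3, 3, 32, 64)
conv_3 = conv2d_params(3, 3, 64, 64)
print(conv_1, conv_2, conv_3)  # 320 18496 36928

# The Flatten layer only reshapes the last feature map: 3 * 3 * 64 values.
flatten_size = 3 * 3 * 64
print(flatten_size)  # 576
```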

In [9]:
# Freeze the layers except the last 2 layers
for layer in model.layers[:-2]:
    layer.trainable = False
    
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 3, 3, 64)          36928     
_________________________________________________________________
flatten_1 (Flatten)          (None, 576)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 64)                36928     
__________

We will now train the model on the USPS dataset. The convolutional part stays frozen, whereas the two dense layers are trained.

In [11]:
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(train_images_new, train_labels, epochs=5, batch_size=64)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


Let us now evaluate the model on the test dataset.

In [12]:
test_loss, test_acc = model.evaluate(test_images_new, test_labels)



In [13]:
print('test accuracy: ', test_acc)
print('test loss: ', test_loss)

test accuracy:  0.9561534628799203
test loss:  0.23419712542630008


The test accuracy is high (about 95.6%) and the test loss is low (about 0.23), so the model with a frozen convolutional part generalizes quite well to unseen USPS data.
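
To make the two reported numbers concrete, the sketch below computes categorical cross-entropy and accuracy with NumPy for a toy batch. The arrays y_true and y_prob are made-up stand-ins for one-hot test labels and softmax outputs, and the helper names are illustrative; none of them come from the notebook's model.

```python
import numpy as np

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-probability assigned to the true class."""
    y_prob = np.clip(y_prob, eps, 1.0)
    return float(-np.mean(np.sum(y_true * np.log(y_prob), axis=1)))

def accuracy(y_true, y_prob):
    """Fraction of samples whose most probable class matches the label."""
    return float(np.mean(y_true.argmax(axis=1) == y_prob.argmax(axis=1)))

# Three toy samples over three classes; the last one is misclassified.
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype="float32")
y_prob = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.5, 0.3, 0.2]], dtype="float32")

toy_acc = accuracy(y_true, y_prob)
toy_loss = categorical_crossentropy(y_true, y_prob)
print(toy_acc, toy_loss)
```

The loss penalizes the confidence of wrong predictions, while accuracy only counts whether the top class is correct, which is why the two metrics are reported together.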

## Model: fine-tuning

We load the pretrained model again, this time without freezing any layer, so that all weights are updated during training.

In [14]:
model = load_model('MNIST_pretrained_model.h5')
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 3, 3, 64)          36928     
_________________________________________________________________
flatten_1 (Flatten)          (None, 576)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 64)                36928     
__________

In [15]:
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(train_images_new, train_labels, epochs=5, batch_size=64)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


Notice the poor performance of this approach, even on the training set. Nevertheless, we compute the test loss and accuracy.

In [16]:
test_loss, test_acc = model.evaluate(test_images_new, test_labels)



In [17]:
print('test accuracy: ', test_acc)
print('test loss: ', test_loss)

test accuracy:  0.17887394112410923
test loss:  2.265857430673323


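Fine-tuning the whole model in this way does not yield a network that classifies the USPS dataset well: the test accuracy is about 17.9%, compared with about 95.6% when the convolutional part is frozen.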