[View in Colaboratory](https://colab.research.google.com/github/CameronBeebe/transfer_keras/blob/TL-FIX/Copy_of_TL_keras_lab_FIX.ipynb)

In [0]:
import numpy as np
import keras
import tensorflow as tf
from keras.models import Sequential, load_model, Model, Input
from keras.layers import Dense, Dropout, Activation, Flatten, Conv2D, MaxPool2D, InputLayer, ZeroPadding2D, GlobalAvgPool2D, Reshape, Softmax
from keras.datasets import cifar10
from keras.applications.mobilenet import MobileNet
from keras.utils import np_utils
import matplotlib.pyplot as plt
from skimage.transform import resize
from sklearn.metrics import mean_squared_error, mean_squared_log_error

%matplotlib inline

In [0]:
#  This tutorial attempts to achieve transfer learning on incompatible image sizes with MobileNet.
#  Why do this, when we could just load a different (compatible) data set, or use a different model?
#  Because!  Well, this is a tutorial, and it illustrates some practical issues implementing transfer learning.
#  We will resize a small batch of images to show how pre-trained weights affect fitting and prediction.
#  The unbiased and biased MobileNet objects will have the same layer architecture except for the weights.
#  For such a small dataset, the results using a pre-trained network are much better than when starting with a fresh network.
#  However, both models in the current set-up have performed terribly on prediction and validation.
#  With such small data sets, 


#  This notebook is intended to be used with a GPU.  
#  Otherwise, it just takes too long to train and experiment.
#  I recommend doing it on Google Colab:  https://colab.research.google.com/

#  Table of Contents:
#  
#  1.  Data Processing
#  2.  Create MobileNet Conv. Net. Models
#  3.  Prediction and Scoring
#


# 1.  Data Processing

## Load Data: (32,32,3) Images

In [0]:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

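#  Check shapes: x is (n, 32, 32, 3) and y is (n, 1), with n = 50000 for train and 10000 for test.
print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)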

In [0]:
#  Convince yourself that the labels are consistent with the data.  
#  See https://www.cs.toronto.edu/~kriz/cifar.html
print(y_train[4999])   
plt.imshow(x_train[4999])
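
The integer printed above indexes into the ten CIFAR-10 classes listed on the linked page. A minimal sketch of a lookup, assuming the standard class order from that page; `CIFAR10_CLASSES` and `label_name` are illustrative names, not part of Keras:

```python
# CIFAR-10 class names in label order (0-9), as listed on the CIFAR-10 page.
CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]

def label_name(label):
    """Map a CIFAR-10 label to its class name.

    Accepts a plain int or a length-1 sequence such as y_train[i],
    which Keras returns with shape (1,).
    """
    try:
        index = int(label[0])   # length-1 array or list
    except (TypeError, IndexError):
        index = int(label)      # plain integer or NumPy scalar
    return CIFAR10_CLASSES[index]

print(label_name([6]))
```

In the notebook this could be called as `label_name(y_train[4999])` to check the printed label against the displayed image.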

## Resize Train Data

In [0]:
%%time

#  Pretrained weights only exist for certain shapes, which is why we get an error with smaller image sizes.
#  One option to deal with the fact that MobileNet does not like (32,32,3) shape is to resize the images.
#  NOTE:  Resizing the entire training data set would take, with my precise calculations, a very long time.
#  I will only be resizing and training on 5,000.  If you want to save afterwards, it will create about a 2 gig file.  
#  I will not include a resized file on github (although it is possible to host large files).
#  
#  It is much faster for me when on Google Colab (~12 minutes).  
#  If you want to load a saved file, you might have to host it on Google Drive.
#  You might get a memory warning (which is also one reason it takes so long on my computer).

#  I tried 10000 on Colab.  It took about 45 minutes filling up 12 Gigs of memory.  Runtime crash ensues.
#  8000 ~30min, crash
#  7000 ~20 min, did okay for a few trains and then crash
#  6000 may be stable

#  Preallocate the full array once; growing it with np.append re-copies everything on each step.
#  Change n_train to 50000 if you want to do the entire set.  NOT RECOMMENDED
n_train = 5000
resized_train_data = np.empty((n_train,128,128,3))
for row in range(n_train):
    resized_train_data[row] = resize(x_train[row],(128,128,3))
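
The memory figures in the comments above can be checked with simple arithmetic. This is a rough sketch, assuming the resized images are stored as float64 (8 bytes per value), which is what skimage's `resize` produces for uint8 input; `resized_array_bytes` is an illustrative helper, not a library function:

```python
def resized_array_bytes(n_images, height=128, width=128, channels=3, bytes_per_value=8):
    """Size in bytes of a dense image array (float64 uses 8 bytes per value)."""
    return n_images * height * width * channels * bytes_per_value

# Batch sizes mentioned in the comments above.
for n in (5000, 6000, 7000, 8000, 10000):
    gib = resized_array_bytes(n) / 2**30
    print(f"{n:>6} images: {gib:.2f} GiB as float64, {gib / 2:.2f} GiB as float32")
```

5,000 images come to 1,966,080,000 bytes, which matches the "about 2 gig" file size. Any step that copies the array, such as np.append, briefly needs room for both the old and the new copy, so peak usage is higher than these figures.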
          

In [0]:
#  Check
plt.imshow(resized_train_data[4999])

## Resize Test Data

In [0]:
#  Also resize test data, preallocating the output array instead of growing it with np.append.
#  Change n_test to 10000 if you want to resize the entire test set.  NOT RECOMMENDED
n_test = 1000
resized_test_data = np.empty((n_test,128,128,3))
for row in range(n_test):
    resized_test_data[row] = resize(x_test[row],(128,128,3))
          

In [0]:
plt.imshow(x_test[17])

In [0]:
plt.imshow(resized_test_data[17])

## Scale Data

In [0]:
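#  skimage's resize already returns floats in [0, 1] by default (preserve_range=False),
#  so only divide by 255 if the data is still in the 0-255 range (e.g. loaded from a file).
if resized_train_data.max() > 1:
    resized_train_data /= 255

In [0]:
#  Also for test data (which may have a different name if you are loading them.)
if resized_test_data.max() > 1:
    resized_test_data /= 255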

In [0]:
resized_train_data.shape

## Encode Category Labels

In [0]:
#  Encode labels.
y_train_encoded = np_utils.to_categorical(y_train)
y_test_encoded = np_utils.to_categorical(y_test)

In [0]:
#  Grab the labels that match the 5,000 resized training images (training labels, not test labels).
fivek_labels = y_train_encoded[:5000]
fivek_labels.shape
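
One-hot encoding turns each integer label into a length-10 vector with a single 1 at the label's index. A minimal NumPy sketch of that encoding; `one_hot` is an illustrative stand-in for what `to_categorical` produces, not the Keras implementation:

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """One-hot encode integer labels given as shape (n,) or (n, 1)."""
    labels = np.asarray(labels).ravel().astype(int)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype="float32")[labels]

y = np.array([[6], [9], [0]])   # same (n, 1) layout as y_train
encoded = one_hot(y)
print(encoded.shape)
print(encoded[0])
```

Whatever slice of images is used for training, the label rows have to come from the same split and the same index range, otherwise the network is fitted to unrelated targets.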

# 2.  Create MobileNet Conv. Net. Models

## Create Unbiased Conv. NN

In [0]:
#  With the resized image shapes, we can use them directly with MobileNet.  
#  Otherwise, it would throw an error that the input_shape is too small.
#  However, I am using a dataset of 5000, which does not seem to train very well.
#  It will do pretty well with a higher number of epochs, which you can check.
#  This illustrates one of the uses of transfer learning:  when data sets are too small to properly train.

fresh = MobileNet(input_shape=(128,128,3),include_top =True, weights=None,classes=10)
fresh.summary()
fresh.compile(optimizer='adadelta',loss='categorical_crossentropy',metrics=['accuracy'])

## Fit

In [0]:
%%time

#  Note how long it takes to train per epoch, the loss, and accuracy.  
#  We will compare with the pre-trained model below.
history_fresh = fresh.fit(resized_train_data,fivek_labels,epochs = 20,batch_size=64,validation_split=0.1)

## Plot

In [0]:
plt.plot(history_fresh.history['loss'])
#plt.plot(history_fresh.history['acc'])

## Create Biased Conv. NN

In [0]:
#  Since we are doing a more complex model, we use the functional API Model() class from keras. 
#  This lets us be very explicit about inputs and outputs for the layers in the model.
    
#  Create tensor object to pass into the trained_model.
inputs = Input(shape = (128,128,3)) 

#  For this MobileNet object with pre-trained weights, we will need to chop off the classification layers: include_top=False. 
trained_model = MobileNet(input_shape = (128,128,3),include_top = False, weights='imagenet', input_tensor = inputs)

#  Freeze: keep some pre-trained weights as they are.  Somebody already spent time training to classify images.
#  I am freezing everything except the last depthwise convolution (and classification convolution).
#  Check number of trainable parameters in summary after freezing layers.
#  Compare results when training all pre-trained weights and when freezing some (or all) weights.  
#  Why does convergence suffer when all weights are frozen?
for layer in trained_model.layers[:89]:
    layer.trainable = False

#  Then, copy as close as possible the structure of the layers removed by include_top=False.
x = GlobalAvgPool2D(data_format='channels_last')(trained_model.output)
x = Reshape((1,1,-1))(x)

#  Adjust the dropout rate (e.g. try 0.5) to see how it helps prevent overfitting.  
#  Alternatively, you can just use fewer epochs.
x = Dropout(rate=0.001)(x)
x = Conv2D(filters=10,kernel_size=(1,1))(x)
x = Activation(activation = 'softmax')(x)
predictions = Reshape((-1,))(x)

#  Put the layers together by a transfer_model that takes inputs and outputs.
transfer_model = Model(inputs = inputs,outputs = predictions)
transfer_model.compile(optimizer = 'adadelta',loss = 'categorical_crossentropy',metrics=['accuracy'])
transfer_model.summary()
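
The layers stacked on top of the pre-trained network amount to a small classifier. Here is a rough NumPy sketch of that head for a single image, assuming the 4x4x1024 feature map that MobileNet (default width multiplier) produces for a 128x128 input; the random features and weights are placeholders, not values from the model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the MobileNet feature map of one 128x128 image:
# 4x4 spatial positions (128 / 32), 1024 channels.
features = rng.normal(size=(4, 4, 1024))

# Global average pooling: average over the spatial axes -> (1024,)
pooled = features.mean(axis=(0, 1))

# A 1x1 convolution with 10 filters on a 1x1 map acts like a dense layer:
# a (1024, 10) weight matrix plus 10 biases.
weights = rng.normal(scale=0.01, size=(1024, 10))
bias = np.zeros(10)
logits = pooled @ weights + bias

# Softmax, shifted by the max for numerical stability.
exp = np.exp(logits - logits.max())
probabilities = exp / exp.sum()

print(probabilities.shape, probabilities.sum())
print("trainable parameters in the head:", weights.size + bias.size)
```

The head itself has only 1024 * 10 + 10 = 10,250 parameters, so most of the trainable-parameter count reported by `summary()` comes from whichever MobileNet layers are left unfrozen.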

In [0]:
%%time

#  Notice that fitting the pre-trained model is much quicker: each epoch takes much less time.
#  Additionally, accuracy rises and loss falls much faster.
#  It can even start overfitting in a fraction of the time the unbiased network takes to fit.
#  Try adjusting the dropout rate to mitigate overfitting, or reducing epochs.
history_transfer = transfer_model.fit(resized_train_data,fivek_labels,epochs=20,batch_size=64,validation_split=0.1)

## Compare Performance

In [0]:
plt.plot(history_transfer.history['loss'])

In [0]:
plt.plot(history_transfer.history['acc'])

In [0]:
#  Compare with plots from unbiased network.
plt.plot(history_fresh.history['loss'])

In [0]:
plt.plot(history_fresh.history['acc'])
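
The history key for accuracy depends on the Keras version: older releases record it as 'acc', newer ones as 'accuracy'. A small sketch of a lookup that works with either; `accuracy_key` is a hypothetical helper and the dictionaries below hold made-up values purely to show the two layouts:

```python
def accuracy_key(history_dict):
    """Return whichever training-accuracy key is present in a Keras history dict."""
    for key in ("accuracy", "acc"):
        if key in history_dict:
            return key
    raise KeyError("no accuracy metric recorded in history")

# Example dictionaries in the two layouts (values are placeholders).
print(accuracy_key({"loss": [1.0], "acc": [0.5]}))
print(accuracy_key({"loss": [1.0], "accuracy": [0.5]}))
```

In the notebook it could be used as `plt.plot(history_fresh.history[accuracy_key(history_fresh.history)])`. Because the models are fitted with `validation_split`, the same dictionary also holds the validation curves under the 'val_' prefix, e.g. 'val_loss'.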

# 3.  Prediction and Scoring

In [0]:
#  NOTE:  In the current state, these models are not good at prediction.
#  We could see this by looking at the validation accuracy when training above.
#  Try experimenting to improve validation when training.

#  See how our transferred model does on the test data set.
#  We hope to get comparable prediction scores.

#  Suppress scientific notation for easier comparison.
np.set_printoptions(suppress=True)

#  Predict a class and look at an example to compare between biased and unbiased.
#  What do you expect the comparison to show?
unbiased_prediction = fresh.predict(resized_test_data)

In [0]:
unbiased_prediction[158]

In [0]:
#  Note: predict gives us probabilities like predict_proba for other models.  Check that they sum to one.
sum(unbiased_prediction[158])

In [0]:
biased_prediction = transfer_model.predict(resized_test_data)
biased_prediction[158]

In [0]:
#  True label:
y_test_encoded[158]

In [0]:
#  Scores: lower is better: smaller distance between prediction and true label.  
#  Try looking at prediction scores before and after training, and after different amounts of training.
#  Why (and when) might our pre-trained model do worse on these scores?

#  Biased
print(mean_squared_error(y_test_encoded[:1000],biased_prediction))
mean_squared_log_error(y_test_encoded[:1000],biased_prediction)

In [0]:
#  Unbiased
print(mean_squared_error(y_test_encoded[:1000],unbiased_prediction))
mean_squared_log_error(y_test_encoded[:1000],unbiased_prediction)
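
The two scores above measure distance between predicted probabilities and one-hot labels, which is not the same as counting correct classifications. A minimal NumPy sketch of both scores next to top-1 accuracy, on a made-up three-class example; the function names are illustrative and the formulas are the element-wise averages these metrics reduce to for arrays of this shape:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error averaged over every element."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def msle(y_true, y_pred):
    """Mean squared log error: squared difference of log(1 + value)."""
    return float(np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2))

def top1_accuracy(y_true, y_pred):
    """Fraction of rows where the most probable class matches the label."""
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1)))

# Toy one-hot labels and predicted probabilities (not results from the models above).
y_true = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
y_pred = np.array([[0.7, 0.2, 0.1], [0.4, 0.5, 0.1]])

print(mse(y_true, y_pred), msle(y_true, y_pred), top1_accuracy(y_true, y_pred))
```

In this toy case both rows are classified correctly even though the errors are non-zero, so it can help to report accuracy, e.g. `top1_accuracy(y_test_encoded[:1000], biased_prediction)`, alongside the distance scores.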