# Transfer Learning 

## Definition

[**Transfer Learning**](https://en.wikipedia.org/wiki/Transfer_learning): Transfer learning, or inductive transfer, is a research problem in machine learning that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem. For example, knowledge gained while learning to recognize cars could apply when trying to recognize trucks. This area of research bears some relation to the long history of psychological literature on transfer of learning, although formal ties between the two fields are limited.

https://en.wikipedia.org/wiki/Transfer_learning

## Common Methods

1. Using **pre-trained CNN features**
2. Learning **domain-invariant representations**
3. Making representations more similar
4. Confusing domains

http://ruder.io/transfer-learning/index.html#adaptingtonewdomains

## Applications


* Learning from _simulations_
*  Adapting to _new_ _domains_
*  Transferring knowledge across _new_ _languages_


...


http://ruder.io/transfer-learning/index.html#adaptingtonewdomains


# Import Modules

In [1]:
import numpy as np

from keras.models import Sequential,Model 
from keras.layers import Convolution2D, MaxPooling2D, GlobalAveragePooling2D
from keras.layers import Activation, Dropout, Flatten, Dense , BatchNormalization 
from keras import backend as K
from keras.preprocessing import image 
from keras import applications
from keras.callbacks import ModelCheckpoint ,EarlyStopping 
from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
from keras.applications.xception import preprocess_input
from matplotlib import pyplot as plt
import seaborn as sns 
sns.set(color_codes = True)
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

Using TensorFlow backend.


# Define Parameters

In [2]:
batch_size = 64
num_classes = 2
epochs = 2
img_width, img_height = 256, 256
n_train_samples = 6000
n_validation_samples = 1000
n_test_samples = 200

train_data_dir = 'data/train'
validation_data_dir = 'data/validation'
test_data_dir = 'data/test'
top_model_weights_path = 'cat_dog_Xception.h5'

based_model_last_block_layer_number = 126

# Examine the Data

## Directory Setup

**data/train**

* cat/
* dog/

**data/validation**

* cat/
* dog/

**data/test**

* cat/
* dog/


## Training data

In [None]:
from keras.preprocessing import image

# Show the first nine cat images of the training set in a 3x3 grid.
plt.figure(figsize=(10, 10))
for i in range(9):
    plt.subplot(3, 3, i + 1)
    # load_img returns an RGB PIL image, so no colormap is needed.
    plt.imshow(image.load_img('data/train/cat/cat.' + str(i) + '.jpg', target_size=(150, 150)))
    plt.xlabel('cat ' + str(i))
plt.show()

In [None]:
# Show nine dog images (files dog.9000.jpg to dog.9008.jpg) in a 3x3 grid.
plt.figure(figsize=(10, 10))
for i in range(9):
    plt.subplot(3, 3, i + 1)
    plt.imshow(image.load_img('data/train/dog/dog.' + str(9000 + i) + '.jpg', target_size=(150, 150)))
    plt.xlabel('dog ' + str(9000 + i))
plt.show()

## After Augmentation


In [None]:
# Initialize Generator
train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

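import os

# flow() expects a batch, so add a leading batch dimension: (1, height, width, 3).
img = load_img('data/train/cat/cat.0.jpg')
x = img_to_array(img)
x = x.reshape((1,) + x.shape)

# save_to_dir must already exist, so create it if necessary.
if not os.path.isdir('data/aug_images'):
    os.makedirs('data/aug_images')

# The generator loops forever; stop once 21 augmented images have been saved.
i = 0
for batch in train_datagen.flow(x, batch_size=1, save_to_dir='data/aug_images',
                                save_prefix='cat', save_format='jpeg'):
    i += 1
    if i > 20:
        break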

![title](collage.jpg)
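
To make two of the generator options concrete, here is a minimal NumPy sketch of what `rescale=1./255` and a horizontal flip do to an image array. The 2x3 single-channel image is made up for illustration, and the flip is applied unconditionally here, whereas `ImageDataGenerator` applies it at random.

```python
import numpy as np

# A made-up 2x3 image with one channel, pixel values in [0, 255].
img = np.array([[[0], [128], [255]],
                [[10], [20], [30]]], dtype=np.float32)

rescaled = img * (1.0 / 255)      # what rescale=1./255 does: values end up in [0, 1]
flipped = rescaled[:, ::-1, :]    # mirror the image along its width axis

print(flipped.shape)              # (2, 3, 1)
print(flipped[0, :, 0])           # first row, now in reversed column order
```

Shear and zoom are geometric transforms too, but they interpolate pixel values, so they are not reproduced in this sketch.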

# Load Pretrained Xception Model 


#### keras.applications.xception.Xception(include_top=True, weights='imagenet', input_tensor=None, input_shape=None, pooling=None, classes=1000)

**Xception V1 model, with weights pre-trained on ImageNet.**


**include_top:** whether to include the fully-connected layer at the top of the network.


**weights:** one of None (random initialization) or 'imagenet' (pre-training on ImageNet).


**input_tensor:** optional Keras tensor (i.e. output of  layers.Input()) to use as image input for the model.


**input_shape:** optional shape tuple, only to be specified if include_top is False (otherwise the input shape has to be (299, 299, 3)). It should have exactly 3 input channels, and width and height should be no smaller than 71. E.g. (150, 150, 3) would be one valid value.


**pooling:** optional pooling mode for feature extraction when include_top is False.

* None means that the output of the model will be the 4D tensor output of the last convolutional layer.
* 'avg' means that global average pooling will be applied to the output of the last convolutional layer, and thus the output of the model will be a 2D tensor.
* 'max' means that global max pooling will be applied.
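
The effect of the pooling modes on tensor shapes can be sketched with NumPy alone. The feature tensor below is random and its shape `(4, 8, 8, 2048)` is an assumption chosen to resemble what the last Xception block would produce for a batch of four 256x256 images; only the shapes matter.

```python
import numpy as np

# Stand-in for the last convolutional output: (batch, height, width, channels).
rng = np.random.default_rng(0)
features = rng.random((4, 8, 8, 2048))

avg_pooled = features.mean(axis=(1, 2))   # corresponds to pooling='avg'
max_pooled = features.max(axis=(1, 2))    # corresponds to pooling='max'

print(features.shape)     # (4, 8, 8, 2048)  4D tensor, as with pooling=None
print(avg_pooled.shape)   # (4, 2048)        2D tensor
print(max_pooled.shape)   # (4, 2048)        2D tensor
```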


**classes:** optional number of classes to classify images into, only to be specified if include_top is True, and if no weights argument is specified.


It returns a Keras Model instance. Read more: https://keras.io/applications/#xception

**Xception Network:**


Paper: https://arxiv.org/pdf/1610.02357.pdf


Github: https://github.com/fchollet/deep-learning-models/blob/master/xception.py

![title](xception.png)

In [None]:
model = applications.xception.Xception(include_top = False,
                                     weights ='imagenet',
                                     input_shape = (img_width, img_height, 3))

## Build a Classifier Model

In [None]:
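# Classifier head on top of the convolutional base.
x = model.output
x = GlobalAveragePooling2D()(x)          # 4D feature maps -> one vector per image
x = BatchNormalization()(x)
x = Dense(1024, activation='relu')(x)
x = BatchNormalization()(x)
x = Dropout(0.5)(x)
y_pred = Dense(num_classes, activation='softmax')(x)  # one probability per class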

In [None]:
model_final = Model(inputs=model.input, outputs= y_pred)
model_final.summary()
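
As a rough cross-check of the summary, the number of parameters added by the classifier head can be computed by hand. The sketch assumes that the convolutional base outputs 2048 channels; the helper functions are hypothetical and not part of Keras. Global average pooling and dropout add no parameters.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def batchnorm_params(n_features):
    """gamma, beta, moving mean and moving variance: four values per feature."""
    return 4 * n_features


n_features = 2048  # assumed channel count of the base model's output
head_params = (
    batchnorm_params(n_features)        # first BatchNormalization
    + dense_params(n_features, 1024)    # Dense(1024)
    + batchnorm_params(1024)            # second BatchNormalization
    + dense_params(1024, 2)             # Dense(2) softmax output
)
print(head_params)  # 2112514
```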

## Unfreeze the layers that you are going to retrain
Here we freeze every layer of the Xception network except those in its last block, which remain trainable.

In [None]:
# Freeze everything before the last block; keep the last block trainable.
for layer in model.layers[:based_model_last_block_layer_number]:
    layer.trainable = False
for layer in model.layers[based_model_last_block_layer_number:]:
    layer.trainable = True
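
The slicing logic can be tried out without loading a network. In the sketch below, `FakeLayer`, the ten-layer list and the cutoff index 7 are all hypothetical stand-ins; only the slicing pattern mirrors the cell above.

```python
class FakeLayer:
    """Minimal stand-in for a Keras layer: just a name and a trainable flag."""

    def __init__(self, name):
        self.name = name
        self.trainable = True


layers = [FakeLayer("layer_%d" % i) for i in range(10)]  # hypothetical 10-layer base
first_trainable = 7                                       # hypothetical cutoff index

for layer in layers[:first_trainable]:
    layer.trainable = False   # frozen: weights keep their pre-trained values
for layer in layers[first_trainable:]:
    layer.trainable = True    # updated together with the new classifier head

trainable_names = [layer.name for layer in layers if layer.trainable]
print(trainable_names)  # ['layer_7', 'layer_8', 'layer_9']
```

In Keras, changes to `trainable` take effect when the model is compiled, so the flags should be set before calling `compile()`.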

# Model Training 

We will use **.flow_from_directory()** to generate batches of image data (and their labels) directly from our .jpgs in their respective directories.

**flow_from_directory(directory)**: Takes the path to a directory, and generates batches of augmented/normalized data. Yields batches indefinitely, in an infinite loop.

More details: https://keras.io/preprocessing/image/

In [None]:
train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)

In [None]:
train_generator = train_datagen.flow_from_directory(train_data_dir,
                                       target_size = (img_width, img_height),
                                       batch_size = batch_size,
                                       class_mode =  "categorical")

validation_generator = test_datagen.flow_from_directory(
                                        validation_data_dir,
                                        target_size= (img_width, img_height),
                                        batch_size=batch_size,
                                        class_mode= "categorical")

test_generator = test_datagen.flow_from_directory(test_data_dir,
                                target_size= (img_width, img_height),
                                batch_size=batch_size,
                                class_mode= "categorical")
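
`flow_from_directory` infers the class labels from the subdirectory names and, according to the Keras documentation, assigns the label indices in alphanumeric order. The sketch below reproduces that mapping with the standard library only. `infer_class_indices` is a hypothetical helper rather than a Keras function, and the temporary directory stands in for `data/train`.

```python
import os
import tempfile


def infer_class_indices(directory):
    """Map each subdirectory name to an index, in sorted (alphanumeric) order."""
    classes = sorted(
        name for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )
    return {name: index for index, name in enumerate(classes)}


# Build a throwaway layout that mirrors data/train (created in reverse order on purpose).
with tempfile.TemporaryDirectory() as root:
    for class_name in ("dog", "cat"):
        os.makedirs(os.path.join(root, class_name))
    class_indices = infer_class_indices(root)

print(class_indices)  # {'cat': 0, 'dog': 1}
```

With the real generators the same mapping is available as `train_generator.class_indices`, so column 0 of the softmax output refers to `cat` and column 1 to `dog`.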

## Compile the Model

**keras.optimizers.Nadam(lr=0.002, beta_1=0.9, beta_2=0.999, epsilon=None, schedule_decay=0.004)**


Nesterov Adam optimizer.

Much like Adam is essentially RMSprop with momentum, Nadam is Adam with Nesterov momentum.

To learn more: https://keras.io/optimizers/
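
For intuition, here is a simplified, scalar version of the Nadam update in plain Python. It is a sketch under stated assumptions: `beta_1` is held constant, so the momentum schedule controlled by `schedule_decay` in Keras is left out, and `nadam_minimize` is a hypothetical helper, not the Keras implementation.

```python
import math


def nadam_minimize(grad_fn, x0, lr=0.002, beta_1=0.9, beta_2=0.999,
                   epsilon=1e-8, steps=3000):
    """Simplified Nadam with a constant beta_1 (no momentum schedule)."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta_1 * m + (1 - beta_1) * g        # first moment (momentum)
        v = beta_2 * v + (1 - beta_2) * g * g    # second moment (RMSprop-style)
        m_hat = m / (1 - beta_1 ** t)            # bias corrections
        v_hat = v / (1 - beta_2 ** t)
        # Nesterov look-ahead: blend the corrected momentum with the current gradient.
        nesterov_m = beta_1 * m_hat + (1 - beta_1) * g / (1 - beta_1 ** t)
        x -= lr * nesterov_m / (math.sqrt(v_hat) + epsilon)
    return x


# Minimise f(x) = x**2, whose gradient is 2x, starting from x = 1.
x_final = nadam_minimize(lambda x: 2 * x, x0=1.0)
print(abs(x_final) < 0.01)
```

Dropping the `nesterov_m` line and using `m_hat` directly in the update would turn this sketch into plain Adam.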

In [None]:
model_final.compile(loss='categorical_crossentropy',
              optimizer='nadam',
              metrics=['accuracy'])

## Train and Save the Model
Now we can use these generators to train our model with **model.fit_generator()**.


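Save the model with **keras.callbacks.ModelCheckpoint(filepath, monitor='val_loss', verbose=0, save_best_only=False, save_weights_only=False, mode='auto', period=1)**. With these defaults the model is saved after every epoch; the cell below sets save_best_only=True, so only the model with the best monitored value (val_acc) is kept.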

Learn More: https://keras.io/callbacks/
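
The training cell also uses an `EarlyStopping` callback with a `patience` argument. The following sketch shows the idea behind patience in plain Python; `early_stopping_epoch` is a hypothetical helper and the accuracy values are made up, so this illustrates the rule rather than the Keras implementation.

```python
def early_stopping_epoch(val_acc, patience=5, min_delta=0.0):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("-inf")
    wait = 0
    for epoch, acc in enumerate(val_acc):
        if acc - min_delta > best:
            best = acc    # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1     # no improvement in this epoch
            if wait >= patience:
                return epoch
    return None


# Made-up validation accuracies, one per epoch.
val_acc_per_epoch = [0.80, 0.85, 0.84, 0.85, 0.83, 0.84, 0.82, 0.90]
print(early_stopping_epoch(val_acc_per_epoch))       # 6
print(early_stopping_epoch(val_acc_per_epoch[:2]))   # None
```

The second call shows that a run of only two epochs can never exhaust a patience of 5, which is the situation with `epochs = 2` in this notebook.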

In [None]:
def plot_model_training(model_history):
    """Plot training and validation accuracy and loss per epoch."""
    history = model_history.history
    n_epochs = len(history['loss'])
    epoch_range = range(1, n_epochs + 1)
    # Show roughly ten ticks at most; the step must be a positive integer.
    tick_step = max(1, n_epochs // 10)
    fig, axs = plt.subplots(1, 2, figsize=(15, 5))
    # Older Keras versions log 'acc'/'val_acc'; newer ones use 'accuracy'/'val_accuracy'.
    panels = [('acc', 'val_acc', 'Accuracy'), ('loss', 'val_loss', 'Loss')]
    for ax, (train_key, val_key, label) in zip(axs, panels):
        ax.plot(epoch_range, history[train_key])
        ax.plot(epoch_range, history[val_key])
        ax.set_title('Model ' + label)
        ax.set_ylabel(label)
        ax.set_xlabel('Epoch')
        ax.set_xticks(np.arange(1, n_epochs + 1, tick_step))
        ax.legend(['train', 'val'], loc='best')
    plt.show()

In [None]:
checkpoint = ModelCheckpoint(filepath = top_model_weights_path,
                            verbose = 1,
                            save_best_only = True,
                            monitor = 'val_acc')
early = EarlyStopping(monitor='val_acc', min_delta = 0, patience =5,verbose=1, mode ='auto')

history = model_final.fit_generator(
        train_generator,
        steps_per_epoch = n_train_samples // batch_size,
        epochs= epochs,
        validation_data=validation_generator,
        validation_steps = n_validation_samples // batch_size,
        callbacks = [checkpoint,early])
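
The `steps_per_epoch` and `validation_steps` arguments count batches, not samples. The arithmetic for the parameters defined at the top of this notebook is sketched below; using `math.ceil` for the test set is one possible choice for visiting every sample, not something the training cell above does.

```python
import math

batch_size = 64
n_train_samples = 6000
n_validation_samples = 1000
n_test_samples = 200

# Floor division drops the final partial batch ...
train_steps = n_train_samples // batch_size             # 93
validation_steps = n_validation_samples // batch_size   # 15
# ... while rounding up visits every sample at least once.
test_steps = math.ceil(n_test_samples / batch_size)     # 4

print(train_steps, validation_steps, test_steps)        # 93 15 4
```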

In [None]:
plot_model_training(history)

# Evaluate the Model 

**evaluate_generator(self, generator, steps=None, max_queue_size=10, workers=1, use_multiprocessing=False)**


Evaluates the model on a data generator.

The generator should return the same kind of data as accepted by test_on_batch.

In [None]:
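# steps counts batches, not samples: use just enough batches to cover the test set once.
test_steps = int(np.ceil(n_test_samples / float(batch_size)))
loss, acc = model_final.evaluate_generator(test_generator, steps=test_steps)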

In [None]:
print('Accuracy: %.2f%%' % (acc*100))

## Make Predictions

**predict_generator(self, generator, steps=None, max_queue_size=10, workers=1, use_multiprocessing=False, verbose=0)**


Generates predictions for the input samples from a data generator.

The generator should return the same kind of data as accepted by predict_on_batch. The method returns Numpy array(s) of predictions.




In [None]:
#predictions = model_final.predict_generator(test_generator, steps= n_test_samples)
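
Predictions come back as one row of class probabilities per image. A minimal sketch of turning such rows into class names follows; the probabilities are made up, and `class_indices` is written out by hand to match the mapping a directory generator would build for the `cat` and `dog` folders.

```python
# Hypothetical softmax outputs for three images: one row per image,
# one column per class, in class-index order.
predictions = [
    [0.92, 0.08],
    [0.30, 0.70],
    [0.55, 0.45],
]
class_indices = {"cat": 0, "dog": 1}

# Invert the mapping so that a column index gives back the class name.
index_to_class = {index: name for name, index in class_indices.items()}

predicted_labels = [
    index_to_class[max(range(len(row)), key=row.__getitem__)]  # argmax of the row
    for row in predictions
]
print(predicted_labels)  # ['cat', 'dog', 'cat']
```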

In [None]:
img = image.load_img('andy.jpg')
target_size = (256,256)

if img.size != target_size:
    img = img.resize(target_size)

x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
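x = x / 255.0  # same scaling as rescale=1./255 in the training generators
preds = model_final.predict(x)
# flow_from_directory orders classes alphanumerically: index 0 is cat, index 1 is dog.
print('Probabilities for cat and dog:', preds[0])
plt.imshow(img)
plt.axis('off')

plt.figure()
labels = ('cat', 'dog')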
plt.barh([0, 1], preds[0], alpha=0.5)
plt.yticks([0, 1], labels)
plt.xlabel('Probability')
plt.xlim(0,1.01)
plt.tight_layout()
plt.show()

In [None]:
img = image.load_img('zao.jpg')
target_size = (256,256)

if img.size != target_size:
    img = img.resize(target_size)

x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = x / 255.0  # same scaling as rescale=1./255 in the training generators
preds = model_final.predict(x)
print('Probabilities for cat and dog:', preds[0])
plt.imshow(img)
plt.axis('off')

plt.figure()
labels = ('cat','dog')
plt.barh([0, 1], preds[0], alpha=0.5)
plt.yticks([0, 1], labels)
plt.xlabel('Probability')
plt.xlim(0,1.01)
plt.tight_layout()
plt.show()

# Further improvement

To further improve the result, you can try to "fine-tune" the last convolutional block of the Xception network alongside the top-level classifier.

**Fine-tuning** consists of starting from a trained network and re-training it on a new dataset using very small weight updates (a small learning rate). In our case, this can be done in 3 steps:

1. Instantiate the convolutional base of Xception and load the weights;
2. Add our previously defined fully-connected model on top and load its weights;
3. Freeze the layers of the Xception model up to the last convolutional block.
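
Why a small learning rate matters can be seen from a single plain SGD step. The weights and gradients below are made up, and `sgd_step` and `max_change` are hypothetical helpers; the point is only that the learning rate directly scales how far the pre-trained weights move.

```python
pretrained = [0.50, -1.20, 0.80]   # made-up pre-trained weights
gradients = [0.30, -0.10, 0.20]    # made-up gradients on the new dataset


def sgd_step(weights, grads, lr):
    """One plain SGD update: w <- w - lr * g."""
    return [w - lr * g for w, g in zip(weights, grads)]


def max_change(old, new):
    """Largest absolute change of any single weight."""
    return max(abs(a - b) for a, b in zip(old, new))


small = max_change(pretrained, sgd_step(pretrained, gradients, lr=1e-4))
large = max_change(pretrained, sgd_step(pretrained, gradients, lr=1e-1))
print(small < large)  # True
```

With the smaller rate the weights stay close to their pre-trained values, which is the behaviour fine-tuning relies on.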


Github Link: https://github.com/haohan723/Distributed-Deep-Learning/blob/master/Keras%20Introduction/Training_with_Augmentation_2.ipynb