### REUSING PRETRAINED LAYERS

+ In practice, it is rarely a good idea to train a very large deep neural network from scratch.
+ Instead, we can reuse the lower layers of an existing neural network that accomplishes a similar task.
+ Reusing the lower layers of one network to solve a different problem is called transfer learning.

*ADVANTAGES OF TRANSFER LEARNING*

+ It speeds up training considerably.
+ It requires significantly less training data.

**Transfer Learning with Keras**

+ Suppose we have a model, model A, that classifies Fashion MNIST images into 8 classes.
+ We now need a binary classifier that tells shirts from sandals, and the dataset for this task is quite small.
+ So we reuse the lower layers of model A to build and train the binary classifier.
+ Split the data into two sets:
+ `X_train_A` = all images of all items except sandals and shirts (classes 5 and 6).
+ `X_train_B` = a much smaller training set of just the first 200 images of sandals or shirts.
+ The validation set and the test set are split the same way, but without restricting the number of images.
+ We train a model on set A (a classification task with 8 classes) and then try to reuse it to tackle set B (binary classification).
+ Model A reaches `accuracy: 0.9255` and `val_accuracy: 0.9228`.

+ Model B is trained in the same way, from scratch, on set B only.
+ It reaches `accuracy: 1.0000, val_accuracy: 0.9848`.

+ The cells below build both baselines and then apply transfer learning.

In [10]:
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

In [22]:
## fetch the Fashion MNIST dataset from the library
(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()
## scale the pixel values from 0-255 down to the 0-1 range
X_train_full = X_train_full / 255.0
X_test = X_test / 255.0
## hold out the first 5000 training images as the validation set
X_valid, X_train = X_train_full[:5000], X_train_full[5000:]
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]

## Split the data into two tasks:
## set A = all images except sandals and shirts (classes 5 and 6), relabelled 0-7.
## set B = sandals and shirts only, labelled 1.0 for shirt and 0.0 for sandal.
## The training set for B is later cut to its first 200 images;
## the validation and test sets are split the same way but not truncated.
## Model A is trained on set A (8 classes) and then reused for set B (binary).

def split_dataset(X, y):
    y_5_or_6 = (y == 5) | (y == 6) # sandals or shirts
    y_A = y[~y_5_or_6]
    y_A[y_A > 6] -= 2 # class indices 7, 8, 9 should be moved to 5, 6, 7
    y_B = (y[y_5_or_6] == 6).astype(np.float32) # binary classification task: is it a shirt (class 6)?
    return ((X[~y_5_or_6], y_A),
            (X[y_5_or_6], y_B))

(X_train_A, y_train_A), (X_train_B, y_train_B) = split_dataset(X_train, y_train)
(X_valid_A, y_valid_A), (X_valid_B, y_valid_B) = split_dataset(X_valid, y_valid)
(X_test_A, y_test_A), (X_test_B, y_test_B) = split_dataset(X_test, y_test)
X_train_B = X_train_B[:200]
y_train_B = y_train_B[:200]

print(X_train_A.shape)
print(X_train_B.shape)
print(y_train_A.shape)
print(y_train_B.shape)
print(X_valid_A.shape)
print(X_valid_B.shape)
print(X_test_A.shape)
print(X_test_B.shape)

print(y_train_A[:30])
print(y_train_B[:30])

print("===================BUILDING THE MODEL==========================")
## define the model A
model_A = keras.models.Sequential()
model_A.add(keras.layers.Flatten(input_shape=[28, 28]))
for n_hidden in (300, 100, 50, 50, 50):
    model_A.add(keras.layers.Dense(n_hidden, activation="selu"))
## output layer: one unit per class for the 8 classes of set A.
model_A.add(keras.layers.Dense(8, activation="softmax"))
model_A.summary()  # summary() prints the table itself and returns None

print("===================COMPILE THE MODEL==========================")
## sparse categorical cross-entropy because the labels are integer class indices (0-7).
## the optimizer is SGD (stochastic gradient descent) with a learning rate of 1e-3.
## accuracy is tracked as the metric.
model_A.compile(loss="sparse_categorical_crossentropy",
                optimizer=keras.optimizers.SGD(learning_rate=1e-3),
                metrics=["accuracy"])

print("====================TRAIN THE MODEL===========================")
history = model_A.fit(X_train_A, y_train_A, epochs=20,
                      validation_data=(X_valid_A, y_valid_A))

print("=====================SAVE THE MODEL==========================")
model_A.save("model_A.h5")

(43986, 28, 28)
(200, 28, 28)
(43986,)
(200,)
(4014, 28, 28)
(986, 28, 28)
(8000, 28, 28)
(2000, 28, 28)
[4 0 5 7 7 7 4 4 3 4 0 1 6 3 4 3 2 6 5 3 4 5 1 3 4 2 0 6 7 1]
[1. 1. 0. 0. 0. 0. 1. 1. 1. 0. 0. 1. 1. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 1.
 1. 0. 1. 1. 1. 1.]
Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten_4 (Flatten)         (None, 784)               0         
                                                                 
 dense_24 (Dense)            (None, 300)               235500    
                                                                 
 dense_25 (Dense)            (None, 100)               30100     
                                                                 
 dense_26 (Dense)            (None, 50)                5050      
                                                                 
 dense_27 (Dense)            (None, 50)                255
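
To make the relabelling inside `split_dataset` easier to follow, here is a minimal sketch that runs the same function on a toy label vector with one sample per class. The toy arrays `X_toy` and `y_toy` are made up for illustration and are not part of the notebook.

```python
import numpy as np

def split_dataset(X, y):
    """Split into task A (all classes except 5 and 6) and task B (5 vs 6)."""
    y_5_or_6 = (y == 5) | (y == 6)               # sandals or shirts
    y_A = y[~y_5_or_6]                           # boolean indexing returns a copy
    y_A[y_A > 6] -= 2                            # class indices 7, 8, 9 become 5, 6, 7
    y_B = (y[y_5_or_6] == 6).astype(np.float32)  # 1.0 for class 6, 0.0 for class 5
    return (X[~y_5_or_6], y_A), (X[y_5_or_6], y_B)

# Hypothetical toy data: ten blank "images", one per class label 0..9.
y_toy = np.arange(10, dtype=np.uint8)
X_toy = np.zeros((10, 28, 28), dtype=np.float32)

(X_A, y_A), (X_B, y_B) = split_dataset(X_toy, y_toy)
print(y_A)      # [0 1 2 3 4 5 6 7]
print(y_B)      # [0. 1.]
print(y_toy)    # [0 1 2 3 4 5 6 7 8 9], the input labels are left untouched
```

Because the boolean mask produces a copy, the in-place `-= 2` only changes the labels of set A and never the original label array.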

In [23]:
## train MODEL B the same way, from scratch, on the small set B.
## it serves as the baseline for the transfer learning model.
print("===================BUILDING THE MODEL==========================")
## define the model B (same hidden architecture as model A)
model_B = keras.models.Sequential()
model_B.add(keras.layers.Flatten(input_shape=[28, 28]))
for n_hidden in (300, 100, 50, 50, 50):
    model_B.add(keras.layers.Dense(n_hidden, activation="selu"))
## note: this baseline keeps model A's 8-unit softmax output layer.
## the labels of set B are only 0 and 1, so just the first two outputs are ever targeted.
## a single sigmoid unit with binary_crossentropy would be the more conventional choice.
model_B.add(keras.layers.Dense(8, activation="softmax"))
model_B.summary()  # summary() prints the table itself and returns None

print("===================COMPILE THE MODEL==========================")
## sparse categorical cross-entropy matches the softmax output and integer-valued labels.
## the optimizer is SGD (stochastic gradient descent) with a learning rate of 1e-3.
## accuracy is tracked as the metric.
model_B.compile(loss="sparse_categorical_crossentropy",
                optimizer=keras.optimizers.SGD(learning_rate=1e-3),
                metrics=["accuracy"])

print("====================TRAIN THE MODEL===========================")
history = model_B.fit(X_train_B, y_train_B, epochs=20,
                      validation_data=(X_valid_B, y_valid_B))

Model: "sequential_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten_5 (Flatten)         (None, 784)               0         
                                                                 
 dense_30 (Dense)            (None, 300)               235500    
                                                                 
 dense_31 (Dense)            (None, 100)               30100     
                                                                 
 dense_32 (Dense)            (None, 50)                5050      
                                                                 
 dense_33 (Dense)            (None, 50)                2550      
                                                                 
 dense_34 (Dense)            (None, 50)                2550      
                                                                 
 dense_35 (Dense)            (None, 8)                

#### TRANSFER LEARNING

In [34]:
print("=======================MODEL DEVELOPMENT========================")
## load the saved model A.
model_A = keras.models.load_model("model_A.h5")
## first attempt: build model B on top of model A's own layers (all but the output layer).
model_B_on_A = keras.models.Sequential(model_A.layers[:-1])
## add a new output layer for the binary classifier.
model_B_on_A.add(keras.layers.Dense(1, activation="sigmoid"))
model_B_on_A.summary()
print("======================TAKING THE CLONE OF THE MODEL==========================")
## model_A and model_B_on_A now share the same layer objects,
## so training model_B_on_A would also modify model_A.
## to avoid that, clone model A first.
## clone_model() copies the architecture only, not the weights.
model_A_clone = keras.models.clone_model(model_A)
## so copy the weights over explicitly.
model_A_clone.set_weights(model_A.get_weights())
## now rebuild model_B_on_A on top of the clone's layers.
model_B_on_A = keras.models.Sequential(model_A_clone.layers[:-1])
## add the new output layer.
model_B_on_A.add(keras.layers.Dense(1, activation="sigmoid"))

## the new output layer starts with random weights, so it makes large errors at first,
## and the resulting large gradients could wreck the reused weights.
## so freeze the reused layers for the first few epochs
## and give the new layer time to learn reasonable weights.
for layer in model_B_on_A.layers[:-1]:
    layer.trainable = False
## the model must be compiled again after freezing or unfreezing layers.
print("=======================COMPILE THE MODEL=======================")
model_B_on_A.compile(loss="binary_crossentropy",
                     optimizer=keras.optimizers.SGD(learning_rate=1e-3),
                     metrics=["accuracy"])
print("========================TRAIN THE MODEL FOR 4 EPOCHS=======================")
history = model_B_on_A.fit(X_train_B, y_train_B, epochs=4,
                           validation_data=(X_valid_B, y_valid_B))

## now unfreeze the reused layers so they can be fine-tuned for task B.
for layer in model_B_on_A.layers[:-1]:
    layer.trainable = True
print("======================COMPILE THE MODEL AGAIN AFTER UNFREEZING THE REUSED LAYERS=====================")
## reduce the learning rate to avoid damaging the reused weights.
optimizer = keras.optimizers.SGD(learning_rate=1e-4)
## compile the model again so the change in trainable layers takes effect.
model_B_on_A.compile(loss="binary_crossentropy",
                     optimizer=optimizer,
                     metrics=["accuracy"])

print("==============TRAIN THE MODEL AGAIN=====================")
history = model_B_on_A.fit(X_train_B, y_train_B, epochs=16,
                           validation_data=(X_valid_B, y_valid_B))

Model: "sequential_21"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten_4 (Flatten)         (None, 784)               0         
                                                                 
 dense_24 (Dense)            (None, 300)               235500    
                                                                 
 dense_25 (Dense)            (None, 100)               30100     
                                                                 
 dense_26 (Dense)            (None, 50)                5050      
                                                                 
 dense_27 (Dense)            (None, 50)                2550      
                                                                 
 dense_28 (Dense)            (None, 50)                2550      
                                                                 
 dense_51 (Dense)            (None, 1)               
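
The cell above relies on two mechanics that are easy to miss: layers passed to a new `Sequential` model are shared rather than copied, and freezing a layer removes its variables from the trainable set. The following is a minimal sketch of both, using a tiny stand-in network with made-up sizes rather than the notebook's model A; it assumes a TensorFlow installation that provides `tensorflow.keras`.

```python
import numpy as np
from tensorflow import keras

# Tiny stand-in for model A (hypothetical sizes, not the notebook's network).
model_A = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(3, activation="selu"),
    keras.layers.Dense(2, activation="softmax"),
])

# Reusing the layer objects directly: both models hold the very same layer.
shared = keras.Sequential(model_A.layers[:-1])
layers_are_shared = shared.layers[0] is model_A.layers[0]

# clone_model() copies the architecture; set_weights() then copies the values.
model_A_clone = keras.models.clone_model(model_A)
model_A_clone.set_weights(model_A.get_weights())
weights_match = all(
    np.array_equal(a, b)
    for a, b in zip(model_A_clone.get_weights(), model_A.get_weights())
)

# A model built on the clone no longer touches model_A's layers.
independent = keras.Sequential(model_A_clone.layers[:-1])
independent.add(keras.layers.Dense(1, activation="sigmoid"))
clone_is_separate = independent.layers[0] is not model_A.layers[0]

# Freezing: the reused layer's kernel and bias leave the trainable set.
n_before = len(independent.layers[0].trainable_weights)
for layer in independent.layers[:-1]:
    layer.trainable = False
n_after = len(independent.layers[0].trainable_weights)

print(layers_are_shared, weights_match, clone_is_separate)
print(n_before, n_after)
```

Freezing the layers of the clone leaves the layers of `model_A` trainable, which is the reason for cloning in the first place.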

In [39]:
## evaluate model B (trained from scratch) on test set B; the result is [loss, accuracy]
model_B.evaluate(X_test_B, y_test_B)



[0.09681594371795654, 0.9810000061988831]

In [37]:
model_B_on_A.evaluate(X_test_B, y_test_B)



[0.2728533446788788, 0.9129999876022339]

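The transfer learning model shows no improvement here: it reaches about 0.913 test accuracy, against about 0.981 for model B trained from scratch.
+ Transfer learning works best with deep CNNs, which tend to learn feature detectors that are much more general than the patterns learned by a small dense network like this one.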

****

### UNSUPERVISED PRETRAINING

+ Suppose you want to tackle a complex task.
+ But you don't have much labelled training data.
+ And you can't find a model trained on a similar task.
+ Then you can use unsupervised pretraining.

    + Gather plenty of unlabelled training data.
    + Use it to train an unsupervised model, such as an autoencoder or a GAN.
    + Reuse the lower layers of the autoencoder, or of the GAN's discriminator.
    + Add the output layer for your new task on top of them.
    + Fine-tune the final network using supervised learning on the labelled data.
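
The steps above can be sketched with a small autoencoder. This is only an illustration under stated assumptions: the data is random noise standing in for real unlabelled and labelled sets, the layer sizes are arbitrary, and the names `encoder`, `decoder` and `classifier` are hypothetical.

```python
import numpy as np
from tensorflow import keras

rng = np.random.default_rng(0)
# Random stand-ins: many unlabelled samples, only a few labelled ones.
X_unlabelled = rng.random((256, 20)).astype("float32")
X_labelled = rng.random((32, 20)).astype("float32")
y_labelled = rng.integers(0, 2, size=(32, 1)).astype("float32")

# Step 1: train an autoencoder to reconstruct the unlabelled inputs.
encoder = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(8, activation="selu"),
])
decoder = keras.Sequential([
    keras.Input(shape=(8,)),
    keras.layers.Dense(20, activation="sigmoid"),
])
autoencoder = keras.Sequential([encoder, decoder])
autoencoder.compile(loss="mse", optimizer="sgd")
autoencoder.fit(X_unlabelled, X_unlabelled, epochs=2, verbose=0)

# Step 2: reuse the encoder (the lower layers) and add a new output layer.
classifier = keras.Sequential([
    encoder,
    keras.layers.Dense(1, activation="sigmoid"),
])

# Step 3: fine-tune on the small labelled set with supervised learning.
classifier.compile(loss="binary_crossentropy",
                   optimizer=keras.optimizers.SGD(learning_rate=1e-3),
                   metrics=["accuracy"])
classifier.fit(X_labelled, y_labelled, epochs=2, verbose=0)

pred = classifier.predict(X_labelled, verbose=0)
print(pred.shape)
```

As with model A earlier, the encoder could also be frozen for the first few epochs so that the randomly initialised output layer does not disturb the pretrained weights.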

***

### PRETRAINING ON AUXILIARY TASKS

+ If you don't have much labelled training data, one last option is pretraining on an auxiliary task.
+ Train a first neural network on an auxiliary task for which you can easily obtain or generate labelled data.
+ Then reuse the lower layers of that network for your actual task.
+ The lower layers of the first network learn feature detectors.
+ Those feature detectors can then be reused when building the second network.
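
What makes an auxiliary task useful is that its labels cost nothing to produce. As one illustrative possibility (not taken from the notes above), the sketch below builds a labelled dataset by rotating images and using the rotation as the label; the helper `make_rotation_task` and the random images are hypothetical.

```python
import numpy as np

def make_rotation_task(images, rng):
    """Rotate each image by a random multiple of 90 degrees.

    Returns the rotated images and, as labels, the number of
    quarter turns (0, 1, 2 or 3) applied to each one.
    """
    quarter_turns = rng.integers(0, 4, size=len(images))
    rotated = np.stack([
        np.rot90(img, k=int(k)) for img, k in zip(images, quarter_turns)
    ])
    return rotated, quarter_turns

rng = np.random.default_rng(42)
# Random stand-ins for unlabelled 28x28 images.
images = rng.random((6, 28, 28)).astype(np.float32)

X_aux, y_aux = make_rotation_task(images, rng)
print(X_aux.shape, y_aux.shape)   # (6, 28, 28) (6,)
```

A network trained to predict `y_aux` from `X_aux` would then supply the lower layers for the actual task, in the same way model A's layers were reused above.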

****