# Transfer Learning
I'm trying out training a network that uses ImageNet pre-trained weights as a base, with some additional layers of my own added on top. I'll also get to see the difference between keeping the pre-trained weights frozen and training on all layers.

In [5]:
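freeze_flag = True         # freeze the pre-trained InceptionV3 weights
weights_flag = 'imagenet'  # pre-trained weights to load (None gives random initialization)
preprocess_flag = True     # apply InceptionV3's preprocess_input to the images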

In [13]:
from tensorflow.keras.applications.inception_v3 import InceptionV3

import warnings

## Loading the InceptionV3 architecture
In the cell below, I've set Inception to use an input_shape of 139x139x3 instead of the default 299x299x3. This will help speed up training later (and since I'll actually be upsampling from smaller 32x32 images, we aren't losing any data here). To use a non-default input shape, I also must set include_top to False, which drops the final fully-connected layer (one node for each of the 1,000 ImageNet classes) as well as the global average pooling layer before it. With include_top set to False, the input must still be at least 75x75.

In [16]:
warnings.filterwarnings('ignore')

input_size = 139
inception = InceptionV3(weights=weights_flag, include_top=False, input_shape=(input_size, input_size, 3))
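As a quick check on the input-size discussion above, this sketch builds the same architecture with `weights=None` (so nothing is downloaded) and inspects the final feature map for a 139x139 input. It assumes TensorFlow 2.x; the second call uses a 32x32 input only to show that inputs below the 75x75 minimum are rejected.

```python
from tensorflow.keras.applications.inception_v3 import InceptionV3

# weights=None builds the same architecture with random weights, so no download is needed
base = InceptionV3(weights=None, include_top=False, input_shape=(139, 139, 3))

# With a 139x139 input, the last feature map is 3x3 with 2048 channels
feature_shape = tuple(base.output.shape)
print(feature_shape)  # (None, 3, 3, 2048)

# Inputs smaller than 75x75 raise a ValueError when include_top=False
try:
    InceptionV3(weights=None, include_top=False, input_shape=(32, 32, 3))
    too_small_rejected = False
except ValueError:
    too_small_rejected = True
print(too_small_rejected)  # True
```

This matches the `inception_v3` output shape in the model summary further down, which is why the CIFAR-10 images are upsampled rather than fed in at 32x32.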

## Pre-trained with frozen weights
To start, we'll see how an ImageNet pre-trained InceptionV3 model performs with all of its weights frozen. I'll also drop the end of the model and append new layers onto it, although this could be done in other ways (e.g., keeping the end and adding new layers after it, or dropping more layers than I do here).

In [20]:
# Freeze every pre-trained layer so only the new layers added later are trained
if freeze_flag:
    for layer in inception.layers:
        layer.trainable = False
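To see what freezing does to the parameter counts, here is a small sketch on a toy stand-in model (the layer sizes are arbitrary and hypothetical, not taken from InceptionV3). It assumes TensorFlow 2.x and also shows the reverse switch used when training on all layers; Keras picks up `trainable` changes at compile time, so the model should be recompiled after flipping them.

```python
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# Hypothetical toy stand-in for a pre-trained base
inp = Input(shape=(4,))
base = Model(inputs=inp, outputs=Dense(8)(inp))

def count_params(weights):
    # Total number of scalar values across a list of weight tensors
    return sum(int(np.prod(tuple(w.shape))) for w in weights)

# Freeze every layer, as in the cell above
for layer in base.layers:
    layer.trainable = False
frozen_trainable = count_params(base.trainable_weights)   # 0
frozen_fixed = count_params(base.non_trainable_weights)   # 4*8 weights + 8 biases = 40

# Unfreeze everything to train on all layers instead (then recompile)
for layer in base.layers:
    layer.trainable = True
unfrozen_trainable = count_params(base.trainable_weights)  # 40
```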

## Dropping layers
Before dropping layers, I should check out what the actual layers of the model are.

In [21]:
inception.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_3 (InputLayer)            (None, 139, 139, 3)  0                                            
__________________________________________________________________________________________________
conv2d_188 (Conv2D)             (None, 69, 69, 32)   864         input_3[0][0]                    
__________________________________________________________________________________________________
batch_normalization_v1_188 (Bat (None, 69, 69, 32)   96          conv2d_188[0][0]                 
__________________________________________________________________________________________________
activation_188 (Activation)     (None, 69, 69, 32)   0           batch_normalization_v1_188[0][0] 
__________________________________________________________________________________________________
conv2d_189

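In a standard Inception network, the model summary would show that the last two layers are a global average pooling layer and a fully-connected `Dense` layer. However, since I set `include_top=False`, both of these have already been dropped. A common suggestion for dropping additional layers is:

    inception.layers.pop()

Be careful with this: `pop()` does work from the end of the layer list backwards, but in `tf.keras` `model.layers` hands back a Python list built from the model's layers, so popping from it does not rewire the model, and `inception` still produces the same output. A more reliable way to cut layers off the end is to build a new model that stops at an earlier layer's output, for example:

    from tensorflow.keras.models import Model
    inception = Model(inputs=inception.input, outputs=inception.get_layer('mixed9').output)

It's important to note two things here:
 1. How many layers you drop is typically up to you. I already dropped the final two by setting include_top to False when loading the model (which is also what allows the 139x139 input). Because InceptionV3's blocks have parallel branches, it is safer to cut at the end of a whole block (the concatenation layers named `mixed0` to `mixed10`) than to count back a fixed number of layers.
 2. If you cut in the wrong place, keep or reload the original model before trying again, and check summary() on the result to confirm it ends where you intended.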
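A sketch of truncating a model by building a new `Model` from the same input to an earlier layer's output, assuming TensorFlow 2.x. The toy network and its layer names (`hidden_1`, `hidden_2`, `out`) are hypothetical stand-ins for InceptionV3; the last lines check that popping from `layers` leaves the original model's output unchanged.

```python
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# Hypothetical toy network standing in for InceptionV3
inputs = Input(shape=(8,))
h1 = Dense(16, activation='relu', name='hidden_1')(inputs)
h2 = Dense(16, activation='relu', name='hidden_2')(h1)
outputs = Dense(3, activation='softmax', name='out')(h2)
base = Model(inputs=inputs, outputs=outputs)

# Cut by name: same input, ending at an earlier layer's output
truncated = Model(inputs=base.input, outputs=base.get_layer('hidden_2').output)

# Cut by position: layers[-3] is 'hidden_1' here, so two layers are dropped
truncated_more = Model(inputs=base.input, outputs=base.layers[-3].output)

# Popping from the layer list does not rewire the original model
layer_list = base.layers
layer_list.pop()

print(tuple(truncated.output.shape))    # (None, 16)
print(truncated_more.layers[-1].name)   # hidden_1
print(tuple(base.output.shape))         # (None, 3)
```

Building a new model leaves `base` untouched, so a different cut point can be tried without reloading anything.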

### Adding new layers
While I've used Keras's Sequential model before for simplicity, this time we'll use the functional **Model API**. It works a little differently: instead of calling model.add(), I explicitly tell each new layer which previous layer it takes as input. This is useful for more advanced architectures, such as those with **skip connections** (which were used heavily in ResNet).

For example, if I had a previous layer named `inp`:

    x = Dropout(0.2)(inp)
    
is how I would attach a new dropout layer, `x`, whose input comes from the layer with the variable name `inp`.
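To make the functional API and the idea of a skip connection concrete, here is a minimal sketch, assuming TensorFlow 2.x. The layer sizes are arbitrary; the one real constraint is that both inputs to `Add` have the same shape.

```python
from tensorflow.keras.layers import Input, Dense, Dropout, Add
from tensorflow.keras.models import Model

inp = Input(shape=(64,))

# Each layer is called on the tensor it should take as input
x = Dropout(0.2)(inp)
x = Dense(64, activation='relu')(x)

# Skip connection: add the block's input back onto its output
# (the Dense layer above keeps the width at 64 so the shapes match)
x = Add()([inp, x])

outputs = Dense(10, activation='softmax')(x)
model = Model(inputs=inp, outputs=outputs)
```

With Sequential there is no way to express the `Add()([inp, x])` step, since each layer can only consume the previous one.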

I'm going to use the **CIFAR-10 dataset**, which consists of 60,000 32x32 color images in 10 classes. To feed these images in, I need Keras's Input function, and then I want to resize the images up to the input_size I specified earlier (139x139).

In [23]:
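from tensorflow.keras.layers import Input, Lambda
import tensorflow as tf

# Makes the input placeholder layer 32x32x3 for CIFAR-10
cifar_input = Input(shape=(32, 32, 3))

# Upsample to 139x139 inside the model; tf.image.resize replaces the
# TF 1.x tf.image.resize_images, which was removed from the TF 2.x API
resized_input = Lambda(lambda image: tf.image.resize(
    image,
    (input_size, input_size)))(cifar_input)

# Run the resized images through the (frozen) InceptionV3 base
inp = inception(resized_input)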

In [25]:
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D

x = GlobalAveragePooling2D()(inp)

x = Dense(512, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)
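`GlobalAveragePooling2D` collapses each of the 2048 feature maps to its mean, turning the (3, 3, 2048) Inception output into a 2048-long vector that the `Dense` layers can take. A NumPy sketch of the same reduction, with random numbers standing in for real features:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for one Inception feature map: (batch, height, width, channels)
features = rng.random((1, 3, 3, 2048))

# Average over the two spatial axes, keeping the batch and channel axes
pooled = features.mean(axis=(1, 2))
print(pooled.shape)  # (1, 2048)
```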

In [26]:
from tensorflow.keras.models import Model

model = Model(inputs=cifar_input, outputs=predictions)
model.compile(optimizer='Adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_4 (InputLayer)         (None, 32, 32, 3)         0         
_________________________________________________________________
lambda (Lambda)              (None, 139, 139, 3)       0         
_________________________________________________________________
inception_v3 (Model)         (None, 3, 3, 2048)        21802784  
_________________________________________________________________
global_average_pooling2d (Gl (None, 2048)              0         
_________________________________________________________________
dense (Dense)                (None, 512)               1049088   
_________________________________________________________________
dense_1 (Dense)              (None, 10)                5130      
Total params: 22,857,002
Trainable params: 1,054,218
Non-trainable params: 21,802,784
________________________________________________________
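The parameter counts in the summary can be checked by hand: a `Dense` layer has one weight per input-output pair plus one bias per output. A small sketch of that arithmetic (the InceptionV3 count is simply read off the summary above):

```python
def dense_params(n_in, n_out):
    # weights (n_in * n_out) plus one bias per output unit
    return n_in * n_out + n_out

inception_params = 21_802_784            # from the summary; all frozen
dense_512 = dense_params(2048, 512)      # 1,049,088
dense_10 = dense_params(512, 10)         # 5,130

trainable = dense_512 + dense_10
total = inception_params + trainable
print(f"{trainable:,} trainable of {total:,} total")
# 1,054,218 trainable of 22,857,002 total
```

So with the base frozen, fewer than 5% of the parameters are actually updated during training.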

### Keras Callbacks
Keras **callbacks** let me gather and store additional information during training, such as the best model so far, or even stop training early once the validation accuracy has stopped improving. These can help avoid overfitting and wasted training time.

There are two key callbacks to mention here, **ModelCheckpoint** and **EarlyStopping**. As the names suggest, ModelCheckpoint saves the best model so far based on a chosen metric, while EarlyStopping ends training before the specified number of epochs if that metric stops improving for a given number of epochs (its patience).

I still need to pass these callbacks into fit() when I train the model (along with the training and validation data).
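A minimal sketch of how the two callbacks could be set up, assuming TensorFlow 2.x, where the accuracy metric is logged as `val_accuracy` (older versions used `val_acc`). The file name `best_weights.weights.h5` and the patience of 2 epochs are illustrative choices, not values used elsewhere in this notebook.

```python
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping

# Save the weights whenever validation accuracy reaches a new best
checkpoint = ModelCheckpoint(
    filepath='best_weights.weights.h5',
    monitor='val_accuracy',
    mode='max',
    save_best_only=True,
    save_weights_only=True,
)

# Stop if validation accuracy hasn't improved for 2 epochs,
# then roll the model back to the best weights seen
early_stopping = EarlyStopping(
    monitor='val_accuracy',
    mode='max',
    patience=2,
    restore_best_weights=True,
)

callbacks = [checkpoint, early_stopping]
```

The list would then be passed to training as `callbacks=callbacks` in the fit call.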

In [28]:
from sklearn.utils import shuffle
from sklearn.preprocessing import LabelBinarizer
from tensorflow.keras.datasets import cifar10

(X_train, y_train), (X_valid, y_valid) = cifar10.load_data()

# Fit the binarizer on the training labels only, then reuse it for validation
lb = LabelBinarizer()
one_hot_train = lb.fit_transform(y_train)
one_hot_valid = lb.transform(y_valid)

X_train, one_hot_train = shuffle(X_train, one_hot_train)
X_valid, one_hot_valid = shuffle(X_valid, one_hot_valid)

# I'm only going to use the first 10,000 training images for speed reasons,
# and only the first 2,000 images from the test set (used here for validation).
X_train = X_train[:10000]
one_hot_train = one_hot_train[:10000]
X_valid = X_valid[:2000]
one_hot_valid = one_hot_valid[:2000]

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
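Fitting the `LabelBinarizer` once on the training labels and only calling `transform` on the validation labels keeps the one-hot columns tied to the training classes. A small sketch on made-up labels standing in for CIFAR-10's classes 0 to 9:

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Made-up integer labels (hypothetical, not real CIFAR-10 data)
train_labels = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 3])
valid_labels = np.array([3, 0, 9])

lb = LabelBinarizer()
lb.fit(train_labels)                 # learn the class order from training data
one_hot = lb.transform(valid_labels)

print(one_hot.shape)           # (3, 10)
print(one_hot.argmax(axis=1))  # [3 0 9]
```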


### Data Preprocessing

In [32]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.inception_v3 import preprocess_input

# Apply InceptionV3's own input preprocessing to training and validation images
if preprocess_flag:
    datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
    val_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
else:
    datagen = ImageDataGenerator()
    val_datagen = ImageDataGenerator()
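For InceptionV3, `preprocess_input` scales pixel values from [0, 255] to [-1, 1]. A NumPy sketch of that scaling, written out by hand rather than calling the Keras function (the sample pixel values are arbitrary):

```python
import numpy as np

def inception_style_scale(x):
    # Map [0, 255] to [-1, 1]: divide by 127.5, then shift down by 1
    return x.astype(np.float32) / 127.5 - 1.0

pixels = np.array([0, 127.5, 255])
print(inception_style_scale(pixels))  # [-1.  0.  1.]
```

Because this is a linear rescaling, applying it to the 32x32 images before the `Lambda` resize gives the same result as applying it afterwards; bilinear resizing only takes weighted averages of neighbouring pixels.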

### Train the Model

In [36]:
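import math

warnings.filterwarnings('ignore')

batch_size = 32
epochs = 5

# In TF 2.x, model.fit accepts these generators directly (fit_generator is deprecated).
# Step counts must be integers; math.ceil also covers the final partial batch.
model.fit(datagen.flow(X_train, one_hot_train, batch_size=batch_size),
          steps_per_epoch=math.ceil(len(X_train) / batch_size),
          epochs=epochs,
          verbose=1,
          validation_data=val_datagen.flow(X_valid, one_hot_valid, batch_size=batch_size),
          validation_steps=math.ceil(len(X_valid) / batch_size))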

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x2ae937f0a58>