## Transfer learning assignment

Hi folks. Today we're going to train a convnet to recognize desserts using transfer learning, then compare it to a simple convnet. This exercise is quite memory- and CPU-intensive, so try to close any unnecessary programs before you begin.

> Be sure to run this docker container from the command line with:
>
> ```sh
> docker run -it --name tensorflowboard -p 8889:8888 -p 6006:6006 -v "$PWD":/tf tensorflow/tensorflow:2.0.0a0-py3-jupyter
> ```
> to run and use TensorBoard.

#### Step 1: Data
Inspect the data/ folder. To make it easy to load images into Keras, the data has been split into training and validation folders, with an additional holdout set to evaluate model performance at the end.
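
Before loading anything into Keras, it can help to confirm how many images each class has in each split. The following is a minimal sketch using only the standard library; the helper name `count_images` and the set of image extensions are assumptions for illustration, not part of the assignment code.

```python
from pathlib import Path

# Assumed image file types; adjust if the dataset uses others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(split_dir):
    """Return {class_name: number_of_image_files} for one split folder."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir()
                if f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


if __name__ == "__main__":
    # Folder names taken from the data/ listing; missing folders are skipped.
    for split in ("train_small", "validation_small", "holdout_small"):
        split_path = Path("data") / split
        if split_path.is_dir():
            print(split, count_images(split_path))
```

Each subfolder name is treated as a class label, which is the same convention flow_from_directory relies on.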

In [1]:
ls data

holdout_more/   train_more/   validation_more/
holdout_small/  train_small/  validation_small/


#### Step 2: Simple ConvNet
1. Using the create_model function in simple_cnn.py (this is the same ConvNet you built yesterday), create a Keras model. Use 100x100x3 (100 pixels square with channels for RGB) as the input size while testing to save time; we will increase this later.

In [2]:
import simple_cnn

In [3]:
model = simple_cnn.create_model((100,100,3), n_categories=5)

2. Previously, we used model.fit() to run the model. However, calling fit() on in-memory arrays requires loading all of your data into memory at once, which is generally impractical for large datasets. To deal with this, we'll be using data generators, which load data on the fly. The Keras ImageDataGenerator also makes it very easy to implement data augmentation, which we can use to increase our validation accuracy.  
Make two image data generators: one for training data and one for validation. For both, use the Xception preprocessor, which performs a couple of quick scaling and transformation operations.  

```python
from tensorflow.keras.applications.xception import preprocess_input
```
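
The scaling that Xception's preprocess_input applies is, in essence, a linear map from the pixel range [0, 255] to [-1, 1]. Here is a plain-Python sketch of that arithmetic. The function name is hypothetical, and the real function operates on whole image arrays rather than single values.

```python
def scale_like_xception(pixel_value):
    """Map a pixel value in [0, 255] to the range [-1, 1]."""
    return pixel_value / 127.5 - 1.0


# Black, mid-gray and white pixels after scaling
print(scale_like_xception(0), scale_like_xception(127.5), scale_like_xception(255))
# -1.0 0.0 1.0
```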



You can decide what image augmentation to use in the training datagen, but don't use augmentation in the validation datagen, as we want that to be indicative of real-world inputs to our model.
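
To make the "load on the fly" idea concrete, here is a minimal plain-Python sketch of a batch generator. It is not how ImageDataGenerator is implemented; `lazy_batches` and the `load` callback are hypothetical names, and the point is only that each batch is built when it is requested rather than up front.

```python
def lazy_batches(paths, batch_size, load=lambda p: p):
    """Yield lists of loaded items, one batch at a time."""
    batch = []
    for path in paths:
        batch.append(load(path))  # the (potentially expensive) load happens here
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Final partial batch, if the item count is not a multiple of batch_size
        yield batch


# Stand-in "file names"; a real loader would read and decode each image.
print(list(lazy_batches(["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"], 2)))
# [['a.jpg', 'b.jpg'], ['c.jpg', 'd.jpg'], ['e.jpg']]
```

Only one batch is held in memory at a time, which is why this pattern scales to datasets that do not fit in RAM.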

In [4]:
from tensorflow.keras.applications.xception import preprocess_input
from tensorflow.keras.preprocessing.image import ImageDataGenerator

In [5]:
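trainimggen = ImageDataGenerator(preprocessing_function=preprocess_input,
                                 horizontal_flip=True,
                                 rotation_range=30,      # degrees
                                 width_shift_range=3,    # integer values are pixels
                                 height_shift_range=3,
                                 brightness_range=None,
                                 shear_range=3,          # degrees
                                 # A float z samples the zoom factor from [1 - z, 1 + z],
                                 # so keep z below 1 to avoid negative factors
                                 zoom_range=0.2,
                                 channel_shift_range=3)
# No augmentation for validation: only the Xception preprocessing
validationimggen = ImageDataGenerator(preprocessing_function=preprocess_input)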

3. Using your image datagens, use flow_from_directory to make two generators, one for training and one for validation.  Start with target_size 100x100 and batch_size 16.

In [6]:
ls data/train_small/

carrot_cake/   panna_cotta/      strawberry_shortcake/
creme_brulee/  red_velvet_cake/


In [7]:
batch_size=32
filepath_train = 'data/train_small/'
filepath_val = 'data/validation_small'
train_small = trainimggen.flow_from_directory(filepath_train, target_size=(100,100), batch_size=batch_size)
val_small = validationimggen.flow_from_directory(filepath_val, target_size=(100,100), batch_size=batch_size)

Found 1249 images belonging to 5 classes.
Found 331 images belonging to 5 classes.


4. Compile the model using your favorite optimizer.

In [8]:
model.compile(optimizer='SGD', loss=['categorical_crossentropy'], metrics=['accuracy'])

5. Run your model for a few epochs using the fit_generator method. steps_per_epoch is generally the number of training images divided by batch_size, and validation_steps is the number of validation images divided by batch_size (rounded to a whole number in both cases).
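
Because the image count rarely divides evenly by the batch size, one common choice is to round up so that the last partial batch is still used. Here is a small sketch of that arithmetic, using the image counts reported by flow_from_directory above (`steps_for` is a hypothetical helper name).

```python
import math


def steps_for(n_images, batch_size):
    """Number of batches needed to see every image once, rounding up."""
    return math.ceil(n_images / batch_size)


print(steps_for(1249, 32))  # training images   -> 40
print(steps_for(331, 32))   # validation images -> 11
```

Rounding down with `//` is the other common choice; it skips the final partial batch.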

In [9]:
! pip install pillow;
! pip install scipy;
# If this printed anything other than 'Requirement already satisfied', you will need to restart your kernel

You are using pip version 19.0.3, however version 20.2b1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
You are using pip version 19.0.3, however version 20.2b1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


In [10]:
train_small[0][0].shape

(32, 100, 100, 3)

In [12]:
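import math

# Small counts keep each epoch short while testing; use the real image
# counts (1249 training, 331 validation) for a full run.
training_images = 30
# Step counts must be whole numbers, so round up: ceil(30 / 32) = 1
steps_per_epoch = math.ceil(training_images / batch_size)
validation_steps = math.ceil(30 / batch_size)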
model.fit_generator(generator=train_small, 
                    epochs=2, 
                    steps_per_epoch=steps_per_epoch, 
                    validation_steps=validation_steps)

Epoch 1/2
Epoch 2/2


<tensorflow.python.keras.callbacks.History at 0x7f4290345be0>

5. After you've gotten that to work, add a TensorBoard callback so you can monitor training status.

In [13]:
from tensorflow.keras.callbacks import TensorBoard

# Load the TensorBoard notebook extension
%load_ext tensorboard.notebook

In [14]:
import datetime

In [15]:
log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

In [16]:
project_name = "transfer_learning"

In [17]:
tensorboard = TensorBoard(log_dir=log_dir, histogram_freq=1)

# tensorboard = TensorBoard(log_dir=project_name, 
#                           histogram_freq=0, 
# #                           batch_size=batch_size, 
#                           write_graph=True, 
#                           embeddings_freq=0)

In [18]:
model.fit_generator(generator=train_small, 
                    validation_data=val_small,
                    epochs=2, 
                    steps_per_epoch=steps_per_epoch, 
                    validation_steps=validation_steps,
                    callbacks=[tensorboard])

Epoch 1/2
Epoch 2/2


<tensorflow.python.keras.callbacks.History at 0x7f4261f6a240>

In [19]:
%tensorboard --logdir logs/fit

6. The model at the end of a training session is not necessarily the best model! To deal with this, we will add another callback that saves our best model to disk for later use. Use keras.callbacks.ModelCheckpoint to make a callback and pass it to fit_generator. You can use save_best_only=True to avoid saving tons of models on your computer.
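
With save_best_only=True and the default monitored quantity (val_loss), the checkpoint file is overwritten only when the monitored value improves. The following is a simplified pure-Python sketch of that rule, not the Keras implementation; the function name and the loss values are made up for illustration.

```python
def epochs_that_would_be_saved(val_losses):
    """Return the epoch indices at which a 'save best only' rule would write a file."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(val_losses):
        if loss < best:  # lower validation loss is better
            best = loss
            saved.append(epoch)
    return saved


# Epoch 2 gets worse, so it is skipped; epoch 3 sets a new best.
print(epochs_that_would_be_saved([0.90, 0.70, 0.80, 0.60]))  # [0, 1, 3]
```

The file on disk therefore always holds the weights from the best epoch so far, which may not be the last one.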

In [20]:
from tensorflow.keras.callbacks import ModelCheckpoint

In [21]:
!mkdir models

mkdir: cannot create directory ‘models’: File exists


In [22]:
mdl_check = ModelCheckpoint(filepath='models/best_model.hdf5',
                            save_best_only=True)
model.fit_generator(generator=train_small, 
                    validation_data=val_small,
                    epochs=2, 
                    steps_per_epoch=steps_per_epoch, 
                    validation_steps=validation_steps,
                    callbacks=[mdl_check, tensorboard])

Epoch 1/2
Epoch 2/2


<tensorflow.python.keras.callbacks.History at 0x7f4261f6a208>

7. Finally, let's evaluate our model on the holdout set.
First, load your best model from disk:
```python
from tensorflow.keras.models import load_model
best_model = load_model(file_path_to_model)
```
You can make a holdout generator with validation_datagen.flow_from_directory, passing it the holdout folder instead of the validation folder. Then use model.evaluate_generator(), which is very similar to fit_generator, to output the holdout loss and holdout accuracy.

```python
metrics = best_model.evaluate_generator(<your code here>)
```
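
One possible way to fill in the placeholder is sketched below as a small helper. It assumes the generator and model objects follow the Keras API used in this notebook (flow_from_directory and evaluate_generator). The helper name `evaluate_on_holdout` is hypothetical, and shuffle=False is a choice made here so that the evaluation order is deterministic.

```python
def evaluate_on_holdout(model, datagen, holdout_dir,
                        target_size=(100, 100), batch_size=32):
    """Build a generator over holdout_dir and return the model's metrics on it."""
    holdout_gen = datagen.flow_from_directory(
        holdout_dir,
        target_size=target_size,
        batch_size=batch_size,
        shuffle=False,  # no need to shuffle when only evaluating
    )
    # Returns [loss, accuracy] for a model compiled with metrics=['accuracy']
    return model.evaluate_generator(generator=holdout_gen)


# Example call with this notebook's names (needs the data folder and a saved model):
# metrics = evaluate_on_holdout(best_model, validationimggen, 'data/holdout_small')
```

Reusing the validation datagen means the holdout images get the same preprocessing and no augmentation.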

In [23]:
ls models

best_model.hdf5        simple_class_test.hdf5
best_trans_model.hdf5  transfer_test.hdf5


In [24]:
from tensorflow.keras.models import load_model
file_path_to_model = 'models/best_model.hdf5'
best_model = load_model(file_path_to_model)

In [25]:
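# Note: val_small is the validation generator; pass a holdout generator here to get the holdout score
metrics = best_model.evaluate_generator(generator=val_small)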
metrics

[1.6057954593138262, 0.21450152]
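
A useful sanity check on these numbers is the chance baseline. With k roughly balanced classes, a model that guesses uniformly has accuracy of about 1/k and categorical cross-entropy of ln(k). The sketch below works that out for the five dessert classes, assuming the classes are about equally represented (the helper name is hypothetical).

```python
import math


def chance_baseline(n_classes):
    """Return (loss, accuracy) for uniform guessing over n_classes balanced classes."""
    return math.log(n_classes), 1.0 / n_classes


loss, acc = chance_baseline(5)
print(round(loss, 4), acc)  # 1.6094 0.2
```

The recorded loss of about 1.606 and accuracy of about 0.21 sit very close to these values, which suggests the model has barely moved from guessing after such a short training run.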

##### Checkpoint 1: Congratulations! You just created a very practical setup for modeling with a ConvNet, where you can read in large datasets with ease, save the best models, and monitor progress in TensorBoard!

#### Step 3: Transfer Model

1. Create a function that takes Xception (from keras.applications) and adds a new head for our current task onto it. Use a GlobalAveragePooling2D layer and a Dense layer with a softmax activation.

In [26]:
from tensorflow.keras.applications import Xception

In [27]:
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, Flatten, Dropout
from tensorflow.keras.models import Model

def create_transfer_model(input_size, n_categories, weights='imagenet'):
    # Load Xception without its classification "top" so we can attach our own head
    base_model = Xception(weights=weights,
                          include_top=False,
                          input_shape=input_size)

    # New head: global average pooling followed by a softmax classifier
    x = base_model.output
    x = GlobalAveragePooling2D()(x)
    predictions = Dense(n_categories, activation='softmax')(x)

    return Model(inputs=base_model.input, outputs=predictions)

transfer_model = create_transfer_model(input_size=(100,100,3), n_categories=5)

Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.4/xception_weights_tf_dim_ordering_tf_kernels_notop.h5
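
To get a feel for how small the new head is: assuming the Xception base ends in 2048 feature channels (which global average pooling reduces to a vector of length 2048), the Dense softmax layer has one weight per input per class plus one bias per class. Here is a quick worked calculation (the helper name is hypothetical).

```python
def dense_param_count(n_inputs, n_outputs):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_outputs + n_outputs


# 2048 pooled features feeding 5 dessert classes
print(dense_param_count(2048, 5))  # 10245
```

Under that assumption, the warmup phase trains only about ten thousand parameters, while the frozen base keeps its ImageNet weights.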


2. Set all of the layers except for the new head to untrainable, then compile the model with your favorite optimizer. We want to warm up the head slowly, so use a lower learning rate than you normally would (roughly 2x to 10x smaller).

In [28]:
def print_model_properties(model, indices=0):
    # Print each layer's index, name, and trainable flag, starting at layer `indices`
    for i, layer in enumerate(model.layers[indices:]):
        print("Layer {} | Name: {} | Trainable: {}".format(i+indices, layer.name, layer.trainable))

print_model_properties(transfer_model)

Layer 0 | Name: input_1 | Trainable: True
Layer 1 | Name: block1_conv1 | Trainable: True
Layer 2 | Name: block1_conv1_bn | Trainable: True
Layer 3 | Name: block1_conv1_act | Trainable: True
Layer 4 | Name: block1_conv2 | Trainable: True
Layer 5 | Name: block1_conv2_bn | Trainable: True
Layer 6 | Name: block1_conv2_act | Trainable: True
Layer 7 | Name: block2_sepconv1 | Trainable: True
Layer 8 | Name: block2_sepconv1_bn | Trainable: True
Layer 9 | Name: block2_sepconv2_act | Trainable: True
Layer 10 | Name: block2_sepconv2 | Trainable: True
Layer 11 | Name: block2_sepconv2_bn | Trainable: True
Layer 12 | Name: conv2d_2 | Trainable: True
Layer 13 | Name: block2_pool | Trainable: True
Layer 14 | Name: batch_normalization_v1 | Trainable: True
Layer 15 | Name: add | Trainable: True
Layer 16 | Name: block3_sepconv1_act | Trainable: True
Layer 17 | Name: block3_sepconv1 | Trainable: True
Layer 18 | Name: block3_sepconv1_bn | Trainable: True
Layer 19 | Name: block3_sepconv2_act | Trainable: Tr

In [29]:
def change_trainable_layers(model, trainable_index):
    for layer in model.layers[:trainable_index]:
        layer.trainable = False
    for layer in model.layers[trainable_index:]:
        layer.trainable = True
        
change_trainable_layers(transfer_model, 132)
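
The index 132 relies on knowing how many layers the Xception base has. An alternative sketch that avoids the hard-coded number is to count from the end, assuming the new head is exactly the last two layers (GlobalAveragePooling2D and Dense) as in create_transfer_model. The helper name is hypothetical, and it only relies on `model.layers` and `layer.trainable`.

```python
def freeze_all_but_head(model, n_head_layers=2):
    """Freeze every layer except the last n_head_layers."""
    if n_head_layers < 1:
        raise ValueError("n_head_layers must be at least 1")
    for layer in model.layers[:-n_head_layers]:
        layer.trainable = False
    for layer in model.layers[-n_head_layers:]:
        layer.trainable = True


# Example call with this notebook's model:
# freeze_all_but_head(transfer_model)
```

As with change_trainable_layers, the model needs to be compiled again afterwards for the change to take effect in training.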

In [30]:
print_model_properties(transfer_model)

Layer 0 | Name: input_1 | Trainable: False
Layer 1 | Name: block1_conv1 | Trainable: False
Layer 2 | Name: block1_conv1_bn | Trainable: False
Layer 3 | Name: block1_conv1_act | Trainable: False
Layer 4 | Name: block1_conv2 | Trainable: False
Layer 5 | Name: block1_conv2_bn | Trainable: False
Layer 6 | Name: block1_conv2_act | Trainable: False
Layer 7 | Name: block2_sepconv1 | Trainable: False
Layer 8 | Name: block2_sepconv1_bn | Trainable: False
Layer 9 | Name: block2_sepconv2_act | Trainable: False
Layer 10 | Name: block2_sepconv2 | Trainable: False
Layer 11 | Name: block2_sepconv2_bn | Trainable: False
Layer 12 | Name: conv2d_2 | Trainable: False
Layer 13 | Name: block2_pool | Trainable: False
Layer 14 | Name: batch_normalization_v1 | Trainable: False
Layer 15 | Name: add | Trainable: False
Layer 16 | Name: block3_sepconv1_act | Trainable: False
Layer 17 | Name: block3_sepconv1 | Trainable: False
Layer 18 | Name: block3_sepconv1_bn | Trainable: False
Layer 19 | Name: block3_sepconv2_

3. From here, you can run the warmup phase the same way that you ran the simple model, with the generators and the fit_generator method.

In [31]:
transfer_model.compile(optimizer='SGD', loss=['categorical_crossentropy'], metrics=['accuracy'])
mdl_check_trans = ModelCheckpoint(filepath='models/best_trans_model.hdf5',
                            save_best_only=True)
transfer_model.fit_generator(generator=train_small, 
                    validation_data=val_small,
                    epochs=2, 
                    steps_per_epoch=steps_per_epoch, 
                    validation_steps=validation_steps,
                    callbacks=[mdl_check_trans, tensorboard])

Epoch 1/2
Epoch 2/2


<tensorflow.python.keras.callbacks.History at 0x7f41f80d4f60>

4. After a few warmup epochs, unfreeze the 14th convolutional block onward, recompile and continue training, again with a low learning rate or an adaptive optimizer.
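
Rather than hard-coding the index of the 14th block, the starting index can be looked up by layer name. This sketch assumes the Xception layers in that block have names beginning with `block14`, following the naming pattern visible in the layer listing above; the helper name is hypothetical.

```python
def first_layer_index(model, prefix):
    """Return the index of the first layer whose name starts with prefix."""
    for i, layer in enumerate(model.layers):
        if layer.name.startswith(prefix):
            return i
    raise ValueError("no layer name starts with {!r}".format(prefix))


# Example call with this notebook's helpers:
# start = first_layer_index(transfer_model, 'block14')
# change_trainable_layers(transfer_model, start)
```

Printing `start` before unfreezing is a cheap way to confirm that it points at the layer you expect.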

In [34]:
change_trainable_layers(transfer_model, 126)
transfer_model.compile(optimizer='SGD', loss=['categorical_crossentropy'], metrics=['accuracy'])

transfer_model.fit_generator(generator=train_small, 
                    validation_data=val_small,
                    epochs=2, 
                    steps_per_epoch=steps_per_epoch, 
                    validation_steps=validation_steps,
                    callbacks=[mdl_check_trans])

Epoch 1/2
Epoch 2/2


<tensorflow.python.keras.callbacks.History at 0x7f41f0d04898>

5. Evaluate the performance of the transfer model. Is it better than the simple ConvNet?

In [35]:
from tensorflow.keras.models import load_model
file_path_to_trans_model = 'models/best_trans_model.hdf5'
best_trans_model = load_model(file_path_to_trans_model)

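# Evaluate the transfer model. (Evaluating best_model here would only repeat the
# simple ConvNet's score, which is what the recorded output below reflects.)
metrics = best_trans_model.evaluate_generator(generator=val_small)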
metrics

[1.6074773506684736, 0.21450152]

6. Play around with different hyperparameters, optimizers, and even base models (try MobileNet, etc.).

> I would change the optimizer next. Then perhaps another base model?

##### Checkpoint 2: Nice work! You just performed surgery on a neural network and retrained it to your particular task! This is a really powerful method for image classification.