## Woof vs. Meow - 3

### Fine-tuning the top layers of a pre-trained network

In the previous part, we achieved 87% accuracy on the validation set using the bottleneck features of a pre-trained network (VGG-16). Refer: [Transfer Learning](https://github.com/darshanbagul/Cats-vs-Dogs/blob/master/3.Transfer_Learning.ipynb)

In this final part, we shall fine-tune the top layers of the previous model to adapt it to the task of recognizing cats and dogs.

As we know, the lower layers of a CNN learn rudimentary features such as edges and corners, while the topmost layers learn to detect high-level features such as legs, faces, and ears. The aim is to fine-tune the top layers so that the CNN learns high-level features specific to cats and dogs, rather than high-level features for all the classes that the original VGG-16 was trained on.

## Data

Data can be downloaded at:
https://www.kaggle.com/c/dogs-vs-cats/data  
All you need is the train set.  
The recommended folder structure is shown below.  

### Folder structure

```
data/
    train/
        dogs/ 
            dog001.jpg
            dog002.jpg
            ...
        cats/ 
            cat001.jpg
            cat002.jpg
            ...
    validation/
        dogs/ 
            dog001.jpg
            dog002.jpg
            ...
        cats/ 
            cat001.jpg
            cat002.jpg
            ...
```
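
One way to get from the Kaggle archive to this layout is sketched below. It assumes the extracted training images sit in a single flat folder and are named like `cat.0.jpg` and `dog.0.jpg`; the function name `split_dataset`, the `val_fraction` parameter, and the 80/20 split are illustrative choices, not part of the original notebook.

```python
import os
import random
import shutil


def split_dataset(src_dir, dst_dir, val_fraction=0.2, seed=0):
    """Copy images from a flat folder into train/ and validation/ class folders.

    Assumes file names start with 'cat' or 'dog' (e.g. cat.0.jpg).
    Returns a dict mapping (split, class_folder) to the number of files copied.
    """
    rng = random.Random(seed)
    counts = {}
    for prefix, class_dir in (("cat", "cats"), ("dog", "dogs")):
        files = sorted(f for f in os.listdir(src_dir) if f.startswith(prefix))
        rng.shuffle(files)
        n_val = int(len(files) * val_fraction)
        splits = {"validation": files[:n_val], "train": files[n_val:]}
        for split, names in splits.items():
            out_dir = os.path.join(dst_dir, split, class_dir)
            if not os.path.isdir(out_dir):
                os.makedirs(out_dir)
            for name in names:
                shutil.copy(os.path.join(src_dir, name),
                            os.path.join(out_dir, name))
            counts[(split, class_dir)] = len(names)
    return counts
```

A call such as `split_dataset('kaggle_train', 'data')` (with `kaggle_train` standing in for wherever the archive was extracted) would then populate `data/train` and `data/validation`.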

### Data loading

In [1]:
# This notebook uses TensorFlow as the backend for Keras
!pip install pillow
!KERAS_BACKEND=tensorflow python -c "from keras import backend"

Using TensorFlow backend.


In [46]:
import os
import numpy as np
from keras.models import Sequential
from keras.layers import Activation, Dropout, Flatten, Dense
from keras.preprocessing.image import ImageDataGenerator
from keras.layers import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras import optimizers
import scipy
import pylab as pl
import matplotlib.cm as cm
%matplotlib inline

In [2]:
# dimensions of our images.
img_width, img_height = 150, 150

train_data_dir = 'data/train'
validation_data_dir = 'data/validation'

## Fine-tuning the top layers of a pre-trained network

Start by instantiating the VGG base and loading its weights.

In [3]:
model_vgg = Sequential()
model_vgg.add(ZeroPadding2D((1, 1), input_shape=(img_width, img_height,3)))
model_vgg.add(Convolution2D(64, 3, 3, activation='relu', name='conv1_1'))
model_vgg.add(ZeroPadding2D((1, 1)))
model_vgg.add(Convolution2D(64, 3, 3, activation='relu', name='conv1_2'))
model_vgg.add(MaxPooling2D((2, 2), strides=(2, 2)))

model_vgg.add(ZeroPadding2D((1, 1)))
model_vgg.add(Convolution2D(128, 3, 3, activation='relu', name='conv2_1'))
model_vgg.add(ZeroPadding2D((1, 1)))
model_vgg.add(Convolution2D(128, 3, 3, activation='relu', name='conv2_2'))
model_vgg.add(MaxPooling2D((2, 2), strides=(2, 2)))

model_vgg.add(ZeroPadding2D((1, 1)))
model_vgg.add(Convolution2D(256, 3, 3, activation='relu', name='conv3_1'))
model_vgg.add(ZeroPadding2D((1, 1)))
model_vgg.add(Convolution2D(256, 3, 3, activation='relu', name='conv3_2'))
model_vgg.add(ZeroPadding2D((1, 1)))
model_vgg.add(Convolution2D(256, 3, 3, activation='relu', name='conv3_3'))
model_vgg.add(MaxPooling2D((2, 2), strides=(2, 2)))

model_vgg.add(ZeroPadding2D((1, 1)))
model_vgg.add(Convolution2D(512, 3, 3, activation='relu', name='conv4_1'))
model_vgg.add(ZeroPadding2D((1, 1)))
model_vgg.add(Convolution2D(512, 3, 3, activation='relu', name='conv4_2'))
model_vgg.add(ZeroPadding2D((1, 1)))
model_vgg.add(Convolution2D(512, 3, 3, activation='relu', name='conv4_3'))
model_vgg.add(MaxPooling2D((2, 2), strides=(2, 2)))

model_vgg.add(ZeroPadding2D((1, 1)))
model_vgg.add(Convolution2D(512, 3, 3, activation='relu', name='conv5_1'))
model_vgg.add(ZeroPadding2D((1, 1)))
model_vgg.add(Convolution2D(512, 3, 3, activation='relu', name='conv5_2'))
model_vgg.add(ZeroPadding2D((1, 1)))
model_vgg.add(Convolution2D(512, 3, 3, activation='relu', name='conv5_3'))
model_vgg.add(MaxPooling2D((2, 2), strides=(2, 2)))
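
To see what this stack produces for a 150x150 input, the spatial size can be traced by hand. The sketch below assumes that each zero-padded 3x3 convolution preserves height and width and that each 2x2, stride-2 pooling layer halves them, rounding down; `conv_base_output_shape` is a hypothetical helper, not a Keras function.

```python
def conv_base_output_shape(height, width, n_blocks=5, last_filters=512):
    """Trace the spatial size through the five VGG-16 pooling stages."""
    for _ in range(n_blocks):
        # ZeroPadding2D((1, 1)) + 3x3 convolution keeps the size unchanged;
        # MaxPooling2D((2, 2), strides=(2, 2)) halves it, rounding down.
        height, width = height // 2, width // 2
    return (height, width, last_filters)


print(conv_base_output_shape(150, 150))  # (4, 4, 512)
```

Under these assumptions, and with channels-last ordering, the convolutional base would hand a 4x4x512 feature map to whatever is stacked on top of it.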

In [22]:
import h5py
f = h5py.File('models/vgg/vgg16_weights.h5', 'r')  # open the savefile read-only
for k in range(f.attrs['nb_layers']):
    if k >= len(model_vgg.layers) - 1:
        # stop before the fully-connected layers stored in the savefile;
        # only the convolutional base is loaded here
        break
    g = f['layer_{}'.format(k)]
    weights = [g['param_{}'.format(p)] for p in range(g.attrs['nb_params'])]
    layer = model_vgg.layers[k]

    if layer.__class__.__name__ in ['Convolution1D', 'Convolution2D', 'Convolution3D', 'AtrousConvolution2D']:
        weights[0] = np.transpose(weights[0], (2, 3, 1, 0))

    layer.set_weights(weights)

f.close()
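
The transpose in the loop reorders the axes of each convolution kernel. As a shape-only illustration, assuming the savefile stores kernels as `(filters, channels, rows, cols)` and the TensorFlow backend expects `(rows, cols, channels, filters)`, the kernel of the first convolution would change as follows. The array is random stand-in data, not real weights.

```python
import numpy as np

# Stand-in for conv1_1's kernel: 64 filters, 3 input channels, 3x3 window.
saved_kernel = np.random.rand(64, 3, 3, 3)

# Same axis permutation as in the loading loop above.
tf_kernel = np.transpose(saved_kernel, (2, 3, 1, 0))

print(saved_kernel.shape)  # (64, 3, 3, 3)
print(tf_kernel.shape)     # (3, 3, 3, 64)

# Element [f, c, i, j] of the saved kernel ends up at [i, j, c, f].
assert tf_kernel[1, 2, 0, 5] == saved_kernel[5, 0, 1, 2]
```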

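Build a classifier model to put on top of the convolutional model. For fine-tuning, we start with a fully trained classifier: a randomly initialized one would produce large gradient updates that could wreck the pre-trained convolutional weights. We therefore load the weights from the earlier bottleneck-features model and then add the classifier on top of the convolutional base.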

In [4]:
top_model = Sequential()
top_model.add(Flatten(input_shape=model_vgg.output_shape[1:]))
top_model.add(Dense(256, activation='relu'))
top_model.add(Dropout(0.5))
top_model.add(Dense(1, activation='sigmoid'))

top_model.load_weights('models/bottleneck_40_epochs.h5')

model_vgg.add(top_model)
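
For a sense of scale, the size of this classifier can be worked out by hand. The sketch assumes the convolutional base outputs a 4x4x512 feature map for 150x150 inputs (five stride-2 poolings, rounding down); `dense_params` is a hypothetical helper, not part of Keras.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


flattened = 4 * 4 * 512                # Flatten output, assuming a (4, 4, 512) feature map
hidden = dense_params(flattened, 256)  # Dense(256, activation='relu')
output = dense_params(256, 1)          # Dense(1, activation='sigmoid')

print(flattened)        # 8192
print(hidden + output)  # 2097665
```

Dropout adds no parameters, so under this assumption the classifier carries roughly two million weights, almost all of them in the first dense layer.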

For fine-tuning, we only want to train a few layers. The following lines set the first 25 layers (everything up to the last conv block) to non-trainable.

In [24]:
for layer in model_vgg.layers[:25]:
    layer.trainable = False
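
To check which layers the slice `[:25]` covers, the layer order can be reproduced without Keras. The sketch below rebuilds the sequence of layer types from the model definition above using plain strings; the list and its names are illustrative stand-ins, not Keras objects.

```python
# Number of convolutions in each of the five VGG-16 blocks.
convs_per_block = [2, 2, 3, 3, 3]

layers = []
for block, n_convs in enumerate(convs_per_block, start=1):
    for i in range(1, n_convs + 1):
        layers.append("zeropad")
        layers.append("conv{}_{}".format(block, i))
    layers.append("maxpool{}".format(block))
layers.append("top_model")  # the classifier, added as a single layer

frozen, trainable = layers[:25], layers[25:]
print(len(layers))  # 32
print(frozen[-2:])  # ['maxpool4', 'zeropad']
print(trainable)    # ['conv5_1', 'zeropad', 'conv5_2', 'zeropad', 'conv5_3', 'maxpool5', 'top_model']
```

Under this layout, only the three convolutions of the last block and the classifier have weights that get updated during fine-tuning.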

In [25]:
# compile the model with a SGD/momentum optimizer
# and a very slow learning rate.
model_vgg.compile(loss='binary_crossentropy',
              optimizer=optimizers.SGD(lr=1e-4, momentum=0.9),
              metrics=['accuracy'])
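
The small learning rate is there so that each update nudges the pre-trained weights rather than overwriting them. A minimal sketch of the classical momentum update, for a single scalar weight and a made-up constant gradient, shows how small the steps are with `lr=1e-4` and `momentum=0.9`; `sgd_momentum_step` is a hypothetical helper, not the Keras implementation.

```python
def sgd_momentum_step(weight, velocity, grad, lr=1e-4, momentum=0.9):
    """One classical momentum update: v <- momentum * v - lr * grad; w <- w + v."""
    velocity = momentum * velocity - lr * grad
    return weight + velocity, velocity


w, v = 0.5, 0.0
for step in range(3):
    # Constant gradient of 1.0, chosen purely for illustration.
    w, v = sgd_momentum_step(w, v, grad=1.0)
    print(step, w, v)
```

After three steps the weight has moved by well under a thousandth of its starting value, which is the kind of gentle adjustment fine-tuning is after.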

In [26]:
# prepare data augmentation configuration  . . . do we need this?
train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
        train_data_dir,
        target_size=(img_height, img_width),
        batch_size=32,
        class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
        validation_data_dir,
        target_size=(img_height, img_width),
        batch_size=32,
        class_mode='binary')

Found 20002 images belonging to 2 classes.
Found 4998 images belonging to 2 classes.
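
With a batch size of 32, the image counts reported above translate into a number of batches per pass over each set. The arithmetic below is a small sketch; `batches_per_epoch` is a hypothetical helper, and the last batch of each pass is assumed to be partial.

```python
import math


def batches_per_epoch(n_samples, batch_size=32):
    """Number of batches needed to see every sample once (last batch may be smaller)."""
    return int(math.ceil(n_samples / float(batch_size)))


print(batches_per_epoch(20002))  # 626
print(batches_per_epoch(4998))   # 157
```

Newer Keras releases count an epoch in batches (`steps_per_epoch`, `validation_steps`) rather than in samples, which is where numbers like these would be used.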


In [36]:
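nb_epoch = 4
# These two names are not defined elsewhere in this notebook; the values below
# are the image counts reported by flow_from_directory above.
nb_train_samples = 20002
nb_validation_samples = 4998
# fine-tune the model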
model_vgg.fit_generator(
        train_generator,
        samples_per_epoch=nb_train_samples,
        nb_epoch=nb_epoch,
        validation_data=validation_generator,
        nb_val_samples=nb_validation_samples)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x11d3d6750>

In [37]:
model_vgg.save_weights('models/finetuning_20epochs_vgg.h5')

In [5]:
model_vgg.load_weights('models/finetuning_14epochs_vgg.h5')

### Evaluating on validation set

Computing loss and accuracy:

In [38]:
model_vgg.evaluate_generator(validation_generator, nb_validation_samples)

[0.20605383881134073, 0.93797519007603036]
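
The two numbers are the loss and the accuracy, in the order set at compile time (`binary_crossentropy`, then `accuracy`). As a reminder of what they measure, here is a minimal sketch on four made-up predictions; the helper names and the data are illustrative and do not reproduce the Keras internals.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean negative log-likelihood of the true labels under sigmoid outputs."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of predictions on the correct side of the threshold."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if (p > threshold) == (t == 1))
    return hits / float(len(y_true))


labels = [1, 0, 1, 0]          # made-up labels
scores = [0.9, 0.2, 0.4, 0.1]  # made-up sigmoid outputs

print(binary_crossentropy(labels, scores))
print(binary_accuracy(labels, scores))  # 0.75
```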

## Accuracy = ~94%

As we can see, just a little fine-tuning yields an accuracy of about 94%. This is well above the baseline accuracy of ~80%, which is also stated in this [paper](http://xenon.stanford.edu/~pgolle/papers/dogcat.pdf).

## Conclusion

1. Transfer learning followed by fine-tuning of the top layers leads to a very accurate model.

2. If an algorithm can classify the images in the ASIRRA dataset with about 94% accuracy, a CAPTCHA built on that dataset is no longer robust against attacks.

3. In this project we explored a number of techniques for solving image classification challenges, and also laid down strategies that can be followed for any image classification problem.

4. Summary of our experiments:  
   a. Training a small network from scratch (as a baseline)  
   b. Using data augmentation to improve model robustness when working with little data, and exploring Keras' ImageDataGenerator class for real-time data augmentation  
   c. Using the bottleneck features of a pre-trained network  
   d. Fine-tuning the top layers of a pre-trained network while keeping the lower layers frozen