# Deep Learning for Computer Vision: Assignment 5

## Computer Science: COMS W 4995 004

## Due: April 6, 2017

### Problem: Telling Cats from Dogs using VGG16

This assignment is based on the blog post
"Building powerful image classification models using very little data"
from blog.keras.io. Here you will build a classifier that can distinguish between pictures of dogs and cats. You will use a ConvNet (VGG16) that was pre-trained on ImageNet. Your task will be to re-architect the network to solve your problem. To do this you will:
0. Make a training dataset, using images from the link below, with 10,000 images of cats and 10,000 images of dogs. Use 1,000 images of each category for your validation set. The data should be organized into folders named ./data/train/cats/, ./data/train/dogs/, ./data/validation/cats/, and ./data/validation/dogs/. (No need to worry about a test set for this assignment.)
1. take the VGG16 network architecture
2. load in the pre-trained weights from the link below for all layers except the last layers
3. add a fully connected layer followed by a final sigmoid layer to replace the 1000 category softmax layer that was used when the network was trained on ImageNet
4. freeze all layers except the last two that you added
5. fine-tune the network on your cats vs. dogs image data
6. evaluate the accuracy
7. unfreeze all layers
8. continue fine-tuning the network on your cats vs. dogs image data
9. evaluate the accuracy
10. comment your code and make sure to include accuracy, a few sample mistakes, and anything else you would like to add

Downloads:
1. You can get your image data from:
https://www.kaggle.com/c/dogs-vs-cats/data. 
2. You can get your VGG16 pre-trained network weights from 
https://gist.github.com/baraldilorenzo/07d7802847aaad0a35d3
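
Step 0 can be sketched roughly as follows. This is a minimal sketch, assuming the extracted Kaggle training folder contains files named like `cat.0.jpg` and `dog.0.jpg`; the function name `split_dataset` and its arguments are hypothetical helpers, not part of the provided starter code.

```python
import os
import shutil


def split_dataset(src_dir, dst_dir, n_train=10000, n_val=1000):
    """Copy Kaggle-style files (cat.0.jpg, dog.0.jpg, ...) into train/validation folders."""
    for animal in ('cat', 'dog'):
        # Sort numerically by the index embedded in the file name.
        names = sorted(
            (f for f in os.listdir(src_dir) if f.startswith(animal + '.')),
            key=lambda f: int(f.split('.')[1]))
        splits = {
            'train': names[:n_train],
            'validation': names[n_train:n_train + n_val],
        }
        for split, files in splits.items():
            # Folder names are plural: cats/ and dogs/.
            target = os.path.join(dst_dir, split, animal + 's')
            os.makedirs(target, exist_ok=True)
            for f in files:
                shutil.copy(os.path.join(src_dir, f), os.path.join(target, f))
```

Calling `split_dataset('./train', './data')` (where `./train` is an assumed location for the unzipped images) would produce the four folders listed in step 0, with the training and validation images kept disjoint.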

(Note this assignment deviates from blog.keras.io in that it uses more data AND performs the fine-tuning in two steps: first freezing the lower layers and then un-freezing them for a final run of fine-tuning. The resulting ConvNet gets more than 97% accuracy in telling pictures of cats and dogs apart.)

A bunch of code and the network definition have been included to get you started. This is not meant to be a difficult assignment, as you have your final projects to work on!  Good luck and have fun!

Here we import necessary libraries.

Next we make the last layer or layers. We flatten the output from the last convolutional layer, and add a fully connected layer with 256 hidden units. Finally, we add the output layer, which has a scalar output as we have a binary classifier.
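
To get a feel for the size of these added layers, the arithmetic can be worked through directly. The sketch below assumes the 150x150 input used in this notebook, and relies on the fact that each 3x3 convolution is preceded by one pixel of zero padding (so it keeps the spatial size) while each 2x2 max-pooling layer halves it, rounding down. The helper name `conv_base_output_shape` is made up for illustration.

```python
def conv_base_output_shape(width=150, height=150, n_pools=5, channels=512):
    """Spatial size after the five 2x2 max-pooling layers (each floors the division by 2)."""
    for _ in range(n_pools):
        width, height = width // 2, height // 2
    return width, height, channels


w, h, c = conv_base_output_shape()
n_features = w * h * c                  # length of the flattened vector
dense_params = n_features * 256 + 256   # weights + biases of the 256-unit layer
output_params = 256 * 1 + 1             # weights + bias of the sigmoid unit

print((w, h, c), n_features, dense_params, output_params)
# (4, 4, 512) 8192 2097408 257
```

So the flattened feature vector has 8,192 entries, and nearly all of the new trainable parameters sit in the 256-unit fully connected layer.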

In [1]:
# Load all the packages 
import os
import h5py,pdb
import matplotlib.pyplot as plt
import time, pickle, pandas
import numpy as np
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential, load_model
from keras.layers import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras.layers import Activation, Dropout, Flatten, Dense
from keras.callbacks import TensorBoard, ModelCheckpoint
from keras import backend
from keras import optimizers
%matplotlib inline

Using TensorFlow backend.


In [2]:
nb_classes = 2 # number of classes
class_name = {
    0: 'cat',
    1: 'dog',
}

# dimensions of our images.
img_width, img_height = 150, 150

train_data_dir = './data/train'
validation_data_dir = './data/validation'
nb_train_samples = 20000
nb_validation_samples = 2000


def build_vgg16(framework='tf'):
    if framework == 'th':
        # build the VGG16 network in Theano weight ordering mode
        backend.set_image_dim_ordering('th')
    else:
        # build the VGG16 network in Tensorflow weight ordering mode
        backend.set_image_dim_ordering('tf')

    model = Sequential()
    if framework == 'th':
        model.add(ZeroPadding2D((1, 1), input_shape=(3, img_width, img_height)))
    else:
        model.add(ZeroPadding2D((1, 1), input_shape=(img_width, img_height, 3)))

    model.add(Convolution2D(64, 3, 3, activation='relu', name='conv1_1'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(64, 3, 3, activation='relu', name='conv1_2'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))

    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(128, 3, 3, activation='relu', name='conv2_1'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(128, 3, 3, activation='relu', name='conv2_2'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))

    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(256, 3, 3, activation='relu', name='conv3_1'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(256, 3, 3, activation='relu', name='conv3_2'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(256, 3, 3, activation='relu', name='conv3_3'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))

    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, 3, 3, activation='relu', name='conv4_1'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, 3, 3, activation='relu', name='conv4_2'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, 3, 3, activation='relu', name='conv4_3'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))

    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, 3, 3, activation='relu', name='conv5_1'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, 3, 3, activation='relu', name='conv5_2'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, 3, 3, activation='relu', name='conv5_3'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))
    return model

weights_path = 'vgg16_weights.h5'
th_model = build_vgg16('th')

assert os.path.exists(weights_path), 'Model weights not found (see "weights_path" variable in script).'
f = h5py.File(weights_path, 'r')  # open read-only
for k in range(f.attrs['nb_layers']):
    if k >= len(th_model.layers):
        # we don't look at the last (fully-connected) layers in the savefile
        break
    g = f['layer_{}'.format(k)]
    weights = [g['param_{}'.format(p)] for p in range(g.attrs['nb_params'])]
    th_model.layers[k].set_weights(weights)
f.close()
print('Model loaded.')

tf_model = build_vgg16('tf')

# transfer weights from th_model to tf_model
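for th_layer, tf_layer in zip(th_model.layers, tf_model.layers):
    if th_layer.__class__.__name__ == 'Convolution2D':
        kernel, bias = th_layer.get_weights()
        # Theano kernels are (out, in, rows, cols); TensorFlow expects (rows, cols, in, out)
        kernel = np.transpose(kernel, (2, 3, 1, 0))
        tf_layer.set_weights([kernel, bias])
    else:
        # padding and pooling layers have no weights, so this is a no-op
        tf_layer.set_weights(tf_layer.get_weights())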

num_layers_before_top=len(tf_model.layers)
# build a classifier model to put on top of the convolutional model
top_model = Sequential()
top_model.add(Flatten(input_shape=tf_model.output_shape[1:]))
top_model.add(Dense(256, activation='relu'))
top_model.add(Dropout(0.5))
top_model.add(Dense(1, activation='sigmoid'))

Model loaded.
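
The `np.transpose(kernel, (2, 3, 1, 0))` call in the cell above is what converts a Theano-ordered kernel into TensorFlow ordering. The sketch below illustrates the axis permutation on a made-up array with the shape of the `conv2_1` kernel (128 filters over 64 input channels); the values are placeholders, not real weights.

```python
import numpy as np

# Placeholder Theano-ordered kernel: (out_channels, in_channels, rows, cols)
th_kernel = np.arange(128 * 64 * 3 * 3, dtype=np.float32).reshape(128, 64, 3, 3)

# Reorder to TensorFlow ordering: (rows, cols, in_channels, out_channels)
tf_kernel = np.transpose(th_kernel, (2, 3, 1, 0))

print(th_kernel.shape, '->', tf_kernel.shape)

# The same weight is addressed with permuted indices:
# tf_kernel[row, col, in, out] == th_kernel[out, in, row, col]
assert tf_kernel[1, 2, 10, 5] == th_kernel[5, 10, 1, 2]
```

No values change; only the order in which the four axes are indexed.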


In [3]:
''' We add this model to the top of our VGG16 network, freeze all the weights except the top, and compile.
'''

# add the model on top of the convolutional base
tf_model.add(top_model)
# freeze all the weights except the top
for layer in tf_model.layers[:num_layers_before_top]:
    layer.trainable = False
tf_model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

'''Defining options for data augmentation.'''
# this is the augmentation configuration we will use for training
train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

# this is the augmentation configuration we will use for testing:
# only rescaling
test_datagen = ImageDataGenerator(rescale=1./255)
# this is a generator that will read pictures found in
# subfolders of 'data/train', and indefinitely generate
# batches of augmented image data
train_generator = train_datagen.flow_from_directory(
        train_data_dir,
        target_size=(img_width, img_height),
        batch_size=32,
        class_mode='binary')
# this is a similar generator, for validation data
validation_generator = test_datagen.flow_from_directory(
        validation_data_dir,
        target_size=(img_width, img_height),
        batch_size=32,
        class_mode='binary')

Found 20000 images belonging to 2 classes.
Found 2000 images belonging to 2 classes.
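
With `class_mode='binary'`, `flow_from_directory` derives the 0/1 labels from the subfolder names in alphabetical order, which is why `cats` maps to 0 and `dogs` to 1 in `class_name`. The mapping can be read from the generator's `class_indices` attribute. The sketch below mimics that rule in plain Python; the function name `class_indices` is used here for illustration and is not Keras's own implementation.

```python
import os


def class_indices(directory):
    """Map each class subfolder to an integer label, in alphabetical order."""
    classes = sorted(
        d for d in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, d)))
    return {name: index for index, name in enumerate(classes)}
```

For the folder layout in this assignment, `class_indices('./data/train')` would return `{'cats': 0, 'dogs': 1}`.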


In [None]:
'''
Fine-tune the model
Now we train for 5 epochs to get the weights for the top close to where we need them. Essentially, we want the network to be doing the right thing before we unfreeze the lower weights.
'''
nb_epoch=5 # number of epochs
batch_size = 16 # not used below: the generators above already yield batches of 32
hist_little_convet = tf_model.fit_generator(
        train_generator,
        samples_per_epoch = nb_train_samples,
        nb_epoch = nb_epoch,
        validation_data = validation_generator,
        nb_val_samples = nb_validation_samples,
        verbose=2
        )

Epoch 1/5
415s - loss: 0.4650 - acc: 0.7989 - val_loss: 0.3382 - val_acc: 0.8485
Epoch 2/5
403s - loss: 0.3616 - acc: 0.8428 - val_loss: 0.3205 - val_acc: 0.8630
Epoch 3/5
403s - loss: 0.3381 - acc: 0.8553 - val_loss: 0.3095 - val_acc: 0.8575
Epoch 4/5
402s - loss: 0.3289 - acc: 0.8636 - val_loss: 0.3089 - val_acc: 0.8665
Epoch 5/5
404s - loss: 0.3208 - acc: 0.8673 - val_loss: 0.3115 - val_acc: 0.8665
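
The `samples_per_epoch` and `nb_val_samples` arguments above count images, following the Keras 1 API. In Keras 2 the corresponding arguments (`steps_per_epoch` and `validation_steps`, with `epochs` in place of `nb_epoch`) count batches instead. A minimal sketch of the conversion, assuming the batch size of 32 that was passed to `flow_from_directory`:

```python
nb_train_samples = 20000
nb_validation_samples = 2000
generator_batch_size = 32  # the batch_size passed to flow_from_directory


def batches_needed(n_samples, batch_size):
    """Integer ceiling: batches required to see every sample once."""
    return (n_samples + batch_size - 1) // batch_size


steps_per_epoch = batches_needed(nb_train_samples, generator_batch_size)
validation_steps = batches_needed(nb_validation_samples, generator_batch_size)

print(steps_per_epoch, validation_steps)  # 625 63
```

The last validation batch is only half full (2,000 is not a multiple of 32), which is why the ceiling is used rather than plain division.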


In [None]:
'''Set the last conv. block to trainable, i.e. unfreeze it.'''
for layer in tf_model.layers[:num_layers_before_top]:
    layer.trainable = False
for layer in tf_model.layers[25:num_layers_before_top]:
    layer.trainable = True
    
tf_model.compile(loss='binary_crossentropy',
              optimizer=optimizers.SGD(lr=1e-4, momentum=0.9),
              metrics=['accuracy'])

nb_epoch=15 # training for 15 epochs
hist_little_convet = tf_model.fit_generator(
        train_generator,
        samples_per_epoch = nb_train_samples,
        nb_epoch = nb_epoch,
        validation_data = validation_generator,
        nb_val_samples = nb_validation_samples,
        verbose=2
        )


Epoch 1/15
472s - loss: 0.2752 - acc: 0.8882 - val_loss: 0.2487 - val_acc: 0.9010
Epoch 2/15
471s - loss: 0.2298 - acc: 0.9051 - val_loss: 0.2260 - val_acc: 0.9060
Epoch 3/15
471s - loss: 0.2036 - acc: 0.9178 - val_loss: 0.2216 - val_acc: 0.9060
Epoch 4/15
471s - loss: 0.1859 - acc: 0.9231 - val_loss: 0.2018 - val_acc: 0.9120
Epoch 5/15
471s - loss: 0.1722 - acc: 0.9295 - val_loss: 0.2078 - val_acc: 0.9165
Epoch 6/15
471s - loss: 0.1590 - acc: 0.9373 - val_loss: 0.1690 - val_acc: 0.9270
Epoch 7/15
471s - loss: 0.1509 - acc: 0.9388 - val_loss: 0.1893 - val_acc: 0.9265
Epoch 8/15
471s - loss: 0.1409 - acc: 0.9433 - val_loss: 0.1835 - val_acc: 0.9265
Epoch 9/15
471s - loss: 0.1338 - acc: 0.9483 - val_loss: 0.1990 - val_acc: 0.9225
Epoch 10/15
471s - loss: 0.1258 - acc: 0.9508 - val_loss: 0.1811 - val_acc: 0.9275
Epoch 11/15
470s - loss: 0.1171 - acc: 0.9533 - val_loss: 0.1960 - val_acc: 0.9265
Epoch 12/15
471s - loss: 0.1105 - acc: 0.9569 - val_loss: 0.1824 - val_acc: 0.9300
Epoch 13/15
4
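
The index 25 used in the cell above corresponds to `conv5_1`, the first convolution of the last block. This can be checked by listing the layers in the order `build_vgg16` adds them. The sketch below reconstructs that order in plain Python; the generic names `zero_padding` and `max_pooling` are placeholders, not the names Keras assigns.

```python
def vgg16_layer_names():
    """Layer names in the order build_vgg16 adds them (padding and pooling included)."""
    convs_per_block = [2, 2, 3, 3, 3]
    names = []
    for block, n_convs in enumerate(convs_per_block, start=1):
        for i in range(1, n_convs + 1):
            names.append('zero_padding')
            names.append('conv%d_%d' % (block, i))
        names.append('max_pooling')
    return names


names = vgg16_layer_names()
print(len(names), names.index('conv5_1'))  # 31 25
```

Layer 24 is the zero-padding layer in front of `conv5_1`; it has no weights, so starting the slice at 25 rather than 24 makes no difference to what is trained.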

In [7]:
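'''Unfreezing all layers and training the whole network with backprop.'''
# Note: Keras applies changes to `trainable` only when the model is compiled,
# so without a compile() call here training continues with the previous
# trainable set; the model is next recompiled in the learning-rate cell below.
for layer in tf_model.layers: # unfreezing all layers
    layer.trainable = True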
nb_epoch=10
hist_little_convet = tf_model.fit_generator(
        train_generator,
        samples_per_epoch = nb_train_samples,
        nb_epoch = nb_epoch,
        validation_data = validation_generator,
        nb_val_samples = nb_validation_samples,
        verbose=2)

Epoch 1/10
470s - loss: 0.0567 - acc: 0.9781 - val_loss: 0.2156 - val_acc: 0.9380
Epoch 2/10
470s - loss: 0.0576 - acc: 0.9783 - val_loss: 0.2121 - val_acc: 0.9380
Epoch 3/10
470s - loss: 0.0489 - acc: 0.9811 - val_loss: 0.1992 - val_acc: 0.9395
Epoch 4/10
471s - loss: 0.0500 - acc: 0.9811 - val_loss: 0.1936 - val_acc: 0.9430
Epoch 5/10
471s - loss: 0.0488 - acc: 0.9808 - val_loss: 0.2130 - val_acc: 0.9370
Epoch 6/10
471s - loss: 0.0447 - acc: 0.9832 - val_loss: 0.2152 - val_acc: 0.9405
Epoch 7/10
470s - loss: 0.0443 - acc: 0.9846 - val_loss: 0.2115 - val_acc: 0.9385
Epoch 8/10
471s - loss: 0.0416 - acc: 0.9834 - val_loss: 0.2040 - val_acc: 0.9410
Epoch 9/10
471s - loss: 0.0410 - acc: 0.9851 - val_loss: 0.2473 - val_acc: 0.9410
Epoch 10/10
471s - loss: 0.0397 - acc: 0.9860 - val_loss: 0.2522 - val_acc: 0.9375


In [8]:
'''Training the network for 10 more epochs.'''
hist_little_convet = tf_model.fit_generator(
        train_generator,
        samples_per_epoch = nb_train_samples,
        nb_epoch = nb_epoch,
        validation_data = validation_generator,
        nb_val_samples = nb_validation_samples,
        verbose=2)

Epoch 1/10
471s - loss: 0.0356 - acc: 0.9873 - val_loss: 0.2346 - val_acc: 0.9340
Epoch 2/10
471s - loss: 0.0377 - acc: 0.9866 - val_loss: 0.2152 - val_acc: 0.9390
Epoch 3/10
471s - loss: 0.0327 - acc: 0.9886 - val_loss: 0.2168 - val_acc: 0.9415
Epoch 4/10
471s - loss: 0.0324 - acc: 0.9888 - val_loss: 0.2439 - val_acc: 0.9435
Epoch 5/10
471s - loss: 0.0291 - acc: 0.9892 - val_loss: 0.2326 - val_acc: 0.9425
Epoch 6/10
471s - loss: 0.0340 - acc: 0.9880 - val_loss: 0.2337 - val_acc: 0.9400
Epoch 7/10
471s - loss: 0.0305 - acc: 0.9891 - val_loss: 0.2238 - val_acc: 0.9450
Epoch 8/10
471s - loss: 0.0283 - acc: 0.9901 - val_loss: 0.2658 - val_acc: 0.9340
Epoch 9/10
471s - loss: 0.0273 - acc: 0.9905 - val_loss: 0.2182 - val_acc: 0.9470
Epoch 10/10
471s - loss: 0.0290 - acc: 0.9892 - val_loss: 0.2308 - val_acc: 0.9415


In [None]:
'''Lowering the learning rate and training the network for 20 more epochs.'''
tf_model.compile(loss='binary_crossentropy',
              optimizer=optimizers.SGD(lr=1e-5, momentum=0.9),
              metrics=['accuracy'])
nb_epoch=20
hist_little_convet = tf_model.fit_generator(
        train_generator,
        samples_per_epoch = nb_train_samples,
        nb_epoch = nb_epoch,
        validation_data = validation_generator,
        nb_val_samples = nb_validation_samples,
        verbose=2)

Epoch 1/20
1325s - loss: 0.0215 - acc: 0.9927 - val_loss: 0.2132 - val_acc: 0.9485
Epoch 2/20
1317s - loss: 0.0203 - acc: 0.9927 - val_loss: 0.1965 - val_acc: 0.9515
Epoch 3/20
1316s - loss: 0.0175 - acc: 0.9937 - val_loss: 0.2081 - val_acc: 0.9520
Epoch 4/20
1317s - loss: 0.0161 - acc: 0.9940 - val_loss: 0.2634 - val_acc: 0.9400
Epoch 5/20
1317s - loss: 0.0156 - acc: 0.9947 - val_loss: 0.2077 - val_acc: 0.9560
Epoch 6/20
1317s - loss: 0.0171 - acc: 0.9936 - val_loss: 0.2181 - val_acc: 0.9475
Epoch 9/20
1316s - loss: 0.0163 - acc: 0.9937 - val_loss: 0.2341 - val_acc: 0.9450
Epoch 10/20
1316s - loss: 0.0142 - acc: 0.9960 - val_loss: 0.2313 - val_acc: 0.9490
Epoch 11/20
1316s - loss: 0.0147 - acc: 0.9952 - val_loss: 0.1706 - val_acc: 0.9545
Epoch 12/20
1316s - loss: 0.0157 - acc: 0.9942 - val_loss: 0.2264 - val_acc: 0.9510
Epoch 13/20
1316s - loss: 0.0132 - acc: 0.9956 - val_loss: 0.2096 - val_acc: 0.9535
Epoch 14/20
1316s - loss: 0.0148 - acc: 0.9955 - val_loss: 0.2179 - val_acc: 0.9495

In [None]:
# the best validation accuracy achieved was 95.90%
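
Because each `fit_generator` call overwrites `hist_little_convet`, one way to track the best validation accuracy across all runs is to scan the printed logs. A minimal sketch, assuming the `verbose=2` line format shown in the outputs above; `best_val_acc` is a hypothetical helper and the sample text is three lines copied from the last run.

```python
import re


def best_val_acc(log_text):
    """Return the highest val_acc found in Keras verbose=2 output, or None."""
    values = [float(v) for v in re.findall(r'val_acc: ([0-9.]+)', log_text)]
    return max(values) if values else None


sample_log = """
1317s - loss: 0.0161 - acc: 0.9940 - val_loss: 0.2634 - val_acc: 0.9400
1317s - loss: 0.0156 - acc: 0.9947 - val_loss: 0.2077 - val_acc: 0.9560
1317s - loss: 0.0171 - acc: 0.9936 - val_loss: 0.2181 - val_acc: 0.9475
"""
print(best_val_acc(sample_log))  # 0.956
```

Within a single run, the same numbers should also be available as a list under the `'val_acc'` key of the returned history object's `history` dictionary.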

In [None]:
# Evaluate the accuracy on the validation set again:
accuracies = np.array([])
losses = np.array([])

i = 0
for X_batch, Y_batch in validation_generator:
    # the generator loops forever, so stop after 20 batches (20 * 32 = 640 images)
    loss, accuracy = tf_model.evaluate(X_batch, Y_batch, verbose=0)
    losses = np.append(losses, loss)
    accuracies = np.append(accuracies, accuracy)
    i += 1
    if i == 20:
        break

print("Validation: accuracy = %f  ;  loss = %f" % (np.mean(accuracies), np.mean(losses)))
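
Step 10 of the assignment also asks for a few sample mistakes. One possible way to find them is to threshold the sigmoid outputs at 0.5 and compare with the labels. This is a sketch with made-up probabilities; `find_mistakes` is a hypothetical helper, and in the notebook the probabilities would come from something like `tf_model.predict(X_batch)`.

```python
import numpy as np


def find_mistakes(probs, labels, threshold=0.5):
    """Indices where the thresholded sigmoid output disagrees with the label."""
    probs = np.asarray(probs).ravel()
    labels = np.asarray(labels).ravel()
    predicted = (probs > threshold).astype(int)
    return np.flatnonzero(predicted != labels)


# Made-up sigmoid outputs and true labels (0 = cat, 1 = dog).
probs = [0.10, 0.80, 0.60, 0.30]
labels = [0, 1, 0, 1]
print(find_mistakes(probs, labels))  # [2 3]
```

The returned indices can then be used to display the misclassified images, for example with `plt.imshow(X_batch[i])`, since the validation generator only rescales pixel values to the range 0 to 1.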