# Cats vs Dogs Redux (Kaggle)

This notebook implements a deep learning classifier that distinguishes images of cats from images of dogs. It fine-tunes the pretrained VGG16 model<sup>1</sup> using the Python `Vgg16` class [provided by Fast.AI](https://github.com/fastai/courses/blob/master/deeplearning1/nbs/vgg16.py) as part of the [Practical Deep Learning for Coders](http://course.fast.ai/lessons/lesson1.html) course, together with their [`utils.py`](https://github.com/fastai/courses/blob/master/deeplearning1/nbs/utils.py) module.

The code assumes that there is a data folder whose structure is defined in the [Prepare Data](https://github.com/DanGolding/kaggle_cats_vs_dogs_redux/blob/master/Prepare%20data.ipynb) notebook in this repository.

<sup>1</sup>K. Simonyan and A. Zisserman, [Very Deep Convolutional Networks for Large-Scale Image Recognition](https://arxiv.org/pdf/1409.1556.pdf), arXiv technical report, 2014.
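
The cells in this notebook read images from `train`, `valid` and `test` under the data folder and write to `results`. The following is a minimal sketch of a layout check that could be run before training. The helper `missing_dirs` is hypothetical, and the subfolder names `cats`, `dogs` and `unknown` are assumptions made for illustration; the authoritative layout is the one produced by the Prepare Data notebook.

```python
import os

# Assumed layout: one subfolder per class under train/ and valid/, a single
# subfolder under test/ (flow_from_directory only looks inside subfolders),
# and a results/ folder for weights and predictions.
# The names 'cats', 'dogs' and 'unknown' are illustrative assumptions.
EXPECTED_DIRS = [
    os.path.join('train', 'cats'),
    os.path.join('train', 'dogs'),
    os.path.join('valid', 'cats'),
    os.path.join('valid', 'dogs'),
    os.path.join('test', 'unknown'),
    'results',
]


def missing_dirs(root, expected=EXPECTED_DIRS):
    """Return the expected subdirectories that do not exist under root."""
    return [d for d in expected if not os.path.isdir(os.path.join(root, d))]
```

Calling `missing_dirs('data/')` returns an empty list when every assumed folder is present, and otherwise lists the ones that still need to be created.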

In [21]:
from __future__ import division,print_function

import os, json
from glob import glob
import numpy as np
np.set_printoptions(precision=4, linewidth=100)
from matplotlib import pyplot as plt
from matplotlib.image import imread
%matplotlib inline

import utils; reload(utils)
from utils import plots
import vgg16; reload(vgg16)
from vgg16 import Vgg16

from keras.preprocessing import image
from keras.layers.core import Flatten, Dense, Dropout, Lambda
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping

In [30]:
path = "data/"
# path = "data/sample/"
batch_size=64

Train the model

In [None]:
vgg = Vgg16()

generator = image.ImageDataGenerator()
batches = generator.flow_from_directory(path + 'train',target_size=(224,224),batch_size=batch_size, class_mode='categorical', shuffle=True)
val_batches = generator.flow_from_directory(path + 'valid',target_size=(224,224),batch_size=batch_size*2, class_mode='categorical', shuffle=True)

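# Adapt the model from predicting ImageNet classes to predicting Cats vs Dogs
def finetune(model,num_classes,lr):
    model.pop()  # drop the 1000-way ImageNet output layer
    for layer in model.layers:
        layer.trainable = False  # freeze the pretrained layers
    model.add(Dense(num_classes,activation='softmax'))
    # Compile the model that was passed in rather than reaching for the global vgg object
    model.compile(optimizer=Adam(lr=lr), loss='categorical_crossentropy', metrics=['accuracy'])

# Train for a single epoch and return the validation loss of that epoch
def fit(model):
    hist = model.fit_generator(batches, samples_per_epoch=batches.nb_sample, nb_epoch=1,
                validation_data=val_batches, nb_val_samples=val_batches.nb_sample)
    return hist.history['val_loss'][0]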
    
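from keras import backend as K

lr = 0.002
finetune(vgg.model,batches.nb_class,lr)
tol = 0.005  # minimum relative improvement in val_loss before lr is halved
loss_previous = fit(vgg.model)  # baseline epoch, so epochs + 1 epochs are run in total
epochs = 5
for ep in range(epochs):
    print('Epoch {}'.format(ep+1))
    loss_current = fit(vgg.model)
    improvement = (loss_previous - loss_current)/loss_previous
    print('val_loss decreased by {}%'.format(100*improvement))
    if improvement < tol:
        lr /= 2.
        tol /= 2.
        print('lr: {}'.format(lr))
        # Update the optimizer's variable in place; rebinding the attribute to a float
        # may not reach a training function that has already been built.
        K.set_value(vgg.model.optimizer.lr, lr)
    loss_previous = loss_current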


Found 23000 images belonging to 2 classes.
Found 2000 images belonging to 2 classes.
Epoch 1/1
 4224/23000 [====>.........................] - ETA: 488s - loss: 0.1675 - acc: 0.9583
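
The training loop above halves both the learning rate and the tolerance whenever the relative drop in validation loss falls below the tolerance. A minimal, framework-free sketch of that rule follows, so the schedule can be replayed on a list of losses without training anything. The function name `halve_on_plateau` is hypothetical, and any losses passed to it are illustrative rather than results from this notebook.

```python
def halve_on_plateau(val_losses, lr=0.002, tol=0.005):
    """Replay the halving rule on a list of per-epoch validation losses.

    The first loss is the baseline; each later loss is compared with the
    one before it. Returns the learning rate in effect after each comparison.
    """
    history = []
    previous = val_losses[0]
    for current in val_losses[1:]:
        relative_gain = (previous - current) / previous
        if relative_gain < tol:
            # Progress has stalled (or the loss rose): take smaller steps
            # and demand a smaller improvement next time.
            lr /= 2.0
            tol /= 2.0
        history.append(lr)
        previous = current
    return history
```

Note that a rising validation loss gives a negative relative gain, so it also triggers a halving.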

In [32]:
vgg.model.save_weights(path + 'results/ft2.h5')

Run the trained model on the test data

In [33]:
test_batches = generator.flow_from_directory(path + 'test',target_size=(224,224),batch_size=batch_size*2, class_mode=None, shuffle=False)
predictions = vgg.model.predict_generator(test_batches,test_batches.nb_sample)

Found 12500 images belonging to 1 classes.


Prepare the results for Kaggle submission

In [34]:
filenames = test_batches.filenames
utils.save_array(path + 'results/test_predictions_II.dat',predictions)
utils.save_array(path + 'results/test_filenames_II.dat',filenames)

In [35]:
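# Filenames look like '<subfolder>/<id>.jpg'; keep only the integer id
ids = np.array([int(filename.split('.')[0].split("/")[1]) for filename in filenames])
# Column 1 is the second class in alphabetical order (dogs, if the class folders are cats/ and dogs/).
# Clipping keeps the predictions away from 0 and 1.
results = np.vstack((ids,np.clip(predictions,0.05,0.95)[:,1])).T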
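
The predictions are clipped to the range 0.05 to 0.95 before submission. Assuming the competition is scored with log loss, clipping bounds the penalty for a confident but wrong prediction, at the cost of a small penalty on confident correct ones. The sketch below works this out for a single image using only the standard library; `log_loss_single` and `clip` are hypothetical helpers, not part of `utils.py`.

```python
import math


def log_loss_single(y_true, p_dog):
    """Binary log loss for one image; y_true is 1 for a dog, 0 for a cat."""
    return -(y_true * math.log(p_dog) + (1 - y_true) * math.log(1.0 - p_dog))


def clip(p, low=0.05, high=0.95):
    """Scalar counterpart of np.clip(p, low, high)."""
    return max(low, min(high, p))
```

For a cat predicted as a dog with probability 0.999999, the unclipped loss is about 13.8, while the clipped prediction costs about 3.0. A correct prediction clipped to 0.95 costs about 0.05 instead of nearly zero, so clipping pays off only if the model makes at least some confident mistakes.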

Create the submission file, plus a link for downloading it to a local machine so that it can be submitted to Kaggle through the website

In [36]:
np.savetxt(path + 'results/results_II.csv',results,fmt='%d,%.5f',delimiter=',',header='id,label',comments='')
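
The file written above has an `id,label` header followed by one `<id>,<probability>` row per test image. As a rough sketch, the same rows can be built with the standard library alone, which makes the format easy to check on a handful of filenames. `image_id` and `submission_lines` are hypothetical helpers, and sorting the rows by id is an addition made here for readability, not something the original cell does.

```python
import os


def image_id(filename):
    """Extract the integer id from a path such as 'unknown/42.jpg'."""
    base = os.path.basename(filename)
    return int(os.path.splitext(base)[0])


def submission_lines(filenames, dog_probs):
    """Return the submission as a list of text lines, header first."""
    rows = sorted(zip((image_id(f) for f in filenames), dog_probs))
    lines = ['id,label']
    # Same row format as fmt='%d,%.5f' in the np.savetxt call
    lines.extend('{:d},{:.5f}'.format(i, p) for i, p in rows)
    return lines
```

Comparing a few lines of this output with the first lines of the saved CSV is a quick way to confirm that ids and probabilities line up before uploading.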

In [38]:
from IPython.display import FileLink
FileLink(path + 'results/results_II.csv')