# Cats vs Dogs Redux (Kaggle)

This notebook implements a deep learning classifier that distinguishes images of cats from images of dogs. It fine-tunes the pretrained VGG16 model<sup>1</sup> using the `Vgg16` Python class [provided by Fast.AI](https://github.com/fastai/courses/blob/master/deeplearning1/nbs/vgg16.py) as part of their [Practical Deep Learning for Coders](http://course.fast.ai/lessons/lesson1.html) course, together with their [`utils.py`](https://github.com/fastai/courses/blob/master/deeplearning1/nbs/utils.py) module.

The code assumes that there is a data folder whose structure is defined in the [Prepare Data](https://github.com/DanGolding/kaggle_cats_vs_dogs_redux/blob/master/Prepare%20data.ipynb) notebook in this repository.
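The cells in this notebook read from `train`, `valid` and `test1` under `path` and write to `results`. As a minimal sketch, the check below reports which of those folders are absent before any training starts. The helper name `missing_dirs` and the constant `EXPECTED_SUBDIRS` are hypothetical, and the authoritative layout remains the one described in the Prepare Data notebook.

```python
import os

# Subfolders referenced by the cells in this notebook (an assumption drawn
# from the paths used here, not from the Prepare Data notebook itself).
EXPECTED_SUBDIRS = ('train', 'valid', 'test1', 'results')


def missing_dirs(root, expected=EXPECTED_SUBDIRS):
    """Return the names in `expected` that are not directories under `root`."""
    return [name for name in expected
            if not os.path.isdir(os.path.join(root, name))]


# An empty list means every folder the notebook relies on is present.
print(missing_dirs('data/'))
```

Running this before the training cell catches a mistyped `path` early, rather than after the model weights have been loaded.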

<sup>1</sup> K. Simonyan, A. Zisserman. [Very Deep Convolutional Networks for Large-Scale Image Recognition](https://arxiv.org/pdf/1409.1556.pdf), arXiv technical report, 2014.

In [1]:
from __future__ import division,print_function

import os, json
from glob import glob
import numpy as np
np.set_printoptions(precision=4, linewidth=100)
from matplotlib import pyplot as plt
from matplotlib.image import imread
%matplotlib inline

import utils; reload(utils)
from utils import plots
import vgg16; reload(vgg16)
from vgg16 import Vgg16

Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN 5103)
Using Theano backend.


In [2]:
path = "data/"
# path = "data/sample/"
batch_size=64

## Train the model

In [3]:
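vgg = Vgg16()
# Read image batches from the class subfolders of each directory
batches = vgg.get_batches(path+'train', batch_size=batch_size)
val_batches = vgg.get_batches(path+'valid', batch_size=batch_size*2)
# Adapt the pretrained network's output layer to the classes found in `batches`,
# then train for one epoch
vgg.finetune(batches)
vgg.fit(batches, val_batches, nb_epoch=1)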


Found 23000 images belonging to 2 classes.
Found 2000 images belonging to 2 classes.
Epoch 1/1


In [4]:
vgg.model.save_weights(path + 'results/ft1.h5')

## Run the trained model on the test data

In [30]:
# test_data = utils.get_data(path + 'test',10)

TypeError: 'int' object is not iterable

In [26]:
# preds, idxs, classes = vgg.predict(test_data)

NameError: name 'test_data' is not defined

In [6]:
# Returns the test batch iterator and the predicted class probabilities
test_batches, predictions = vgg.test(path + 'test1', batch_size=batch_size*2)

Found 12500 images belonging to 1 classes.


## Prepare the results for Kaggle submission

In [26]:
# Save the raw predictions and filenames so the submission can be rebuilt later
filenames = test_batches.filenames
utils.save_array(path + 'results/test_predictions.dat', predictions)
utils.save_array(path + 'results/test_filenames.dat', filenames)

In [34]:
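# Extract the numeric id from each path of the form "<subfolder>/<id>.<extension>"
ids = np.array([int(filename.split('.')[0].split("/")[1]) for filename in filenames])
# Pair each id with the probability in column 1, clipped to [0.05, 0.95]
results = np.vstack((ids,np.clip(predictions,0.05,0.95)[:,1])).T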
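The clipping to `[0.05, 0.95]` is easiest to motivate with a small numeric sketch. It assumes the competition scores submissions by log loss and that column 1 holds the probability of the positive class. The labels and probabilities are made-up toy values, and `binary_log_loss` is a hypothetical helper rather than part of `utils.py`.

```python
import numpy as np


def binary_log_loss(y_true, p):
    """Mean binary cross-entropy for labels in {0, 1} and probabilities p."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.asarray(p, dtype=float)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))


# Toy data: three confident, correct predictions and one confident mistake
# (the third image is labelled 1 but predicted as 0.001).
labels = np.array([1, 1, 1, 0])
raw = np.array([0.999, 0.999, 0.001, 0.001])
clipped = np.clip(raw, 0.05, 0.95)

raw_loss = binary_log_loss(labels, raw)
clipped_loss = binary_log_loss(labels, clipped)
print(raw_loss, clipped_loss)
```

In this toy case the single confident mistake dominates the unclipped score. Clipping gives up a little on every correct prediction in exchange for bounding the penalty on each wrong one.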

In [45]:
np.savetxt(path + 'results/results.csv',results,fmt='%d,%.5f',delimiter=',',header='id,label',comments='')
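To check the submission format without running the model, the same steps can be applied to toy data. This is only a sketch: the two filenames, the subfolder name `unknown`, and the prediction values are made up for illustration, and the file is written to a temporary directory rather than to `results/`.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-ins for test_batches.filenames and the model output
filenames = ['unknown/12.jpg', 'unknown/3.jpg']
predictions = np.array([[0.001, 0.999],
                        [0.980, 0.020]])

# Same id parsing and clipping as in the cells above
ids = np.array([int(f.split('.')[0].split('/')[1]) for f in filenames])
results = np.vstack((ids, np.clip(predictions, 0.05, 0.95)[:, 1])).T

# Write to a throwaway location and read the file back as text
out_path = os.path.join(tempfile.mkdtemp(), 'results.csv')
np.savetxt(out_path, results, fmt='%d,%.5f', header='id,label', comments='')

with open(out_path) as f:
    lines = f.read().splitlines()
print(lines)
```

The first line of the file is the `id,label` header, followed by one `id,probability` row per image. Passing `comments=''` stops `np.savetxt` from prefixing the header with `# `.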

In [46]:
from IPython.display import FileLink
FileLink(path + 'results/results.csv')