# Dogs vs. Cats Redux: Kernels Edition

## Overview

Distinguish images of dogs from images of cats by fine-tuning *VGG16*, an image recognition model pretrained on ImageNet.

See the Kaggle competition [Dogs vs. Cats Redux: Kernels Edition](https://www.kaggle.com/c/dogs-vs-cats-redux-kernels-edition).

## Basic Setup

```python
# Show plots on the page
%matplotlib inline
```

Define the path to the dataset (uncomment the first line to work on the smaller sample instead):

```python
# path = "data/sample/"
path = "data/"
```

Load required libraries:

```python
from __future__ import division, print_function

import os, json
from glob import glob
import numpy as np
np.set_printoptions(precision=4, linewidth=100)
from matplotlib import pyplot as plt
```

Load [fast.ai's](https://course.fast.ai) utility library (it provides the `plots` helper for displaying images):

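```python
# reload() (a Python 2 builtin) re-imports the module so edits take effect without restarting the kernel
import libs.utils; reload(libs.utils)
from libs.utils import plots
```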

```
Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN 5103)
Using Theano backend.
```


Load [fast.ai's](https://course.fast.ai) version of the pretrained VGG16 model:

```python
import libs.vgg16; reload(libs.vgg16)
from libs.vgg16 import Vgg16
```

## Training

Create a `Vgg16` object, which builds the VGG16 network and loads its pretrained weights:

```python
vgg = Vgg16()
```

Grab batches of images from the training and validation folders:

```python
batch_size = 64
batches = vgg.get_batches(path+'train', batch_size=batch_size)
val_batches = vgg.get_batches(path+'valid', batch_size=batch_size*2)
```

```
Found 23000 images belonging to 2 classes.
Found 2000 images belonging to 2 classes.
```
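The "2 classes" come from the two subfolders. Keras's `flow_from_directory`, which these batches are built on, numbers class subfolders in sorted (alphabetical) order, and each epoch walks through the images in batches, the last one possibly partial. A small Python 3 sketch of both rules; the helper names `class_indices` and `batches_per_epoch` are hypothetical and the folder names `cats`/`dogs` are assumed:

```python
import math

def class_indices(subdirs):
    """Number the class subfolders in sorted order, one index per folder."""
    return {name: i for i, name in enumerate(sorted(subdirs))}

def batches_per_epoch(n_images, batch_size):
    """Batches needed to see every image once; the last batch may be smaller."""
    return int(math.ceil(n_images / float(batch_size)))

print(class_indices(['dogs', 'cats']))   # {'cats': 0, 'dogs': 1}
print(batches_per_epoch(23000, 64))      # 360 training batches
print(batches_per_epoch(2000, 128))      # 16 validation batches
```

The alphabetical numbering is why column 1 of the predictions, used further down, is the dog probability.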


Fine-tune the model for our task: replace VGG16's final ImageNet classification layer with one that predicts the classes found in `batches`, in this case either 'cat' or 'dog'.

```python
vgg.finetune(batches)
```
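As we understand fast.ai's `finetune`, the new output layer is a dense layer followed by a softmax over the two classes, so for every image the model returns two probabilities that sum to 1. A pure-Python sketch of that forward pass with toy numbers; `dense_softmax` is a hypothetical name, and the real layer takes VGG16's 4096 penultimate features rather than 3:

```python
import math

def dense_softmax(features, weights, biases):
    """Dense layer followed by softmax.

    features: penultimate-layer activations (list of floats)
    weights:  one weight vector per class (here: cat, dog)
    biases:   one bias per class
    """
    logits = [sum(w * x for w, x in zip(w_row, features)) + b
              for w_row, b in zip(weights, biases)]
    m = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]         # [P(cat), P(dog)]

probs = dense_softmax([0.5, 1.0, -0.2],
                      weights=[[0.1, -0.3, 0.2],    # cat
                               [0.4, 0.2, -0.1]],   # dog
                      biases=[0.0, 0.0])
```

Here the dog logit (0.42) beats the cat logit (-0.29), so `probs[1]` is the larger probability.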

Fit the model's trainable parameters to the training data for one epoch, evaluating on the validation batches:

```python
vgg.fit(batches, val_batches, nb_epoch=1)
```

```
Epoch 1/1
```


## Generate Predictions

```python
test_batches, preds = vgg.test(path+'test', batch_size=batch_size)
```

```
Found 12500 images belonging to 1 classes.
```
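The test images report "1 classes" because `flow_from_directory` needs at least one subfolder, so the unlabelled images sit in a single placeholder folder. Each row of `preds` belongs to the file at the same position in `test_batches.filenames`, which only holds if the test images were read unshuffled. A minimal Python 3 sketch of that pairing with an explicit length check; the function `pair_ids_with_probs` and the `unknown/` folder name are hypothetical:

```python
import os

def pair_ids_with_probs(filenames, dog_probs):
    """Pair each numeric image id with its predicted dog probability.

    Assumes dog_probs is in the same order as filenames (no shuffling).
    """
    if len(filenames) != len(dog_probs):
        raise ValueError('got %d filenames but %d predictions'
                         % (len(filenames), len(dog_probs)))
    # 'unknown/1234.jpg' -> 1234
    ids = [int(os.path.splitext(os.path.basename(f))[0]) for f in filenames]
    return list(zip(ids, dog_probs))

rows = pair_ids_with_probs(['unknown/1.jpg', 'unknown/10.jpg'], [0.98, 0.02])
```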


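```python
# Extract the numeric image ids from the file names (e.g. '.../1234.jpg' -> 1234);
# they must be numbers, not strings, for the '%d' format used when saving
file_id = np.array([int(os.path.splitext(os.path.basename(f))[0])
                    for f in test_batches.filenames])

# Column 1 is the probability that the image is a dog
# (class folders are numbered alphabetically, so cats=0, dogs=1)
is_dog = preds[:, 1]
# Clip over-confident predictions: log loss punishes confident mistakes heavily
is_dog = is_dog.clip(min=0.05, max=0.95)

# Combine ids and probabilities into an (n, 2) array, one row per image
subm = np.stack([file_id, is_dog], axis=1)
```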
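The competition is scored by log loss, which is why the probabilities are clipped: one confidently wrong answer can cost more than many correct ones earn. A pure-Python worked example; the helpers `log_loss` and `clip` are written here for illustration and use the same 0.05/0.95 bounds as the cell above:

```python
import math

def log_loss(labels, probs):
    """Mean binary cross-entropy; labels are 1 (dog) or 0 (cat), probs are P(dog)."""
    total = 0.0
    for y, p in zip(labels, probs):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

def clip(p, lo=0.05, hi=0.95):
    """Keep a probability inside [lo, hi]."""
    return min(max(p, lo), hi)

# A cat (label 0) that the model is 99.9% sure is a dog
raw = log_loss([0], [0.999])              # -ln(0.001), about 6.91
clipped = log_loss([0], [clip(0.999)])    # -ln(0.05), about 3.00
```

The price is small on confident correct answers: for a dog predicted at 0.999, clipping raises the loss from about 0.001 to about 0.051. Clipping also avoids `log(0)` if a prediction is exactly 0 or 1.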

Write the result to a CSV file with an `id,label` header:

```python
submission_file_name = 'submission.csv'
# One "id,label" row per image; comments='' stops NumPy from prefixing the header with '# '
np.savetxt(submission_file_name, subm, fmt='%d,%.5f', header='id,label', comments='')
```
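If you prefer to avoid packing integer ids and float probabilities into one NumPy array, the same file can be written with the standard library `csv` module. A Python 3 sketch; `write_submission` is a hypothetical helper and `rows` is assumed to be a sequence of `(id, dog_probability)` pairs:

```python
import csv
import io

def write_submission(rows, out):
    """Write (id, dog_probability) rows in the 'id,label' submission format.

    `out` is any text file object, e.g. open('submission.csv', 'w', newline='').
    """
    writer = csv.writer(out)
    writer.writerow(['id', 'label'])
    for image_id, p in rows:
        writer.writerow([int(image_id), '%.5f' % p])

# Write to an in-memory buffer to check the format
buf = io.StringIO()
write_submission([(1, 0.95), (2, 0.05)], buf)
```

`buf.getvalue().splitlines()` then gives `['id,label', '1,0.95000', '2,0.05000']`.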