# Deep Vision
___

**Author** : Aman Hussain  
**Email** : aman@amandavinci.me  
**Description** : Classifying images of dogs and cats by finetuning the VGG16 model

## Import Libraries

#### Scientific Computing Stack

In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

#### Custom Packages

In [None]:
import os, json

from helper import utils
from helper.utils import plots

from helper import vgg16
from helper.vgg16 import Vgg16

## Declaring paths & global parameters

The path to the dataset is defined here. It will point to the sample folder which contains lesser number of images for quick and iterative training on the local machine. For the final training, on the cloud we must change the path to the one commented out below.

In [4]:
# path = '../data/dogscats/sample/'
path = '../data/dogscats/'

The default batchsize for training and validation purposes

In [4]:
batchsize = 64

## Data Exploration

Instantiating the VGG16 class which implements the required utility methods

In [5]:
vgg = Vgg16()

Downloading data from http://files.fast.ai/models/vgg16.h5
Downloading data from http://files.fast.ai/models/imagenet_class_index.json

Getting the training and validation batches

In [6]:
batches = vgg.get_batches(path+'train', batch_size=batchsize)
val_batches = vgg.get_batches(path+'valid', batch_size=batchsize)

Found 20000 images belonging to 2 classes.
Found 5000 images belonging to 2 classes.


Visualizing the images, only if we are exploring the samples

In [7]:
if path == '../data/dogscats/sample/':
    imgs, labels = next(batches)
    val_imgs, val_labels = next(val_batches)
    labels = ['dog' if i[0]==0 else 'cat' for i in labels]
    val_labels = ['dog' if i[0]==0 else 'cat' for i in val_labels]
    plots(val_imgs, figsize=(20,10), titles=val_labels)

## Finetuning

In [8]:
vgg.finetune(batches)

In [9]:
%%time
vgg.fit(batches, val_batches, nb_epoch=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
CPU times: user 1h 3min 27s, sys: 12min 31s, total: 1h 15min 59s
Wall time: 54min 49s


## Model Testing

Due to the quirkiness of the ImageDataGenerator.flow_from_directory() used by vgg.get_batches(), we have to make a sub directory under test directory by the name 'subdir_for_keras_ImageDataGenerator'.

In [10]:
batch_size = len(os.listdir(path+'test'+'/subdir_for_keras_ImageDataGenerator'))

With the class_mode set to None, it will return only the batch of images without labels

In [11]:
testbatch = vgg.get_batches(path+'test', shuffle=False, batch_size=batch_size, class_mode=None)

Found 12500 images belonging to 1 classes.


In [12]:
test_imgs = next(testbatch) 

Here,we visualize the test images

In [13]:
if path == '../data/dogscats/sample/':
    plots(test_imgs)

Here, we make the predictions using our trained model

In [14]:
%%time
probab, prediction, prediction_labels = vgg.predict(test_imgs, details = True)

CPU times: user 4min 10s, sys: 1min 15s, total: 5min 25s
Wall time: 5min 25s


## Results

Preparing to save the predictions as submissions to the Kaggle competetion

In [15]:
np.save(path+'submissions/probab', probab)
np.save(path+'submissions/prediction', prediction)
np.save(path+'submissions/prediction_labels', prediction_labels)

In [16]:
index = [str(i) for i in range(1, batch_size+1)]
index.insert(0, 'id')

labels_pred = [str(label) for label in prediction]
labels_pred.insert(0, 'label')

labels_prob = [str(label) for label in probab]
labels_prob.insert(0, 'label')

In [17]:
submission_array_pred = np.vstack((index, labels_pred)).T.astype('str')
submission_array_prob = np.vstack((index, labels_prob)).T.astype('str')

Saving the array as a CSV

In [18]:
np.savetxt(path+'submissions/submission_pred.csv', submission_array_pred, delimiter=",", fmt='%1s')
np.savetxt(path+'submissions/submission_prob.csv', submission_array_prob, delimiter=",", fmt='%1s')

___

## Correcting Submissions after Training

In [23]:
batch_size = len(os.listdir(path+'test'+'/subdir_for_keras_ImageDataGenerator'))

In [13]:
probab = np.load(path+'submissions/probab.npy')
prediction = np.load(path+'submissions/prediction.npy')
prediction_labels = np.load(path+'submissions/prediction_labels.npy')

In [14]:
probab[:5]

array([ 1.        ,  0.98002827,  1.        ,  1.        ,  1.        ], dtype=float32)

In [15]:
prediction[:5]

array([0, 0, 0, 0, 1])

In [16]:
prediction_labels[:5]

array([b'cats', b'cats', b'cats', b'cats', b'dogs'],
      dtype='|S4')

In [17]:
for predicted, index in enumerate(prediction):
    # When a cat is predicted, get the complimentary value
    if predicted == 0:
        probab[index] = 1 - probab[index]

In [18]:
probab[:10]

array([ 0.        ,  0.98002827,  1.        ,  1.        ,  1.        ,
        0.99999928,  1.        ,  0.98711455,  1.        ,  1.        ], dtype=float32)

In [19]:
prediction[:10]

array([0, 0, 0, 0, 1, 1, 0, 0, 0, 0])

In [24]:
index = [str(i) for i in range(1, batch_size+1)]
index.insert(0, 'id')

labels_pred = [str(label) for label in prediction]
labels_pred.insert(0, 'label')

labels_prob = [str(label) for label in probab]
labels_prob.insert(0, 'label')

In [25]:
submission_array_pred = np.vstack((index, labels_pred)).T.astype('str')
submission_array_prob = np.vstack((index, labels_prob)).T.astype('str')

In [26]:
np.savetxt(path+'submissions/submission_pred.csv', submission_array_pred, delimiter=",", fmt='%1s')
np.savetxt(path+'submissions/submission_prob.csv', submission_array_prob, delimiter=",", fmt='%1s')