In [1]:
import numpy as np
from keras.models import Model
from keras.layers import Dropout, Flatten, Dense, Input
from keras.callbacks import ModelCheckpoint

Using TensorFlow backend.


## Dog Breed Classifier
This project is a response to the <a href='https://www.kaggle.com/c/dog-breed-identification'>Kaggle Dog Breed Identification Challenge</a>. The dataset contains photographs of 120 dog breeds, including a labeled training set of 10,222 photos.

## Building a Convnet with CPU
Image classification is notoriously computation-heavy, and training a functioning model without a high-performance GPU can take a long time. I wanted to see what is possible, and where the limits are, when tackling a challenge like this on a CPU-only machine. These are the steps I took to create a classifier with non-trivial performance:

 - Use a pre-trained convnet: VGG16 trained on ImageNet (which includes many dog breeds)
 - Strip the "top" fully connected layers from the model to access the raw image features
 - Build a new classifier on top of the pre-trained VGG16 convolutional layers
 - The final model consists of the pre-trained VGG16 layers and my custom-trained top layer
 
## Holdout Group and Data Preparation
To prepare the images for processing, I split them into training and validation sets with a scikit-learn stratified split. To play nicely with the Keras ImageDataGenerator, I organize the photos (after resizing to 224x224 pixels) into folders by breed class, in one directory for training (80% of photos) and another for validation (20% of photos). That script can be found <a href='https://github.com/AlliedToasters/dogs/blob/master/make_holdout.ipynb'>here</a>.
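
The 80/20 stratified split described above can be sketched as follows. This is a minimal illustration, not the linked script: the file names and the evenly cycled breed labels are synthetic stand-ins, and `train_test_split` with `stratify=` is just one of the scikit-learn utilities that could be used for this.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the real data (assumptions for illustration only):
# 10,222 hypothetical file names and 120 roughly balanced breed ids.
n_photos, n_breeds = 10222, 120
filenames = np.array(["photo_%05d.jpg" % i for i in range(n_photos)])
breeds = np.arange(n_photos) % n_breeds

# stratify=breeds keeps each breed's share (about) equal in both subsets.
train_files, valid_files, train_breeds, valid_breeds = train_test_split(
    filenames, breeds, test_size=0.2, stratify=breeds, random_state=0
)

print(len(train_files), len(valid_files))
print("breeds present in validation set:", len(np.unique(valid_breeds)))
```

Because the split is stratified, every breed appears in both the training and the validation folder, which matters when some classes have only a few dozen photos.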

## Extracting "Bottleneck" Features
Running the VGG16 convolutional layers on my CPU machine is a very lengthy process (about 4 hours to get through all 10,222 photos). To avoid running each photo through the convnet more than once, I save the convnet output for every image to a NumPy array file (.npy). I use the Keras ImageDataGenerator to aid in that process by "feeding" the images to the model. This approach is taken from <a href='https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html'>this helpful tutorial</a> on the Keras blog. The script I use to extract these features can be found <a href='https://github.com/AlliedToasters/dogs/blob/master/bottleneck_features.ipynb'>here</a>.
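
The "compute once, then reload from disk" idea can be sketched with NumPy alone. This is not the linked extraction script: `load_or_compute` and `fake_convnet_pass` are hypothetical names, and the fake pass simply returns zeros in the shape VGG16's convolutional base produces for 224x224 inputs.

```python
import os
import tempfile

import numpy as np


def load_or_compute(path, compute_fn):
    """Return the array saved at `path`; compute and save it first if missing."""
    if os.path.exists(path):
        return np.load(path)
    features = compute_fn()
    np.save(path, features)
    return features


calls = []  # records how many times the expensive step actually runs


def fake_convnet_pass():
    # Hypothetical stand-in for the slow VGG16 forward pass over a few images.
    calls.append(1)
    return np.zeros((4, 7, 7, 512), dtype=np.float32)


cache_path = os.path.join(tempfile.mkdtemp(), "features_demo.npy")
first = load_or_compute(cache_path, fake_convnet_pass)   # computes and saves
second = load_or_compute(cache_path, fake_convnet_pass)  # loads from disk

print(first.shape, "expensive calls:", len(calls))
```

In the real workflow the expensive function would be the convnet's prediction over the generator, and later training runs would only ever touch the saved `.npy` files.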

## Training Model "Top"
To build a model sensitive to dog breeds, I train a new "top" model on the extracted VGG16 features. That process is recorded below.<br><br>
The "top" model itself is simply a 256-node dense layer connected to the flattened 25,088 features extracted with the VGG16 model.

In [2]:
train_features = np.load('features_train.npy')
test_features = np.load('bottleneck_features_validation.npy')

train_labels = np.load('lbl_train.npy')
test_labels = np.load('lbl_test.npy')

In [3]:
inputs = Input(shape=train_features.shape[1:])
x = Flatten()(inputs)
x = Dense(256, activation='relu')(x)
x = Dropout(0.1)(x)
outputs = Dense(120, activation='softmax')(x)
model = Model(inputs=inputs, outputs=outputs)

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])


## "Top" Model Infrastructure
This model flattens the saved output of the VGG16 convolutional layers (shape 7 x 7 x 512) into a 1-D vector of 25,088 values. The "top" model treats this flattened vector as its input, fully connected to a 256-unit dense layer with ReLU activation. I apply dropout (rate 0.1) between this dense layer and the fully connected output layer, which has one value for each of the 120 dog breed classes in the dataset and a softmax activation.
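
As a quick sanity check on the layer sizes described above, the parameter counts can be worked out by hand. This is plain arithmetic, with `dense_params` as a hypothetical helper: a dense layer has one weight per input-output pair plus one bias per output.

```python
# Shape of the saved VGG16 convolutional output for one image.
feature_shape = (7, 7, 512)

# Flattening multiplies the dimensions together.
flat_size = 1
for dim in feature_shape:
    flat_size *= dim


def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) of a fully connected layer."""
    return n_in * n_out + n_out


hidden_params = dense_params(flat_size, 256)  # flattened features -> 256 units
output_params = dense_params(256, 120)        # 256 units -> 120 breed classes
total_params = hidden_params + output_params  # Flatten and Dropout add none

print(flat_size, hidden_params, output_params, total_params)
```

Almost all of the trainable parameters sit in the first dense layer, which is why widening it, or adding a second wide layer, quickly becomes expensive on a CPU.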

In [4]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 7, 7, 512)         0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 25088)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 256)               6422784   
_________________________________________________________________
dropout_1 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 120)               30840     
Total params: 6,453,624
Trainable params: 6,453,624
Non-trainable params: 0
_________________________________________________________________


In [5]:
checkpointer = ModelCheckpoint(filepath='VGG16_topv4.hdf5', verbose=1, save_best_only=True)

In [6]:
model.fit(
    train_features, train_labels,
    epochs=15,
    batch_size=80,
    validation_data=(test_features, test_labels),
    shuffle=True,
    callbacks=[checkpointer],
    verbose=1
)

Train on 8177 samples, validate on 2045 samples
Epoch 1/15
Epoch 00001: val_loss improved from inf to 4.78395, saving model to VGG16_topv4.hdf5
Epoch 2/15
Epoch 00002: val_loss improved from 4.78395 to 4.76258, saving model to VGG16_topv4.hdf5
Epoch 3/15
Epoch 00003: val_loss improved from 4.76258 to 4.65070, saving model to VGG16_topv4.hdf5
Epoch 4/15
Epoch 00004: val_loss improved from 4.65070 to 4.52616, saving model to VGG16_topv4.hdf5
Epoch 5/15
Epoch 00005: val_loss improved from 4.52616 to 4.38653, saving model to VGG16_topv4.hdf5
Epoch 6/15
Epoch 00006: val_loss did not improve
Epoch 7/15
Epoch 00007: val_loss improved from 4.38653 to 4.23376, saving model to VGG16_topv4.hdf5
Epoch 8/15
Epoch 00008: val_loss did not improve
Epoch 9/15
Epoch 00009: val_loss improved from 4.23376 to 4.05502, saving model to VGG16_topv4.hdf5
Epoch 10/15
Epoch 00010: val_loss did not improve
Epoch 11/15
Epoch 00011: val_loss did not improve
Epoch 12/15
Epoch 00012: val_loss improved from 4.05502 to

<keras.callbacks.History at 0x7f79a8a7cc18>

## Training Results
The model achieves an accuracy of over 15% on the validation set. This is non-trivial considering that there are many classes (120) and they are relatively balanced; the baseline accuracy for simply guessing the majority class would be 1.22%.<br><br>
In another notebook, I use an RBM and cropped images of dogs' heads to build a binary classifier between two breeds. That classifier reaches 96% accuracy, but it is not nearly as flexible: I had to hand-crop the images and reduce the problem to two classes to achieve that accuracy. This model performs far better, considering that it requires zero feature engineering; it only requires much more computation power (or time, if the former is limited).
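
To put these numbers in context, here is a small sketch of two other reference points for a 120-class problem: the accuracy of uniform random guessing, and the cross-entropy loss of a model that assigns equal probability to every class. Both follow directly from the class count and assume nothing about the data.

```python
import math

n_classes = 120

# Accuracy expected from guessing uniformly at random among the classes.
chance_accuracy = 1.0 / n_classes

# Categorical cross-entropy of a model that predicts 1/120 for every class.
uniform_loss = math.log(n_classes)

print("chance accuracy: %.2f%%" % (100 * chance_accuracy))
print("uniform-prediction loss: %.4f" % uniform_loss)
```

The first logged validation loss (4.78395) is close to this uniform-prediction value, which suggests the model starts out roughly uninformed and that the later drops in `val_loss` reflect real learning.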

## Where do we go from here?
This model is certainly not as good as it could be. Some avenues for improvement are:
 1. "Fine-tuning" the final convolutional block. This process is detailed in the final section of the Keras blog post mentioned earlier.
 2. Adding depth and breadth to the "top" model. The VGG16 model trained on ImageNet makes use of two fully-connected dense layers of 4096 nodes to achieve its state-of-the-art performance. Training such a top model, of course, would require lots of time or GPUs.

In [18]:
print('trivial performance accuracy (setting all predictions to majority class in validation set): %', 
      str(100*test_labels.sum(axis=0).max()/test_labels.sum().sum())[:4]
)

trivial performance accuracy (setting all predictions to majority class in validation set): % 1.22
