# Dog Breed I: Keras - VGG16 

## Introduction

I am following the fast.ai course, and this notebook is my take on the Dog Breed competition based on it. In this and other notebooks I will try to get a good score on the competition (say, top 50%).

As a starting point, I will fine-tune the VGG16 model, following the example of the dogs vs. cats competition from the course. I have already saved the data as suggested there: the data folder contains train, validation and test folders.

In this notebook I will:

* Define the VGG-16 model using Keras, load the weights, and fine-tune it for this competition
* Train the fine-tuned version of the model
* Produce a submission file for Kaggle

But first, let us load some libraries:

In [27]:
%matplotlib inline

import numpy as np
import os
import glob
import matplotlib.pyplot as plt
import pandas as pd

from IPython.display import FileLink

from importlib import reload
import utils; reload(utils)
from utils import *

import keras.backend as K
from keras.layers import Dense, Flatten, Lambda, BatchNormalization, Dropout
from keras.layers import Conv2D, MaxPool2D
from keras.models import Model, Sequential
from keras.optimizers import Adam, RMSprop
from keras.preprocessing import image

Found 360 images belonging to 120 classes.


And this call is meant to avoid running out of memory (OOM) too often:

In [11]:
limit_mem()

## VGG-model

This part is just the VGG16 model definition. I downloaded the weights beforehand from:
https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels.h5

In [12]:
initial_weights_path = 'models/vgg16_weights_tf_dim_ordering_tf_kernels.h5'

In [13]:
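# Per-channel ImageNet means in RGB order, shaped to broadcast over the image axes
vgg_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32).reshape((1,1,3))
def preproc(x):
    x = x - vgg_mean       # subtract the mean of each channel
    return x[:,:,:,::-1]   # reverse the channel axis: RGB -> BGR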

def conv_block(model, layers, filters):
    for i in range(layers):
        model.add(Conv2D(filters, kernel_size=(3,3), padding='same', activation='relu'))
    model.add(MaxPool2D(pool_size=(2,2), strides=(2,2)))

def fc_block(model, do):
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(do))

def vgg16(do):
    model = Sequential()
    
    model.add(Lambda(preproc, input_shape=(224,224,3)))
    
    conv_block(model, 2, 64)
    conv_block(model, 2, 128)
    conv_block(model, 3, 256)
    conv_block(model, 3, 512)
    conv_block(model, 3, 512)
    
    model.add(Flatten())
    fc_block(model, do)
    fc_block(model, do)
    model.add(Dense(1000, activation='softmax'))
    
    return model
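
To make the `Lambda(preproc, ...)` layer concrete, here is a small NumPy-only sketch of what it does to a batch. The 2x2 all-red input is a made-up example, not data from the competition. The channel flip is there because, as far as I know, the original VGG weights were trained on BGR images.

```python
import numpy as np

# Per-channel ImageNet means in RGB order, as defined in the notebook.
vgg_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32).reshape((1, 1, 3))

def preproc(x):
    x = x - vgg_mean          # broadcasts over (batch, height, width, 3)
    return x[:, :, :, ::-1]   # reverse the channel axis: RGB -> BGR

# Hypothetical batch: one 2x2 image where every pixel is pure red (255, 0, 0).
batch = np.zeros((1, 2, 2, 3), dtype=np.float32)
batch[..., 0] = 255.0

out = preproc(batch)
print(out.shape)     # shape is unchanged
print(out[0, 0, 0])  # channels are now ordered B, G, R
```

After the flip, the large positive value (the red channel minus its mean) sits in the last position, which is what a BGR-trained network expects.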

In [14]:
model = vgg16(0)

In [6]:
model.load_weights(initial_weights_path)

Now that we have the model with the weights loaded, we can fine-tune it. Recall that the Dog Breed competition has 120 categories, so we drop the final 1000-way layer, freeze the remaining layers, and add a new 120-way output layer:

In [15]:
model.pop()  # drop the 1000-way ImageNet classifier
for layer in model.layers:
    layer.trainable = False  # freeze every remaining layer

In [16]:
model.add(Dense(120, activation='softmax'))
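
As a sanity check on what this leaves to train, the layer sizes can be worked out by hand. The sketch below is plain arithmetic based on the architecture defined above; it does not query Keras, so treat the numbers as what `model.summary()` should report rather than as its actual output.

```python
# Shape bookkeeping for the VGG16 defined above (pure arithmetic, no Keras needed).
side = 224
for _ in range(5):        # five conv blocks, each ending in a 2x2 max-pool with stride 2
    side //= 2            # 'same' padding keeps the size through the convolutions
flat = side * side * 512  # number of inputs to the first Dense(4096) after Flatten

old_head = 4096 * 1000 + 1000  # Dense(1000): weights + biases (removed by model.pop())
new_head = 4096 * 120 + 120    # Dense(120): weights + biases (the only trainable layer)

print(side, flat)   # 7 25088
print(old_head)     # 4097000
print(new_head)     # 491640
```

With everything else frozen, only the 491,640 parameters of the new output layer are updated during training.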

## Train the model

Now we prepare the batches and train the model. For the first run, to verify that everything works, I use the sample path; then I substitute it with the real path:

In [17]:
#path = 'data/sample/'
path = 'data/'

In [18]:
batch_size = 64

In [19]:
train_batch, steps_per_epoch = get_batch(path + 'train', batch_size=batch_size, shuffle=True)
valid_batch, validation_steps = get_batch(path + 'valid', batch_size=batch_size)

Found 9254 images belonging to 120 classes.
Found 968 images belonging to 120 classes.
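
`get_batch` lives in `utils` and is not shown here, so how it derives the step counts is an assumption on my part. A common choice is to round up so that every image is seen once per epoch, which for the image counts above would give:

```python
import math

batch_size = 64
n_train, n_valid = 9254, 968  # image counts reported by the generators above

# Assumption: steps are rounded up so the last, partial batch is included.
steps_per_epoch = math.ceil(n_train / batch_size)
validation_steps = math.ceil(n_valid / batch_size)

print(steps_per_epoch, validation_steps)  # 145 16
```

If the helper rounds down instead, the counts would be 144 and 15, and a few images would be skipped in each pass.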


In [20]:
model.compile(Adam(0.0001), loss='categorical_crossentropy', metrics=['accuracy'])

In [23]:
model.fit_generator(train_batch,steps_per_epoch, epochs=1,
                   validation_data=valid_batch, validation_steps=validation_steps)

Epoch 1/1


<keras.callbacks.History at 0x7f738869d860>

In [17]:
model.fit_generator(train_batch,steps_per_epoch, epochs=4,
                   validation_data=valid_batch, validation_steps=validation_steps)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x7fb334629e10>

In [24]:
model.save_weights('models/vgg16_fine_tuned_1.h5')

Right now it is overfitting but still improving. I am going to use this as my starting point.

## Create a submission file

Obtaining the predictions is easy, but they are not well-calibrated probabilities for each category: as far as I understand, the model tends to be overconfident, so we have to adjust the results for that. I designed a function, `prepare_submission`, which tries to accomplish this:

In [21]:
model.load_weights('models/vgg16_fine_tuned_1.h5')

In [22]:
test_path = path + 'test/'

In [28]:
test_df = prepare_submission('submissions/new_submissions_1bisbis.csv', model, top_sum=0.98)

Found 10357 images belonging to 1 classes.
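
The body of `prepare_submission` is not shown in this notebook, so the following is only a guess at the kind of adjustment it performs. It assumes that classes at or above a threshold are rescaled to sum to `top_sum` and that the remaining mass is spread evenly over the other classes. The function name `adjust_predictions` and the default `low_bar=0.01` are hypothetical.

```python
import numpy as np

def adjust_predictions(pred, top_sum=0.98, low_bar=0.01):
    """Hypothetical sketch: cap the total mass of the confident classes at top_sum."""
    pred = np.asarray(pred, dtype=np.float64)
    top = pred >= low_bar  # classes the model considers plausible
    # Always treat the arg-max as a top class so old_top_sum is never zero.
    top[np.arange(len(pred)), pred.argmax(axis=1)] = True
    old_top_sum = np.where(top, pred, 0.0).sum(axis=1, keepdims=True)
    n_low = (~top).sum(axis=1, keepdims=True)
    # Rows without any low class keep their full mass.
    row_top_sum = np.where(n_low > 0, top_sum, 1.0)
    low_value = (1.0 - row_top_sum) / np.maximum(n_low, 1)
    return np.where(top, pred * row_top_sum / old_top_sum, low_value)

# Made-up overconfident prediction over 120 classes.
pred = np.full((1, 120), 0.001 / 119)
pred[0, 0] = 0.999
adj = adjust_predictions(pred, top_sum=0.98, low_bar=0.01)

print(adj[0, 0], adj[0, 1], adj.sum())
```

In this example the top class is pulled down to `top_sum` and each of the other 119 classes receives `(1 - top_sum) / 119`, so every row still sums to 1.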


In [29]:
test_df.head()

| id | affenpinscher | afghan_hound | african_hunting_dog | airedale | american_staffordshire_terrier | appenzeller | australian_terrier | basenji | basset | beagle | ... | toy_poodle | toy_terrier | vizsla | walker_hound | weimaraner | welsh_springer_spaniel | west_highland_white_terrier | whippet | wire-haired_fox_terrier | yorkshire_terrier |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 000621fb3cbb32d8935728e48679680e | 0.000172 | 0.000172 | 0.000172 | 0.000172 | 0.000172 | 0.000172 | 0.000172 | 0.000172 | 0.000172 | 0.000172 | ... | 0.000172 | 0.000172 | 0.000172 | 0.000172 | 0.000172 | 0.000172 | 0.000172 | 0.000172 | 0.000172 | 0.000172 |
| 00102ee9d8eb90812350685311fe5890 | 0.000172 | 0.000172 | 0.000172 | 0.000172 | 0.000172 | 0.000172 | 0.000172 | 0.000172 | 0.000172 | 0.000172 | ... | 0.000172 | 0.000172 | 0.000172 | 0.000172 | 0.000172 | 0.000172 | 0.000172 | 0.000172 | 0.000172 | 0.000172 |
| 0012a730dfa437f5f3613fb75efcd4ce | 0.00018 | 0.00018 | 0.00018 | 0.00018 | 0.00018 | 0.00018 | 0.00018 | 0.00018 | 0.00018 | 0.00018 | ... | 0.00018 | 0.00018 | 0.00018 | 0.00018 | 0.00018 | 0.030007 | 0.00018 | 0.00018 | 0.00018 | 0.00018 |
| 001510bc8570bbeee98c8d80c8a95ec1 | 0.00018 | 0.030497 | 0.00018 | 0.00018 | 0.00018 | 0.00018 | 0.00018 | 0.00018 | 0.00018 | 0.00018 | ... | 0.00018 | 0.00018 | 0.00018 | 0.00018 | 0.00018 | 0.00018 | 0.00018 | 0.00018 | 0.00018 | 0.00018 |
| 001a5f3114548acdefa3d4da05474c2e | 0.048518 | 0.024085 | 0.00018 | 0.00018 | 0.00018 | 0.00018 | 0.00018 | 0.00018 | 0.00018 | 0.00018 | ... | 0.00018 | 0.00018 | 0.00018 | 0.00018 | 0.00018 | 0.00018 | 0.00018 | 0.00018 | 0.00018 | 0.00018 |


In [30]:
FileLink('submissions/new_submissions_1bisbis.csv')

This got a score of 1.13167
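
For context, this competition is scored with multi-class log loss as far as I know, so each image contributes the negative log of the probability assigned to its true breed. The sketch below uses made-up probabilities to show why putting a floor under the low classes matters; the `1e-7` value is purely illustrative.

```python
import math

def log_loss_single(prob_true_class):
    """Negative log of the probability assigned to the correct class."""
    return -math.log(prob_true_class)

n_low = 119
overconfident_miss = 1e-7          # raw softmax that almost rules out the true class
floored_miss = (1 - 0.98) / n_low  # floor given by top_sum=0.98 with a single top class

print(round(log_loss_single(overconfident_miss), 2))  # penalty for a confident miss
print(round(log_loss_single(floored_miss), 2))        # penalty for a miss with the floor
print(round(log_loss_single(0.98), 4))                # cost of a correct, capped prediction
```

A confident miss is penalized far more heavily than a hedged one, while capping a correct prediction at 0.98 costs very little, which is the trade-off the `top_sum` parameter controls.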

In [31]:
test_df = prepare_submission('submissions/new_submissions_1bis2.csv', model, top_sum=0.8)

Found 10357 images belonging to 1 classes.


  adj_pred[adj_pred>=low_bar] = (top_sum/old_top_sum)*adj_pred[adj_pred>=low_bar]


In [35]:
test_df.head()

| id | affenpinscher | afghan_hound | african_hunting_dog | airedale | american_staffordshire_terrier | appenzeller | australian_terrier | basenji | basset | beagle | ... | toy_poodle | toy_terrier | vizsla | walker_hound | weimaraner | welsh_springer_spaniel | west_highland_white_terrier | whippet | wire-haired_fox_terrier | yorkshire_terrier |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 000621fb3cbb32d8935728e48679680e | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | ... | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 |
| 00102ee9d8eb90812350685311fe5890 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | ... | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 |
| 0012a730dfa437f5f3613fb75efcd4ce | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | ... | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 |
| 001510bc8570bbeee98c8d80c8a95ec1 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | ... | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 |
| 001a5f3114548acdefa3d4da05474c2e | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | ... | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 | 0.001681 |


In [36]:
FileLink('submissions/new_submissions_1bis2.csv')