This notebook fine-tunes VGG16 by adding two Dense layers and a softmax output layer on top of its convolutional base, then trains it to classify cats versus dogs.

This improves classification accuracy to around 95% on the validation dataset.

In [55]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [1]:
import numpy as np

import tensorflow as tf

from tensorflow.contrib.keras import layers
from tensorflow.contrib.keras import models
from tensorflow.contrib.keras import optimizers
from tensorflow.contrib.keras import applications
from tensorflow.contrib.keras.python.keras.preprocessing import image
from tensorflow.contrib.keras.python.keras.applications import imagenet_utils

In [2]:
def get_batches(dirpath, gen=image.ImageDataGenerator(), shuffle=True,
                batch_size=64, class_mode='categorical'):
    return gen.flow_from_directory(dirpath, target_size=(224, 224), class_mode=class_mode,
                                   shuffle=shuffle, batch_size=batch_size)

In [3]:
batch_size = 64

In [4]:
train_batches = get_batches('./data/train', batch_size=batch_size)

Found 22797 images belonging to 2 classes.


In [5]:
val_batches = get_batches('./data/valid', batch_size=batch_size)

Found 2203 images belonging to 2 classes.


Model creation

In [6]:
vgg16 = applications.VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

In [7]:
##

In [8]:
finetune_in = vgg16.output

In [9]:
x = layers.Flatten(name='flatten')(finetune_in)
x = layers.Dense(4096, activation='relu', name='fc1')(x)
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.5)(x)
x = layers.Dense(4096, activation='relu', name='fc2')(x)
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.5)(x)
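
To get a feel for how large this new head is, the parameter counts of the Dense layers can be worked out by hand. The sketch below is plain arithmetic, not Keras code. It assumes the standard VGG16 layout (five 2x2 max-pooling stages and 512 channels in the last block) and two output classes; the names `dense_params`, `fc1_params` and so on are made up for illustration. The BatchNormalization layers are left out of the count.

```python
# Spatial size after VGG16's five 2x2 max-pooling stages (include_top=False).
input_side = 224
num_pools = 5
side = input_side // (2 ** num_pools)       # 224 / 32 = 7
channels = 512                              # channels in VGG16's last conv block

flat_features = side * side * channels      # length of the Flatten output


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


fc1_params = dense_params(flat_features, 4096)
fc2_params = dense_params(4096, 4096)
out_params = dense_params(4096, 2)          # 2 classes: cats and dogs

print(flat_features)   # 25088
print(fc1_params)      # 102764544
print(fc2_params)      # 16781312
print(out_params)      # 8194
```

Almost all of the head's weights sit in `fc1`, because it is connected to every one of the 25088 flattened features.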

In [10]:
predictions = layers.Dense(train_batches.num_class, activation='softmax', name='predictions')(x)

In [11]:
model = models.Model(inputs=vgg16.input, outputs=predictions)

In [12]:
##

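We freeze every layer except the last seven, so only the three Dense layers (`fc1`, `fc2`, `predictions`) and their BatchNormalization layers are trained. The Dropout layers among those seven have no weights.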

In [13]:
for layer in model.layers[:-7]:
    layer.trainable = False
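
The effect of the `[:-7]` slice can be checked without building the model. The sketch below is an illustration on a plain list of layer names, rebuilt here to match the order this notebook's model uses. The variable names are hypothetical and no Keras objects are involved.

```python
# Rebuild the layer names in model order: VGG16 base first, then the new head.
backbone = ['input_1']
for block, n_convs in enumerate([2, 2, 3, 3, 3], start=1):
    backbone += ['block%d_conv%d' % (block, i) for i in range(1, n_convs + 1)]
    backbone.append('block%d_pool' % block)

head = ['flatten', 'fc1', 'batch_normalization_1', 'dropout_1',
        'fc2', 'batch_normalization_2', 'dropout_2', 'predictions']

layer_names = backbone + head

n_unfrozen = 7
frozen = layer_names[:-n_unfrozen]       # what the loop sets trainable = False on
trainable = layer_names[-n_unfrozen:]    # what is left trainable

# Of the trainable layers, only the Dense ones are the "3 layers" being fitted.
dense_layers = [n for n in trainable if n.startswith('fc') or n == 'predictions']

print(len(layer_names), len(frozen))   # 27 20
print(frozen[-1])                      # flatten
print(dense_layers)                    # ['fc1', 'fc2', 'predictions']
```

Freezing `flatten` is harmless because it has no weights of its own.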

In [14]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
__________

In [18]:
for i, layer in enumerate(model.layers):
    print(i, layer.name, layer.trainable)

0 input_1 False
1 block1_conv1 False
2 block1_conv2 False
3 block1_pool False
4 block2_conv1 False
5 block2_conv2 False
6 block2_pool False
7 block3_conv1 False
8 block3_conv2 False
9 block3_conv3 False
10 block3_pool False
11 block4_conv1 False
12 block4_conv2 False
13 block4_conv3 False
14 block4_pool False
15 block5_conv1 False
16 block5_conv2 False
17 block5_conv3 False
18 block5_pool False
19 flatten False
20 fc1 True
21 batch_normalization_1 True
22 dropout_1 True
23 fc2 True
24 batch_normalization_2 True
25 dropout_2 True
26 predictions True


In [19]:
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

In [20]:
epochs = 1

In [23]:
steps_per_epoch = train_batches.samples // train_batches.batch_size
validation_steps = val_batches.samples // val_batches.batch_size
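
Floor division means the final partial batch is not counted. The sketch below reruns the arithmetic with the image counts reported above, as plain Python with no Keras involved. `step_counts` is a hypothetical helper, and rounding up with `math.ceil` is shown only as an alternative, not as what this notebook does.

```python
import math

batch_size = 64
train_samples = 22797   # "Found 22797 images" for ./data/train
val_samples = 2203      # "Found 2203 images" for ./data/valid


def step_counts(samples, batch_size):
    """Return (floor steps, images left over, ceil steps)."""
    floor_steps = samples // batch_size
    leftover = samples - floor_steps * batch_size
    ceil_steps = math.ceil(samples / batch_size)
    return floor_steps, leftover, ceil_steps


print(step_counts(train_samples, batch_size))   # (356, 13, 357)
print(step_counts(val_samples, batch_size))     # (34, 27, 35)
```

With `epochs = 1`, the floor-division values stop 13 images short of the full training set and 27 images short of the full validation set.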

In [None]:
model.fit_generator(train_batches, validation_data=val_batches, epochs=epochs,
                    steps_per_epoch=steps_per_epoch, validation_steps=validation_steps)

This gives us a validation score of around `val_loss: 0.2865 - val_acc: 0.9536`.

## Generate submission file

In [59]:
import submission

In [61]:
test_batches, steps = submission.test_batches()

Found 12500 images belonging to 1 classes.


In [62]:
preds = model.predict_generator(test_batches, steps)

In [64]:
preds.shape

(12500, 2)

In [68]:
submission.gen_file(preds, test_batches)

This gave a score of around `0.39` on the public leaderboard.