# Dogs-vs-cats classification with CNNs

In this notebook, we'll train a convolutional neural network (CNN, ConvNet) to distinguish images of dogs from images of cats using TensorFlow 2.0 / Keras. This notebook is largely based on the blog post [Building powerful image classification models using very little data](https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html) by François Chollet.

**Note that using a GPU with this notebook is highly recommended.**

First, the needed imports.

In [None]:
%matplotlib inline

import os, datetime
import random
import pathlib

import tensorflow as tf

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Dense, Activation, Dropout, Conv2D,
                                    Flatten, MaxPooling2D, InputLayer)
from tensorflow.keras.preprocessing.image import (ImageDataGenerator, 
                                                  array_to_img, 
                                                  img_to_array, load_img)
from tensorflow.keras import applications, optimizers

from tensorflow.keras.callbacks import TensorBoard
from tensorflow.keras.utils import plot_model

from distutils.version import LooseVersion as LV

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

print('Using Tensorflow version:', tf.__version__,
      'Keras version:', tf.keras.__version__,
      'backend:', tf.keras.backend.backend())
assert(LV(tf.keras.__version__) >= LV("2.0.0"))

## Data

The training dataset consists of 2000 images of dogs and cats, split evenly between the two classes. In addition, the validation set consists of 1000 images and the test set of 22000 images. Here are some random training images:

![title](imgs/dvc.png)

### Downloading the data

In [None]:
datapath = "/media/data/dogs-vs-cats/train-2000/tfrecord/"

### Parameters

In [None]:
INPUT_IMAGE_SIZE = [160, 160, 3]
BATCH_SIZE = 32

### Data augmentation

We need to resize all training and validation images to a fixed size. 

Then, to make the most of our limited number of training examples, we'll apply random transformations (crop and horizontal flip) to the images each time we loop over them. This way, we "augment" our training dataset so that the network rarely sees exactly the same image twice. Various transformations are readily available in TensorFlow; see [tf.image](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/image) for more information.
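To make the two transformations concrete, here is a minimal NumPy sketch of a random crop and a random horizontal flip applied to a dummy array. The helpers `random_crop` and `random_flip_left_right` are illustrative stand-ins written for this sketch; they are not the `tf.image` functions themselves, and the random array merely plays the role of a photo resized to 256x256.

```python
import numpy as np

def random_crop(image, size, rng):
    """Cut a random (height, width) window out of an HxWxC image."""
    h, w = size
    top = rng.integers(0, image.shape[0] - h + 1)
    left = rng.integers(0, image.shape[1] - w + 1)
    return image[top:top + h, left:left + w, :]

def random_flip_left_right(image, rng):
    """Mirror the image horizontally with probability 0.5."""
    if rng.random() < 0.5:
        return image[:, ::-1, :]
    return image

rng = np.random.default_rng(seed=0)    # seeded only for reproducibility
image = rng.random((256, 256, 3))      # dummy stand-in for a resized photo

cropped = random_crop(image, (160, 160), rng)
augmented = random_flip_left_right(cropped, rng)
print(augmented.shape)
```

Every call draws a new crop position and a new flip decision, so repeated passes over the same image yield slightly different training inputs.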

In [None]:
def preprocess_image(image, augment):
    image = tf.image.decode_jpeg(image, channels=3)
    if augment:
        image = tf.image.resize(image, [256, 256])
        image = tf.image.random_crop(image, INPUT_IMAGE_SIZE)
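        # Use the TF op here: Python's random.random() would be evaluated
        # only once, when Dataset.map() traces this function, so every
        # image would get the same flip decision.
        image = tf.image.random_flip_left_right(image)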
    else:
        image = tf.image.resize(image, INPUT_IMAGE_SIZE[:2])
    image /= 255.0  # normalize to [0,1] range
    return image

feature_description = {
    "image/encoded": tf.io.FixedLenFeature((), tf.string, default_value=""),
    "image/height": tf.io.FixedLenFeature((), tf.int64, default_value=0),
    "image/width": tf.io.FixedLenFeature((), tf.int64, default_value=0),
    "image/colorspace": tf.io.FixedLenFeature((), tf.string, default_value=""),
    "image/channels": tf.io.FixedLenFeature((), tf.int64, default_value=0),
    "image/format": tf.io.FixedLenFeature((), tf.string, default_value=""),
    "image/filename": tf.io.FixedLenFeature((), tf.string, default_value=""),
    "image/class/label": tf.io.FixedLenFeature((), tf.int64, default_value=0),
    "image/class/text": tf.io.FixedLenFeature((), tf.string, default_value="")}

def load_and_augment_image(example_proto):
    ex = tf.io.parse_single_example(example_proto, feature_description)
    return (preprocess_image(ex["image/encoded"], True),
            ex["image/class/label"]-1)

def load_and_not_augment_image(example_proto):
    ex = tf.io.parse_single_example(example_proto, feature_description)
    return (preprocess_image(ex["image/encoded"], False),
            ex["image/class/label"]-1)

### TF Datasets

Let's now define our [TF `Dataset`s](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/data/Dataset#class_dataset) for training, validation, and test data. We use the [TFRecordDataset](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/data/TFRecordDataset) class, which reads the data records from multiple TFRecord files.

In [None]:
train_filenames = [datapath+"train-{0:05d}-of-00004".format(i)
                   for i in range(4)]
train_dataset = tf.data.TFRecordDataset(train_filenames)

validation_filenames = [datapath+"validation-{0:05d}-of-00002".format(i)
                        for i in range(2)]
validation_dataset = tf.data.TFRecordDataset(validation_filenames)

test_filenames = [datapath+"test-{0:05d}-of-00022".format(i)
                   for i in range(22)]
test_dataset = tf.data.TFRecordDataset(test_filenames)

We then `map()` the records to the actual image data and decode the images.
Note that we shuffle and augment only the training data.

In [None]:
train_dataset = train_dataset.map(load_and_augment_image, num_parallel_calls=10)
train_dataset = train_dataset.shuffle(2000).batch(BATCH_SIZE, drop_remainder=True)
train_dataset = train_dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)

validation_dataset = validation_dataset.map(load_and_not_augment_image, num_parallel_calls=10)
validation_dataset = validation_dataset.batch(BATCH_SIZE, drop_remainder=True)
validation_dataset = validation_dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)

test_dataset = test_dataset.map(load_and_not_augment_image, num_parallel_calls=10)
test_dataset = test_dataset.batch(BATCH_SIZE, drop_remainder=False)
test_dataset = test_dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
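The effect of `drop_remainder` on the number of batches can be checked with a little arithmetic. The sketch below assumes the TFRecord files hold exactly the image counts quoted in the Data section; `num_batches` is a hypothetical helper written for this illustration and is not part of `tf.data`.

```python
import math

BATCH_SIZE = 32

def num_batches(num_examples, batch_size, drop_remainder):
    """Number of batches produced when batching a dataset of known size."""
    if drop_remainder:
        return num_examples // batch_size         # incomplete last batch is discarded
    return math.ceil(num_examples / batch_size)   # incomplete last batch is kept

for name, n, drop in [("train", 2000, True),
                      ("validation", 1000, True),
                      ("test", 22000, False)]:
    batches = num_batches(n, BATCH_SIZE, drop)
    skipped = n - batches * BATCH_SIZE if drop else 0
    print(f"{name}: {batches} batches, {skipped} examples skipped per epoch")
```

Under these assumptions, 16 training images and 8 validation images are left out of each epoch, while the test set keeps its final, smaller batch so that every test image is evaluated.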

Let's look at a few augmented training images and non-augmented validation images.

In [None]:
label_names = {0: 'cats', 1: 'dogs'}

plt.figure(figsize=(10,10))
for batch, labels in train_dataset.take(1):
    for i in range(9):    
        plt.subplot(3,3,i+1)
        plt.imshow(batch[i,:,:,:])
        plt.title(label_names[labels[i].numpy()])
        plt.grid(False)
        plt.xticks([])
        plt.yticks([])
    plt.suptitle('augmented training images', fontsize=16, y=0.93)
    
plt.figure(figsize=(10,10))
for batch, labels in validation_dataset.take(1):
    for i in range(9):    
        plt.subplot(3,3,i+1)
        plt.imshow(batch[i,:,:,:])
        plt.title(label_names[labels[i].numpy()])
        plt.grid(False)
        plt.xticks([])
        plt.yticks([])
    plt.suptitle('not augmented validation images', fontsize=16, y=0.93)

## Option 1: Train a small CNN from scratch

As with the MNIST digits, we can start from scratch and train a CNN for the classification task. However, because the number of training images is small, a large network will easily overfit despite the data augmentation.

### Initialization

In [None]:
model = Sequential()

model.add(Conv2D(32, (3, 3), input_shape=INPUT_IMAGE_SIZE, activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

model.summary()  # prints the summary itself; wrapping it in print() would also print "None"

In [None]:
plot_model(model, 'tf2-dvc-small-cnn.png', show_shapes=True)
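The output shapes and parameter counts reported by `model.summary()` can be reproduced by hand. The following sketch assumes the Keras defaults used above (3x3 convolutions with `'valid'` padding and stride 1, and 2x2 max pooling that floors odd sizes); `conv_out` and `pool_out` are hypothetical helper names introduced only for this calculation.

```python
def conv_out(size, kernel=3):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // pool

size, channels = 160, 3          # INPUT_IMAGE_SIZE = [160, 160, 3]
total_params = 0
for filters in (32, 32, 64):
    total_params += (3 * 3 * channels + 1) * filters   # kernel weights + biases
    size = pool_out(conv_out(size))
    channels = filters
    print(f"after conv+pool with {filters} filters: {size}x{size}x{channels}")

flat = size * size * channels
total_params += (flat + 1) * 64  # Dense(64)
total_params += (64 + 1) * 1     # Dense(1)
print("flattened features:", flat)
print("total trainable parameters:", total_params)
```

Almost all of the parameters sit in the first dense layer, which is one reason why even this small network can overfit 2000 images.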

### Learning

We'll use TensorBoard to visualize our progress during training.

In [None]:
logdir = os.path.join(os.getcwd(), "logs",
                      "dvc-small-cnn-"+datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S'))
print('TensorBoard log directory:', logdir)
os.makedirs(logdir)
callbacks = [TensorBoard(log_dir=logdir)]

In [None]:
%%time

epochs = 10

history = model.fit(train_dataset, epochs=epochs, 
                    validation_data=validation_dataset,
                    callbacks=callbacks, verbose=2)

model.save("dvc-small-cnn.h5")

In [None]:
plt.figure(figsize=(5,3))
plt.plot(history.epoch,history.history['loss'], label='training')
plt.plot(history.epoch,history.history['val_loss'], label='validation')
plt.title('loss')
plt.legend(loc='best')

plt.figure(figsize=(5,3))
plt.plot(history.epoch,history.history['accuracy'], label='training')
plt.plot(history.epoch,history.history['val_accuracy'], label='validation')
plt.title('accuracy')
plt.legend(loc='best');

### Inference

In [None]:
%%time

scores = model.evaluate(test_dataset, verbose=2)
print("Test set %s: %.2f%%" % (model.metrics_names[1], scores[1]*100))

## Option 2: Reuse a pre-trained CNN

Another option is to reuse a pre-trained network. Here we'll use one of the [pre-trained networks available from Keras](https://keras.io/applications/). We remove its top (classification) layers and freeze the pre-trained weights.

We first choose either VGG16 or MobileNet as our pre-trained network:

In [None]:
pretrained = 'VGG16'
#pretrained = 'MobileNet'

### Initialization

In [None]:
model = Sequential()

if pretrained == 'VGG16':
    pt_model = applications.VGG16(weights='imagenet', include_top=False,      
                                  input_shape=INPUT_IMAGE_SIZE)
    pretrained_first_trainable_layer = 15 
elif pretrained == 'MobileNet':
    pt_model = applications.MobileNet(weights='imagenet', include_top=False,
                                      input_shape=INPUT_IMAGE_SIZE)
    pretrained_first_trainable_layer = 75
else:
    raise ValueError("Unknown model: " + pretrained)
    
pt_name = pt_model.name
print('Using {} pre-trained model'.format(pt_name))

for layer in pt_model.layers:
    model.add(layer)

for layer in model.layers:
    layer.trainable = False

model.summary()

We then stack our own, randomly initialized layers on top of the pre-trained network.

In [None]:
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

model.summary()

In [None]:
plot_model(model, 'tf2-dvc-'+pt_name+'-reuse.png', show_shapes=True)
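For the VGG16 choice, the size of the feature vector handed to the new layers, and the number of weights those layers add, can be estimated by hand. This sketch assumes the standard VGG16 layout: `'same'`-padded convolutions, five 2x2 max-pooling stages, and 512 channels in the last convolutional block. The variable names are illustrative only.

```python
input_size = 160            # INPUT_IMAGE_SIZE[0] in this notebook
num_pool_stages = 5         # VGG16 has five 2x2 max-pooling layers
last_block_channels = 512   # channels produced by VGG16's final conv block

size = input_size
for _ in range(num_pool_stages):
    size //= 2              # 'same'-padded convs keep the size; each pool halves it

flat = size * size * last_block_channels
new_params = (flat + 1) * 64 + (64 + 1) * 1   # Dense(64) followed by Dense(1)
print(f"VGG16 feature map: {size}x{size}x{last_block_channels}")
print("flattened features:", flat)
print("trainable parameters in the new layers:", new_params)
```

While the pre-trained layers are frozen, only these new weights are updated during the first training phase.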

### Learning 1: New layers

In [None]:
logdir = os.path.join(os.getcwd(), "logs",
                      "dvc-"+pt_name+"-reuse-"+datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S'))
print('TensorBoard log directory:', logdir)
os.makedirs(logdir)
callbacks = [TensorBoard(log_dir=logdir)]

In [None]:
%%time

epochs = 10

history = model.fit(train_dataset, epochs=epochs, 
                    validation_data=validation_dataset,
                    callbacks=callbacks, verbose=2)

model.save("dvc-" + pt_name + "-reuse.h5")

In [None]:
plt.figure(figsize=(5,3))
plt.plot(history.epoch,history.history['loss'], label='training')
plt.plot(history.epoch,history.history['val_loss'], label='validation')
plt.title('loss')
plt.legend(loc='best')

plt.figure(figsize=(5,3))
plt.plot(history.epoch,history.history['accuracy'], label='training')
plt.plot(history.epoch,history.history['val_accuracy'], label='validation')
plt.title('accuracy')
plt.legend(loc='best');

### Learning 2: Fine-tuning

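Once the top layers have learned some reasonable weights, we can continue training by unfreezing the last blocks of the pre-trained network so that they can adapt to our data. The learning rate should be much smaller than usual (here 1e-5, compared with the Keras RMSprop default of 1e-3), so that the updates do not wreck the pre-trained weights.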

In [None]:
for i, layer in enumerate(model.layers):
    print(i, layer.name, layer.trainable)

In [None]:
for layer in model.layers[pretrained_first_trainable_layer:]:
    layer.trainable = True
    print(layer.name, "now trainable")
    
model.compile(loss='binary_crossentropy',
              optimizer=optimizers.RMSprop(learning_rate=1e-5),
              metrics=['accuracy'])

#print(model.summary())
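The slicing pattern used to unfreeze layers can be illustrated without Keras. In this sketch, `ToyLayer` is a hypothetical stand-in for a Keras layer (just a name and a `trainable` flag), and the index 4 is an arbitrary value chosen for the example, not the one used in this notebook.

```python
from dataclasses import dataclass

@dataclass
class ToyLayer:
    """Hypothetical stand-in for a Keras layer: just a name and a flag."""
    name: str
    trainable: bool = False

# Frozen "pre-trained" layers followed by the new, already trainable top layers.
layers = [ToyLayer(f"pretrained_{i}") for i in range(6)]
layers += [ToyLayer("flatten", True), ToyLayer("dense", True),
           ToyLayer("dense_1", True)]

first_trainable_layer = 4   # illustrative index, not the notebook's value

# Everything from this index onwards becomes trainable, including the
# new top layers, which were trainable already.
for layer in layers[first_trainable_layer:]:
    layer.trainable = True

frozen = [layer.name for layer in layers if not layer.trainable]
print("still frozen:", frozen)
```

Layers before the chosen index keep their frozen weights, so only the later, more task-specific part of the network is adapted. With a real Keras model, the model must be compiled again after changing `trainable` for the change to take effect, as is done in the cell above.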

In [None]:
logdir = os.path.join(os.getcwd(), "logs",
                      "dvc-"+pt_name+"-finetune-"+datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S'))
print('TensorBoard log directory:', logdir)
os.makedirs(logdir)
callbacks = [TensorBoard(log_dir=logdir)]

In [None]:
%%time

epochs = 20

history_ft = model.fit(train_dataset, epochs=epochs, 
                       validation_data=validation_dataset,
                       callbacks=callbacks, verbose=2)

model.save("dvc-" + pt_name + "-finetune.h5")

In [None]:
plt.figure(figsize=(5,3))
plt.plot(history.epoch, history.history['loss'], color='0.75')
plt.plot(history.epoch, history.history['val_loss'], color='0.75')
plt.plot([e+len(history.epoch) for e in history_ft.epoch],
         history_ft.history['loss'], label='training')
plt.plot([e+len(history.epoch) for e in history_ft.epoch],
         history_ft.history['val_loss'], label='validation')
plt.title('loss')
plt.legend(loc='best')

plt.figure(figsize=(5,3))
plt.plot(history.epoch,history.history['accuracy'], color='0.75')
plt.plot(history.epoch,history.history['val_accuracy'], color='0.75')
plt.plot([e+len(history.epoch) for e in history_ft.epoch],
         history_ft.history['accuracy'], label='training')
plt.plot([e+len(history.epoch) for e in history_ft.epoch],
         history_ft.history['val_accuracy'], label='validation')
plt.title('accuracy')
plt.legend(loc='best');

### Inference

In [None]:
%%time

scores = model.evaluate(test_dataset, verbose=2)
print("Test set %s: %.2f%%" % (model.metrics_names[1], scores[1]*100))