# Introduction

In this notebook I'll be trying out a few different ways of building a CNN model: building the model from scratch, adding regularization and data augmentation, and then transfer learning using pre-trained models. At the end, I'll compare both the prediction accuracy and the training time of these approaches. But there's a twist: instead of doing the proper thing, testing the model performance on the competition data and treating the task at hand as a multiclass classification problem, the models will be tested using 50 pictures of our dog: 'Ace' the corgi! The model performance will be evaluated only on how well it can predict that 'Ace' is a corgi.

I've been reading [Chollet's Deep Learning with Python](https://www.manning.com/books/deep-learning-with-python), so this notebook is quite heavily influenced by it. I'll try to share my reasoning, but as this notebook was created for learning and "for fun", don't expect best practices here!

# Preparations

## Loading Libraries and Reading Data

In [None]:
import pandas as pd
import numpy as np
import os
import shutil
import random
import time
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from skimage.transform import resize

from keras import models
from keras import layers
from keras import optimizers
from keras.preprocessing import image
from keras.callbacks import EarlyStopping

random.seed(99)  # for reproducibility

The first thing that needs to be done is sorting the pictures in the training and test data into corresponding folders, from which `Keras` can read them. The `input` folder in Kaggle is read-only, so the `working` folder is used to arrange the images from the training data into subfolders by dog breed:

In [None]:
dataset_dir = '/kaggle/input/dog-breed-identification/train'
test_dataset_dir = '/kaggle/input/50-corgi-pictures/'
train_labels = pd.read_csv('/kaggle/input/dog-breed-identification/labels.csv')
test_labels = pd.DataFrame({'id': os.listdir(test_dataset_dir), 'breed': 'pembroke'})

# helper function: create a directory only if it doesn't exist yet, so the script doesn't throw an error when it does
def make_dir(x):
    if not os.path.exists(x):
        os.makedirs(x)

# directory where we’ll store our dataset subset for keras generators to read from
base_dir = '/kaggle/working/dog-breed-identification/subsets/'
make_dir(base_dir)

breeds = list(train_labels.breed.unique())

The `train_labels` are in the form of a data frame, with one column for the image name, and the other for the corresponding label:

In [None]:
print(train_labels.head())

The pictures not used for training are used for validation. This could also be done in Keras with the `validation_split` argument of `model.fit()`, but here I wanted to be able to inspect the validation pictures by hand from the folders on my computer.

In [None]:
train_frac = 0.8

As mentioned above, the data needs to be arranged so that the pictures of each breed are in their own folder. First, we'll split the training data provided by the competition into training and validation folders:

In [None]:
train_img_fnames = []
validation_img_fnames = []
test_img_fnames = []

# directories for the training, validation, and test images
train_dir = os.path.join(base_dir, 'train')
make_dir(train_dir)
validation_dir = os.path.join(base_dir, 'validation')
make_dir(validation_dir)
test_dir = os.path.join(base_dir, 'test')
make_dir(test_dir)

breed_counts = train_labels.breed.value_counts()

# loop through breeds for training and validation
for breed in breeds:
    # make a directory for each breed
    train_breed_dir = os.path.join(train_dir, breed)
    validation_breed_dir = os.path.join(validation_dir, breed)
    
    make_dir(train_breed_dir)
    make_dir(validation_breed_dir)
    
    # get training count
    n_train = int(breed_counts[breed] * train_frac)
    i = 0
    
    # get ids for training and validation by breeds
    breed_ids = train_labels[train_labels.breed == breed].id
    breed_train = breed_ids.sample(n=n_train, random_state=57)
    breed_validation = breed_ids[~breed_ids.isin(breed_train)]

    # transfer doggo images to these folders accordingly
    for dog in breed_train:
        i+=1
        src = os.path.join(dataset_dir, dog + '.jpg')
        dst = os.path.join(train_breed_dir, dog + '.jpg')
        shutil.copyfile(src, dst)
        train_img_fnames.append(dog)

    for dog in breed_validation:
        i+=1
        src = os.path.join(dataset_dir, dog + '.jpg')
        dst = os.path.join(validation_breed_dir, dog + '.jpg')
        shutil.copyfile(src, dst)
        validation_img_fnames.append(dog)
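The loop above does a per-breed (stratified) split: each breed contributes `int(count * train_frac)` pictures to training and the rest to validation, so every breed keeps the same 80/20 ratio. A minimal, dependency-free sketch of the same idea, assuming the labels are available as a plain `{image_id: breed}` dict; `stratified_split` and the toy ids are hypothetical, and the random sampling differs from the pandas `sample(random_state=57)` call used above:

```python
import random
from collections import defaultdict

def stratified_split(labels, train_frac=0.8, seed=57):
    """Split {image_id: breed} into train/validation id lists, breed by breed."""
    by_breed = defaultdict(list)
    for img_id, breed in labels.items():
        by_breed[breed].append(img_id)

    rng = random.Random(seed)  # local RNG so the split is reproducible
    train, validation = [], []
    for breed, ids in by_breed.items():
        ids = sorted(ids)                     # deterministic starting order
        n_train = int(len(ids) * train_frac)  # same truncation as the loop above
        picked = set(rng.sample(ids, n_train))
        train += [i for i in ids if i in picked]
        validation += [i for i in ids if i not in picked]
    return train, validation

# usage with made-up ids: 10 'pembroke' and 5 'rottweiler' pictures
toy = {f'p{i}': 'pembroke' for i in range(10)}
toy.update({f'r{i}': 'rottweiler' for i in range(5)})
train_ids, val_ids = stratified_split(toy)
print(len(train_ids), len(val_ids))  # prints: 12 3
```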

For clarity, let's check the paths and filenames from the last iteration of the for loop:

In [None]:
print(f'Source: {src}\nDestination: {dst}')

Let's verify that the last dog is a rottweiler:

In [None]:
train_labels.loc[train_labels.id == os.path.splitext(src)[0].split('/')[-1]]

After dealing with the training and validation data comes the test data, containing only pictures of 'Ace': 

In [None]:
# copy the test pictures (all of 'Ace') into a single 'pembroke' folder
test_breed_dir = os.path.join(test_dir, 'pembroke')
make_dir(test_breed_dir)
for corgi_img in test_labels.id:
    src = os.path.join(test_dataset_dir, corgi_img)
    dst = os.path.join(test_breed_dir, corgi_img)
    shutil.copyfile(src, dst)
    test_img_fnames.append(os.path.splitext(corgi_img)[0])

And for the test data the equivalent paths look like:

In [None]:
print(f'Source: {src}\nDestination: {dst}')

## Exploring the Data

"Data exploration" here is pretty much just looking at the corgi pictures. Also, an interesting question at this point is that how represented 'pembroke' is in the training data?

In [None]:
breed_color = [['red' if (x == 'pembroke') else 'lightgrey' for x in breed_counts.index]] # attention to double []
pd.DataFrame(breed_counts).plot.bar(color=breed_color, width=0.8, figsize=(21, 5))

In [None]:
train_labels.loc[train_labels.breed == 'pembroke', 'id']

Now let's look at some random corgis from the training data:

In [None]:
# pick 9 random pembroke images from training data
pembrokes = train_labels.loc[train_labels.breed == 'pembroke', 'id']

train_img_sample = random.sample(pembrokes.tolist(), 9)
read_train_imgs = [mpimg.imread(os.path.join(dataset_dir, x + '.jpg')) for x in train_img_sample]

fig, axes = plt.subplots(nrows=3, ncols=3, figsize=(15,15))
for i, v in enumerate(fig.axes):
    v.imshow(read_train_imgs[i])
    v.text(x=read_train_imgs[i].shape[1]/2, y=read_train_imgs[i].shape[1]/40, s=train_img_sample[i], bbox=dict(facecolor='white', alpha=0.9), ha='center', va='top', size=11)
plt.tight_layout()
plt.show()

And compare them to the pictures of 'Ace':

In [None]:
# take a look at the pictures of Ace
def resize_pic(x):
    return resize(x, (x.shape[0] // 8, x.shape[1] // 8), anti_aliasing=True)

test_img_sample = random.sample(test_img_fnames, 9)
read_test_imgs = [resize_pic(mpimg.imread(os.path.join(test_dataset_dir, x + '.jpg'))) for x in test_img_sample]

fig, axes = plt.subplots(nrows=3, ncols=3, figsize=(15,15))
for i, v in enumerate(fig.axes):
    v.imshow(read_test_imgs[i])
    v.text(x=read_test_imgs[i].shape[1]/2, y=read_test_imgs[i].shape[1]/40, s=test_img_sample[i], bbox=dict(facecolor='white', alpha=0.9), ha='center', va='top', size=14)
plt.tight_layout()
plt.show()

The training data includes pictures of both puppies and adult dogs, which is nice because the test data does too.

## Keras Callbacks, Helper Functions, and Other Parameters

Here, we define some parameters shared by all models, e.g. `batch_size` and early stopping (stopping training once the validation loss stops improving), as well as a helper function for visualizing the model performance over the epochs:

In [None]:
# variables used by all models
batch_size = 20
# early stopping to stop training after validation loss stops improving
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=10)

# helper function for plotting the training and validation accuracy and loss over the epochs

def plot_history(history):
    acc = history.history['acc']
    val_acc = history.history['val_acc']
    loss = history.history['loss']
    val_loss = history.history['val_loss']
    epochs = range(1, len(acc) + 1)
    plt.plot(epochs, acc,  color='#008080', mec='k', label='Training accuracy')
    plt.plot(epochs, val_acc, color='#FFA500', label='Validation accuracy')
    plt.title('Training and validation accuracy')
    plt.legend()
    plt.figure()
    plt.plot(epochs, loss, color='#008080', label='Training loss')
    plt.plot(epochs, val_loss, color='#FFA500', label='Validation loss')
    plt.title('Training and validation loss')
    plt.legend()
    plt.show()
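For intuition, here is a minimal pure-Python sketch of the patience rule that `EarlyStopping(monitor='val_loss', mode='min', patience=10)` applies, assuming the default `min_delta=0`: training stops once `patience` consecutive epochs pass without a new lowest validation loss. `epochs_until_stop` and the loss values are made up for illustration. Keras does not restore the best weights by default (`restore_best_weights=False`), so the model evaluated afterwards comes from the last epoch, while `min(history.history['val_loss'])` reports the best epoch.

```python
def epochs_until_stop(val_losses, patience):
    """Return (epochs actually run, 1-based epoch with the lowest loss)."""
    best = float('inf')
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:           # strict improvement resets the counter
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:  # out of patience: stop after this epoch
                return epoch, best_epoch
    return len(val_losses), best_epoch

# the loss bottoms out at epoch 2, so with patience=2 training stops after epoch 4
print(epochs_until_stop([1.0, 0.9, 0.95, 0.97, 0.5], patience=2))  # (4, 2)
```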

The `test_model` function makes predictions and stores the results in a table for easy comparison of model performance at the end of the notebook. Some models use generators and others (transfer learning) use features extracted by the pre-trained model, so `test_model` needs to work with both.

In [None]:
results = []

def test_model(model_name, model, train_time, history, features=None, generator=None):
    # predict either from pre-extracted features or directly from a generator
    if generator is None:
        predictions = model.predict(features)
    else:
        predictions = model.predict_generator(generator, steps=generator.n)  # batch_size=1, so n steps cover every image
    pred = np.argmax(predictions, axis=1)  # most probable class for each image
    # every test image is 'Ace', so the true class is always 'pembroke'
    truth = train_generator.class_indices['pembroke']
    n_correct = int(np.sum(pred == truth))
    acc = round(n_correct / len(pred) * 100, 1)
    valid_loss = min(history.history['val_loss'])  # lowest validation loss over all epochs
    print(f'TEST ACCURACY\n\t{acc}% ({n_correct}/{len(pred)} correct)')
    train_time = round(train_time / 60, 1)
    return {'Model Name': model_name, 'Test Accuracy %': acc, 'Validation Loss': valid_loss, 'Training Time (minutes)': train_time}
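The accuracy computation inside `test_model` boils down to taking the arg-max of each softmax row and comparing it with the index of 'pembroke'. A small NumPy sketch with made-up probabilities over 4 classes (the real models output 120), assuming the true class index is 2; `top1_accuracy` is a hypothetical helper:

```python
import numpy as np

def top1_accuracy(probs, true_idx):
    """Percentage of rows whose most probable class equals true_idx."""
    pred = np.argmax(probs, axis=1)
    return round(float(np.mean(pred == true_idx)) * 100, 1)

probs = np.array([
    [0.1, 0.2, 0.6, 0.1],  # predicts class 2 -> correct
    [0.5, 0.1, 0.3, 0.1],  # predicts class 0 -> wrong
    [0.0, 0.1, 0.8, 0.1],  # predicts class 2 -> correct
])
print(top1_accuracy(probs, true_idx=2))  # 66.7
```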

# Own Models

## ConvNet from Scratch

The training data is quite limited, with only about 68 training pictures per breed on average (8127 training images for 120 breeds), so it is probably not enough to train a CNN from scratch. Anyway, we'll try to do so, and at least we'll get a baseline to compare the pre-trained networks with.

__CNN architecture__<br>
I used the principles mentioned [here](https://towardsdatascience.com/a-guide-to-an-efficient-way-to-build-neural-network-architectures-part-ii-hyper-parameter-42efca01e5d7), the "dogs vs cats" example from Chollet's book, and VGG-16 as inspiration. I also tested a few different values for the number of hidden layers and the dropout rate, and selected values that seemed to work reasonably well while keeping the network size moderate.

__Optimizer__<br>
As an optimizer, we'll use `Adam`, which is short for Adaptive Moment Estimation. Adam is popular in deep learning because it tends to reach good results quickly. Andrej Karpathy (Tesla AI guy) has suggested it as the default optimization method for deep learning applications, so it should be good for a simple dog picture problem. The metric is accuracy, as we're only interested in how many corgi pictures the model classifies correctly.
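To make the name concrete, here is one Adam update written out in NumPy, following the textbook form from the original Adam paper (Kingma & Ba) with the same hyperparameters used below (`lr=1e-4`, `beta_1=0.9`, `beta_2=0.999`). It is a sketch for intuition: Keras's own implementation may differ in small details such as where `epsilon` is added, and `adam_step` and the gradient values are hypothetical.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-4, beta_1=0.9, beta_2=0.999, eps=1e-7):
    """One Adam update; t is the 1-based step count."""
    m = beta_1 * m + (1 - beta_1) * grad        # running mean of gradients (1st moment)
    v = beta_2 * v + (1 - beta_2) * grad ** 2   # running mean of squared gradients (2nd moment)
    m_hat = m / (1 - beta_1 ** t)               # bias correction: m and v start at zero
    v_hat = v / (1 - beta_2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.zeros(2)
grad = np.array([2.0, -3.0])
theta, m, v = adam_step(theta, grad, np.zeros(2), np.zeros(2), t=1)
print(theta)  # roughly [-1e-4, 1e-4]: the first step has size ~lr regardless of gradient scale
```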

Because the targets are not one-hot encoded but stored as integers, I'll use the `sparse_categorical_crossentropy` loss instead of `categorical_crossentropy` when compiling the model. We'll use generators built with the `flow_from_directory()` function, which reads through the training and validation folders and feeds the pictures in batches for updating the weights.
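The two losses compute the same number; they only differ in how the target is given. A small NumPy check with made-up softmax outputs for 3 classes: the sparse version reads the probability at the integer label directly, while the categorical version needs the label one-hot encoded first. The function names are hypothetical helpers, not Keras functions:

```python
import numpy as np

def sparse_ce(probs, labels):
    """Mean cross-entropy with integer class labels."""
    return float(np.mean(-np.log(probs[np.arange(len(labels)), labels])))

def categorical_ce(probs, one_hot):
    """Mean cross-entropy with one-hot encoded labels."""
    return float(np.mean(-np.sum(one_hot * np.log(probs), axis=1)))

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])
labels = np.array([0, 2])     # integer targets, like those from class_mode='sparse'
one_hot = np.eye(3)[labels]   # [[1, 0, 0], [0, 0, 1]]
print(sparse_ce(probs, labels), categorical_ce(probs, one_hot))  # both ~0.434
```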

In [None]:
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu',
                        input_shape=(224, 224, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(120, activation='softmax')) # 120 different breeds

model.summary()

optimizer = optimizers.Adam(lr=1e-4, beta_1=0.9, beta_2=0.999, amsgrad=False)
model.compile(optimizer=optimizer,
    loss='sparse_categorical_crossentropy',
    metrics=['acc'])

datagen = image.ImageDataGenerator(rescale=1./255) # rescale pixel values to [0, 1] interval

train_generator = datagen.flow_from_directory(
    train_dir,
    target_size=(224, 224),
    batch_size=batch_size,
    class_mode='sparse')
        
validation_generator = datagen.flow_from_directory(
    validation_dir,
    target_size=(224, 224),
    batch_size=batch_size,
    class_mode='sparse',
    classes=train_generator.class_indices)

test_generator = datagen.flow_from_directory(
    test_dir,
    target_size=(224, 224),
    batch_size=1, # predict one picture at a time
    class_mode='sparse',
    classes=train_generator.class_indices)

# dummy check if all class indices are the same in all generators
print(f'\nAll classes same: {train_generator.class_indices == test_generator.class_indices == validation_generator.class_indices}')
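To see why most of this network's weights sit in the first dense layer, we can trace the feature-map size through the convolution and pooling layers: a 3x3 'valid' convolution removes 2 pixels from each spatial dimension and a 2x2 max pooling halves it (rounding down). A small arithmetic sketch (the helper name is hypothetical); `model.summary()` above should report the same shapes:

```python
def conv_pool_output(size, n_blocks, kernel=3, pool=2):
    """Spatial size after n_blocks of [3x3 'valid' Conv2D -> 2x2 MaxPooling2D]."""
    for _ in range(n_blocks):
        size = size - (kernel - 1)  # 'valid' padding: 224 -> 222
        size = size // pool         # pooling rounds down: 222 -> 111
    return size

side = conv_pool_output(224, n_blocks=4)  # 224 -> 111 -> 54 -> 26 -> 12
flat = side * side * 128                  # last Conv2D has 128 filters
dense_params = flat * 512 + 512           # weights + biases of the first Dense(512)
print(side, flat, dense_params)           # 12 18432 9437696
```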

If the model initializes correctly, the loss at the beginning of the training process should be close to $-\ln(1/120) \approx 4.787$. As a baseline accuracy, we can use a random guess, which would result in an accuracy of $1/120 \approx 0.8\%$.
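Both numbers follow from a uniform guess over 120 classes: a freshly initialized softmax assigns each breed a probability of roughly 1/120, so the cross-entropy of the true class is about ln(120). A quick check:

```python
import math

n_classes = 120
initial_loss = -math.log(1 / n_classes)  # cross-entropy when every class gets p = 1/120
random_acc = 100 / n_classes             # accuracy (%) of guessing uniformly at random
print(round(initial_loss, 3), round(random_acc, 1))  # 4.787 0.8
```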

In [None]:
start_clock = time.perf_counter()  # wall-clock timer (time.clock was removed in Python 3.8)
history = model.fit_generator(
        train_generator,
        steps_per_epoch=int(train_generator.n / train_generator.batch_size), # matching the number of samples, 8127/20  
        epochs=100,
        validation_data=validation_generator,
        validation_steps=int(validation_generator.n / validation_generator.batch_size),
        callbacks=[es],
        verbose=0)
end_clock = time.perf_counter()
train_time = end_clock - start_clock

Now we can inspect the learning process, and store the results from the trained model for later use using the helper functions defined earlier:

In [None]:
plot_history(history)
results.append(test_model('CNN', model, train_time, history, generator=test_generator))

The model learns the features of the training data but heavily overfits: when both errors are plotted over the epochs, the training error $J_{train}(\Theta)$ keeps decreasing towards 0 and is very low compared to the validation error $J_{CV}(\Theta)$. In contrast, if both errors were high, we would observe high bias (underfitting). Since the problem here is overfitting (high variance), the next step is to go back and modify the model by adding regularization and "more data" through data augmentation.

## Adding Data Augmentation and Regularization to ConvNet

As the first results were not too encouraging (though better than a random guess), we can try to improve the model performance using data augmentation and regularization. For data augmentation, we'll define a new data generator that applies random transformations to the input images. Test and validation images should not be augmented.

Also note that `ImageDataGenerator` does not add augmented images on top of the original ones: it takes in the original images, applies random transformations, and returns only the transformed versions. To use data augmentation, we therefore need a new generator:

In [None]:
aug_datagen = image.ImageDataGenerator(
    rescale=1./255,
    rotation_range=30,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    vertical_flip=False,
    brightness_range=[0.2, 1.0],
    fill_mode='nearest')
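To illustrate that augmentation *replaces* each image with a randomly transformed copy rather than adding copies, here is a simplified NumPy sketch of two of the transformations configured above: a random horizontal flip and a random brightness factor drawn from `[0.2, 1.0]`. It is only an approximation for intuition (Keras's own implementation differs in detail and also applies rotation, shifts, shear and zoom); `augment_batch` is a hypothetical helper:

```python
import numpy as np

def augment_batch(batch, rng, brightness_range=(0.2, 1.0)):
    """Return a transformed copy of an (N, H, W, C) batch; the originals are not returned."""
    out = batch.astype(np.float32)        # astype copies, so `batch` is left untouched
    flip = rng.random(len(out)) < 0.5     # flip roughly half of the images
    out[flip] = out[flip][:, :, ::-1, :]  # reverse the width axis
    factors = rng.uniform(*brightness_range, size=(len(out), 1, 1, 1))
    return np.clip(out * factors, 0, 255)  # darken each image by its own random factor

rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(4, 8, 8, 3)).astype(np.float32)
augmented = augment_batch(batch, rng)
print(augmented.shape)  # (4, 8, 8, 3): same number of images as the input batch
```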

Let's see what a randomly chosen corgi picture from the training data looks like when augmented:

In [None]:
my_img = os.path.join(dataset_dir, random.sample(train_labels[train_labels['breed'] == 'pembroke'].id.tolist(), 1)[0] + '.jpg')
my_img = image.load_img(my_img, target_size=(224, 224))
my_img = image.img_to_array(my_img)

my_img = my_img.reshape((1,) + my_img.shape)
fig, axes = plt.subplots(nrows=1, ncols=4, figsize=(15,5))
for batch, (i, v) in zip(aug_datagen.flow(my_img, batch_size=1), enumerate(fig.axes)):
    v.imshow(image.array_to_img(batch[0]))
    if i == 3:
        break
plt.tight_layout()
plt.show()

And now training a model using data augmentation and regularization:

In [None]:
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu',
                        input_shape=(224, 224, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(512, activation='relu'))  # input: the flattened 12 x 12 x 128 feature map
model.add(layers.Dropout(0.2)) # Keras' Dropout rate is the fraction of units to drop, not the keep probability
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(120, activation='softmax'))

model.summary()

optimizer = optimizers.Adam(lr=1e-4, beta_1=0.9, beta_2=0.999, amsgrad=False)
model.compile(optimizer=optimizer,
    loss='sparse_categorical_crossentropy',
    metrics=['acc'])

In [None]:
train_generator_aug = aug_datagen.flow_from_directory(
    train_dir,
    target_size=(224, 224),
    batch_size=batch_size,
    class_mode='sparse')

In [None]:
start_clock = time.perf_counter()  # wall-clock timer (time.clock was removed in Python 3.8)
history = model.fit_generator(
        train_generator_aug,
        steps_per_epoch=int(train_generator_aug.n / train_generator_aug.batch_size) * 3, # every image augmented three times
        epochs=100,
        validation_data=validation_generator,
        validation_steps=int(validation_generator.n / validation_generator.batch_size),
        callbacks=[es],
        verbose=0)
end_clock = time.perf_counter()
train_time = end_clock - start_clock
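The `steps_per_epoch` arithmetic above, spelled out with the numbers from this notebook (8127 training images, per the comment in the first training cell, and `batch_size = 20`). Because `int()` drops the remainder, one plain "epoch" covers 8120 of the 8127 images, and the augmented run sees roughly three randomly transformed versions of each image per epoch:

```python
n_train = 8127     # training images, as noted in the first fit_generator call
batch_size = 20

steps_plain = int(n_train / batch_size)  # 406 batches per epoch (8120 images)
steps_aug = steps_plain * 3              # 1218 batches per epoch with augmentation
images_aug = steps_aug * batch_size      # 24360 augmented images per epoch
print(steps_plain, steps_aug, images_aug, n_train % batch_size)  # 406 1218 24360 7
```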

In [None]:
plot_history(history)
results.append(test_model('CNN+reg+aug', model, train_time, history, generator=test_generator))

The results improved, but the overall performance of the model is still not very good. As a next step in the attempt to make better predictions we'll try a pre-trained model.

# Pretrained Models

As expected, training a model from scratch didn't work that well for the problem at hand, as the training data was quite limited for a neural network classifier that can learn complex functions. The model should therefore benefit from what a network has already learned on a much larger amount of data (here, ImageNet). We can achieve this in a computationally efficient way by using a pre-trained network to extract features, which are then used as input to our dense classifier that makes the final predictions. First, we define helper functions to extract the features and to count the files:

In [None]:
def extract_features(directory, sample_count, x, y, z, target_size):
    global class_dictionary
    global filenames
    features = np.zeros(shape=(sample_count, x, y, z))
    labels = np.zeros(shape=(sample_count))
    generator = datagen.flow_from_directory(
        directory,
        target_size=target_size,
        batch_size=batch_size,
        class_mode='sparse',
        shuffle=False,
        classes=train_generator.class_indices)
    i = 0
    for inputs_batch, labels_batch in generator:
        features_batch = conv_base.predict(inputs_batch)
        features[i * batch_size : (i + 1) * batch_size] = features_batch
        labels[i * batch_size : (i + 1) * batch_size] = labels_batch
        i += 1
        #print(i)
        if i * batch_size >= sample_count:
            break
    class_dictionary = generator.class_indices
    filenames = generator.filenames
    return features, labels

def count_files(input_dir):  # counts files from subdirs
    path = input_dir
    n = 0
    folders = ([name for name in os.listdir(path) 
                if os.path.isdir(os.path.join(path, name))])  # get all directories 
    for folder in folders:
        contents = os.listdir(os.path.join(path,folder))  # get list of contents
        
        n += len(contents)
    return(n)
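`extract_features` fills a preallocated array batch by batch and stops once `sample_count` images have been processed; the final slice may be shorter than `batch_size`, which NumPy slicing handles. A stripped-down sketch of that loop without Keras, where `fake_conv_base` is a hypothetical stand-in for `conv_base.predict` that maps each image to a small feature map:

```python
import numpy as np

def fake_conv_base(batch):
    """Hypothetical stand-in for conv_base.predict: (N, 4, 4, 3) -> (N, 2, 2, 3)."""
    return batch[:, ::2, ::2, :] * 0.5

def extract_in_batches(images, predict, batch_size, out_shape):
    features = np.zeros((len(images),) + out_shape)
    for start in range(0, len(images), batch_size):
        batch = images[start:start + batch_size]              # last batch may be smaller
        features[start:start + batch_size] = predict(batch)
    return features.reshape(len(images), -1)                  # flatten for the dense classifier

images = np.random.default_rng(1).random((7, 4, 4, 3))
flat = extract_in_batches(images, fake_conv_base, batch_size=3, out_shape=(2, 2, 3))
print(flat.shape)  # (7, 12)
```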

train_n = count_files(train_dir)
validation_n = count_files(validation_dir)
test_n = count_files(test_dir)

## VGG19

As a first model, let's try out VGG19, which takes a default input image size of 224x224 pixels:

In [None]:
from keras.applications import VGG19
conv_base = VGG19(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

Next, we check the shape of the output of the last layer of the convolutional base, which gives the shape of the features that will be fed to the dense classifier:

In [None]:
conv_base.layers[-1].output_shape

In [None]:
hw = conv_base.layers[-1].output_shape[1]  # height/width
de = conv_base.layers[-1].output_shape[-1]  # depth
ts = 224  # size

In [None]:
print(f'NUMBER OF IMAGE FILES\nTrain: {train_n}\nValidation: {validation_n}\nTest: {test_n}')

train_features, train_labels_arr = extract_features(train_dir, train_n, hw, hw, de, target_size=(ts, ts))
validation_features, validation_labels_arr = extract_features(validation_dir, validation_n, hw, hw, de, target_size=(ts, ts))
test_features, test_labels_arr = extract_features(test_dir, test_n, hw, hw, de, target_size=(ts, ts))

train_features = np.reshape(train_features, (train_n, hw * hw * de))
validation_features = np.reshape(validation_features, (validation_n, hw * hw * de))
test_features = np.reshape(test_features, (test_n, hw * hw * de))
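For VGG19 at 224x224, the convolutional base should output a 7x7x512 feature map (the shape printed above), so each image becomes a vector of 25,088 numbers after the reshape, and the first `Dense(512)` of the classifier below has about 12.8 million parameters. The arithmetic, assuming that 7x7x512 shape (variable names chosen so they don't overwrite `hw` and `de`):

```python
vgg_hw, vgg_de = 7, 512                  # VGG19 output height/width and depth at 224x224
n_features = vgg_hw * vgg_hw * vgg_de    # length of each flattened feature vector
dense_params = n_features * 512 + 512    # first Dense(512): weights + biases
print(n_features, dense_params)          # 25088 12845568
```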

In [None]:
model = models.Sequential()
model.add(layers.Dense(512, activation='relu', input_dim=hw * hw * de))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(120, activation='softmax'))
optimizer = optimizers.Adam(lr=1e-4, beta_1=0.9, beta_2=0.999, amsgrad=False)
model.compile(optimizer=optimizer,
    loss='sparse_categorical_crossentropy',
    metrics=['acc'])

start_clock = time.perf_counter()  # wall-clock timer (time.clock was removed in Python 3.8)
history = model.fit(train_features, train_labels_arr,
    epochs=100,
    batch_size=batch_size,
    validation_data=(validation_features, validation_labels_arr),
    callbacks=[es],
    verbose=0)
end_clock = time.perf_counter()
train_time = end_clock - start_clock

In [None]:
plot_history(history)
results.append(test_model('VGG19', model, train_time, history, features=test_features))

The accuracy improves considerably over the self-made CNN, so using a pre-trained model takes things in the right direction.

## InceptionResNetV2

Another pre-trained model that should work well with the problem at hand is InceptionResNetV2. The default input size for Inception-ResNet-V2 is 299x299. Again, we load a pre-trained model without its top:

In [None]:
from keras.applications import InceptionResNetV2
conv_base = InceptionResNetV2(weights='imagenet', include_top=False, input_shape=(299, 299, 3))
conv_base.layers[-1].output_shape

In [None]:
hw = conv_base.layers[-1].output_shape[1]  # height/width
de = conv_base.layers[-1].output_shape[-1]  # depth
ts = 299  # size

In [None]:
print(f'NUMBER OF IMAGE FILES\nTrain: {train_n}\nValidation: {validation_n}\nTest: {test_n}')

train_features, train_labels_arr = extract_features(train_dir, train_n, hw, hw, de, target_size=(ts, ts))
validation_features, validation_labels_arr = extract_features(validation_dir, validation_n, hw, hw, de, target_size=(ts, ts))
test_features, test_labels_arr = extract_features(test_dir, test_n, hw, hw, de, target_size=(ts, ts))

train_features = np.reshape(train_features, (train_n, hw * hw * de))
validation_features = np.reshape(validation_features, (validation_n, hw * hw * de))
test_features = np.reshape(test_features, (test_n, hw * hw * de))
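One practical consequence of the larger feature map: `extract_features` allocates its arrays with `np.zeros`, which defaults to 64-bit floats, so the training features alone take several gigabytes for InceptionResNetV2. A rough estimate, assuming 8127 training images (the count noted in the first training cell) and output shapes of 7x7x512 for VGG19 and 8x8x1536 for InceptionResNetV2; passing `dtype=np.float32` to `np.zeros` would halve these numbers:

```python
import numpy as np

n_train = 8127
bytes_per_value = np.dtype(np.float64).itemsize  # np.zeros default dtype: 8 bytes per value
feature_shapes = {'VGG19': (7, 7, 512), 'InceptionResNetV2': (8, 8, 1536)}

sizes_gb = {name: n_train * int(np.prod(shape)) * bytes_per_value / 1e9
            for name, shape in feature_shapes.items()}
for name, gb in sizes_gb.items():
    print(f'{name}: {gb:.1f} GB')
# VGG19: 1.6 GB
# InceptionResNetV2: 6.4 GB
```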

Next, we build the dense classifier with the matching input size:

In [None]:
model = models.Sequential()
model.add(layers.Dense(512, activation='relu', input_dim=hw * hw * de))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(120, activation='softmax'))
optimizer = optimizers.Adam(lr=1e-4, beta_1=0.9, beta_2=0.999, amsgrad=False)
model.compile(optimizer=optimizer,
              loss='sparse_categorical_crossentropy',
              metrics=['acc'])

start_clock = time.perf_counter()  # wall-clock timer (time.clock was removed in Python 3.8)
history = model.fit(train_features, train_labels_arr,
    epochs=100,
    batch_size=batch_size,
    validation_data=(validation_features, validation_labels_arr),
    callbacks=[es],
    verbose=0)
end_clock = time.perf_counter()
train_time = end_clock - start_clock

In [None]:
plot_history(history)
results.append(test_model('Inception-ResNet-V2', model, train_time, history, features=test_features))

A huge improvement over our own models in both validation and test accuracy! Let's now use this model to make predictions on the whole test data and see how the classifier performs.

# Results

## Visualizing Predictions

Now that we have a model with good enough accuracy, let's make some predictions on the test data and see what kind of dog it thinks our test subject is.

In [None]:
test_predictions = model.predict(test_features)

Here we build the data for the visualization: for every picture, we extract the top 5 predicted breeds and their probabilities to show next to the image:

In [None]:
all_labels = dict((v,k) for k,v in test_generator.class_indices.items())

test_ind_labels = np.argsort(-test_predictions, axis=1)[:, :5]
pred_labels = []
for i, v in enumerate(test_ind_labels):
    pred_labels.append([all_labels[x] for x in v])
pred_probs = [test_predictions[i][test_ind_labels[i]] for i in range(0, test_n)]

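read_test_imgs = []
# match the image order to the rows of test_predictions: extract_features() reads test_dir with shuffle=False,
# and `filenames` (set by its last call, on test_dir) lists the files in that sorted order, unlike os.listdir()
test_img_fnames = [os.path.splitext(os.path.basename(f))[0] for f in filenames]
for f in filenames[0:20]:
    read_test_imgs.append(resize_pic(mpimg.imread(os.path.join(test_dataset_dir, os.path.basename(f)))))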
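The top-5 selection works by sorting each row of `test_predictions` by descending probability (`np.argsort` on the negated array) and keeping the first five column indices. A small sketch with made-up probabilities, taking the top 2 of 4 classes; `np.take_along_axis` is an alternative to the list comprehension used above for looking up the matching probabilities:

```python
import numpy as np

probs = np.array([[0.10, 0.60, 0.05, 0.25],
                  [0.70, 0.10, 0.15, 0.05]])
k = 2
top_idx = np.argsort(-probs, axis=1)[:, :k]             # class indices, most probable first
top_probs = np.take_along_axis(probs, top_idx, axis=1)  # matching probabilities
print(top_idx.tolist(), top_probs.tolist())  # [[1, 3], [0, 2]] [[0.6, 0.25], [0.7, 0.15]]
```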

Now we plot the images with the predictions as bar charts next to them, highlighting the true breed ('pembroke') in a distinct color:

In [None]:
pics_per_row = 2
pred_n = len(read_test_imgs)
fig, axes = plt.subplots(nrows=pred_n // pics_per_row, ncols=pics_per_row * 2, figsize=(pics_per_row * 9, pred_n * 1.5))
r = -1
c = 0
for i, v in enumerate(read_test_imgs):
    if i % pics_per_row == 0:  # start a new row every pics_per_row images (bar chart + picture each)
        r += 1
        c = 0
    axes[r,c].barh([str(x) for x in pred_labels[i]], pred_probs[i],
    color=['royalblue' if x == 'pembroke' else 'darkgrey' for x in pred_labels[i]], edgecolor='black')
    axes[r,c].set(xlim=(0,1))
    axes[r,c].set_aspect(0.2, anchor='W')
    axes[r,c].invert_yaxis()
    for a in range(len(pred_probs[i])):
        axes[r,c].text(x=pred_probs[i][a], y=a,
            s=str(round(pred_probs[i][a] * 100, 1)) + '%',
            verticalalignment='center',
            horizontalalignment='right' if pred_probs[i][a] > 0.25 else 'left', size = 16)
    axes[r,c].set_title('Predicted')
    axes[r,c].set_anchor('E')
    axes[r,c].tick_params(axis='y', labelsize=16)
    c += 1
    axes[r,c].imshow(v)
    axes[r,c].set_title(test_img_fnames[i])
    axes[r,c].set_anchor('W')
    c += 1
plt.tight_layout()

plt.show()
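The row/column counters in the plotting loop can also be computed directly from the image index: with `pics_per_row` images per row and two axes per image (bar chart, then picture), image `i` goes to row `i // pics_per_row`, its bar chart sits in column `2 * (i % pics_per_row)`, and the picture in the next column. A sketch of that mapping (`grid_position` is a hypothetical helper):

```python
def grid_position(i, pics_per_row=2):
    """Return (row, bar_chart_col, image_col) for the i-th test image."""
    row = i // pics_per_row
    bar_col = 2 * (i % pics_per_row)
    return row, bar_col, bar_col + 1

for i in range(5):
    print(i, grid_position(i))
# 0 (0, 0, 1)
# 1 (0, 2, 3)
# 2 (1, 0, 1)
# 3 (1, 2, 3)
# 4 (2, 0, 1)
```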

## Model Comparison and Conclusions

In [None]:
result_table = pd.DataFrame(results)
print(result_table)

In [None]:
# delete the copied images and their subfolders (otherwise Kaggle throws an error for too many output files)
shutil.rmtree(base_dir)

When training from scratch, adding augmentation improved the model performance but came at a cost in training time. Inception-ResNet-v2 worked well for this problem, and with further tuning of the dense classifier on top of it, it would most likely perform even better. [Google's AI blog](https://ai.googleblog.com/2016/08/improving-inception-and-image.html) also mentions that the architecture excels at identifying individual dog breeds, which is quite neat. With the limited data, building the classifier from scratch did not work very well. Nonetheless, it's good to keep in mind that dog breed classification is quite a tough problem: with 120 classes in the data set, random guessing would result in an accuracy of only 0.8%.