# Dog Breed Prediction using Transfer Learning

## Introduction

Earlier, in my ["WeRateDogs Data Wrangling" project](https://github.com/carterjin/Twitter-WeRateDogs-Data-Wrangling), I used results provided by Udacity from a model that takes dog pictures and predicts their breeds. Now I would like to implement such a predictor myself.
The data is downloaded from the [Dog Breed Identification Kaggle Competition](https://www.kaggle.com/c/dog-breed-identification/data). The original data comes from the [Stanford Dogs Dataset](http://vision.stanford.edu/aditya86/ImageNetDogs/). The data contains 10222 dog photos, each with a label indicating its breed.

In [2]:
import tensorflow
from keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
from keras.preprocessing import image
from keras.models import Model
from keras.layers import Input, Dense, Dropout, Activation, Flatten,\
GlobalAveragePooling2D
from keras import Sequential
from keras.optimizers import RMSprop
from keras.preprocessing.image import ImageDataGenerator
import keras

from sklearn.model_selection import train_test_split

from os.path import join

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Using TensorFlow backend.


In [3]:
labels = pd.read_csv('labels.csv')

In [4]:
data_dir = ''
def read_img(img_id, train_or_test, size):
    '''
    img_id: a string that is also the file name of the picture in the train/test folder
    train_or_test: a string indicating whether the file is in the train or test folder
    size: a tuple, e.g. (224,224), giving the target size of the loaded image
    returns:
    img: an array whose shape depends on size, e.g. (224, 224, 3)
    '''
    img = image.load_img(join(data_dir, train_or_test, '%s.jpg' % img_id), target_size=size)
    img = image.img_to_array(img)
    return img

ResNet50 is a popular deep residual network for image classification. Keras ships weights for it that were pretrained on ImageNet, a large labeled image dataset, and we can later use these weights for transfer learning. Let's first see how good the predictions are with the ImageNet weights alone, without any additional training. To save time, I am only testing the first 20 samples.

In [5]:
model = ResNet50(weights = 'imagenet')
success = 0
fail = 0
i = 0
for (img_id, breed) in labels.values:
    img = read_img(img_id, 'train', (224,224))
    x = preprocess_input(np.expand_dims(img.copy(), axis = 0))
    preds = model.predict(x)
    top_pred = decode_predictions(preds, top = 1)[0][0][1]
    if (top_pred == breed):
        success += 1
    else:
        fail += 1
    i += 1
    if i % 10 == 0: print(i, ' finished')
    if i == 20: break
print(success/(success + fail))

10  finished
20  finished
0.35
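
One caveat when reading this number: `decode_predictions` returns ImageNet class names, and their spelling may not match the lowercase names in `labels.csv` exactly (for example in capitalization), so a strict `==` can count a correct prediction as a miss. A minimal sketch of a more forgiving comparison, where `normalize_breed` and `same_breed` are hypothetical helpers that are not part of the original notebook:

```python
def normalize_breed(name):
    """Lowercase a class name and unify separators so spellings are comparable."""
    return name.strip().lower().replace('-', '_').replace(' ', '_')

def same_breed(predicted, actual):
    """Compare two breed names while ignoring case and separator differences."""
    return normalize_breed(predicted) == normalize_breed(actual)

# Example: these spellings differ only in capitalization or separators
print(same_breed('Boston_bull', 'boston_bull'))           # True
print(same_breed('German_shepherd', 'german shepherd'))   # True
print(same_breed('beagle', 'basset'))                     # False
```

Swapping `top_pred == breed` for `same_breed(top_pred, breed)` in the loop above would be one way to apply it.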


I am reading all the images into a matrix X, which has shape (num_of_training_samples, 224, 224, 3). preprocess_input converts the channels from RGB to BGR order and zero-centers each channel, matching the preprocessing used when the ImageNet weights were trained. Each img is expanded from (224,224,3) to (1,224,224,3) before preprocessing and is then stored as one row of X.

In [None]:
from time import time
bef = time()
X = np.zeros((len(labels),224,224,3), dtype = 'float32')
for i, img_id in enumerate(labels.id):
    img = read_img(img_id, 'train', (224,224))
    x = preprocess_input(np.expand_dims(img, axis = 0))
    X[i] = x
print(int(time() - bef),' second spent')

In [10]:
X.shape

(10222, 224, 224, 3)
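
To make the preprocessing step concrete, here is a NumPy sketch of what the ResNet50 `preprocess_input` does as I understand it: reverse the channel order and subtract a per-channel mean. The mean values below are the ImageNet BGR means commonly used for this style of preprocessing; treat them and the helper name `caffe_style_preprocess` as assumptions rather than the library's exact implementation.

```python
import numpy as np

# Assumed per-channel ImageNet means, in BGR order
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype='float32')

def caffe_style_preprocess(batch_rgb):
    """Convert an RGB batch of shape (n, h, w, 3) to zero-centered BGR."""
    batch_bgr = batch_rgb[..., ::-1].astype('float32')  # RGB -> BGR
    return batch_bgr - IMAGENET_BGR_MEAN                # subtract the mean per channel

# A single all-white image, expanded to a batch of one
img = np.full((224, 224, 3), 255.0, dtype='float32')
x = caffe_style_preprocess(np.expand_dims(img, axis=0))
print(x.shape)     # (1, 224, 224, 3)
```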

In [3]:
num_classes = labels.breed.value_counts().shape[0]

### Feature Extraction
I import the pretrained ResNet50 model without the classifier layers on top, freeze this base model so that it is not trainable, and then manually add a trainable classifier.

In [4]:
base_model = ResNet50(input_shape = (224,224,3),
                      include_top = False,
                      weights = 'imagenet')
base_model.trainable = False
base_model.summary()



Model: "resnet50"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            (None, 224, 224, 3)  0                                            
__________________________________________________________________________________________________
conv1_pad (ZeroPadding2D)       (None, 230, 230, 3)  0           input_1[0][0]                    
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 112, 112, 64) 9472        conv1_pad[0][0]                  
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization)   (None, 112, 112, 64) 256         conv1[0][0]                      
___________________________________________________________________________________________

### Add the classifier layers

In [5]:
global_average_layer = GlobalAveragePooling2D(input_shape = (7,7,2048))
drop_layer = Dropout(0.5)
prediction_layer = Dense(num_classes, activation = 'softmax')

In [6]:
model = Sequential([base_model, 
                    global_average_layer, 
                    drop_layer, 
                    prediction_layer])

In [7]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
resnet50 (Model)             (None, 7, 7, 2048)        23587712  
_________________________________________________________________
global_average_pooling2d_1 ( (None, 2048)              0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 2048)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 120)               245880    
Total params: 23,833,592
Trainable params: 245,880
Non-trainable params: 23,587,712
_________________________________________________________________
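
The parameter counts in the summary can be checked by hand: global average pooling and dropout have no weights, so all trainable parameters belong to the final dense layer, which has one weight per (input feature, class) pair plus one bias per class. A quick arithmetic sketch using the numbers printed above:

```python
features = 2048        # channels coming out of the ResNet50 base
num_classes = 120      # output units of the dense layer in the summary

dense_params = features * num_classes + num_classes   # weights + biases
base_params = 23587712                                 # frozen ResNet50 parameters from the summary

print(dense_params)                 # 245880
print(base_params + dense_params)   # 23833592
```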


### Compile the model


In [8]:
learning_rate = 0.003
model.compile(optimizer = RMSprop(lr = learning_rate),
              loss = 'categorical_crossentropy',
              metrics = ['accuracy'])

Get dummy variables for the breed.

In [9]:
y = pd.get_dummies(labels.breed)

In [10]:
classes = y.columns

In [19]:
X, X_test, y, y_test = train_test_split(
    X, y, test_size=0.1)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.125)
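
Applied in sequence, the two splits hold out 10% of all images for testing and 12.5% of the remainder for validation. A quick sketch of the resulting sizes, assuming scikit-learn rounds the held-out count up and using the 10222 images in the dataset (`split_sizes` is a hypothetical helper):

```python
import math

def split_sizes(n, test_size):
    """Return (n_train, n_test), assuming the held-out count is rounded up."""
    n_test = math.ceil(n * test_size)
    return n - n_test, n_test

n_rest, n_test = split_sizes(10222, 0.1)      # first split: test set
n_train, n_val = split_sizes(n_rest, 0.125)   # second split: validation set
print(n_train, n_val, n_test)                 # 8049 1150 1023
```

The training count of 8049 agrees with the Keras log further down, where 7244 + 805 samples are used.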

The full dataset barely fits in my laptop's memory and takes a long time to load, so I am saving the train, validation, and test data to a file.

In [20]:
#import joblib
#file = 'X_all'
#joblib.dump([X_train,y_train, X_val, y_val, X_test, y_test], file)

In [None]:
import joblib
[X_train, y_train, X_val, y_val, X_test, y_test] = joblib.load('X_all')
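
For a rough sense of why memory is tight, the size of the full float32 image array can be estimated from its shape:

```python
n_images, height, width, channels = 10222, 224, 224, 3
bytes_per_value = 4   # float32

total_bytes = n_images * height * width * channels * bytes_per_value
print(total_bytes)                        # 6154788864
print(round(total_bytes / 1024**3, 2))    # 5.73
```

So X alone occupies roughly 5.7 GiB, and splitting it into train, validation, and test arrays may temporarily need additional copies.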

I am using a data generator which also augments the data with rotation, shift, zoom, etc.

In [13]:
datagen = ImageDataGenerator(
        rotation_range=20,
        width_shift_range=0.2,
        height_shift_range=0.2,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True,
        fill_mode='nearest')
train_generator = datagen.flow(x = X_train, y = y_train)
val_generator = datagen.flow(x = X_val, y = y_val)
test_generator = datagen.flow(x = X_test, y = y_test)

In [24]:
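from keras.callbacks import ModelCheckpoint
# Keep only the model with the best validation loss, as the file name suggests
checkpointer = ModelCheckpoint(filepath = 'saved_model/best_val.hdf5',
                               monitor = 'val_loss', save_best_only = True)
history = model.fit_generator(train_generator, validation_data = val_generator, epochs = 12,
                              callbacks = [checkpointer], verbose = 1)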

Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12


In [None]:
# Spot-check the trained model on the first 20 labeled images
success = 0
fail = 0
i = 0
for (img_id, breed) in labels.values:
    img = read_img(img_id, 'train', (224,224))
    x = preprocess_input(np.expand_dims(img.copy(), axis = 0))
    preds = model.predict(x)
    # classes holds the breed names in the same column order as the one-hot labels
    top_pred = classes[np.argmax(preds[0])]
    if (top_pred == breed):
        success += 1
    else:
        fail += 1
    i += 1
    if i % 10 == 0: print(i, ' finished')
    if i == 20: break
print(success/(success + fail))

In [28]:
# model2 is the bottleneck classifier defined under "Train using bottleneck features" below
success = 0
fail = 0
for i,x in enumerate(X_test):
    feat = base_model.predict(np.expand_dims(x, axis = 0))
    preds = model2.predict(feat)
    top_pred = [1 if cl == max(preds[0]) else 0 for cl in preds[0]]
    if (top_pred == list(y_test.iloc[i])):
        success += 1
    else:
        fail +=1
    if i == 50: break  # i starts at 0, so this evaluates the first 51 test samples
print(success/(success + fail))
    

0.7254901960784313
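
The loop above compares one-hot vectors element by element. Apart from ties, an equivalent and more compact check is to compare the argmax of each prediction with the argmax of its one-hot label. A minimal NumPy sketch on made-up probabilities (`top1_accuracy` is a hypothetical helper, not part of the notebook):

```python
import numpy as np

def top1_accuracy(probs, one_hot):
    """Fraction of rows whose highest-probability class matches the one-hot label."""
    return float(np.mean(np.argmax(probs, axis=1) == np.argmax(one_hot, axis=1)))

# Three samples, three classes; the last prediction is wrong
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.5, 0.3, 0.2]])
one_hot = np.array([[1, 0, 0],
                    [0, 1, 0],
                    [0, 0, 1]])
print(top1_accuracy(probs, one_hot))
```

For the run above, the printed value corresponds to 37 correct predictions out of 51 samples.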


In [31]:
!mkdir saved_model
model2.save('saved_model/bottleneck_model')

A subdirectory or file saved_model already exists.


In [25]:
model = keras.models.load_model('saved_model/my_model')



In [14]:
def decode(pred, classes, top = 3):
    # decode the prediction vector into a list of tuples consisting class name and probabilities, default top 3 is given
    result = [('',0.0)] * len(classes)
    for idx,prob in enumerate(pred[0]):
        result[idx] = classes[idx], prob
    result.sort(key = lambda x: x[1], reverse = True)
    return result[:top]
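
The same top-k lookup can also be written with `np.argsort`, which avoids building and sorting the full list of tuples. This is only an alternative sketch under the same assumptions as `decode` (a prediction of shape (1, num_classes) and a sequence of class names); `decode_topk` is a hypothetical name.

```python
import numpy as np

def decode_topk(pred, classes, top=3):
    """Return the `top` (class name, probability) pairs, highest probability first."""
    probs = np.asarray(pred)[0]
    best = np.argsort(probs)[::-1][:top]   # indices of the largest probabilities
    return [(classes[i], float(probs[i])) for i in best]

pred = np.array([[0.1, 0.6, 0.05, 0.25]])
classes = ['beagle', 'boxer', 'collie', 'pug']
print(decode_topk(pred, classes, top=2))   # [('boxer', 0.6), ('pug', 0.25)]
```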

In [None]:
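# x must have shape (1, 224, 224, 3); model2 expects bottleneck features, not raw images
feat = base_model.predict(x)
pred2 = model2.predict(feat)
ans = decode(pred2, classes)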

### Train using bottleneck features

One problem with the previous method is that during training, X_train has to pass through the ResNet base model (without the classifier layers) on every epoch, which makes the whole process take around 10 hours on my laptop. But the base model's parameters are frozen and never change, so we can instead run the base model once and train only on its outputs, which are called bottleneck features. This drastically decreases the computation time. The downside, however, is that we can no longer take advantage of image augmentation.

In [15]:
bottleneck_features_train = base_model.predict(X_train, verbose = 1)



In [16]:
# Predict on X_val directly: val_generator shuffles and augments, so its outputs would not line up with y_val
bottleneck_features_val = base_model.predict(X_val, verbose = 1)



In [17]:
import joblib
file = 'bottleneck_features'
joblib.dump([bottleneck_features_train,y_train, bottleneck_features_val, y_val],
            file)

['bottleneck_features']

In [29]:
model2 = Sequential()
model2.add(GlobalAveragePooling2D(input_shape = (7,7,2048)))
model2.add(Dense(256, activation = 'relu'))
model2.add(Dropout(0.6))
model2.add(Dense(120, activation = 'softmax'))
learning_rate= 0.001
model2.compile(optimizer = 'adam',
              loss = 'categorical_crossentropy',
              metrics = ['accuracy'])
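
The first layer of this model, `GlobalAveragePooling2D`, collapses each 7x7 feature map to a single number by averaging over the two spatial axes, turning every (7, 7, 2048) bottleneck feature into a 2048-vector. In NumPy terms, with random values standing in for real features:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the bottleneck features of four images
features = rng.random((4, 7, 7, 2048)).astype('float32')

pooled = features.mean(axis=(1, 2))   # average over height and width
print(pooled.shape)                   # (4, 2048)
```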

In [30]:
history = model2.fit(bottleneck_features_train, y_train, validation_split = 0.1, epochs = 100,
                             verbose = 1)

Train on 7244 samples, validate on 805 samples
Epoch 1/100
...
Epoch 74/100


Indeed, earlier each epoch took about 30 minutes to run; now it takes only a few seconds.
This method gave 72.5% accuracy on the first 51 samples of the test data.