# DOG BREED IDENTIFICATION TASK

# 1. Problem Statement

* We are provided with a training set and a test set of images of dogs. Each image has a filename that is its unique id. The dataset comprises 120 breeds of dogs. The goal is to create a classifier capable of determining a dog's breed from a photo. The list of breeds is as follows

# 2. Task Description

* Who's a good dog? Who likes ear scratches? Well, it seems those fancy deep neural networks don't have all the answers. However, maybe they can answer that ubiquitous question we all ask when meeting a four-legged stranger: what kind of good pup is that?

* In this Task, we were provided a strictly canine subset of ImageNet in order to practice fine-grained image categorization. How well we can tell our Norfolk Terriers from our Norwich Terriers? With 120 breeds of dogs and a limited number training images per class, we might find the problem more, err, ruff than we anticipated.

# 3. Dog Breed Names

affenpinscher
afghan_hound
african_hunting_dog
airedale
american_staffordshire_terrier
appenzeller
australian_terrier
basenji
basset
beagle
bedlington_terrier
bernese_mountain_dog
black-and-tan_coonhound
blenheim_spaniel
bloodhound
bluetick
border_collie
border_terrier
borzoi
boston_bull
bouvier_des_flandres
boxer
brabancon_griffon
briard
brittany_spaniel
bull_mastiff
cairn
cardigan
chesapeake_bay_retriever
chihuahua
chow
clumber
cocker_spaniel
collie
curly-coated_retriever
dandie_dinmont
dhole
dingo
doberman
english_foxhound
english_setter
english_springer
entlebucher
eskimo_dog
flat-coated_retriever
french_bulldog
german_shepherd
german_short-haired_pointer
giant_schnauzer
golden_retriever
gordon_setter
great_dane
great_pyrenees
greater_swiss_mountain_dog
groenendael
ibizan_hound
irish_setter
irish_terrier
irish_water_spaniel
irish_wolfhound
italian_greyhound
japanese_spaniel
keeshond
kelpie
kerry_blue_terrier
komondor
kuvasz
labrador_retriever
lakeland_terrier
leonberg
lhasa
malamute
malinois
maltese_dog
mexican_hairless
miniature_pinscher
miniature_poodle
miniature_schnauzer
newfoundland
norfolk_terrier
norwegian_elkhound
norwich_terrier
old_english_sheepdog
otterhound
papillon
pekinese
pembroke
pomeranian
pug
redbone
rhodesian_ridgeback
rottweiler
saint_bernard
saluki
samoyed
schipperke
scotch_terrier
scottish_deerhound
sealyham_terrier
shetland_sheepdog
shih-tzu
siberian_husky
silky_terrier
soft-coated_wheaten_terrier
staffordshire_bullterrier
standard_poodle
standard_schnauzer
sussex_spaniel
tibetan_mastiff
tibetan_terrier
toy_poodle
toy_terrier
vizsla
walker_hound
weimaraner
welsh_springer_spaniel
west_highland_white_terrier
whippet
wire-haired_fox_terrier
yorkshire_terrier

# 4. Data Set Description

Reference: https://www.kaggle.com/c/dog-breed-identification
1. train.zip - the training set, you are provided the breed for these dogs
2. test.zip - the test set, you must predict the probability of each breed for each image
3. sample_submission.csv - a sample submission file in the correct format
4. labels.csv - the breeds for the images in the train set

# 5. Solution

In [16]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
from keras.models import Model, Sequential
from keras.layers import Input, Flatten, Dense, Conv2D, MaxPooling2D, Dropout
from keras.utils import layer_utils
from keras import backend as K
from keras.optimizers import RMSprop, SGD, Adam
from keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint, TensorBoard, CSVLogger
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss, accuracy_score
from keras.preprocessing import image
from keras.applications.resnet50 import ResNet50
from keras.preprocessing.image import ImageDataGenerator
import cv2
import pandas as pd
import keras

### Import ResNet50 weights trained on ImageNet

In [17]:
#General idea: https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
#The author of this article tried to classify dogs vs cats with a pretrained model and provides an abstract idea 
model = ResNet50(weights='imagenet',include_top=False, input_shape=(224, 224, 3))

### Add 3 hidden layers and an output layer

In [18]:
x = model.output
x = Flatten()(x)
x=Dropout(0.35)(x)
x=Dense(units=1000,activation='relu')(x)
x=Dropout(0.4)(x)
x=Dense(units=750,activation='relu')(x)
x=Dropout(0.4)(x)
x=Dense(units=1000,activation='relu')(x)
x=Dropout(0.8)(x)
#clf.add(Dense(units=120,activation='softmax')
#stochastic gradient descent -Adam -optimizer
#loss func categorical cross entropy
#metrics = accuracy
#clf.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])
predictions = Dense(120, activation='softmax')(x)

In [19]:
main_model = Model(inputs=model.input, outputs=predictions)

#train only the hidden layers and output layer, donot train the resnet model
for curLayer in model.layers:
    curLayer.trainable = False
    
main_model.compile(loss='categorical_crossentropy', 
              optimizer='adam', 
              metrics=['accuracy'])

callbacks_list = [keras.callbacks.EarlyStopping(monitor='val_acc', patience=3, verbose=1)]
main_model.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
input_3 (InputLayer)             (None, 224, 224, 3)   0                                            
____________________________________________________________________________________________________
zero_padding2d_3 (ZeroPadding2D) (None, 230, 230, 3)   0           input_3[0][0]                    
____________________________________________________________________________________________________
conv1 (Conv2D)                   (None, 112, 112, 64)  9472        zero_padding2d_3[0][0]           
____________________________________________________________________________________________________
bn_conv1 (BatchNormalization)    (None, 112, 112, 64)  256         conv1[0][0]                      
___________________________________________________________________________________________

### Image Augmentation and Model Fit

In [20]:
train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1./255)
#check class_mode in keras documentation https://keras.io/preprocessing/image/
training_set = train_datagen.flow_from_directory(
        '/training_images',
        target_size=(224, 224),
        batch_size=20,
        class_mode='categorical')

test_set = test_datagen.flow_from_directory(
        '/validation_images',
        target_size=(224, 224),
        batch_size=22,
        class_mode='categorical')

main_model.fit_generator(
        training_set,
        steps_per_epoch=400,
        epochs=25,
        validation_data=test_set,
        validation_steps=101,callbacks=callbacks_list)

Found 8000 images belonging to 120 classes.
Found 2222 images belonging to 120 classes.
Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 00022: early stopping


<keras.callbacks.History at 0x7fa953727198>

### Obtain Predictions

In [21]:
test_set = []
test_set_ids = []
for curImage in os.listdir('/test_images'):
    test_set_ids.append(os.path.splitext(curImage)[0])
    curImage = cv2.imread('/test_images/'+curImage)
    test_set.append(cv2.resize(curImage,(224, 224)))
    
test_set = np.array(test_set, np.float32)/255.0

predictions= main_model.predict(test_set)

### Process the Predictions to an Output File

In [22]:
training_set.class_indices

{'affenpinscher': 0,
 'afghan_hound': 1,
 'african_hunting_dog': 2,
 'airedale': 3,
 'american_staffordshire_terrier': 4,
 'appenzeller': 5,
 'australian_terrier': 6,
 'basenji': 7,
 'basset': 8,
 'beagle': 9,
 'bedlington_terrier': 10,
 'bernese_mountain_dog': 11,
 'black-and-tan_coonhound': 12,
 'blenheim_spaniel': 13,
 'bloodhound': 14,
 'bluetick': 15,
 'border_collie': 16,
 'border_terrier': 17,
 'borzoi': 18,
 'boston_bull': 19,
 'bouvier_des_flandres': 20,
 'boxer': 21,
 'brabancon_griffon': 22,
 'briard': 23,
 'brittany_spaniel': 24,
 'bull_mastiff': 25,
 'cairn': 26,
 'cardigan': 27,
 'chesapeake_bay_retriever': 28,
 'chihuahua': 29,
 'chow': 30,
 'clumber': 31,
 'cocker_spaniel': 32,
 'collie': 33,
 'curly-coated_retriever': 34,
 'dandie_dinmont': 35,
 'dhole': 36,
 'dingo': 37,
 'doberman': 38,
 'english_foxhound': 39,
 'english_setter': 40,
 'english_springer': 41,
 'entlebucher': 42,
 'eskimo_dog': 43,
 'flat-coated_retriever': 44,
 'french_bulldog': 45,
 'german_shepherd'

In [23]:
classes= {index:breed for breed,index in training_set.class_indices.items()}
column_names = [classes[i] for i in range(120)]
column_names

['affenpinscher',
 'afghan_hound',
 'african_hunting_dog',
 'airedale',
 'american_staffordshire_terrier',
 'appenzeller',
 'australian_terrier',
 'basenji',
 'basset',
 'beagle',
 'bedlington_terrier',
 'bernese_mountain_dog',
 'black-and-tan_coonhound',
 'blenheim_spaniel',
 'bloodhound',
 'bluetick',
 'border_collie',
 'border_terrier',
 'borzoi',
 'boston_bull',
 'bouvier_des_flandres',
 'boxer',
 'brabancon_griffon',
 'briard',
 'brittany_spaniel',
 'bull_mastiff',
 'cairn',
 'cardigan',
 'chesapeake_bay_retriever',
 'chihuahua',
 'chow',
 'clumber',
 'cocker_spaniel',
 'collie',
 'curly-coated_retriever',
 'dandie_dinmont',
 'dhole',
 'dingo',
 'doberman',
 'english_foxhound',
 'english_setter',
 'english_springer',
 'entlebucher',
 'eskimo_dog',
 'flat-coated_retriever',
 'french_bulldog',
 'german_shepherd',
 'german_short-haired_pointer',
 'giant_schnauzer',
 'golden_retriever',
 'gordon_setter',
 'great_dane',
 'great_pyrenees',
 'greater_swiss_mountain_dog',
 'groenendael',


In [24]:
predictions_df = pd.DataFrame(predictions)
predictions_df.columns = column_names
predictions_df.insert(0,'id', test_set_ids)
predictions_df.set_index('id',inplace=True)
predictions_df

Unnamed: 0,id,affenpinscher,afghan_hound,african_hunting_dog,airedale,american_staffordshire_terrier,appenzeller,australian_terrier,basenji,basset,...,toy_poodle,toy_terrier,vizsla,walker_hound,weimaraner,welsh_springer_spaniel,west_highland_white_terrier,whippet,wire-haired_fox_terrier,yorkshire_terrier
0,5c31a03d1769fc7c8feec1da94845832,1.024283e-04,6.245137e-04,9.412906e-04,2.632139e-03,1.441035e-02,9.259890e-03,8.142071e-05,8.953725e-03,5.695231e-02,...,1.196571e-04,2.561126e-03,1.751565e-02,3.916050e-02,3.211018e-02,1.283078e-02,1.158179e-05,1.316362e-02,4.887831e-04,1.715536e-04
1,68e9ff449f66e7f9f6ca64eb987695c9,1.289049e-03,6.940818e-03,2.236598e-04,1.107692e-02,3.786311e-04,5.561443e-05,1.043473e-02,2.777994e-04,4.811669e-04,...,7.043970e-02,6.531709e-04,3.853314e-04,1.605656e-04,1.930904e-04,4.923375e-04,2.710620e-02,1.228174e-03,6.118338e-03,4.907986e-03
2,297517505e03ecbbcff2039bfe14e56a,1.239671e-06,7.429042e-06,5.768436e-05,1.770780e-05,2.892021e-03,9.133689e-02,1.421475e-06,8.486685e-03,6.837911e-02,...,1.488392e-07,4.982020e-03,3.010924e-05,9.360307e-02,2.290602e-04,9.462078e-03,3.652391e-07,5.743349e-03,6.906147e-05,4.822872e-07
3,a45627424a181ad5c4c3ed4c082247e0,3.025171e-02,2.651049e-03,4.073233e-08,1.587124e-03,4.097885e-10,4.082590e-12,1.930376e-05,5.607321e-11,3.363646e-11,...,1.510544e-03,1.599712e-11,7.061467e-10,1.524015e-11,6.905984e-10,4.046954e-10,1.184961e-07,1.259499e-08,1.139142e-05,6.802206e-06
4,014da249523b906a840f8c33ae055cf3,9.382571e-02,6.082369e-04,1.280285e-03,1.531790e-03,1.039593e-05,1.715896e-06,2.090742e-02,8.936057e-06,3.029696e-07,...,2.061730e-04,1.006693e-04,7.378057e-07,3.941484e-07,2.382845e-06,1.234421e-06,4.645200e-04,1.285987e-05,2.331098e-04,1.393350e-02
5,75f65f02a53a08b5a73b40502daa430a,7.240071e-05,1.915204e-04,1.164726e-06,3.245931e-07,9.201093e-06,7.692147e-05,4.021387e-05,2.992943e-06,1.269724e-05,...,3.222846e-05,4.987140e-05,4.788232e-08,1.219710e-05,2.014682e-08,7.911585e-04,4.192506e-05,4.773628e-06,7.179256e-05,1.261312e-04
6,86475c1242c3f2ec067ff34c451097d5,1.473526e-03,3.512332e-03,3.885350e-03,1.380166e-03,1.087776e-02,7.104918e-02,1.351275e-03,1.795571e-02,1.671267e-02,...,3.337035e-04,1.034930e-02,6.250041e-04,1.347321e-02,1.781624e-03,1.130291e-02,4.517661e-04,8.991951e-03,1.891704e-03,1.223760e-03
7,f4e1d93f8d4f389547d69a5cd468e49d,1.080414e-02,2.088539e-01,1.654784e-03,1.232810e-02,2.162709e-04,7.660028e-05,5.697223e-03,1.022874e-04,1.374158e-04,...,4.651830e-03,4.605992e-05,1.964280e-04,3.755822e-05,2.370949e-04,1.251087e-03,6.323427e-04,1.592353e-03,4.554429e-03,3.777773e-03
8,68c075d18cc938a89f3e583676e8eecb,3.580919e-06,4.579676e-04,3.050687e-06,3.774052e-05,5.275958e-06,1.246992e-06,7.012707e-05,6.696950e-06,1.264047e-06,...,2.036929e-03,1.896417e-05,3.799760e-08,2.186496e-07,4.263518e-07,1.146739e-05,7.175485e-03,9.698314e-05,8.844592e-05,3.929482e-05
9,d82c5115aab54301bb52f7678f0cb5c3,1.892641e-06,6.452147e-05,3.658382e-03,5.253759e-04,1.718730e-02,3.326878e-03,2.748911e-05,1.107662e-01,7.807823e-03,...,3.818682e-06,5.148476e-02,3.768772e-03,3.328995e-02,5.862521e-03,9.721291e-04,1.542427e-05,9.958988e-02,1.398301e-03,2.050410e-05


In [25]:
predictions_df.to_csv('/output/third_submission.csv',sep=",")

# 6. Observations

* Here we used Convolutional Neural Networks with ResNet50 and 3 additional layers to obtain the Multinomial Log loss and Accuracy.
            loss: 3.2285 - acc: 0.1970 - val_loss: 2.4160 - val_acc: 0.3798