# Preface 

This Jupyter notebook contains the steps used to solve the Computer Vision problem of the Grab AI competition at https://www.aiforsea.com/computer-vision. The goal is to create an AI that can automatically recognize the make and model of a car from an image. This section explains the approach taken to solve the problem; each following section walks through one step, with the code and small snippets of the files.

The problem is approached with a deep learning model, which is explained in detail further below. Because the model learns its features automatically, no manual feature engineering step is needed. The deep learning model is built using the Keras framework with TensorFlow as its backend. The steps taken to reach the goal are as follows:

1. Data analysis  
Before doing anything else, we check the training and testing data. The distribution of the data is analyzed to make sure that imbalance between the classes won't cause any problem for the model. The result of this step shows that both the training and test data have a __very good distribution__ across the classes.  
The validation data is also generated in this section. Because the training and testing sets are almost the same size, the validation data is taken from the testing data. After that, to get a sense of the training data, a 28\*28 image grid of representative images is created using t-SNE. 


2. Training Preparation  
Preparing the callbacks and data generator for the model.


3. Model Benchmark  
Creating a baseline model that is not too complex and is fast to train. InceptionV3 is used as the benchmark model. The model reaches __63.625%__ accuracy on the testing data.


4. Further Model  
Two more complex models are created for the final comparison, both based on SeResNet50 [(paper)](https://arxiv.org/pdf/1709.01507.pdf). The first model is trained using a _learning rate reducer and scheduler_, while the second model is trained with [*Stochastic Gradient Descent with Restarts*](https://arxiv.org/abs/1608.03983) ([link to code](https://gist.github.com/jeremyjordan/5a222e04bb78c242f5763ad40626c452)) and [*Snapshot Ensemble* (which requires SGDR)](https://arxiv.org/abs/1704.00109) ([link to code](https://github.com/keras-team/keras-contrib/blob/master/keras_contrib/callbacks/snapshot.py)). The first SeResNet50 model reaches __71.316%__ accuracy on the testing data, while the second model reaches __51.938%__. It seems that the implementation of the Snapshot Ensemble is not fully correct, causing the training not to run as expected.

The files are organized as follows: (FILES)  
__Notebook.ipynb__ : This file  
__hollance_model.py__ : python file used in creating SeResNet50  
__tsne_data_prep.py__ : python file used for copying data file for tsne  
__tsne_grid.py__ : python file used for creating image grid based on tsne  
__SENet2_params.npy__ : numpy file containing the seresnet50 weight  


The folders are organized as follows: (FOLDER)   
__data__  : Data folder. For the train, test, and validation folders, the images are saved in subfolders, each subfolder representing one class.  
 ---> car_ims : raw data of images  
 ---> car_devkit : devkit of car   
 ---> cars_train : train data folder  
 ---> cars_test : test data folder  
 ---> cars_vald : validation data folder   
 
__parser__ : Folder containing the SeResNet50 weights (in Caffe format) and the code to parse them to Keras.  
 ---> seresnet_weight : SeResNet50 weight  
 ---> weight_parser_hollance_original : code to convert the weight and original code of hollance_model.py  
  
__weight__ : Folder containing inceptionv3 and seresnet50 model weights  
 ---> inceptionv3 : self explanatory   
 ---> seresnet50 : self explanatory    

__weights__ : Folder containing the ensemble model weights

__tsne_grid__ : result of tsne_grid.py

# General

In [None]:
import tensorflow as tf
from keras import backend as K
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
K.set_session(sess)
import keras
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import scipy.io as sio
import os
from keras.callbacks import ModelCheckpoint, LearningRateScheduler, ReduceLROnPlateau
from keras.layers import Dense, Flatten
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import Adam
from keras.models import Model

In [None]:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
K.tensorflow_backend._get_available_gpus()

In [None]:
def create_folder(path):
    # makedirs also creates missing parent folders (e.g. 'weight/' for 'weight/inceptionv3/')
    os.makedirs(path, exist_ok=True)

In [None]:
data_path = 'data/'
label_path = os.path.join(data_path, 'car_devkit/devkit')
train_path = os.path.join(data_path,'cars_train')
vald_path = os.path.join(data_path,'cars_vald')
test_path = os.path.join(data_path,'cars_test')
train_label_path = os.path.join(label_path, 'train_perfect_preds.txt')

---

# Data Analysis

For the data analysis, two things will be done: 
1. Check the number of samples used in the whole pipeline. 
2. Run t-SNE on the images to get a sense of the data distribution. (If time allows)

In [None]:
cars_annot = sio.loadmat(os.path.join(label_path, 'cars_annos.mat'))
class_names = cars_annot['class_names'][0]
df_annot = pd.DataFrame(cars_annot['annotations'][0])
df_annot.head()

Remove unused columns, then reformat the remaining data into the format the data generator needs.

In [None]:
df_annot = df_annot[['relative_im_path','class', 'test']]
df_annot = df_annot.applymap(lambda x: x[0])
df_annot.tail()

## Dataset Distribution Check

In [None]:
temp_df=df_annot.copy()
temp_df['class'] = df_annot['class'].map(lambda x: x[0])
temp_df = temp_df.groupby(['class']).count()['test']
thresh = 1 / temp_df.size
thresh

Percentage check

In [None]:
temp_df.apply(lambda x: x*100 / temp_df.sum()).head()

In [None]:
temp_df.describe()

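Print any class whose share of the data is smaller than 67% of the threshold

In [None]:
# thresh is a fraction (1 / number of classes), so compare it against each
# class's share of the data rather than its raw count.
class_share = temp_df / temp_df.sum()
print(0.67 * thresh)
class_share[class_share < 0.67 * thresh]

If this returns no classes, it is safe to assume that the classes have a relatively even distribution of data.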
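
The same balance check can be sketched on a plain list of labels, without the `.mat` file. This is a minimal illustration, not the notebook's own code: the function name `underrepresented_classes` and the toy labels are assumptions, while the 67% ratio and the `1 / number of classes` threshold follow the cells above.

```python
from collections import Counter


def underrepresented_classes(labels, ratio=0.67):
    """Return {class: share} for classes whose share is below ratio * (1 / n_classes)."""
    counts = Counter(labels)
    total = sum(counts.values())
    # A perfectly balanced dataset gives every class a share of 1 / n_classes.
    thresh = 1 / len(counts)
    return {
        cls: n / total
        for cls, n in counts.items()
        if n / total < ratio * thresh
    }


# Toy data (hypothetical): class "c" is much rarer than "a" and "b".
toy_labels = ["a"] * 45 + ["b"] * 45 + ["c"] * 10
flagged = underrepresented_classes(toy_labels)
print(flagged)
```

An empty dictionary means no class falls below the chosen ratio.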

### Train to Test Ratio

In [None]:
temp_df=df_annot.copy()
temp_df['class'] = df_annot['class'].map(lambda x: x[0])
temp_df['test'] = df_annot['test'].map(lambda x: x[0])
temp_df = temp_df.groupby(['class','test']).size()
temp_df.head()

In [None]:
train_to_test = temp_df.values[::2]/temp_df.values[1::2]
train_to_test
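
The slicing in the cell above relies on every class having both train and test rows, in that order. A hedged alternative that looks each split up by name is sketched below; `train_test_ratio` and the toy records are hypothetical, and the `test` flag uses the same 0 = train, 1 = test convention as `df_annot`.

```python
from collections import defaultdict


def train_test_ratio(records):
    """Map each class to n_train / n_test, or None when it has no test rows."""
    counts = defaultdict(lambda: [0, 0])  # class -> [n_train, n_test]
    for cls, is_test in records:
        counts[cls][is_test] += 1
    return {
        cls: (n_train / n_test if n_test else None)
        for cls, (n_train, n_test) in counts.items()
    }


# Toy (class, test) pairs (hypothetical data).
toy_records = [(1, 0), (1, 0), (1, 1), (2, 0), (2, 1), (2, 1), (3, 0)]
ratios = train_test_ratio(toy_records)
print(ratios)
```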

## T-SNE of the images [OPTIONAL]

This is an optional step. The goal is to do EDA on the image data using t-SNE, with code taken from [here](https://github.com/prabodhhere/tsne-grid). This step gives a better sense of what the data looks like and how it should be handled. The image grid is created by taking 4 images from each car-model class of the training data, then applying the t-SNE algorithm to project them into a two-dimensional embedding. The resulting image grid is shown below.

To generate the image, run the two Python files __tsne_data_prep.py__ (`python tsne_data_prep.py`) and __tsne_grid.py__ (`python tsne_grid.py --size 28 --dir data/tsne_data/`). Make sure the training data folder already exists.

<img src="tsne_grid/tsne_grid.jpg">

## Separate the data into Train, Validation, and Test set

Split part of the *test* data into test and validation sets.  
The test set is split rather than the train set because the train and test sets are almost 50%-50% in size; it seems wasteful to use that much data only for testing.  
__Only run it once__, if you haven't run it before.

In [None]:
df_train = df_annot[df_annot['test']==0][['relative_im_path', 'class']]
df_test = df_annot[df_annot['test']==1][['relative_im_path', 'class']]
print(df_train.shape)
print(df_test.shape)

Checked: the sizes match the numbers given on the dataset page.

###  Code to create the validation dataset. 
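
The cell that builds the validation set is not included here. One way it could look is sketched below, assuming the layout described in the preface (one subfolder per class under `data/cars_test` and `data/cars_vald`). The function name `split_test_into_validation`, the 50% fraction, and the fixed seed are assumptions, not the values used for the reported results.

```python
import os
import random
import shutil


def split_test_into_validation(test_dir, vald_dir, fraction=0.5, seed=0):
    """Move `fraction` of the images of every class folder from test_dir to vald_dir."""
    rng = random.Random(seed)
    moved = 0
    for class_name in sorted(os.listdir(test_dir)):
        src_class = os.path.join(test_dir, class_name)
        if not os.path.isdir(src_class):
            continue
        dst_class = os.path.join(vald_dir, class_name)
        os.makedirs(dst_class, exist_ok=True)
        files = sorted(os.listdir(src_class))
        rng.shuffle(files)
        # Move the first `fraction` of the shuffled files; the rest stay as test data.
        for file_name in files[:int(len(files) * fraction)]:
            shutil.move(os.path.join(src_class, file_name),
                        os.path.join(dst_class, file_name))
            moved += 1
    return moved


# Example call (moves files, so run it only once):
# split_test_into_validation('data/cars_test', 'data/cars_vald')
```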


---

# Training Preparation

## Callbacks

In [None]:
def create_basic_callbacks(weight_folder_path):
    create_folder(weight_folder_path)
    filepath_acc = os.path.join(weight_folder_path, "weights-improvement-{epoch:02d}-{val_acc:.2f}.hdf5")
    filepath_loss = os.path.join(weight_folder_path, "weights-improvement-{epoch:02d}-{val_loss:.2f}.hdf5")
    checkpoint_acc = ModelCheckpoint(filepath_acc, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
    # A lower validation loss is better, so the loss checkpoint must use mode='min'.
    checkpoint_loss = ModelCheckpoint(filepath_loss, monitor='val_loss', verbose=1, save_best_only=True, mode='min')
    return [checkpoint_acc, checkpoint_loss]
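
How the checkpoint file names and the `mode` argument behave can be illustrated without Keras. The sketch below is a plain-Python stand-in, not the `ModelCheckpoint` implementation: `is_improvement` is a hypothetical helper, and the metric values are made up.

```python
def is_improvement(current, best, mode):
    """Return True when `current` beats `best`: higher for 'max', lower for 'min'."""
    if mode == "max":
        return current > best
    if mode == "min":
        return current < best
    raise ValueError("mode must be 'max' or 'min'")


# The file name template is filled in with str.format-style fields.
template = "weights-improvement-{epoch:02d}-{val_acc:.2f}.hdf5"
file_name = template.format(epoch=93, val_acc=0.7712)
print(file_name)

# Accuracy should be tracked with 'max', loss with 'min'.
print(is_improvement(0.80, 0.77, mode="max"))
print(is_improvement(1.20, 0.95, mode="min"))
```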
    

## Data generator

Apply small transformations to the data to help account for variability.

In [None]:
train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.1,
        zoom_range=0.1,
        horizontal_flip=True)
vald_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
        train_path,
        target_size=(160, 160),
        batch_size=32,
        class_mode='categorical')
validation_generator = vald_datagen.flow_from_directory(
        vald_path,
        target_size=(160, 160),
        batch_size=32,
        class_mode='categorical')
test_generator = test_datagen.flow_from_directory(
        test_path,
        target_size=(160, 160),
        batch_size=32,
        class_mode='categorical')

---

# Base Model

For the baseline, InceptionV3 is used to establish a minimum model performance. 

### InceptionV3

Define the model

In [None]:
from keras.applications.inception_v3 import InceptionV3

In [None]:
inceptionv3_model= InceptionV3(include_top=False, weights='imagenet', input_tensor=None, input_shape=(160,160,3), pooling='max')
output_l = Dense(196, activation='softmax', name='fc6')(inceptionv3_model.layers[-1].output)
inceptionv3_model = Model(inceptionv3_model.input, output_l)
inceptionv3_model.summary()

In [None]:
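# Use the basic checkpoint callbacks here; the snapshot builder is only defined later, in the Second Complex Model section.
callbacks_list = create_basic_callbacks('weight/inceptionv3/')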

In [None]:
callbacks_list

In [None]:
inceptionv3_model.compile(loss="categorical_crossentropy", optimizer="sgd", metrics=["acc"])

In [None]:
spe=200
train_generator.reset()
validation_generator.reset()
test_generator.reset()
history_callback = inceptionv3_model.fit_generator(
        train_generator,
        steps_per_epoch=spe,
        epochs=100,
        validation_data=validation_generator,
        validation_steps=50, 
        callbacks=create_basic_callbacks('weight/inceptionv3/'), verbose=1)
loss_history = history_callback.history["loss"]
np_loss_history = np.array(loss_history)
np.savetxt("model_inceptionv3_loss_history.txt", np_loss_history, delimiter=",")

## Check result on Testing data

In [None]:
y_test = inceptionv3_model.evaluate_generator(test_generator, 100)
print(inceptionv3_model.metrics_names)
y_test
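
`evaluate_generator(test_generator, 100)` draws 100 batches, i.e. 3,200 images at the batch size of 32 used above, which may be fewer than the test folder holds. If the whole folder should be scored, the number of steps can be derived from the sample count; the helper name `steps_to_cover` and the sample count below are hypothetical.

```python
import math


def steps_to_cover(n_samples, batch_size):
    """Number of batches needed to see every sample once (last batch may be partial)."""
    return math.ceil(n_samples / batch_size)


# Hypothetical folder with 4,020 images and the notebook's batch size of 32.
print(steps_to_cover(4020, 32))
```

With a Keras directory iterator, the sample count should be available as `test_generator.samples`.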

---

# Complex Model Benchmark

For the complex model, *Squeeze-and-Excitation Networks* based on ResNet50 will be used. Because this model focuses on the relationships between channels, it should perform better at tasks such as fine-grained classification with a transfer-learned model. Link to [paper](https://arxiv.org/pdf/1709.01507.pdf).

## First Complex Model

Implement the model with learning rate reducer and scheduler.  
_Either do model training or load the best weight_

In [None]:
import importlib
from hollance_model import SEResNet50
# importlib.reload(SEResNet50)

In [None]:
seresnet_model = SEResNet50(weights=None, input_shape=(160, 160, 3), classes=1000)
seresnet_model.summary()

In [None]:
def lr_schedule(epoch):
    """Learning Rate Schedule

    Learning rate is scheduled to be reduced after 80, 120, 160, 180 epochs.
    Called automatically every epoch as part of callbacks during training.

    # Arguments
        epoch (int): The number of epochs

    # Returns
        lr (float32): learning rate
    """
    lr = 1e-3
    if epoch > 180:
        lr *= 0.5e-3
    elif epoch > 160:
        lr *= 1e-3
    elif epoch > 120:
        lr *= 1e-2
    elif epoch > 80:
        lr *= 1e-1
    print('Learning rate: ', lr)
    return lr
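
To see which branches of `lr_schedule` are actually reached, the schedule can be evaluated at a few epochs. The function below repeats the same thresholds without the print call; `schedule_value` is a hypothetical name used only for this check.

```python
def schedule_value(epoch, base_lr=1e-3):
    """Same step thresholds as lr_schedule, returned without printing."""
    if epoch > 180:
        return base_lr * 0.5e-3
    if epoch > 160:
        return base_lr * 1e-3
    if epoch > 120:
        return base_lr * 1e-2
    if epoch > 80:
        return base_lr * 1e-1
    return base_lr


for epoch in (0, 80, 81, 99, 121, 161, 181):
    print(epoch, schedule_value(epoch))
```

With `epochs=100`, as in the training cell further down, only the first reduction (after epoch 80) takes effect.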

## Model Training

To train the model, we use a model pretrained on ImageNet as the initial weights. Since the only available pretrained weights are in Caffe format, a parser to Keras is needed. The parser was found [here](https://gist.github.com/hollance/8d30bf5c1622036d16c4f27bd0ec88bf) and slightly modified to do the work.

### Load the network pre-trained weight 

In [None]:
seresnet_model = SEResNet50(weights=None, input_shape=(160, 160, 3), classes=1000)
# seresnet_model.summary()

params = np.load("SENet2_params.npy", allow_pickle=True)
for key in params[()].keys():
    layer_name = key.replace("/", "_")   
    print(key, "-->", layer_name)
    layer = seresnet_model.get_layer(layer_name)
    layer.set_weights(params[()][key])

Pop the last layer and attach a new 196-class output layer.

In [None]:
seresnet_model.layers.pop()
output_l = Dense(196, activation='softmax', name='fc6')(seresnet_model.layers[-1].output)
seresnet_model = Model(seresnet_model.input, output_l)
seresnet_model.summary()

### Fit the model

In [None]:
lr_reducer = ReduceLROnPlateau(factor=np.sqrt(0.1),
                               cooldown=0,
                               patience=5,
                               min_lr=0.5e-6)

lr_scheduler = LearningRateScheduler(lr_schedule)
seresnet_model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=lr_schedule(0)),
              metrics=['accuracy'])
callbacks_list = create_basic_callbacks('weight/seresnet50_model/') + [lr_reducer,lr_scheduler]
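
The reducer's behaviour can be approximated in plain Python. This is a simplified sketch of the plateau rule, not the Keras `ReduceLROnPlateau` source: it ignores `min_delta` and cooldown, and the loss values are invented. `simulate_plateau_reducer` is a hypothetical name.

```python
import math


def simulate_plateau_reducer(val_losses, lr, factor, patience, min_lr):
    """Return the learning rate after each epoch under a simple plateau rule."""
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # Shrink the rate, but never below the floor.
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history


# Invented losses: one improvement, then five stagnant epochs.
losses = [1.0, 0.9] + [0.95] * 5
rates = simulate_plateau_reducer(losses, lr=1e-3, factor=math.sqrt(0.1),
                                 patience=5, min_lr=0.5e-6)
print(rates)
```

Note that `LearningRateScheduler` sets the rate again at the start of every epoch, so when both callbacks are in the list a reduction made by the reducer may not persist into the next epoch.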

In [None]:
seresnet_model.summary()

In [None]:
spe=100
history_callback = seresnet_model.fit_generator(
        train_generator,
        steps_per_epoch=spe,
        epochs=100,
        validation_data=validation_generator,
        validation_steps=10, 
        callbacks=callbacks_list, verbose=1)
loss_history = history_callback.history["loss"]
np_loss_history = np.array(loss_history)
np.savetxt("seresnet_model_loss_history.txt", np_loss_history, delimiter=",")

## Load best Weight

Load best trained model with file name 'weights-improvement-93-0.77.hdf5' 

In [None]:
seresnet_model.layers.pop()
output_l = Dense(196, activation='softmax', name='fc6')(seresnet_model.layers[-1].output)
seresnet_model = Model(seresnet_model.input, output_l)
seresnet_model.summary()

In [None]:
seresnet_model.load_weights('weight/seresnet50_model/weights-improvement-93-0.77.hdf5')
lr_scheduler = LearningRateScheduler(lr_schedule)
seresnet_model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=lr_schedule(0)),
              metrics=['accuracy'])

## Check result on Testing data

In [None]:
y_test = seresnet_model.evaluate_generator(test_generator, 100)
print(seresnet_model.metrics_names)
y_test

---

## Second Complex Model

Try SeResNet50 with [*Stochastic Gradient Descent with Restarts*](https://arxiv.org/abs/1608.03983) ([link to code](https://gist.github.com/jeremyjordan/5a222e04bb78c242f5763ad40626c452)) and [*Snapshot Ensemble* (which requires SGDR)](https://arxiv.org/abs/1704.00109) ([link to code](https://github.com/keras-team/keras-contrib/blob/master/keras_contrib/callbacks/snapshot.py)).

In [None]:
from keras_contrib.callbacks.snapshot import SnapshotCallbackBuilder

In [None]:
M = 10 # number of snapshots
nb_epoch = T = 200 # number of epochs
alpha_zero = 0.1 # initial learning rate
model_prefix = 'Model_'
snapshot = SnapshotCallbackBuilder(T, M, alpha_zero) 
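
The cyclic schedule from the Snapshot Ensemble paper can be written out directly. The sketch below follows the paper's cosine formula with the values set above (T = 200, M = 10, initial rate 0.1) and epochs counted from 0; it is an illustration of the formula, not the `keras_contrib` code, and `snapshot_lr` is a hypothetical name.

```python
import math


def snapshot_lr(epoch, total_epochs=200, n_snapshots=10, alpha_zero=0.1):
    """Cosine-annealed learning rate that restarts every total_epochs / n_snapshots epochs."""
    cycle_len = math.ceil(total_epochs / n_snapshots)
    # 0.0 at a restart, approaching 1.0 at the end of the cycle.
    position = (epoch % cycle_len) / cycle_len
    return alpha_zero / 2 * (math.cos(math.pi * position) + 1)


for epoch in (0, 10, 19, 20):
    print(epoch, round(snapshot_lr(epoch), 6))
```

Each cycle starts at the full rate of 0.1 and decays towards zero over 20 epochs, after which the rate restarts.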

In [None]:
callbacks_list = create_basic_callbacks('weight/seresnet50_snapshot/') + snapshot.get_callbacks(model_prefix=model_prefix)

### SeResnet50

In [None]:
train_generator.reset()
validation_generator.reset()
test_generator.reset()
history_callback = seresnet_model.fit_generator(
        train_generator,
        steps_per_epoch=int(8000/nb_epoch),
        epochs=nb_epoch,
        validation_data=validation_generator,
        validation_steps=50, 
        callbacks=snapshot.get_callbacks(model_prefix='Model_seresnet'), verbose=1)
loss_history = history_callback.history["loss"]
np_loss_history = np.array(loss_history)
np.savetxt("model_seresnet50_loss_history.txt", np_loss_history, delimiter=",")

## Check result on Testing data

In [None]:
y_test = seresnet_model.evaluate_generator(test_generator, 100)
print(seresnet_model.metrics_names)
y_test

Result on the test data: [3.102921153306961, 0.519375]  
This means a bigger loss and lower accuracy.
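
The evaluation above scores a single model. In the Snapshot Ensemble paper, the final prediction averages the softmax outputs of several snapshots. A minimal sketch of that averaging step is shown below on made-up probabilities; `average_predictions` is a hypothetical helper and nothing here reproduces the notebook's results.

```python
def average_predictions(snapshot_probs):
    """Average per-class probabilities over snapshots and return (mean_probs, predicted_class)."""
    n_snapshots = len(snapshot_probs)
    n_classes = len(snapshot_probs[0])
    mean_probs = [
        sum(probs[c] for probs in snapshot_probs) / n_snapshots
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=mean_probs.__getitem__)
    return mean_probs, predicted


# Made-up softmax outputs of three snapshots for one image and three classes.
toy_probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.7, 0.2],
]
mean_probs, predicted = average_predictions(toy_probs)
print(mean_probs, predicted)
```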