# Hierarchical implementation for multi-label classifications

The idea is to take advantage of the underlying hierarchy of categories that a set of n objects belongs to when classifying them, rather than treating them as n flat classes. The goal is to see whether the performance of the classifier improves in any significant way.

**Why introduce hierarchies?**

Say, for example, I had to classify 4 different objects: cat, dog, house, table.
Cat and dog are related (they are living beings), whereas house and table are inanimate objects.
When judging how well my classifier performs on these 4 objects, I would be more forgiving if it misclassifies a cat as a dog, since at least it still understands that the cat is a living being! But I wouldn't like it if it misclassifies a table as a dog.
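
One way to make this "more forgiving" idea measurable is to split the errors into those that stay inside the correct coarse group and those that cross groups. The sketch below is only an illustration: the helper `error_breakdown` and the example labels are made up, and the class order (0 = house, 1 = table, 2 = cat, 3 = dog) is assumed to match the one used later in this notebook.

```python
import numpy as np

# Parent (coarse) class of each fine class, assuming the order
# fine 0 = house, 1 = table (non-living = 0); fine 2 = cat, 3 = dog (living = 1)
fine_to_coarse = np.array([0, 0, 1, 1])

def error_breakdown(y_true_fine, y_pred_fine):
    """Return (fine accuracy, within-group error rate, cross-group error rate)."""
    y_true_fine = np.asarray(y_true_fine)
    y_pred_fine = np.asarray(y_pred_fine)
    wrong = y_true_fine != y_pred_fine
    same_parent = fine_to_coarse[y_true_fine] == fine_to_coarse[y_pred_fine]
    accuracy = float(np.mean(~wrong))
    mild = float(np.mean(wrong & same_parent))     # e.g. cat predicted as dog
    severe = float(np.mean(wrong & ~same_parent))  # e.g. table predicted as cat
    return accuracy, mild, severe

# Made-up labels, purely for illustration
y_true = [2, 2, 3, 0, 1, 1]
y_pred = [2, 3, 3, 0, 2, 1]
accuracy, mild, severe = error_breakdown(y_true, y_pred)
print(accuracy, mild, severe)
```

A flat classifier and a hierarchical one could then be compared on the cross-group error rate, not just on plain accuracy.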

**Any suggestions and advice on how to go about implementing this will be greatly welcome**

### Here are some research papers I read for reference

* https://arxiv.org/abs/1410.0736
* https://arxiv.org/abs/1709.09890

### Other references from the internet
* https://keras.io/getting-started/functional-api-guide/#getting-started-with-the-keras-functional-api
* https://machinelearningmastery.com/keras-functional-api-deep-learning/
* https://github.com/ankonzoid/Google-QuickDraw/blob/master/QuickDraw_noisy_classifier.py
* https://www.learnopencv.com/image-classification-using-convolutional-neural-networks-in-keras/

### Architecture of HD-CNN for hierarchical classification looks like this:
![](./models/HDD-CNN_architecture.png)

### Images to be used for training are Numpy Bitmap files, taken from here: https://console.cloud.google.com/storage/browser/quickdraw_dataset/full/numpy_bitmap

## Importing necessary libraries:

I'm using Keras for experimenting with and building our CNN models

In [2]:
import os, scipy.misc, pylab, random
import numpy as np
import matplotlib.pyplot as plt

import keras
from keras.layers import Input, UpSampling2D
from keras.models import Model
from keras.callbacks import TensorBoard
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D, GlobalAveragePooling2D 
from keras.models import load_model  # save keras model
import pydot
from keras.utils import plot_model
from keras.layers import concatenate  # functional merge helper (also exposed via keras.layers.merge in older Keras 2.x)


Using TensorFlow backend.


In [3]:
# Initialize some variables for training purposes

#directory where the image files are present. Files are of the format .npy
data_dir = "./data"

#mention file names
file_names = ["house", "table", "cat", "dog"]

#Mention higher level of classes, in order.
#0 for nonliving, 1 for living
coarse_classes = [0, 0, 1, 1] 

#Mention lower level of classes (finer classes)
# 0 for house, 1 for table, 2 for cat, 3 for dog
fine_classes = [0, 1, 2, 3]

n_epochs = 10
batch_size = 500

xpixels = 28  # set x pixel numbers for query/training/test examples
ypixels = 28  # set y pixel numbers for query/training/test examples
input_shape = (ypixels, xpixels, 1)  # our data format for the input layer of our NN

In [4]:
# converts image list to a normalized numpy array which will be used for training our CNN
def convert_img2norm(img_list, ypixels, xpixels):
    norm_list = img_list.copy()
    norm_list = norm_list.astype('float32') / 255
    norm_list = np.reshape(norm_list, (len(norm_list), ypixels, xpixels, 1))
    return norm_list


In [5]:
'''
Loads the .npy file of each category and splits it into training and testing sets.

Input:
data_dir, file_names, coarse_classes, fine_classes,
and the number of training and testing samples to take from each file

Output:
x_train, y_train_coarse, y_train_fine,
x_test, y_test_coarse, y_test_fine,
n_categories (number of coarse classes)
'''
def preprocess_data(data_dir, file_names, coarse_classes, fine_classes, n_training_samples, n_testing_samples):
    # Build the full path to each category's .npy file
    category_filenames = []
    for catname in file_names:
        filename = os.path.join(data_dir, catname + ".npy")
        category_filenames.append(filename)

    n_categories = len(set(coarse_classes))  # number of coarse classes (2 here)
    x_train = []
    y_train_coarse = []; y_train_fine = []
    
    x_test = []
    y_test_coarse = []; y_test_fine = []

    for i_filename, filename in enumerate(file_names):
        i_category_coarse = coarse_classes[i_filename] #respective coarse class
        i_category_fine = fine_classes[i_filename] #respective fine class
        
        #load the input files
        data = np.load(category_filenames[i_filename])

        n_data = len(data)
        print("[%d/%d] Reading filename index %d: '%s' under coarse category '%s' and fine category '%s' (%d images: take %d training samples, take %d testing samples)" %
              (i_filename+1, len(file_names), i_filename, filename, i_category_coarse, i_category_fine, n_data, n_training_samples, n_testing_samples))

        #Split into training and testing sets
        for j, data_j in enumerate(data):
            img = np.array(data_j).reshape((ypixels, xpixels))
            if j < n_training_samples:
                # append to training set
                x_train.append(img)
                y_train_coarse.append(i_category_coarse) 
                y_train_fine.append(i_category_fine)   
            elif j - n_training_samples < n_testing_samples:
                # append to test set
                x_test.append(img)
                y_test_coarse.append(i_category_coarse)
                y_test_fine.append(i_category_fine)
            else:
                break

    # convert to numpy arrays
    x_train = np.array(x_train) 
    y_train_coarse = np.array(y_train_coarse); y_train_fine = np.array(y_train_fine)
    
    x_test = np.array(x_test) 
    y_test_coarse = np.array(y_test_coarse); y_test_fine = np.array(y_test_fine)

    # Convert our greyscaled image data sets to have values [0,1] and reshape to form (n, ypixels, xpixels, 1)
    x_train = convert_img2norm(x_train, ypixels, xpixels)
    x_test = convert_img2norm(x_test, ypixels, xpixels)
    
    return x_train, y_train_coarse, y_train_fine, x_test, y_test_coarse, y_test_fine, n_categories

### Use the *preprocess_data* function to divide the data into training and testing samples for both coarse and fine categories:

In [6]:
x_train, y_train_coarse, y_train_fine, x_test, y_test_coarse, y_test_fine, n_categories = preprocess_data(data_dir, file_names, coarse_classes, fine_classes, 50000, 10000)

[1/4] Reading filename index 0: 'house' under coarse category '0' and fine category '0' (135420 images: take 50000 training samples, take 10000 testing samples)
[2/4] Reading filename index 1: 'table' under coarse category '0' and fine category '1' (128021 images: take 50000 training samples, take 10000 testing samples)
[3/4] Reading filename index 2: 'cat' under coarse category '1' and fine category '2' (123202 images: take 50000 training samples, take 10000 testing samples)
[4/4] Reading filename index 3: 'dog' under coarse category '1' and fine category '3' (152159 images: take 50000 training samples, take 10000 testing samples)
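
Since each file is simply cut into "first N for training, next M for testing", the per-image loop can also be written with array slicing. This is a minimal sketch rather than a drop-in replacement: `split_first_n` and `to_normalized_images` are hypothetical helpers, and a small random array stands in for a loaded `.npy` file (assumed to hold flattened 28x28 bitmaps, which is what the reshape in `preprocess_data` implies).

```python
import numpy as np

def split_first_n(data, label_coarse, label_fine, n_train, n_test):
    """Slice one category's array into train/test parts and build matching labels."""
    train = data[:n_train]
    test = data[n_train:n_train + n_test]
    labels = {
        "train_coarse": np.full(len(train), label_coarse),
        "train_fine": np.full(len(train), label_fine),
        "test_coarse": np.full(len(test), label_coarse),
        "test_fine": np.full(len(test), label_fine),
    }
    return train, test, labels

def to_normalized_images(flat, ypixels=28, xpixels=28):
    """Reshape flattened bitmaps to (n, ypixels, xpixels, 1) and scale to [0, 1]."""
    return flat.reshape(-1, ypixels, xpixels, 1).astype("float32") / 255

# Synthetic stand-in for one loaded .npy file: 10 flattened 28x28 bitmaps
rng = np.random.default_rng(0)
fake_cats = rng.integers(0, 256, size=(10, 784), dtype=np.uint8)

train, test, labels = split_first_n(fake_cats, label_coarse=1, label_fine=2,
                                    n_train=6, n_test=3)
x_train_demo = to_normalized_images(train)
x_test_demo = to_normalized_images(test)
print(x_train_demo.shape, x_test_demo.shape)  # (6, 28, 28, 1) (3, 28, 28, 1)
```

Like the loop version, this takes the samples in file order without shuffling; the per-category pieces would still need to be concatenated across the four files.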


In [9]:
#Just returns the architecture of the simpler working model for comparison purposes
def simple_sequential_model(model_path):
    # Build our CNN model layer-by-layer
    cnn = Sequential()
    cnn.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
    cnn.add(Conv2D(64, (3, 3), activation='relu'))
    cnn.add(MaxPooling2D(pool_size=(2, 2)))
    cnn.add(Dropout(0.25))
    cnn.add(Flatten())
    cnn.add(Dense(128, activation='relu'))
    cnn.add(Dropout(0.5))
    cnn.add(Dense(n_categories, activation='softmax')) 

    cnn.summary()

    # Set our optimizer and loss function
    cnn.compile(loss = keras.losses.sparse_categorical_crossentropy,
                optimizer = keras.optimizers.Adadelta(),
                metrics = ['accuracy'])

    plot_model(cnn, to_file = model_path+'.png')
    return cnn
    

### Trying out a simple hierarchical architecture first that can perform multi label classification:
Please have a look at *working_model_1.png*. Training and testing are performed on this model.

In [10]:
# Trying out an architecture that can perform multi-label classification
def build_cnn(model_path):
    #first build a shared layer
    input_1 = Input(shape=input_shape)
    
    #build coarse component layer
    conv_1 = Conv2D(32, kernel_size=(3, 3), activation='relu')(input_1)
    conv_2 = Conv2D(64, kernel_size = (3,3), activation="relu")(conv_1)
    pool_1 = MaxPooling2D(pool_size=(2,2))(conv_2)
    
    flatten_1 = Flatten()(pool_1)
    #coarse output prediction
    output_coarse = Dense(2, activation="softmax")(flatten_1)
    
    #fine feature component layer
    dropout_1 = Dropout(0.25)(pool_1)
    conv_3 = Conv2D(64, kernel_size = (3,3), activation="relu")(dropout_1)
    flatten = Flatten()(conv_3)
    dense_1 = Dense(128, activation="relu")(flatten)
    dropout_2 = Dropout(0.5)(dense_1)
    
    #this will give us the fine category predictions
    output_fine = Dense(4, activation="softmax")(dropout_2)
    
    model = Model(inputs=input_1, outputs=[output_coarse, output_fine])
    
    model.summary()
    
    # Set our optimizer and loss function
    model.compile(loss = keras.losses.sparse_categorical_crossentropy,
                optimizer = keras.optimizers.Adadelta(),
                metrics = ['accuracy'])   
    
    plot_model(model, to_file=model_path+'.png')
    return model

In [11]:
mymodel = build_cnn("./models/working_model_1")

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            (None, 28, 28, 1)    0                                            
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 26, 26, 32)   320         input_1[0][0]                    
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 24, 24, 64)   18496       conv2d_1[0][0]                   
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)  (None, 12, 12, 64)   0           conv2d_2[0][0]                   
__________________________________________________________________________________________________
dropout_1 

### Current working model architecture - simple hierarchical one (*/models/working_model_1.png*): 
![](./models/working_model_1.png)

In [17]:
# Function for training and saving our model
def train_validate_save(model, model_path, x_train, y_train_coarse, y_train_fine, x_test, y_test_coarse, y_test_fine): 
    model.fit(x_train, [y_train_coarse, y_train_fine],
            batch_size = batch_size,
            epochs = n_epochs,
            verbose = 1,
            validation_data = (x_test, [y_test_coarse, y_test_fine]))

    # save the trained CNN model
    model.save(model_path)  # creates a HDF5 file

    # Evaluate our model test loss/accuracy
    score = model.evaluate(x_test, [y_test_coarse, y_test_fine], verbose=1)
    print("CNN Classification test performance:")
    print(score)
    return model

In [18]:
classifier_path = './models/working_model_1.h5'

if os.path.isfile(classifier_path):
    classifier = load_model(classifier_path)  # load saved model
    classifier.summary() 
else:
    # No saved model found: train the model built above, save it, and evaluate it
    classifier = train_validate_save(mymodel, classifier_path, x_train, y_train_coarse, y_train_fine, x_test, y_test_coarse, y_test_fine)

Train on 200000 samples, validate on 40000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
CNN Classification test performance:
[0.213365438794301, 0.04591087463991426, 0.16745456401039277, 0.9852, 0.94095]


In [19]:
# Evaluate our model test loss/accuracy
score = classifier.evaluate(x_test, [y_test_coarse, y_test_fine], verbose=1)
print(score)

[0.213365438794301, 0.04591087463991426, 0.16745456401039277, 0.9852, 0.94095]
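
The list above is easier to read with labels attached. For a two-output model compiled with one metric, Keras reports the total loss first, then the loss of each output, then the metric of each output. The label names below are my own (the names Keras itself uses can be read from `model.metrics_names`); the numbers are copied from the output above.

```python
# Values copied from the evaluate() output above
score = [0.213365438794301, 0.04591087463991426, 0.16745456401039277, 0.9852, 0.94095]

# Hypothetical, human-readable labels in the order Keras reports them
labels = ["total_loss", "coarse_loss", "fine_loss", "coarse_accuracy", "fine_accuracy"]

report = dict(zip(labels, score))
for name, value in report.items():
    print(f"{name:>16}: {value:.4f}")

# With the default loss weights (1.0 per output), the total loss is the sum
# of the two per-output losses
loss_gap = abs(report["total_loss"] - (report["coarse_loss"] + report["fine_loss"]))
print("total loss equals sum of output losses:", loss_gap < 1e-6)
```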


In [15]:
# Function to test the model on a single image file.
def test_model(model, img_path):
    from keras.preprocessing.image import img_to_array, load_img

    # load the image as greyscale, resize it to 28x28 and convert it to a 3D array of shape (28, 28, 1)
    img = img_to_array(load_img(img_path, grayscale=True, target_size=(28, 28)))

    # scale pixel values to [0, 1], matching the normalization applied to the training data
    img = img.astype('float32') / 255

    # add a batch dimension: (28, 28, 1) -> (1, 28, 28, 1)
    img = np.expand_dims(img, axis=0)

    # returns a list of two arrays: [coarse probabilities, fine probabilities]
    prediction = model.predict(img)
    return prediction
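
Because the model has two softmax outputs, `model.predict` returns two probability arrays per image, and nothing forces them to agree with each other. The sketch below decodes such a pair and flags whether the predicted fine class actually belongs to the predicted coarse class. `decode_prediction`, the name lists and the probabilities are made up for illustration; no trained model is involved.

```python
import numpy as np

# Assumed class order, matching the notebook's coarse_classes / fine_classes
fine_to_coarse = np.array([0, 0, 1, 1])
coarse_names = ["non-living", "living"]
fine_names = ["house", "table", "cat", "dog"]

def decode_prediction(coarse_probs, fine_probs):
    """Return (coarse name, fine name, whether the two predictions agree)."""
    coarse_idx = int(np.argmax(coarse_probs))
    fine_idx = int(np.argmax(fine_probs))
    consistent = bool(fine_to_coarse[fine_idx] == coarse_idx)
    return coarse_names[coarse_idx], fine_names[fine_idx], consistent

# Made-up probabilities shaped like the two arrays returned for a single image
coarse_probs = np.array([[0.1, 0.9]])
fine_probs = np.array([[0.05, 0.05, 0.2, 0.7]])

result = decode_prediction(coarse_probs[0], fine_probs[0])
print(result)  # ('living', 'dog', True)
```

Counting how often the flag is False on the test set would show how much the two heads disagree.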

# Observations:

* Trained the classifier on 50,000 samples from each category and validated against 10,000 from each category (a total of 200,000 training samples and 40,000 testing samples).
* After 10 epochs, the test accuracy is 0.9852 for the coarse categories (living, non-living) and 0.9410 for the finer categories (house, table, cat, dog).

## Experimental model, which I want to implement like in the paper https://arxiv.org/abs/1410.0736:
Need some more data preprocessing before I can test it.

Please have a look at the architecture image titled *experimental_model.png*

In [21]:
#experimental model architecture
def build_cnn_experimental(model_path):
    #first build a shared layer
    input_1 = Input(shape=input_shape)
    conv_1 = Conv2D(32, kernel_size=(3, 3), activation='relu')(input_1)
    pool_1 = MaxPooling2D(pool_size=(1, 1))(conv_1)
    dense_1 = Dense(2, activation="softmax")(pool_1)
    
    #build coarse component layer
    conv_2 = Conv2D(32, kernel_size = (1,1), activation="relu")(dense_1)
    pool_2 = MaxPooling2D(pool_size=(1,1))(conv_2)
    
    #coarse output prediction
    output_coarse = Dense(2, activation="softmax")(pool_2)
    
    #fine features
    merge = concatenate([dense_1, output_coarse])
    
    #fine feature 1
    conv_3 = Conv2D(32, kernel_size=(1,1), activation="relu")(merge)
    pool_3 = MaxPooling2D(pool_size=(2, 2))(conv_3)
    flat_1 = Flatten()(pool_3)
    output_fine_1 = Dense(4, activation='softmax')(flat_1)
    
    conv_4 = Conv2D(32, kernel_size=(1,1), activation="relu")(merge)
    pool_4 = MaxPooling2D(pool_size=(2, 2))(conv_4)
    flat_2 = Flatten()(pool_4)
    output_fine_2 = Dense(4, activation='softmax')(flat_2)
    
    model = Model(inputs=input_1, outputs=[output_coarse, output_fine_1, output_fine_2])
    
    model.summary()
    
    # Set our optimizer and loss function
    model.compile(loss = keras.losses.sparse_categorical_crossentropy,
                optimizer = keras.optimizers.Adadelta(),
                metrics = ['accuracy'])   
    
    plot_model(model, to_file=model_path+'.png')
    return model

In [18]:
experimental_model = build_cnn_experimental("./models/experimental_model")

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_5 (InputLayer)            (None, 28, 28, 1)    0                                            
__________________________________________________________________________________________________
conv2d_13 (Conv2D)              (None, 26, 26, 32)   320         input_5[0][0]                    
__________________________________________________________________________________________________
max_pooling2d_11 (MaxPooling2D) (None, 26, 26, 32)   0           conv2d_13[0][0]                  
__________________________________________________________________________________________________
dense_15 (Dense)                (None, 26, 26, 2)    66          max_pooling2d_11[0][0]           
__________________________________________________________________________________________________
conv2d_14 

## Architecture of the experimental model looks like this (*/models/experimental_model.png*):
![](./models/experimental_model.png)

# The training approach:

* First, train the coarse category component - a regular CNN with few convolutional layers
* This will also be used as a shared layer for initializing the fine category components
* This component will also give an output for the coarse, higher level categories (non-living or living)

Next, for training on the fine categories:
* Each fine category component is a separate CNN
* For both components, we initialize the layers by copying the weights from the shared layers of the coarse component. The weights of the initial (shared) layers are kept fixed/frozen
* Once initialized, we train each component using only the images from its respective coarse category.

  Fine category component 1 is trained using only examples of non-living objects (house, table)
  
  Fine category component 2 is trained using only examples of living objects (cat, dog)
  
  Each fine category component produces the second set of outputs, for the finer categories
* Once all the parameters are trained, we fine-tune the entire HD-CNN
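
The extra preprocessing this approach needs is mostly a matter of giving each fine component only the images of its own coarse category. Here is a minimal sketch of that step, assuming label arrays like `y_train_coarse` and `y_train_fine` from `preprocess_data`. The helper `subset_for_coarse` is hypothetical, the data is synthetic, and the relabelling to local indices (0..k-1) is an optional choice of mine that only matters if a component's output layer has one unit per fine class in its group.

```python
import numpy as np

def subset_for_coarse(x, y_coarse, y_fine, coarse_id):
    """Select the samples of one coarse category for its fine component."""
    mask = y_coarse == coarse_id
    fine_ids = np.unique(y_fine[mask])                      # e.g. [2, 3] for living
    local_labels = np.searchsorted(fine_ids, y_fine[mask])  # relabel to 0..k-1
    return x[mask], y_fine[mask], local_labels, fine_ids

# Synthetic stand-ins for x_train, y_train_coarse, y_train_fine
x_demo = np.zeros((6, 28, 28, 1), dtype="float32")
y_coarse_demo = np.array([0, 0, 1, 1, 1, 0])
y_fine_demo = np.array([0, 1, 2, 3, 2, 1])

# Data for fine category component 2 (living objects: cat, dog)
x_living, y_living, y_living_local, living_ids = subset_for_coarse(
    x_demo, y_coarse_demo, y_fine_demo, coarse_id=1)
print(x_living.shape, y_living, y_living_local, living_ids)
```

Freezing the copied layers would be a separate step on the Keras side (for example by setting `trainable = False` on those layers before compiling), which is not shown here.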