<img src="https://github.com/dc-aihub/dc-aihub.github.io/blob/master/img/ai-logo-transparent-banner.png?raw=true" 
alt="Ai/Hub Logo"/>

<h1 style="text-align:center;color:#0B8261;"><center>Artificial Intelligence</center></h1>
<h1 style="text-align:center;"><center>Part 1 - Planes Trains & Automobiles: </center></h1>
<h1 style="text-align:center;"><center>Multiple Classification Exercise with Multi-Binarized-Labels</center></h1>

<center>***Original Tutorial by Adrian Rosebrock:*** <br/>https://www.pyimagesearch.com/2018/05/07/multi-label-classification-with-keras/</center>

<hr/>

<center><a href="#OVERVIEW">Overview</a></center>
<center><a href="#BEFORE-YOU-BEGIN">Before You Begin</a></center>
<center><a href="#BUILD-THE-MODEL">Build the Model</a></center>
<center><a href="#IMAGE-PREPROCESSING">Image Pre-Processing</a></center>
<center><a href="#LABEL-BINARIZATION">Label Binarization</a></center>
<center><a href="#TRAIN-THE-MODEL">Train the Model</a></center>
<center><a href="#ACCURACY-STATISTICS">Accuracy Statistics</a></center>
<center><a href="#IMPLEMENTATION">Implementation</a></center>
<center><a href="#CONCLUSION">Conclusion</a></center>

<hr/>

<div style="background-color:#0B8261; width:100%; height:38px; color:white; font-size:18px; padding:10px;" id="OVERVIEW">
OVERVIEW
</div>

<center style="color:#0B8261;">
This exercise will aim to answer the question of how to generate multiple output classifications given a single input image using Convolutional Neural Networks. Two distinct solutions to this problem will be tested in order to compare their performance and analyze their merits. The first example will use a single model that will produce multiple outputs; essentially we will be two-hot encoding our training input and two-hot-decoding its output to identify multiple classes. The second model will be built with multiple branches that will handle the classification of each category independently with their own respective loss functions and output.
</center>
<br/>
<center style="color:#0B8261;">
This exercise closely follows the tutorial series written by Adrian Rosebrock; however, we have substituted our own dataset that contains images of various vehicle types and colours. The models we will build will be tasked with learning, and then predicting, both the type and colour of an input image. The classes that have been chosen are shown in the directory tree below:
    
![](images/data-tree.png)

</center>

It is important to note that not all possible combinations of classes are present in our dataset. For example, we will want our models to identify the colour blue and also identify a plane, but we have not shown them a 'blue train' specifically in our training data. This will become important later, as our main test of the models' performance will be to see whether they can correctly classify a combination they have not explicitly seen.

The images were collected using Bing's Search API v7.0, which can be accessed with a Microsoft Azure account or through a temporary free trial. More information on how to scrape images from Bing Search can be found [here](https://www.pyimagesearch.com/2018/04/09/how-to-quickly-build-a-deep-learning-image-dataset/).

<div style="background-color:#0B8261; width:100%; height:38px; color:white; font-size:18px; padding:10px;" id="BEFORE-YOU-BEGIN">
BEFORE YOU BEGIN
</div>

<center style="color:#0B8261;">
Ensure that you have the proper Python packages installed. You will need the following:
</center>
<br/>
<center>
Keras will help us build our neural network.<br/>
    
[Keras](https://anaconda.org/conda-forge/keras)<br/>

Numpy will allow us to convert our training input to two-hot-encoded arrays.<br/>

[Numpy](https://anaconda.org/anaconda/numpy)<br/>

SciKitLearn will binarize our labels specifically for multi-label output.<br/>

[SciKitLearn](https://anaconda.org/anaconda/scikit-learn)<br/>

Tools in OpenCV will help us pre-process our images for input to the model.<br/>

[OpenCV](https://anaconda.org/conda-forge/opencv)<br/>

Argparse lets us parse command-line arguments when the code is run as a script.<br/>

[Argparse](https://anaconda.org/anaconda/argparse)<br/>

Resources from Matplotlib will help us visualize the performance of our model over the various training epochs.<br/>

[Matplotlib](https://anaconda.org/conda-forge/matplotlib)<br/>

The Imutils library will help us resize the tested image before our predictions are overlaid onto it as text.<br/>

[Imutils](https://anaconda.org/mlgill/imutils)<br/>
</center>

<div style="background-color:#0B8261; width:100%; height:38px; color:white; font-size:18px; padding:10px;" id="BUILD-THE-MODEL">
BUILD THE MODEL
</div>

The architecture of our convolutional neural network is based on one called “SmallerVGGNet”, a lighter version of VGGNet, which was first introduced in a 2014 paper by Simonyan and Zisserman. You can learn more about VGGNet and its inner workings in another post by Adrian Rosebrock, which can be found [here](https://www.pyimagesearch.com/2018/04/16/keras-and-convolutional-neural-networks-cnns/).

Our first approach will train a single model on one array that represents all 8 classes of both categories (vehicle type and colour). For example, if our label array corresponds to [automobile, black, blue, plane, red, train, white, yellow], we can use two-hot encoding to denote a white automobile as [1,0,0,0,0,0,1,0].

Once trained, the final layer of our model will output a single array of 8 probabilities, one per class. We can then take the two highest values of that array (two-hot decoding) to determine the type and colour of the pictured vehicle.
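
As a minimal sketch of the encoding and decoding described above, the snippet below uses plain NumPy. The class order follows the example array; the helper names `two_hot_encode` and `two_hot_decode` and the probability values are illustrative assumptions and not part of the original tutorial.

```python
import numpy as np

# class order taken from the example array above
CLASSES = ["automobile", "black", "blue", "plane",
           "red", "train", "white", "yellow"]

def two_hot_encode(labels, classes=CLASSES):
    """Return a 0/1 vector with a 1 at the position of every given label."""
    vector = np.zeros(len(classes), dtype=int)
    for label in labels:
        vector[classes.index(label)] = 1
    return vector

def two_hot_decode(probabilities, classes=CLASSES):
    """Return the two labels with the highest predicted probability."""
    top_two = np.argsort(probabilities)[::-1][:2]
    return {classes[i] for i in top_two}

encoded = two_hot_encode(["white", "automobile"])
print(encoded.tolist())  # [1, 0, 0, 0, 0, 0, 1, 0]

# hypothetical model output, made up for illustration
probabilities = np.array([0.91, 0.02, 0.05, 0.10, 0.01, 0.03, 0.84, 0.04])
decoded = two_hot_decode(probabilities)
print(sorted(decoded))  # ['automobile', 'white']
```

Returning a set rather than an ordered pair reflects that the two highest values do not say which one is the colour and which one is the vehicle type.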


In [2]:
# layers and helpers used by the model definition below
from keras.models import Sequential
from keras.layers import BatchNormalization, Conv2D, MaxPooling2D
from keras.layers import Activation, Flatten, Dropout, Dense
from keras import backend as K

class SmallerVGGNet:
	@staticmethod
	def build(width, height, depth, classes, finalAct="softmax"):
		model = Sequential()
		inputShape = (height, width, depth)
		chanDim = -1

		if K.image_data_format() == "channels_first":
			inputShape = (depth, height, width)
			chanDim = 1
		model.add(Conv2D(32, (3, 3), padding="same",
			input_shape=inputShape))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(MaxPooling2D(pool_size=(3, 3)))
		model.add(Dropout(0.25))
        
		model.add(Conv2D(64, (3, 3), padding="same"))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(Conv2D(64, (3, 3), padding="same"))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(MaxPooling2D(pool_size=(2, 2)))
		model.add(Dropout(0.25))
 
		model.add(Conv2D(128, (3, 3), padding="same"))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(Conv2D(128, (3, 3), padding="same"))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(MaxPooling2D(pool_size=(2, 2)))
		model.add(Dropout(0.25))
        
		model.add(Flatten())
		model.add(Dense(1024))
		model.add(Activation("relu"))
		model.add(BatchNormalization())
		model.add(Dropout(0.5))

		model.add(Dense(classes))
		model.add(Activation(finalAct))
 
		return model

We now have a method that can instantiate a model with 5 parameters: width, height, depth, classes, and final activation. Width and height are the dimensions in pixels of the input image. Depth is the number of channels, which will be 3 to handle our RGB input image.

Classes will be the number of categories we have in our dataset. In this case it will be 8 classes for our types of vehicles and colour possibilities combined.

Although the final activation is set to "softmax" by default, changing its value to "sigmoid" on instantiation lets each output score be computed independently of the others, which is what allows Keras to perform multi-label classification.
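
To illustrate why the choice of final activation matters, here is a small NumPy sketch that compares the two functions on the same scores. The `logits` values are made up for illustration, and the two helper functions are our own re-implementations, not Keras code.

```python
import numpy as np

def softmax(logits):
    """Normalize the scores so that they compete and sum to 1."""
    shifted = np.exp(logits - np.max(logits))
    return shifted / shifted.sum()

def sigmoid(logits):
    """Squash every score into (0, 1) independently of the others."""
    return 1.0 / (1.0 + np.exp(-logits))

# hypothetical raw scores for the 8 classes; two of them are strongly positive
logits = np.array([4.0, -3.0, -2.0, -1.0, -3.0, -2.0, 4.0, -3.0])

soft = softmax(logits)
sig = sigmoid(logits)

print(soft.sum())           # softmax outputs share a total probability of 1
print((soft > 0.5).sum())   # two equally strong classes cannot both pass 0.5
print((sig > 0.5).sum())    # with sigmoid both strong classes pass 0.5
```

With softmax the two strong classes have to split the probability mass between them, whereas sigmoid can report a high probability for both at once.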

The model itself consists of several blocks that follow a similar pattern: 2D convolution layers with ‘relu’ activation, batch normalization, max pooling, and dropout. The number of filters doubles from block to block (32, 64, 128) to allow for increased abstraction. Dropout randomly disconnects nodes leading to the next layer, which helps to prevent over-fitting our model.
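
The effect of the pooling layers on the image size can be checked with a little arithmetic. The sketch below assumes a 96 x 96 input and relies on two properties of the layers as configured above: `padding="same"` convolutions keep the height and width unchanged, and `MaxPooling2D` uses a stride equal to its pool size by default. The helper `pooled_size` is our own name.

```python
def pooled_size(size, pool):
    """Output size of a max-pooling layer with stride == pool and no padding."""
    return (size - pool) // pool + 1

size = 96  # assumed input height and width in pixels
for pool in (3, 2, 2):  # pool sizes of the three blocks in the model
    size = pooled_size(size, pool)
    print(size)

# the last convolutional block has 128 filters
flattened = size * size * 128
print(flattened)
```

Under these assumptions the feature maps shrink from 96 to 32, 16, and finally 8 pixels per side, so the `Flatten` layer hands 8 x 8 x 128 values to the dense layer.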

We will be using the [Keras API](https://keras.io/) to build and train our model.

We will continue by pre-processing image data and binarizing our labels/categories. We start by importing the necessary packages. We also set the matplotlib backend to "Agg" so that our figures are saved in the background.

In [3]:
import matplotlib
matplotlib.use("Agg")
 
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import Adam
from keras.preprocessing.image import img_to_array
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import numpy as np
import argparse
import random
import pickle
import cv2
import os

Below we will initialize important variables to train the model. Tweak these variables to modify its performance and how long it will train for. 20 Epochs should be enough for this example.

In [4]:
EPOCHS = 20
INIT_LR = 1e-3
BATCH_SIZE = 30
IMAGE_DIMENSIONS = (96, 96, 3)

Batch size refers to the number of training examples that will be used in one iteration. Image dimensions will match the input shape defined in the model.
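
As a quick sketch of what the batch size implies, the number of iterations in one epoch is the number of training images divided by the batch size, rounded down. The function name and the training-set size of 2201 below are hypothetical and only serve as an example.

```python
BATCH_SIZE = 30

def steps_per_epoch(num_training_images, batch_size=BATCH_SIZE):
    """Number of full batches in one pass over the training images."""
    return num_training_images // batch_size

# hypothetical training-set size, for illustration only
print(steps_per_epoch(2201))  # 73
```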

We will also initialize some constants to hold the folder and file paths in our working directory. The model and binarized class labels will be stored in the relative output folder.

In [5]:
INPUT_DATA_FOLDER = "data/"
OUTPUT_FOLDER = "multi_label_output/"
MODEL_FILE = "vehicle_classification.model"
LABELS_FILE = "labels.pickle"

Below we initialize some variables to handle image processing.

In [6]:
data = []
labels = []
imagePaths = []

<div style="background-color:#0B8261; width:100%; height:38px; color:white; font-size:18px; padding:10px;" id="IMAGE-PREPROCESSING">
IMAGE PRE-PROCESSING
</div>

Next, we cycle through all image file paths within our input data folder and add them to a list. The list of file paths is then shuffled randomly.

In [7]:
for dir_, _, files in os.walk(INPUT_DATA_FOLDER):
    for fileName in files:
        relDir = os.path.relpath(dir_, INPUT_DATA_FOLDER)
        relFile = os.path.join(relDir, fileName)
        if fileName is not None:
            imagePaths.append(relFile)
           

random.seed(43)
random.shuffle(imagePaths)

For every file path in our list we will load the image, pre-process it for input, and add it to the data[] array. 

The classes will be determined by the folder path of the given image. By splitting the folder name we can extract its correct classification and add it to the labels array.

We will end up with two arrays where the labels at labels[i] correctly identify the image stored in data[i].

In [8]:
for imagePath in imagePaths:
    imagePath = INPUT_DATA_FOLDER + imagePath
    image = cv2.imread(imagePath)

    if image is not None:
        image = cv2.resize(image, (IMAGE_DIMENSIONS[1], IMAGE_DIMENSIONS[0]))
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        image = img_to_array(image)
        data.append(image)

        # the parent folder is named "<colour>_<category>"; take only its
        # base name so that no part of the path leaks into the labels
        l = os.path.basename(os.path.dirname(imagePath)).split("_")
        labels.append(l)

The image is pre-processed above by resizing it to the dimensions required as input by our model. We also convert the image to an array using the Keras img_to_array method.

We can now print an element in both our arrays to confirm that the labels are being saved accordingly.

In [9]:
print(INPUT_DATA_FOLDER + imagePaths[1])
print(labels[1])

data/white_plane\00000380.jpg
['white', 'plane']


<div style="background-color:#0B8261; width:100%; height:38px; color:white; font-size:18px; padding:10px;" id="LABEL-BINARIZATION">
LABEL BINARIZATION
</div>

Using NumPy we convert the labels to an array and then binarize them with Scikit-learn. This is a crucial step in this tutorial: to perform multi-label classification we must use Scikit-learn's MultiLabelBinarizer, which transforms our human-readable labels into a vector that two-hot encodes the classes in the image.

In [10]:
# scale the raw pixel intensities to the range [0, 1]
data = np.array(data, dtype="float") / 255.0
labels = np.array(labels) 

print("[INFO] class labels:")
multiLabelBinarizer = MultiLabelBinarizer()
labels = multiLabelBinarizer.fit_transform(labels)

[INFO] class labels:


We can print the array of our classes and also how many images our model will be trained on in total.

In [11]:
for (i, label) in enumerate(multiLabelBinarizer.classes_):
	print("{}. {}".format(i + 1, label))

print("[INFO] data matrix: {} images ({:.2f}MB)".format(
	len(imagePaths), data.nbytes / (1024 * 1000.0)))

1. automobile
2. black
3. blue
4. plane
5. red
6. train
7. white
8. yellow
[INFO] data matrix: 2752 images (594.43MB)
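
To see what the binarizer does on a small scale, the following sketch fits a `MultiLabelBinarizer` on three made-up label pairs. The sample labels and the variable names `demo_binarizer` and `demo_encoded` are assumptions chosen for illustration; they are kept separate from the variables used in the cells above.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# three hypothetical label pairs in [colour, category] order
sample_labels = [["white", "plane"], ["red", "automobile"], ["blue", "train"]]

demo_binarizer = MultiLabelBinarizer()
demo_encoded = demo_binarizer.fit_transform(sample_labels)

# classes_ holds the distinct labels in sorted order
print(list(demo_binarizer.classes_))

# one row per image, with a 1 for each of its two labels
print(demo_encoded)

# inverse_transform maps the binary rows back to label tuples
print(demo_binarizer.inverse_transform(demo_encoded))
```

Because the classes are sorted alphabetically, the position of a label in the vector depends only on the set of labels seen during fitting, not on the order of the training images.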


<div style="background-color:#0B8261; width:100%; height:38px; color:white; font-size:18px; padding:10px;" id="TRAIN-THE-MODEL">
TRAIN THE MODEL
</div>

Next is to split the data into training and testing segments. In this example 80% of the data will be used for training and the remaining 20% for testing, as set by the test_size parameter.

We will also initialize a data augmenter object, which is recommended practice for datasets with under 100 images per class.

In [12]:
(trainX, testX, trainY, testY) = train_test_split(data,
	labels, test_size=0.2, random_state=42)
 
# construct the image generator for data augmentation
aug = ImageDataGenerator(rotation_range=25, width_shift_range=0.1,
	height_shift_range=0.1, shear_range=0.2, zoom_range=0.2,
	horizontal_flip=True, fill_mode="nearest")
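
The effect of `test_size=0.2` can be checked on a toy example. The sketch below splits ten dummy samples; the arrays and variable names are placeholders and have nothing to do with the vehicle dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# ten dummy "images" and labels, for illustration only
dummy_data = np.arange(10).reshape(10, 1)
dummy_labels = np.arange(10)

(train_x, test_x, train_y, test_y) = train_test_split(
    dummy_data, dummy_labels, test_size=0.2, random_state=42)

print(len(train_x), len(test_x))  # 8 2
```

Fixing `random_state` makes the split reproducible, so the same samples land in the test segment on every run.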

We can now build the model and initialize the optimizer. The model is constructed with the parameters that were discussed earlier: an image size of 96 x 96 pixels, 3 colour channels to account for RGB, and the number of classes in our dataset. Be sure to set the final activation function to "sigmoid" for our multiple-classification example.

In [13]:
model = SmallerVGGNet.build(
	width=IMAGE_DIMENSIONS[1], height=IMAGE_DIMENSIONS[0],
	depth=IMAGE_DIMENSIONS[2], classes=len(multiLabelBinarizer.classes_),
	finalAct="sigmoid")
 
opt = Adam(lr=INIT_LR, decay=INIT_LR / EPOCHS)
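
The `decay` argument passed to the optimizer above gradually lowers the learning rate. As far as we are aware, the Keras 2.x optimizers apply it as a time-based decay on every batch update, `lr / (1 + decay * iterations)`. The sketch below reproduces that schedule in plain Python under that assumption; the value of 73 batches per epoch is hypothetical.

```python
INIT_LR = 1e-3
EPOCHS = 20
DECAY = INIT_LR / EPOCHS

def decayed_lr(iterations, init_lr=INIT_LR, decay=DECAY):
    """Time-based decay: the rate shrinks a little with every batch update."""
    return init_lr / (1.0 + decay * iterations)

steps_per_epoch = 73  # hypothetical number of batches per epoch
for epoch in (0, 10, 20):
    print(epoch, decayed_lr(epoch * steps_per_epoch))
```

With these values the learning rate falls only by a few percent over the whole run, so the schedule acts as a gentle adjustment rather than a sharp drop.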

Next is to compile the model and start its training using the fit_generator method. For this example we will use binary cross-entropy in order to treat each output label as an independent Bernoulli distribution. A progress bar will indicate the passage of each epoch.

In [14]:
model.compile(loss="binary_crossentropy", optimizer=opt,
	metrics=["accuracy"])

H = model.fit_generator(
	aug.flow(trainX, trainY, batch_size=BATCH_SIZE),
	validation_data=(testX, testY),
	steps_per_epoch=len(trainX) // BATCH_SIZE,
	epochs=EPOCHS, verbose=1)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
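
To make the binary cross-entropy loss used above more concrete, here is a NumPy sketch that scores two made-up predictions against the two-hot target for a white automobile. The function is our own re-implementation for illustration and the prediction values are hypothetical; Keras computes the loss internally.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of the per-label Bernoulli negative log-likelihoods."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    per_label = -(y_true * np.log(y_pred)
                  + (1 - y_true) * np.log(1 - y_pred))
    return per_label.mean()

# two-hot target for a white automobile (class order from the earlier example)
y_true = np.array([1, 0, 0, 0, 0, 0, 1, 0])

# hypothetical predictions: one gets both labels right, one misses the colour
good = np.array([0.95, 0.02, 0.03, 0.05, 0.01, 0.02, 0.90, 0.04])
bad = np.array([0.95, 0.02, 0.03, 0.05, 0.01, 0.02, 0.10, 0.85])

print(binary_crossentropy(y_true, good))
print(binary_crossentropy(y_true, bad))
```

Each of the 8 outputs contributes its own term to the loss, which is why a wrong colour is penalized even when the vehicle type is predicted correctly.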


We can then save the model to our output folder as well as a pickle file to hold the binarized labels.

In [15]:
model.save(OUTPUT_FOLDER + MODEL_FILE)
 
# the with-block closes the file even if pickling fails
with open(OUTPUT_FOLDER + LABELS_FILE, "wb") as f:
    pickle.dump(multiLabelBinarizer, f)
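
The reason for pickling the binarizer is that its class order must be available again at prediction time. The sketch below shows the round trip in memory with a small binarizer fitted on made-up labels; the variable names are hypothetical, and a file opened in "wb" or "rb" mode would behave the same way.

```python
import pickle
from sklearn.preprocessing import MultiLabelBinarizer

# fit a small binarizer on hypothetical labels
original = MultiLabelBinarizer()
original.fit([["red", "plane"], ["blue", "train"]])

# serialize to bytes and restore again
restored = pickle.loads(pickle.dumps(original))

# the restored object knows the same classes in the same order
print(list(restored.classes_))
print(restored.transform([["blue", "plane"]]))
```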

<div style="background-color:#0B8261; width:100%; height:38px; color:white; font-size:18px; padding:10px;" id="ACCURACY-STATISTICS">
ACCURACY STATISTICS
</div>

In order to visualize the loss and accuracy of our model over each epoch, we can use matplotlib to plot a graph and save it to the output folder.

In [16]:
plt.style.use("ggplot")
plt.figure()
N = EPOCHS
plt.plot(np.arange(0, N), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, N), H.history["val_loss"], label="val_loss")
plt.plot(np.arange(0, N), H.history["acc"], label="train_acc")
plt.plot(np.arange(0, N), H.history["val_acc"], label="val_acc")
plt.title("Training Loss and Accuracy")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="upper left")

plt.tight_layout()
plt.savefig(OUTPUT_FOLDER + "{}_accs.png".format("output"))
plt.close()

<div style="background-color:#0B8261; width:100%; height:38px; color:white; font-size:18px; padding:10px;" id="IMPLEMENTATION">
IMPLEMENTATION
</div>

Once our model has been trained we can now feed it new images for classification and analyze its performance.

We will start by importing the necessary packages. 

In [17]:
import tensorflow as tf
from keras.models import Model
from keras.preprocessing.image import img_to_array
from keras.models import load_model
import numpy as np
import argparse
import pickle
import cv2
import os
import random
import imutils

Next we initialize constants to handle file and folder names in our working directory. For each image we classify with the model, we will overlay the prediction onto the image and then save it to the appropriate folder, depending on whether it has been correctly classified or not.

In [18]:
INPUT_DATA_FOLDER = "unseen_class_combinations/"
RESULTS_FOLDERS = ["correct_predictions/", "incorrect_predictions/"]

Below we reset the variables that were used previously for training; they will now be used to feed test images to our model.

In [19]:
data = []
labels = []
imagePaths = []

Similar to when we trained our model, we will loop over every file within the given directory, add its file path to an array, and shuffle the array.

In [20]:
for dir_, _, files in os.walk(INPUT_DATA_FOLDER):
    for fileName in files:
        relDir = os.path.relpath(dir_, INPUT_DATA_FOLDER)
        relFile = os.path.join(relDir, fileName)
        if fileName is not None:
            imagePaths.append(relFile)
           
        
random.seed(43)
random.shuffle(imagePaths)

For every image stored in our file-paths array, we will pre-process the image and keep track of its true classification.

In [21]:
for imagePath in imagePaths:
    imagePath = INPUT_DATA_FOLDER + imagePath
    image = cv2.imread(imagePath)

    if image is not None:
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        image = cv2.resize(image, (IMAGE_DIMENSIONS[1], IMAGE_DIMENSIONS[0]))
        # scale pixel values to [0, 1], matching the training data
        image = img_to_array(image) / 255.0
        image = np.expand_dims(image, axis=0)
        data.append(image)
        labels.append(imagePath)

Next is to load the model in addition to the binarized labels that were stored in our output folder.

In [22]:
model = load_model(OUTPUT_FOLDER + MODEL_FILE)    
mlb = pickle.loads(open(OUTPUT_FOLDER + LABELS_FILE, "rb").read())

For the sake of neatness we will clear all files that are currently stored in the correct and incorrect predictions folders.

In [23]:
for i in RESULTS_FOLDERS:
	for the_file in os.listdir(OUTPUT_FOLDER + i):
	    file_path = os.path.join(OUTPUT_FOLDER + i, the_file)
	    try:
	        if os.path.isfile(file_path):
	            os.unlink(file_path)
	    except Exception as e:
	        print(e)

Below we will cycle through all the images we want to classify in our chosen input folder and feed them to the model using its predict() method.

Subsequently we will overlay the prediction of the model onto the image and save it to the appropriate folder, depending on whether it was correctly classified.

In [24]:
counter = 0
for (images, lab) in zip(data, labels):
    # keep the two most probable labels; either one may be the colour
    proba = model.predict(images)[0]
    idxs = np.argsort(proba)[::-1][:2]
    predicted = {mlb.classes_[i] for i in idxs}

    image = cv2.imread(lab)
    output = imutils.resize(image, width=1000)

    # overlay both predicted labels, one per line
    for (row, label) in enumerate(sorted(predicted)):
        cv2.putText(output, label, (10, 25 + 30 * row),
            cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)

    # the parent folder is named "<colour>_<category>"; use only its base
    # name so that the rest of the path does not end up in the labels
    folder = os.path.basename(os.path.dirname(lab))
    (actual_colour, actual_category) = folder.split("_")

    # a prediction is correct only if both labels match
    if predicted == {actual_colour, actual_category}:
        results_folder = RESULTS_FOLDERS[0]
    else:
        results_folder = RESULTS_FOLDERS[1]
    cv2.imwrite(OUTPUT_FOLDER + results_folder + str(counter) + '.jpg', output)

    counter += 1

Lastly we count the number of images that were correctly identified by examining the images saved in its folder. A figure for the overall accuracy on our test set of images can also be calculated.

In [None]:
total_files = len(data)

# avoid shadowing the built-in name "list"
correct_files = os.listdir(OUTPUT_FOLDER + RESULTS_FOLDERS[0])
number_files = len(correct_files)

unseen_accuracy = number_files / total_files * 100

print('The prediction accuracy for unseen combinations: ' + str(unseen_accuracy) + '%')

<div style="background-color:#0B8261; width:100%; height:38px; color:white; font-size:18px; padding:10px;" id="CONCLUSION">
CONCLUSION
</div>

Remember that the images we used to test this model specifically contained objects whose class combinations had not been seen during training. This could explain why the model performs so poorly on our test set of images compared to the validation accuracy we saw during training. The takeaway from this exercise is a fundamental flaw in the multi-binarized-label approach to the multiple-output problem: the trained model cannot make predictions for each class independently, and therefore needs to be trained on all combinations of classes that you want it to be able to predict.
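
To get a feel for how many combinations such a model would need to see, the sketch below enumerates every colour and vehicle type pair for our 8 classes and removes a set of combinations assumed to be in the training data. The `seen` set is hypothetical and does not reflect the actual contents of our dataset.

```python
from itertools import product

COLOURS = ["black", "blue", "red", "white", "yellow"]
CATEGORIES = ["automobile", "plane", "train"]

# hypothetical subset of combinations present in a training set
seen = {("white", "plane"), ("red", "automobile"), ("blue", "train")}

all_combinations = set(product(COLOURS, CATEGORIES))
unseen = sorted(all_combinations - seen)

print(len(all_combinations))  # 15 possible colour/category pairs
print(len(unseen))            # 12 pairs missing from the hypothetical set
```

The number of pairs grows as the product of the two category sizes, so collecting images for every combination quickly becomes impractical as classes are added.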

A better approach may be to create a model with multiple branches, each with its own layer architecture: one for colour and one for vehicle type. We will test this approach in the next exercise.