# Image Classification with **_Deep Learning_**

To build a model that classifies clothes as male or female, we will use the same dataset of [male](https://course-resources.minerva.kgi.edu/uploaded_files/mke/nA93zn/male-clothing.zip) and [female](https://course-resources.minerva.kgi.edu/uploaded_files/mke/VL14ar/female-clothing.zip) clothing, this time with support vector machines (SVMs) and a pre-trained deep neural network.

## Preparing the data

Since VGG16 (the pre-trained model) was trained on 224x224 images, we resize our images to that resolution. We could compress them further and apply some form of dimensionality reduction, such as PCA, for the SVMs as well, but it is a more interesting task to use as much data as we have and see the maximum performance these models can achieve.
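
For reference, the PCA option mentioned above could look roughly like the sketch below. It is not used in the rest of this notebook. The random stand-in data, the array sizes, and `n_components=20` are all illustrative assumptions rather than tuned values.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Hypothetical stand-in for flattened images: 60 samples with 500 features each.
x_demo = rng.normal(size=(60, 500))
y_demo = np.arange(60) % 2  # alternating 0/1 labels so both classes are present

# Reduce to 20 principal components (arbitrary choice), then fit an RBF SVM.
pca_svm = make_pipeline(PCA(n_components=20), SVC(kernel='rbf', C=1.0))
pca_svm.fit(x_demo, y_demo)

# Shape of the data the SVM actually sees after the PCA step.
reduced = pca_svm.named_steps['pca'].transform(x_demo)
print(reduced.shape)  # (60, 20)
```

Wrapping both steps in one pipeline means the PCA projection is learned only from the data passed to `fit`, which matters when the pipeline is later cross-validated.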

The metric we use to measure the models' performance is accuracy. The dataset is well balanced (on an unbalanced dataset accuracy can be misleading), accuracy treats true negatives the same as true positives (unlike the F1 score) and false positives the same as false negatives (unlike precision or recall), and we have no reason to prefer performance on male clothes over female clothes or vice versa. Accuracy is also straightforward to interpret, keeping in mind that on an equally split dataset a random classifier would achieve ~0.5.
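
The symmetry argument can be checked with a small hand-made example. The labels below are invented for illustration: swapping which class counts as "positive" leaves accuracy unchanged but changes the F1 score.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def f1(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Made-up labels: 4 examples of each class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]

# Swap the roles of the two classes.
y_true_swapped = [1 - t for t in y_true]
y_pred_swapped = [1 - p for p in y_pred]

print(accuracy(y_true, y_pred), accuracy(y_true_swapped, y_pred_swapped))  # 0.625 0.625
print(round(f1(y_true, y_pred), 3), round(f1(y_true_swapped, y_pred_swapped), 3))  # 0.667 0.571
```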

In [1]:
import numpy as np
from keras.applications import vgg16
from keras.layers import Dense, Flatten, Dropout
from keras.models import Model
from keras_preprocessing.image import ImageDataGenerator
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from tensorflow.python.keras.wrappers.scikit_learn import KerasClassifier

In [2]:
# Directory containing the Man's Clothing and Woman's Clothing subfolders
fashion_dir = r"C:\Users\breedoon\Downloads\Fashion Dataset"
target_im_size = (224, 224)  # default vgg16 input size

In [3]:
img_gen = ImageDataGenerator(rescale=1 / 255).flow_from_directory(fashion_dir,
                                                                  target_size=target_im_size,
                                                                  class_mode='binary',
                                                                  batch_size=1)
x, y = tuple(zip(*[img_gen[i] for i in range(len(img_gen))]))  # cannot do just *img_gen because will run infinitely
x, y = np.array(x)[:, 0, ...], np.array(y)[:, 0]

x_flat = x.reshape(x.shape[0], -1)

Found 2512 images belonging to 2 classes.
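
To make the flattening step concrete, here is a small sketch on a hypothetical stand-in batch of four blank images (not the real dataset). Each 224x224 RGB image becomes a single row of 150,528 features, which is the input the SVMs receive.

```python
import numpy as np

# Hypothetical stand-in batch: 4 RGB images at the 224x224 resolution used above.
x_demo = np.zeros((4, 224, 224, 3), dtype=np.float32)

# Keep the sample axis and collapse height, width, and channels into one axis.
x_demo_flat = x_demo.reshape(x_demo.shape[0], -1)
print(x_demo_flat.shape)  # (4, 150528)
```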


In [5]:
x_train, x_test, x_flat_train, x_flat_test, y_train, y_test = train_test_split(x, x_flat, y, train_size=0.8)

## SVMs

We will train an SVM with three different kernels: linear, polynomial, and RBF. For each of them we cross-validate to find the best value of C (which controls how strongly margin violations are penalized) and then calculate the test accuracy.

In [7]:
kernels = [
    dict(kernel='linear'),
    dict(kernel='poly', degree=2),
    dict(kernel='rbf'),
]
for kernel_params in kernels:
    grid_params = dict(C=[0.001, 0.1, 1, 10, 1000], **({k: [v] for k, v in kernel_params.items()}))

    grid_search = GridSearchCV(SVC(), param_grid=grid_params)
    grid_search.fit(x_flat_train, y_train)
    best_params = grid_search.best_params_

    # best_params already contains the cross-validated C, so passing C again would raise a TypeError
    model_svm = SVC(**best_params)
    model_svm.fit(x_flat_train, y_train)

    y_pred = model_svm.predict(x_flat_test)
    print('Kernel:', kernel_params['kernel'])
    print('Test Accuracy:', accuracy_score(y_test, y_pred), '\n')

Kernel: linear
Test Accuracy: 0.6284095427435387 

Kernel: poly
Test Accuracy: 0.6821471172962226 

Kernel: rbf
Test Accuracy: 0.7218489065606361 
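
As a side note, with its default `refit=True`, `GridSearchCV` already retrains the best configuration on the whole training set and exposes it as `best_estimator_`, so a separate refit is optional. The sketch below shows this on synthetic data from `make_classification`. The dataset, its size, and `cv=3` are assumptions made for the example, not the settings used above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data (not the clothing images).
x_demo, y_demo = make_classification(n_samples=200, n_features=20, random_state=0)
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, train_size=0.8, stratify=y_demo, random_state=0)

c_grid = [0.001, 0.1, 1, 10, 1000]
search = GridSearchCV(SVC(kernel='rbf'), param_grid={'C': c_grid}, cv=3)
search.fit(x_tr, y_tr)

# best_estimator_ is the SVC refit on all of x_tr with the winning C.
best_svm = search.best_estimator_
test_acc = best_svm.score(x_te, y_te)
print('Best C:', search.best_params_['C'])
print('Test accuracy:', test_acc)
```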



## Deep Learning Network

First, we load the VGG16 model without its top layers and perform transfer learning by adding a single trainable layer with a sigmoid activation function (since we're doing binary classification), which serves as the output of the model. Additionally, we need a flatten layer to reshape VGG16's final 7x7x512 feature map (for a 224x224x3 input) into a vector of 25,088 values. Then, to prevent overfitting, we put a dropout layer between the flatten and sigmoid layers and try varying dropout rates. Dropout acts as a regularization technique by randomly dropping some of the 25,088 inputs to the output layer during training, so that the model does not learn to rely on only a small portion of them. We use binary cross-entropy as our loss function because we're doing binary classification and there doesn't seem to be a better-suited loss function for that purpose.
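
The size of the flattened vector follows from VGG16's architecture: its convolutional base has five 2x2 max-pooling stages, each halving the spatial resolution, and the last block has 512 channels. A quick arithmetic check (variable names are just for illustration):

```python
input_side = 224
num_pool_stages = 5   # VGG16 has five 2x2 max-pooling stages
final_channels = 512  # channels in the last convolutional block

# Each pooling stage halves the height and width.
feature_side = input_side // (2 ** num_pool_stages)
flat_size = feature_side * feature_side * final_channels

# Dense(1) on top: one weight per flattened feature plus one bias.
trainable_params = flat_size + 1

print(feature_side, flat_size, trainable_params)  # 7 25088 25089
```

So with the VGG16 layers frozen, only about 25 thousand parameters are actually trained.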

In [8]:
model_raw = vgg16.VGG16(weights='imagenet', include_top=False, input_shape=(*target_im_size, 3))


In [9]:
def get_dnn_model(dropout_rate=0.0):
    flatten_layer = Flatten()
    dropout_layer = Dropout(rate=dropout_rate)
    output_layer = Dense(1, activation='sigmoid', name='clothes_output')

    raw_input = model_raw.input
    new_output = output_layer(dropout_layer(flatten_layer(model_raw.output)))

    model = Model(raw_input, new_output)

    # Freeze native VGG layers to not bother retraining them
    for layer in model.layers[:len(model_raw.layers)]:
        layer.trainable = False

    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

    return model
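
To illustrate what the dropout layer does during training, here is a minimal NumPy sketch of inverted dropout, the scheme Keras uses: each input is zeroed with probability `rate`, and the survivors are scaled by `1 / (1 - rate)` so the expected value is unchanged. The function name `dropout_forward` is hypothetical and not part of Keras.

```python
import numpy as np


def dropout_forward(features, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of inputs and rescale the rest."""
    if not training or rate == 0.0:
        return features  # dropout is a no-op at inference time
    keep_mask = rng.random(features.shape) >= rate
    return features * keep_mask / (1.0 - rate)


rng = np.random.default_rng(0)
features = np.ones(10_000)
dropped = dropout_forward(features, rate=0.5, rng=rng)

print((dropped == 0).mean())  # fraction zeroed, close to 0.5
print(dropped.mean())         # mean stays close to 1.0 thanks to rescaling
```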


Then, we perform cross-validation to find the optimal dropout rate and number of training epochs.

In [10]:
grid_params = dict(dropout_rate=[0.1, 0.3, 0.5, 0.7, 0.9], epochs=[5, 10, 20])

model = KerasClassifier(build_fn=get_dnn_model)
grid_search = GridSearchCV(model, param_grid=grid_params, cv=3)
grid_search.fit(x_train, y_train)
best_params = grid_search.best_params_
print('Best params:', best_params)

Best params: {'dropout_rate': 0.5, 'epochs': 20}
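
To get a feel for how much work this grid search involves, the sketch below counts the model fits and training epochs implied by the grid and `cv=3` above. It assumes `GridSearchCV`'s default `refit=True`, which adds one final fit on the full training set.

```python
from itertools import product

grid = dict(dropout_rate=[0.1, 0.3, 0.5, 0.7, 0.9], epochs=[5, 10, 20])
cv_folds = 3

# Every combination of dropout rate and epoch count.
combos = [dict(zip(grid, values)) for values in product(*grid.values())]

num_fits = len(combos) * cv_folds + 1  # +1 for the final refit on the full training set
total_epochs = sum(c['epochs'] for c in combos) * cv_folds  # epochs run during cross-validation only

print(len(combos), num_fits, total_epochs)  # 15 46 525
```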


In [11]:
model = get_dnn_model(dropout_rate=best_params['dropout_rate'])
model.fit(x_train, y_train, batch_size=1, epochs=best_params['epochs'], validation_split=0.1)

Epoch 1/10

In [12]:
print('Train Accuracy:', accuracy_score(y_train, model.predict(x_train).round()))
print('Test Accuracy:', accuracy_score(y_test, model.predict(x_test).round()))

Train Accuracy: 0.9925335988053758
Test Accuracy: 0.8500497017892644
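
A small note on turning sigmoid outputs into class labels: `.round()` on a NumPy array rounds halves to the nearest even number, so an output of exactly 0.5 becomes 0. This rarely matters in practice, but an explicit threshold makes the decision rule unambiguous. The probabilities below are made up for illustration.

```python
import numpy as np

# Made-up sigmoid outputs with shape (n, 1), like the output of a Dense(1) layer.
probs = np.array([[0.2], [0.5], [0.8]])

rounded = probs.round().ravel().astype(int)       # 0.5 rounds to 0 (round half to even)
thresholded = (probs.ravel() >= 0.5).astype(int)  # 0.5 counts as class 1

print(rounded)      # [0 0 1]
print(thresholded)  # [0 1 1]
```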


## Summary

So, using an unreduced dataset with extensive cross-validation, the neural network achieved an astounding 85% accuracy on the test data (and 99% on the training data), while the best SVM lagged behind at only 72%, which, compared to linear regression's maximum of 68%, does not seem like a huge improvement. This comes at the cost of roughly equal grid-search time for both models (~4 hours). The time could have been brought down by reducing the dimensionality of the data, but since the goal was to reach the maximum achievable performance, this might have lowered the performance of both models.

Given that, the DNN seems to be the superior technique for classifying images, which shouldn't come as a surprise given that VGG16 was designed and pre-trained for that purpose and contains far more (sometimes deliberately placed) parameters than a more general-purpose SVM trained on the spot. On the other hand, linear regression took considerably less time to train and achieved accuracy comparable to the SVM's (68% vs 72%), which might be a consequence of the data (a set of images) not being well suited to either of the two techniques, with only a properly configured DNN managing to handle it well.