# Image Augmentation and Deeper Convolution

Using two simple convolutional layers, I achieved ~55% accuracy on the test set. To improve on this, I've done two things. First, I implemented an image augmentation step, which injects noise into each image sent through the model by randomly rotating, shifting, zooming, or flipping it. This makes the model more robust and less susceptible to overfitting. Second, I added a concatenated layer that combines multiple filter sizes, as seen in architectures like GoogLeNet. The idea is that we don't know in advance which filter size is best, so combining several kernel sizes at a given layer may be useful.

In [8]:
from keras.utils.np_utils import to_categorical
from keras import Sequential, Model, Input
from keras.layers import Dense, Dropout, Conv2D, MaxPool2D, Flatten, concatenate
from keras import backend as K
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import Adam
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import ParameterGrid
import pandas as pd
import numpy as np
import pickle

In [2]:
traindata = pd.DataFrame(pickle.load(open(r'D:\Facial_Epression_Web_App\src\models\data\TrainData.p','rb')))
traindata = traindata.sample(n = len(traindata))
testdata = pd.DataFrame(pickle.load(open(r'D:\Facial_Epression_Web_App\src\models\data\TestData.p','rb')))

In [3]:
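# The first 2304 columns hold the 48x48 pixel values; column 2304 is the label.
# .values replaces .as_matrix(), which was removed in pandas 1.0.
Xtrain, ytrain = traindata.iloc[:,0:2304].values, traindata.iloc[:,2304]
Xtest, ytest = testdata.iloc[:,0:2304].values, testdata.iloc[:,2304]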

In [4]:
Xtrain, Xtest = Xtrain.reshape(len(Xtrain),48,48,1) , Xtest.reshape(len(Xtest),48,48,1)

In [5]:
ytrain = to_categorical(ytrain)
ytest = to_categorical(ytest)

The model needs to be built a little differently when concatenating layers. Instead of creating a Sequential() object and adding layers to it, each layer is assigned to a variable and called on the output of the layer before it, starting from an Input layer. The final model is then built from the input and output tensors and compiled.

In [7]:
def create_model():
    input_shape = (48, 48, 1)
    kernel_sizes = [(2, 2), (3, 3), (4, 4), (5, 5)]
    convs = []
    inp = Input(shape=input_shape)

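    # Apply one convolution per kernel size, all reading the same input
    for kernel_size in kernel_sizes:
        conv = Conv2D(16, kernel_size, padding='same',
                      activation='relu')(inp)
        convs.append(conv)

    # Join the four branches. Note: axis=1 joins them along the height axis
    # (giving 192x48x16); axis=-1 would join along channels as in GoogLeNet.
    concatenated = concatenate(convs, axis=1)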
    concatenated = MaxPool2D((2, 2), strides=(2, 2))(concatenated)
    concatenated = Conv2D(64, (3, 3), activation='relu', padding='same')(concatenated)
    concatenated = MaxPool2D((2, 2), strides=(2, 2))(concatenated)
    concatenated = Conv2D(64, (3, 3), activation='relu', padding='same')(concatenated)
    concatenated = MaxPool2D((2, 2), strides=(2, 2))(concatenated)

    flat = Flatten()(concatenated)
    d1 = Dense(1000, activation='relu')(flat)
    d1 = Dropout(.3)(d1)
    d2 = Dense(1000, activation='relu')(d1)
    d2 = Dropout(.3)(d2)
    d3 = Dense(500, activation='relu')(d2)
    d3 = Dropout(.3)(d3)
    out = Dense(7, activation='softmax')(d3)

    model = Model(inp, out)
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['categorical_accuracy'])
    return model
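To make the effect of the concatenation axis concrete, here is a small NumPy sketch of the shapes involved. It assumes the channels-last layout (the Keras default) and uses zero-filled arrays as stand-ins for the four `Conv2D` outputs; the names `branches`, `stacked_height`, and `stacked_channels` are purely illustrative.

```python
import numpy as np

# Stand-ins for the four Conv2D outputs: batch of 1, 48x48 feature maps,
# 16 filters each (channels-last layout assumed).
branches = [np.zeros((1, 48, 48, 16)) for _ in range(4)]

# axis=1 is the height axis, as used by concatenate(convs, axis=1) above
stacked_height = np.concatenate(branches, axis=1)

# axis=-1 is the channel axis, the usual choice in GoogLeNet-style modules
stacked_channels = np.concatenate(branches, axis=-1)

print(stacked_height.shape)    # (1, 192, 48, 16)
print(stacked_channels.shape)  # (1, 48, 48, 64)
```

Both variants hold the same number of values. The difference is whether later convolutions see the four branches as separate regions of a taller image or as extra channels at each pixel.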

Now I'll make the data generator. Its parameters set the ranges for the random rotations (in degrees), shifts and zooms (as fractions of the image size), plus random horizontal flips. Every batch therefore looks a little different, which should help the model generalize.

In [9]:
datagen = ImageDataGenerator(
        rotation_range=20,
        width_shift_range=0.2,
        height_shift_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True,
        fill_mode='nearest')
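As a rough illustration of what a random shift with nearest-pixel filling does to one image, here is a minimal NumPy sketch. It is not how `ImageDataGenerator` is implemented internally; `random_shift` and its `max_frac` parameter are hypothetical names chosen to mirror `width_shift_range` and `height_shift_range`.

```python
import numpy as np

def random_shift(img, max_frac=0.2, rng=None):
    """Shift a 2-D image by a random offset of up to max_frac of its size.

    Pixels exposed by the shift are filled with the nearest edge value,
    roughly mimicking fill_mode='nearest'. Hypothetical helper.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape
    max_dy, max_dx = int(h * max_frac), int(w * max_frac)
    dy = int(rng.integers(-max_dy, max_dy + 1))  # positive moves content down
    dx = int(rng.integers(-max_dx, max_dx + 1))  # positive moves content right
    # Pad with replicated edge pixels, then crop a window offset by (dy, dx)
    padded = np.pad(img, ((max_dy, max_dy), (max_dx, max_dx)), mode="edge")
    top, left = max_dy - dy, max_dx - dx
    return padded[top:top + h, left:left + w]

img = np.arange(48 * 48, dtype=float).reshape(48, 48)
shifted = random_shift(img, rng=np.random.default_rng(0))
print(shifted.shape)  # (48, 48)
```

The output keeps the original 48x48 size, so shifted images can be fed to the model without further changes.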

In [10]:
mod = create_model()

In [29]:
# The fit command is now fit_generator; datagen.flow supplies the augmented batches.
batch_size = 1000
hist = mod.fit_generator(datagen.flow(Xtrain, ytrain, batch_size=batch_size),
                         steps_per_epoch=int(np.ceil(len(Xtrain) / batch_size)),
                         epochs=10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
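
A note on `steps_per_epoch`: it is the number of batches drawn from the generator per epoch, so covering every training image once requires rounding the division up. A small sketch, where the sample counts are made-up figures for illustration and not the size of `Xtrain`:

```python
import math

def steps_per_epoch(n_samples, batch_size):
    """Batches needed to see every sample once (the last batch may be smaller)."""
    return math.ceil(n_samples / batch_size)

# Hypothetical sample counts, chosen only to show the rounding
print(steps_per_epoch(25500, 1000))  # 26
print(steps_per_epoch(25000, 1000))  # 25
```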


In [30]:
y_true = np.argmax(ytest, axis=1)  # convert one-hot labels back to class indices
preds = np.argmax(mod.predict(Xtest), axis=1)
print(classification_report(y_true=y_true, y_pred=preds))
print('\n')
print('accuracy', np.round(np.mean(preds == y_true), 2))
print('\n')
print('confusion matrix')
print(confusion_matrix(y_true=y_true, y_pred=preds))

             precision    recall  f1-score   support

          0       0.56      0.49      0.53       958
          1       0.63      0.70      0.67       111
          2       0.50      0.26      0.35      1024
          3       0.79      0.84      0.82      1774
          4       0.49      0.45      0.47      1247
          5       0.69      0.78      0.73       831
          6       0.50      0.69      0.58      1233

avg / total       0.60      0.61      0.60      7178



accuracy 0.61


confusion matrix
[[ 474   15   63   69  126   37  174]
 [  15   78    4    1    5    2    6]
 [ 121   13  271   60  237  151  171]
 [  47    1   28 1498   44   43  113]
 [ 112   11   89   98  563   26  348]
 [  21    3   51   52   19  652   33]
 [  49    2   41  111  149   35  846]]
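
The figures in the classification report can be recovered directly from the confusion matrix. In scikit-learn's convention, rows are true classes and columns are predicted classes. A short sketch using the matrix printed above (the variable names are illustrative):

```python
import numpy as np

# Confusion matrix copied from the output above (rows: true, columns: predicted)
cm = np.array([
    [474,  15,  63,   69, 126,  37, 174],
    [ 15,  78,   4,    1,   5,   2,   6],
    [121,  13, 271,   60, 237, 151, 171],
    [ 47,   1,  28, 1498,  44,  43, 113],
    [112,  11,  89,   98, 563,  26, 348],
    [ 21,   3,  51,   52,  19, 652,  33],
    [ 49,   2,  41,  111, 149,  35, 846],
])

accuracy = np.trace(cm) / cm.sum()         # correct predictions / all samples
recall = np.diag(cm) / cm.sum(axis=1)      # per true class (row totals)
precision = np.diag(cm) / cm.sum(axis=0)   # per predicted class (column totals)

print(round(float(accuracy), 2))  # 0.61
print(np.round(recall, 2))
print(np.round(precision, 2))
```

Read this way, class 2 stands out: its recall is the lowest at 0.26, and its row shows it is most often mistaken for class 4 (237 of 1024 samples).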


Here we see a further improvement: test accuracy rose from ~55% to 61%. The model also overfits significantly less: the training accuracy is very similar to the testing accuracy, which suggests augmentation helped prevent overfitting. Of course, I changed two things at once (augmentation and a deeper architecture), so I can't say which is responsible for the accuracy gains here.

This concludes the model building for this project, at least for now. I could fiddle with the architecture a bit, and possibly try ensembling several models, to gain some accuracy. But I'd like to learn about deploying machine learning models and web development, so I'm going to take this model and use it to build a web app that predicts a user's emotion from a webcam snapshot.

In [31]:
mod.save_weights(r'D:\Github\Facial_Expression_Recognition\src\static\assets\stack_mod1.h5')