# Emotion Recognition

### Introduction to CNN Keras

- **1.Introduction**
- **2.Data preparation**
   - Load data
   - Check for null and missing values
   - Normalization
   - Reshape
   - Label encoding
   - Split training and testing set
- **3.CNN**
   - Define the model
   - Set the optimizer
   - Data augmentation
- **4.Evaluate the model**
   -  Training and validation curves
- **5.Prediction and submission**
   - Predict and Submit results

## 1. Introduction:


- In this notebook, I build my first CNN for emotion recognition. I chose the Keras API (TensorFlow backend), which is very intuitive. First I prepare the data, then I focus on the CNN modeling and evaluation.

- _This Notebook follows three main parts_ :
   - The data preparation
   - The CNN modeling and evaluation
   - The results prediction and submission


> **About the dataset** :
- The data consists of 48*48 pixel grayscale images of faces. The faces have been automatically registered so that each face is more or less centered and occupies about the same amount of space in each image.
- The task is to categorize each face, based on the emotion shown in the facial expression, into one of seven categories.
- Emotion column :
     - 0 = Angry     
     - 1 = Disgust
     - 2 = Fear
     - 3 = Happy
     - 4 = Sad 
     - 5 = Surprise
     - 6 = Neutral
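
As a small illustration of the label scheme above, the mapping can be kept in a dictionary so that numeric labels and predictions are easier to read. The names `EMOTIONS` and `decode_emotion` are hypothetical helpers introduced here; they are not part of the original notebook.

```python
# Hypothetical helper: label ids exactly as listed in the emotion column above
EMOTIONS = {
    0: "Angry",
    1: "Disgust",
    2: "Fear",
    3: "Happy",
    4: "Sad",
    5: "Surprise",
    6: "Neutral",
}

def decode_emotion(label_id):
    """Return the emotion name for a label id between 0 and 6."""
    return EMOTIONS[int(label_id)]

print(decode_emotion(3))
```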

##### Import libraries

In [22]:
import os
import sys
import cv2

import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt

# Import the Keras libraries and packages (through the TensorFlow backend)
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D, BatchNormalization
from tensorflow.keras.optimizers import Adam, RMSprop, Adagrad
from tensorflow.keras.utils import to_categorical  # one-hot encoding of the labels
from tensorflow.keras.preprocessing.image import ImageDataGenerator  # data augmentation
from tensorflow.keras.regularizers import l2

## 2.Data preparation

### Load The data

In [2]:
data = pd.read_csv('fer2013.csv')

In [3]:
# Peek at the data
data.head()

| | emotion | pixels | Usage |
|---|---|---|---|
| 0 | 0 | 70 80 82 72 58 58 60 63 54 58 60 48 89 115 121... | Training |
| 1 | 0 | 151 150 147 155 148 133 111 140 170 174 182 15... | Training |
| 2 | 2 | 231 212 156 164 174 138 161 173 182 200 106 38... | Training |
| 3 | 4 | 24 32 36 30 32 23 19 20 30 41 21 22 32 34 21 1... | Training |
| 4 | 6 | 4 0 0 0 0 0 0 0 0 0 0 0 3 15 23 28 48 50 58 84... | Training |


- The target variable is emotion.
- The pixels column stores all 48*48 pixel values of an image as one space-separated string, so as a preprocessing step we have to split it into individual values.
- The Usage column lets us extract both the train and the test data from the raw data.
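
To make the splitting step concrete, here is a minimal sketch that turns one space-separated pixel string into a 48x48 array. The function name `parse_pixels` and the synthetic input are assumptions made for illustration; the notebook itself does the conversion inside its loading loop.

```python
import numpy as np

def parse_pixels(pixel_string, width=48, height=48):
    """Turn a space-separated pixel string into a (height, width) float32 array."""
    values = np.array(pixel_string.split(), dtype="float32")
    if values.size != width * height:
        raise ValueError(f"expected {width * height} pixels, got {values.size}")
    return values.reshape(height, width)

# Synthetic stand-in for one cell of the pixels column (not real FER2013 data)
fake_row = " ".join(str(i % 256) for i in range(48 * 48))
image = parse_pixels(fake_row)
print(image.shape, image.dtype)
```

The size check is a design choice of this sketch: it fails early on a malformed row instead of producing an array that cannot be reshaped later.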

In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35887 entries, 0 to 35886
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   emotion  35887 non-null  int64 
 1   pixels   35887 non-null  object
 2   Usage    35887 non-null  object
dtypes: int64(1), object(2)
memory usage: 841.2+ KB


In [5]:
# Data size
data.shape

(35887, 3)

### Check for null and missing values

In [6]:
data.isna().sum()

emotion    0
pixels     0
Usage      0
dtype: int64

- There are no missing values in the dataset, so we can safely go ahead.

In [7]:
data.Usage.value_counts()

Training       28709
PrivateTest     3589
PublicTest      3589
Name: Usage, dtype: int64

In [8]:
data.emotion.value_counts()

3    8989
6    6198
4    6077
2    5121
0    4953
5    4002
1     547
Name: emotion, dtype: int64

### Split the data into the train and test set

In [9]:
X_train, y_train, X_test, y_test = [],[],[],[]

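# Rows marked 'PrivateTest' match neither branch and are therefore left out.
for index, row in data.iterrows():
    val = row['pixels'].split(" ")  # space-separated string -> list of pixel strings
    try:
        if 'Training' in row['Usage']:
            X_train.append(np.array(val, 'float32'))
            y_train.append(row['emotion'])
        elif 'PublicTest' in row['Usage']:
            X_test.append(np.array(val, 'float32'))
            y_test.append(row['emotion'])
    except ValueError:  # a pixel value could not be parsed as a number
        print(f'Error occurred at index :{index} and row :{row}')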

In [10]:
print(f'X_train sample data \n :{X_train[0:2]}')
print('-------------------------------')
print(f'Y_train sample data \n:{y_train[0:2]}')
print('-------------------------------')
print(f'X_test sample data \n:{X_test[0:2]}')
print('-------------------------------')
print(f'y_test sample data \n:{y_test[0:2]}')

X_train sample data 
 :[array([ 70.,  80.,  82., ..., 106., 109.,  82.], dtype=float32), array([151., 150., 147., ..., 193., 183., 184.], dtype=float32)]
-------------------------------
Y_train sample data 
:[0, 0]
-------------------------------
X_test sample data 
:[array([254., 254., 254., ...,  42., 129., 180.], dtype=float32), array([156., 184., 198., ..., 172., 167., 161.], dtype=float32)]
-------------------------------
y_test sample data 
:[0, 1]


In [11]:
# Convert the data into arrays
X_train = np.array(X_train, 'float32')
y_train = np.array(y_train, 'float32')
X_test  = np.array(X_test,'float32')
y_test  = np.array(y_test,'float32')

### Normalization

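- We normalize the grayscale values to reduce the effect of illumination differences.
- Moreover, a CNN converges faster on inputs with a small, consistent scale (for example [0..1], or zero mean and unit variance) than on raw [0..255] values. The cell below standardizes each pixel position with StandardScaler.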
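
The two common options, scaling to [0, 1] and standardization, can be compared on synthetic data. This is a sketch with made-up pixel values; the manual formula in option B is meant to mirror what a standard scaler computes, under the assumption of a population standard deviation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for flattened images: 100 samples, 48*48 pixels in [0, 255]
X = rng.integers(0, 256, size=(100, 48 * 48)).astype("float32")

# Option A: scale every pixel to the [0, 1] range
X_scaled = X / 255.0

# Option B: standardize each pixel position to zero mean and unit variance
mean = X.mean(axis=0)
std = X.std(axis=0)
std[std == 0] = 1.0  # guard against constant pixels
X_standardized = (X - mean) / std

print(X_scaled.min(), X_scaled.max())
print(X_standardized.mean(), X_standardized.std())
```

In both cases the statistics should come from the training set only and then be reused on the test set.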

In [13]:
# Alternative 1 (not used): manual standardization
#def NormalizeData(X):
#    X = (X - np.mean(X, axis=0)) / np.std(X, axis=0)
#    return X
#X_train = NormalizeData(X_train)
#X_test  = NormalizeData(X_test)

# Alternative 2 (not used): normalization of the datasets by Keras
#X_train = tf.keras.utils.normalize(X_train, axis=1)
#X_test = tf.keras.utils.normalize(X_test, axis=1)

# Feature scaling: standardize each pixel to zero mean and unit variance.
# The scaler is fitted on the training set only and reused for the test set.
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

### Reshape

In [14]:
# Reshape the pixels into (n_samples, 48, 48, 1)

width, height = 48, 48

X_train = X_train.reshape(X_train.shape[0], width, height, 1)
X_test = X_test.reshape(X_test.shape[0], width, height, 1)

- The train and test images (48px x 48px) were stored in the pandas DataFrame as 1D vectors. We reshape all the data into 48x48x1 3D matrices.
- Keras requires an extra dimension at the end, which corresponds to the channels. The emotion recognition images are grayscale, so they use only one channel. For RGB images there are 3 channels, and we would have reshaped the vectors into 48x48x3 3D matrices.
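
The reshape can be checked on a small synthetic array. This sketch assumes NumPy's default row-major layout, which is what `reshape` uses; the array contents are made up.

```python
import numpy as np

n_samples, width, height = 3, 48, 48
# Synthetic flat vectors standing in for the rows of X_train
flat = np.arange(n_samples * width * height, dtype="float32").reshape(n_samples, -1)

# Grayscale: a single channel as the last dimension
gray = flat.reshape(n_samples, width, height, 1)
print(gray.shape)

# Row-major layout: value k of a flat vector lands at (k // 48, k % 48)
k = 100
assert gray[0, k // 48, k % 48, 0] == flat[0, k]

# An RGB image of the same size needs three times as many values per sample
rgb = np.zeros((n_samples, width * height * 3), dtype="float32")
rgb = rgb.reshape(n_samples, width, height, 3)
print(rgb.shape)
```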

### Label encoding


In [15]:
y_train = to_categorical(y_train, num_classes = 7)
y_test  = to_categorical(y_test, num_classes = 7)

- We have seven emotions, labeled from 0 to 6. We need to encode these labels as one-hot vectors (e.g. 2 -> [0,0,1,0,0,0,0]) in order to compare them with the output (predicted value), which represents the probability of each label.
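
One-hot encoding can be reproduced in a few lines of NumPy. The helper `one_hot` below is a hypothetical stand-in written for illustration; it is similar in spirit to `to_categorical` but is not the Keras implementation.

```python
import numpy as np

def one_hot(labels, num_classes=7):
    """Encode integer labels as one-hot rows by indexing an identity matrix."""
    labels = np.asarray(labels, dtype="int64")
    return np.eye(num_classes, dtype="float32")[labels]

encoded = one_hot([2, 0, 6])
print(encoded[0])               # label 2 -> a single 1 at position 2
print(encoded.argmax(axis=1))   # argmax recovers the integer labels
```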


In [16]:
#X_test.shape
#X_train.shape
y_train.shape

(28709, 7)

## 3. Building the CNN

- I used the Keras Sequential API, where you just add one layer at a time, starting from the input.

- The first is the convolutional (Conv2D) layer. It is like a set of learnable filters. I chose 60 filters for the first Conv2D layer and 32 filters for the second one, and repeated this pattern in the following blocks. Each filter transforms a part of the image (defined by the kernel size) using the kernel filter. The kernel filter matrix is applied across the whole image. Filters can be seen as a transformation of the image.

- The CNN can isolate features that are useful everywhere from these transformed images (feature maps).

- The pooling (MaxPooling2D) layer is also important. This layer simply acts as a downsampling filter: it looks at each block of neighboring pixels (2x2 here) and keeps the maximal value. Pooling reduces computational cost and, to some extent, overfitting. We have to choose the pooling size (i.e. the area pooled each time): the larger the pooling size, the stronger the downsampling.

- By combining convolutional and pooling layers, CNNs are able to combine local features and learn more global features of the image.

- Dropout is a regularization method in which a proportion of the nodes in a layer are randomly ignored (their outputs are set to zero) at each training step. This randomly drops a proportion of the network and forces it to learn features in a distributed way. The technique improves generalization and reduces overfitting.

- 'relu' is the rectifier activation function, max(0,x). It is used to add non-linearity to the network.

- The Flatten layer is used to convert the final feature maps into a single 1D vector. This flattening step is needed so that fully connected layers can be used after the convolutional/max-pooling layers. It combines all the local features found by the previous convolutional layers.
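
The rectifier, 2x2 max pooling and flattening can be sketched in plain NumPy on a tiny feature map. This only illustrates the arithmetic under simplifying assumptions (a single 4x4 map, non-overlapping windows); it is not how Keras implements these layers.

```python
import numpy as np

def relu(x):
    """Rectifier: max(0, x) applied element-wise."""
    return np.maximum(0, x)

def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max pooling on a (H, W) map with even H and W."""
    h, w = feature_map.shape
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

fm = np.array([[ 1, -2,  3,  0],
               [ 4,  5, -6,  7],
               [-1,  0,  2,  2],
               [ 3, -3,  1, -4]], dtype="float32")

activated = relu(fm)              # negative values become 0
pooled = max_pool_2x2(activated)  # 4x4 -> 2x2, keeping the max of each block
flat = pooled.flatten()           # 2x2 -> vector of length 4
print(pooled)   # [[5. 7.] [3. 2.]]
print(flat)     # [5. 7. 3. 2.]
```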

### Define the model

In [17]:
# Initialising the CNN
model = Sequential()

In [18]:
model.add(Conv2D(60, kernel_size=(5,5),padding='same', activation='relu', input_shape=(width, height,1)))

model.add(Conv2D(32, kernel_size=(3,3),padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2)))
model.add(Dropout(0.5))

model.add(Conv2D(60, kernel_size=(5,5),padding='same', activation='relu'))

model.add(Conv2D(32, kernel_size=(3,3),padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2)))
model.add(Dropout(0.4))

model.add(Conv2D(60, kernel_size=(5,5),padding='same',activation='relu'))

model.add(Conv2D(32, kernel_size=(3,3),padding='same',activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2)))
model.add(Dropout(0.3))

model.add(Flatten())

#1st Hidden Layer
model.add(Dense(150, activation='relu'))
model.add(Dropout(0.3))

#2nd Hidden Layer
model.add(Dense(100, activation='relu'))
model.add(Dropout(0.3))

#3rd Hidden Layer
model.add(Dense(50, activation='relu'))
model.add(Dropout(0.3))

#Output
model.add(Dense(7, activation='softmax'))

- Once our layers are added to the model, we need to set up a score function, a loss function and an optimization algorithm.
- We define the loss function to measure how poorly our model performs on images with known labels. It measures the error between the observed labels and the predicted ones. We use a specific form for multi-class classification (>2 classes) called "categorical_crossentropy".
- The most important function is the optimizer. It iteratively improves the parameters (filter kernel values, weights and biases of neurons, ...) in order to minimize the loss.
- I chose Adam, a very effective optimizer. It combines per-parameter adaptive learning rates, as in RMSprop (which itself adjusts Adagrad to counter its aggressive, monotonically decreasing learning rate), with momentum. We could also have used the Stochastic Gradient Descent ('sgd') optimizer, but it usually converges more slowly.
- The metric function "accuracy" is used to evaluate the performance of our model. A metric function is similar to a loss function, except that its results are not used when training the model (only for evaluation).
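
To show what the loss and the metric measure, here is a NumPy sketch of categorical cross-entropy and accuracy on two made-up predictions. The function names and the clipping constant `eps` are assumptions of this sketch, not the Keras internals.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

def accuracy(y_true, y_pred):
    """Fraction of samples whose most probable class matches the target."""
    return float(np.mean(y_true.argmax(axis=1) == y_pred.argmax(axis=1)))

# Two made-up samples with 7 classes (values chosen for illustration only)
y_true = np.array([[0, 0, 0, 1, 0, 0, 0],
                   [1, 0, 0, 0, 0, 0, 0]], dtype="float32")
y_pred = np.array([[0.05, 0.05, 0.05, 0.70, 0.05, 0.05, 0.05],
                   [0.20, 0.10, 0.10, 0.10, 0.40, 0.05, 0.05]], dtype="float32")

print(categorical_crossentropy(y_true, y_pred))  # about 0.983
print(accuracy(y_true, y_pred))                  # 0.5
```

The first sample is classified correctly with high confidence and contributes little to the loss; the second is misclassified and contributes most of it, while accuracy only records that one of the two predictions is right.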

In [19]:
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

In [20]:
print(model.summary())

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 48, 48, 60)        1560      
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 48, 48, 32)        17312     
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 24, 24, 32)        0         
_________________________________________________________________
dropout (Dropout)            (None, 24, 24, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 24, 24, 60)        48060     
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 24, 24, 32)        17312     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 12, 12, 32)        0
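
The parameter counts in the summary above can be checked by hand: a Conv2D layer has one weight per kernel cell and input channel, plus one bias, for each filter. The helper `conv2d_params` is a hypothetical function written for this check.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter for a standard 2D convolution."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

print(conv2d_params(5, 5, 1, 60))    # conv2d   -> 1560
print(conv2d_params(3, 3, 60, 32))   # conv2d_1 -> 17312
print(conv2d_params(5, 5, 32, 60))   # conv2d_2 -> 48060
print(conv2d_params(3, 3, 60, 32))   # conv2d_3 -> 17312
```

With padding='same' the convolutions keep the spatial size, and each 2x2 pooling layer halves it (48 -> 24 -> 12), which matches the output shapes listed.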

### Data Augmentation

- In order to avoid overfitting, we can artificially expand our facial-expression dataset. The idea is to alter the training data with small transformations that reproduce the variations occurring when a face is photographed.
- For example, the face is not centered, the scale is not the same (the face appears larger or smaller), or the head is slightly rotated.
- Approaches that alter the training data in ways that change the array representation while keeping the label the same are known as data augmentation techniques. Some popular augmentations are grayscaling, horizontal flips, vertical flips, random crops, color jitter, translations, rotations, and many more.
- By applying just a couple of these transformations to our training data, we can easily double or triple the number of training examples, which generally makes the model more robust.
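
The idea of changing the array while keeping the label can be sketched with a hand-written random shift in NumPy. This is a simplified illustration: `random_shift` and `augment_batch` are hypothetical helpers, the images are random noise, and the empty border is padded with zeros, which is an assumption of this sketch rather than the behavior of the Keras generator used in the next cell.

```python
import numpy as np

def random_shift(image, max_fraction=0.1, rng=None):
    """Shift a (H, W, 1) image by up to max_fraction of its size, zero-padded."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = image.shape[:2]
    max_dy, max_dx = int(h * max_fraction), int(w * max_fraction)
    dy = int(rng.integers(-max_dy, max_dy + 1))
    dx = int(rng.integers(-max_dx, max_dx + 1))
    shifted = np.zeros_like(image)
    # Part of the source image that stays visible after the shift
    src = image[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    shifted[max(0, dy):max(0, dy) + src.shape[0],
            max(0, dx):max(0, dx) + src.shape[1]] = src
    return shifted

def augment_batch(images, labels, rng):
    """Return shifted copies of the images; the labels pass through unchanged."""
    return np.stack([random_shift(img, rng=rng) for img in images]), labels

rng = np.random.default_rng(42)
images = rng.random((4, 48, 48, 1)).astype("float32")  # synthetic images
labels = np.array([3, 0, 6, 4])
aug_images, aug_labels = augment_batch(images, labels, rng)
print(aug_images.shape, aug_labels)
```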


In [23]:
# Data augmentation to reduce overfitting

datagen = ImageDataGenerator(
        featurewise_center=False,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=10,  # randomly rotate images in the range (degrees, 0 to 180)
        zoom_range = 0.1, # Randomly zoom image 
        width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=False,  # randomly flip images
        vertical_flip=False)  # randomly flip images


datagen.fit(X_train)

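- For the data augmentation, I chose to:

   - Randomly rotate some training images by up to 10 degrees
   - Randomly zoom some training images by up to 10%
   - Randomly shift images horizontally by up to 10% of the width
   - Randomly shift images vertically by up to 10% of the height

- I did not apply vertical_flip or horizontal_flip. A vertical flip would produce upside-down faces, which do not occur in this dataset. A horizontal flip keeps the expression recognizable, so it could be enabled as an extra augmentation.
- Once our model is ready, we fit it on the training dataset. Note that the training cell below passes X_train and y_train directly to model.fit, so the generator defined above is not used in this run; to train on augmented images, the output of datagen.flow(X_train, y_train, batch_size=32) would have to be passed to model.fit instead.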

### Train the model

In [24]:
%%time
model.fit(X_train, y_train, 
          batch_size=32,
          epochs=30, 
          shuffle=True,
          verbose=1)

Epoch 1/30
...
Epoch 30/30
CPU times: user 7h 32min 20s, sys: 56min 50s, total: 8h 29min 10s
Wall time: 10h 19min 12s


<tensorflow.python.keras.callbacks.History at 0x7fd840420f10>

In [26]:
model_loss, model_accuracy = model.evaluate(X_test, y_test)

print(f'model Loss : {model_loss}')
print(f'model Accuracy : {model_accuracy}')

model Loss : 1.2274876832962036
model Accuracy : 0.5335748195648193


### Save the model

In [27]:
fer_json = model.to_json()
with open('fer.json','w') as json_file:
    json_file.write(fer_json)
model.save_weights('fer.h5')

In [1]:
import os 
os.getcwd()

'/Users/mac/Downloads/Real-Time-Emotion-Detection-main'