# Facial Expression Recognition

### Main Objective:
For humans, reading facial expressions is an easy task. The same cannot be said for computers. Machine learning has grown to the point where models are more accurate than humans on a number of tasks in computer vision, NLP and other areas, yet interpreting facial expressions remains difficult for them. In this project, we will build a model that classifies facial expressions.


### About the Dataset:
We will be using the well-known FER2013 dataset (link for the dataset: https://www.kaggle.com/jonathanoheix/face-expression-recognition-dataset). The data consists of about 35k 48x48 pixel images of faces. The faces have been automatically registered so that each face is more or less centered and occupies about the same amount of space in each image. The task is to categorize each face, based on the emotion shown in the facial expression, into one of seven categories (Angry, Disgust, Fear, Happy, Sad, Surprise, Neutral).

The dataset is split into a training set of about 29k images and a test set of about 7k images.

### Preprocessing

### Imports

In [1]:
# For interacting with operating system
import os

# For interacting with files
import glob

# Data manipulation & Visualization
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from tqdm import tqdm

# For saving data
import pickle

### Read in Data

The dataset provides separate train and validation folders, each containing 7 sub-folders, one per class. We will use these folders to build a randomly shuffled dataframe for each set, treating the validation folder as our test set.

In [2]:
# File paths
base_dir = os.path.join('../input/face-expression-recognition-dataset/images/images/')
train_dir = os.path.join(base_dir, 'train')
test_dir = os.path.join(base_dir, 'validation')

# List dirs under train folder
print(os.listdir(train_dir))

['fear', 'sad', 'disgust', 'happy', 'angry', 'neutral', 'surprise']


In [3]:
# Distribution of expressions in the train dataset
for exp in os.listdir(train_dir):
    train_files = glob.glob(os.path.join(train_dir, exp) + '/*.jpg')    
    print('Train images with expression %s: '%(exp), len(train_files))
    
print('\nTotal number of images in Train Set: ', len(glob.glob(train_dir + '/*/*.jpg')))

Train images with expression fear:  4103
Train images with expression sad:  4938
Train images with expression disgust:  436
Train images with expression happy:  7164
Train images with expression angry:  3993
Train images with expression neutral:  4982
Train images with expression surprise:  3205

Total number of images in Train Set:  28821


In [4]:
# Distribution of expressions in the test dataset
for exp in os.listdir(test_dir):
    test_files = glob.glob(os.path.join(test_dir, exp) + '/*.jpg')
    print('Test images with expression %s: '%(exp), len(test_files))

print('\nTotal number of images in Test Set: ', len(glob.glob(test_dir + '/*/*.jpg')))

Test images with expression fear:  1018
Test images with expression sad:  1139
Test images with expression disgust:  111
Test images with expression happy:  1825
Test images with expression angry:  960
Test images with expression neutral:  1216
Test images with expression surprise:  797

Total number of images in Test Set:  7066
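
The two counting cells above differ only in the folder they scan. As a minimal sketch, assuming the same folder-per-class layout, the counting could be wrapped in one helper. The function name count_images_per_class and its ext parameter are our own, not part of the dataset or any library.

```python
import os
from collections import Counter


def count_images_per_class(root_dir, ext='.jpg'):
    """Count files ending in `ext` inside each class sub-folder of root_dir."""
    counts = Counter()
    for label in sorted(os.listdir(root_dir)):
        class_dir = os.path.join(root_dir, label)
        # Skip stray files that sit next to the class folders.
        if os.path.isdir(class_dir):
            counts[label] = sum(
                name.endswith(ext) for name in os.listdir(class_dir)
            )
    return counts
```

With the paths defined earlier, count_images_per_class(train_dir) and count_images_per_class(test_dir) should give the same per-class numbers as the printed output, and sum(counts.values()) gives the total.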


In [5]:
# Create train dataframe using train images 
train_angry = glob.glob(os.path.join(train_dir, 'angry') + '/*.jpg')
train_disgust = glob.glob(os.path.join(train_dir, 'disgust') + '/*.jpg')
train_fear = glob.glob(os.path.join(train_dir, 'fear') + '/*.jpg')
train_happy = glob.glob(os.path.join(train_dir, 'happy') + '/*.jpg')
train_neutral = glob.glob(os.path.join(train_dir, 'neutral') + '/*.jpg')
train_sad = glob.glob(os.path.join(train_dir, 'sad') + '/*.jpg')
train_surprise = glob.glob(os.path.join(train_dir, 'surprise') + '/*.jpg')

np.random.seed(42)
train = pd.DataFrame({
    'filename': train_angry+train_disgust+train_fear+train_happy+train_neutral+train_sad+train_surprise,
    'label': ['angry'] * len(train_angry) + ['disgust'] * len(train_disgust) + ['fear'] * len(train_fear) + 
        ['happy'] * len(train_happy) + ['neutral'] * len(train_neutral) + ['sad'] * len(train_sad) + ['surprise']*len(train_surprise)
}).sample(frac=1, random_state=42).reset_index(drop=True)

train

Unnamed: 0,filename,label
0,../input/face-expression-recognition-dataset/i...,sad
1,../input/face-expression-recognition-dataset/i...,sad
2,../input/face-expression-recognition-dataset/i...,happy
3,../input/face-expression-recognition-dataset/i...,fear
4,../input/face-expression-recognition-dataset/i...,neutral
...,...,...
28816,../input/face-expression-recognition-dataset/i...,sad
28817,../input/face-expression-recognition-dataset/i...,fear
28818,../input/face-expression-recognition-dataset/i...,angry
28819,../input/face-expression-recognition-dataset/i...,neutral


In [6]:
# Create test dataframe using test images
test_angry = glob.glob(os.path.join(test_dir, 'angry') + '/*.jpg')
test_disgust = glob.glob(os.path.join(test_dir, 'disgust') + '/*.jpg')
test_fear = glob.glob(os.path.join(test_dir, 'fear') + '/*.jpg')
test_happy = glob.glob(os.path.join(test_dir, 'happy') + '/*.jpg')
test_neutral = glob.glob(os.path.join(test_dir, 'neutral') + '/*.jpg')
test_sad = glob.glob(os.path.join(test_dir, 'sad') + '/*.jpg')
test_surprise = glob.glob(os.path.join(test_dir, 'surprise') + '/*.jpg')

np.random.seed(42)
test = pd.DataFrame({
    'filename': test_angry+test_disgust+test_fear+test_happy+test_neutral+test_sad+test_surprise,
    'label': ['angry'] * len(test_angry) + ['disgust'] * len(test_disgust) + ['fear'] * len(test_fear) + 
        ['happy'] * len(test_happy) + ['neutral'] * len(test_neutral) + ['sad'] * len(test_sad) + ['surprise']*len(test_surprise)
}).sample(frac=1, random_state=42).reset_index(drop=True)

test

Unnamed: 0,filename,label
0,../input/face-expression-recognition-dataset/i...,neutral
1,../input/face-expression-recognition-dataset/i...,fear
2,../input/face-expression-recognition-dataset/i...,neutral
3,../input/face-expression-recognition-dataset/i...,happy
4,../input/face-expression-recognition-dataset/i...,angry
...,...,...
7061,../input/face-expression-recognition-dataset/i...,happy
7062,../input/face-expression-recognition-dataset/i...,sad
7063,../input/face-expression-recognition-dataset/i...,sad
7064,../input/face-expression-recognition-dataset/i...,sad
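
The train and test dataframes above are built with the same per-class pattern written out twice. A more compact sketch, assuming one sub-folder per class, is shown here. The helper name build_dataframe and its parameters are hypothetical. The file list is sorted before shuffling because glob.glob returns paths in arbitrary order, so a fixed random_state alone does not guarantee the same row order on every machine.

```python
import glob
import os

import pandas as pd


def build_dataframe(root_dir, pattern='*.jpg', seed=42):
    """Return a shuffled (filename, label) DataFrame for a folder-per-class layout."""
    rows = []
    for label in sorted(os.listdir(root_dir)):
        class_dir = os.path.join(root_dir, label)
        if not os.path.isdir(class_dir):
            continue
        # Sort so the pre-shuffle order does not depend on the file system.
        for path in sorted(glob.glob(os.path.join(class_dir, pattern))):
            rows.append((path, label))
    df = pd.DataFrame(rows, columns=['filename', 'label'])
    # Shuffle all rows with a fixed seed, then renumber the index from 0.
    return df.sample(frac=1, random_state=seed).reset_index(drop=True)
```

Under these assumptions, train = build_dataframe(train_dir) and test = build_dataframe(test_dir) would replace the two cells, although the exact row order will differ from the output shown because of the added sorting.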


In [7]:
# Save the values from the test dataframe into new variables
X_test = test['filename'].values
y_test = test['label'].values

# Save the values from the train dataframe into new variables
X_train = train['filename'].values
y_train = train['label'].values

In [8]:
# Take a look at the distribution at once 
from collections import Counter
print(X_train.shape, X_test.shape)

print('Train: ', Counter(y_train))
print('Test: ', Counter(y_test))

(28821,) (7066,)
Train:  Counter({'happy': 7164, 'neutral': 4982, 'sad': 4938, 'fear': 4103, 'angry': 3993, 'surprise': 3205, 'disgust': 436})
Test:  Counter({'happy': 1825, 'neutral': 1216, 'sad': 1139, 'fear': 1018, 'angry': 960, 'surprise': 797, 'disgust': 111})


As the counts above show, the images are split across the train and test sets with similar class distributions.
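
To make that comparison concrete, the sketch below turns the printed counts into per-class proportions. The numbers are copied from the Counter output above; the helper name proportions is our own. It also makes the class imbalance visible: disgust accounts for only about 1.5% of each set.

```python
from collections import Counter

# Counts copied from the notebook output above.
train_counts = Counter({'happy': 7164, 'neutral': 4982, 'sad': 4938, 'fear': 4103,
                        'angry': 3993, 'surprise': 3205, 'disgust': 436})
test_counts = Counter({'happy': 1825, 'neutral': 1216, 'sad': 1139, 'fear': 1018,
                       'angry': 960, 'surprise': 797, 'disgust': 111})


def proportions(counts):
    """Convert raw class counts to fractions of the total."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


train_p = proportions(train_counts)
test_p = proportions(test_counts)

for label in sorted(train_p):
    print(f'{label:<9} train={train_p[label]:.3f}  test={test_p[label]:.3f}')

# Largest difference in class share between the two sets.
max_gap = max(abs(train_p[label] - test_p[label]) for label in train_p)
print('Largest gap:', round(max_gap, 4))
```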

### Encoding

**Image to NumPy Array**: Our model does not take .jpg files as input; it takes NumPy arrays of pixel values. We will use Keras load_img and img_to_array to convert the images to NumPy arrays.

In [9]:
# Convert Images to Arrays
from keras.preprocessing.image import img_to_array, load_img
# color_mode='rgb' keeps three channels; the grayscale argument is deprecated
X_train = [img_to_array(load_img(img, color_mode='rgb')) for img in tqdm(X_train)]
X_train = np.array(X_train)  # changing from list to array

X_test = [img_to_array(load_img(img, color_mode='rgb')) for img in tqdm(X_test)]
X_test = np.array(X_test)  # changing from list to array

Using TensorFlow backend.
100%|██████████| 28821/28821 [00:59<00:00, 485.56it/s]
100%|██████████| 7066/7066 [00:13<00:00, 522.15it/s]


In [10]:
print('Shape of Training dataset: ', X_train.shape)
print('Shape of Training labels: ', y_train.shape, '\n')

print('Shape of Testing dataset: ', X_test.shape)
print('Shape of Testing labels: ', y_test.shape)

#print(list(set(y_train)))
#print(list(set(y_test)))

Shape of Training dataset:  (28821, 48, 48, 3)
Shape of Training labels:  (28821,) 

Shape of Testing dataset:  (7066, 48, 48, 3)
Shape of Testing labels:  (7066,)
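
The shapes above have three channels per pixel. A rough sketch of what that costs in memory follows, assuming the arrays are float32 (the default dtype we expect from img_to_array) and that the source images are single-channel grayscale, as in the original FER2013 data. The helper name array_nbytes is hypothetical. Under these assumptions the training array takes roughly 0.8 GB, about three times what a one-channel array would need.

```python
import numpy as np


def array_nbytes(n_images, height=48, width=48, channels=3, dtype=np.float32):
    """Bytes needed for a dense (n_images, height, width, channels) array."""
    return n_images * height * width * channels * np.dtype(dtype).itemsize


rgb_bytes = array_nbytes(28821, channels=3)
gray_bytes = array_nbytes(28821, channels=1)
print('3 channels: %.1f MB' % (rgb_bytes / 1e6))
print('1 channel:  %.1f MB' % (gray_bytes / 1e6))

# Stacking a list of per-image arrays adds a leading batch dimension,
# which is what np.array(X_train) does in the cell above.
batch = [np.zeros((48, 48, 3), dtype=np.float32) for _ in range(4)]
stacked = np.array(batch)
print(stacked.shape)
```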


**Label Encoding**: As with the images, the model cannot take English words as labels; it needs numerically encoded values. We will use LabelEncoder from the sklearn library to map each label to an integer, followed by to_categorical from Keras to one-hot encode those integers.

In [11]:
# Encode labels with value between 0 and n_classes-1.
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
le.fit(y_train)
y_train_enc = le.transform(y_train)
y_test_enc = le.transform(y_test)

# Converts a class vector (integers) to binary class matrix.
from keras.utils import to_categorical
y_train_enc = to_categorical(y_train_enc)
y_test_enc = to_categorical(y_test_enc)

In [12]:
print('Shape of Training Labels: ', y_train_enc.shape)
print('Shape of Testing Labels: ', y_test_enc.shape)

Shape of Training Labels:  (28821, 7)
Shape of Testing Labels:  (7066, 7)
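
To show what these two encoding steps compute, here is a NumPy-only sketch on a handful of example labels. It is an illustration rather than a replacement for LabelEncoder and to_categorical. Like LabelEncoder, np.unique orders the classes alphabetically, so angry maps to 0 and surprise maps to 6.

```python
import numpy as np

# A few example labels standing in for y_train.
labels = np.array(['sad', 'sad', 'happy', 'fear',
                   'neutral', 'angry', 'disgust', 'surprise'])

# classes: sorted unique label names; codes: index of each label in classes.
classes, codes = np.unique(labels, return_inverse=True)

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(len(classes), dtype=np.float32)[codes]

# Going back: the position of the 1 in each row selects the class name.
decoded = classes[one_hot.argmax(axis=1)]

print(classes)
print(codes)
print(one_hot.shape)
```

The decoding step is the same one needed later to turn model predictions back into expression names.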


### Saving the variables

In [13]:
# Creating directory
os.makedirs('pickle', exist_ok=True)

In [14]:
# Save data using Pickle.
#Save training data
with open('pickle/X_train.pickle','wb') as f:
    pickle.dump(X_train, f)
with open('pickle/y_train_enc.pickle','wb') as f:
    pickle.dump(y_train_enc, f)
    
# Save testing data
with open('pickle/X_test.pickle','wb') as f:
    pickle.dump(X_test, f)
with open('pickle/y_test_enc.pickle','wb') as f:
    pickle.dump(y_test_enc, f)

In [15]:
print('Shape of Training dataset: ', X_train.shape)
print('Shape of Training labels: ', y_train_enc.shape, '\n')

print('Shape of Testing dataset: ', X_test.shape)
print('Shape of Testing labels: ', y_test_enc.shape)

Shape of Training dataset:  (28821, 48, 48, 3)
Shape of Training labels:  (28821, 7) 

Shape of Testing dataset:  (7066, 48, 48, 3)
Shape of Testing labels:  (7066, 7)
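
To reuse the saved arrays in a later notebook, they can be read back with pickle.load. The sketch below shows the round trip on small stand-in arrays written to a temporary folder; the names X_demo and y_demo are placeholders, not the real data. Note that pickle.load should only be used on files from a trusted source.

```python
import os
import pickle
import tempfile

import numpy as np

# Small stand-ins with the same layout as X_train and y_train_enc.
X_demo = np.zeros((4, 48, 48, 3), dtype=np.float32)
y_demo = np.eye(7, dtype=np.float32)[[0, 3, 3, 6]]

with tempfile.TemporaryDirectory() as tmp_dir:
    x_path = os.path.join(tmp_dir, 'X_demo.pickle')
    y_path = os.path.join(tmp_dir, 'y_demo.pickle')

    # Write, mirroring the save cell above.
    with open(x_path, 'wb') as f:
        pickle.dump(X_demo, f)
    with open(y_path, 'wb') as f:
        pickle.dump(y_demo, f)

    # Read back; note the 'rb' mode.
    with open(x_path, 'rb') as f:
        X_loaded = pickle.load(f)
    with open(y_path, 'rb') as f:
        y_loaded = pickle.load(f)

print(X_loaded.shape, y_loaded.shape)
```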
