# Data Preprocessing

tqdm is optional: it only displays a progress bar for long-running loops.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# tqdm_notebook is deprecated in recent tqdm releases; tqdm.notebook is the current import path
from tqdm.notebook import tqdm

In [2]:
data = pd.read_csv('fer2013.csv', delimiter=';')

In [3]:
data.head()

Unnamed: 0,emotion,pixels,usage
0,0,70 80 82 72 58 58 60 63 54 58 60 48 89 115 121...,Training
1,0,151 150 147 155 148 133 111 140 170 174 182 15...,Training
2,2,231 212 156 164 174 138 161 173 182 200 106 38...,Training
3,4,24 32 36 30 32 23 19 20 30 41 21 22 32 34 21 1...,Training
4,6,4 0 0 0 0 0 0 0 0 0 0 0 3 15 23 28 48 50 58 84...,Training


Emotion contains a number corresponding to an emotion:

0: 'angry'

1: 'disgust'

2: 'fear'

3: 'happy'

4: 'sad'

5: 'surprise'

6: 'neutral'
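
To make later output easier to read, the mapping above can be kept in a dictionary. This is a minimal sketch: the name EMOTION_LABELS and the small sample series are illustrative and not part of the original notebook.

```python
import pandas as pd

# Label mapping copied from the list above; the name EMOTION_LABELS is our own.
EMOTION_LABELS = {
    0: 'angry',
    1: 'disgust',
    2: 'fear',
    3: 'happy',
    4: 'sad',
    5: 'surprise',
    6: 'neutral',
}

# Tiny stand-in for data['emotion'] (the five labels shown by data.head()).
sample = pd.Series([0, 0, 2, 4, 6], name='emotion')

# Series.map translates each numeric code into its emotion name.
emotion_names = sample.map(EMOTION_LABELS)
print(emotion_names.tolist())
```

In the notebook itself, the same call on data['emotion'] should give a column of names without changing the numeric labels.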

Pixels contains a space-separated string of 2304 numbers (48 x 48), each giving one pixel's grayscale intensity (0-255).

In [4]:
data.usage.value_counts()

Training       28709
PublicTest      3589
PrivateTest     3589
Name: usage, dtype: int64

Usage shows the original dataset's split: 28709 rows for training and the remaining 7178 for testing (3589 PublicTest and 3589 PrivateTest).
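
If the original split is needed later, rows can be selected with boolean masks on the usage column. The sketch below uses a small stand-in frame (demo) rather than the real data, and the variable names are assumptions.

```python
import pandas as pd

# Small stand-in frame; in the notebook this would be the full `data` frame.
demo = pd.DataFrame({
    'emotion': [0, 2, 4, 6, 3],
    'usage': ['Training', 'Training', 'PublicTest', 'PrivateTest', 'Training'],
})

# Boolean masks select rows by their original split.
train_mask = demo['usage'] == 'Training'
test_mask = demo['usage'].isin(['PublicTest', 'PrivateTest'])

train_rows = demo[train_mask]
test_rows = demo[test_mask]
print(len(train_rows), len(test_rows))
```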

In [5]:
def split_stuff(image):
    return np.array([int(x) for x in image.split(' ')])

In [6]:
data['pixels'].head()

0    70 80 82 72 58 58 60 63 54 58 60 48 89 115 121...
1    151 150 147 155 148 133 111 140 170 174 182 15...
2    231 212 156 164 174 138 161 173 182 200 106 38...
3    24 32 36 30 32 23 19 20 30 41 21 22 32 34 21 1...
4    4 0 0 0 0 0 0 0 0 0 0 0 3 15 23 28 48 50 58 84...
Name: pixels, dtype: object

In [7]:
data['pixels'] = data['pixels'].apply(split_stuff)

In [8]:
data['pixels'][0]

array([ 70,  80,  82, ..., 106, 109,  82])

We've converted the string of numbers into an array of 2304 integers.
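
Each 2304-element array can be viewed as a 48 x 48 image by reshaping it. This is a sketch on synthetic values, and it assumes the pixels are stored row by row, which the notebook does not state explicitly.

```python
import numpy as np

# Stand-in for data['pixels'][0]: 2304 values in the 0-255 range.
flat = np.arange(48 * 48) % 256

# Reshape the flat vector into a 48x48 grid (assumes row-major pixel order).
image = flat.reshape(48, 48)
print(image.shape)

# In the notebook, plt.imshow(image, cmap='gray') would display the face.
```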

In [9]:
data['emotion'] = np.array(data['emotion'])

In [10]:
X = np.zeros((data['pixels'].shape[0], 48*48))
X.shape

(35887, 2304)

In [11]:
X

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

We've defined a new matrix with 35887 rows (one per image) and 2304 columns (one per pixel), initialized to 0. It will hold all the pixel values as a single 2-D array.

## DO NOT RUN THIS

This cell takes a long time. You only need to run it once, the first time through; after that, work from the saved arrays. If you don't have tqdm installed, replace the first line of the loop with:
    for i in range(X.shape[0]):

In [None]:
for i in tqdm(range(X.shape[0])):
    for j in range(X.shape[1]):
        X[i, j] = int(data['pixels'][i][j])

The loop copies every pixel value into X, giving a feature matrix that is ready to feed into the neural network.
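
The element-by-element copy is slow because it performs about 83 million Python-level assignments (35887 x 2304). Since every row of the pixels column is already a NumPy array of the same length, the rows can likely be stacked in one call instead. This is an alternative sketch on a tiny stand-in series, not the notebook's own method; X_fast is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Stand-in for the 'pixels' column after split_stuff: one 1-D array per row.
pixels = pd.Series([np.array([1, 2, 3]), np.array([4, 5, 6])])

# np.stack joins the equal-length row arrays into a single 2-D matrix;
# astype matches the float dtype that np.zeros gives X in the notebook.
X_fast = np.stack(pixels.tolist()).astype(np.float64)
print(X_fast.shape)
```

On the real column, np.stack(data['pixels'].tolist()) should produce the same (35887, 2304) matrix as the loop.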

In [None]:
np.save('features', X)
np.save('labels', data['emotion'])

We save both the features and labels (np.save writes them as features.npy and labels.npy) so that later sessions can continue from here without repeating the preprocessing.
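
A later session can then reload the arrays with np.load. The sketch below round-trips small stand-in arrays through a temporary directory so it runs on its own; in the notebook the files would sit in the working directory.

```python
import os
import tempfile
import numpy as np

# Small stand-ins for X and the label column.
X_demo = np.zeros((4, 2304))
y_demo = np.array([0, 0, 2, 4])

with tempfile.TemporaryDirectory() as tmp:
    # np.save appends '.npy' when the file name has no extension.
    np.save(os.path.join(tmp, 'features'), X_demo)
    np.save(os.path.join(tmp, 'labels'), y_demo)

    # Reload the arrays instead of repeating the slow preprocessing loop.
    X_loaded = np.load(os.path.join(tmp, 'features.npy'))
    y_loaded = np.load(os.path.join(tmp, 'labels.npy'))

print(X_loaded.shape, y_loaded.shape)
```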