## FacialExpressionRecognition
The data is available on the [Kaggle competition page](https://www.kaggle.com/competitions/challenges-in-representation-learning-facial-expression-recognition-challenge/data).

### Dataset Description

The data consists of 48x48 pixel grayscale images of faces. The faces have been automatically registered so that each face is roughly centered and occupies about the same amount of space in each image. The task is to assign each face to one of seven categories based on the emotion shown in the facial expression (0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral).

train.csv contains two columns, "emotion" and "pixels". The "emotion" column contains a numeric code from 0 to 6, inclusive, for the emotion present in the image. The "pixels" column contains one quoted string per image. Each string holds space-separated pixel values in row-major order. test.csv contains only the "pixels" column, and the task is to predict the "emotion" column.
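
As a sketch of the format described above, the snippet below parses one space-separated pixel string into a 48x48 array. The helper name `parse_pixels` and the synthetic string are assumptions made for illustration; neither comes from the dataset or from this notebook.

```python
import numpy as np

def parse_pixels(pixel_string, side=48):
    """Hypothetical helper: turn a "pixels" string into a (side, side) image."""
    values = np.asarray(pixel_string.split()).astype(np.uint8)
    # Row-major order: the first `side` values form the first image row.
    return values.reshape(side, side)

# Synthetic stand-in for one "pixels" cell: 2304 values cycling through 0..255.
example = " ".join(str(i % 256) for i in range(48 * 48))
image = parse_pixels(example)
print(image.shape)  # (48, 48)
```

With row-major order, `image[r, c]` is the value at position `r * 48 + c` of the original string.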

The training set consists of 28,709 examples. The public test set used for the leaderboard consists of 3,589 examples. The final test set, which was used to determine the winner of the competition, consists of another 3,589 examples.

This dataset was prepared by Pierre-Luc Carrier and Aaron Courville, as part of an ongoing research project. They have graciously provided the workshop organizers with a preliminary version of their dataset to use for this contest.

In [3]:
import pandas as pd
import numpy as np
from sklearn.utils import shuffle

In [4]:
# Local folder that holds train.csv (raw string; no f-string placeholders are needed)
path_to_data = r'E:\Downloads\csv\facial-expressions\train'
labels = ['Angry', 'Disgust', 'Fear', 'Happy', 'Sad', 'Surprise', 'Neutral']

In [3]:
df = pd.read_csv(path_to_data + r'\train.csv')
df.head()

|  | emotion | pixels |
|---|---|---|
| 0 | 0 | 70 80 82 72 58 58 60 63 54 58 60 48 89 115 121... |
| 1 | 0 | 151 150 147 155 148 133 111 140 170 174 182 15... |
| 2 | 2 | 231 212 156 164 174 138 161 173 182 200 106 38... |
| 3 | 4 | 24 32 36 30 32 23 19 20 30 41 21 22 32 34 21 1... |
| 4 | 6 | 4 0 0 0 0 0 0 0 0 0 0 0 3 15 23 28 48 50 58 84... |


In [47]:
df[(df.emotion == 0) | (df.emotion == 1)]

|  | emotion | pixels |
|---|---|---|
| 0 | 0 | 70 80 82 72 58 58 60 63 54 58 60 48 89 115 121... |
| 1 | 0 | 151 150 147 155 148 133 111 140 170 174 182 15... |
| 10 | 0 | 30 24 21 23 25 25 49 67 84 103 120 125 130 139... |
| 22 | 0 | 123 125 124 142 209 226 234 236 231 232 235 22... |
| 23 | 0 | 8 9 14 21 26 32 37 46 52 62 72 70 71 73 76 83 ... |
| ... | ... | ... |
| 28675 | 0 | 111 111 112 110 111 111 109 106 99 88 44 68 12... |
| 28686 | 0 | 178 184 187 195 199 194 197 205 202 194 201 20... |
| 28702 | 0 | 196 194 188 177 156 124 81 60 65 64 84 119 114... |
| 28705 | 0 | 114 112 113 113 111 111 112 113 115 113 114 11... |


In [16]:
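def get_train_test_data(percent_train = 0.8, balance_ones = False) :
    """Load train.csv, shuffle it, and split it into train and test arrays.

    Returns Xtrain, Ttrain, Xtest, Ttest. Each X row holds the 2304 pixel
    values of one image scaled to [0, 1]; each T entry is an emotion code.
    """
    df = pd.read_csv(path_to_data + fr'\train.csv')
    if balance_ones:
        # Oversample class 1 (Disgust): add 9 extra copies of every class-1 row.
        # This happens before the split, so copies of the same image can end up
        # in both the train and the test portion.
        new_df = pd.DataFrame(df[df.emotion == 1].values.repeat(9, axis = 0), columns = df.columns)
        df = pd.concat((new_df, df))
        del new_df

    # Shuffle the rows.
    df = df.sample(frac = 1)

    # The first percent_train of the shuffled rows form the training set.
    last_train_index = int(df.shape[0] * percent_train)
    # Column 1 is "pixels": parse each string into a vector and scale to [0, 1].
    Xtrain = df.iloc[:last_train_index, 1].map(lambda x : np.array(x.split(' ')).astype(np.int16)).values / 255
    # Column 0 is "emotion".
    Ttrain = df.iloc[:last_train_index, 0].values.astype(np.int16)
    # Stack the per-image vectors into one (N, 2304) array.
    Xtrain = np.stack(Xtrain)

    Xtest = df.iloc[last_train_index:, 1].map(lambda x : np.array(x.split(' ')).astype(np.int16)).values / 255
    Ttest = df.iloc[last_train_index:, 0].values.astype(np.int16)
    Xtest = np.stack(Xtest)

    return Xtrain, Ttrain, Xtest, Ttest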

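def get_binary_data(balance_ones = False) :
    """Load only the class-0 (Angry) and class-1 (Disgust) rows of train.csv.

    Returns X, T: X has one row of 2304 pixel values scaled to [0, 1] per
    image, and T holds the matching emotion codes (0 or 1).
    """
    df = pd.read_csv(path_to_data + fr'\train.csv')
    # Keep only the two classes used for the binary problem.
    df = df[(df.emotion == 0) | (df.emotion == 1)]
    if balance_ones:
        # Oversample class 1: add 9 extra copies of every class-1 row.
        new_df = pd.DataFrame(df[df.emotion == 1].values.repeat(9, axis = 0), columns = df.columns)
        df = pd.concat((new_df, df))
        del new_df

    # Shuffle the rows.
    df = df.sample(frac = 1)

    # Column 1 is "pixels", column 0 is "emotion".
    X = df.iloc[:, 1].map(lambda x : np.array(x.split(' ')).astype(np.int16)).values / 255
    T = df.iloc[:, 0].values.astype(np.int16)
    X = np.stack(X)

    return X, T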


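def get_data(count = 10) :
    """Return `count` randomly chosen images from train.csv as (X, T)."""
    df = pd.read_csv(path_to_data + fr'\train.csv')
    # Requesting every row is valid, so allow count == number of rows.
    assert count <= df.shape[0]

    # Shuffle the rows, then keep the first `count` of them.
    df = df.sample(frac = 1)

    X = df.iloc[:count, 1].map(lambda x : np.array(x.split(' ')).astype(np.int16)).values / 255
    T = df.iloc[:count, 0].values.astype(np.int16)
    X = np.stack(X)

    return X, T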
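
When `balance_ones` is set, `get_train_test_data` repeats the class-1 rows before shuffling and splitting, so copies of one image can fall on both sides of the split. The snippet below is a minimal sketch of one alternative, not the method used in this notebook: split first, then oversample only the training rows. The helper `split_then_oversample`, its parameters, and the synthetic arrays are all hypothetical.

```python
import numpy as np

def split_then_oversample(X, T, percent_train=0.8, minority=1, copies=9, seed=0):
    """Hypothetical helper: shuffle, split, then repeat minority rows in train only."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(T))
    X, T = X[order], T[order]

    last_train_index = int(len(T) * percent_train)
    Xtrain, Ttrain = X[:last_train_index], T[:last_train_index]
    Xtest, Ttest = X[last_train_index:], T[last_train_index:]

    # Append `copies` extra copies of each minority-class training row.
    mask = Ttrain == minority
    Xtrain = np.concatenate([Xtrain, np.repeat(Xtrain[mask], copies, axis=0)])
    Ttrain = np.concatenate([Ttrain, np.repeat(Ttrain[mask], copies)])

    # Reshuffle so the duplicated rows are not all at the end.
    order = rng.permutation(len(Ttrain))
    return Xtrain[order], Ttrain[order], Xtest, Ttest

# Synthetic data: 100 rows with a unique feature value each, 10 of them class 1.
X = np.arange(100, dtype=np.float64).reshape(100, 1)
T = np.array([1] * 10 + [0] * 90)
Xtrain, Ttrain, Xtest, Ttest = split_then_oversample(X, T)

# No test row also appears in the training set.
print(set(Xtrain[:, 0]).isdisjoint(set(Xtest[:, 0])))
```

The test portion keeps the original class proportions of whatever rows landed in it, which makes accuracy on it easier to interpret.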

In [54]:
def get_labels() :
    return ['Angry', 'Disgust', 'Fear', 'Happy', 'Sad', 'Surprise', 'Neutral']
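
Because the emotion codes are positions in this list, an array of codes can be turned into names by indexing. A small sketch, where the `codes` array is made up for illustration:

```python
import numpy as np

labels = np.array(['Angry', 'Disgust', 'Fear', 'Happy', 'Sad', 'Surprise', 'Neutral'])

# Hypothetical emotion codes, e.g. the predictions of some classifier.
codes = np.array([0, 3, 6, 1])

# Integer-array indexing looks up one name per code.
names = labels[codes]
print(names.tolist())  # ['Angry', 'Happy', 'Neutral', 'Disgust']
```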

In [15]:
get_data(1)[0].shape

(1, 2304)
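
Each row has 2304 values because 48 x 48 = 2304: the images are stored flattened. A sketch of restoring the 2-D shape, using random numbers as a stand-in for the output of `get_data` so that it runs without the CSV file:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((3, 2304))       # stand-in for get_data(3)[0]

# -1 lets NumPy infer the number of images from the array size.
images = X.reshape(-1, 48, 48)
print(images.shape)             # (3, 48, 48)

# Row r, column c of image i sits at flat index r * 48 + c.
print(images[2, 5, 7] == X[2, 5 * 48 + 7])
```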

In [19]:
# T holds the labels of all class-0 and class-1 rows
_, T = get_binary_data()
T.shape

(4431,)
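
The shape only gives the combined number of class-0 and class-1 rows. To see how the two classes split, the labels can be counted with `np.bincount`. The sketch below uses a made-up label array rather than the real data, so the counts it prints say nothing about the dataset:

```python
import numpy as np

# Made-up labels standing in for the T returned by get_binary_data().
T = np.array([0] * 8 + [1] * 2, dtype=np.int16)

# counts[k] is the number of rows with emotion code k.
counts = np.bincount(T, minlength=2)
ratio = float(counts[0] / counts[1])
print(counts.tolist(), ratio)  # [8, 2] 4.0
```

A ratio well above 1 is the situation the `balance_ones` option is meant to address.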