# Feature Engineering

Using the results of the extract workload, we have our minimally preprocessed data that we can use to craft the following features.

1. **Eye length ratio**: length of eye (maximum of two) over distance between points 8 and 13.
2. **Eye distance ratio**: distance between center of two eyes over distance between points 8 and 13.
3. **Nose ratio**: Distance between points 15 and 16 over distance between 20 and 21.
4. **Lip size ratio**: Distance between points 2 and 3 over distance between 17 and 18.
5. **Lip length ratio**: Distance between points 2 and 3 over distance between 20 and 21.
6. **Eye-brow length ratio**: Distance between points 4 and 5 (or distance between points 6 and 7 whichever is larger) over distance between 8 and 13.
7. **Aggressive ratio**: Distance between points 10 and 19 over distance between 20 and 21.

## Fetch Extract Workload Results

In [21]:
import pandas as pd

ex_file_path = "../extract/ex_res/ex_res.csv"

ex_df = pd.read_csv(ex_file_path)

ex_df.head()

Unnamed: 0,index,gender,person_id,neutral,smile,anger,left_light,p_1_x,p_2_x,p_3_x,...,p_13_y,p_14_y,p_15_y,p_16_y,p_17_y,p_18_y,p_19_y,p_20_y,p_21_y,p_22_y
0,0,1,m001,1,0,0,0,328.444,275.496,434.921,...,374.253,395.527,374.253,416.925,373.276,483.314,280.342,404.39,499.835,402.522
1,1,1,m001,0,1,0,0,344.026,270.09,449.912,...,385.07,389.178,386.006,421.015,390.18,491.438,281.39,393.009,511.685,397.247
2,2,1,m001,0,0,1,0,329.17,291.426,436.553,...,376.119,414.293,378.616,440.265,380.614,505.195,279.723,418.289,501.982,416.291
3,3,1,m001,0,0,0,1,345.098,260.392,451.765,...,387.765,367.059,387.765,393.412,387.765,468.706,286.118,383.373,509.49,389.647
4,4,1,m002,1,0,0,0,327.193,263.025,437.671,...,386.155,367.268,386.155,394.529,388.628,472.462,282.728,382.655,499.719,377.464


## Craft Features from Extract Dataframe

In [29]:
import numpy as np

# Function to calculate distance between two points
def distance(x1, y1, x2, y2):
    return np.sqrt((x2 - x1)**2 + (y2 - y1)**2)

column_names = ['gender', 'person_id', 'EyeDistanceRatio']

# Calculate the features
ex_features_df = pd.DataFrame(columns=column_names)

ex_features_df['gender'] = ex_df[['gender']]
ex_features_df['person_id'] = ex_df[['person_id']]

# Eye distance ratio: distance between center of two eyes over distance between points 8 and 13
ex_features_df['EyeDistanceRatio'] = ex_df.apply(lambda row: distance(row['p_8_x'], row['p_8_y'], row['p_13_x'], row['p_13_x']), axis=1)

ex_features_df

Unnamed: 0,gender,person_id,EyeDistanceRatio
0,1,m001,49.285666
1,1,m001,82.483020
2,1,m001,39.614121
3,1,m001,92.083191
4,1,m002,83.649102
...,...,...,...
504,0,w058,40.344380
505,0,w059,35.389558
506,0,w059,68.075829
507,0,w060,39.372542
