# Data Preprocessing

## Emotion Face Classifier Notebook 1

Reads the initial CSV data and generates images organized by usage (train/test) and emotion category.

Pixel data is stored in a single column and imported as a string, so conversion to a 2D matrix is needed. 
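
A minimal sketch of that conversion, assuming space-separated integer pixel values and the 48x48 grayscale layout of FER 2013. The function `pixels_to_array` is a hypothetical stand-in; the actual `convert_pixels_to_array` in `utils.preprocessing` may differ.

```python
import numpy as np

def pixels_to_array(pixel_string, side=48):
    """Hypothetical stand-in for convert_pixels_to_array."""
    # Split on whitespace and parse each pixel intensity as an integer
    values = np.array([int(v) for v in pixel_string.split()], dtype=np.uint8)
    # Reshape the flat vector into a square 2D image
    return values.reshape(side, side)

# Synthetic pixel string with 48 * 48 values in the 0-255 range
demo = " ".join(str(i % 256) for i in range(48 * 48))
image = pixels_to_array(demo)
print(image.shape, image.dtype)  # (48, 48) uint8
```

`reshape` raises a `ValueError` if a row does not contain exactly `side * side` values, which gives an early signal of malformed rows.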

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import os
import warnings
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
from datascifuncs.tidbit_tools import load_json, print_json, check_directory_name

In [None]:
# Ensure working directory for correct filepaths
main_dir = 'EmotionFaceClassifier'
check_directory_name(main_dir)

In [None]:
from utils.preprocessing import (
    convert_pixels_to_array,
    save_image
)

### Import and Check Details

Imports settings from json and displays relevant sections.
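
As a rough sketch of the structure this notebook relies on, the excerpt below uses the four keys read in the following cells (`emo_dict`, `color_dict`, `category_order`, `usage_dict`). All values shown are assumptions for illustration; the real `input_mappings.json` may contain different labels, colors, and ordering.

```python
import json

# Hypothetical excerpt of configs/input_mappings.json (values are assumed)
config_text = """
{
  "emo_dict": {"0": "Angry", "3": "Happy"},
  "color_dict": {"Angry": "red", "Happy": "gold"},
  "category_order": ["Happy", "Angry"],
  "usage_dict": {"Training": "train", "PublicTest": "test", "PrivateTest": "test"}
}
"""
config = json.loads(config_text)

# JSON object keys always load as strings, even when they look numeric
print(list(config["emo_dict"].keys()))  # ['0', '3']
print(config["emo_dict"].get(0))        # None
```

Because the keys are strings, integer ids need to be cast to `str` before they are looked up in `emo_dict`.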

In [None]:
# Load common dicts from json config file
common_dicts = load_json('./configs/input_mappings.json')
print_json(common_dicts)

In [None]:
# Select emotion mapping section of json
emo_dict = common_dicts['emo_dict']
print_json(emo_dict)

In [None]:
# Select color mappings for emotion categories
emo_color_dict = common_dicts['color_dict']
print_json(emo_color_dict)

In [None]:
# Get set order to display results
category_order = common_dicts['category_order']
print_json(category_order)

### Import Data

Imports and explores basic aspects of FER 2013 data.

In [None]:
# Read in FER 2013 data
fer2013_path = 'data/fer2013.csv'
fer2013 = pd.read_csv(fer2013_path)

In [None]:
# Check column names and shape
print(fer2013.columns)
print(fer2013.shape)

In [None]:
# Check emotion values
print(sorted(fer2013['emotion'].unique()))

In [None]:
# Map numeric emotion ids to readable labels (JSON keys are strings, hence astype(str))
fer2013 = fer2013.rename(columns={'emotion': 'emotion_id'})
fer2013['emotion'] = fer2013['emotion_id'].astype(str).map(emo_dict)

In [None]:
# Pixel data converted to np.array
fer2013['image'] = fer2013['pixels'].apply(convert_pixels_to_array)

In [None]:
# Initial data has 3 usages: train, public test, private test
# Mapping reduces these to train and test only
fer2013['usage'] = fer2013['Usage'].map(common_dicts['usage_dict'])

In [None]:
# Add emotion color tags
fer2013['color'] = fer2013['emotion'].map(emo_color_dict)
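
`Series.map` returns `NaN` for any value missing from the mapping, so a quick check after the mapping steps can catch gaps in the config. A minimal sketch on a toy frame, with hypothetical dictionary contents:

```python
import pandas as pd

# Hypothetical mappings; the real ones come from the JSON config
emo_dict = {"0": "Angry", "3": "Happy"}
usage_dict = {"Training": "train", "PublicTest": "test", "PrivateTest": "test"}

toy = pd.DataFrame({
    "emotion_id": [0, 3, 3],
    "Usage": ["Training", "PublicTest", "PrivateTest"],
})

# Cast ids to str so they match the string keys of the mapping
toy["emotion"] = toy["emotion_id"].astype(str).map(emo_dict)
toy["usage"] = toy["Usage"].map(usage_dict)

# Count values that failed to map (these appear as NaN)
unmapped = toy[["emotion", "usage"]].isna().sum()
print(unmapped.to_dict())  # {'emotion': 0, 'usage': 0}

# Without the cast, integer ids match no key and every row becomes NaN
print(toy["emotion_id"].map(emo_dict).isna().all())  # True
```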

In [None]:
# Create counts of each emotion
gby = fer2013.groupby(['emotion'], as_index=False, observed=True).size()
gby

In [None]:
# Add a color column to the DataFrame based on the emotion
gby['color'] = gby['emotion'].map(emo_color_dict)
gby

In [None]:
# Filter the order list to include only categories present in the DataFrame
filtered_order = [cat for cat in category_order if cat in gby['emotion'].unique()]
print(filtered_order)

In [None]:
# Convert emotion column data type to be categorical for plotting
gby['emotion'] = pd.Categorical(gby['emotion'], categories=filtered_order, ordered=True)
gby

### Count Plot

Using the count data, generates a bar plot to visualize the category distribution.

More advanced plots are generated in the next notebook.

FutureWarnings arise during this step and are ignored as they do not impact the plot. 
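
Bar colors line up with their bars only when the palette follows the same order as the plotted categories. A minimal sketch of that alignment, using hypothetical labels, colors, and display order:

```python
import pandas as pd

# Hypothetical display order and colors
category_order = ["Happy", "Sad", "Angry"]
color_dict = {"Angry": "red", "Happy": "gold", "Sad": "blue"}

# Counts arrive in alphabetical order, as they would from a groupby
counts = pd.DataFrame({"emotion": ["Angry", "Happy", "Sad"], "size": [3, 5, 2]})
counts["emotion"] = pd.Categorical(
    counts["emotion"], categories=category_order, ordered=True
)

# Sorting an ordered categorical follows the custom order, not the alphabet
ordered = counts.sort_values("emotion")
print(ordered["emotion"].tolist())  # ['Happy', 'Sad', 'Angry']

# Build the palette from the category order so color i belongs to bar i
palette = [color_dict[cat] for cat in counts["emotion"].cat.categories]
print(palette)  # ['gold', 'blue', 'red']
```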

In [None]:
with warnings.catch_warnings():
    warnings.simplefilter(action='ignore', category=FutureWarning)

    # Build the palette in plotted (categorical) order so colors match bars
    bar_colors = [emo_color_dict[cat] for cat in filtered_order]

    fig, ax = plt.subplots(figsize=(10, 6))
    sns.barplot(data=gby, x='emotion', y='size', ax=ax, palette=bar_colors)

    ax.set_xlabel('Emotion')
    ax.set_ylabel('Count')
    ax.set_title('Emotion Image Counts')
    plt.show()

### Save Data

Each image is saved to a jpg based on usage (train/test) and emotion category.

An identifying integer is added to each filename for organization.

A new csv, fer2013_paths.csv, is written out to the data directory with filepaths for all generated images. 

In [None]:
# Creates a numeric index for each usage/emotion group
fer2013['emo_count_id'] = fer2013.groupby(['usage', 'emotion']).cumcount()+1
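
To illustrate how `cumcount` numbers rows within each group, here is a small sketch on a toy frame (the labels are placeholders):

```python
import pandas as pd

toy = pd.DataFrame({
    "usage": ["train", "train", "test", "train"],
    "emotion": ["Happy", "Sad", "Happy", "Happy"],
})

# cumcount is zero-based within each (usage, emotion) group; +1 makes it one-based
toy["emo_count_id"] = toy.groupby(["usage", "emotion"]).cumcount() + 1
print(toy["emo_count_id"].tolist())  # [1, 1, 1, 2]
```

The counter restarts for every usage/emotion pair, so the id is unique only within its group.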

In [None]:
# Write each image to a jpg and return its filepath
fer2013['img_path'] = fer2013.apply(save_image, axis=1)
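
The implementation of `save_image` is not shown in this notebook. As a sketch of the path-building part only, the hypothetical helper below assumes a `<root>/<usage>/<emotion>/<emotion>-<id>.jpg` layout; the real directory structure and filename pattern may differ, and writing the pixels to disk would be left to an image library.

```python
import os

def build_image_path(row, root="data"):
    """Hypothetical helper: derive a jpg path from a row's usage, emotion, and id."""
    # Assumed filename pattern: <emotion>-<id>.jpg
    filename = f"{row['emotion']}-{row['emo_count_id']}.jpg"
    # Assumed directory layout: <root>/<usage>/<emotion>/
    return os.path.join(root, row["usage"], row["emotion"], filename)

# Works with a dict or a pandas row, since both support key lookup
example_row = {"usage": "train", "emotion": "Happy", "emo_count_id": 1}
example_path = build_image_path(example_row)
print(example_path)
```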

In [None]:
# Save updated df
save_path = os.path.join('data', 'fer2013_paths.csv')
fer2013.to_csv(save_path, index=False)
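
One point worth checking before relying on the saved file: NumPy summarizes arrays with more than 1000 elements when converting them to text, so a column of 48x48 arrays may not survive a CSV round trip intact. The sketch below shows the summarization and one possible workaround, dropping the array column and keeping the original `pixels` string. This is a suggestion under that assumption, not the notebook's method.

```python
import io
import numpy as np
import pandas as pd

# Toy frame with one all-zero 48x48 image built from a pixel string
toy = pd.DataFrame({"pixels": [" ".join(["0"] * (48 * 48))]})
toy["image"] = toy["pixels"].apply(
    lambda s: np.array([int(v) for v in s.split()], dtype=np.uint8).reshape(48, 48)
)

# The text form of a large array is abbreviated with "..."
as_text = str(toy.loc[0, "image"])
print("..." in as_text)  # True

# Possible workaround: drop the array column before writing the CSV
buffer = io.StringIO()
slim = toy.drop(columns=["image"])
slim.to_csv(buffer, index=False)
print(list(slim.columns))  # ['pixels']
```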