# Exploratory Data Analysis (EDA) Overview

## Emotion Face Classifier Notebook 2

Generates summary tables and visuals of emotion counts by usage (train/test).

This notebook focuses on data proportions and counts. The next notebook explores image properties such as pixel density.

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import os
import pandas as pd

In [None]:
from datascifuncs.tidbit_tools import load_json, write_json, print_json, check_directory_name

In [None]:
# Ensure working directory for correct filepaths
main_dir = 'EmotionFaceClassifier'
check_directory_name(main_dir)

In [None]:
from utils.eda import plot_emotion_counts, plot_emotion_waffle

### Set Paths and Load Data

Defines paths for inputs and outputs.

Loads plotting style details from a JSON config file.
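
The cells in this section read four sections from the config: `emo_dict`, `plotly_styles`, `color_dict`, and `category_order`. As a rough sketch of the shape that implies, the snippet below builds a stand-in mapping and checks that those sections exist. Only the section names and the `Training`/`Testing` sub-keys come from this notebook; every value is a made-up placeholder, and the real `input_mappings.json` may be organized differently.

```python
import json

# Hypothetical contents: only the top-level keys and the 'Training'/'Testing'
# sub-keys appear in this notebook; all values here are placeholders.
example_mappings = {
    "emo_dict": {"0": "Angry", "3": "Happy"},
    "color_dict": {"Angry": "#d62728", "Happy": "#2ca02c"},
    "category_order": ["Angry", "Happy"],
    "plotly_styles": {
        "Training": {"opacity": 1.0},
        "Testing": {"opacity": 0.5},
    },
}

# Round-trip through JSON text to mimic loading the file from disk
loaded = json.loads(json.dumps(example_mappings))

# Sections this notebook reads from the config
required_keys = ["emo_dict", "plotly_styles", "color_dict", "category_order"]
missing = [key for key in required_keys if key not in loaded]
print("Missing sections:", missing)
```

A check like this can fail early with a readable message instead of a `KeyError` further down the notebook.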

In [None]:
intermediate_data = os.path.join('data', 'intermediate')
os.makedirs(intermediate_data, exist_ok=True)

In [None]:
image_dir = './images'
os.makedirs(image_dir, exist_ok=True)

In [None]:
gby_df_path = os.path.join(intermediate_data, 'count_pivot.csv')

In [None]:
common_dicts = load_json('./configs/input_mappings.json')
print_json(common_dicts)

In [None]:
# Select emotion mapping section of json
emo_dict = common_dicts['emo_dict']
print_json(emo_dict)

In [None]:
# Style settings for plotly 
style_dict = common_dicts['plotly_styles']
print_json(style_dict)

In [None]:
# Get subset of emo-color mappings
color_dict = common_dicts['color_dict']
color_dict

In [None]:
# Category ordering
category_order = common_dicts['category_order']

In [None]:
# Use the same emotion-color mapping for both usage splits
style_dict['Training']['color'] = color_dict
style_dict['Testing']['color'] = color_dict

### Load and Explore Data

Loads data, explores features, and generates counts df for plotting.
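
To make the counts table concrete, here is a minimal sketch on made-up rows, assuming the same `emotion` and `usage` column names used in this notebook. The `share` column is an illustrative addition and is not part of the saved CSV.

```python
import pandas as pd

# Made-up rows standing in for the real image table
toy = pd.DataFrame({
    "emotion": ["Happy", "Happy", "Sad", "Happy", "Sad"],
    "usage": ["Training", "Training", "Training", "Testing", "Testing"],
})

# One row per (emotion, usage) pair, with the row count in a 'size' column
toy_counts = toy.groupby(["emotion", "usage"], as_index=False).size()
toy_counts = toy_counts.rename(columns={"size": "Count"})

# Share of each emotion within its usage split (illustrative extra column)
split_totals = toy_counts.groupby("usage")["Count"].transform("sum")
toy_counts["share"] = toy_counts["Count"] / split_totals
print(toy_counts)
```

Comparing shares rather than raw counts makes it easier to see whether the train and test splits have similar class balance.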

In [None]:
# Read in FER 2013 data
fer2013_path = 'data/fer2013_paths.csv'
fer2013 = pd.read_csv(fer2013_path)

In [None]:
# Check column names and shape
print(fer2013.columns)
print(fer2013.shape)

In [None]:
# Check emotion values
print(sorted(fer2013['emotion'].unique()))

In [None]:
# Create groupby counts of each emotion
gby = fer2013.groupby(['emotion', 'usage'], as_index=False).size()
gby

In [None]:
# Sort df for consistency
gby.sort_values(by=['usage'], ascending=False, inplace=True)
gby

In [None]:
# Rename size column to Count for plots
gby = gby.rename(columns={'size':'Count'})
gby

In [None]:
# Map colors to emotions
gby['color'] = gby['emotion'].map(color_dict)
gby

In [None]:
# Map opacity by usage for plots (Train=1, Test=.5)
gby['opacity'] = gby['usage'].apply(lambda x: 1.0 if x == 'Training' else 0.5)
gby

In [None]:
# Save df to path set above
gby.to_csv(gby_df_path, index=False)

## Data Count Visualizations

Plots display both usage and emotion in multiple formats to show the data distribution.

### Bar Plots

Key options for customizing bar plots include:

- `output_path`: if `None`, the figure is not saved; otherwise it is saved to the given path.
- `is_stacked`: boolean; if `True`, usage values are stacked, otherwise they are grouped.
- `auto_text`: boolean, default `False`; if `True`, adds counts as text above the bars.
- `legend_note`: if not `None`, the text is displayed in a box; use it for additional details as needed.

In [None]:
# Set order of categories for consistency across plots
filtered_order = [cat for cat in category_order if cat in gby['emotion'].unique()]

In [None]:
annotation_text = """
Train Images: Solid color   Test Images: 0.5 opacity with 'x' pattern
"""

In [None]:
# key settings: stacked, totals displayed
fig = plot_emotion_counts(
    dataframe=gby, 
    x_axis='emotion', 
    y_axis='Count',
    color_by='usage', 
    plot_title='Emotion Counts by Usage (Train/Test)', 
    output_path=os.path.join(image_dir, 'emotion_usage_count_bar.png'),
    is_stacked=True,
    styling_dict=style_dict,
    legend_note=annotation_text,
    auto_text=True,
    order_categories=filtered_order    
)
fig.show()

### Waffle Graph

A waffle graph is helpful for assessing class proportions at a glance.

Counts are reduced to the specified number of cells (`rows` × `columns`) and displayed by color.
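
The internals of `plot_emotion_waffle` are not shown in this notebook, so the following is only a sketch of one common way to do the reduction: scale each count to its share of the grid, round down, then hand the leftover cells to the categories with the largest remainders. The function name `allocate_waffle_cells` and the example counts are hypothetical.

```python
from math import floor


def allocate_waffle_cells(counts, rows=20, columns=20):
    """Scale raw category counts down to a rows x columns grid of cells."""
    total_cells = rows * columns
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")

    # Exact (fractional) number of cells each category would get
    exact = {name: value * total_cells / total for name, value in counts.items()}

    # Round down first so the grid is never over-filled
    cells = {name: floor(share) for name, share in exact.items()}
    leftover = total_cells - sum(cells.values())

    # Give leftover cells to the categories with the largest fractional parts
    by_remainder = sorted(exact, key=lambda name: exact[name] - cells[name], reverse=True)
    for name in by_remainder[:leftover]:
        cells[name] += 1
    return cells


# Made-up counts, not taken from the dataset
example_counts = {"Happy": 5, "Sad": 3, "Surprise": 1}
cells = allocate_waffle_cells(example_counts, rows=2, columns=5)
print(cells)
```

With this approach the cell counts always add up to `rows * columns`, which plain rounding does not guarantee.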

In [None]:
waffle_path = os.path.join(image_dir, 'waffle_side_by_side.png')

In [None]:
plot_emotion_waffle(
    dataframe=gby,
    count_column='Count',
    group_column='emotion',
    split_column='usage',
    color_column='color',
    rows=20,
    columns=20,
    output_path=waffle_path
)