# UTKFace Image Dataset Analysis

-----

## Pearson Correlations Of Contributing Image Segments To Classifications

Here we detail the relationship between the image segments that contribute to age estimations and the combinations of racial categories and age ranges. We do this by considering the percentage that each image segment contributes to the inference of a subject’s age, aggregating the results over the entire sample, and grouping them by racial category and age range.
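As a rough illustration of the aggregation described above, the sketch below converts per-segment scores into percentage contributions and bins subjects into the age ranges used in this notebook. The data is invented, and the helper names `to_contribution_pct` and `add_age_range` are hypothetical; this is not the code behind `generate_processed_datasets`, only a minimal sketch assuming one row per subject and one non-negative score per segment.

```python
import pandas as pd

# Segment names taken from the columns of the results table in this notebook.
SEGMENTS = ["face", "outer_boundary_left", "outer_boundary_right",
            "outer_boundary_top", "outer_boundary_bottom"]

# Invented per-subject segment scores, for illustration only.
raw = pd.DataFrame({
    "race": ["Asian", "Asian", "Indian", "Indian"],
    "age": [5, 30, 15, 22],
    "face": [6.0, 2.0, 5.0, 1.0],
    "outer_boundary_left": [1.0, 2.0, 1.0, 1.0],
    "outer_boundary_right": [1.0, 2.0, 2.0, 1.0],
    "outer_boundary_top": [1.0, 2.0, 1.0, 1.0],
    "outer_boundary_bottom": [1.0, 2.0, 1.0, 1.0],
})


def to_contribution_pct(df, segments=SEGMENTS):
    """Rescale each row so that the segment scores sum to 100 percent."""
    out = df.copy()
    totals = out[segments].sum(axis=1)
    out[segments] = out[segments].div(totals, axis=0) * 100.0
    return out


def add_age_range(df):
    """Label each subject with one of the four age ranges (inclusive bounds)."""
    out = df.copy()
    labels = ["0 - 12", "13 - 18", "19 - 25", "26 - 100"]
    # Right-closed bins: (-1, 12], (12, 18], (18, 25], (25, 100].
    out["age_range"] = pd.cut(out["age"], bins=[-1, 12, 18, 25, 100], labels=labels)
    return out


pct = add_age_range(to_contribution_pct(raw))

# Mean percentage contribution of each segment per (race, age range) group.
summary = pct.groupby(["race", "age_range"], observed=True)[SEGMENTS].mean()
print(summary)
```

Normalising per subject before aggregating keeps subjects with large overall scores from dominating a group's average.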

Thereafter, we compute the Pearson correlation coefficient over all image segments of all subjects within their respective groups to determine how strongly each segment is correlated or anticorrelated with the determination of a subject’s age. We report the correlations and filter specifically for age estimations that returned accurate results.
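For readers unfamiliar with the statistic, the following sketch computes a Pearson coefficient per group on invented data. The text does not state which variable each segment is paired with inside `visualise_heatmap_corr_pairplot`, so pairing each segment's contribution with the subject's age is an assumption made here, and `pearson_r` and `group_correlations` are hypothetical helpers rather than functions from `analysis_3_filter_visualise`.

```python
import numpy as np
import pandas as pd

# Two of the segment names from the results table, kept short for the example.
SEGMENTS = ["face", "outer_boundary_left"]


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))


def group_correlations(df, target="age", segments=SEGMENTS, group_col="race"):
    """Return one row per group holding r between each segment and the target."""
    rows = []
    for group, sub in df.groupby(group_col):
        row = {seg: pearson_r(sub[seg], sub[target]) for seg in segments}
        row[group_col] = group
        rows.append(row)
    return pd.DataFrame(rows)


# Invented contribution percentages; group "A" is perfectly (anti)correlated.
demo = pd.DataFrame({
    "race": ["A"] * 4 + ["B"] * 4,
    "age": [1, 2, 3, 4, 10, 20, 30, 40],
    "face": [10.0, 20.0, 30.0, 40.0, 5.0, 5.0, 6.0, 7.0],
    "outer_boundary_left": [40.0, 30.0, 20.0, 10.0, 9.0, 7.0, 8.0, 6.0],
})

correlations = group_correlations(demo)
print(correlations)
```

A coefficient near +1 or -1 indicates a strong linear relationship, while values near 0, such as most of those reported in this notebook, indicate little linear association.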

In [1]:
import json
import numpy as np
import pandas as pd
from PIL import Image
import seaborn as sns
import matplotlib.pyplot as plt
plt.ioff()
%matplotlib inline
from analysis_3_filter_visualise import *

# Load in the results for the UTKFace image activations dataset.
image_masks_fpath = "utkface/img-masks/"
# Use a context manager so the file handle is closed after reading.
with open("metadataset_utkface_activations.json") as f:
    utkface_data = json.load(f)
utkface_data_face_vs_outer, utkface_data_contribution_pct = generate_processed_datasets(utkface_data)

# Generate the correlations for every age range and classification type.
correlation_list = list()
for age_min, age_max in [[0, 12], [13, 18], [19, 25], [26, 100]]:
    for this_type in ["positive", "negative"]:
        # Filter the dataset to the current age range.
        filtered = filter_dataset(
            utkface_data_face_vs_outer[this_type], KEYS_RACE_AGE_BOUNDARIES,
            "race_real", RACES, age_min=age_min, age_max=age_max)
        # Visualise the filtered data and collect the returned correlations.
        correlation_list.extend(visualise_heatmap_corr_pairplot(
            filtered, image_masks_fpath, x_vars=KEYS_BOUNDARIES,
            pairplot_scatter_alpha=0.025,
            age_range="%s - %s" % (age_min, age_max), this_type=this_type))

# Produce the dataframe
pd.DataFrame(correlation_list)



| | face | outer_boundary_left | outer_boundary_right | outer_boundary_top | outer_boundary_bottom | race | age_range | classification_type |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.016513 | 0.005753 | -0.012306 | -0.079399 | 0.080752 | Caucasian | 0 - 12 | positive |
| 1 | 0.102614 | 0.063821 | -0.062802 | -0.058457 | -0.022206 | Asian | 0 - 12 | positive |
| 2 | -0.036242 | -0.146797 | 0.041303 | -0.092133 | 0.075251 | African | 0 - 12 | positive |
| 3 | 0.046936 | 0.050545 | -0.08047 | -0.112087 | 0.095438 | Indian | 0 - 12 | positive |
| 4 | 0.185341 | -0.053459 | -0.069378 | -0.10996 | 0.011903 | Other | 0 - 12 | positive |
| 5 | -0.084793 | 0.004101 | 0.050588 | 0.026588 | 0.043224 | Caucasian | 0 - 12 | negative |
| 6 | -0.043408 | -0.075418 | 0.02656 | 0.059268 | 0.037837 | Asian | 0 - 12 | negative |
| 7 | -0.071872 | -0.093589 | 0.036649 | -0.021509 | 0.05408 | African | 0 - 12 | negative |
| 8 | -0.111061 | 0.052495 | 0.174075 | -0.018333 | -0.017884 | Indian | 0 - 12 | negative |
| 9 | -0.012457 | 0.011175 | -0.012707 | -0.004625 | 0.026115 | Other | 0 - 12 | negative |


## Notes

Beyond the correlations given here, we isolate the results that contribute to Appendix B of our paper in the Excel worksheet "Correlations_Image_Segment_Activations.xlsx".