# UTKFace Image Dataset Analysis

-----

## Statistical Significance

In our analysis, we contextualised all sample images by age range, binary gender norms, and the given race categories. We then attempted to establish statistical significance for every combination of these categories by evaluating the central tendency of the age inferences associated with each combination. Specifically, we applied D’Agostino’s (1986) normality test to the age estimations for each combination and disregarded every instance in which the null hypothesis could not be rejected at the 95% confidence level (a significance level of 0.05). The details are conveyed in the tables below: each row reports the test statistic and p-value for one combination, and `normality_test_passed` records whether that combination was retained.
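As a minimal sketch of the kind of test described above, the snippet below applies `scipy.stats.normaltest` (SciPy's D'Agostino-Pearson omnibus test, which combines a skewness test and a kurtosis test) to one group of age estimations. The helper name `normality_summary`, its return keys, the synthetic inputs, and the choice to skip the test for fewer than 20 samples are assumptions made for illustration; the implementation of `statistic_details_for_age_range` is not shown in this notebook and may differ.

```python
import numpy as np
from scipy import stats


def normality_summary(age_estimations, alpha=0.05):
    """Summarise one group of age estimations (hypothetical helper)."""
    ages = np.asarray(age_estimations, dtype=float)
    n = int(ages.size)
    summary = {
        "n_participants": n,
        "mean": float(ages.mean()),
        "standard_deviation": float(ages.std()),
        # Sample-size flags mirroring the column names in the result tables
        "skew_test_samples_greater_than_or_equal_to_8": n >= 8,
        "kurtosis_test_n_greater_than_or_equal_to_20": n >= 20,
        "normality_test_statistic": None,
        "normality_test_p_value": None,
        "null_rejected": False,
    }
    # Assumption: only run the omnibus test when both flags above hold
    if n >= 20:
        statistic, p_value = stats.normaltest(ages)
        summary["normality_test_statistic"] = float(statistic)
        summary["normality_test_p_value"] = float(p_value)
        summary["null_rejected"] = bool(p_value < alpha)
    return summary


rng = np.random.default_rng(0)
skewed_group = rng.exponential(scale=10.0, size=500)    # synthetic, strongly skewed
small_group = rng.normal(loc=20.0, scale=5.0, size=10)  # synthetic, too small to test here

print(normality_summary(skewed_group))
print(normality_summary(small_group))
```

Here `alpha=0.05` corresponds to the 95% confidence level used in the analysis, and `null_rejected` plays the role that `normality_test_passed` appears to play in the tables.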

In [17]:
import json
import numpy as np
import pandas as pd
from PIL import Image
import seaborn as sns
import matplotlib.pyplot as plt
plt.ioff()
%matplotlib inline
from analysis_3_filter_visualise import *

# Load the results generated by our analysis for the UTKFace dataset
with open("metadataset_utkface_activations.json") as results_file:
    utkface_data = json.load(results_file)

# This function generates and describes the statistical significance for a given age range
def describe_statistical_significance(this_age_range):
    confidence_level = 0.95
    csv_filename = f"{this_age_range[0]}_{this_age_range[1]}_statistical_significance.csv"
    # Generate the significance statistics for the given age range and save them to CSV
    statistic_details_for_age_range(utkface_data, this_age_range, (1 - confidence_level)).to_csv(csv_filename)
    # Describe the results, dropping the index column written by `to_csv`
    return pd.read_csv(csv_filename).iloc[:, 1:]


In [18]:
# Age range of `0 - 12' years old
describe_statistical_significance(AGE_RANGES[0])

index,age_min,age_max,race,gender,mean,standard_deviation,normality_test_p_value,normality_test_statistic,n_participants,kurtosis_test_n_greater_than_or_equal_to_20,normality_test_passed,skew_test_samples_greater_than_or_equal_to_8,k_test_statistic,k_test_p
0,0,12,Caucasian,Male,16.421141,9.20181,6.884956e-46,207.979151,596,True,True,True,3073.190457,0.0
1,0,12,Asian,Male,18.261224,9.11811,1.962729e-26,118.385753,490,True,True,True,2230.877962,5.978656e-220
2,0,12,African,Male,19.089286,11.943732,0.001309472,13.276263,56,True,True,True,418.48363,2.310995e-57
3,0,12,Indian,Male,16.196347,9.677578,7.610336e-18,78.834049,219,True,True,True,1266.369326,3.432632e-147
4,0,12,Other,Male,19.337931,10.659767,4.070652e-06,24.823415,145,True,True,True,852.027104,6.652727e-101
5,0,12,Caucasian,Female,13.27454,8.17484,4.514975e-78,356.188475,652,True,True,True,3282.362912,0.0
6,0,12,Asian,Female,15.954338,9.237244,1.539899e-36,164.922693,438,True,True,True,2342.503148,2.218738e-257
7,0,12,African,Female,15.338028,9.860925,1.114959e-07,32.018556,71,True,True,True,450.115702,6.873597e-57
8,0,12,Indian,Female,13.940741,6.826162,5.534295e-36,162.364198,270,True,True,True,902.466525,2.044862e-69
9,0,12,Other,Female,16.956863,10.527001,8.044615e-16,69.512717,255,True,True,True,1666.494912,6.897608e-206


In [19]:
# Age range of `13 - 18' years old
describe_statistical_significance(AGE_RANGES[1])

index,age_min,age_max,race,gender,mean,standard_deviation,normality_test_p_value,normality_test_statistic,n_participants,kurtosis_test_n_greater_than_or_equal_to_20,normality_test_passed,skew_test_samples_greater_than_or_equal_to_8,k_test_statistic,k_test_p
0,13,18,Caucasian,Male,20.145594,9.860029,4.099221e-14,61.650789,261,True,True,True,1259.554203,9.748909e-131
1,13,18,Asian,Male,23.263158,10.222998,0.1638143,3.618043,19,False,False,True,85.357466,9.738341e-11
2,13,18,African,Male,16.7,5.1,0.5775812,1.097812,10,False,False,True,15.57485,0.07630838
3,13,18,Indian,Male,19.823529,4.488355,0.7806819,0.495175,34,True,False,True,34.551929,0.3935891
4,13,18,Other,Male,27.787879,10.764019,1.497289e-06,26.823708,33,True,True,True,137.59651,4.711097e-15
5,13,18,Caucasian,Female,18.33123,7.947823,5.236566e-24,107.212753,317,True,True,True,1092.355533,4.334189e-86
6,13,18,Asian,Female,18.363636,7.842457,9.436909e-05,18.536594,33,True,True,True,110.524752,1.424204e-10
7,13,18,African,Female,22.65,10.978502,0.2265364,2.969699,20,True,False,True,106.426049,3.616165e-14
8,13,18,Indian,Female,23.041667,8.955814,0.05052273,5.970664,48,True,False,True,167.084991,2.319378e-15
9,13,18,Other,Female,25.085714,11.234822,6.866226e-06,23.777792,70,True,True,True,352.211845,1.379439e-39


In [20]:
# Age range of `19 - 25' years old
describe_statistical_significance(AGE_RANGES[2])

index,age_min,age_max,race,gender,mean,standard_deviation,normality_test_p_value,normality_test_statistic,n_participants,kurtosis_test_n_greater_than_or_equal_to_20,normality_test_passed,skew_test_samples_greater_than_or_equal_to_8,k_test_statistic,k_test_p
0,19,25,Caucasian,Male,26.452055,7.465062,0.01536667,8.351109,73,True,True,True,153.790782,7.12582e-08
1,19,25,Asian,Male,25.827586,7.367538,0.3578508,2.055279,29,True,False,True,60.947931,0.0003077492
2,19,25,African,Male,31.2,8.340264,0.4043407,1.810995,10,False,False,True,22.294872,0.007989972
3,19,25,Indian,Male,28.783784,6.160621,0.001602447,12.872447,37,True,True,True,48.786854,0.07566282
4,19,25,Other,Male,28.241379,8.546648,0.339299,2.161747,58,True,False,True,150.014652,2.659834e-10
5,19,25,Caucasian,Female,26.16568,8.517772,3.243211e-05,20.672723,169,True,True,True,468.604704,3.369215e-30
6,19,25,Asian,Female,24.704762,8.061577,0.003314348,11.418989,105,True,True,True,276.215883,1.507787e-17
7,19,25,African,Female,32.266667,8.598191,0.1027109,4.551673,15,False,False,True,34.367769,0.001821845
8,19,25,Indian,Female,25.314433,6.658,1.269124e-15,68.600899,194,True,True,True,339.720016,3.560324e-10
9,19,25,Other,Female,25.705882,7.127121,5.776758e-07,28.728506,170,True,True,True,335.926773,3.944214e-13


In [21]:
# Age range of `26 - 100' years old
describe_statistical_significance(AGE_RANGES[3])

index,age_min,age_max,race,gender,mean,standard_deviation,normality_test_p_value,normality_test_statistic,n_participants,kurtosis_test_n_greater_than_or_equal_to_20,normality_test_passed,skew_test_samples_greater_than_or_equal_to_8,k_test_statistic,k_test_p
0,26,100,Caucasian,Male,49.932969,13.211823,7.954385e-18,78.745617,1462,True,True,True,5110.760253,0.0
1,26,100,Asian,Male,43.536313,12.581469,3.347995e-05,20.609127,179,True,True,True,650.824843,4.338521e-55
2,26,100,African,Male,46.604839,12.393949,0.01474482,8.433728,124,True,True,True,408.705139,2.333844e-32
3,26,100,Indian,Male,41.853448,12.512583,0.0001123381,18.187995,232,True,True,True,867.861998,1.690717e-74
4,26,100,Other,Male,39.17619,8.911331,0.001285709,13.31289,210,True,True,True,425.678984,6.313727e-17
5,26,100,Caucasian,Female,49.361963,14.628748,3.316732e-09,39.048572,1630,True,True,True,7066.583644,0.0
6,26,100,Asian,Female,40.152174,16.861027,2.307066e-05,21.353898,230,True,True,True,1628.496481,2.72875e-209
7,26,100,African,Female,43.87234,13.13371,0.0003251168,16.062652,94,True,True,True,369.582929,1.241257e-34
8,26,100,Indian,Female,34.108586,12.722795,2.463334e-18,81.090032,396,True,True,True,1879.301918,2.247019e-191
9,26,100,Other,Female,34.794521,10.34535,2.123751e-06,26.124654,146,True,True,True,449.088976,8.077914e-33
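
The combinations that are disregarded can be read directly off the `normality_test_passed` column. The sketch below does this with pandas for three rows copied from the 13 to 18 table above (a subset of the columns); the helper name `disregarded_combinations` is an assumption made for illustration and is not part of the original analysis code.

```python
import io

import pandas as pd

# Three rows copied from the 13 to 18 age-range table (subset of columns)
SAMPLE_CSV = """age_min,age_max,race,gender,normality_test_p_value,n_participants,normality_test_passed
13,18,Caucasian,Male,4.099221e-14,261,True
13,18,Asian,Male,0.1638143,19,False
13,18,African,Male,0.5775812,10,False
"""


def disregarded_combinations(table):
    """Return the rows whose normality_test_passed flag is False (hypothetical helper)."""
    failed = table[~table["normality_test_passed"].astype(bool)]
    columns = ["age_min", "age_max", "race", "gender",
               "n_participants", "normality_test_p_value"]
    return failed[columns].reset_index(drop=True)


sample_table = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(disregarded_combinations(sample_table))
```

The same function could be applied to each CSV file written by `describe_statistical_significance`, since those files carry the same column names.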


## Notes

As a further step of statistical analysis that could improve the validity of these results, bootstrapping could be undertaken to remove doubts about bias in sequencing.
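
As one possible reading of this suggestion, the sketch below computes a percentile bootstrap confidence interval for the mean age estimation of a single group using NumPy. The function name `bootstrap_mean_ci`, the number of resamples, and the synthetic input are assumptions for illustration; the analysis above does not specify how bootstrapping would be applied.

```python
import numpy as np


def bootstrap_mean_ci(values, n_resamples=2000, confidence_level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample with replacement: one row of indices per bootstrap replicate
    indices = rng.integers(0, values.size, size=(n_resamples, values.size))
    resampled_means = values[indices].mean(axis=1)
    # Take the central `confidence_level` mass of the resampled means
    tail = (1.0 - confidence_level) / 2.0
    lower, upper = np.quantile(resampled_means, [tail, 1.0 - tail])
    return float(lower), float(upper)


rng = np.random.default_rng(1)
synthetic_ages = rng.normal(loc=20.0, scale=9.0, size=200)  # synthetic data, not UTKFace results
lower, upper = bootstrap_mean_ci(synthetic_ages)
print(f"95% bootstrap CI for the mean: [{lower:.2f}, {upper:.2f}]")
```

Because each replicate draws samples in a random order with replacement, the resulting interval does not depend on the order in which the original samples were processed.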

## References

* D’Agostino RB, Stephens MA (1986), _Goodness-of-Fit Techniques._