# Aequitas Introduction

Practitioners face the challenge of determining whether such patterns reflect bias. Having multiple ways to measure bias adds complexity to the decision-making process. Aequitas is a tool that automates the reporting of various fairness metrics to aid in this process.

In this introduction, we provide a high-level overview of the bias reports output by Aequitas.

In [4]:
import pandas as pd
import seaborn as sns
%matplotlib inline

In [5]:
def color_tf(val):
    """
    Takes a scalar and returns a string with the css property
    `'color: green'` for True, `'color: red'` for False, and an
    empty color otherwise.
    """
    if val == True:
        color = 'green'
    elif val == False:
        color = 'red'
    else:
        color = ''
    return 'color: %s' % color

In [6]:
df = pd.read_csv('../data/aequinas_test_20180223.csv')

### Input data

You input a csv using the command-line function `aequitas_audit`. The csv has columns for `entity_id`, `score`, and `label_value`, as well as group attributes against which to test for disparities. In this case we include `race`, `sex` and `age_cat`. (The test file previewed below names these attributes `race`, `gender` and `age`.)
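
As a quick sanity check before running an audit, the column requirements above can be verified with pandas. This is only a minimal sketch: the helper `missing_input_columns` is hypothetical and not part of Aequitas, and the toy rows are made up for illustration.

```python
import pandas as pd

# Columns the text above says every input csv needs.
REQUIRED_COLUMNS = ["entity_id", "score", "label_value"]


def missing_input_columns(frame, group_attributes):
    """Return the required or group-attribute columns absent from `frame`.

    Hypothetical helper for illustration; not part of Aequitas.
    """
    expected = REQUIRED_COLUMNS + list(group_attributes)
    return [col for col in expected if col not in frame.columns]


# Toy data (made up) with one group attribute deliberately left out.
toy = pd.DataFrame({
    "entity_id": [1, 2, 3],
    "score": [1, 0, 1],
    "label_value": [1, 0, 0],
    "race": ["W", "B", "B"],
    "sex": ["M", "M", "F"],
})

missing = missing_input_columns(toy, ["race", "sex", "age_cat"])
print(missing)
```

Here the toy frame lacks `age_cat`, so that is the only column reported as missing.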

In [7]:
df[['model_id', 'entity_id', 'score', 'label_value','race', 'gender', 'age']].head()

Unnamed: 0,model_id,entity_id,score,label_value,race,gender,age
0,1,1,1,1,W,M,3
1,1,2,1,0,B,M,3
2,1,3,1,1,B,F,2
3,1,4,1,1,W,M,1
4,1,5,1,0,B,M,4


### Output
`aequitas_audit` returns multiple levels of analysis. We start by looking at supervised fairness at the group level.

#### Group-level 'supervised fairness'

Supervised fairness is built on the Type I and Type II errors familiar from statistics. In machine learning, these are captured by the False Discovery Rate (FDR) or False Positive Rate (FPR) for Type I errors and by the False Omission Rate (FOmR) or False Negative Rate (FNR) for Type II errors.
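
For reference, the four error rates named above follow from the cells of a confusion matrix using their standard definitions. The sketch below uses made-up counts, and the function name `error_rates` is ours rather than an Aequitas API.

```python
def error_rates(tp, fp, tn, fn):
    """Compute the four error rates from confusion-matrix counts.

    Standard definitions:
      FPR  = FP / (FP + TN)   false positives among actual negatives
      FDR  = FP / (FP + TP)   false positives among predicted positives
      FNR  = FN / (FN + TP)   false negatives among actual positives
      FOmR = FN / (FN + TN)   false negatives among predicted negatives
    """
    return {
        "FPR": fp / (fp + tn),
        "FDR": fp / (fp + tp),
        "FNR": fn / (fn + tp),
        "FOmR": fn / (fn + tn),
    }


# Made-up counts for illustration only.
rates = error_rates(tp=40, fp=10, tn=30, fn=20)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

Note that FPR and FNR condition on the true outcome, while FDR and FOmR condition on the prediction, which is why the two members of each error type can disagree.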

A model is considered to display "supervised fairness" if disparity ratios of the above statistics fall within an acceptable range. 

$$ \text{Disparity} = \frac{\text{metric}_{\text{group}}}{\text{metric}_{\text{base group}}} $$
For details [link to more documentation]. 
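
As a worked instance of the ratio above, the sketch below divides one group's false positive rate by the base group's. The two rates are the African-American and Caucasian FPR values reported in the COMPAS table later in this notebook; the function name `disparity` is illustrative only.

```python
def disparity(metric_group, metric_base_group):
    """Ratio of a group's metric to the base group's metric."""
    if metric_base_group == 0:
        raise ValueError("base group metric is zero; the ratio is undefined")
    return metric_group / metric_base_group


fpr_african_american = 0.448468
fpr_caucasian = 0.234543  # base group for race

fpr_disparity = disparity(fpr_african_american, fpr_caucasian)
print(round(fpr_disparity, 2))
```

A ratio of 1 means the group's rate equals the base group's rate; values further from 1 indicate larger disparity in either direction.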

In [8]:
df.head()

Unnamed: 0,model_id,entity_id,score,rank_abs,rank_pct,label_value,race,gender,age
0,1,1,1,1,0.05,1,W,M,3
1,1,2,1,2,0.1,0,B,M,3
2,1,3,1,3,0.15,1,B,F,2
3,1,4,1,4,0.2,1,W,M,1
4,1,5,1,5,0.25,0,B,M,4


In [9]:
df = pd.read_csv('../data/compas_group_value_fairness.csv')
df[['group_value', 'group_variable', 'Supervised Fairness']].style.applymap(color_tf)

Unnamed: 0,group_value,group_variable,Supervised Fairness
0,African-American,race,False
1,Asian,race,False
2,Caucasian,race,True
3,Hispanic,race,True
4,Native American,race,False
5,Other,race,False
6,Female,sex,False
7,Male,sex,True
8,25 - 45,age_cat,True
9,Greater than 45,age_cat,False


In this case, our base groups are Caucasian for race, Male for sex, and 25 - 45 for age_cat. By construction, the base group has supervised fairness (its disparity ratio is 1). Relative to the base groups, the COMPAS predictions provide supervised fairness to only one other group, Hispanic.

To understand these results, we will look at the underlying metrics. 

In [12]:
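race_df = df[['group_value', 'group_variable', 'FOmR', 'FDR', 'FPR', 'FNR']]\
        .query("group_variable == 'race'").copy()  # copy to avoid writing to a slice of df
# Add a row of 1s so that bar lengths are scaled against a maximum of 1.
race_df.loc[df.index.max() + 1] = ["normalizer", "normalizer", 1, 1, 1, 1]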
race_df = race_df.style.set_properties(**{'border-style':'solid', 'border-color': 'white'})\
        .bar(subset=['FDR', 'FOmR', 'FPR', 'FNR'], align='mid', color='#4682B4')
    
race_df

# As far as I can tell, there is not a built in way to have normalized bar lengths without 
# the inclusion of a row of 1s.
# e.g.
# race_df.data.drop(axis=0, index=11, inplace=True)

# pandas.io.formats.style.Styler

Unnamed: 0,group_value,group_variable,FOmR,FDR,FPR,FNR
0,African-American,race,0.34954,0.370285,0.448468,0.279853
1,Asian,race,0.125,0.25,0.0869565,0.333333
2,Caucasian,race,0.288125,0.408665,0.234543,0.477226
3,Hispanic,race,0.288591,0.457895,0.214815,0.556034
4,Native American,race,0.166667,0.25,0.375,0.1
5,Other,race,0.302013,0.455696,0.147541,0.676692
11,normalizer,normalizer,1.0,1.0,1.0,1.0


By default, Aequitas considers a metric fair for a group if the group's rate is within 25% of the base group's rate. Above, the African-American false omission and false discovery rates are within the bounds of fairness. This result is expected because COMPAS is calibrated. (Given calibration, it is surprising that the Asian and Native American rates are so low. This may be a matter of having few observations for these groups.)

On the other hand, African-Americans are roughly twice as likely to have false positives and about 40 percent less likely to have false negatives. In real terms, 44.8% of African-Americans who did not recidivate were marked high or medium risk (with potential for associated penalties), compared with 23.4% of Caucasian non-reoffenders. This is unfair and is marked False below.
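
The parity flags can be recomputed from the rates in the table above. This sketch assumes that "within 25%" means the disparity ratio lies between 0.75 and 1.25; the exact bounds Aequitas applies may be defined differently, and the names `TOLERANCE` and `within_bounds` are ours.

```python
import pandas as pd

TOLERANCE = 0.25  # assumed reading of "within 25% of the base group"
METRICS = ["FOmR", "FDR", "FPR", "FNR"]

# Rates for race copied from the table above.
rates = pd.DataFrame(
    {
        "FOmR": [0.34954, 0.125, 0.288125, 0.288591, 0.166667, 0.302013],
        "FDR": [0.370285, 0.25, 0.408665, 0.457895, 0.25, 0.455696],
        "FPR": [0.448468, 0.0869565, 0.234543, 0.214815, 0.375, 0.147541],
        "FNR": [0.279853, 0.333333, 0.477226, 0.556034, 0.1, 0.676692],
    },
    index=["African-American", "Asian", "Caucasian", "Hispanic",
           "Native American", "Other"],
)

# Divide every row by the base group's row to get disparity ratios.
ratios = rates[METRICS] / rates.loc["Caucasian", METRICS]

# A metric passes when its ratio is within the assumed tolerance of 1.
within_bounds = (ratios - 1).abs() <= TOLERANCE

# A group passes overall only if all four metrics pass.
within_bounds["all_metrics"] = within_bounds[METRICS].all(axis=1)

print(ratios.round(2))
print(within_bounds)
```

Under this assumption, the African-American row passes on FOmR and FDR but fails on FPR and FNR, so it fails overall.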

In [7]:
df[['group_value', 'group_variable', 'FOmR Parity','FDR Parity',  'FPR Parity',
       'FNR Parity', 'Supervised Fairness']].query("group_variable == 'race'").style.applymap(color_tf)

Unnamed: 0,group_value,group_variable,FOmR Parity,FDR Parity,FPR Parity,FNR Parity,Supervised Fairness
0,African-American,race,True,True,False,False,False
1,Asian,race,False,False,False,False,False
2,Caucasian,race,True,True,True,True,True
3,Hispanic,race,True,True,True,True,True
4,Native American,race,False,False,False,False,False
5,Other,race,True,True,False,False,False


When comparing metrics by sex, the COMPAS predictions fail supervised fairness for female defendants because of disparities in the false omission rate (FOmR) and the false discovery rate (FDR).

In [8]:
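sex_df = df[['group_value', 'group_variable', 'FOmR', 'FDR', 'FPR', 'FNR']]\
        .query("group_variable == 'sex'").copy()  # copy to avoid writing to a slice of df
# Add a row of 1s so that bar lengths are scaled against a maximum of 1.
sex_df.loc[df.index.max() + 1] = ["normalizer", "normalizer", 1, 1, 1, 1]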
sex_df = sex_df.style.set_properties(**{'border-style':'solid', 'border-color': 'white'})\
        .bar(subset=['FDR', 'FOmR', 'FPR', 'FNR'], align='mid', color='#4682B4')
    
sex_df

Unnamed: 0,group_value,group_variable,FOmR,FDR,FPR,FNR
6,Female,sex,0.242537,0.48731,0.32107,0.391566
7,Male,sex,0.3301,0.364637,0.324201,0.370868
11,normalizer,normalizer,1.0,1.0,1.0,1.0


In [9]:
df[['group_value', 'group_variable', 'FOmR Parity','FDR Parity',  'FPR Parity',
       'FNR Parity', 'Supervised Fairness']].query("group_variable == 'sex'").style.applymap(color_tf)

Unnamed: 0,group_value,group_variable,FOmR Parity,FDR Parity,FPR Parity,FNR Parity,Supervised Fairness
6,Female,sex,False,False,True,True,False
7,Male,sex,True,True,True,True,True


#### Group-level unsupervised fairness

TEXT TO FOLLOW

In [30]:
# Copy so that adding the normalizer row does not trigger SettingWithCopyWarning.
df_uf = df[['group_value', 'group_variable','PPR', 'PPrev','Statistical Parity','Impact Parity','Unsupervised Fairness']].copy()

df_uf.loc[df.index.max() + 1] = ["normalizer","normalizer", 1, 1, "","",""]

df_uf.style.applymap(color_tf)\
     .set_properties(**{'border-style':'solid', 'border-color': 'white', 'white-space': 'nowrap'})\
     .bar(subset=['PPR', 'PPrev'], align='mid', color='#4682B4')
  


Unnamed: 0,group_value,group_variable,PPR,PPrev,Statistical Parity,Impact Parity,Unsupervised Fairness
0,African-American,race,0.655412,0.588203,False,False,False
1,Asian,race,0.00241182,0.25,False,False,False
2,Caucasian,race,0.257462,0.348003,True,True,True
3,Hispanic,race,0.0572807,0.298273,False,True,False
4,Native American,race,0.00361773,0.666667,False,False,False
5,Other,race,0.0238167,0.209549,False,False,False
6,Female,sex,0.178173,0.423656,False,True,False
7,Male,sex,0.821827,0.468465,True,True,True
8,25 - 45,age_cat,0.580042,0.46824,True,True,True
9,Greater than 45,age_cat,0.118782,0.25,False,False,False


### Group Variable Fairness

When there are many groups, it is useful to have a broader overview of fairness. Aequitas summarizes parity and fairness for "group variables" (e.g. race, sex, age category). As above, we report True for a group variable only if the parity measure is True for all of its subgroups. For example, below we see that Impact Parity is True for sex, which implies that Impact Parity holds for both male and female (this can be verified above).
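
The roll-up described above amounts to a logical AND over the subgroups of each group variable. Below is a minimal pandas sketch using the Impact Parity values from the group-level table shown earlier. Only the rows displayed there are included, so this illustrates the idea rather than reproducing the Aequitas computation.

```python
import pandas as pd

# Impact Parity per subgroup, copied from the group-level table above.
group_level = pd.DataFrame({
    "group_variable": ["race"] * 6 + ["sex"] * 2 + ["age_cat"] * 2,
    "group_value": ["African-American", "Asian", "Caucasian", "Hispanic",
                    "Native American", "Other", "Female", "Male",
                    "25 - 45", "Greater than 45"],
    "Impact Parity": [False, False, True, True, False, False,
                      True, True, True, False],
})

# A group variable is marked True only if every subgroup is True.
variable_level = group_level.groupby("group_variable")["Impact Parity"].all()
print(variable_level)
```

With these inputs only `sex` comes out True, since `race` and `age_cat` each contain at least one subgroup that fails Impact Parity.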

In [26]:
group_var_df = pd.read_csv('../data/compas_group_variable_fairness.csv',index_col=0)
display(group_var_df.style.applymap(color_tf))

Unnamed: 0,model_id,parameter,group_variable,Impact Parity,FDR Parity,FPR Parity,FOmR Parity,FNR Parity,TypeI Parity,TypeII Parity,Unsupervised Fairness,Supervised Fairness
0,1,3317_abs,age_cat,False,True,False,False,False,False,False,False,False
1,1,3317_abs,race,False,False,False,False,False,False,False,False,False
2,1,3317_abs,sex,True,False,True,False,True,False,False,False,False


SHOULD WE SAY SOMETHING LIKE ...
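The more subgroups under a "group variable", the more likely a parity measure will be False. As we saw above, despite calibration, the Asian and Native American groups fail FOmR and FDR parity, which may reflect the small number of observations for these groups. The user should consider how sample size affects their analysis and consider adjusting the data accordingly.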