In [2]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from aequitas.group import Group
from aequitas.bias import Bias
from aequitas.preprocessing import preprocess_input_df



In [3]:
df = pd.read_csv("../../../examples/data/compas_for_aequitas.csv")
df.head()

Unnamed: 0,entity_id,score,label_value,race,sex,age_cat
0,1,0.0,0,Other,Male,Greater than 45
1,3,0.0,1,African-American,Male,25 - 45
2,4,0.0,1,African-American,Male,Less than 25
3,5,1.0,0,African-American,Male,Less than 25
4,6,0.0,0,Other,Male,25 - 45


In [81]:
df.shape

(7214, 6)

## Pre-Aequitas: Exploring the COMPAS Dataset

__Risk assessment by race__

COMPAS produces a risk score that predicts a person's likelihood of committing a crime in the next two years. The output is a score between 1 and 10 that maps to low, medium, or high risk. For Aequitas, we collapse this to a binary prediction: a score of 0 indicates a prediction of "low" risk according to COMPAS, while a 1 indicates "high" or "medium" risk.
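
The binary collapse can be sketched in a few lines. The cut points used here (deciles 1 to 4 as low, 5 to 7 as medium, 8 to 10 as high) follow ProPublica's published analysis and are an assumption as far as this notebook is concerned; `decile_to_binary_score` is a hypothetical helper, not part of Aequitas.

```python
def decile_to_binary_score(decile: int) -> float:
    """Collapse a COMPAS decile score (1-10) into the binary score used here.

    Assumption: deciles 1-4 are "low" risk; anything higher is "medium" or
    "high" risk and is treated as a positive prediction.
    """
    if not 1 <= decile <= 10:
        raise ValueError(f"decile score must be between 1 and 10, got {decile}")
    return 0.0 if decile <= 4 else 1.0


# Map every possible decile score to its binary counterpart.
binary_scores = [decile_to_binary_score(d) for d in range(1, 11)]
```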

This categorization is based on ProPublica's interpretation of Northpointe's practitioner's guide:

    "According to Northpointe’s practitioners guide, COMPAS “scores in the medium and high range garner more interest from supervision agencies than low scores, as a low score would suggest there is little risk of general recidivism,” so we considered scores any higher than “low” to indicate a risk of recidivism."

In the bar charts below, we see a large difference in how these scores are distributed by race, with a majority of white and Hispanic people predicted as low risk (score = 0) and a majority of black people predicted high and medium risk (score = 1). We also see that while the majority of people in age categories over 25 are predicted as low risk (score = 0), the majority of people below 25 are predicted as high and medium risk (score = 1).

### Data Formatting

Data for this example was preprocessed for compatibility with Aequitas. **The Aequitas tool always requires a `score` column and requires a binary `label_value` column for supervised metrics** (i.e., False Discovery Rate, False Positive Rate, False Omission Rate, and False Negative Rate).

Preprocessing includes but is not limited to checking for mandatory `score` and `label_value` columns as well as at least one column representing attributes specific to the data set. See [documentation](../input_data.html) for more information about input data.

Note that while `entity_id` is not necessary for this example, Aequitas recognizes `entity_id` as a reserved column name and will not treat it as an attribute column.
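
As a rough illustration of these column conventions, the sketch below checks for the mandatory columns and treats every column that is not a reserved name as an attribute. The reserved names are the ones listed in the `non_attr_cols` cell of this notebook; `split_columns` is a hypothetical helper and not the actual `preprocess_input_df` implementation.

```python
REQUIRED_COLS = ("score", "label_value")
RESERVED_COLS = ("id", "model_id", "entity_id", "score", "label_value",
                 "rank_abs", "rank_pct")


def split_columns(columns):
    """Return the attribute columns, or raise if the input is unusable."""
    missing = [c for c in REQUIRED_COLS if c not in columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    # Anything that is not a reserved name is treated as an attribute.
    attr_cols = [c for c in columns if c not in RESERVED_COLS]
    if not attr_cols:
        raise ValueError("at least one attribute column is required")
    return attr_cols


attr_cols_example = split_columns(
    ["entity_id", "score", "label_value", "race", "sex", "age_cat"])
```

With the COMPAS columns used here, `entity_id` is skipped and `race`, `sex`, and `age_cat` remain as attributes.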

[Back to Top](#top_cell)
<a id='existing_biases'></a>
## What biases exist in my model?
### Aequitas Group() Class

<a id='xtab'></a>
### What is the distribution of groups, predicted scores, and labels across my dataset?

Aequitas's `Group()` class enables researchers to evaluate biases across all subgroups in their dataset by assembling a confusion matrix for each subgroup, calculating commonly used metrics such as false positive rate and false omission rate, and reporting counts by group and group prevalence among the sample population.

The **`get_crosstabs()`** method tabulates a confusion matrix for each subgroup and calculates commonly used metrics such as false positive rate and false omission rate. It also provides counts by group and group prevalences.

#### Group Counts Calculated:

| Count Type | Column Name |
| --- | --- |
| False Positive Count | 'fp' |
| False Negative Count | 'fn' |
| True Negative Count | 'tn' |
| True Positive Count | 'tp' |
| Predicted Positive Count | 'pp' |
| Predicted Negative Count | 'pn' |
| Count of Negative Labels in Group | 'group_label_neg' |
| Count of Positive Labels in Group | 'group_label_pos' | 
| Group Size | 'group_size'|
| Total Entities | 'total_entities' |
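
Most of these counts follow directly from the four confusion-matrix cells. The sketch below spells out the arithmetic in plain Python, using the African-American cells reported in the crosstab output of this notebook; `derived_counts` is an illustrative helper, not an Aequitas function.

```python
def derived_counts(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Derive the remaining group counts from the confusion-matrix cells."""
    return {
        "pp": tp + fp,               # predicted positive
        "pn": tn + fn,               # predicted negative
        "group_label_pos": tp + fn,  # entities whose label is 1
        "group_label_neg": tn + fp,  # entities whose label is 0
        "group_size": tp + fp + tn + fn,
    }


# Cells for race = African-American, taken from the crosstab in this notebook.
counts = derived_counts(tp=1369, fp=805, tn=990, fn=532)
```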

#### Absolute Metrics Calculated:

| Metric | Column Name |
| --- | --- |
| True Positive Rate | 'tpr' |
| True Negative Rate | 'tnr' |
| False Omission Rate | 'for' |
| False Discovery Rate | 'fdr' |
| False Positive Rate | 'fpr' |
| False Negative Rate | 'fnr' |
| Negative Predictive Value | 'npv' |
| Precision | 'precision' |
| Predicted Positive Ratio$_k$ | 'ppr' |
| Predicted Positive Ratio$_g$ | 'pprev' |
| Group Prevalence | 'prev' |
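
For reference, the sketch below computes each metric from a group's confusion-matrix cells using the standard definitions. Applied to the African-American counts in this notebook, the ratios reproduce the rounded values in the metrics output. Here `k` is the number of predicted positives across all groups (3317 in this dataset). `absolute_metrics_from_counts` is a hypothetical helper, and Aequitas's internal implementation may differ.

```python
def absolute_metrics_from_counts(tp, fp, tn, fn, k):
    """Compute the group metrics from confusion-matrix cells.

    k is the number of predicted positives across the whole population.
    No guard is included for empty denominators.
    """
    pp, pn = tp + fp, tn + fn    # predicted positive / predicted negative
    pos, neg = tp + fn, tn + fp  # label positive / label negative
    size = pp + pn
    return {
        "tpr": tp / pos,
        "tnr": tn / neg,
        "for": fn / pn,
        "fdr": fp / pp,
        "fpr": fp / neg,
        "fnr": fn / pos,
        "npv": tn / pn,
        "precision": tp / pp,
        "ppr": pp / k,       # share of all predicted positives in this group
        "pprev": pp / size,  # share of this group predicted positive
        "prev": pos / size,  # share of this group with a positive label
    }


metrics = absolute_metrics_from_counts(tp=1369, fp=805, tn=990, fn=532, k=3317)
rounded = {name: round(value, 2) for name, value in metrics.items()}
```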


**Note**: The **`get_crosstabs()`** method expects a dataframe with the predefined columns `score` and `label_value`, and treats other columns (with a few exceptions) as attributes against which to test for disparities. In this case we include `race`, `sex`, and `age_cat`.

In [6]:
g = Group()
xtab, _, score_thresholds_dict = g.get_crosstabs(df)

model_id, score_thresholds 1 {'rank_abs': [3317]}
COUNTS::: race
African-American    3696
Asian                 32
Caucasian           2454
Hispanic             637
Native American       18
Other                377
dtype: int64
COUNTS::: sex
Female    1395
Male      5819
dtype: int64
COUNTS::: age_cat
25 - 45            4109
Greater than 45    1576
Less than 25       1529
dtype: int64


In [83]:
preprocessed_df, _ = preprocess_input_df(df, required_cols=['score', 'label_value'])

In [84]:
preprocessed_df.head()

Unnamed: 0,entity_id,score,label_value,race,sex,age_cat
0,1,0.0,0,Other,Male,Greater than 45
1,3,0.0,1,African-American,Male,25 - 45
2,4,0.0,1,African-American,Male,Less than 25
3,5,1.0,0,African-American,Male,Less than 25
4,6,0.0,0,Other,Male,25 - 45


In [107]:
non_attr_cols = [
                'id', 'model_id', 'entity_id', 'score', 'label_value',
                'rank_abs', 'rank_pct']
# index of the columns that are attributes
attr_cols = df.columns[~df.columns.isin(non_attr_cols)]

In [108]:
score_thresholds_dict = None
count_ones = None
if not score_thresholds_dict:
    df['score'] = df['score'].astype(float)
    count_ones = preprocessed_df['score'].value_counts().get(1.0, 0)
    score_thresholds_dict = {'rank_abs': [count_ones]}

    df = df.sort_values('score', ascending=False)
    df['rank_abs'] = range(1, len(df) + 1)
    df['rank_pct'] = df['rank_abs'] / len(df)

In [109]:
df.head()

Unnamed: 0,entity_id,score,label_value,race,sex,age_cat,rank_abs,rank_pct
3607,5511,1.0,0,Caucasian,Female,25 - 45,1,0.000139
5638,8598,1.0,1,African-American,Male,Less than 25,2,0.000277
5661,8630,1.0,1,African-American,Male,25 - 45,3,0.000416
2852,4365,1.0,0,Caucasian,Male,25 - 45,4,0.000554
2853,4367,1.0,1,African-American,Female,25 - 45,5,0.000693


In [168]:
# Each factory binds a rank column, label column, and threshold, and returns a
# function that flags the rows of a dataframe with 0/1. Rows ranked at or below
# the threshold are the predicted positives, so no row falls between the cases.
binary_true_pos = lambda rank_col, label_col, thres: lambda x: ((x[rank_col] <= thres) & (x[label_col] == 1)).astype(int)

binary_false_pos = lambda rank_col, label_col, thres: lambda x: ((x[rank_col] <= thres) & (x[label_col] == 0)).astype(int)

binary_true_neg = lambda rank_col, label_col, thres: lambda x: ((x[rank_col] > thres) & (x[label_col] == 0)).astype(int)

binary_false_neg = lambda rank_col, label_col, thres: lambda x: ((x[rank_col] > thres) & (x[label_col] == 1)).astype(int)



In [169]:
# true_pos_count = lambda rank_col, label_col, thres, k: lambda x: \
#     ((x[rank_col] <= thres) & (x[label_col] == 1)).sum()

In [170]:
binary_col_functions = {'b_tp': binary_true_pos,
                        'b_fp': binary_false_pos,
                        'b_tn': binary_true_neg,
                        'b_fn': binary_false_neg
                        }
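
To see what these indicator columns encode, here is a plain-Python sketch of the same rank-threshold rule on a toy list of rows. It assumes that rows ranked at or below the threshold count as predicted positive, as the `<=` in the commented-out count helper suggests; `confusion_flags` and the toy rows are made up for illustration.

```python
def confusion_flags(rank: int, label: int, threshold: int) -> dict:
    """Return 0/1 flags for one row under a rank-based threshold."""
    predicted_positive = rank <= threshold
    return {
        "b_tp": int(predicted_positive and label == 1),
        "b_fp": int(predicted_positive and label == 0),
        "b_tn": int(not predicted_positive and label == 0),
        "b_fn": int(not predicted_positive and label == 1),
    }


# Toy data: (rank_abs, label_value) pairs with a threshold of 2.
toy_rows = [(1, 1), (2, 0), (3, 0), (4, 1)]
flags = [confusion_flags(rank, label, threshold=2) for rank, label in toy_rows]
```

Each row sets exactly one flag, so summing the flags within a group yields that group's confusion matrix.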

In [171]:
score_thresholds_dict

{'rank_abs': [3317]}

In [172]:
attr_cols

Index(['race', 'sex', 'age_cat'], dtype='object')

In [173]:
for col in attr_cols:
    # group rows by attribute value, keeping missing values as their own group
    col_group = df.fillna({col: 'pd.np.nan'}).groupby(col)
    for thres_unit, thres_values in score_thresholds_dict.items():
        for thres_val in thres_values:
            for name, func in binary_col_functions.items():
                # bind the rank column, label column, and threshold, then flag each row
                flag_func = func(thres_unit, 'label_value', thres_val)
                df[name] = col_group.apply(flag_func).reset_index(level=0, drop=True)

In [7]:
g.get_statistical_significance(df, attr_cols=["sex"])

['sex']


Unnamed: 0,entity_id,score,label_value,race,sex,age_cat,rank_abs,rank_pct,b_tp,b_fp,b_tn,b_fn
3607,5511,1.0,0,Caucasian,Female,25 - 45,1,0.000139,0,1,0,0
5638,8598,1.0,1,African-American,Male,Less than 25,2,0.000277,1,0,0,0
5661,8630,1.0,1,African-American,Male,25 - 45,3,0.000416,1,0,0,0
2852,4365,1.0,0,Caucasian,Male,25 - 45,4,0.000554,0,1,0,0
2853,4367,1.0,1,African-American,Female,25 - 45,5,0.000693,1,0,0,0
5658,8626,1.0,1,Caucasian,Male,25 - 45,6,0.000832,1,0,0,0
2855,4370,1.0,0,African-American,Male,25 - 45,7,0.000970,0,1,0,0
2857,4372,1.0,1,African-American,Male,Less than 25,8,0.001109,1,0,0,0
2858,4375,1.0,1,Caucasian,Male,25 - 45,9,0.001248,1,0,0,0
2859,4376,1.0,0,Other,Female,Less than 25,10,0.001386,0,1,0,0


In [None]:
df

[Back to Top](#top_cell)
<a id='xtab_metrics'></a>
### What are bias metrics across groups?

Once you have called `get_crosstabs()` on a `Group()` instance, you'll have a dataframe of group counts and absolute group metrics.

The `Group()` class has a **`list_absolute_metrics()`** method, which you can use to slice the crosstab and view just the counts or just the metrics.

In [5]:
absolute_metrics = g.list_absolute_metrics(xtab)

#### View calculated counts across sample population groups

In [6]:
xtab[[col for col in xtab.columns if col not in absolute_metrics]]

Unnamed: 0,attribute_name,attribute_value,k,model_id,score_threshold,pp,pn,fp,fn,tn,tp,group_label_neg,group_label_pos,group_size,total_entities
0,race,African-American,3317,1,binary 0/1,2174,1522,805,532,990,1369,1795,1901,3696,7214
1,race,Asian,3317,1,binary 0/1,8,24,2,3,21,6,23,9,32,7214
2,race,Caucasian,3317,1,binary 0/1,854,1600,349,461,1139,505,1488,966,2454,7214
3,race,Hispanic,3317,1,binary 0/1,190,447,87,129,318,103,405,232,637,7214
4,race,Native American,3317,1,binary 0/1,12,6,3,1,5,9,8,10,18,7214
5,race,Other,3317,1,binary 0/1,79,298,36,90,208,43,244,133,377,7214
6,sex,Female,3317,1,binary 0/1,591,804,288,195,609,303,897,498,1395,7214
7,sex,Male,3317,1,binary 0/1,2726,3093,994,1021,2072,1732,3066,2753,5819,7214
8,age_cat,25 - 45,3317,1,binary 0/1,1924,2185,741,706,1479,1183,2220,1889,4109,7214
9,age_cat,Greater than 45,3317,1,binary 0/1,394,1182,181,285,897,213,1078,498,1576,7214


#### View calculated absolute metrics for each sample population group

In [7]:
xtab[['attribute_name', 'attribute_value'] + absolute_metrics].round(2)

Unnamed: 0,attribute_name,attribute_value,tpr,tnr,for,fdr,fpr,fnr,npv,precision,ppr,pprev,prev
0,race,African-American,0.72,0.55,0.35,0.37,0.45,0.28,0.65,0.63,0.66,0.59,0.51
1,race,Asian,0.67,0.91,0.12,0.25,0.09,0.33,0.88,0.75,0.0,0.25,0.28
2,race,Caucasian,0.52,0.77,0.29,0.41,0.23,0.48,0.71,0.59,0.26,0.35,0.39
3,race,Hispanic,0.44,0.79,0.29,0.46,0.21,0.56,0.71,0.54,0.06,0.3,0.36
4,race,Native American,0.9,0.62,0.17,0.25,0.38,0.1,0.83,0.75,0.0,0.67,0.56
5,race,Other,0.32,0.85,0.3,0.46,0.15,0.68,0.7,0.54,0.02,0.21,0.35
6,sex,Female,0.61,0.68,0.24,0.49,0.32,0.39,0.76,0.51,0.18,0.42,0.36
7,sex,Male,0.63,0.68,0.33,0.36,0.32,0.37,0.67,0.64,0.82,0.47,0.47
8,age_cat,25 - 45,0.63,0.67,0.32,0.39,0.33,0.37,0.68,0.61,0.58,0.47,0.46
9,age_cat,Greater than 45,0.43,0.83,0.24,0.46,0.17,0.57,0.76,0.54,0.12,0.25,0.32
