## COMPAS Analysis using Aequitas
<a id='top_cell'></a>
Recent work in the Machine Learning community has raised concerns about the risk of unintended bias in Algorithmic Decision Making systems, affecting individuals unfairly. While many bias metrics and fairness definitions have been proposed in recent years, the community has not reached a consensus on which definitions and metrics should be used, and there have been very few empirical analyses of real-world problems using the proposed metrics.

We present the Aequitas toolkit as an intuitive addition to the machine learning workflow, enabling users to seamlessly test models for several bias and fairness metrics in relation to multiple population groups. We believe the tool will facilitate informed and equitable decision-making around developing and deploying predictive risk-assessment tools for both machine learning practitioners and policymakers, allowing researchers and program managers to answer a host of questions related to machine learning models, including:

- [What biases exist in my model?](#existing_biases)
    - [What is the distribution of groups, predicted scores, and labels across my dataset?](#xtab)
    - [What are bias metrics across groups?](#xtab_metrics)
    - [How do I interpret biases in my model?](#interpret_bias)
    - [How do I visualize biases in my model?](#bias_viz)

- [What levels of disparity are there between population groups?](#disparities)
    - [How does the selected reference group affect disparity calculations?](#disparity_calc)
    - [How do I interpret calculated disparity ratios?](#interpret_disp)
    - [How do I visualize disparities in my model?](#disparity_viz) 

- [How do I assess model fairness?](#fairness)
    - [How do I interpret parities?](#interpret_fairness)
    - [How do I visualize bias metric parity?](#fairness_group_viz)
    - [How do I visualize fairness between groups in my model?](#fairness_disp_viz) 


We apply the toolkit to the COMPAS dataset reported on by ProPublica below.

### Background

In 2016, ProPublica reported on racial inequality in automated criminal risk assessment algorithms. The [report](https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing) is based on [this analysis](https://github.com/propublica/compas-analysis). Using a clean version of the COMPAS dataset from the ProPublica GitHub repo, we demonstrate the use of the Aequitas bias reporting tool.

Northpointe's COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) is one of the most widely used risk assessment tools, algorithms that are used in the criminal justice system to guide decisions such as how to set bail. The ProPublica dataset represents two years of COMPAS predictions from Broward County, FL.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from aequitas.group import Group
from aequitas.bias import Bias
from aequitas.fairness import Fairness
from aequitas.plotting import Plotting


%matplotlib inline

In [None]:
df = pd.read_csv("../../../examples/data/compas_for_aequitas.csv")
df.head()

In [None]:
df.shape

## Pre-Aequitas: Exploring the COMPAS Dataset

__Risk assessment by race__

COMPAS produces a risk score that predicts a person's likelihood of committing a crime in the next two years. The output is a score between 1 and 10 that maps to low, medium or high. For Aequitas, we collapse this to a binary prediction. A score of 0 indicates a prediction of "low" risk according to COMPAS, while a 1 indicates "high" or "medium" risk.

This categorization is based on ProPublica's interpretation of Northpointe's practitioner's guide:
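
The collapse to a binary prediction can be sketched as follows. This is a minimal illustration, not the preprocessing actually used for this dataset: it assumes the COMPAS decile bands of 1 to 4 (low), 5 to 7 (medium) and 8 to 10 (high), and the function name `binarize_decile` is hypothetical.

```python
def binarize_decile(decile_score):
    """Map a COMPAS decile score (1-10) to the binary score used here.

    Assumes the bands 1-4 = low, 5-7 = medium, 8-10 = high.
    """
    if not 1 <= decile_score <= 10:
        raise ValueError("decile score must be between 1 and 10")
    # Anything above the "low" band counts as a positive (risk) prediction.
    return 0 if decile_score <= 4 else 1


binary_scores = [binarize_decile(d) for d in range(1, 11)]
```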

    "According to Northpointe’s practitioners guide, COMPAS “scores in the medium and high range garner more interest from supervision agencies than low scores, as a low score would suggest there is little risk of general recidivism,” so we considered scores any higher than “low” to indicate a risk of recidivism."

In the bar charts below, we see a large difference in how these scores are distributed by race, with a majority of white and Hispanic people predicted as low risk (score = 0) and a majority of black people predicted high and medium risk (score = 1). We also see that while the majority of people in age categories over 25 are predicted as low risk (score = 0), the majority of people below 25 are predicted as high and medium risk (score = 1).

In [None]:
aq_palette = sns.diverging_palette(215, 35, n=2)

In [None]:
by_race = sns.countplot(x="race", hue="score", data=df[df.race.isin(['African-American', 'Caucasian', 'Hispanic'])], palette=aq_palette)

In [None]:
by_sex = sns.countplot(x="sex", hue="score", data=df, palette=aq_palette)

In [None]:
by_age = sns.countplot(x="age_cat", hue="score", data=df, palette=aq_palette)

__Levels of recidivism__

This dataset includes information about whether or not the subject recidivated, and so we can directly test the accuracy of the predictions. First, we visualize the recidivism rates across race.

Following ProPublica, we defined recidivism as a new arrest within two years. (If a person recidivates, `label_value` = 1). They "based this decision on Northpointe’s practitioners guide, which says that its recidivism score is meant to predict 'a new misdemeanor or felony offense within two years of the COMPAS administration date.'"




In [None]:
label_by_race = sns.countplot(x="race", hue="label_value", data=df[df.race.isin(['African-American', 'Caucasian', 'Hispanic'])], palette=aq_palette)

In [None]:
label_by_sex = sns.countplot(x="sex", hue="label_value", data=df, palette=aq_palette)

In [None]:
label_by_age = sns.countplot(x="age_cat", hue="label_value", data=df, palette=aq_palette)

## Putting Aequitas to the task

The graphs above show the base rates for recidivism are higher for black defendants compared to white defendants (.51 vs .39), though the predictions do not match the base rates. 

Practitioners face the challenge of determining whether such patterns reflect bias. The fact that there are multiple ways to measure bias adds complexity to the decision-making process. With Aequitas, we provide a tool that automates the reporting of various fairness metrics to aid in this process.

Applying Aequitas programmatically is a three-step process represented by three Python classes:

`Group()`: Define groups 

`Bias()`: Calculate disparities

`Fairness()`: Assert fairness

Each class builds on the previous one, expanding the output DataFrame.


### Data Formatting

Data for this example was preprocessed for compatibility with Aequitas. **The Aequitas tool always requires a `score` column and requires a binary `label_value` column for supervised metrics** (i.e., False Discovery Rate, False Positive Rate, False Omission Rate, and False Negative Rate).

Preprocessing includes but is not limited to checking for mandatory `score` and `label_value` columns as well as at least one column representing attributes specific to the data set. See [documentation](../input_data.html) for more information about input data.

Note that while `entity_id` is not necessary for this example, Aequitas recognizes `entity_id` as a reserved column name and will not treat it as an attribute column.
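
The column requirements above can be sketched as a simple pre-check. This is a hedged illustration in plain Python, not Aequitas's own preprocessing: the helper names `split_columns` and `labels_are_binary` are hypothetical, only the reserved names mentioned in this section are handled, and `label_value` is treated as required because the supervised metrics in this notebook need it.

```python
# Column names taken from this section; Aequitas may reserve others as well.
RESERVED = {"score", "label_value", "entity_id"}


def split_columns(columns):
    """Return the attribute columns, checking that required columns exist."""
    missing = {"score", "label_value"} - set(columns)
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    attributes = [c for c in columns if c not in RESERVED]
    if not attributes:
        raise ValueError("at least one attribute column is required")
    return attributes


def labels_are_binary(values):
    """True when every label is 0 or 1."""
    return set(values) <= {0, 1}


attribute_columns = split_columns(
    ["entity_id", "score", "label_value", "race", "sex", "age_cat"]
)
```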

<a id='existing_biases'></a>
## What biases exist in my model?
### Aequitas Group() Class
Aequitas's `Group()` class enables researchers to evaluate biases across all subgroups in their dataset by assembling a confusion matrix for each subgroup, calculating commonly used metrics such as false positive rate and false omission rate, as well as counts by group and group prevalence among the sample population.

The **`get_crosstabs()`** command tabulates a confusion matrix for each subgroup and calculates commonly used metrics such as false positive rate and false omission rate. It also provides counts by group and group prevalences.

#### Group Counts Calculated:

| Count Type | Column Name |
| --- | --- |
| False Positive Count | 'fp' |
| False Negative Count | 'fn' |
| True Negative Count | 'tn' |
| True Positive Count | 'tp' |
| Predicted Positive Count | 'pp' |
| Predicted Negative Count | 'pn' |
| Count of Negative Labels in Group | 'group_label_neg' |
| Count of Positive Labels in Group | 'group_label_pos' | 
| Group Size | 'group_size'|
| Total Entities | 'total_entities' |

#### Absolute Metrics Calculated:

| Metric | Column Name |
| --- | --- |
| $True Positive Rate$ | 'tpr' |
| $True Negative Rate$ | 'tnr' |
| $False Omission Rate$ | 'for' |
| $False Discovery Rate$ | 'fdr' |
| $False Positive Rate$ | 'fpr' |
| $False Negative Rate$ | 'fnr' |
| $Negative Predictive Value$ | 'npv' |
| $Precision$ | 'precision' |
| $Predicted Positive Ratio_k$ | 'ppr' |
| $Predicted Positive Ratio_g$ | 'pprev' |
| $Group Prevalence$ | 'prev' |


**Note**: The **`get_crosstabs()`** method expects a dataframe with predefined columns `score` and `label_value`, and treats other columns (with a few exceptions) as attributes against which to test for disparities. In this case we include `race`, `sex` and `age_cat`.
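
To make the column names in the tables above concrete, the sketch below derives each absolute metric from one group's confusion-matrix counts. It is an illustration only, not Aequitas's implementation: the function name `absolute_metrics` and the counts are hypothetical, `ppr` is assumed to be the group's share of all predicted positives, `pprev` the share of the group predicted positive, and zero denominators are not handled.

```python
def absolute_metrics(tp, fp, tn, fn, total_pp):
    """Compute group-level metrics from one group's confusion-matrix counts.

    total_pp is the number of predicted positives across all groups
    (the denominator assumed here for 'ppr').
    """
    pp, pn = tp + fp, tn + fn      # predicted positives / negatives
    pos, neg = tp + fn, fp + tn    # positive / negative labels in the group
    size = pos + neg
    return {
        "tpr": tp / pos, "tnr": tn / neg,
        "fnr": fn / pos, "fpr": fp / neg,
        "fdr": fp / pp, "for": fn / pn,
        "precision": tp / pp, "npv": tn / pn,
        "ppr": pp / total_pp, "pprev": pp / size, "prev": pos / size,
    }


# Hypothetical counts for a single group, for illustration only.
m = absolute_metrics(tp=30, fp=20, tn=40, fn=10, total_pp=100)
```

Note that `fdr` and `precision` always sum to 1, as do `for` and `npv`, `tpr` and `fnr`, and `fpr` and `tnr`.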

<a id='xtab'></a>
### What is the distribution of groups, predicted scores, and labels across my dataset?

In [None]:
g = Group()
xtab, _ = g.get_crosstabs(df)

In [None]:
absolute_metrics = g.list_absolute_metrics(xtab)

In [None]:
# View calculated counts across sample population groups
xtab[[col for col in xtab.columns if col not in absolute_metrics]]

<a id='xtab_metrics'></a>
### What are bias metrics across groups?

In [None]:
# View calculated absolute metrics for each sample population group
xtab[['attribute_name', 'attribute_value'] + absolute_metrics].round(2)

<a id='interpret_bias'></a>
### How do I interpret the crosstab to evaluate bias?
We see that African-Americans have a false positive rate (`fpr`) of 45%, while Caucasians have a false positive rate of only 23%. This means that African-American people are far more likely to be falsely labeled as high-risk than white people. On the other hand, false omission rates (`for`) and false discovery rates (`fdr`) are much closer for those two groups.

[Back to Top](#top_cell)
<a id='bias_viz'></a>
## How do I visualize bias in my model?

For visualizing absolute metric fairness, a particular metric can be specified with **`plot_group_metric()`**. A list of particular metrics of interest or 'all' metrics can be plotted with **`plot_group_metric_all()`**.

In [None]:
aqp = Plotting()

### Visualizing a single absolute group metric across all population groups
The chart below displays the group metric False Negative Rate (fnr) calculated across each attribute, colored based on the number of samples in the attribute group.

In [None]:
fnr = aqp.plot_group_metric(xtab, 'fnr', min_group=0.01)

### Visualizing default absolute group metrics across all population groups
#### Default absolute group metrics
When visualizing more than one absolute group metric, you can specify a list of metrics, `'all'` metrics, or use the Aequitas default metrics by not supplying an argument:
- Predicted Positive Group Rate (pprev)
- Predicted Positive Rate (ppr)
- False Discovery Rate (fdr)
- False Omission Rate (for)
- False Positive Rate (fpr)
- False Negative Rate (fnr)

The charts below display the default group metrics calculated across each attribute, colored based on number of samples in the attribute group.

In [None]:
a = aqp.plot_group_metric_all(xtab)

### Visualizing multiple user-specified absolute group metrics across all population groups

The charts below display the user-specified group metrics across each attribute, colored based on absolute metric magnitude.

In [None]:
p = aqp.plot_group_metric_all(xtab, metrics=['fdr', 'tpr', 'npv', 'fpr', 'precision', 'pprev', 'ppr'], ncols=4)

[Back to Top](#top_cell)
<a id='disparities'></a>
## What levels of disparity are there between population groups?
### Aequitas Bias() Class
We calculate disparities as a ratio of a metric for a group of interest compared to a base group. For example, the False Negative Rate Disparity for black defendants vis-a-vis whites is:
$$Disparity_{FNR} =  \frac{FNR_{black}}{FNR_{white}}$$ 
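
The same ratio can be worked through for the False Positive Rate. The sketch below is plain Python rather than an Aequitas call; the function name `disparity` is hypothetical, and the two rates are the False Positive Rates quoted in this notebook's fairness discussion (44.8% and 23.4%).

```python
def disparity(metric_group, metric_reference):
    """Ratio of a group's metric to the reference group's metric."""
    return metric_group / metric_reference


# False positive rates quoted in this notebook (44.8% and 23.4%).
fpr = {"African-American": 0.448, "Caucasian": 0.234}

# Use 'Caucasian' as the reference group; its own disparity is 1 by construction.
fpr_disparity = {g: disparity(v, fpr["Caucasian"]) for g, v in fpr.items()}
```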

Below, we use **`get_disparity_predefined_groups()`** which allows us to choose reference groups that clarify the output for the practitioner. 

The Aequitas `Bias()` class includes two additional get disparity functions: **`get_disparity_major_group()`** and **`get_disparity_min_metric()`**, which automate base group selection based on sample majority (across each attribute) and minimum value for each calculated bias metric, respectively.  
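
The two automated ways of choosing a base group can be sketched as follows. This is an illustration of the idea only, not the library's code: the function names, group labels, sizes and FNR values are all hypothetical.

```python
def reference_by_majority(group_sizes):
    """Pick the largest group as the reference."""
    return max(group_sizes, key=group_sizes.get)


def reference_by_min_metric(metric_values):
    """Pick the group with the smallest metric value as the reference."""
    return min(metric_values, key=metric_values.get)


# Hypothetical sizes and FNR values for three groups of one attribute.
sizes = {"a": 500, "b": 300, "c": 200}
fnr = {"a": 0.30, "b": 0.45, "c": 0.25}

majority_ref = reference_by_majority(sizes)
min_ref = reference_by_min_metric(fnr)

# Disparities relative to the minimum-metric reference are never below 1.
disparities_vs_min = {g: v / fnr[min_ref] for g, v in fnr.items()}
```

Because the minimum-metric reference has the smallest value by definition, every ratio against it is at least 1, which is one reason disparities computed this way tend to look larger.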

#### Disparities Calculated:

| Metric | Column Name |
| --- | --- |
| $True Positive Rate Disparity$ | 'tpr_disparity' |
| $True Negative Rate Disparity$ | 'tnr_disparity' |
| $False Omission Rate Disparity$ | 'for_disparity' |
| $False Discovery Rate Disparity$ | 'fdr_disparity' |
| $False Positive Rate Disparity$ | 'fpr_disparity' |
| $False Negative Rate Disparity$ | 'fnr_disparity' |
| $Negative Predictive Value Disparity$ | 'npv_disparity' |
| $Precision Disparity$ | 'precision_disparity' |
| $Predicted Positive Ratio_k Disparity$ | 'ppr_disparity' |
| $Predicted Positive Ratio_g Disparity$ | 'pprev_disparity' |


Columns for each disparity are appended to the crosstab dataframe, along with a column indicating the reference group for each calculated metric (denoted by `[metric name]_ref_group_value`). We see a slice of the dataframe with calculated metrics below.

In [None]:
b = Bias()

<a id='disparity_calc'></a>
### How does the selected reference group affect disparity calculations?

#### Disparities calculated in relation to a user-specified group for each attribute

In [None]:
bdf = b.get_disparity_predefined_groups(xtab, {'race':'Caucasian', 'sex':'Male', 'age_cat':'25 - 45'})

In [None]:
calculated_disparities = b.list_disparities(bdf)

In [None]:
# View disparity metrics added to dataframe
bdf[['attribute_name', 'attribute_value'] +  calculated_disparities]

In [None]:
hbdf = b.get_disparity_predefined_groups(xtab, {'race':'Hispanic', 'sex':'Male', 'age_cat':'25 - 45'})

In [None]:
# View disparity metrics added to dataframe
hbdf[['attribute_name', 'attribute_value'] +  calculated_disparities]

<a id='interpret_disp'></a>
### How do I interpret calculated disparity ratios?
The differences in False Positive Rates, noted in the discussion of the `Group()` class above, are clarified using the disparity ratio (`fpr_disparity`). Black people are falsely identified as being high or medium risk at 1.9 times the rate for white people.

As seen above, the False Discovery Rate (the fraction of false positives over predicted positives in a group) shows much less disparity (`fdr_disparity`). **Reference groups have disparity = 1 by design in Aequitas**, so the lower disparity is reflected in the `fdr_disparity` value close to 1.0 (0.906) for the race attribute group 'African-American' when disparities are calculated using the predefined base group 'Caucasian'. Note that COMPAS is calibrated to balance False Discovery Rate and False Omission Rate across groups.

#### Evaluating disparities calculated in relation to a different 'race' reference group
When a different predefined group, 'Hispanic', is used, we can see that Black people are 2.1 times as likely to be falsely identified as being high or medium risk as Hispanic people are (compared to 1.9 times as likely as white people), and even less likely to be falsely identified as low risk when compared to Hispanic people rather than white people.

#### Disparities calculated in relation to sample population majority group (in terms of group prevalence) for each attribute
The majority population groups for each attribute ('race', 'sex', 'age_cat') in the COMPAS dataset are 'African-American', 'Male', and '25 - 45'. Using the **`get_disparity_major_group()`** method of calculation allows researchers to quickly evaluate how much more (or less) often other groups are falsely or correctly identified as high- or medium-risk.

In [None]:
majority_bdf = b.get_disparity_major_group(xtab)

In [None]:
majority_bdf[['attribute_name', 'attribute_value'] +  calculated_disparities]

#### Disparities calculated in relation to the minimum value for each metric

Note that disparities are much more varied, and may have larger magnitude, when the minimum value per metric is used as a reference group versus one of the other two methods.

In [None]:
min_metric_bdf = b.get_disparity_min_metric(xtab)

In [None]:
# View disparity metrics added to dataframe
min_metric_bdf[['attribute_name', 'attribute_value'] +  calculated_disparities]

[Back to Top](#top_cell)
<a id='disparity_viz'></a>
## How do I visualize disparities in my model?
To visualize disparities, a particular disparity metric can be specified with **`plot_disparity()`**. To plot a single disparity, a metric and an attribute must be specified.

Disparities related to a list of particular metrics of interest or `'all'` metrics can be plotted with **`plot_disparity_all()`**. At least one metric or at least one attribute must be specified when plotting multiple disparities (or the same disparity across multiple attributes). For example, to plot PPR and Precision disparity for all attributes, specify `metrics=['ppr', 'precision']` with no attribute specified, and to plot default metrics by race, specify `attributes=['race']` with no metrics specified.

**Reference groups are displayed in grey, and always have a disparity = 1.** Note that disparities greater than 10x the reference group are visualized as 10x, and disparities less than 0.1x the reference group are visualized as 0.1x.
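
The display capping described above amounts to clamping each ratio to the range 0.1 to 10. A minimal sketch, with a hypothetical function name and no claim about how the plotting code implements it:

```python
def clip_for_display(disparity, lower=0.1, upper=10.0):
    """Clamp a disparity ratio to the range shown in the treemaps."""
    return max(lower, min(upper, disparity))


# Only the plotted value is capped; the underlying disparity is unchanged.
shown = [clip_for_display(d) for d in (0.02, 1.9, 25.0)]
```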

### Visualizing disparities between groups in a single user-specified attribute for a single user-specified disparity metric

The treemap below displays False Positive Rate disparity values calculated using a predefined reference group, in this case the 'Caucasian' group within the race attribute, colored based on disparity magnitude.

In [None]:
aqp.plot_disparity(bdf, group_metric='fpr', attribute_name='race', min_group=0.01)

Note the differences in the visualization of disparity values when the reference group is instead the group with the minimum value of the metric (`min_metric_bdf`).

In [None]:
aqp.plot_disparity(min_metric_bdf, group_metric='fpr', attribute_name='race')

### Visualizing disparities between all groups for a single user-specified disparity metric

The treemaps below display Precision disparities calculated based on predefined reference groups ('race' attribute: Hispanic, 'sex' attribute: Male, 'age_cat' attribute: 25-45), colored based on disparity magnitude.

In [None]:
j = aqp.plot_disparity_all(hbdf, metrics=['precision'])

### Visualizing disparities between groups in a single user-specified attribute for default metrics
##### Default Metrics
When visualizing more than one disparity, you can specify a list of disparity metrics, `'all'` disparity metrics, or use the Aequitas default disparity metrics by not supplying an argument:
- Predicted Positive Group Rate Disparity (pprev_disparity)
- Predicted Positive Rate Disparity (ppr_disparity)
- False Discovery Rate Disparity (fdr_disparity)
- False Omission Rate Disparity (for_disparity)
- False Positive Rate Disparity (fpr_disparity)
- False Negative Rate Disparity (fnr_disparity)

The treemaps below display the default disparities between 'age_cat' groups calculated based on the minimum value of each metric, colored based on disparity magnitude.

In [None]:
min_met = aqp.plot_disparity_all(min_metric_bdf, attributes=['age_cat'])

### Visualizing disparities between groups in a single user-specified attribute for all calculated disparity metrics

The treemaps below display disparities between 'race' attribute groups calculated based on predefined reference groups ('race' attribute: Hispanic, 'sex' attribute: Male, 'age_cat' attribute: 25-45) for all 10 disparity metrics, colored based on disparity magnitude.

In [None]:
tm_capped = aqp.plot_disparity_all(hbdf, attributes=['race'], metrics = 'all')

### Visualizing disparity between all groups for multiple user-specified disparity metrics

The treemaps below display False Omission Rate, False Discovery Rate, and False Positive Rate disparities (calculated in relation to the sample majority group for each attribute) between groups across all three attributes, colored based on disparity magnitude.

In [None]:
dp = aqp.plot_disparity_all(majority_bdf, metrics=['for', 'fdr', 'fpr'], ncols=3)

[Back to Top](#top_cell)
<a id='fairness'></a>
## How do I assess model fairness?
### Aequitas Fairness() Class
Finally, the Aequitas `Fairness()` class provides three functions that provide a high-level summary. Using FPR disparity as an example and the default fairness threshold, we have:

$$ 0.8 < Disparity_{FPR} =  \frac{FPR_{group}}{FPR_{base group}} < 1.25 $$
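
The threshold test can be sketched in a few lines. This assumes the threshold is 0.8 with an upper bound of 1 / 0.8 = 1.25 and strict inequalities, as in the expression above; the function name `is_fair` is hypothetical, and the two disparities are the African-American versus Caucasian values reported earlier in this notebook.

```python
def is_fair(disparity, tau=0.8):
    """True when the disparity lies strictly between tau and 1 / tau."""
    return tau < disparity < 1 / tau


# Disparities for African-American vs. Caucasian reported in this notebook.
fpr_parity = is_fair(1.9)    # FPR disparity
fdr_parity = is_fair(0.906)  # FDR disparity
```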

We can assess fairness at various levels of detail:

### Group Level Fairness
The **`get_group_value_fairness()`** function builds on the previous dataframe and gives us attribute group-level statistics for fairness determinations:

#### Parities Calculated:

| Parity | Column Name |
| --- | --- |
| $True Positive Rate Parity$ | 'TPR Parity' |
| $True Negative Rate Parity$ | 'TNR Parity' |
| $False Omission Rate Parity$ | 'FOR Parity' |
| $False Discovery Rate Parity$ | 'FDR Parity' |
| $False Positive Rate Parity$ | 'FPR Parity' |
| $False Negative Rate Parity$ | 'FNR Parity' |
| $Negative Predictive Value Parity$ | 'NPV Parity' |
| $Precision Parity$ | 'Precision Parity' |
| $Predicted Positive Ratio_k Parity$ | 'Statistical Parity' |
| $Predicted Positive Ratio_g Parity$ | 'Impact Parity' |

#### Also assessed:
- **_Type I Parity_**: Fairness in both FDR Parity and FPR Parity
- **_Type II Parity_**: Fairness in both FOR Parity and FNR Parity
- **_Equalized Odds_**: Fairness in both FPR Parity and TPR Parity
- **_Unsupervised Fairness_**: Fairness in both Statistical Parity and Impact Parity
- **_Supervised Fairness_**: Fairness in both Type I and Type II Parity
- **_Overall Fairness_**: Fairness across all parities for all attributes
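
The composite determinations listed above are conjunctions of the individual parity results. A minimal sketch for a single group, using the parity column names from the table; the function name and the input values are hypothetical, and this is not the library's implementation:

```python
def composite_parities(parity):
    """Combine per-metric parity booleans into composite determinations."""
    type_i = parity["FDR Parity"] and parity["FPR Parity"]
    type_ii = parity["FOR Parity"] and parity["FNR Parity"]
    return {
        "TypeI Parity": type_i,
        "TypeII Parity": type_ii,
        "Equalized Odds": parity["FPR Parity"] and parity["TPR Parity"],
        "Unsupervised Fairness": (
            parity["Statistical Parity"] and parity["Impact Parity"]
        ),
        "Supervised Fairness": type_i and type_ii,
    }


# Hypothetical parity results for a single group, for illustration only.
example = composite_parities({
    "FDR Parity": True, "FPR Parity": False,
    "FOR Parity": True, "FNR Parity": True,
    "TPR Parity": True,
    "Statistical Parity": True, "Impact Parity": True,
})
```

A single failed parity (here FPR) is enough to fail Type I Parity, Equalized Odds and Supervised Fairness for the group.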

In [None]:
f = Fairness()
fdf = f.get_group_value_fairness(bdf)

In [None]:
fairness_grid = fdf[['attribute_name', 'attribute_value','Statistical Parity',
       'Impact Parity', 'FDR Parity', 'FPR Parity', 'FOR Parity', 'FNR Parity',
       'TypeI Parity', 'TypeII Parity', 'Unsupervised Fairness',
       'Supervised Fairness']]
fairness_grid

<a id='interpret_fairness'></a>
### How do I interpret parities?

In this case, our base groups are Caucasian for race, Male for sex, and 25-45 for age_cat. By construction, the base group has supervised fairness (the disparity ratio is 1). Relative to the base groups, the COMPAS predictions only provide supervised fairness to one group, Hispanic.

Above, the African-American false omission and false discovery rates are within the bounds of fairness. This result is expected because COMPAS is calibrated. (Given calibration, it is surprising that Asian and Native American rates are so low. This may be a matter of having few observations for these groups.)

On the other hand, African-Americans are roughly twice as likely to have false positives and 40 percent less likely to have false negatives. In real terms, 44.8% of African-Americans who did not recidivate were marked high or medium risk (with potential for associated penalties), compared with 23.4% of Caucasian non-reoffenders. This is unfair and is marked False below.

These findings mark an inherent trade-off between FPR Fairness, FNR Fairness and calibration, which is present in any decision system where base rates are not equal. See [Chouldechova (2017)](https://www.andrew.cmu.edu/user/achoulde/files/disparate_impact.pdf). Aequitas helps bring this trade-off to the forefront with clear metrics and asks system designers to make a reasoned decision based on their use case.
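
The trade-off follows from an identity given in Chouldechova (2017) that links the False Positive Rate to prevalence $p$, positive predictive value and False Negative Rate: $FPR = \frac{p}{1-p} \cdot \frac{1-PPV}{PPV} \cdot (1-FNR)$. The sketch below plugs in the two base rates quoted earlier in this notebook (.51 and .39); the PPV and FNR values are hypothetical and are held equal across groups purely to show the effect.

```python
def implied_fpr(prevalence, ppv, fnr):
    """FPR implied by prevalence, PPV and FNR (Chouldechova, 2017)."""
    return prevalence / (1 - prevalence) * (1 - ppv) / ppv * (1 - fnr)


# Base rates from this notebook; PPV and FNR are hypothetical values
# held equal across the two groups.
fpr_higher_base_rate = implied_fpr(0.51, ppv=0.6, fnr=0.35)
fpr_lower_base_rate = implied_fpr(0.39, ppv=0.6, fnr=0.35)
```

With equal PPV and equal FNR, the group with the higher base rate necessarily ends up with the higher False Positive Rate, so all three cannot be equalized at once unless base rates match.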

### Attribute Level Fairness
Use the **`get_group_attribute_fairness()`** function to view only the calculated parities from the **`get_group_value_fairness()`** function at the attribute level.

In [None]:
gaf = f.get_group_attribute_fairness(fdf)
gaf

### Overall Fairness
The **`get_overall_fairness()`** function gives a quick boolean assessment of the output of **`get_group_attribute_fairness()`**, returning a determination across all attributes for each of:
- Unsupervised Fairness
- Supervised Fairness
- Overall Fairness
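
Conceptually, the overall determination is an "all attributes pass" roll-up. The sketch below assumes that Overall Fairness requires both Unsupervised and Supervised Fairness to hold for every attribute; the function name and the input values are hypothetical, and the actual return format of `get_overall_fairness()` may differ.

```python
def overall_fairness(by_attribute):
    """Aggregate attribute-level determinations into one summary.

    by_attribute maps an attribute name to a dict holding the booleans
    'Unsupervised Fairness' and 'Supervised Fairness'.
    """
    unsupervised = all(v["Unsupervised Fairness"] for v in by_attribute.values())
    supervised = all(v["Supervised Fairness"] for v in by_attribute.values())
    return {
        "Unsupervised Fairness": unsupervised,
        "Supervised Fairness": supervised,
        "Overall Fairness": unsupervised and supervised,
    }


# Hypothetical attribute-level results, for illustration only.
summary = overall_fairness({
    "race": {"Unsupervised Fairness": True, "Supervised Fairness": False},
    "sex": {"Unsupervised Fairness": True, "Supervised Fairness": True},
})
```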

In [None]:
gof = f.get_overall_fairness(gaf)
gof

[Back to Top](#top_cell)
<a id='fairness_group_viz'></a>
## How do I visualize bias metric parity?

For visualizing absolute metric fairness, a particular metric can be specified with **`plot_fairness_group()`**. A list of particular metrics of interest or 'all' metrics can be plotted with **`plot_fairness_group_all()`**.

### Visualizing parity of a single absolute group metric across all population groups

The chart below displays the absolute group metric Predicted Positive Rate (ppr) across each attribute, colored based on fairness determination for that attribute group (green = 'True' and red = 'False').

In [None]:
z = aqp.plot_fairness_group(fdf, group_metric='ppr')

### Visualizing all absolute group metrics across all population groups


The charts below display all calculated absolute group metrics across each attribute, colored based on fairness determination for that attribute group (green = 'True' and red = 'False').

In [None]:
fg = aqp.plot_fairness_group_all(fdf, ncols=5, metrics = "all")

<a id='fairness_disp_viz'></a>
## How do I visualize parity between groups in my model? 

To visualize disparity fairness, a particular disparity metric can be specified with **`plot_fairness_disparity()`**. To plot a single disparity, a metric and an attribute must be specified.

Disparities related to a list of particular metrics of interest or `'all'` metrics can be plotted with **`plot_fairness_disparity_all()`**. At least one metric or at least one attribute **must** be specified when plotting multiple fairness disparities (or the same disparity across multiple attributes).

### Visualizing parity between groups in a single user-specified attribute for a single user-specified disparity metric

The treemap below displays False Discovery Rate disparity values between race attribute groups calculated based on a predefined reference group ('Caucasian'), colored based on fairness determination for that attribute group (green = 'True' and red = 'False').

In [None]:
def quickcheck(df, attribute, metric, related_parity=None):
    """Show a group metric next to its disparity and parity for one attribute."""
    related_disparity = metric + '_disparity'
    if not related_parity:
        # These parity columns do not follow the '<METRIC> Parity' naming pattern
        special = {'ppr': 'Statistical Parity', 'pprev': 'Impact Parity',
                   'precision': 'Precision Parity'}
        related_parity = special.get(metric, metric.upper() + ' Parity')
    relevant_cols = ['attribute_value', metric, related_disparity, related_parity]
    return df.loc[df.attribute_name == attribute, relevant_cols]

In [None]:
quickcheck(fdf, attribute='race', metric='fdr', related_parity=None)

In [None]:
m = aqp.plot_fairness_disparity(fdf, group_metric='fdr', attribute_name='race')

### Researcher Check: Is the disparity I am seeing due to small group sizes in my sample?

Use the `min_group` parameter to visualize parities for only those sample population groups above a user-specified percentage of the total sample size.
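
The effect of `min_group` can be pictured as a filter on each group's share of the sample. The sketch below is an assumption about the behavior, not the plotting code itself: the function name and the group sizes are hypothetical, and whether the library's comparison is inclusive of the threshold is not confirmed here.

```python
def groups_above_min(group_sizes, min_group):
    """Keep groups whose share of the total sample is at least min_group."""
    total = sum(group_sizes.values())
    return [g for g, n in group_sizes.items() if n / total >= min_group]


# Hypothetical group sizes; group 'c' makes up 0.5% of the sample.
sizes = {"a": 6000, "b": 3950, "c": 50}
kept = groups_above_min(sizes, min_group=0.01)
```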

In [None]:
m = aqp.plot_fairness_disparity(fdf, group_metric='fdr', attribute_name='race', min_group=0.01)

### Visualizing parity between groups in a single user-specified attribute for all calculated disparity metrics

The treemaps below display disparities between race attribute groups calculated based on a predefined reference group ('Caucasian') for all 10 disparity metrics, colored based on fairness determination for that attribute group (green = 'True' and red = 'False').

In [None]:
a_tm = aqp.plot_fairness_disparity_all(fdf, attributes=['race'], metrics='all')

### Visualizing parity between all groups for multiple user-specified disparity metrics

The treemaps below display Predicted Positive Group Rate (pprev) and Predicted Positive Rate (ppr) disparities between attribute groups for all three attributes (race, sex, age category) calculated based on predefined reference groups ('race' attribute: Caucasian, 'sex' attribute: Male, 'age_cat' attribute: 25-45), colored based on fairness determination for that attribute group (green = 'True' and red = 'False').

In [None]:
r_tm = aqp.plot_fairness_disparity_all(fdf, metrics=['pprev', 'ppr'])

### Visualizing parity between groups in multiple user-specified attributes

The treemaps below display disparities between attribute groups for two attributes (sex, age category) calculated based on predefined reference groups ('sex' attribute: Male, 'age_cat' attribute: 25-45) for the six default disparity metrics, colored based on fairness determination for that attribute group (green = 'True' and red = 'False').

In [None]:
n_tm = aqp.plot_fairness_disparity_all(fdf, attributes=['sex', 'age_cat'])

## The Aequitas Effect

By breaking down the COMPAS predictions using a variety of bias and disparity metrics calculated using different reference groups, we are able to surface the specific metrics for which our models are imposing bias on given attribute groups, and have a clearer lens when evaluating models and making recommendations for intervention.

[Back to Top](#top_cell)