# Addressing Bias and Fairness in Machine Learning: A Practical Guide and Hands-on Tutorial
## KDD 2023 Hands-on Tutorial
### Rayid Ghani, Kit Rodolfa, Pedro Saleiro, Sérgio Jesus

# <font color=red>Auditing a Single Model using [Aequitas](http://www.datasciencepublicpolicy.org/projects/aequitas/)</font>
A more in-depth demo notebook is at https://github.com/dssg/aequitas/blob/master/docs/source/examples/compas_demo.ipynb

### 1. Install dependencies, import packages and data
Run this cell every time you open the notebook in Colab to install the dependencies.

In [None]:
!pip install aequitas==0.42.0
from IPython.display import display, HTML
display(HTML("<style>.container { width:75% !important; }</style>"))
import yaml
import os
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from aequitas.group import Group
from aequitas.bias import Bias
from aequitas.fairness import Fairness
import aequitas.plot as ap
DATAPATH = 'https://github.com/dssg/fairness_tutorial/raw/master/data/'
DPI = 200


## What has already happened?

We've already cleaned the data, generated features, created train-test sets, built thousands of models on each training set, scored each test set with them, and calculated various evaluation metrics.

As described earlier, the goal here is to select the 1000 project submissions that are most likely not to get funded, so that resources can be prioritized for them. That corresponds to the metric **Precision at top 1000**.
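
To make the metric concrete, here is a minimal sketch of precision at top k on a toy dataframe. The helper `precision_at_k`, the toy data, and the column names `score` and `label_value` are assumptions for illustration and are not part of the tutorial's pipeline; a label of 1 is assumed to mean the project did not get funded.

```python
import pandas as pd

def precision_at_k(df, k, score_col="score", label_col="label_value"):
    """Fraction of the k highest-scored rows whose label is 1."""
    top_k = df.nlargest(k, score_col)
    return top_k[label_col].mean()

# Toy data: six projects with a model score and a true label.
toy = pd.DataFrame({
    "score": [0.9, 0.8, 0.7, 0.4, 0.2, 0.1],
    "label_value": [1, 0, 1, 1, 0, 0],
})
p_at_3 = precision_at_k(toy, k=3)
print(p_at_3)  # two of the top three rows are labeled 1
```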


## Let's take a look at the performance of the models on one test set based on **Precision at top 1000**

In [None]:
# code to load results and plot histogram with p@1000 for all models
evals_df = pd.read_csv(DATAPATH +'split2_evals.csv.gz', compression='gzip')

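# distplot is deprecated in recent seaborn releases; histplot with a KDE overlay is the replacement
ax = sns.histplot(evals_df['model_precision'], kde=True)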
ax.set_title('Precision at 1000 across all the models')
plt.gcf().set_size_inches((5, 3))
plt.gcf().set_dpi(DPI)
plt.show()

## We're now going to take the "best" model based on precision at top 1000 and audit its predictions

# <font color=green>Auditing the Model with Highest Precision at top 1000</font>

### What do we need to audit the predictions?
1. predictions (scores or thresholded based on top 1000)
2. labels
3. attributes to audit (and a reference group within each attribute)
4. fairness metric(s)
5. disparity tolerance

## Load predictions, labels, and attributes to audit

In [None]:
# load pre-computed predictions, labels, attributes dataframe
df = pd.read_csv(DATAPATH + 'single_audit_df.csv.gz', compression='gzip')

Aequitas needs the predictions (binary score), the label value, and the attributes to audit.
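
A quick sanity check of these requirements can be sketched as below. The helper `check_audit_frame` is hypothetical and not part of Aequitas; it assumes the column names `score` and `label_value`, and that both columns are binary for this audit.

```python
import pandas as pd

def check_audit_frame(df, attributes):
    """Raise ValueError if the frame lacks what a binary audit needs."""
    required = ["score", "label_value"] + list(attributes)
    missing = [c for c in required if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    for col in ("score", "label_value"):
        # Binary audit: only 0 and 1 are accepted in these columns.
        if not df[col].isin([0, 1]).all():
            raise ValueError(f"column {col!r} must contain only 0/1")
    return True

# Toy frame with one attribute to audit.
toy = pd.DataFrame({
    "score": [1, 0, 1],
    "label_value": [1, 1, 0],
    "teacher_sex": ["male", "female", "female"],
})
ok = check_audit_frame(toy, ["teacher_sex"])
print(ok)
```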

In [None]:
# take a look at the dataframe we just loaded
df.head(10)

In [None]:
# The score has been binarized (0/1): the 1000 highest-scored predictions are labeled 1
# (because we care about selecting the top 1000 projects)
df['score'].value_counts()
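
The binarization described in the comment above can be sketched as follows. This is an illustration on toy data, not the code that produced `single_audit_df.csv.gz`; the column name `raw_score` and the helper `binarize_top_k` are hypothetical, and ties are assumed to be broken by row order.

```python
import pandas as pd

def binarize_top_k(scores, k):
    """Return a 0/1 Series with 1 for the k highest scores."""
    # rank(method="first") breaks ties by order of appearance,
    # so exactly k rows are selected.
    ranks = scores.rank(method="first", ascending=False)
    return (ranks <= k).astype(int)

toy = pd.DataFrame({"raw_score": [0.3, 0.9, 0.5, 0.9, 0.1]})
toy["score"] = binarize_top_k(toy["raw_score"], k=2)
print(toy["score"].value_counts())
```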

## Define Attributes to Audit and Reference Group for each Attribute

In [None]:
attributes_and_reference_groups={'poverty_level':'lower', 'metro_type':'suburban_rural', 'teacher_sex':'male'}
attributes_to_audit = list(attributes_and_reference_groups.keys())

## Select fairness metric(s) that we care about

In [None]:
metrics = ['tpr']
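
`tpr` is the true positive rate (recall): of the rows whose label is 1, the share that the model selected. A minimal sketch on toy lists follows; the helper `true_positive_rate` is hypothetical and not an Aequitas function.

```python
def true_positive_rate(scores, labels):
    """TPR = TP / (TP + FN) for binary scores and labels."""
    tp = sum(1 for s, y in zip(scores, labels) if s == 1 and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s == 0 and y == 1)
    if tp + fn == 0:
        return float("nan")  # undefined when there are no positive labels
    return tp / (tp + fn)

scores = [1, 1, 0, 0, 1]
labels = [1, 0, 1, 1, 0]
tpr = true_positive_rate(scores, labels)
print(tpr)  # one of the three positive-label rows was selected
```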

## Define Disparity Tolerance

In [None]:
disparity_tolerance = 1.30
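
A disparity is the ratio of a group's metric to the reference group's metric. With a tolerance of 1.30, a common reading (and, to our understanding, the one behind the `fairness_threshold` argument used in the plots later in this notebook) is that a group passes when its disparity lies between 1/1.30 and 1.30. The helper `within_tolerance` below is a hypothetical sketch of that rule, not Aequitas code.

```python
def within_tolerance(disparity, tolerance=1.30):
    """True when 1/tolerance <= disparity <= tolerance."""
    return (1.0 / tolerance) <= disparity <= tolerance

lower_bound = 1.0 / 1.30
print(round(lower_bound, 3))   # lower edge of the accepted range
print(within_tolerance(1.10))  # inside the range
print(within_tolerance(0.70))  # below the lower edge
print(within_tolerance(1.45))  # above the upper edge
```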

## Run Aequitas (based on the settings above)

In [None]:
# Initialize Aequitas
g = Group()
b = Bias()

# get_crosstabs returns a dataframe of the group counts and group value bias metrics.
xtab, _ = g.get_crosstabs(df, attr_cols=attributes_to_audit)
bdf = b.get_disparity_predefined_groups(xtab, original_df=df, ref_groups_dict=attributes_and_reference_groups)
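
Conceptually, the two calls above compute each metric per group and then divide by the reference group's value. The sketch below reproduces that idea for TPR on toy data using plain pandas. It illustrates the calculation only and is not how Aequitas is implemented; the helper `tpr_disparity` and the toy data are hypothetical.

```python
import pandas as pd

def tpr_disparity(df, attribute, reference_group):
    """Per-group TPR and its ratio to the reference group's TPR."""
    positives = df[df["label_value"] == 1]
    # Among truly positive rows, the mean of the binary score is the TPR.
    tpr = positives.groupby(attribute)["score"].mean().rename("tpr")
    out = tpr.to_frame()
    out["tpr_disparity"] = out["tpr"] / out.loc[reference_group, "tpr"]
    return out

toy = pd.DataFrame({
    "score":       [1, 0, 1, 1, 1, 0, 1, 0],
    "label_value": [1, 1, 1, 1, 1, 1, 0, 0],
    "teacher_sex": ["male", "male", "male", "male",
                    "female", "female", "female", "female"],
})
result = tpr_disparity(toy, "teacher_sex", "male")
print(result)
```

In this toy example the female group's TPR is 0.5 against 0.75 for the reference group, a disparity of about 0.67. That is below 1/1.30, so it would be flagged under a tolerance of 1.30.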

## Look at Audit Results

Now we are going to focus our analysis on the fairness metric(s) of interest in this case study: TPR across different groups. For each attribute, we use two plots from the aequitas plot module: `ap.disparity()`, which shows each group's disparity relative to the reference group, and `ap.absolute()`, which shows the group-wise metric values.

### Check for Fairness in Poverty Level 

In [None]:
ap.disparity(bdf, metrics, 'poverty_level', fairness_threshold = disparity_tolerance)

In [None]:
ap.absolute(bdf, metrics, 'poverty_level', fairness_threshold = disparity_tolerance)

### Check for Fairness in Metro_Type (where the school is based)

In [None]:
ap.disparity(bdf, metrics, 'metro_type', fairness_threshold = disparity_tolerance)

In [None]:
ap.absolute(bdf, metrics, 'metro_type', fairness_threshold = disparity_tolerance)

### Check for Fairness in the Sex of the Teacher submitting the project 

In [None]:
ap.disparity(bdf, metrics, 'teacher_sex', fairness_threshold = disparity_tolerance)

In [None]:
ap.absolute(bdf, metrics, 'teacher_sex', fairness_threshold = disparity_tolerance)

### Deeper Dive into the audit results

#### Look at the underlying data: Disparities for all metrics 

In [None]:
bdf[['attribute_name', 'attribute_value'] + b.list_disparities(bdf)]

#### Look at the underlying data: All Metrics

In [None]:
absolute_metrics = g.list_absolute_metrics(xtab)
xtab[['attribute_name', 'attribute_value'] + absolute_metrics]

#### Look at the underlying data: All raw counts

In [None]:
xtab[[col for col in xtab.columns if col not in absolute_metrics]]