# Lab 6 - Operational Fairness in DA/ML
Week 6 - Q3, 22/23 <br>
SEN163B: Responsible Data Analytics<br>
 

By <b> Nadia Metoui* </b> <br>
TA <b> Darsh Modi</b> <br> 
Faculty of Technology, Policy, and Management (TPM)<br>

***Learning Objectives***<br>
At the end of this lab, you will be able to 

- Use data analytics tools to measure disparities in a Data Analytics Project
- Use data analytics tools to mitigate disparities in a Data Analytics Project


***Structure***
- Part I: Measuring Disparities with Aequitas
- Part II: Mitigating Bias/Disparities with AIF360 (will be provided in a separate notebook for the second part of the lab)
- Part III: Mitigating Bias/Disparities with FairLearn (a tutorial will be provided for homework exploration)


## Part I: Measuring Disparities with Aequitas



In this first part of the lab, we will explore [**Aequitas**](http://www.datasciencepublicpolicy.org/our-work/tools-guides/aequitas/), a bias and fairness auditing toolkit with a Python library implementation. Aequitas does NOT have any mitigation tools. It only detects bias, defined as disparities in data. It produces clear visualizations that you can use to highlight the importance of disparities to your data team.


We will also explore **Intersectionality**, and reflect on its effects.

Then, we will explore **The Fairness Tree**, a flow chart created by the [Data Science and Public Policy Lab at CMU](http://www.datasciencepublicpolicy.org/) to help you find the fairness measures relevant to your own responsible data analytics project. When you use it, don't forget to think about the different stakeholders.

---
Acknowledgement: this part of the lab is inspired by the official Aequitas tutorial. You can find the tutorial [*HERE*](https://dssg.github.io/aequitas/examples/compas_demo.html). Take a look and use some of the code to help you.

The official publication on the Aequitas toolkit can be found [*HERE*](https://arxiv.org/pdf/1811.05577.pdf). It is also on Brightspace. Read it! It is very relevant to your project.

Analysis Steps
- Step 0: Understanding the Use Case (COMPAS)
- Step 1: Set-up 
- Step 2: Explore and familiarize with the data
- Step 3: Explore Aequitas Toolkit

### Step 0: Understanding the Use Case (COMPAS)

In 2016, ProPublica reported on racial inequality in automated criminal risk assessment algorithms. The [report](https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing) is based on [this analysis](https://github.com/propublica/compas-analysis). Using a clean version of the COMPAS dataset from the ProPublica GitHub repo, we demonstrate the use of the Aequitas bias reporting tool.

Northpointe's COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) is one of the most widely used risk assessment tools/algorithms within the criminal justice system for guiding decisions such as how to set bail. The ProPublica dataset represents two years of COMPAS predictions from Broward County, FL.

COMPAS produces a risk score that predicts a person's likelihood of committing a crime in the next two years. The output is a score between 1 and 10 that maps to low, medium or high. For Aequitas, we collapse this to a binary prediction. A binary score of 0 indicates a prediction of "low" risk according to COMPAS, while a 1 indicates "high" or "medium" risk.

This categorization is based on ProPublica's interpretation of Northpointe's practitioner's guide:

    "According to Northpointe’s practitioners guide, COMPAS “scores in the medium and high range 
    garner more interest from supervision agencies than low scores, as a low score would suggest 
    there is little risk of general recidivism,” so we considered scores any higher than “low” to 
    indicate a risk of recidivism."
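
The collapse from a 1 to 10 score to a binary prediction can be sketched as follows. The band boundaries (1 to 4 low, 5 to 7 medium, 8 to 10 high) are an assumption based on ProPublica's analysis, and the `decile_score` column and its values are illustrative only; the lab dataset already ships with the binary `score` column.

```python
import pandas as pd

# Toy scores (1-10); illustrative values, not rows from the lab dataset.
toy = pd.DataFrame({"decile_score": [1, 4, 5, 7, 8, 10]})

# Assumed bands: 1-4 "low", 5-7 "medium", 8-10 "high".
toy["risk_band"] = pd.cut(
    toy["decile_score"],
    bins=[0, 4, 7, 10],
    labels=["low", "medium", "high"],
)

# Binary score for Aequitas: 0 for "low", 1 for "medium" or "high".
toy["score"] = (toy["risk_band"] != "low").astype(int)

print(toy)
```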

### Step 1: Set-up

You will first install the library *aequitas* 

<div class="alert alert-block alert-danger">
<b>Note:</b> Uncomment and run the next cell if you have not previously installed aequitas.
</div>

In [None]:
#!pip install aequitas

In [None]:
# Run this if you get: AttributeError: module 'PIL.Image' has no attribute 'Resampling'
!pip install Pillow==9.1.0 

Then you will load  the required libraries for this part (usually no installation is needed).  The main libraries we will use in Part I are `pandas`, `seaborn` and `aequitas`. 

In [None]:
import pandas as pd
import seaborn as sns
from aequitas.group import Group
from aequitas.bias import Bias
from aequitas.fairness import Fairness
import aequitas.plot as ap

# import warnings; warnings.simplefilter('ignore')

%matplotlib inline

### Step 2: Explore and familiarize with the data

**Load and understand the data**

In [None]:
df = pd.read_csv("https://github.com/dssg/aequitas/raw/master/examples/data/compas_for_aequitas.csv")


The data provided for this part of the lab is not the "full" training data used for COMPAS. It is a dataset composed of the sensitive attributes used to train COMPAS (`race`, `sex`, `age_cat`), the original labels used to train COMPAS, and the output scores (predictions) of COMPAS.

COMPAS uses a logistic regression-based model to predict a recidivism score. We will not concern ourselves with training the model in this part. We will only use the output data after training and testing.


For your project, you will build such a table before using Aequitas. To do so, use only your test data. Select the sensitive attributes and their proxies, the ground-truth labels, and the predictions made by your model.

Aequitas is picky about the names of the columns given to it. You have to provide the following columns:
1. `score` is the prediction made by your model
1. `label_value` is the ground-truth label
1. The only other columns you should keep are sensitive attribute (and proxy) labels

**Note:** Remember that Aequitas only uses binary scores, so if you are using regression you have to collapse the predictions to a binary setting. Here a score of 0 indicates a prediction of "low" risk of recidivism, while 1 indicates "high" or "medium" risk.
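
A minimal sketch of assembling such a table from a test set follows. The column names `y_true` and `y_prob`, the 0.5 threshold, and the toy values are all assumptions for illustration; substitute the names and the decision threshold used in your own project.

```python
import pandas as pd

# Hypothetical test-set output: ground truth, predicted probability, sensitive attributes.
test_out = pd.DataFrame({
    "y_true": [0, 1, 1, 0, 1, 0],
    "y_prob": [0.20, 0.80, 0.40, 0.65, 0.90, 0.10],
    "race": ["A", "B", "A", "B", "A", "B"],
    "sex": ["Female", "Male", "Male", "Female", "Female", "Male"],
})

THRESHOLD = 0.5  # assumed cut-off; choose one that suits your use case

aequitas_df = pd.DataFrame({
    "score": (test_out["y_prob"] >= THRESHOLD).astype(int),  # binary prediction
    "label_value": test_out["y_true"].astype(int),           # ground-truth label
    "race": test_out["race"].astype(str),                    # attributes kept as strings
    "sex": test_out["sex"].astype(str),
})

print(aequitas_df)
```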

In [None]:
df.head()

**Explore the data**
 

<b>Q1: Explore and understand the data</b>

Produce as many visualisations and exploratory tables as you wish. The goal is to get a sense of the data distributions. E.g., 
- What are the attributes and features?
- What is the score distribution by attribute (race, sex or age)?

**Tip**: You can use `sns.countplot()`
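
Alongside count plots, a row-normalised cross-tabulation gives the share of each score within a group, which is easier to compare when group sizes differ. The sketch below uses a small made-up frame so that it runs on its own; with the lab data you would pass the columns of `df` instead.

```python
import pandas as pd

# Made-up rows with the same column names as the lab data.
toy = pd.DataFrame({
    "sex": ["Male", "Male", "Male", "Male", "Female", "Female"],
    "score": [1, 1, 1, 0, 0, 1],
})

# Raw counts of each score per group.
counts = pd.crosstab(toy["sex"], toy["score"])

# Share of each score within a group (each row sums to 1).
shares = pd.crosstab(toy["sex"], toy["score"], normalize="index")

print(counts)
print(shares)
```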

In [None]:
# add your code here

In [None]:
# add your code here and add any number of cells you need

In [None]:
aq_palette = sns.color_palette("Set1", 2)
by_race = sns.countplot(x="race", hue="score", data=df[df.race.isin(['African-American', 'Caucasian', 'Hispanic'])], palette=aq_palette)

In [None]:
by_sex = sns.countplot(x="sex", hue="score", data=df, palette=aq_palette)

<b>Q2: Can you identify any disparities in these distributions?</b>

Discuss the disparities e.g., based on sex, race or age category

Add your answer here

### Step 3: Explore Aequitas Toolkit

At this point in your exploration, you should have identified patterns of disparity or distribution skew in the COMPAS output predictions.

As a data scientist, you face the challenge of determining whether such patterns reflect bias/unfairness. The fact that there are multiple ways to measure bias adds complexity to the decision-making process. With Aequitas, you can build a report of various fairness metrics to aid in this process.

Applying Aequitas programmatically is a three-step process represented by three Python classes: 

`Group()`: Define groups 

`Bias()`: Calculate disparities

`Fairness()`: Assert fairness

Each class builds on the previous one expanding the output DataFrame.

**Preprocessing and Data Formatting**

As mentioned above, Aequitas always requires a binary **`score`** column and a binary **`label_value`** column. These are used to generate metrics such as False Discovery Rate, False Positive Rate, False Omission Rate, and False Negative Rate.

Preprocessing includes, but is not limited to, checking for the mandatory `score` and `label_value` columns as well as at least one column representing sensitive attributes specific to the dataset. See the [documentation](https://dssg.github.io/aequitas/input_data.html) for more information about input data.

**Note:** `entity_id` is not necessary for this example. Aequitas treats `entity_id` as a reserved column name and will not recognize it as an attribute column.

***What biases exist in the COMPAS model?***

_Aequitas Group() Class_

Aequitas's `Group()` class enables researchers to evaluate biases across all subgroups in their dataset by assembling a confusion matrix for each subgroup, calculating commonly used metrics such as false positive rate and false omission rate, as well as counts by group and group prevalence among the sample population. 

<a id='counts_description'></a>
The **`get_crosstabs()`** method tabulates a confusion matrix for each subgroup and calculates commonly used metrics such as false positive rate and false omission rate. It also provides counts by group and group prevalences.
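
To make these quantities concrete, a per-group confusion matrix can be reproduced with plain pandas. This is a sketch of the underlying arithmetic on made-up data, not the Aequitas implementation; the helper name `group_confusion` is hypothetical.

```python
import pandas as pd

# Made-up predictions for two groups; column names follow the Aequitas input format.
toy = pd.DataFrame({
    "race":        ["A", "A", "A", "A", "B", "B", "B", "B"],
    "score":       [1,   1,   0,   0,   1,   0,   0,   0],
    "label_value": [1,   0,   0,   1,   1,   0,   0,   1],
})

def group_confusion(frame: pd.DataFrame, attribute: str) -> pd.DataFrame:
    """Count tp/fp/tn/fn per group and derive two error rates (hypothetical helper)."""
    flags = pd.DataFrame({
        attribute: frame[attribute],
        "tp": (frame["score"] == 1) & (frame["label_value"] == 1),
        "fp": (frame["score"] == 1) & (frame["label_value"] == 0),
        "tn": (frame["score"] == 0) & (frame["label_value"] == 0),
        "fn": (frame["score"] == 0) & (frame["label_value"] == 1),
    })
    out = flags.groupby(attribute).sum().astype(int)
    out["fpr"] = out["fp"] / (out["fp"] + out["tn"])  # false positive rate
    out["fnr"] = out["fn"] / (out["fn"] + out["tp"])  # false negative rate
    return out

print(group_confusion(toy, "race"))
```

In this toy data both groups have the same false negative rate, while only group A has any false positives.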

#### Group Counts Calculated:

| Count Type | Column Name |
| --- | --- |
| False Positive Count | 'fp' |
| False Negative Count | 'fn' |
| True Negative Count | 'tn' |
| True Positive Count | 'tp' |
| Predicted Positive Count | 'pp' |
| Predicted Negative Count | 'pn' |
| Count of Negative Labels in Group | 'group_label_neg' |
| Count of Positive Labels in Group | 'group_label_pos' | 
| Group Size | 'group_size'|
| Total Entities | 'total_entities' |

#### Absolute Metrics Calculated:

| Metric | Column Name |
| --- | --- |
| True Positive Rate | 'tpr' |
| True Negative Rate | 'tnr' |
| False Omission Rate | 'for' |
| False Discovery Rate | 'fdr' |
| False Positive Rate | 'fpr' |
| False Negative Rate | 'fnr' |
| Negative Predictive Value | 'npv' |
| Precision | 'precision' |
| Predicted Positive Ratio$_k$ | 'ppr' |
| Predicted Positive Ratio$_g$ | 'pprev' |
| Group Prevalence | 'prev' |


**Q3: Use the `get_crosstabs()` method to compute the counts and absolute metrics for the attributes `race`, `sex` and `age_cat`.**

**Note**: The **`get_crosstabs()`** method expects a dataframe with predefined columns `score`, and `label_value` and treats other columns (with a few exceptions) as attributes against which to test for disparities. In this case, we include `race`, `sex` and `age_cat`. 

**Q4: What are bias metrics across groups?**

Once you have run the `Group()` class **`get_crosstabs()`** method, you'll have a dataframe of the [group counts](#counts_description) and [group value bias metrics](#counts_description).

The `Group()` class has a **`list_absolute_metrics()`** method, which you can use for faster slicing to view just counts or bias metrics.

A. Display counts across sample population groups

B. Display the calculated absolute metrics for each sample population group

C. Comment on the results

***How do I interpret biases in my model?***

In the slice of the crosstab dataframe created by the `Group()` class **`get_crosstabs()`** method directly above, we see that African-Americans have a false positive rate (`fpr`) of 45%, while Caucasians have a false positive rate of only 23%. This means that African-American people are far more likely to be falsely labeled as high-risk than white people. On the other hand, false omission rates (`for`) and false discovery rates (`fdr`) are much closer for those two groups.


## What levels of disparity exist between population groups?

### _Aequitas Bias() Class_
We use the Aequitas `Bias()` class to calculate disparities between groups based on the crosstab returned by the `Group()` class **`get_crosstabs()`** method described above. Disparities are calculated as a ratio of a metric for a group of interest compared to a base group. For example, the False Negative Rate Disparity for black defendants vis-a-vis whites is:
$$Disparity_{FNR} =  \frac{FNR_{black}}{FNR_{white}}$$ 
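
As a worked example of this ratio, the sketch below divides each group's metric by the value for a chosen reference group. The rates are made-up numbers, and the `_disparity` suffix simply mirrors the Aequitas column naming; this is not the library's code.

```python
import pandas as pd

# Made-up false negative rates per group (not results from the COMPAS data).
rates = pd.DataFrame(
    {"fnr": [0.30, 0.45, 0.60]},
    index=["group_a", "group_b", "reference"],
)

REF_GROUP = "reference"  # the base group sits in the denominator

# Disparity = metric of the group of interest / metric of the reference group.
rates["fnr_disparity"] = rates["fnr"] / rates.loc[REF_GROUP, "fnr"]

print(rates)
```

A disparity of 1 means the group matches the reference group; values below 1 mean a lower rate and values above 1 a higher rate.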

Below, we use **`get_disparity_predefined_groups()`** which allows us to choose reference groups that clarify the output for the practitioner. 

The Aequitas `Bias()` class includes two additional get disparity functions: **`get_disparity_major_group()`** and **`get_disparity_min_metric()`**, which automate base group selection based on sample majority (across each attribute) and minimum value for each calculated bias metric, respectively.  
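
The difference between these base-group strategies can be sketched with plain pandas. The numbers are made up, and the code only illustrates how a base group could be picked by sample majority or by minimum metric value; it does not call Aequitas.

```python
import pandas as pd

# Made-up group sizes and false positive rates for one attribute.
groups = pd.DataFrame(
    {"group_size": [500, 300, 200], "fpr": [0.25, 0.40, 0.10]},
    index=["group_a", "group_b", "group_c"],
)

# Strategy 1: the largest group is the base group.
major_ref = groups["group_size"].idxmax()

# Strategy 2: the group with the smallest metric value is the base group.
min_ref = groups["fpr"].idxmin()

disparity_major = groups["fpr"] / groups.loc[major_ref, "fpr"]
disparity_min = groups["fpr"] / groups.loc[min_ref, "fpr"]

print(major_ref, disparity_major.round(2).to_dict())
print(min_ref, disparity_min.round(2).to_dict())
```

The same rates yield different disparity values depending on the base group, so the choice of reference group should be reported together with the results.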

The **`get_disparity_predefined_groups()`** method allows the user to define a base group for each attribute, as illustrated below. 

#### Disparities Calculated:

| Metric | Column Name |
| --- | --- |
| True Positive Rate Disparity | 'tpr_disparity' |
| True Negative Rate Disparity | 'tnr_disparity' |
| False Omission Rate Disparity | 'for_disparity' |
| False Discovery Rate Disparity | 'fdr_disparity' |
| False Positive Rate Disparity | 'fpr_disparity' |
| False Negative Rate Disparity | 'fnr_disparity' |
| Negative Predictive Value Disparity | 'npv_disparity' |
| Precision Disparity | 'precision_disparity' |
| Predicted Positive Ratio$_k$ Disparity | 'ppr_disparity' |
| Predicted Positive Ratio$_g$ Disparity | 'pprev_disparity' |


Columns for each disparity are appended to the crosstab dataframe, along with a column indicating the reference group for each calculated metric (denoted by `[METRIC NAME]_ref_group_value`). We see a slice of the dataframe with calculated metrics in the next section.

In [None]:
b = Bias()

***Q5: Write the code to calculate disparities in relation to a user-specified group for each attribute using `get_disparity_predefined_groups`***

In [None]:
#code here

The `Bias()` class includes a method to quickly return a list of calculated disparities from the dataframe returned by the **`get_disparity_`** methods.

In [None]:
# Q6: Add disparity metrics to the dataframe

***Q7: Aequitas Visualizations***

Use `summary` to display the overall fairness of the COMPAS model
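
For intuition about what a fairness summary reports: a group passes a parity test when its disparity stays within a tolerance band around 1. The sketch below assumes a threshold τ = 0.8, so the band is [0.8, 1.25]; the threshold, the helper name `is_fair`, and the disparity values are assumptions for illustration, not Aequitas output.

```python
import pandas as pd

TAU = 0.8  # assumed fairness threshold (the "four-fifths" rule of thumb)

def is_fair(disparity: float, tau: float = TAU) -> bool:
    """Return True when the disparity lies in [tau, 1 / tau] (hypothetical helper)."""
    return tau <= disparity <= 1 / tau

# Made-up disparities relative to a reference group.
disparities = pd.Series(
    {"group_a": 1.91, "group_b": 0.92, "reference": 1.00},
    name="fpr_disparity",
)

parity = disparities.apply(is_fair)
print(parity)
```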

In [None]:
# your code

***Q8: Check for disparities by sensitive attribute***

Use `disparity` to display disparities by the sensitive attributes `race`, `sex` and `age_cat`

**Q 9: Intersectionality**

A. Starting from the original dataframe (loaded at the beginning of the lab), write code to create a new column in `df` called `race_sex` which concatenates the `race` and `sex` columns with an underscore, so it has entries like `African-American_Male` or `Hispanic_Female`, etc.

B. List the new intersectional categories

In [None]:
# A. add code

In [None]:
# add code

B. Your answer

**Q 10: Bias in Intersectionality**

A. Repeat the analysis above, focusing on these new intersectional categories. 

B. Write down your observations.
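
Why intersectional groups deserve a separate look can be shown with a small made-up example in which every single attribute appears balanced while the intersections are not. The data and the group labels `X` and `Y` are hypothetical, and the mean of `score` is used as a simple stand-in for the share of positive predictions.

```python
import pandas as pd

# Made-up data: each single attribute looks balanced, the intersections do not.
toy = pd.DataFrame({
    "race":  ["X", "X", "X", "X", "Y", "Y", "Y", "Y"],
    "sex":   ["Female", "Female", "Male", "Male", "Female", "Female", "Male", "Male"],
    "score": [1, 1, 0, 0, 0, 0, 1, 1],
})

# Intersectional attribute: the two labels joined by an underscore.
toy["race_sex"] = toy["race"].astype(str) + "_" + toy["sex"].astype(str)

# Share of positive predictions per group, for each way of grouping.
by_race = toy.groupby("race")["score"].mean()
by_sex = toy.groupby("sex")["score"].mean()
by_race_sex = toy.groupby("race_sex")["score"].mean()

print(by_race, by_sex, by_race_sex, sep="\n\n")
```

Grouping by `race` or by `sex` alone gives 0.5 everywhere, whereas the `race_sex` groups are either always or never predicted positive.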

In [None]:
# add code

***How do we understand biases in our model?***

What metrics are more relevant to your scenario?

One of the major contributions of the Aequitas toolkit and environment is the `Fairness Tree`.

While a lot of bias metrics and fairness definitions have been proposed, there is no consensus on which definitions and metrics should be used in practice to evaluate and audit these systems. To help with this, the authors of Aequitas also created a guide for understanding when and what metrics might apply in a given situation, called the `Fairness Tree`. It can be found here: http://www.datasciencepublicpolicy.org/our-work/tools-guides/aequitas/



Consider the following 2 scenarios:

- **Scenario 1**
You are using the COMPAS model as described at the beginning of the lab. The carceral system will use COMPAS to optimise the organisation of hearings for parole and early release. The output of your model will be used to decide whether a convict's request for an early-release hearing is granted or rejected. Low risk scores will be granted a hearing for early release. High and medium scores will not be granted a hearing.

- **Scenario 2**
Suppose a certain country in Europe is building a model to predict the risk that a criminal is going to recidivate, similar to the prediction made by the COMPAS model discussed in class.  However, in contrast to COMPAS, the goal is not to use these scores to determine whether or not an individual is allowed to be free; instead, almost all individuals with a high risk score will be admitted into a special program that has three components:  (1) the individuals receive one-on-one counseling, (2) the individuals receive a monthly stipend, and (3) the individuals are set up with housing.  The program is a benefit to the individuals and is aimed at reducing recidivism.

Think about the confusion matrix for the model.  How would you define the positive and negative class?

Suppose the country is trying to decide on the proper fairness metric to use for their machine learning model, and cares about the protected attribute race.  In terms of representation, they would either accept a model that has equal nominal representation of different races, or equal proportional representation in the special program.  They are also concerned about errors made by the model, and want to make sure that predictive equity among groups is achieved for people with need.
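
The two notions of representation mentioned here can be told apart numerically. In the sketch below, `ppr` is a group's share of all predicted positives (equal numbers selected from each group) and `pprev` is the share of a group that is predicted positive (selection proportional to group size). The counts are made up, and the column names merely mirror the Aequitas metric names.

```python
import pandas as pd

# Made-up counts: group size and number of people predicted positive (selected).
sel = pd.DataFrame(
    {"group_size": [800, 200], "pp": [100, 100]},
    index=["group_a", "group_b"],
)

# Share of all selected people who belong to each group.
sel["ppr"] = sel["pp"] / sel["pp"].sum()

# Share of each group that is selected.
sel["pprev"] = sel["pp"] / sel["group_size"]

print(sel)
```

Here both groups supply the same number of selected people (equal `ppr`), yet a member of `group_b` is four times as likely to be selected as a member of `group_a` (unequal `pprev`).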

Review the Aequitas guidelines for fairness metrics, and think about them in the context of this problem:  http://www.datasciencepublicpolicy.org/projects/aequitas/.


**Q:** Which of the following is or is not a good choice for a fairness metric for the model in each of the scenarios?

> a) False Negative Rate Parity \
> b) False Positive Rate Parity \
> c) Equal Selection Parity \
> d) Demographic Parity \

Justify your answers and compare the two scenarios.