# **Workshop exercise 2 - Mitigating Bias**

## **Purpose**

The aim of this exercise is to quantify bias and then mitigate it to obtain a fair algorithm.
Before we begin the analysis, we need to set up our environment and install the required libraries.

## **Libraries**

We will start by running the following cell to install and import the necessary libraries.

In [None]:
# Install the fairlearn library
!pip install fairlearn

# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

## **Import the data**

We proceed by importing the COMPAS dataset into a pandas dataframe.

In [None]:
# Load the dataset
url = "https://raw.githubusercontent.com/propublica/compas-analysis/master/compas-scores-two-years.csv"

# Read the data into a pandas dataframe
data = pd.read_csv(url)

# Select the appropriate columns needed for our analysis
columns = ["sex", "age_cat", "race", "v_score_text", "c_charge_degree", "priors_count"]
data = data[columns]

# Display the first few rows of the dataframe
data.head(10)

To build our ML model, we will use the following set of features:

- **sex**: Gender of the individual (e.g., Male, Female)
- **age_cat**: Age category of the individual (e.g., 25-45)
- **race**: Racial background of the individual (e.g., African-American, Caucasian, Hispanic)
- **priors_count**: Number of prior offenses
- **c_charge_degree**: Degree of the current charge (e.g., Felony, Misdemeanor)

Our target (label) will be:
- **v_score_text**: Textual description of the COMPAS risk score (e.g., Low, Medium, High). This is the column selected in the code above.

## **Preparing the data**

Next, we prepare the data for further processing. Specifically, we perform the following steps:
* Drop samples classified as "Medium" risk because we want to focus on the "High" and "Low" risk categories.
* Keep only the races "African-American" and "Caucasian" due to the low representation of the other races.

In [None]:
# Drop the medium risk records
data = data.drop(data[data.v_score_text == "Medium"].index)

# Only keep the races "African-American" and "Caucasian"
data = data.loc[data['race'].isin(["African-American", "Caucasian"])]

data.head(10)

## **Building our ML model**

We will now build a simple ML model and treat "race" as our protected variable. The purpose of the model is to predict whether a defendant has a high or low risk of recidivism. To do so, we follow these steps:

* Encoding of the risk score column
* Encoding of race column
* Encoding of other columns
* Split of the dataset into training and test sets (we will use an 80-20 split)
* Train a classifier on our (training) data
* Compute our accuracy on test data
* Calculate demographic parity difference
* Calculate equalized odds difference


Let's train it and see if we can identify any biases in our predictions.

In [None]:
# Encoding the risk score to a numerical value
mapping = {"Low": 0, "High": 1}
data = data.replace({"v_score_text": mapping}).rename(columns={"v_score_text": "high_risk"})
data.head(10)

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
import warnings
warnings.filterwarnings('ignore')

# Label encode 'race' column
le = LabelEncoder()
data['race'] = le.fit_transform(data['race'])
y = data['high_risk']

# One-hot encode other categorical columns, but exclude 'race'
X = pd.get_dummies(data.drop(['high_risk', 'race'], axis=1), drop_first=True)
X['race'] = data['race']

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

# Train a simple machine learning model (Random Forest Classifier)
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

Now that we have trained our classifier, we can evaluate it with respect to model performance and fairness metrics.

In [None]:
from sklearn.metrics import accuracy_score, precision_score, recall_score
from fairlearn.metrics import demographic_parity_difference, equalized_odds_difference

# Calculate the accuracy of the baseline model
baseline_accuracy = accuracy_score(y_test, y_pred)

# Calculate demographic parity difference
dp_difference = demographic_parity_difference(y_test, y_pred, sensitive_features=X_test['race'])

# Calculate equalized odds difference
eo_difference =  equalized_odds_difference(y_test, y_pred, sensitive_features=X_test['race'])

# Print results
print(f"Baseline Model Accuracy: {baseline_accuracy:.4f}")
print(f"Demographic Parity Difference: {dp_difference:.4f}")
print(f"Equalized Odds Difference: {eo_difference:.4f}")

## ***Question:***
*How would you interpret the results for Demographic Parity Difference and Equalized Odds Difference?*

**Explanation of metrics**

- *Baseline Model Accuracy: 0.8773 (87.73%)*: Our model correctly predicted the risk category for approximately 87.73% of the instances in the test set.

- *Demographic Parity Difference: 0.1319*: A fairness metric that measures the difference in the rate of positive predictions (here, the high-risk class) between two groups. We are looking at the race attribute, so the result means that the rates at which African-Americans and Caucasians receive high-risk predictions differ by about 13 percentage points. A value of 0 would indicate perfect demographic parity, meaning both groups receive positive predictions at the same rate.

- *Equalized Odds Difference: 0.3571*: A fairness metric that compares how often the model is right and wrong for each group. Specifically, it is the larger of two disparities: the between-group difference in True Positive Rates and the between-group difference in False Positive Rates. A value of 0 would indicate that both African-Americans and Caucasians have identical rates of true positives and false positives. Our value of 0.3571 means that at least one of these rates differs by about 36 percentage points between the two groups.

- While our model performs reasonably well with an accuracy of 87.73%, there are fairness concerns with respect to race, the attribute we are analysing today. **The model is not treating both races equally in terms of positive predictions and error rates. These disparities can be a result of underlying biases in the training data, imbalances in sample sizes between groups, or other factors.**

Let's compute some more key metrics of our classifier:

**Precision:** What proportion of positive predictions was actually correct?

$$
\text{Precision} = \frac{TP}{TP + FP}
$$

**Recall:** What proportion of actual positives was identified correctly?

$$
\text{Recall} = \frac{TP}{TP + FN}
$$

**Selection Rate:** What proportion of the samples received a positive prediction?

$$
\text{Selection Rate} = \frac{TP + FP}{P + N}
$$

*Note: Demographic parity difference is the absolute difference between the groups' selection rates.*
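
As a hand-computed illustration of these formulas, the sketch below evaluates them per group on invented labels and predictions. The group names "a" and "b", the data, and the helper `group_metrics` are assumptions made for this example. The resulting table is analogous in spirit to what `MetricFrame.by_group` reports.

```python
import pandas as pd

# Invented labels and predictions for two hypothetical groups
toy = pd.DataFrame({
    "group":  ["a"] * 5 + ["b"] * 5,
    "y_true": [1, 1, 0, 0, 0, 1, 1, 1, 0, 0],
    "y_pred": [1, 0, 1, 0, 0, 1, 1, 1, 1, 0],
})

def group_metrics(df):
    """Precision, recall, and selection rate from confusion counts."""
    tp = ((df["y_true"] == 1) & (df["y_pred"] == 1)).sum()
    fp = ((df["y_true"] == 0) & (df["y_pred"] == 1)).sum()
    fn = ((df["y_true"] == 1) & (df["y_pred"] == 0)).sum()
    return pd.Series({
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "selection_rate": (tp + fp) / len(df),
    })

# One row per group, one column per metric
by_group = pd.DataFrame(
    {name: group_metrics(part) for name, part in toy.groupby("group")}
).T

# Demographic parity difference: gap between the groups' selection rates
dp_difference_toy = by_group["selection_rate"].max() - by_group["selection_rate"].min()

print(by_group)
print(f"Demographic parity difference: {dp_difference_toy:.2f}")
```

In this toy data, group "a" is selected at a rate of 0.4 and group "b" at 0.8, so the demographic parity difference is 0.4.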

In [None]:
from fairlearn.metrics import MetricFrame
from fairlearn.metrics import selection_rate

race = X_test['race']
gm = MetricFrame(metrics=accuracy_score, y_true=y_test, y_pred=y_pred, sensitive_features=race)


metrics = {
    'precision': precision_score,
    'recall': recall_score,
    'selection_rate': selection_rate
}

metric_frame = MetricFrame(metrics=metrics,
                           y_true=y_test,
                           y_pred=y_pred,
                           sensitive_features=race)

ax = metric_frame.by_group.plot.bar(
    subplots=True,
    layout=[1, 3],
    legend=False,
    figsize=[15, 6],
    title="Show all metrics",
)
# Remap the labels
label_mapping = {0: 'African-American', 1: 'Caucasian'}

# Update x-tick labels for each subplot
for subplot in ax[0]:
    labels = [item.get_text() for item in subplot.get_xticklabels()]
    new_labels = [label_mapping[int(label)] if label.isdigit() else label for label in labels]
    subplot.set_xticklabels(new_labels, rotation=45)

plt.tight_layout()

## ***Question***

- What is the underlying assumption about the data when evaluating demographic parity?
- Why may this not be justified here?

Let's look at some more metrics and how they differ between the races.

In [None]:
from fairlearn.metrics import false_positive_rate, true_positive_rate

metrics = {
    'true_positive_rate': true_positive_rate,
    'false_positive_rate': false_positive_rate,
    'selection_rate': selection_rate
}

metric_frame = MetricFrame(metrics=metrics,
                           y_true=y_test,
                           y_pred=y_pred,
                           sensitive_features=race)

ax = metric_frame.by_group.plot.bar(
    subplots=True,
    layout=[1, 3],
    legend=False,
    figsize=[15, 6],
    title="Show all metrics",
)

# Remap the labels
label_mapping = {0: 'African-American', 1: 'Caucasian'}

# Update x-tick labels for each subplot
for subplot in ax[0]:
    labels = [item.get_text() for item in subplot.get_xticklabels()]
    new_labels = [label_mapping[int(label)] if label.isdigit() else label for label in labels]
    subplot.set_xticklabels(new_labels, rotation=45)

plt.tight_layout()

*Note: Equalized odds difference is the larger of two absolute between-group differences: the difference in TPRs and the difference in FPRs.*
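
The note can be checked by hand on a small invented example. The sketch below computes the TPR and FPR for two hypothetical groups and takes the larger gap. The data and the helper `tpr_fpr` are assumptions for illustration only.

```python
import numpy as np

# Invented labels, predictions, and group membership
y_true = np.array([1, 1, 0, 0, 0, 1, 1, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 1, 1, 0])
group = np.array(["a"] * 5 + ["b"] * 5)

def tpr_fpr(mask):
    """True-positive and false-positive rate for the samples selected by mask."""
    yt, yp = y_true[mask], y_pred[mask]
    tpr = yp[yt == 1].mean()  # share of actual positives predicted positive
    fpr = yp[yt == 0].mean()  # share of actual negatives predicted positive
    return tpr, fpr

tpr_a, fpr_a = tpr_fpr(group == "a")
tpr_b, fpr_b = tpr_fpr(group == "b")

tpr_gap = abs(tpr_a - tpr_b)
fpr_gap = abs(fpr_a - fpr_b)

# Equalized odds difference: the larger of the two gaps
eo_difference_toy = max(tpr_gap, fpr_gap)

print(f"TPR gap: {tpr_gap:.4f}, FPR gap: {fpr_gap:.4f}")
print(f"Equalized odds difference: {eo_difference_toy:.4f}")
```

Because only the larger gap is reported, the single number does not say which of the two rates drives it.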

## ***Question***

- What is the implication of the difference in true-positive-rates and false-positive-rates with respect to racial fairness?
- Why may the equalized odds difference be misleading in some scenarios? What does it fail to capture?

## **Bias Mitigation Techniques**

In this part we will try to mitigate the bias in our predictions. For this we will try three methods: the simple "fairness through unawareness", the Exponentiated Gradient method to mitigate bias, and a threshold optimization technique.

# 1.  Fairness through unawareness


In the following cells we will try to mitigate the bias using the simplest approach, usually called **fairness through unawareness**. It involves removing sensitive variables such as `race` from the dataset.
The idea is that if the algorithm doesn't have access to these sensitive attributes, it cannot be biased against them.



In [None]:
y = data['high_risk']

# One-hot encode other categorical columns, but exclude 'race'
X = pd.get_dummies(data.drop(['high_risk', 'race'], axis=1), drop_first=True)
X['race'] = data['race']

In [None]:
# Split the data into training and test sets (test size 20%, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

# Drop race to obscure it from our model
X_train_with_race = X_train.copy() # make a copy of the X train set with race included
X_train = X_train_with_race.drop(['race'], axis=1)


X_test_with_race = X_test.copy() # make a copy of the X test set with the race included
X_test = X_test_with_race.drop(['race'], axis=1)

In [None]:
# Train a simple machine learning model (Random Forest Classifier)
# ... first the full model
model_with_race = RandomForestClassifier(random_state=42)
model_with_race.fit(X_train_with_race, y_train)
y_pred_with_race = model_with_race.predict(X_test_with_race)

# ... and then the model without the race attribute.
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

In [None]:
# Calculate the accuracy of both models
baseline_accuracy = accuracy_score(y_test, y_pred_with_race)
unawareness_accuracy = accuracy_score(y_test, y_pred)

# Calculate demographic parity difference for baseline model
dp_difference_baseline = demographic_parity_difference(y_test, y_pred_with_race, sensitive_features=X_test_with_race['race'])

# Calculate demographic parity difference for unawareness model.
dp_difference = demographic_parity_difference(y_test, y_pred, sensitive_features=X_test_with_race['race'])


# Calculate equalized odds difference
eo_difference_baseline =  equalized_odds_difference(y_test, y_pred_with_race, sensitive_features=X_test_with_race['race'])
eo_difference =  equalized_odds_difference(y_test, y_pred, sensitive_features=X_test_with_race['race'])

# Print results

# Print Accuracy
print(f"Baseline Model Accuracy: {baseline_accuracy:.4f}")
print(f"Unawareness Model Accuracy: {unawareness_accuracy:.4f}")
print(40*"-")

# Print Baseline Demographic parity difference.
print(f"Baseline Demographic Parity Difference: {dp_difference_baseline:.4f}")
print(f"Unawareness Demographic Parity Difference: {dp_difference:.4f}")
print(40*"-")

# Print Baseline Equalized odds difference.
print(f"Baseline Equalized odds Difference: {eo_difference_baseline:.4f}")
print(f"Unawareness Equalized odds Difference: {eo_difference:.4f}")

## ***Question***

- Was this approach effective in removing racial bias?
- In what situations would **fairness through unawareness** not be effective?

# 2. In-Processing bias mitigation with Exponentiated Gradient

Now we are going to try a bias mitigation algorithm offered by the **fairlearn** library. The algorithm is called **Exponentiated Gradient**.


The Exponentiated Gradient method refines the model iteratively, adjusting sample weights to ensure fairness in decisions. It proceeds as follows:

1. Initialization: All samples start with equal weights.

2. Iterative Process:

 - Train the model with the current weights.
 - Evaluate for fairness violations, e.g., disparate selection rates between African-Americans and Caucasians.
 - If unfairness is detected, increase the weights for samples that would help correct the fairness violation and decrease the weights for samples that would exacerbate it.
3. Convergence: Iterate until the model's decisions balance accuracy with fairness constraints.
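
The loop above is a simplified picture. In the reductions formulation that Exponentiated Gradient builds on, the quantity that receives the multiplicative ("exponentiated") update is a small vector of multipliers, one per fairness constraint, and the sample weights given to the classifier are derived from it. The toy sketch below shows only that update, for two constraints with made-up violation values. The function name, learning rate, and bound are illustrative assumptions, not fairlearn's internal code.

```python
import numpy as np

def exponentiated_gradient_step(theta, violation, learning_rate=0.5, bound=10.0):
    """One multiplicative update of the constraint multipliers (toy version)."""
    # Accumulate the signed constraint violations
    theta = theta + learning_rate * violation
    # Exponentiate and normalise; the extra 1.0 keeps sum(lam) below the bound
    lam = bound * np.exp(theta) / (1.0 + np.exp(theta).sum())
    return theta, lam

# Made-up signed violations: constraint 0 is violated, constraint 1 has slack
violation = np.array([0.13, -0.13])

theta = np.zeros(2)
history = []
for _ in range(5):
    theta, lam = exponentiated_gradient_step(theta, violation)
    history.append(lam)

for step, lam_step in enumerate(history, start=1):
    print(f"step {step}: multipliers = {np.round(lam_step, 3)}")
```

The multiplier of the violated constraint grows with every step, which is what pushes the next round of training toward correcting that violation.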

To do this, we also need to provide a fairness constraint to optimize for. For this demonstration we will choose demographic parity as the fairness constraint.

*Note: Running the next cell may take a minute.*


In [None]:
from fairlearn.reductions import ExponentiatedGradient, DemographicParity
np.random.seed(42)  # set seed for consistent results with ExponentiatedGradient

# Define the constraint
constraint = DemographicParity()

# Select the classifier
classifier = RandomForestClassifier()

# Construct the mitigator
mitigator = ExponentiatedGradient(classifier, constraint, max_iter=5)
race_train = X_train_with_race['race']

# Train the random forest classifier with the Exponentiated Gradient mitigator
mitigator.fit(X_train_with_race, y_train, sensitive_features=race_train)

# Now let's make predictions on the test set.
y_pred_mitigated = mitigator.predict(X_test_with_race)

# Calculate the accuracy of the model with the fairness mitigation algorithm
exp_grad_accuracy = accuracy_score(y_test, y_pred_mitigated)

# Calculate demographic parity difference
exp_grad_dp_difference = demographic_parity_difference(y_test, y_pred_mitigated, sensitive_features=X_test_with_race['race'])

# Calculate equalized odds difference
exp_grad_eo_difference =  equalized_odds_difference(y_test, y_pred_mitigated, sensitive_features=X_test_with_race['race'])

# Print results
print(f"Baseline Model Accuracy: {baseline_accuracy:.4f}")
print(f"Exp. Grad. Model Accuracy: {exp_grad_accuracy:.4f}")
print(40*"-")

print(f"Baseline Demographic Parity Difference: {dp_difference_baseline:.4f}")
print(f"Exp. Grad. Demographic Parity Difference: {exp_grad_dp_difference:.4f}")
print(40*"-")

print(f"Baseline Equalized odds Difference: {eo_difference_baseline:.4f}")
print(f"Exp. Grad. Equalized odds Difference: {exp_grad_eo_difference:.4f}")

## ***Question***

- What are your observations at first glance at these results?
- How does this approach stack up against the **fairness through unawareness** method?

Let's now dive a bit deeper into the metrics with visualizations as we did for the baseline model.

In [None]:
metric_frame_mitigated = MetricFrame(metrics=metrics,
                           y_true=y_test,
                           y_pred=y_pred_mitigated,
                           sensitive_features=race)
ax = metric_frame_mitigated.by_group.plot.bar(
    subplots=True,
    layout=[1, 3],
    legend=False,
    figsize=[15, 6],
    title="Show all metrics",
)
# Remap the labels
label_mapping = {0: 'African-American', 1: 'Caucasian'}

# Update x-tick labels for each subplot
for subplot in ax[0]:
    labels = [item.get_text() for item in subplot.get_xticklabels()]
    new_labels = [label_mapping[int(label)] if label.isdigit() else label for label in labels]
    subplot.set_xticklabels(new_labels, rotation=45)

plt.tight_layout()

## ***Question***

- How did these metrics change compared to the baseline?

# 3. Post-Processing bias mitigation with Threshold Optimizer
Another approach to mitigating bias is at the post-processing stage by altering the classification threshold for positive predictions.

We will use the `ThresholdOptimizer` contained in the Fairlearn library to improve Demographic Parity.

`ThresholdOptimizer` creates separate classification thresholds for each race. It decides on the best classification thresholds by generating all possible thresholds and selecting the best combination in terms of the objective (in our case `balanced_accuracy_score`) and the fairness constraint (in our case `demographic_parity`).
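
To make the idea of group-specific thresholds concrete, here is a minimal sketch on made-up scores. It enumerates every pair of candidate thresholds, keeps the pairs whose selection rates match, and picks the pair with the best balanced accuracy. This is a deterministic simplification: the data and helper names are invented, and exact parity is required. The actual `ThresholdOptimizer` is more elaborate; for example, its predictions can be randomized.

```python
from itertools import product

import numpy as np

# Invented predicted probabilities, true labels, and group membership
scores = np.array([0.9, 0.8, 0.7, 0.4, 0.2, 0.6, 0.5, 0.3, 0.2, 0.1])
y_true = np.array([1, 1, 1, 0, 0, 1, 0, 0, 0, 0])
group = np.array([0] * 5 + [1] * 5)

def predict(thresholds):
    """Apply a separate threshold to each group."""
    per_sample = np.where(group == 0, thresholds[0], thresholds[1])
    return (scores >= per_sample).astype(int)

def balanced_accuracy(y_pred):
    tpr = y_pred[y_true == 1].mean()
    tnr = (1 - y_pred[y_true == 0]).mean()
    return (tpr + tnr) / 2

def selection_rate_gap(y_pred):
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

# Candidate thresholds: the observed scores within each group
candidates = [np.unique(scores[group == g]) for g in (0, 1)]

best_thresholds, best_score = None, -1.0
for pair in product(*candidates):
    y_pred = predict(pair)
    # Keep only pairs with equal selection rates, then maximise the objective
    if selection_rate_gap(y_pred) < 1e-9 and balanced_accuracy(y_pred) > best_score:
        best_thresholds = tuple(float(t) for t in pair)
        best_score = float(balanced_accuracy(y_pred))

shared_gap = selection_rate_gap(predict((0.5, 0.5)))

print("Thresholds per group:", best_thresholds)
print(f"Balanced accuracy: {best_score:.4f}")
print(f"Selection-rate gap with a shared 0.5 threshold: {shared_gap:.2f}")
```

With a single shared threshold of 0.5 the two toy groups are selected at different rates, while the chosen pair of thresholds equalises the selection rates by construction.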


In [None]:
from fairlearn.postprocessing import ThresholdOptimizer, plot_threshold_optimizer
np.random.seed(42)  # set seed for consistent results with ThresholdOptimizer
classifier = RandomForestClassifier()
race_train = X_train_with_race['race']
race_test = X_test_with_race['race']


threshold_optimizer = ThresholdOptimizer(
    estimator=classifier,
    constraints="demographic_parity",
    objective="balanced_accuracy_score",
    predict_method="predict_proba",
    prefit=False
    )

# Train the threshold optimizer
threshold_optimizer.fit(X_train_with_race, y_train, sensitive_features=race_train)

# Now let's make predictions on the test set.
y_pred_mitigated = threshold_optimizer.predict(X_test_with_race, sensitive_features=race_test)

# Calculate the accuracy of the model with the fairness mitigation algorithm
threshold_optimizer_accuracy = accuracy_score(y_test, y_pred_mitigated)

# Calculate demographic parity difference
threshold_optimizer_dp_difference = demographic_parity_difference(y_test, y_pred_mitigated, sensitive_features=X_test_with_race['race'])

# Calculate equalized odds difference
threshold_optimizer_eo_difference =  equalized_odds_difference(y_test, y_pred_mitigated, sensitive_features=X_test_with_race['race'])

# Print results
print(f"Baseline Model Accuracy: {baseline_accuracy:.4f}")
print(f"Threshold Optimizer Model Accuracy: {threshold_optimizer_accuracy:.4f}")
print(40*"-")

print(f"Baseline Demographic Parity Difference: {dp_difference_baseline:.4f}")
print(f"Threshold Optimizer Demographic Parity Difference: {threshold_optimizer_dp_difference:.4f}")
print(40*"-")

print(f"Baseline Equalized odds Difference: {eo_difference_baseline:.4f}")
print(f"Threshold Optimizer Equalized odds Difference: {threshold_optimizer_eo_difference:.4f}")

In [None]:
metric_frame_mitigated = MetricFrame(metrics=metrics,
                           y_true=y_test,
                           y_pred=y_pred_mitigated,
                           sensitive_features=race)
ax = metric_frame_mitigated.by_group.plot.bar(
    subplots=True,
    layout=[1, 3],
    legend=False,
    figsize=[15, 6],
    title="Show all metrics",
)
# Remap the labels
label_mapping = {0: 'African-American', 1: 'Caucasian'}

# Update x-tick labels for each subplot
for subplot in ax[0]:
    labels = [item.get_text() for item in subplot.get_xticklabels()]
    new_labels = [label_mapping[int(label)] if label.isdigit() else label for label in labels]
    subplot.set_xticklabels(new_labels, rotation=45)

plt.tight_layout()

## ***Question***

- How do the results stack up against the previous two bias mitigation strategies?