###**Fairness Auditing and Mitigation in a Criminal Justice Setting (COMPAS)**

###**PART 1: Fairness Definitions in Context**

**Scenario A:**

The concern in this scenario is that if the model continually fails to identify true high-risk individuals within a particular racial group, it would deny them access to helpful resources. To minimize this, we need to equalize the True Positive Rate (TPR) across the groups. This ensures that truly high-risk individuals in every group have a comparable chance of being identified and offered the necessary help, while avoiding overprediction.
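To make the Scenario A criterion concrete, here is a minimal sketch of a per-group TPR and the resulting gap. The function name `true_positive_rate_by_group` and the toy labels are assumptions made for illustration; they are not part of the COMPAS data or of the notebook's own helpers.

```python
from collections import defaultdict

def true_positive_rate_by_group(y_true, y_pred, groups):
    """Return {group: TPR}, where TPR = TP / (TP + FN) among actual positives."""
    true_pos = defaultdict(int)
    actual_pos = defaultdict(int)
    for yt, yp, g in zip(y_true, y_pred, groups):
        if yt == 1:
            actual_pos[g] += 1
            if yp == 1:
                true_pos[g] += 1
    return {g: true_pos[g] / actual_pos[g] for g in actual_pos}

# Toy labels (made up for illustration, not COMPAS data).
y_true = [1, 1, 1, 0, 1, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

tpr = true_positive_rate_by_group(y_true, y_pred, groups)
tpr_gap = max(tpr.values()) - min(tpr.values())
print(tpr, tpr_gap)
```

In this toy case group "a" has two of its three high-risk individuals identified and group "b" only one of three, so TPR parity would ask the model to close that difference.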

**Scenario B:**

The concern in this scenario is that if a racial group has a large number of false positives, individuals within this group will be subjected to repeated check-ins even when they do not require it. To bring this down, we can equalize the False Positive Rate across the groups. This ensures that the false positive rates for each racial group are comparable and no one racial group bears the burden of the repeated check-ins. Lowering FPR may increase FNR; however, in this scenario, the concern is that a higher FPR for a group can burden its members, and the requirement is to lower that. Hence, in the tradeoff between lowering FPR and increasing FNR, I believe lowering FPR should be favoured.
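The FPR/FNR trade-off mentioned above can be illustrated with a small sketch. It assumes a score-based classifier with a decision threshold, which is a simplification chosen for illustration; the scores and the helper `fpr_fnr` are hypothetical.

```python
def fpr_fnr(y_true, scores, threshold):
    """Return (FPR, FNR) when predicting positive for score >= threshold."""
    false_pos = false_neg = negatives = positives = 0
    for yt, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if yt == 1:
            positives += 1
            if pred == 0:
                false_neg += 1
        else:
            negatives += 1
            if pred == 1:
                false_pos += 1
    return false_pos / negatives, false_neg / positives

# Toy scores (made up for illustration, not COMPAS data).
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.3, 0.5, 0.7, 0.4, 0.6, 0.8, 0.9]

low_threshold = fpr_fnr(y_true, scores, 0.35)
high_threshold = fpr_fnr(y_true, scores, 0.65)
print("threshold 0.35:", low_threshold)
print("threshold 0.65:", high_threshold)
```

Raising the threshold halves the FPR in this toy example but raises the FNR from 0 to 0.5, which is the kind of trade-off Scenario B accepts in favour of a lower FPR.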

In [1]:
!pip install fairlearn

Collecting fairlearn
  Downloading fairlearn-0.12.0-py3-none-any.whl.metadata (7.0 kB)
Downloading fairlearn-0.12.0-py3-none-any.whl (240 kB)
Installing collected packages: fairlearn
Successfully installed fairlearn-0.12.0


In [3]:
import os
import json
import warnings

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

from fairlearn.metrics import MetricFrame, selection_rate, false_positive_rate, false_negative_rate
from fairlearn.reductions import (
    ExponentiatedGradient,
    EqualizedOdds,
    TruePositiveRateParity,
    FalsePositiveRateParity,
)

In [4]:
# -----------------------
# Config & Data
# -----------------------
DATA_PATH = "compas.csv"
RANDOM_STATE = 42
TEST_SIZE = 0.3
TARGET_COL = "ReoffendedWithinTwoYears"
SENSITIVE_COL = "Race"

REQUIRED_COLS = [
    "Sex", "Race", "Prior Offenses", "Under 25", "ChargeDegree",
    "COMPASPredictedDecileScore", "ReoffendedWithinTwoYears"
]

if not os.path.exists(DATA_PATH):
    raise FileNotFoundError(f"Missing file: {DATA_PATH}")

df = pd.read_csv(DATA_PATH)
missing = [c for c in REQUIRED_COLS if c not in df.columns]
if missing:
    raise ValueError(f"Dataset missing columns: {missing}")

df = df[REQUIRED_COLS].dropna(subset=[SENSITIVE_COL, TARGET_COL]).copy()
y = df[TARGET_COL].astype(int).values
A = df[SENSITIVE_COL].astype(str).values
X = df.drop(columns=[TARGET_COL])

X_tr, X_te, y_tr, y_te, A_tr, A_te = train_test_split(
    X, y, A, test_size=TEST_SIZE, random_state=RANDOM_STATE, stratify=y
)

In [5]:
# Preprocessing

numeric_features = ["COMPASPredictedDecileScore"]
categorical_features = [c for c in X.columns if c not in numeric_features]

ohe = OneHotEncoder(handle_unknown="ignore", sparse_output=False)

preprocess = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), numeric_features),
        ("cat", ohe, categorical_features),
    ],
    remainder="drop"
)

In [6]:
# Metric helpers

def compute_metrics(y_true, y_pred, sens):
    """Return (MetricFrame, gaps_dict) for accuracy, selection_rate, FPR, FNR."""
    metrics = {
        "accuracy": accuracy_score,
        "selection_rate": selection_rate,
        "false_positive_rate": false_positive_rate,
        "false_negative_rate": false_negative_rate
    }
    mf = MetricFrame(metrics=metrics, y_true=y_true, y_pred=y_pred, sensitive_features=sens)
    # Largest between-group difference for each metric, computed once
    diffs = mf.difference(method="between_groups")
    gaps = {
        "gap_selection_rate": diffs["selection_rate"],
        "gap_fpr": diffs["false_positive_rate"],
        "gap_fnr": diffs["false_negative_rate"],
    }
    return mf, gaps


# Helper to print per-group metrics and the between-group gaps as tables
def print_metrics(mf, gaps, n):
    print("\n========== " + n + " Model Metrics ==========")
    # Copy so that adding a column does not mutate the MetricFrame's own table
    bg = pd.DataFrame(mf.by_group).copy()
    bg["true_positive_rate"] = 1 - bg["false_negative_rate"]
    o = pd.DataFrame({k: [v] for k, v in mf.overall.items()})
    o["true_positive_rate"] = 1 - o["false_negative_rate"]
    o.index = ["overall"]
    print(pd.concat([o, bg], axis=0).to_string())

    print("\n==========  Gaps in Metrics ==========  ")

    # Note: the row label is "gaps_base" for every model, including the mitigated ones
    gaps_pd = pd.DataFrame({k: [v] for k, v in gaps.items()}, index=["gaps_base"])
    print()
    print(gaps_pd.to_string())

###**PART 2: Baseline Model and Fairness Audit**



**Step 1: Build the pipeline**

In [9]:
baseline = Pipeline(steps=[("pre", preprocess), ("clf", LogisticRegression(max_iter=200, C=1.0, solver="lbfgs", random_state=RANDOM_STATE))])

# Fit on training data
baseline.fit(X_tr, y_tr)


**Step 2: Predict on test data**

In [10]:
y_pred_base = baseline.predict(X_te)

# Compute metrics
mf_base, gaps_base = compute_metrics(y_te, y_pred_base, A_te)

# Print results
print_metrics(mf_base, gaps_base, "Baseline")


                  accuracy  selection_rate  false_positive_rate  false_negative_rate  true_positive_rate
overall           0.665589        0.394457             0.253154             0.433402            0.566598
African-American  0.665480        0.535587             0.368705             0.301056            0.698944
Asian             0.500000        0.000000             0.000000             1.000000            0.000000
Caucasian         0.675214        0.272080             0.177677             0.570342            0.429658
Hispanic          0.616915        0.184080             0.121739             0.732558            0.267442
Native American   0.571429        0.142857             0.000000             0.750000            0.250000
Other             0.707317        0.186992             0.055556             0.627451            0.372549


           gap_selection_rate   gap_fpr   gap_fnr
gaps_base            0.535587  0.368705  0.698944
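The gap values printed above are the largest minus the smallest per-group value of each metric. The sketch below reproduces them from the baseline table and also reports which groups set each gap. The helper `between_group_gap` and the shortened metric keys are names introduced here for illustration.

```python
# Per-group baseline values copied from the table above.
baseline_by_group = {
    "African-American": {"selection_rate": 0.535587, "fpr": 0.368705, "fnr": 0.301056},
    "Asian":            {"selection_rate": 0.000000, "fpr": 0.000000, "fnr": 1.000000},
    "Caucasian":        {"selection_rate": 0.272080, "fpr": 0.177677, "fnr": 0.570342},
    "Hispanic":         {"selection_rate": 0.184080, "fpr": 0.121739, "fnr": 0.732558},
    "Native American":  {"selection_rate": 0.142857, "fpr": 0.000000, "fnr": 0.750000},
    "Other":            {"selection_rate": 0.186992, "fpr": 0.055556, "fnr": 0.627451},
}

def between_group_gap(table, metric):
    """Return (max - min, group with max, group with min) for one metric."""
    values = {group: row[metric] for group, row in table.items()}
    highest = max(values, key=values.get)
    lowest = min(values, key=values.get)  # on ties, the first group listed wins
    return values[highest] - values[lowest], highest, lowest

for metric in ("selection_rate", "fpr", "fnr"):
    gap, highest, lowest = between_group_gap(baseline_by_group, metric)
    print(f"{metric}: gap={gap:.6f} (max: {highest}, min: {lowest})")
```

Each of the three gaps is set by the African-American group at one end and a group with a rate of exactly 0 or 1 at the other, which is worth keeping in mind when reading the gap column.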


**Step 3: Interpretation**

The disparities are detailed below.

For this interpretation, I will look mostly at the African-American and Caucasian scores to show the disparity.

Consider the selection rates for the African-American group vs. the Caucasian group.

| Racial Group | Selection Rate |
|----------|----------|
| African-American | 0.535587    |
| Caucasian   | 0.272080 |

The selection rate for the African-American group is nearly twice (about 1.97 times) the selection rate for the Caucasian group.


According to the FPR metrics for African-American vs. Caucasian,

| Racial Group | FPR |
|----------|----------|
| African-American | 0.368705    |
| Caucasian   | 0.177677   |

Similar to the selection rates, the FPR for African-Americans is more than twice the FPR for Caucasians, i.e., the model flags a larger share of African-American individuals as high-risk when they are not.

According to the FNR metrics for African-American vs. Caucasian,

| Racial Group | FNR |
|----------|----------|
| African-American | 0.301056   |
| Caucasian   | 0.570342  |

This indicates that the share of truly high-risk individuals who are incorrectly labeled as low-risk is much higher in the Caucasian group (about 57%) than in the African-American group (about 30%).
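The size of these disparities can be checked with a few lines of arithmetic on the values from the three tables above. The variable names below are chosen here for illustration.

```python
# Values copied from the baseline tables above.
african_american = {"selection_rate": 0.535587, "fpr": 0.368705, "fnr": 0.301056}
caucasian = {"selection_rate": 0.272080, "fpr": 0.177677, "fnr": 0.570342}

# How many times higher the African-American rate is than the Caucasian rate
selection_ratio = african_american["selection_rate"] / caucasian["selection_rate"]
fpr_ratio = african_american["fpr"] / caucasian["fpr"]

# For FNR the direction is reversed: the Caucasian rate is the higher one
fnr_ratio = caucasian["fnr"] / african_american["fnr"]

print(f"selection rate ratio (AA / Caucasian): {selection_ratio:.2f}")
print(f"FPR ratio (AA / Caucasian):            {fpr_ratio:.2f}")
print(f"FNR ratio (Caucasian / AA):            {fnr_ratio:.2f}")
```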



###**PART 3: Mitigating Bias**

**Step 1: Setup**

In [12]:
Xtr_t = baseline.named_steps["pre"].transform(X_tr)
Xte_t = baseline.named_steps["pre"].transform(X_te)

In [15]:
def fit_mitigated(constraint_name, constraint_ctor, Xtr_t, Xte_t, y_tr, y_te, A_tr, A_te):
    """Fit a mitigated model and return its predictions on the test set.

    constraint_name, y_te and A_te are accepted but not used here.
    """
    constraint = constraint_ctor()
    mit = ExponentiatedGradient(
        estimator=LogisticRegression(max_iter=200, C=1.0, solver="lbfgs", random_state=RANDOM_STATE),
        constraints=constraint,
        eps=0.01,
        max_iter=50,
        sample_weight_name="sample_weight",
    )
    mit.fit(Xtr_t, y_tr, sensitive_features=A_tr)
    # ExponentiatedGradient produces a randomized classifier, so predictions
    # can vary between calls unless a random_state is passed to predict().
    y_hat = mit.predict(Xte_t)
    return y_hat


**Step 2: Train & Predict**

In [21]:
mitigation_TPR_yhat = fit_mitigated(
    "TPR Parity", TruePositiveRateParity, Xtr_t, Xte_t, y_tr, y_te, A_tr, A_te
)

In [29]:
mitigation_FPR_yhat = fit_mitigated(
    "FPR Parity", FalsePositiveRateParity, Xtr_t, Xte_t, y_tr, y_te, A_tr, A_te
)

**Step 3: Re-Audit**

Comparing model using TPR fairness constraint vs. baseline model

In [22]:
m_tprmodel, gs_tprmodel = compute_metrics(y_te, mitigation_TPR_yhat, A_te)
print_metrics(m_tprmodel, gs_tprmodel, "TPR Parity")


                  accuracy  selection_rate  false_positive_rate  false_negative_rate  true_positive_rate
overall           0.664203        0.381062             0.242220             0.449795            0.550205
African-American  0.659253        0.406584             0.244604             0.434859            0.565141
Asian             0.625000        0.375000             0.250000             0.500000            0.500000
Caucasian         0.663818        0.349003             0.248292             0.482890            0.517110
Hispanic          0.641791        0.368159             0.260870             0.488372            0.511628
Native American   0.714286        0.285714             0.000000             0.500000            0.500000
Other             0.747967        0.357724             0.166667             0.372549            0.627451


           gap_selection_rate  gap_fpr   gap_fnr
gaps_base            0.120869  0.26087  0.127451


I used True Positive Rate parity for Scenario A because we need to prioritize identifying high-risk individuals across all racial groups, so that they can receive the helpful resources, without overpredicting.

Now, we can calculate TPR by calculating 1 - FNR.

|         |African-American TPR Values    | Caucasian TPR Values      |
|----------| ----------|----------|
|Baseline Model| 0.698944 | 0.429658   |
| TPRParity Model| 0.565141 | 0.517110  |

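We can see that the TPR values of the two groups are much closer in the TPRParity model than in the baseline model: the difference between them falls from 0.269286 to 0.048031.

The largest TPR gap across all groups (which equals the FNR gap, since TPR = 1 - FNR) has also come down, from 0.698944 in the baseline model to 0.127451 in the TPRParity model.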
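The printed `gap_fnr` can be read as the TPR gap because TPR = 1 - FNR. A short derivation, writing $g$ for a racial group (notation introduced here for illustration):

$$
\begin{aligned}
\max_g \mathrm{TPR}_g - \min_g \mathrm{TPR}_g
&= \max_g \left(1 - \mathrm{FNR}_g\right) - \min_g \left(1 - \mathrm{FNR}_g\right) \\
&= \left(1 - \min_g \mathrm{FNR}_g\right) - \left(1 - \max_g \mathrm{FNR}_g\right) \\
&= \max_g \mathrm{FNR}_g - \min_g \mathrm{FNR}_g
\end{aligned}
$$

For the baseline model this gives 1.000000 - 0.301056 = 0.698944, and for the TPRParity model 0.500000 - 0.372549 = 0.127451, matching the `gap_fnr` values printed in the two tables.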

In [30]:
m_fprmodel, gs_fprmodel = compute_metrics(y_te, mitigation_FPR_yhat, A_te)
print_metrics(m_fprmodel, gs_fprmodel, "FPR Parity")


                  accuracy  selection_rate  false_positive_rate  false_negative_rate  true_positive_rate
overall           0.660970        0.416628             0.277544             0.413934            0.586066
African-American  0.654804        0.453737             0.296763             0.392606            0.607394
Asian             0.750000        0.250000             0.000000             0.500000            0.500000
Caucasian         0.659544        0.401709             0.293850             0.418251            0.581749
Hispanic          0.661692        0.328358             0.208696             0.511628            0.488372
Native American   0.714286        0.285714             0.000000             0.500000            0.500000
Other             0.715447        0.325203             0.166667             0.450980            0.549020


           gap_selection_rate   gap_fpr   gap_fnr
gaps_base            0.203737  0.296763  0.119022


I used the False Positive Rate as my fairness metric for Scenario B.

|     |African-American FPR Values     | Caucasian FPR Values     |
|----------| ----------|----------|
|Baseline Model| 0.368705 | 0.177677   |
| FPRParity Model| 0.296763  | 0.293850 |

With FalsePositiveRateParity as the constraint, the false positive rates for African-Americans and Caucasians are far more even: the difference between the two groups falls from 0.191028 in the baseline model to 0.002913 in the mitigated model. This is a good improvement over the baseline. The largest FPR gap across all groups also improves, from 0.368705 in the baseline model to 0.296763 in the mitigated model, although it remains sizeable because two groups (Asian and Native American) have an FPR of 0.
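The difference between a two-group comparison and the largest gap across all groups can be made explicit with the FPR values from the baseline and FPR Parity tables. The helper names `pairwise_gap` and `overall_gap` are introduced here for illustration.

```python
# FPR per group, copied from the baseline and FPR Parity tables above.
fpr_baseline = {
    "African-American": 0.368705, "Asian": 0.000000, "Caucasian": 0.177677,
    "Hispanic": 0.121739, "Native American": 0.000000, "Other": 0.055556,
}
fpr_mitigated = {
    "African-American": 0.296763, "Asian": 0.000000, "Caucasian": 0.293850,
    "Hispanic": 0.208696, "Native American": 0.000000, "Other": 0.166667,
}

def pairwise_gap(rates, group_a, group_b):
    """Absolute difference in a rate between two named groups."""
    return abs(rates[group_a] - rates[group_b])

def overall_gap(rates):
    """Largest minus smallest rate across all groups."""
    return max(rates.values()) - min(rates.values())

pair_before = pairwise_gap(fpr_baseline, "African-American", "Caucasian")
pair_after = pairwise_gap(fpr_mitigated, "African-American", "Caucasian")
print(f"AA vs Caucasian FPR gap: {pair_before:.6f} -> {pair_after:.6f}")
print(f"Largest FPR gap:         {overall_gap(fpr_baseline):.6f} -> {overall_gap(fpr_mitigated):.6f}")
```

The two-group gap nearly vanishes, while the largest gap stays close to the African-American FPR itself because the minimum across groups is 0 in both models.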

**Conclusion:** Both mitigation methods substantially reduced the disparities in TPR and FPR values across groups observed in the baseline model.

###**PART 4: Analysis & Recommendation**

In [32]:
rows = [{
    "model": "Baseline",
    "overall_accuracy": mf_base.overall["accuracy"],
    "gap_selection_rate": gaps_base["gap_selection_rate"],
    "gap_fpr": gaps_base["gap_fpr"],
    "gap_fnr": gaps_base["gap_fnr"],
}]

mitigation_runs = [
    ("TruePositiveRateParity", m_tprmodel, gs_tprmodel),
    ("FalsePositiveRateParity", m_fprmodel, gs_fprmodel),
]

for name, mf_m, gaps_m in mitigation_runs:
    rows.append({
        "model": name,
        "overall_accuracy": mf_m.overall["accuracy"],
        "gap_selection_rate": gaps_m["gap_selection_rate"],
        "gap_fpr": gaps_m["gap_fpr"],
        "gap_fnr": gaps_m["gap_fnr"],
    })

compare_df = pd.DataFrame(rows)

print("\n=== Comparison (Baseline vs two mitigations) ===")
print(compare_df.to_string(index=False))


=== Comparison (Baseline vs two mitigations) ===
                  model  overall_accuracy  gap_selection_rate  gap_fpr  gap_fnr
               Baseline          0.665589            0.535587 0.368705 0.698944
 TruePositiveRateParity          0.664203            0.120869 0.260870 0.127451
FalsePositiveRateParity          0.660970            0.203737 0.296763 0.119022


(a) **The Comparison**

As the table above shows, the gaps in selection rate, FPR, and FNR are considerably smaller in the mitigated models than in the baseline model. We also observe that the accuracy does not change drastically across the three models.

(b) **The Trade-off**

From the metrics, the disparities in FPR and TPR have come down considerably in the models trained with a fairness constraint when compared to the baseline model.

For example, the largest FPR gap between groups in the baseline model was 0.368705. With the FalsePositiveRateParity mitigation, this value comes down to 0.296763.

Between the baseline model and the mitigated models, there is not much difference in accuracy. The baseline model has an accuracy of 0.665589, while the model with the TruePositiveRateParity mitigation has an accuracy of 0.664203 (a drop of 0.0014).
Similarly, the model with the FalsePositiveRateParity mitigation has an accuracy of 0.660970 (a drop of 0.0046).
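The accuracy cost of each mitigation can be read directly off the comparison table. A small sketch of that arithmetic, with variable names chosen here for illustration:

```python
# Overall accuracies copied from the comparison table above.
accuracy = {
    "Baseline": 0.665589,
    "TruePositiveRateParity": 0.664203,
    "FalsePositiveRateParity": 0.660970,
}

# Change relative to the baseline; a negative value is a drop in accuracy
delta = {
    name: value - accuracy["Baseline"]
    for name, value in accuracy.items()
    if name != "Baseline"
}

for name, change in delta.items():
    print(f"{name}: {change:+.6f}")
```

Both changes are negative, so each mitigated model gives up a small amount of accuracy relative to the baseline.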

(c) **The Recommendation**

Based on the metrics, for Scenario A, I would recommend the model mitigated with the True Positive Rate Parity fairness constraint. This is because we need to focus on identifying the right individuals across all groups who require the specialized help. Furthermore, with this fairness constraint we observe a negligible change in the model's accuracy, and the overall selection rate falls slightly (from 0.394457 to 0.381062), which suggests that we are not overpredicting and need not worry about a dramatic drop in accuracy.


For Scenario B, I recommend the model mitigated with the False Positive Rate Parity fairness constraint. This is because we need to ensure that we are not wasting our resources on individuals who may not be at high risk. Otherwise, one particular racial group may have a lot of false positives, and individuals from that group may receive these continuous check-ins even if they are not actually at high risk. By equalizing the FPR across groups, we ensure that this burden does not fall disproportionately on people from one racial background. The metrics table shows that mitigation with this constraint evens out the FPR between the African-American and Caucasian groups.

(d) **Limitations & Next Steps**

 (1) One limitation I find is based on how ethnicity is defined in this dataset. Ethnicity is increasingly becoming a subjective term, as individuals often belong to multiple ethnicities or racial origins. A simple change to address this is to stop creating rigid categories into which we place people and adopt a more fluid format, allowing individuals to identify with multiple racial groups.

 (2) Another limitation is the use of "ReoffendedWithinTwoYears" as an accurate measure of a high-risk individual. Let us consider two individuals. One individual is charged with theft of an item from a store and has reoffended with the same charge within a two-year period. The second individual is accused of a much more serious crime that results in the loss of a life, and they have reoffended within 3 or 4 years. In this case, the model might consider the second individual to be lower-risk than the first. To address this, the target variable needs to be a function of the degree of the charge and whether the individual has reoffended in the past 'x' number of years. This could create a much more robust prediction target.
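As a rough sketch of the alternative target described in (2), the label below uses a look-back window that depends on the charge degree. Everything here is hypothetical: the function `high_risk_label`, the degree names, and the window lengths are placeholders for illustration and are not taken from the dataset or from any policy.

```python
from typing import Optional

# Hypothetical look-back windows in years per charge degree (placeholder values).
WINDOW_YEARS = {"misdemeanor": 2.0, "felony": 5.0}

def high_risk_label(charge_degree: str, years_to_reoffence: Optional[float]) -> int:
    """Return 1 if the person reoffended within the window for their charge degree."""
    if years_to_reoffence is None:  # no recorded reoffence
        return 0
    return int(years_to_reoffence <= WINDOW_YEARS[charge_degree])

# The two individuals from the example in (2), with assumed timings.
shoplifting_case = high_risk_label("misdemeanor", 1.5)  # reoffended within two years
serious_case = high_risk_label("felony", 3.5)           # reoffended after 3 to 4 years
print(shoplifting_case, serious_case)
```

Under a fixed two-year window the second individual would be labeled 0; with a degree-dependent window both are labeled 1. Choosing the actual windows would need domain input.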

