# Evaluating Fairness of AI Models in Radiology

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/Sulam-Group/AI-Deep-Learning-Lab-2023/blob/bbjt-nb-fairness_interpretability/sessions/ai-fairness/fairness.ipynb)

---

**Before we start**

1. Change the Colab runtime to GPU.
2. Add a shortcut to the shared Google Drive folder: [https://drive.google.com/drive/folders/1p90aGBS8vIX54x9ytaW8h-vk4NHXDhpR?usp=sharing](https://drive.google.com/drive/folders/1p90aGBS8vIX54x9ytaW8h-vk4NHXDhpR?usp=sharing)

In [None]:
from google.colab import drive
drive.mount("/content/drive", force_remount=True)

import os
import sys
import torch
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
from torch.utils.data import DataLoader
from tqdm import tqdm

LAB_PATH = os.path.join("drive/MyDrive/RSNA2023-FAIRNESS-LAB")
sys.path.append(LAB_PATH)

!python -m pip install entmax
from utils import get_dataset, get_slice_predictor

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


Mounted at /content/drive
Collecting entmax
  Downloading entmax-1.1-py3-none-any.whl (12 kB)
Installing collected packages: entmax
Successfully installed entmax-1.1


In [None]:
model = get_slice_predictor(device)

dataset = get_dataset()
dataloader = DataLoader(dataset, batch_size=1, shuffle=False)

predictions = []
for i, data in enumerate(tqdm(dataloader)):
    series, target, labels, bio = data

    series = series.squeeze()
    labels = labels.squeeze()

    series = series.to(device)

    with torch.no_grad():
        logits = model(series)

    for logit, label in zip(logits, labels):
        predictions.append(
            {
                "logit": logit.item(),
                "label": label.item(),
                "bio": bio.item(),
            }
        )

data = pd.DataFrame(predictions)
data.to_csv(os.path.join("data.csv"), index=False)

Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
100%|██████████| 44.7M/44.7M [00:00<00:00, 70.4MB/s]
100%|██████████| 75/75 [01:28<00:00,  1.17s/it]


In [None]:
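# Necessary functions
def metrics(y, y_hat):
  """Return (FPR, TPR) for binary labels y and binary predictions y_hat."""
  # labels=[0, 1] keeps the matrix 2x2 even if one class is absent
  tn, fp, fn, tp = confusion_matrix(y, y_hat, labels=[0, 1]).ravel()
  tpr = tp/(tp+fn)  # true positive rate
  fpr = fp/(tn+fp)  # false positive rate
  return (fpr, tpr)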
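
To make the two rates concrete, here is a minimal sketch that computes them by hand on a toy example. The helper `rates_from_counts` and the `_toy` arrays are made up for illustration and are not part of the lab's `utils`; the helper should agree with `metrics` whenever both classes are present.

```python
import numpy as np

def rates_from_counts(y, y_hat):
    """Return (fpr, tpr) from the four confusion counts (illustrative helper)."""
    y = np.asarray(y).astype(bool)
    y_hat = np.asarray(y_hat).astype(bool)
    tp = np.sum(y & y_hat)      # positives predicted positive
    fn = np.sum(y & ~y_hat)     # positives predicted negative
    fp = np.sum(~y & y_hat)     # negatives predicted positive
    tn = np.sum(~y & ~y_hat)    # negatives predicted negative
    return fp / (fp + tn), tp / (tp + fn)

# Toy labels and predictions (made up): 4 positives, 4 negatives
y_toy = [1, 1, 1, 0, 0, 0, 0, 1]
y_hat_toy = [1, 0, 1, 0, 1, 0, 0, 1]

fpr_toy, tpr_toy = rates_from_counts(y_toy, y_hat_toy)
print('FPR = ', fpr_toy)
print('TPR = ', tpr_toy)
```

Here 3 of the 4 positives are predicted positive and 1 of the 4 negatives is predicted positive, so the TPR is 3/4 and the FPR is 1/4.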

## Part 1: Evaluating the model and generating predictions
In this section, we will
1. Write a function to plot the ROC curve of our model
2. Calculate an optimal threshold that turns our predicted probabilities into predictions $\hat{Y}$
___

1. The function `metrics` returns the `fpr` and `tpr` for predictions `y_hat` and labels `y`. Using it, write a function `ROC` that takes as input the predicted probabilities `y_prob` and labels `y` and returns a tuple `(thresholds, fprs, tprs)`, the quantities needed to plot the ROC curve.

In [None]:
# Function to calculate metrics to plot the ROC

def ROC(y_prob, y):
  ## YOUR CODE HERE ##
  return (thresholds, np.array(fprs), np.array(tprs))


In [None]:
#@title Solution

# Function to calculate metrics to plot the ROC

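def ROC(y_prob, y):
  tprs = []
  fprs = []
  # Thresholds span [0, 1], which assumes y_prob holds probabilities
  thresholds = np.linspace(0, 1, 1000)
  for t in thresholds:
    # Predict positive whenever the score reaches the threshold
    y_hat = (y_prob >= t).astype('float')
    fpr, tpr = metrics(y, y_hat)
    tprs.append(tpr)
    fprs.append(fpr)
  return (thresholds, np.array(fprs), np.array(tprs))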


2. Fill in the blanks below to plot the ROC curve

In [None]:
# Fill in the blanks to plot the ROC curve
y_prob = data['logit']
y = data['label']
thresholds, fprs, tprs = ROC(y_prob, y)
plt.figure(figsize=(4,4))
plt.plot(_, _)
plt.xlabel('_')
plt.ylabel('_')

In [None]:
#@title Solution

# Plot the ROC curve
y_prob = data['logit']
y = data['label']
thresholds, fprs, tprs = ROC(y_prob, y)
plt.figure(figsize=(4,4))
plt.plot(fprs, tprs)
plt.xlabel('FPR')
plt.ylabel('TPR')

3. To generate actual predictions `y_hat`, we need to choose a threshold for our predicted probabilities `y_prob`. For this tutorial, let's find the threshold that maximizes Youden's J-statistic, use it to generate predictions `y_hat`, and store them in a new column.

$$
J = TPR - FPR
$$

In [None]:
# Calculate the optimal threshold and generate predictions y_hat

# YOUR CODE TO CALCULATE THE OPTIMAL THRESHOLD
y_hat = (y_prob >= 'your calculated threshold will go here').astype('float')
data['predictions'] = y_hat

In [None]:
#@title Solution

# Calculate the optimal threshold and generate predictions y_hat
J = tprs - fprs
optimal_threshold = thresholds[np.argmax(J)]
y_hat = (y_prob >= optimal_threshold).astype('float')
data['predictions'] = y_hat
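
To see how maximizing $J$ picks a threshold, the following is a small sketch on made-up scores and labels. The `_toy` arrays and the coarse threshold grid are assumptions for illustration only; the lab itself uses the 1000-point grid from `ROC`.

```python
import numpy as np

# Toy labels and scores (made up): four negatives followed by four positives
y_toy = np.array([0, 0, 0, 0, 1, 1, 1, 1])
p_toy = np.array([0.1, 0.2, 0.4, 0.45, 0.3, 0.6, 0.8, 0.9])
thresholds_toy = np.array([0.0, 0.25, 0.5, 0.75, 1.0])

tprs_toy, fprs_toy = [], []
for t in thresholds_toy:
    y_hat_t = p_toy >= t
    tprs_toy.append(np.mean(y_hat_t[y_toy == 1]))  # fraction of positives flagged
    fprs_toy.append(np.mean(y_hat_t[y_toy == 0]))  # fraction of negatives flagged

# Youden's J at every threshold, then keep the threshold where it peaks
J_toy = np.array(tprs_toy) - np.array(fprs_toy)
best_t = thresholds_toy[np.argmax(J_toy)]
best_j = J_toy.max()
print('J per threshold = ', J_toy)
print('Optimal threshold = ', best_t)
print('J-statistic = ', best_j)
```

On this toy data, the threshold 0.5 flags three of the four positives and none of the negatives, which gives the largest $J$ on the grid.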

4. With the optimal threshold now calculated, print the `FPR`, `TPR`, and `J-statistic` of this model

In [None]:
# Calculate the FPR, TPR, and J-statistic of the current model
# YOUR CODE HERE

print('FPR = ', 'your answer here')
print('TPR = ', 'your answer here')
print('J-statistic = ', 'your answer here')

In [None]:
#@title Solution

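# Calculate the FPR, TPR, and J-statistic of the current model
best_idx = np.argmax(tprs - fprs)  # index of the threshold that maximizes J
fpr = fprs[best_idx]
tpr = tprs[best_idx]
j_stat = tpr - fpr  # separate name so the array J is not overwritten
print('FPR = ', fpr)
print('TPR = ', tpr)
print('J-statistic = ', j_stat)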

## Part 2: Evaluating and controlling a model's fairness
In this section we will
1. Describe 2 common notions of fairness and learn how to evaluate them
2. Evaluate how controlling for fairness may impact a model's overall performance
___

**Definition 1: Equal Opportunity**

A predictor $\hat{Y}$ satisfies equal opportunity with respect to a binary sensitive attribute $A \in \{0,1\}$ if

$$
\mathbb{P}[\hat{Y} = 1 \mid A = 1, Y = 1] = \mathbb{P}[\hat{Y} = 1 \mid A = 0, Y = 1]
$$

We can measure to what extent $\hat{Y}$ satisfies this condition with the following metric, the gap in true positive rates between the two groups

$$
\Delta_{TPR} = |\mathbb{P}[\hat{Y} = 1 \mid A = 1, Y = 1] - \mathbb{P}[\hat{Y} = 1 \mid A = 0, Y = 1]|
$$


**Definition 2: Predictive Equality**

A predictor $\hat{Y}$ satisfies predictive equality with respect to a binary sensitive attribute $A \in \{0,1\}$ if

$$
\mathbb{P}[\hat{Y} = 1 \mid A = 1, Y = 0] = \mathbb{P}[\hat{Y} = 1 \mid A = 0, Y = 0]
$$

We can measure to what extent $\hat{Y}$ satisfies this condition with the following metric, the gap in false positive rates between the two groups

$$
\Delta_{FPR} = |\mathbb{P}[\hat{Y} = 1 \mid A = 1, Y = 0] - \mathbb{P}[\hat{Y} = 1 \mid A = 0, Y = 0]|
$$
___




1. Let's start by calculating $\Delta_{TPR}$ and $\Delta_{FPR}$. Write some code to calculate these two quantities and print them below.

In [None]:
# Calculate \Delta_TPR
## YOUR CODE HERE (hint: isolate data to samples that have A = 1 and Y = 1)

print('Delta_TPR = ', 'your answer here')


# Calculate \Delta_FPR
## YOUR CODE HERE (hint: isolate data to samples that have A = 1 and Y = 0)

print('Delta_FPR = ', 'your answer here')

In [None]:
#@title Solution
# Calculate \Delta_TPR
data_A1Y1 = data.loc[(data['bio'] == 1) & (data['label'] == 1)]
data_A0Y1 = data.loc[(data['bio'] == 0) & (data['label'] == 1)]
p_a1y1 = sum(data_A1Y1['predictions'])/len(data_A1Y1)
p_a0y1 = sum(data_A0Y1['predictions'])/len(data_A0Y1)
print('Delta_TPR = ', abs(p_a1y1 - p_a0y1))

# Calculate \Delta_FPR
data_A1Y0 = data.loc[(data['bio'] == 1) & (data['label'] == 0)]
data_A0Y0 = data.loc[(data['bio'] == 0) & (data['label'] == 0)]
p_a1y0 = sum(data_A1Y0['predictions'])/len(data_A1Y0)
p_a0y0 = sum(data_A0Y0['predictions'])/len(data_A0Y0)
print('Delta_FPR = ', abs(p_a1y0 - p_a0y0))
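
Both gaps follow the same pattern: restrict to one value of the label, then compare the rate of positive predictions between the two groups. The sketch below wraps that pattern in a single function and checks it on toy arrays. The helper `rate_gap` and the `_toy` data are hypothetical and only meant to illustrate the computation.

```python
import numpy as np

def rate_gap(y, y_hat, a, y_value):
    """Gap between groups A = 1 and A = 0 in P[Y_hat = 1 | A, Y = y_value].

    y_value = 1 gives the TPR gap, y_value = 0 gives the FPR gap.
    Illustrative helper, not part of the lab utilities.
    """
    y, y_hat, a = np.asarray(y), np.asarray(y_hat), np.asarray(a)
    rate_a1 = y_hat[(a == 1) & (y == y_value)].mean()
    rate_a0 = y_hat[(a == 0) & (y == y_value)].mean()
    return abs(rate_a1 - rate_a0)

# Toy sensitive attribute, labels, and predictions (made up)
a_toy     = [1, 1, 1, 1, 0, 0, 0, 0]
y_toy     = [1, 1, 0, 0, 1, 1, 0, 0]
y_hat_toy = [1, 1, 0, 1, 1, 0, 0, 0]

delta_tpr_toy = rate_gap(y_toy, y_hat_toy, a_toy, y_value=1)
delta_fpr_toy = rate_gap(y_toy, y_hat_toy, a_toy, y_value=0)
print('Delta_TPR = ', delta_tpr_toy)
print('Delta_FPR = ', delta_fpr_toy)
```

With the lab's DataFrame, the same call would presumably look like `rate_gap(data['label'], data['predictions'], data['bio'], 1)`, assuming `bio` encodes the sensitive attribute as 0 and 1.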

2. A value of $\Delta_{TPR}$ greater than 0 means the predictor violates equal opportunity, and a value of $\Delta_{FPR}$ greater than 0 means it violates predictive equality. The cell below plots the FPR and TPR of each group.

In [None]:
# Plot
plt.figure(figsize=(4,4))
plt.plot([0, p_a1y0, 1], [0, p_a1y1, 1])
plt.plot([0, p_a0y0, 1], [0, p_a0y1, 1])
plt.xlim([0,1.01])
plt.ylim([0,1.01])
plt.xlabel('FPR')
plt.ylabel('TPR')
plt.legend(['A = 1', 'A = 0'])

3. Based on the plot above, is there a coordinate where `A = 1` and `A = 0` have the same `TPR` and `FPR`? What is that point? Calculate the `FPR`, `TPR` and `J-statistic` of that coordinate and print them below.

In [None]:
# Calculate the corrected FPR, TPR, and J-statistic
# YOUR CODE HERE

print('Corrected FPR = ', 'your answer here')
print('Corrected TPR = ', 'your answer here')
print('Corrected J-statistic = ', 'your answer here')

In [None]:
#@title Solution

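# Calculate corrected FPR and TPR
# Slope of the segment from (0, 0) to the A = 0 point (p_a0y0, p_a0y1)
m0 = p_a0y1/p_a0y0
# Slope of the segment from the A = 1 point (p_a1y0, p_a1y1) to (1, 1)
m1 = (1-p_a1y1)/(1-p_a1y0)
# Intersection of TPR = m0*FPR and TPR = 1 - m1*(1 - FPR);
# this assumes the two curves cross on these two segments
new_FPR = (1 - m1)/(m0-m1)
new_TPR = m0*new_FPR
new_J = new_TPR - new_FPR
print('Corrected FPR = ', new_FPR)
print('Corrected TPR = ', new_TPR)
print('Corrected J-statistic = ', new_J)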
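
The solution above intersects two straight lines from the plot: the line through $(0,0)$ and the `A = 0` point, $TPR = m_0 \cdot FPR$, and the line through the `A = 1` point and $(1,1)$, $TPR = 1 - m_1 (1 - FPR)$. Setting them equal gives $FPR = (1 - m_1)/(m_0 - m_1)$. The sketch below replays this on made-up per-group rates and also checks that the crossing falls on the plotted segments, which the formula alone does not guarantee. The helper `intersect_segments` and the numbers are hypothetical.

```python
def intersect_segments(fpr0, tpr0, fpr1, tpr1):
    """Intersect the line through (0, 0) and (fpr0, tpr0) with the line
    through (fpr1, tpr1) and (1, 1). Illustrative helper only."""
    m0 = tpr0 / fpr0                # slope of the lower segment of group A = 0
    m1 = (1 - tpr1) / (1 - fpr1)    # slope of the upper segment of group A = 1
    fpr = (1 - m1) / (m0 - m1)      # solve m0*fpr = 1 - m1*(1 - fpr)
    tpr = m0 * fpr
    # The crossing is on both plotted segments only if it lies
    # between the two groups' FPR values
    on_segments = fpr1 <= fpr <= fpr0
    return fpr, tpr, on_segments

# Made-up rates: group A = 0 at (0.4, 0.8), group A = 1 at (0.1, 0.5)
fpr_star, tpr_star, valid = intersect_segments(0.4, 0.8, 0.1, 0.5)
print('Corrected FPR = ', fpr_star)
print('Corrected TPR = ', tpr_star)
print('Corrected J-statistic = ', tpr_star - fpr_star)
print('Crossing lies on both segments: ', valid)
```

If the check fails for the real data, the curves may instead cross on the other pair of segments, and the roles of the two groups in the formula would need to be swapped.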