# Introduction

TODO: Explain what bias and fairness are


This notebook focuses on implementing and exploring metrics for comparing bias in toxic comment classification models. The validation metrics follow the work of [Borkan et al., 2019](https://arxiv.org/abs/1903.04561), including:

**AUC based metrics**:
- Subgroup AUC
- Background Positive Subgroup Negative (BPSN) AUC
- Background Negative Subgroup Positive (BNSP) AUC

**Average Equality Gap**:
- Positive AEG
- Negative AEG

An explanation of each metric is given below.

### Imports

In [1]:
import base64
import io
import os
import re

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.stats as stats
import seaborn as sns
from sklearn import metrics

### Defines

In [None]:
SUBGROUP_AUC = 'subgroup_auc'
NEGATIVE_CROSS_AUC = 'bpsn_auc'
POSITIVE_CROSS_AUC = 'bnsp_auc'
NEGATIVE_AEG = 'negative_aeg'
POSITIVE_AEG = 'positive_aeg'

SUBGROUP_SIZE = 'subgroup_size'
SUBGROUP = 'subgroup'

METRICS = [
    SUBGROUP_AUC, NEGATIVE_CROSS_AUC, POSITIVE_CROSS_AUC, NEGATIVE_AEG,
    POSITIVE_AEG
]
AUCS = [SUBGROUP_AUC, NEGATIVE_CROSS_AUC, POSITIVE_CROSS_AUC]
AEGS = [NEGATIVE_AEG, POSITIVE_AEG]

# Evaluation Metrics

## AUC-Based Metrics

These three metrics are based on the Area Under the Receiver Operating Characteristic Curve (ROC-AUC, or AUC) metric. For any classifier, AUC measures the probability that a randomly chosen negative example will receive a lower score than a randomly chosen positive example. An AUC of 1.0 means that every negative/positive pair is correctly ordered, with all negative examples receiving lower scores than all positive examples.

A core benefit of AUC is that it is **threshold agnostic**. An AUC of 1.0 also means that it is possible to select a threshold that perfectly separates negative from positive examples.
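
The pairwise reading of AUC described above can be checked on a small example. The sketch below uses hypothetical toy labels and scores (not data from this notebook) and a helper named `pairwise_auc`, introduced here only for illustration; it counts correctly ordered negative/positive pairs and compares the result with scikit-learn's `roc_auc_score`.

```python
import numpy as np
from sklearn import metrics

# Hypothetical toy data: 1 = positive (toxic), 0 = negative (non-toxic).
y_true = np.array([0, 0, 0, 1, 1, 1])
y_score = np.array([0.1, 0.4, 0.7, 0.35, 0.8, 0.9])


def pairwise_auc(y_true, y_score):
    """Fraction of (negative, positive) pairs ranked correctly; ties count 0.5."""
    neg = y_score[y_true == 0]
    pos = y_score[y_true == 1]
    # diff[i, j] > 0 means positive i outscored negative j.
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


auc_pairs = pairwise_auc(y_true, y_score)
auc_sklearn = metrics.roc_auc_score(y_true, y_score)
print(auc_pairs, auc_sklearn)
```

Here 7 of the 9 negative/positive pairs are correctly ordered, so both values equal 7/9. No threshold appears anywhere in the computation, which is what "threshold agnostic" refers to.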

Here, we compute the metrics by splitting the test data into a subgroup $D_{g}$ and the rest of the data $D$, which is called the **"background"** data, and comparing how the model scores the two.

*New terms:*\
Subgroup data $D_g$: the subset of the full data containing the examples of subgroup $g$. \
Background data $D$: the set of all examples that do not belong to subgroup $g$ ($D \cap D_g = \emptyset$).

As an example, consider the following hypothetical score distributions for the *background data* (top) and an *identity subgroup* (bottom), each divided into negative examples (green) and positive examples (purple).

<img src="../images/large_score_shift_right.png">

We can clearly see that examples within the identity subgroup receive higher scores, for both positive and negative examples. This score shift is one way that unintended bias can manifest in a model. Many types of unintended bias can be uncovered by looking at differences in the score distribution between the background data and the data from a specific identity subgroup. The following three AUC-based metrics specifically measure variations in the distributions that cause misordering between negative and positive examples.

*(Optional reading)*\
*DEFINITION: Let $D^-$ be the negative examples in the background set, $D^+$ be the positive examples in the background set, $D_{g}^-$ be the negative examples in the identity subgroup, and $D_{g}^+$ be the positive examples in the identity subgroup.*

$$\begin{aligned}
\textrm{Subgroup AUC} = \textrm{AUC}(D_{g}^- + D_{g}^+), \\
\textrm{BPSN AUC} = \textrm{AUC}(D^+ + D_{g}^-), \\
\textrm{BNSP AUC} = \textrm{AUC}(D^- + D_{g}^+).
\end{aligned}
$$


**Why not use the normal ROC-AUC?**\
In the previous example, both AUC($D_g$) and AUC($D$) are close to 1.0, but AUC($D_g \cup D$) is not, because the subgroup negative examples overlap with the background positive examples. So why not simply use the AUC of the full data? Because the overall ROC-AUC does not isolate unintended bias: even though the full-data AUC is degraded in this example, in many other cases a low AUC may just reflect poor classification performance in general.
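
To make the score-shift example concrete, here is a minimal sketch on synthetic scores. The Gaussian distributions, their means, and the helper `auc` are assumptions chosen only to mimic a subgroup shifted to the right; they are not taken from the notebook's data.

```python
import numpy as np
from sklearn import metrics

rng = np.random.default_rng(0)
n = 1000

# Hypothetical score distributions; the subgroup is shifted to the right.
bg_neg = rng.normal(0.20, 0.05, n)  # D^-
bg_pos = rng.normal(0.60, 0.05, n)  # D^+
sg_neg = rng.normal(0.55, 0.05, n)  # D_g^-
sg_pos = rng.normal(0.90, 0.05, n)  # D_g^+


def auc(neg_scores, pos_scores):
    """ROC-AUC for a set of negative scores against a set of positive scores."""
    y_true = np.concatenate([np.zeros(len(neg_scores)), np.ones(len(pos_scores))])
    y_score = np.concatenate([neg_scores, pos_scores])
    return metrics.roc_auc_score(y_true, y_score)


background_auc = auc(bg_neg, bg_pos)
subgroup_auc = auc(sg_neg, sg_pos)
bpsn_auc = auc(sg_neg, bg_pos)   # background positives vs. subgroup negatives
bnsp_auc = auc(bg_neg, sg_pos)   # background negatives vs. subgroup positives
overall_auc = auc(np.concatenate([bg_neg, sg_neg]),
                  np.concatenate([bg_pos, sg_pos]))

for name, value in [('background', background_auc), ('subgroup', subgroup_auc),
                    ('bpsn', bpsn_auc), ('bnsp', bnsp_auc),
                    ('overall', overall_auc)]:
    print(f'{name:>10}: {value:.3f}')
```

Under these assumptions the background, subgroup, and BNSP AUCs stay near 1.0, while the BPSN AUC drops noticeably. The overall AUC also drops, but on its own it does not show which pairing of examples is responsible.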

### AUC

Uses the scikit-learn implementation of ROC-AUC and returns NaN if the score cannot be computed.

In [4]:
def compute_auc(y_true, y_pred) -> float:
    """Computes the area under the ROC curve (AUC) for the given true and predicted labels.
    
    Parameters
    ----------
        y_true: array-like of shape (n_samples, ) - True binary labels.
        y_pred: array-like of shape (n_samples, ) - Target scores.

    Returns
    -------
        auc: float - The AUC score
    """
    try:
        return metrics.roc_auc_score(y_true, y_pred)
    except ValueError:
        # roc_auc_score may raise ValueError, e.g. when y_true has a single class.
        return np.nan

### Subgroup AUC

Calculates the AUC using only the examples from the subgroup. This represents the model's understanding and separability within the subgroup itself.

<img src="../images/subgroup_auc.png">

In [None]:
def compute_subgroup_auc(df: pd.DataFrame, subgroup: str, label: str, pred_col: str) -> float:
    """Computes the AUC for a specific subgroup within the dataset.
    The dataframe must have the predicted scores and true labels for the subgroup.

    Parameters
    ----------
        df: pd.DataFrame - The DataFrame containing the data.
        subgroup: str - The name of the subgroup column to filter on.
        label: str - The name of the true label column.
        pred_col: str - The name of the predicted scores column.

    Returns
    -------
        auc: float - The AUC score for the specified subgroup.
    """
    # Filters the DataFrame to include only the examples of the given subgroup
    subgroup_examples = df[df[subgroup]]
    # Computes the AUC for the subgroup
    return compute_auc(subgroup_examples[label], subgroup_examples[pred_col])

### Background Positive Subgroup Negative (BPSN) AUC

Calculates AUC on the positive examples from the background and the negative examples from the subgroup. This value would be reduced when scores for negative examples in the subgroup are higher than scores for other positive examples.

<img src="../images/bpsn_auc.png">

In [None]:
def compute_bpsn_auc(df: pd.DataFrame, subgroup: str, label: str, pred_col: str) -> float:
    """Computes the AUC of the background positive examples and the within-subgroup negative examples.
    
    Parameters
    ----------
        df: pd.DataFrame - The DataFrame containing the data.
        subgroup: str - The name of the subgroup column to filter on.
        label: str - The name of the true label column.
        pred_col: str - The name of the predicted scores column.

    Returns
    -------
        bpsn_auc: float - The AUC score for the background positive examples and subgroup negative examples.
    """
    # Filters the DataFrame to include only the subgroup NEGATIVE examples...
    subgroup_negative_examples = df[df[subgroup] & ~df[label]]
    # And the background POSITIVE examples
    non_subgroup_positive_examples = df[~df[subgroup] & df[label]]
    # pd.concat expects a list (or other iterable) of DataFrames
    examples = pd.concat([subgroup_negative_examples, non_subgroup_positive_examples])
    return compute_auc(examples[label], examples[pred_col])

### Background Negative Subgroup Positive (BNSP) AUC

Calculates AUC on the negative examples from the background and the positive examples from the subgroup. This value would be reduced when scores for positive examples in the subgroup are lower than scores for other negative examples.

<img src="../images/bnsp_auc.png">

In [None]:
def compute_bnsp_auc(df: pd.DataFrame, subgroup: str, label: str, pred_col: str) -> float:
    """Computes the AUC of the subgroup positive examples and the background negative examples.
    
    Parameters
    ----------
        df: pd.DataFrame - The DataFrame containing the data.
        subgroup: str - The name of the subgroup column to filter on.
        label: str - The name of the true label column.
        pred_col: str - The name of the predicted scores column.

    Returns
    -------
        bnsp_auc: float - The AUC score for the background negative examples and subgroup positive examples.
    """
    # Filters the DataFrame to include only the subgroup POSITIVE examples...
    subgroup_positive_examples = df[df[subgroup] & df[label]]
    # And the background NEGATIVE examples
    non_subgroup_negative_examples = df[~df[subgroup] & ~df[label]]
    # pd.concat expects a list (or other iterable) of DataFrames
    examples = pd.concat([subgroup_positive_examples, non_subgroup_negative_examples])
    return compute_auc(examples[label], examples[pred_col])
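
The functions above select rows with expressions such as `df[subgroup] & ~df[label]`, which assumes that the subgroup and label columns hold booleans. The following sketch, on a hypothetical toy DataFrame with illustrative column names, casts 0/1 columns to `bool` and then computes the three AUCs with equivalent masks. The helper `auc_of` exists only for this example.

```python
import pandas as pd
from sklearn import metrics

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    'subgroup': [1, 1, 1, 0, 0, 0, 0, 0],
    'label':    [0, 0, 1, 1, 1, 0, 0, 1],
    'score':    [0.7, 0.4, 0.9, 0.6, 0.8, 0.1, 0.2, 0.5],
})
# Cast to bool so that `~` negates logically instead of bitwise.
df['subgroup'] = df['subgroup'].astype(bool)
df['label'] = df['label'].astype(bool)

in_group, positive = df['subgroup'], df['label']


def auc_of(mask):
    """ROC-AUC restricted to the rows selected by a boolean mask."""
    subset = df[mask]
    return metrics.roc_auc_score(subset['label'], subset['score'])


subgroup_auc = auc_of(in_group)
# Subgroup negatives together with background positives.
bpsn_auc = auc_of((in_group & ~positive) | (~in_group & positive))
# Subgroup positives together with background negatives.
bnsp_auc = auc_of((in_group & positive) | (~in_group & ~positive))
print(subgroup_auc, bpsn_auc, bnsp_auc)
```

In this toy data the subgroup negative scored 0.7 outranks two of the three background positives, so only 4 of the 6 BPSN pairs are correctly ordered, while the subgroup AUC and BNSP AUC are both 1.0.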

## Average Equality Gap (AEG)

These are two additional threshold-agnostic metrics, built from a generalization of the Equality Gap metric.

The Equality Gap is the difference between the true positive rate of the subgroup, $\textrm{TPR}(D_{g})$, and the true positive rate of the background, $\textrm{TPR}(D)$, at a specific threshold.
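
As a small illustration of the Equality Gap at a single threshold, the sketch below uses hypothetical toy scores and a helper `true_positive_rate` written only for this example. The sign convention (subgroup minus background) is an assumption chosen to follow the order in which the two rates are named above.

```python
import numpy as np


def true_positive_rate(y_true, y_score, threshold):
    """Share of positive examples scored at or above the threshold."""
    positives = y_score[y_true == 1]
    return np.mean(positives >= threshold)


# Hypothetical toy data: the subgroup receives higher scores overall.
bg_true = np.array([1, 1, 1, 1, 0, 0])
bg_score = np.array([0.9, 0.8, 0.6, 0.3, 0.2, 0.1])
sg_true = np.array([1, 1, 1, 1, 0, 0])
sg_score = np.array([0.95, 0.9, 0.85, 0.7, 0.6, 0.4])

threshold = 0.5
tpr_subgroup = true_positive_rate(sg_true, sg_score, threshold)
tpr_background = true_positive_rate(bg_true, bg_score, threshold)
# Assumed sign convention: subgroup minus background.
equality_gap = tpr_subgroup - tpr_background
print(tpr_subgroup, tpr_background, equality_gap)
```

At a threshold of 0.5 the subgroup TPR is 1.0 and the background TPR is 0.75, giving a gap of 0.25. A different threshold would give a different gap, which is the dependence the threshold-agnostic AEG metrics are meant to remove.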

### Mann-Whitney U metric (auxiliary)

In [None]:
def normalized_mwu(data1: pd.DataFrame, data2: pd.DataFrame, pred_col: str) -> float:
  """Calculates the fraction of (data1, data2) pairs where the data1 score is higher."""
  scores_1 = data1[pred_col]
  scores_2 = data2[pred_col]
  n1 = len(scores_1)
  n2 = len(scores_2)
  if n1 == 0 or n2 == 0:
    return None
  u, _ = stats.mannwhitneyu(scores_1, scores_2, alternative='less')
  return u / (n1 * n2)
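
The normalization above can be read as a pairwise fraction. The sketch below, on hypothetical scores without ties, compares the normalized U statistic with a direct count of pairs. It assumes that `stats.mannwhitneyu` returns the U statistic of the first sample, which is the behavior of the SciPy releases this was checked against.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical scores, e.g. background (scores_1) and subgroup (scores_2).
scores_1 = np.array([0.3, 0.5, 0.8])
scores_2 = np.array([0.1, 0.4, 0.6, 0.7])

u, _ = stats.mannwhitneyu(scores_1, scores_2, alternative='less')
normalized = u / (len(scores_1) * len(scores_2))

# Direct count: fraction of pairs in which the first sample scores higher.
pair_fraction = np.mean(scores_1[:, None] > scores_2[None, :])

# Same form as the AEG functions below: 0.5 minus the normalized statistic.
gap = 0.5 - normalized
print(normalized, pair_fraction, gap)
```

In 7 of the 12 pairs the first sample scores higher, so the normalized value is 7/12 and `0.5 - normalized` is slightly negative. A value of 0 would mean neither group tends to receive higher scores than the other.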

### Negative AEG

In [None]:
def compute_negative_aeg(df: pd.DataFrame, subgroup: str, label: str, pred_col: str) -> float:
  mwu = normalized_mwu(df[~df[subgroup] & ~df[label]],
                       df[df[subgroup] & ~df[label]], pred_col)
  if mwu is None:
    return None
  return 0.5 - mwu

### Positive AEG

In [None]:
def compute_positive_aeg(df: pd.DataFrame, subgroup: str, label: str, pred_col: str) -> float:
  mwu = normalized_mwu(df[~df[subgroup] & df[label]],
                       df[df[subgroup] & df[label]], pred_col)
  if mwu is None:
    return None
  return 0.5 - mwu

# Putting it All Together

### Compute Subgroup Bias

In [None]:
def compute_bias_metrics_for_subgroup_and_model(dataset: pd.DataFrame,
                                                subgroup: str,
                                                pred_col: str,
                                                label_col: str) -> dict:
  """Computes per-subgroup metrics for one model and subgroup."""
  record = {
      SUBGROUP: subgroup,
      SUBGROUP_SIZE: len(dataset[dataset[subgroup]])
  }
  # Each metric is stored under its own name; pred_col holds the model's scores.
  record[SUBGROUP_AUC] = compute_subgroup_auc(
      dataset, subgroup, label_col, pred_col)
  record[NEGATIVE_CROSS_AUC] = compute_bpsn_auc(
      dataset, subgroup, label_col, pred_col)
  record[POSITIVE_CROSS_AUC] = compute_bnsp_auc(
      dataset, subgroup, label_col, pred_col)
  record[NEGATIVE_AEG] = compute_negative_aeg(
      dataset, subgroup, label_col, pred_col)
  record[POSITIVE_AEG] = compute_positive_aeg(
      dataset, subgroup, label_col, pred_col)

  return record

### Compute Model Unintended Bias

In [None]:
def compute_bias_metrics_for_model(dataset: pd.DataFrame,
                                   subgroups: list[str],
                                   pred_col: str,
                                   label_col: str) -> pd.DataFrame:
  """Computes per-subgroup metrics for all subgroups and one model."""
  records = []
  for subgroup in subgroups:
    subgroup_record = compute_bias_metrics_for_subgroup_and_model(
        dataset, subgroup, pred_col, label_col)
    # records is a plain list of dicts, so append rather than pd.concat
    records.append(subgroup_record)
    
  return pd.DataFrame(records)

# Visualization

### Confusion Matrix

In [None]:
def confusion_matrix_counts(df: pd.DataFrame, score_col: str, label_col: str, threshold: float) -> dict:
  return {
      'tp': len(df[(df[score_col] >= threshold) & df[label_col]]),
      'tn': len(df[(df[score_col] < threshold) & ~df[label_col]]),
      'fp': len(df[(df[score_col] >= threshold) & ~df[label_col]]),
      'fn': len(df[(df[score_col] < threshold) & df[label_col]]),
  }

def compute_confusion_rates(df, score_col, label_col, threshold):
  """Compute confusion rates."""
  confusion = confusion_matrix_counts(df, score_col, label_col, threshold)
  actual_positives = confusion['tp'] + confusion['fn']
  actual_negatives = confusion['tn'] + confusion['fp']
  # True positive rate, sensitivity, recall.
  tpr = confusion['tp'] / actual_positives
  # True negative rate, specificity.
  tnr = confusion['tn'] / actual_negatives
  # False positive rate, fall-out.
  fpr = 1 - tnr
  # False negative rate, miss rate.
  fnr = 1 - tpr
  # Precision, positive predictive value.
  precision = confusion['tp'] / (confusion['tp'] + confusion['fp'])
  return {
      'tpr': tpr,
      'tnr': tnr,
      'fpr': fpr,
      'fnr': fnr,
      'precision': precision,
      'recall': tpr,
  }

In [None]:
def per_subgroup_scatterplots(df,
                              subgroup_col,
                              values_col,
                              title='',
                              y_lim=(0.8, 1.0),
                              figsize=(15, 5),
                              point_size=8,
                              file_name='plot'):
  """Displays a series of one-dimensional scatterplots, 1 scatterplot per subgroup.

  Args:
    df: DataFrame containing subgroup_col and values_col.
    subgroup_col: Column containing subgroups.
    values_col: Column containing collection of values to plot (each cell
      should contain a sequence of values, e.g. the AUCs for multiple models
      from the same family).
    title: Plot title.
    y_lim: Plot bounds for y axis.
    figsize: Plot figure size.
    point_size: Marker size passed to the scatter call.
    file_name: Not used by the code shown here.
  """
  fig = plt.figure(figsize=figsize)
  ax = fig.add_subplot(111)
  for i, (_, row) in enumerate(df.iterrows()):
    # For each subgroup, we plot a 1D scatterplot. The x-value is the position
    # of the item in the dataframe. To change the ordering of the subgroups,
    # sort the dataframe before passing to this function.
    x = [i] * len(row[values_col])
    y = row[values_col]
    ax.scatter(x, y, s=point_size)
  # Set tick positions first so that the labels line up with the subgroups.
  ax.set_xticks(list(range(len(df))))
  ax.set_xticklabels(df[subgroup_col], rotation=90)
  ax.set_ylim(y_lim)
  ax.set_title(title)
  fig.tight_layout()