# Introduction

This notebook focuses on implementing and exploring metrics for comparing bias in toxic comment classification models. The validation metrics follow the work of [Borkan et al., 2019](https://arxiv.org/abs/1903.04561) and include:

**AUC based metrics**:
- Subgroup AUC
- Background Positive Subgroup Negative (BPSN) AUC
- Background Negative Subgroup Positive (BNSP) AUC

**Average Equality Gap**:
- Positive AEG
- Negative AEG

Each metric is explained below.

### Imports

In [None]:
import base64
import io
import os
import re

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.stats as stats
import seaborn as sns
from sklearn import metrics

### Defines

In [None]:
SUBGROUP_AUC = 'subgroup_auc'
NEGATIVE_CROSS_AUC = 'bpsn_auc'
POSITIVE_CROSS_AUC = 'bnsp_auc'
NEGATIVE_AEG = 'negative_aeg'
POSITIVE_AEG = 'positive_aeg'

SUBSET_SIZE = 'subset_size'
SUBGROUP = 'subgroup'

METRICS = [
    SUBGROUP_AUC, NEGATIVE_CROSS_AUC, POSITIVE_CROSS_AUC, NEGATIVE_AEG,
    POSITIVE_AEG
]
AUCS = [SUBGROUP_AUC, NEGATIVE_CROSS_AUC, POSITIVE_CROSS_AUC]
AEGS = [NEGATIVE_AEG, POSITIVE_AEG]

## AUC-Based Metrics

The following three metrics are based on the Area Under the Receiver Operating Characteristic Curve (ROC-AUC, or AUC). For any classifier, AUC measures the probability that a randomly chosen negative example receives a lower score than a randomly chosen positive example. An AUC of 1.0 means that every negative/positive pair is correctly ordered: all negative examples receive lower scores than all positive examples.
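
The pairwise reading of AUC can be checked directly. The sketch below uses made-up labels and scores (not data from this notebook) and counts the fraction of negative/positive pairs that are correctly ordered, counting ties as half, then compares the result with scikit-learn's `roc_auc_score`.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical labels and scores, made up purely for illustration.
y_true = np.array([0, 0, 0, 1, 1, 0, 1, 1])
y_score = np.array([0.10, 0.35, 0.60, 0.40, 0.80, 0.20, 0.55, 0.90])

neg = y_score[y_true == 0]
pos = y_score[y_true == 1]

# Compare every (positive, negative) pair; a tie counts as half a correct pair.
diff = pos[:, None] - neg[None, :]
pairwise_auc = ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

# Reference value from scikit-learn.
sklearn_auc = roc_auc_score(y_true, y_score)

print(pairwise_auc, sklearn_auc)
```

Both values should agree, since 14 of the 16 negative/positive pairs in this toy sample are correctly ordered.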

A core benefit of AUC is that it is **threshold agnostic**. An AUC of 1.0 also means that it is possible to select a threshold that perfectly separates negative from positive examples.
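
As a minimal illustration of the last point, the sketch below builds a hypothetical, perfectly separable set of scores and picks a threshold halfway between the highest negative score and the lowest positive score. The data and the midpoint rule are assumptions chosen for the example; any value in that gap would work equally well.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical, perfectly separable scores (illustrative only).
y_true = np.array([0, 0, 0, 1, 1, 1])
y_score = np.array([0.05, 0.20, 0.30, 0.45, 0.70, 0.95])

auc = roc_auc_score(y_true, y_score)

# With AUC = 1.0, every negative scores below every positive,
# so any threshold in the gap between the two groups separates them.
max_neg = y_score[y_true == 0].max()
min_pos = y_score[y_true == 1].min()
threshold = (max_neg + min_pos) / 2

y_pred = (y_score >= threshold).astype(int)
perfect = bool((y_pred == y_true).all())

print(auc, threshold, perfect)
```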

Here, we calculate the metrics by splitting the test data by subgroup and comparing each subgroup with the rest of the data, which is called the **"background"** data.

As an example, consider the following hypothetical score distributions:

<img src="auc_metrics.png" width="900" height="450">



### AUC

Uses the scikit-learn implementation of ROC-AUC.

In [None]:
def compute_auc(y_true, y_pred) -> float:
    """Computes the area under the ROC curve (AUC) for the given true labels and predicted scores.
    
    Parameters
    ----------
        y_true: array-like of shape (n_samples, ) - True binary labels.
        y_pred: array-like of shape (n_samples, ) - Target scores.

    Returns
    -------
        auc: float - The AUC score
    """
    try:
        return metrics.roc_auc_score(y_true, y_pred)
    except ValueError:
        # AUC is undefined when y_true contains only one class
        return np.nan
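
The `try`/`except` matters for small subgroups, where a slice of the data may contain only one class and AUC is undefined. The sketch below uses a hypothetical stand-in, `safe_auc`, that mirrors the behavior of `compute_auc`. It assumes that `roc_auc_score` either raises `ValueError` for single-class input or, in some scikit-learn releases, warns and returns NaN; both paths end in NaN here.

```python
import warnings

import numpy as np
from sklearn.metrics import roc_auc_score


def safe_auc(y_true, y_score) -> float:
    """Hypothetical stand-in for compute_auc: returns NaN when AUC is undefined."""
    try:
        with warnings.catch_warnings():
            # Assumption: some scikit-learn versions warn instead of raising.
            warnings.simplefilter("ignore")
            return float(roc_auc_score(y_true, y_score))
    except ValueError:
        return float("nan")


# Only negative labels: the AUC cannot be computed.
only_negatives = safe_auc([0, 0, 0], [0.1, 0.4, 0.8])

# Both classes present: the AUC is well defined.
both_classes = safe_auc([0, 1, 0, 1], [0.1, 0.9, 0.3, 0.7])

print(only_negatives, both_classes)
```

Returning NaN instead of raising lets the metrics for the remaining subgroups be computed even when one subgroup is degenerate.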

### Subgroup AUC

Calculates the AUC using only the examples from the subgroup. This measures how well the model understands and separates examples within the subgroup itself.

In [None]:
def compute_subgroup_auc(df: pd.DataFrame, subgroup: str, label: str, pred_column: str) -> float:
    """Computes the AUC for a specific subgroup within the dataset.
    The DataFrame must contain a boolean subgroup column, the true labels, and the predicted scores.

    Parameters
    ----------
        df: pd.DataFrame - The DataFrame containing the data.
        subgroup: str - The name of the subgroup column to filter on.
        label: str - The name of the true label column.
        pred_column: str - The name of the predicted scores column.

    Returns
    -------
        auc: float - The AUC score for the specified subgroup.
    """
    # Filters the DataFrame to include only examples from the subgroup (boolean mask)
    subgroup_examples = df[df[subgroup]]
    # Computes the AUC for the subgroup
    return compute_auc(subgroup_examples[label], subgroup_examples[pred_column])
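
To make the subgroup/background split concrete, the sketch below applies the same boolean-mask filtering used in `compute_subgroup_auc` to a toy DataFrame. The column names (`in_subgroup`, `toxic`, `score`) and all values are hypothetical, and `roc_auc_score` is called directly so the example runs on its own.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

# Hypothetical toy data; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "in_subgroup": [True, True, True, True, False, False, False, False],
    "toxic":       [0, 0, 1, 1, 0, 0, 1, 1],
    "score":       [0.2, 0.7, 0.6, 0.9, 0.1, 0.3, 0.8, 0.95],
})

# Boolean-mask filtering, as in compute_subgroup_auc.
subgroup_rows = df[df["in_subgroup"]]
background_rows = df[~df["in_subgroup"]]

subgroup_auc = roc_auc_score(subgroup_rows["toxic"], subgroup_rows["score"])
background_auc = roc_auc_score(background_rows["toxic"], background_rows["score"])

print(subgroup_auc, background_auc)
```

In this toy data, one non-toxic subgroup example (score 0.7) outranks a toxic subgroup example (score 0.6), so the subgroup AUC falls below the background AUC.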

### Background Positive Subgroup Negative (BPSN) AUC

Calculates the AUC on the positive examples from the background and the negative examples from the subgroup. This value is reduced when negative examples in the subgroup receive higher scores than positive examples in the background. A low BPSN AUC therefore suggests that the model is prone to false positives on the subgroup.