In [2]:
import tensorflow as tf
import tensorflow_datasets as tfds
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

## Prejudice

**Prejudice** means a statistical dependence between a sensitive variable,
$S$, and the target variable, $Y$, or a non-sensitive variable, $X$.

There are three types of prejudice: direct, indirect, and latent.

## Direct prejudice

Direct prejudice is the use of a sensitive variable in a prediction model.

To eliminate direct prejudice, we can remove the sensitive variable from the model.


## Indirect prejudice

Indirect prejudice is statistical dependence between a sensitive variable and a target variable.

To remove this indirect prejudice, we must use a prediction model that satisfies the condition $Y \perp\!\!\!\perp \ S$.

We can quantify the degree of indirect prejudice with the (indirect) prejudice index, $\text{PI}$, which is the mutual information between $Y$ and $S$ estimated over the data set $\cal{D}$:

$$\text{PI} = \sum_{(y, s) \in \cal{D}}  \hat{\text{Pr}}[y, s] \ln \frac{\hat{\text{Pr}}[y, s]}{\hat{\text{Pr}}[y]\hat{\text{Pr}}[s]}$$

Applying the normalization commonly used for mutual information gives the _normalized prejudice index_ (NPI):

$$\text{NPI} = \frac{\text{PI}}{\sqrt{\text{H}(Y)\text{H}(S)}}$$

where $\text{H}(\ \cdot\ )$ is the entropy function.
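
The two indices above can be estimated from paired samples of labels and sensitive values. The following is a minimal sketch, assuming that empirical frequencies of the (predicted) labels stand in for $\hat{\text{Pr}}$; the function name `prejudice_index` is hypothetical and not part of any library.

```python
import numpy as np

def prejudice_index(y, s):
    """Return (PI, NPI) estimated from paired samples of y and s."""
    y = np.asarray(y).ravel()
    s = np.asarray(s).ravel()
    y_vals, y_idx = np.unique(y, return_inverse=True)
    s_vals, s_idx = np.unique(s, return_inverse=True)

    # Empirical joint distribution Pr[y, s] as a contingency table
    joint = np.zeros((len(y_vals), len(s_vals)))
    np.add.at(joint, (y_idx, s_idx), 1.0)
    joint /= joint.sum()

    # Marginals Pr[y] and Pr[s]
    p_y = joint.sum(axis=1)
    p_s = joint.sum(axis=0)

    # PI: sum over cells with non-zero mass (0 * ln 0 is taken as 0)
    mask = joint > 0
    ratio = joint[mask] / np.outer(p_y, p_s)[mask]
    pi = float(np.sum(joint[mask] * np.log(ratio)))

    # Entropies H(Y) and H(S) used for the normalization
    h_y = -np.sum(p_y * np.log(p_y))
    h_s = -np.sum(p_s * np.log(p_s))
    npi = pi / float(np.sqrt(h_y * h_s)) if h_y > 0 and h_s > 0 else 0.0
    return pi, npi
```

When the label is fully determined by the sensitive variable, NPI reaches 1; when the two are independent, both indices are 0.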

## Latent prejudice
Latent prejudice is a statistical dependence between a sensitive variable, $S$, and a non-sensitive variable, $X$.

Removal of latent prejudice is achieved by making $X$ and $Y$ independent from $S$ simultaneously.



## Underestimation

Underestimation is the state in which a learned model has not fully converged because the training data set is finite.

A prediction model without indirect prejudice learns to make fair determinations only if the training data set is "infinitely large". In practice, training sets are finite and often small, so the model can output determinations that are even more unfair than those observed in the training sample distribution.

To quantify the degree of underestimation, we measure the difference between the training sample distribution over $\cal{D}$, $\tilde{\text{Pr}}$, and the distribution induced by the learned model, $\hat{\text{Pr}}$. The underestimation index (UEI) is the Hellinger distance between the two:

$$\text{UEI} = \sqrt{\frac{1}{2}\sum_{(y, s) \in \cal{D}} \left(\sqrt{\tilde{\text{Pr}}[y, s]} - \sqrt{\hat{\text{Pr}}[y, s]}\right)^2} = \sqrt{1 - \sum_{(y, s) \in \cal{D}} \sqrt{\hat{\text{Pr}}[y, s]\tilde{\text{Pr}}[y, s]}}$$
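
As a rough illustration of the UEI, the sketch below computes the Hellinger distance between two joint distributions over $(y, s)$ given as arrays of the same shape. The function name `underestimation_index` and the array layout are assumptions made for this example.

```python
import numpy as np

def underestimation_index(p_sample, p_model):
    """Hellinger distance between two joint distributions over (y, s)."""
    p_sample = np.asarray(p_sample, dtype=float)
    p_model = np.asarray(p_model, dtype=float)
    # Sum of sqrt(p * q) over all (y, s) cells (second form of the UEI)
    overlap = np.sum(np.sqrt(p_sample * p_model))
    # Clip guards against tiny negative values caused by rounding
    return float(np.sqrt(np.clip(1.0 - overlap, 0.0, None)))
```

The index is 0 when the two distributions coincide and 1 when they put their mass on disjoint cells.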

## Negative Legacy

Negative legacy is unfair sampling or labeling in the training data. 

For example, if a bank has been refusing credit to minority people without
assessing them, the records of minority people are less sampled in a training data
set.

## General Framework

Given a training data set $\cal{D} = $ $\{(y, \textbf{x}, s)\}$, we can define the following terms:

- $\cal{M}$ $[ Y |X, S; \mathbb{\Theta}]$: a model of the conditional probability of the class $Y$ given the non-sensitive features $X$ and the sensitive feature $S$.
- $\mathbb{\Theta}$: the set of model parameters, estimated by the maximum likelihood principle, that is, by maximizing the log-likelihood
$$\cal{L}(\cal{D}, \mathbb{\Theta}) = \sum_{(y_i, \textbf{x}_i, s_i) \in \cal{D}} \ln \cal{M} \ [y_i|\textbf{x}_i, s_i;\mathbb{\Theta}].$$

For the optimization process, we use two regularizers: the $L_2$ regularizer $||\mathbb{\Theta}||_2^2$ and a second regularizer $R(\cal{D}, \mathbb{\Theta})$, introduced to enforce fair classification. With both regularizers, weighted by $\lambda$ and $\eta$ respectively, our objective function, which is to be minimized, becomes:
$$-\cal{L}(\cal{D}, \mathbb{\Theta}) + \eta{} \text{R}(\cal{D}, \mathbb{\Theta}) + \frac{\lambda}{2} ||\mathbb{\Theta}||_2^2.$$
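
To make the three terms concrete, here is a minimal NumPy sketch of this objective. It assumes that $\cal{M}$ is a logistic regression over a feature matrix `X` (which would hold whatever features the model uses) and that the fairness regularizer is supplied as a callable `fairness_penalty` acting on the predicted probabilities. Both names are hypothetical and chosen only for illustration.

```python
import numpy as np

def objective(theta, X, y, fairness_penalty, eta=1.0, lam=1.0):
    """-L(D; theta) + eta * R(D, theta) + (lam / 2) * ||theta||_2^2."""
    # Model M[y=1 | x; theta]: logistic regression (an assumption of this sketch)
    p1 = 1.0 / (1.0 + np.exp(-(X @ theta)))
    # Probability the model assigns to each observed label
    p_obs = np.where(y == 1, p1, 1.0 - p1)
    log_likelihood = np.sum(np.log(p_obs))
    # L2 penalty on the parameters
    l2 = 0.5 * lam * np.sum(theta ** 2)
    return -log_likelihood + eta * fairness_penalty(p1) + l2
```

Setting `eta=0` recovers ordinary $L_2$-regularized logistic regression, which is a convenient baseline when judging the effect of the fairness term.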

## Prejudice Remover

A _prejudice remover_ regularizer directly tries to reduce the prejudice index and is denoted by $\text{R}_{\text{PR}}$. Recall that the prejudice index is defined as

$$\text{PI} = \sum_{(y, s) \in \cal{D}}  \hat{\text{Pr}}[y, s] \ln \frac{\hat{\text{Pr}}[y, s]}{\hat{\text{Pr}}[y]\hat{\text{Pr}}[s]}$$

where

$$\hat{\text{Pr}}[y|s] \approx \frac{\sum_{(\textbf{x}_i, s_i) \in {\cal{D}} \text{ s.t. } s_i = s}{\cal{M}}[y|\textbf{x}_i, s; \mathbb{\Theta}]}{|\left\{(\textbf{x}_i, s_i) \in {\cal{D}} \text{ s.t. } s_i = s \right\}|}.$$

$$\hat{\text{Pr}}[y] \approx \frac{\sum_{(\textbf{x}_i, s_i) \in {\cal{D}}}{\cal{M}}[y|\textbf{x}_i, s_i; \mathbb{\Theta}]}{|{\cal{D}}|}.$$

And the prejudice remover regularizer $\text{R}_{\text{PR}}({\cal{D}}, \mathbb{\Theta})$ is defined as

$$\sum_{(\textbf{x}_i, s_i) \in {\cal{D}}}\sum_{y\in\{0, 1\}}{\cal{M}}[y|\textbf{x}_i, s_i;\mathbb{\Theta}]\ln\frac{\hat{\text{Pr}}[y|s_i]}{\hat{\text{Pr}}[y]}$$

This regularizer becomes increasingly large as a class $y$ becomes more likely to be predicted for a sensitive group $s$ than for the entire population, so minimizing it makes the model's predictions depend less on the sensitive variable.
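
The regularizer can be evaluated with one pass per sensitive group, because $\hat{\text{Pr}}[y|s]$ and $\hat{\text{Pr}}[y]$ depend only on $(y, s)$. The sketch below is one possible NumPy formulation, assuming the model outputs are already available as an array `p1` with `p1[i]` $= {\cal{M}}[y=1|\textbf{x}_i, s_i;\mathbb{\Theta}]$ strictly between 0 and 1. The function name `prejudice_remover` is hypothetical.

```python
import numpy as np

def prejudice_remover(p1, s):
    """R_PR from per-record probabilities p1[i] = M[y=1 | x_i, s_i]."""
    p1 = np.asarray(p1, dtype=float)
    s = np.asarray(s)
    # Column k holds M[y=k | x_i, s_i] for k in {0, 1}
    probs = np.stack([1.0 - p1, p1], axis=1)

    # Pr[y]: average model output over the whole data set
    p_y = probs.mean(axis=0)

    total = 0.0
    for group in np.unique(s):
        in_group = s == group
        # Pr[y | s]: average model output within the group
        p_y_given_s = probs[in_group].mean(axis=0)
        # Sum over records of the group and over y in {0, 1}
        total += np.sum(probs[in_group] * np.log(p_y_given_s / p_y))
    return float(total)
```

If every group receives the same average prediction, each log ratio is zero and the regularizer vanishes; the further the group averages drift from the population average, the larger it gets.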


## Importing the dataset

For the analysis of the algorithm, we will use the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) dataset, which tracks recidivism over a period of two years. This dataset contains information about criminal defendants and their recidivism status. The complete dataset contains 7,214 records and 52 features. The dataset is available on the [ProPublica GitHub](https://github.com/propublica/compas-analysis) repository.

In [3]:
# This URL corresponds to the ProPublica Compas Analysis dataset 
URL = "https://raw.githubusercontent.com/propublica/compas-analysis/master/compas-scores-two-years.csv"

In [4]:
df = pd.read_csv(URL, index_col=0)
df.head()

id,name,first,last,compas_screening_date,sex,dob,age,age_cat,race,juv_fel_count,...,v_decile_score,v_score_text,v_screening_date,in_custody,out_custody,priors_count.1,start,end,event,two_year_recid
1,miguel hernandez,miguel,hernandez,2013-08-14,Male,1947-04-18,69,Greater than 45,Other,0,...,1,Low,2013-08-14,2014-07-07,2014-07-14,0,0,327,0,0
3,kevon dixon,kevon,dixon,2013-01-27,Male,1982-01-22,34,25 - 45,African-American,0,...,1,Low,2013-01-27,2013-01-26,2013-02-05,0,9,159,1,1
4,ed philo,ed,philo,2013-04-14,Male,1991-05-14,24,Less than 25,African-American,0,...,3,Low,2013-04-14,2013-06-16,2013-06-16,4,0,63,0,1
5,marcu brown,marcu,brown,2013-01-13,Male,1993-01-21,23,Less than 25,African-American,0,...,6,Medium,2013-01-13,,,1,0,1174,0,0
6,bouthy pierrelouis,bouthy,pierrelouis,2013-03-26,Male,1973-01-22,43,25 - 45,Other,0,...,1,Low,2013-03-26,,,2,0,1102,0,0


### Preliminary EDA and Data Cleaning

In [5]:
print("Dataset size: ", df.shape)
print(df.isna().sum())

Dataset size:  (7214, 52)
name                          0
first                         0
last                          0
compas_screening_date         0
sex                           0
dob                           0
age                           0
age_cat                       0
race                          0
juv_fel_count                 0
decile_score                  0
juv_misd_count                0
juv_other_count               0
priors_count                  0
days_b_screening_arrest     307
c_jail_in                   307
c_jail_out                  307
c_case_number                22
c_offense_date             1159
c_arrest_date              6077
c_days_from_compas           22
c_charge_degree               0
c_charge_desc                29
is_recid                      0
r_case_number              3743
r_charge_degree            3743
r_days_from_arrest         4898
r_offense_date             3743
r_charge_desc              3801
r_jail_in                  4898
r_jail_out    

In [6]:
def remove_missing_records(df: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """Drop mostly-missing columns, then drop rows that still contain missing values."""
    # Count the missing values per column
    missing = df.isna().sum()
    # Columns whose fraction of missing values exceeds the threshold
    cols = missing[missing > threshold * df.shape[0]].index
    # Remove those columns
    df = df.drop(cols, axis=1)
    # Remove every remaining row that has a missing value
    df = df.dropna()
    return df

def impute_missing_data(df: pd.DataFrame) -> pd.DataFrame:
    """Impute missing values in numeric columns with the column mean."""
    # Count the missing values per numeric column (the mean is undefined for text)
    missing = df.select_dtypes(include="number").isna().sum()
    # Numeric columns that contain missing values
    cols = missing[missing > 0].index
    # Impute the missing values with the mean
    for col in cols:
        df[col] = df[col].fillna(df[col].mean())
    return df

def encode_categorical_features(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Encode the given categorical (object) columns as integer codes."""
    # Keep only the requested columns that hold text values
    categorical = df.loc[:, columns].select_dtypes(include="object").columns
    # Replace each category with an integer code, in order of first appearance
    for col in categorical:
        df[col] = pd.factorize(df[col])[0]
    return df

def parse_dates(df: pd.DataFrame) -> pd.DataFrame:
    """Convert every column whose name ends in "_date" to datetime."""
    # Boolean mask over the columns that hold dates
    date_columns = df.columns.str.endswith("_date")
    for col in df.loc[:, date_columns].columns:
        df[col] = pd.to_datetime(df[col])

    return df

def preprocess_data(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Preprocess the data."""
    # Drop mostly-missing columns and incomplete rows
    df = remove_missing_records(df)
    # Impute any remaining missing values (a no-op after the row drop above)
    df = impute_missing_data(df)
    # Encode the categorical features
    df = encode_categorical_features(df, columns)
    # Parse the dates
    df = parse_dates(df)
    return df

In [7]:
columns = ["sex", "race", "age_cat", "c_charge_degree", "score_text"]
df = preprocess_data(df, columns)
print("Missing values: ", df.isna().values.sum())
df.head()

Missing values:  0


id,name,first,last,compas_screening_date,sex,dob,age,age_cat,race,juv_fel_count,...,v_decile_score,v_score_text,v_screening_date,in_custody,out_custody,priors_count.1,start,end,event,two_year_recid
1,miguel hernandez,miguel,hernandez,2013-08-14,0,1947-04-18,69,0,0,0,...,1,Low,2013-08-14,2014-07-07,2014-07-14,0,0,327,0,0
3,kevon dixon,kevon,dixon,2013-01-27,0,1982-01-22,34,1,1,0,...,1,Low,2013-01-27,2013-01-26,2013-02-05,0,9,159,1,1
4,ed philo,ed,philo,2013-04-14,0,1991-05-14,24,2,1,0,...,3,Low,2013-04-14,2013-06-16,2013-06-16,4,0,63,0,1
7,marsha miles,marsha,miles,2013-11-30,0,1971-08-22,44,1,0,0,...,1,Low,2013-11-30,2013-11-30,2013-12-01,0,1,853,0,0
8,edward riddle,edward,riddle,2014-02-19,0,1974-07-23,41,1,2,0,...,2,Low,2014-02-19,2014-03-31,2014-04-18,14,5,40,1,1


In [85]:
# NOTE: 'two_year_recid' is the label itself and 'is_recid' is closely related
# to it, so keeping them as features leaks the target into the model inputs.
ns_feature_columns = [
    'age', 'c_charge_degree', 'age_cat',
    'score_text', 'sex', 'priors_count', 'days_b_screening_arrest',
    'decile_score', 'is_recid', 'two_year_recid'
]

labels = tf.cast(df["two_year_recid"], dtype=tf.float32)
sensitive = tf.cast(df["race"], dtype=tf.float32)
non_sensitive = tf.cast(df[ns_feature_columns], dtype=tf.float32)

In [87]:
D = tf.data.Dataset.from_tensor_slices(
    (labels, non_sensitive, sensitive)
)

split_ratio = 0.8
train_size = int(split_ratio * df.shape[0])

train_dataset = D.take(train_size)
test_dataset = D.skip(train_size).take(df.shape[0] - train_size)

In [88]:
D

<TensorSliceDataset element_spec=(TensorSpec(shape=(), dtype=tf.float32, name=None), TensorSpec(shape=(10,), dtype=tf.float32, name=None), TensorSpec(shape=(), dtype=tf.float32, name=None))>

## Model definition

In [114]:
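def conditional_density(model, y, s, D):
    """Estimate Pr[y | s]: mean model output over the records with sensitive value s."""
    obs = 0.0
    n = 0
    for _, xi, si in D:
        if si == s:
            obs += model(y, xi, s)
            n += 1
    if n == 0:
        # No record in this group: return a small constant to keep the log finite
        return 0.001

    return obs / n

def density(model, y, D):
    """Estimate Pr[y]: mean model output over the whole data set."""
    obs = 0.0
    n = len(D)

    for _, xi, si in D:
        obs += model(y, xi, si)

    return obs / n


def prejudice_remover_regularizer(model, D):
    """Prejudice remover regularizer R_PR(D, Theta)."""
    classes = [tf.constant(0.0), tf.constant(1.0)]
    # The density estimates depend only on (y, s), so compute them once
    # up front instead of recomputing them for every record.
    groups = {float(si) for _, _, si in D}
    p_y = [density(model, y, D) for y in classes]
    p_y_given_s = {
        s: [conditional_density(model, y, s, D) for y in classes] for s in groups
    }

    result = 0.0
    for _, xi, si in D:
        for k, y in enumerate(classes):
            # tf.math.log (rather than np.log) keeps this term differentiable
            log_ratio = tf.math.log(p_y_given_s[float(si)][k] / p_y[k])
            result += model(y, xi, si) * log_ratio

    return result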

In [116]:
class BinaryLogisticRegression(tf.keras.layers.Layer):
    """Logistic regression returning the probability M[y | x] of the label y."""

    def __init__(self, num_features):
        super().__init__()
        self.ws = tf.Variable(tf.random.normal(shape=(1, num_features)), name="w")
        self.b = tf.Variable(tf.zeros(shape=(1,)), name="b")

    def call(self, y, x, s=0):
        # Column vector of features: shape (num_features, 1)
        x = tf.reshape(x, [-1, 1])

        # Logistic model: (1, num_features) @ (num_features, 1) -> (1, 1)
        logit = tf.matmul(self.ws, x) + self.b
        sigmoid = tf.nn.sigmoid(logit)
        # Probability assigned to the label y, where y is 0 or 1
        model_output = y * sigmoid + (1 - y) * (1 - sigmoid)

        return model_output

class FairnessModel(tf.keras.Model):
    """Logistic regression trained with the prejudice remover regularizer."""
    def __init__(self, D, num_features, num_sensitive=0):
        super().__init__()

        # Hyperparameters, kept non-trainable so the optimizer does not update them
        self.eta = tf.Variable(0.01, trainable=False)
        self.lam = tf.Variable(0.01, trainable=False)

        # self.model = BinaryLogisticRegression(num_features + num_sensitive)
        self.model = BinaryLogisticRegression(num_features)
        self.D = D
        self.R_pr = prejudice_remover_regularizer

    def call(self, D):
        """Return the objective -L + eta * R_PR + (lambda / 2) * ||w||^2."""
        # Log-likelihood accumulated over every record of D
        log_likelihood = 0.0
        for y, x, s in D:
            log_likelihood += tf.reduce_sum(tf.math.log(self.model(y, x)))

        # Prejudice remover regularizer
        R_pr = tf.reduce_sum(self.R_pr(self.model, self.D))
        regularizer = self.eta * R_pr

        # L2 regularization
        l2 = 0.5 * self.lam * tf.reduce_sum(tf.square(self.model.ws))

        # The optimizer minimizes, so the log-likelihood enters with a minus sign
        output = -log_likelihood + regularizer + l2

        return output

    def fit(self, D, epochs=100):
        """Fit the model with full-batch gradient steps."""
        optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
        for _ in range(epochs):
            with tf.GradientTape() as tape:
                loss = self.call(D)
            grads = tape.gradient(loss, self.trainable_variables)
            optimizer.apply_gradients(zip(grads, self.trainable_variables))

    

In [118]:
model = FairnessModel(train_dataset, non_sensitive.shape[1])
model.fit(train_dataset, epochs=10)

KeyboardInterrupt: 

## Evaluation Metrics

In [22]:
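def evaluation(model, test_dataset, threshold=0.5):
    """Print a classification report for a fitted FairnessModel on a test dataset."""
    y_true = []
    y_pred = []
    for y, x, _ in test_dataset:
        # Probability of the positive class (y = 1), collapsed to a scalar
        prob = tf.reduce_mean(model.model(tf.constant(1.0), x))
        y_true.append(int(y))
        y_pred.append(int(float(prob) >= threshold))
    print(classification_report(y_true, y_pred))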