<a href="https://colab.research.google.com/github/alheliou/Bias_mitigation/blob/main/TD_model_audit.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this TD, the aim is to analyse the decisions made by a model (a logistic regression) and to study the impact of in-processing mitigation.
You will use different methods to audit/explain the model:
- feature importances with LIME
- black-box auditing that considers features in pairs
- counterfactual examples with dice-ml
- shapkit

You will use different in-processing mitigation approaches.

Then, you will use the Prejudice Remover approach (based on a logistic regression) and analyse how the metrics are improved by this mitigation.

## Installation of the environment

We highly recommend that you follow these steps; they let every student work in an environment as similar as possible to the one used during testing.

### Colab Settings
  The next two code cells should be executed only once per Colab environment.


#### 1. Python env creation

        ```
        ! python -m pip install numpy fairlearn plotly nbformat ipykernel aif360["inFairness"] aif360['AdversarialDebiasing'] causal-learn BlackBoxAuditing cvxpy dice-ml lime shapkit
        ```

#### 2. Download the MEPS dataset (for Part 2); this can take several minutes

        ```
        ! Rscript /usr/local/lib/python3.12/dist-packages/aif360/data/raw/meps/generate_data.R
        ! mv h181.csv /usr/local/lib/python3.12/dist-packages/aif360/data/raw/meps/
        ! mv h192.csv /usr/local/lib/python3.12/dist-packages/aif360/data/raw/meps/
        ```

  
### Local Settings (you don't need to redo this; just put the second notebook in the same folder as the first)

#### 1. Uv installation


        https://docs.astral.sh/uv/getting-started/installation/


        `curl -LsSf https://astral.sh/uv/install.sh | sh`

        Python version 3.12 installation (highly recommended)
        `uv python install 3.12`

#### 2. R installation (needed for data download/pre-processing only of Part 2)

        If the command `Rscript` says 'command not found':

        `sudo apt install r-base-core`

#### 3. Python env creation

        ```
        mkdir TD_bias_mitigation
        cd TD_bias_mitigation
        uv python pin 3.12
        uv init
        uv pip install numpy fairlearn plotly nbformat ipykernel aif360["inFairness"] aif360['AdversarialDebiasing'] causal-learn BlackBoxAuditing cvxpy dice-ml lime shapkit
        ```

#### 4. Download the MEPS dataset; this can take several minutes

        ```
        cd TD_bias_mitigation/.venv/lib/python3.12/site-packages/aif360/data/raw/meps/
        Rscript generate_data.R
        ```
    


In [None]:
# To execute only in Colab
! python -m pip install numpy fairlearn plotly nbformat ipykernel aif360["inFairness"] aif360['AdversarialDebiasing'] causal-learn BlackBoxAuditing cvxpy dice-ml lime shapkit
! Rscript /usr/local/lib/python3.12/dist-packages/aif360/data/raw/meps/generate_data.R
! mv h181.csv /usr/local/lib/python3.12/dist-packages/aif360/data/raw/meps/
! mv h192.csv /usr/local/lib/python3.12/dist-packages/aif360/data/raw/meps/

In [None]:
# Code to compute fairness metrics using aif360

from aif360.sklearn.metrics import *
from sklearn.metrics import  balanced_accuracy_score


# This method takes lists
def get_metrics(
    y_true, # list or np.array of truth values
    y_pred=None,  # list or np.array of predictions
    prot_attr=None, # list or np.array of protected/sensitive attribute values
    priv_group=1, # value taken by the privileged group
    pos_label=1, # value taken by the positive truth/prediction
    sample_weight=None # list or np.array of weights value,
):
    group_metrics = {}
    group_metrics["base_rate_truth"] = base_rate(
        y_true=y_true, pos_label=pos_label, sample_weight=sample_weight
    )
    group_metrics["statistical_parity_difference"] = statistical_parity_difference(
        y_true=y_true, y_pred=y_pred, prot_attr=prot_attr, priv_group=priv_group, pos_label=pos_label, sample_weight=sample_weight
    )
    group_metrics["disparate_impact_ratio"] = disparate_impact_ratio(
        y_true=y_true, y_pred=y_pred, prot_attr=prot_attr, priv_group=priv_group, pos_label=pos_label, sample_weight=sample_weight
    )
    if y_pred is not None:
        group_metrics["base_rate_preds"] = base_rate(
            y_true=y_pred, pos_label=pos_label, sample_weight=sample_weight
        )
        group_metrics["equal_opportunity_difference"] = equal_opportunity_difference(
            y_true=y_true, y_pred=y_pred, prot_attr=prot_attr, priv_group=priv_group, pos_label=pos_label, sample_weight=sample_weight
        )
        group_metrics["average_odds_difference"] = average_odds_difference(
            y_true=y_true, y_pred=y_pred, prot_attr=prot_attr, priv_group=priv_group, pos_label=pos_label, sample_weight=sample_weight
        )
        # Conditional demographic disparity is only computed when both classes are predicted
        if len(set(y_pred)) > 1:
            group_metrics["conditional_demographic_disparity"] = conditional_demographic_disparity(
                y_true=y_true, y_pred=y_pred, prot_attr=prot_attr, pos_label=pos_label, sample_weight=sample_weight
            )
        else:
            group_metrics["conditional_demographic_disparity"] = None
        group_metrics["smoothed_edf"] = smoothed_edf(
            y_true=y_true, y_pred=y_pred, prot_attr=prot_attr, pos_label=pos_label, sample_weight=sample_weight
        )
        group_metrics["df_bias_amplification"] = df_bias_amplification(
            y_true=y_true, y_pred=y_pred, prot_attr=prot_attr, pos_label=pos_label, sample_weight=sample_weight
        )
        group_metrics["balanced_accuracy_score"] = balanced_accuracy_score(
            y_true=y_true, y_pred=y_pred, sample_weight=sample_weight
        )
    return group_metrics

## Import and load the dataset

In [None]:
# imports
import numpy as np
import pandas as pd
import plotly.express as px
import warnings
import matplotlib.pyplot as plt


warnings.simplefilter(action="ignore", category=FutureWarning)
warnings.simplefilter(action="ignore", append=True, category=UserWarning)
# Datasets
from aif360.datasets import MEPSDataset19

# Performance metrics and preprocessing
from sklearn.metrics import accuracy_score, balanced_accuracy_score
from sklearn.preprocessing import StandardScaler

# Load the dataset once and reuse the same object for the split (loading is slow)
MEPSDataset19_data = MEPSDataset19()
(dataset_orig_panel19_train, dataset_orig_panel19_val, dataset_orig_panel19_test) = (
    MEPSDataset19_data.split([0.5, 0.8], shuffle=True)
)

### Observe fairness metrics in datasets

In [None]:
train_metrics = get_metrics(
    y_true=dataset_orig_panel19_train.labels[:,0],
    y_pred=None,
    priv_group=1,
    pos_label=0,
    prot_attr=dataset_orig_panel19_train.protected_attributes[:,0],
    sample_weight=dataset_orig_panel19_train.instance_weights)
val_metrics = get_metrics(
    y_true=dataset_orig_panel19_val.labels[:,0],
    y_pred=None,
    priv_group=1,
    pos_label=0,
    prot_attr=dataset_orig_panel19_val.protected_attributes[:,0],
    sample_weight=dataset_orig_panel19_val.instance_weights)
test_metrics = get_metrics(
    y_true=dataset_orig_panel19_test.labels[:,0],
    y_pred=None,
    priv_group=1,
    pos_label=0,
    prot_attr=dataset_orig_panel19_test.protected_attributes[:,0],
    sample_weight=dataset_orig_panel19_test.instance_weights)

for k, v in train_metrics.items():
    print(f"{k}: train {v}, val {val_metrics[k]}, test {test_metrics[k]}")

base_rate_truth: train 0.7843113407893261, val 0.7808653966877895, test 0.7925663329522626
statistical_parity_difference: train 0.13647991784360136, val 0.1432319693902525, test 0.11921166597551425
disparate_impact_ratio: train 1.1869289902308444, val 1.1980019769846824, test 1.160288017038563
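
To make the two group metrics above concrete, here is a minimal sketch that recomputes them by hand on a toy sample. It assumes the usual definitions: the statistical parity difference is the rate of the positive label in the unprivileged group minus the rate in the privileged group, and the disparate impact ratio is the quotient of those two rates. The helper `group_rates` and the toy arrays are illustrative and not part of the notebook.

```python
import numpy as np


def group_rates(y, prot_attr, priv_group=1, pos_label=1, sample_weight=None):
    """Weighted rate of pos_label in the unprivileged and privileged groups."""
    y = np.asarray(y)
    prot_attr = np.asarray(prot_attr)
    if sample_weight is None:
        w = np.ones(len(y))
    else:
        w = np.asarray(sample_weight, dtype=float)
    is_pos = (y == pos_label).astype(float)
    priv = prot_attr == priv_group
    rate_priv = np.sum(w[priv] * is_pos[priv]) / np.sum(w[priv])
    rate_unpriv = np.sum(w[~priv] * is_pos[~priv]) / np.sum(w[~priv])
    return rate_unpriv, rate_priv


# Toy data (hypothetical): 4 privileged (1) and 4 unprivileged (0) individuals
y_toy = np.array([1, 1, 1, 0, 1, 0, 0, 0])
a_toy = np.array([1, 1, 1, 1, 0, 0, 0, 0])

rate_unpriv, rate_priv = group_rates(y_toy, a_toy)
spd = rate_unpriv - rate_priv  # statistical parity difference
di = rate_unpriv / rate_priv   # disparate impact ratio
print(f"unprivileged rate={rate_unpriv:.2f}, privileged rate={rate_priv:.2f}")
print(f"SPD={spd:.2f}, DI={di:.3f}")
```

A value of 0 for the difference and 1 for the ratio would mean both groups receive the positive label at the same rate.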


## Part 1 model auditing

### Question 1 using LIME
#### Question 1.1 - Learn a Logistic Regression to predict UTILIZATION, evaluate the fairness of the predictions

#### Question 1.2 (optional) – Observe the impact of the threshold on the performance of logistic regression (balanced accuracy and disparate impact)
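
As a hint for the threshold study, the sketch below shows the general shape of a threshold sweep on synthetic data rather than on MEPS. Everything in it is an assumption made for illustration: the generated features, the protected attribute `group`, and the hand-computed disparate impact. It is not the expected answer, only one possible way to organise the loop.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score

rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, size=n)       # hypothetical protected attribute (1 = privileged)
features = rng.normal(size=(n, 3))
features[:, 0] += 0.8 * group            # one feature is correlated with the group
labels = (
    1.2 * features[:, 0] - 0.7 * features[:, 1] + rng.normal(size=n) > 0.4
).astype(int)

# Simple train/validation split
X_tr, X_va = features[:1000], features[1000:]
y_tr, y_va = labels[:1000], labels[1000:]
g_va = group[1000:]

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
val_scores = model.predict_proba(X_va)[:, 1]  # probability of class 1

sweep = []
for thr in np.linspace(0.05, 0.95, 19):
    y_hat = (val_scores > thr).astype(int)
    rate_priv = y_hat[g_va == 1].mean()
    rate_unpriv = y_hat[g_va == 0].mean()
    di = rate_unpriv / rate_priv if rate_priv > 0 else float("nan")
    sweep.append((thr, balanced_accuracy_score(y_va, y_hat), di))

# Threshold with the highest balanced accuracy, and the disparate impact it yields
best_thr, best_bacc, best_di = max(sweep, key=lambda row: row[1])
print(f"best threshold={best_thr:.2f}, balanced accuracy={best_bacc:.3f}, DI={best_di:.3f}")
```

The threshold that maximises balanced accuracy is not necessarily the one with a disparate impact closest to 1, which is the trade-off this question asks you to observe.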


#### Question 1.3: Train a LimeEncoder (name the object `lime_data`) on the AIF360 training dataset, then use this LimeEncoder to transform both the training and validation datasets into `s_train` and `s_valid`.

#### Question 1.4: Use LimeTabularExplainer to explain the decisions made on several instances of the validation dataset.


In [None]:
from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
        s_train,
        class_names=lime_data.s_class_names,
        feature_names=lime_data.s_feature_names,
        categorical_features=lime_data.s_categorical_features,
        categorical_names=lime_data.s_categorical_names,
        kernel_width=3, verbose=False, discretize_continuous=True)

### Question 2: Using BlackBoxAuditing

Be careful: this time, we are interested in indirect influences, as this method considers features in pairs.

Also, transforming categorical attributes using "one hot encoding" is not a good approach here because these columns will, by design, be highly correlated with each other.

Instead, we will use ordinal encoding, and only use sklearn classifiers that are compatible with categorical features (HistGradientBoostingClassifier).

First, you should convert the AIF360 dataset to a dataframe and group together the columns that were already one-hot encoded, then apply ordinal encoding to the categorical columns.
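
The preprocessing described above can be illustrated on a toy dataframe. This is a minimal sketch, assuming the `NAME=value` naming convention for one-hot columns; the column names and values are made up, and `OrdinalEncoder` is only one possible choice of encoder.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Toy dataframe (hypothetical) with one numerical column and one one-hot encoded attribute
toy = pd.DataFrame(
    {
        "AGE": [34, 51, 27, 62],
        "REGION=1": [1, 0, 0, 0],
        "REGION=2": [0, 1, 0, 1],
        "REGION=3": [0, 0, 1, 0],
    }
)

onehot_cols = [c for c in toy.columns if c.startswith("REGION=")]

# idxmax returns the name of the active column; keep only the part after "="
toy["REGION"] = toy[onehot_cols].idxmax(axis=1).str.split("=").str[1]
toy = toy.drop(columns=onehot_cols)

# Ordinal encoding: each category becomes a single integer code
encoder = OrdinalEncoder()
toy["REGION"] = encoder.fit_transform(toy[["REGION"]]).ravel().astype(int)
print(toy)
```

The three correlated `REGION=...` columns are replaced by a single integer column, which is the shape of input the pairwise audit expects.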



In [None]:
from BlackBoxAuditing.data import load_from_file
from BlackBoxAuditing.model_factories.AbstractModelFactory import AbstractModelFactory
from BlackBoxAuditing.model_factories.AbstractModelVisitor import AbstractModelVisitor

import BlackBoxAuditing as BBA

#### Question 2.1: Preprocess the data

To allow you to spend more time working with explanations, we provide the code to properly format the dataframe; you can move on to section 2.2.

In [None]:
from sklearn import preprocessing


def get_df(MepsDataset, encoders=None):
    data = MepsDataset.convert_to_dataframe()
    # data is a tuple: the dataframe and a dictionary with all the metadata (weights, sensitive attributes, etc.)
    df = data[0]
    df["WEIGHT"] = data[1]["instance_weights"]
    # Get categorical columns from the one-hot encoding (specific to the MEPS dataset)
    # Here we create a dictionary that links each categorical column name
    # to the list of corresponding one-hot encoded columns
    categorical_columns_dic = {}
    for col in df.columns:
        col_split = col.split("=")
        if len(col_split) > 1:
            cat_col = col_split[0]
            if not (cat_col in categorical_columns_dic.keys()):
                categorical_columns_dic[cat_col] = []
            categorical_columns_dic[cat_col].append(col)
    categorical_features = categorical_columns_dic.keys()
    print(categorical_features)

    def categorical_transform(df, onehotencoded, cat_col):
        if len(onehotencoded) > 1:
            return df[onehotencoded].apply(
                lambda x: onehotencoded[np.argmax(x)][len(cat_col) + 1 :], axis=1
            )
        else:
            return df[onehotencoded]


    # Reverse the categorical one hot encoded
    for cat_col, onehotencoded in categorical_columns_dic.items():
        df[cat_col] = categorical_transform(df, onehotencoded, cat_col)
        df.drop(columns=onehotencoded, inplace=True)

    if encoders is None:
        encoders= {}
        #encoders = {cat_col:preprocessing.LabelEncoder() for cat_col in categorical_features}

    for cat_col in categorical_features:
        if not (cat_col in encoders.keys()):
            encoders[cat_col] = preprocessing.LabelEncoder().fit(df[cat_col])

        df[cat_col] = encoders[cat_col].transform(df[cat_col])
    return df, encoders


df, encoders = get_df(MEPSDataset19_data)

dict_keys(['REGION', 'SEX', 'MARRY', 'FTSTU', 'ACTDTY', 'HONRDC', 'RTHLTH', 'MNHLTH', 'HIBPDX', 'CHDDX', 'ANGIDX', 'MIDX', 'OHRTDX', 'STRKDX', 'EMPHDX', 'CHBRON', 'CHOLDX', 'CANCERDX', 'DIABDX', 'JTPAIN', 'ARTHDX', 'ARTHTYPE', 'ASTHDX', 'ADHDADDX', 'PREGNT', 'WLKLIM', 'ACTLIM', 'SOCLIM', 'COGLIM', 'DFHEAR42', 'DFSEE42', 'ADSMOK42', 'PHQ242', 'EMPST', 'POVCAT', 'INSCOV'])


#### Question 2.2: Transform the train, val and test datasets to dataframes

#### Question 2.3: Learn a HistGradientBoostingClassifier

#### Question 2.4: Use the BlackBoxAuditing library to "audit" the model by analyzing the indirect influences of age (the computation takes time, but feel free to try other attributes as well)

The code is provided again; you just need to adapt it to your own notation.

Here is the documentation for the library used:  
https://github.com/algofairness/BlackBoxAuditing/tree/master

In [None]:
import pickle

# Save your data and model (named clf here) on disk

data_val = df_X_val.copy(deep=True)
data_val["Y"] = df_y_val

data_val.to_csv("data_val.csv",
          index=False)

data_train = df_X_train.copy(deep=True)
data_train["Y"] = df_y_train

data_train.to_csv("data_train.csv",
          index=False)

with open( 'TD2_clf.pickle', 'wb' ) as f:
    pickle.dump(clf, f )

In [None]:
from BlackBoxAuditing.data import load_from_file
from BlackBoxAuditing.model_factories.AbstractModelFactory import AbstractModelFactory
from BlackBoxAuditing.model_factories.AbstractModelVisitor import AbstractModelVisitor

import BlackBoxAuditing as BBA


(_, train_BBA, _, _, _, _) = load_from_file("data_train.csv",
                      correct_types = [int if col_type=="int" else float for col_type in  data_train.dtypes],
                                response_header = 'Y',
                               train_percentage = 1.0)
(headers, _, val_BBA, response_header, features_to_ignore, correct_types) = load_from_file("data_val.csv",
                      correct_types = [int if col_type=="int" else float for col_type in  data_val.dtypes],
                                response_header = 'Y',
                               train_percentage = 0.0)
BBA_data = (headers, train_BBA, val_BBA, response_header, features_to_ignore, correct_types)

In [None]:
class HirePredictorBuilder(AbstractModelFactory):
    def __init__(self, *args, **kwargs):
        AbstractModelFactory.__init__(self, *args, **kwargs)
        self.verbose_factory_name = "HirePredictor"
    def build(self, train_set):
        return HirePredictor()

class HirePredictor(AbstractModelVisitor):
    def __init__(self):
        with open( 'TD2_clf.pickle', 'rb' ) as f:
            self.clf = pickle.load(f)

    def test(self, val_set, test_name=""):
        return [[v[-1], self.clf.predict(np.expand_dims(np.array(v[:-1]), axis = 0))] for v in val_set]

In [None]:
features_to_audit = [
    "AGE",
    "SEX",
    "RACE",
    "REGION"
    ]

In [None]:
def warn(*args, **kwargs):
    pass
import warnings
warnings.warn = warn

auditor = BBA.Auditor()
auditor.ModelFactory = HirePredictorBuilder
auditor(BBA_data, output_dir = "audit-output", features_to_audit=features_to_audit)

### Question 3: Generate counterfactual examples using dice-ml

Here is the documentation for the library used:
https://github.com/interpretml/DiCE?tab=readme-ov-file

In [None]:
import dice_ml
from dice_ml.utils import helpers
# provide the trained ML model to DiCE's model object
# use the HistGradientBoostingClassifier trained in the BlackBoxAuditing question
backend = 'sklearn'
m = dice_ml.Model(model=clf, backend=backend)

#### Question 3.1 : Create a list with all numerical features

This question uses the variables created in the provided answer of question 2.1

#### Question 3.2: Create a dice_ml Data object from the dataframe.

#### Question 3.3: Use DiCE to create counterfactual examples using the 'random' method

## Part 2 In-processing mitigation
### Question 1: Learn a Standard Scaler on the training dataset features; its output will be used as input to the model

### Question 2: Create a method to learn a Prejudice Remover on the train dataset and retrieve the learned model
Execute the method with the parameter eta arbitrarily set to 25.0

In [None]:
from aif360.algorithms.inprocessing import PrejudiceRemover

def train_pr_model(train_dataset, eta=25.0):
    model = PrejudiceRemover(sensitive_attr='RACE', eta=eta)
    pr_orig_panel19 = model.fit(train_dataset)
    return pr_orig_panel19

The Prejudice Remover outputs a single score for each instance; a threshold, arbitrarily set to 0.5 by default, turns this score into a prediction of 1 or 0.
If the score is above the threshold, the prediction is 1; otherwise it is 0.
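
The thresholding rule can be written in one line. This is a minimal sketch that assumes the scores have already been extracted into a NumPy array; the values below are made up.

```python
import numpy as np

scores = np.array([0.12, 0.47, 0.50, 0.81])  # hypothetical model scores
threshold = 0.5

# Strictly above the threshold -> 1, otherwise 0
predictions = (scores > threshold).astype(int)
print(predictions)
```

Note that a score exactly equal to the threshold is mapped to 0 with this strict comparison.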

### Question 3: Validation: choose the best parameters

Here there are two parameters:
- eta: fairness penalty parameter of the PR model
- threshold: the threshold of the binary classification

The threshold is used to obtain predictions from the model output.
The eta parameter is used during training.

=> Create a method that loops over 50 thresholds in the open interval (0, 0.5) and 5 values of eta in [1.0, 100.0], and outputs the metrics
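
One possible structure for this loop is sketched below. To stay self-contained it does not train a Prejudice Remover: `fit_and_score` is a hypothetical stand-in that returns made-up scores, and the validation labels and groups are synthetic. Only the layout of the grid and of the results table is meant to carry over.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import balanced_accuracy_score

rng = np.random.default_rng(0)
n = 400
y_val = rng.integers(0, 2, size=n)  # synthetic validation labels
g_val = rng.integers(0, 2, size=n)  # synthetic protected attribute (1 = privileged)


def fit_and_score(eta):
    """Hypothetical stand-in for 'train with this eta, then score the validation set'."""
    noise = rng.normal(scale=0.2, size=n)
    # In this toy model, a larger eta makes the scores depend less on the group
    return np.clip(0.3 * y_val + 0.2 * g_val / eta + 0.1 + noise, 0.0, 1.0)


thresholds = np.linspace(0.0, 0.5, 52)[1:-1]  # 50 values strictly between 0 and 0.5
etas = np.linspace(1.0, 100.0, 5)             # 5 values in [1.0, 100.0]

rows = []
for eta in etas:
    scores = fit_and_score(eta)  # train once per eta, reuse the scores for every threshold
    for thr in thresholds:
        y_hat = (scores > thr).astype(int)
        rate_priv = y_hat[g_val == 1].mean()
        rate_unpriv = y_hat[g_val == 0].mean()
        rows.append(
            {
                "eta": eta,
                "threshold": thr,
                "balanced_accuracy": balanced_accuracy_score(y_val, y_hat),
                "disparate_impact": rate_unpriv / rate_priv if rate_priv > 0 else np.nan,
            }
        )

results = pd.DataFrame(rows)
print(results.head())
```

Training once per eta and reusing the scores for every threshold keeps the cost at 5 fits instead of 250, and the long-format table is convenient for plotting one curve per eta.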

### Question 4: Make plots to choose the best set of parameters

### Question 5: Evaluate: compute the metrics on the test dataset using the model learned with the selected parameters