# Demo: Mitigate Disparity

This demo shows how to run the `mitigate_disparity` script on a development dataset.
In addition to providing a dataset, the user should identify protected features by providing a list of column names corresponding to demographics and/or other variables over which fairness should be sought.

In [None]:
from mitigate_disparity import mitigate_disparity

est = mitigate_disparity(
    dataset='data/mimic/development_dataset.train.csv',
    protected_features=[
        'ethnicity',
        'gender',
        'insurance'
    ]
)

categorical features: ['insurance', 'ethnicity']
numeric features: ['temperature', 'heartrate', 'resprate', 'o2sat', 'sbp', 'dbp', 'pain', 'acuity', 'prev_adm']
dataset: data/mimic/development_dataset.train.csv
protected_features: ['ethnicity', 'gender', 'insurance']
running 64 processes
groups ['ethnicity', 'gender', 'insurance']
number of variables: 121
number of objectives: 2
checkpoint file: checkpoint.25e21a10-80f0-4c4c-8b26-83243e32bd2a.pkl
n_gen  |  n_eval  | n_nds  |      eps      |   indicator  
     1 |       64 |      5 |             - |             -
     2 |      128 |      6 |  0.0102040816 |         ideal
     3 |      192 |      5 |  0.0576923077 |         ideal
     4 |      256 |      7 |  0.0094339623 |         ideal
     5 |      320 |     11 |  0.0192307692 |         nadir
     6 |      384 |      7 |  0.0189659210 |         ideal
     7 |      448 |      7 |  0.0224444900 |         ideal
     8 |      512 |      4 |  0.0554088055 |             f
     9 |      576 

## Visualize final front

Once training is done, we can view a set of candidate models. 
The red dot indicates the model that was selected. 
In addition to the default "PseudoWeights" approach, FOMO provides other multi-criteria decision-making (MCDM) algorithms via pymoo.

In [None]:
import pickle
with open('estimator.pkl','rb') as f:
    est = pickle.load(f)
est.plot().show()
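
To give a feel for how one model can be picked from a front of candidates, here is a minimal pure-Python sketch of a pseudo-weight style selection. It is an illustration, not FOMO's or pymoo's code: the function names, the toy objective values, the equal weights, and the use of an absolute-difference distance are all assumptions made for this example. Both objectives are treated as quantities to minimize.

```python
def non_dominated(points):
    """Return indices of points that no other point dominates (minimization)."""
    keep = []
    for i, p in enumerate(points):
        dominated = any(
            all(qk <= pk for qk, pk in zip(q, p))
            and any(qk < pk for qk, pk in zip(q, p))
            for j, q in enumerate(points)
            if j != i
        )
        if not dominated:
            keep.append(i)
    return keep


def pseudo_weight_choice(front, weights):
    """Return the index of the front point whose pseudo-weights best match `weights`."""
    n_obj = len(weights)
    lo = [min(p[k] for p in front) for k in range(n_obj)]
    hi = [max(p[k] for p in front) for k in range(n_obj)]
    best, best_dist = None, float("inf")
    for i, p in enumerate(front):
        # How far each objective is from its worst value on the front, scaled to [0, 1].
        raw = [
            (hi[k] - p[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 1.0
            for k in range(n_obj)
        ]
        total = sum(raw)
        # Normalize so the pseudo-weights of a point sum to 1.
        pw = [r / total for r in raw] if total > 0 else [1.0 / n_obj] * n_obj
        dist = sum(abs(a - b) for a, b in zip(pw, weights))
        if dist < best_dist:
            best, best_dist = i, dist
    return best


# Hypothetical (overall FPR, subgroup FNR violation) pairs, for illustration only.
candidates = [(0.05, 0.40), (0.06, 0.20), (0.07, 0.10), (0.08, 0.05), (0.07, 0.30)]

front_idx = non_dominated(candidates)
front = [candidates[i] for i in front_idx]
chosen = front[pseudo_weight_choice(front, weights=(0.5, 0.5))]
print(front_idx, chosen)
```

With equal weights, the sketch favors a point that trades off both objectives rather than one at either extreme of the front. Shifting the weights toward one objective would move the choice toward the end of the front where that objective is lowest.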

## Save video of optimization

In [None]:
import pickle

import matplotlib.pyplot as plt
from pymoo.visualization.scatter import Scatter
from pyrecorder.recorder import Recorder
from pyrecorder.writers.video import Video
# alternative writers:
# from pyrecorder.writers.gif import GIF
# from pyrecorder.writers.streamer import Streamer

# load the saved estimator
with open('estimator.pkl', 'rb') as f:
    est = pickle.load(f)

filename = "xgb_nsga3_mlp.mp4"

# use the video writer as a resource
# with Recorder(GIF(filename, duration=10)) as rec:
with Recorder(Video(filename, fps=10)) as rec:
    # for each algorithm object in the history
    for entry in est.res_.history:
        sc = Scatter(
            title=f"Gen {entry.n_gen}",
            labels=[
                'Overall False Positive Rate (FPR)',
                'Subgroup False Negative Rate (FNR) Violation',
            ],
        )
        sc.add(entry.pop.get("F"))
        # sc.add(entry.pop.get("F"), plot_type="line", color="black", alpha=0.7)
        sc.do()
        # keep the axis limits fixed so frames are comparable across generations
        plt.xlim([0.045, 0.085])
        plt.ylim([0.04, 0.41])
        # finally record the current visualization to the video
        rec.record()

## Check test set performance

In [None]:
import pickle

import pandas as pd

from utils import make_measure_dataset

with open('estimator.pkl', 'rb') as f:
    est = pickle.load(f)

df_test = pd.read_csv('data/mimic/development_dataset.test.csv')
X_test = df_test.drop(columns='binary outcome')
y_test = df_test['binary outcome']
make_measure_dataset(est, 'fomo', X_test, y_test)

In [None]:
from measure_disparity import measure_disparity
df_fairness = measure_disparity('fomo_model_mimic4_admission.csv')
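
As a rough illustration of the quantities named on the plot axes earlier (overall FPR and subgroup FNR), the sketch below computes them on a tiny made-up dataset. It does not reproduce `measure_disparity` or FOMO's objective: the helper names, the toy labels and groups, and the definition of the gap as the largest absolute difference between a subgroup's FNR and the overall FNR are all assumptions for this example.

```python
from collections import defaultdict


def false_positive_rate(y_true, y_pred):
    """Share of true negatives that were predicted positive."""
    preds_for_negatives = [p for t, p in zip(y_true, y_pred) if t == 0]
    if not preds_for_negatives:
        return float("nan")
    return sum(preds_for_negatives) / len(preds_for_negatives)


def false_negative_rate(y_true, y_pred):
    """Share of true positives that were predicted negative."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_for_positives:
        return float("nan")
    return sum(1 - p for p in preds_for_positives) / len(preds_for_positives)


def fnr_per_group(y_true, y_pred, groups):
    """Compute the FNR separately for each group label."""
    by_group = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        by_group[g][0].append(t)
        by_group[g][1].append(p)
    return {g: false_negative_rate(ts, ps) for g, (ts, ps) in by_group.items()}


# Hypothetical labels, predictions, and group membership (illustration only).
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]
groups = ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']

overall_fpr = false_positive_rate(y_true, y_pred)
overall_fnr = false_negative_rate(y_true, y_pred)
fnr_by_group = fnr_per_group(y_true, y_pred, groups)
# Assumed gap definition: largest absolute deviation from the overall FNR.
max_gap = max(abs(v - overall_fnr) for v in fnr_by_group.values())
print(overall_fpr, fnr_by_group, max_gap)
```

In this toy data the overall rates look identical for both error types, while the per-group breakdown shows that all false negatives fall in one group, which is the kind of disparity a subgroup-level measure is meant to surface.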