# Demo: Mitigate Disparity

This demo shows how to run the `mitigate_disparity` script on a development dataset.
In addition to providing a dataset, the user should identify protected features by providing a list of column names corresponding to demographics and/or other variables over which fairness should be sought.
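
Before launching a long optimization run, it can help to confirm that every protected feature is actually a column in the dataset. The sketch below is only an illustration: `missing_columns` is a hypothetical helper, not part of `mitigate_disparity`, and the header row is a stand-in rather than the real MIMIC file.

```python
import csv
import io


def missing_columns(csv_file, protected_features):
    """Return the protected feature names that are absent from the CSV header."""
    header = next(csv.reader(csv_file))
    return [name for name in protected_features if name not in header]


# Split a comma-separated string of the same form passed to --protected_features.
requested = "ethnicity,gender,insurance".split(",")

# Stand-in header for illustration; a real check would open the dataset file instead.
sample = io.StringIO("age,ethnicity,gender,binary outcome\n")
missing = missing_columns(sample, requested)
print("missing protected features:", missing)
```

An empty list means every requested column is present; anything else is worth fixing before the script is run.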

In [None]:
%run ../mitigate_disparity.py \
    --dataset ../data/mimic/development_dataset.train.csv \
    --protected_features ethnicity,gender,insurance \
    --starting_point "checkpoint.25e21a10-80f0-4c4c-8b26-83243e32bd2a.pkl"

dataset: ../data/mimic/development_dataset.train.csv
protected_features: ('ethnicity', 'gender', 'insurance')
running 64 processes
groups ['ethnicity', 'gender', 'insurance']
number of variables: 121
number of objectives: 2
Loaded Checkpoint: <pymoo.algorithms.moo.nsga3.NSGA3 object at 0x7f74b420d210>
checkpoint file: checkpoint.4b889672-8cbc-40a3-b6c3-71fa5738efcd.pkl
    77 |     4928 |     14 |  0.0510104788 |         ideal
    78 |     4992 |     14 |  0.000000E+00 |             f


In [1]:
# from mitigate_disparity import mitigate_disparity

# est = mitigate_disparity(
#     dataset='data/mimic/development_dataset.train.csv',
#     protected_features=[
#         'ethnicity',
#         'gender',
#         'insurance'
#     ],
#     starting_point = 'checkpoint.25e21a10-80f0-4c4c-8b26-83243e32bd2a.pkl'
# )

ModuleNotFoundError: No module named 'mitigate_disparity'

## Visualize final front

Once training is done, we can view a set of candidate models. 
The red dot indicates the model that was selected. 
In addition to the default "PseudoWeights" approach, FOMO provides other multi-criteria decision making (MCDM) algorithms via pymoo.
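
To make the pseudo-weights idea concrete, here is a minimal pure-Python sketch of how such a selection can work: each solution on the front gets a weight vector based on its normalized distance from the worst value in each objective, and the solution whose weights are closest to the preferred weights is chosen. This is an illustration under assumptions, not FOMO's or pymoo's actual code; the function names and the objective values are hypothetical.

```python
def pseudo_weights(front):
    """Return one pseudo-weight vector per solution on a minimization front."""
    n_obj = len(front[0])
    f_min = [min(p[j] for p in front) for j in range(n_obj)]
    f_max = [max(p[j] for p in front) for j in range(n_obj)]
    weights = []
    for p in front:
        # Normalized distance from the worst value in each objective.
        gains = [
            (f_max[j] - p[j]) / (f_max[j] - f_min[j]) if f_max[j] > f_min[j] else 0.0
            for j in range(n_obj)
        ]
        total = sum(gains)
        # Guard against a solution that is worst in every objective.
        weights.append([g / total if total > 0 else 1.0 / n_obj for g in gains])
    return weights


def select_by_pseudo_weights(front, preferred):
    """Index of the solution whose pseudo-weights are closest (L1) to `preferred`."""
    pw = pseudo_weights(front)
    dists = [sum(abs(w - t) for w, t in zip(row, preferred)) for row in pw]
    return min(range(len(front)), key=dists.__getitem__)


# Hypothetical (FPR, subgroup FNR violation) values, not results from this demo.
front = [(0.05, 0.40), (0.06, 0.20), (0.08, 0.05)]
balanced = select_by_pseudo_weights(front, [0.5, 0.5])
fpr_first = select_by_pseudo_weights(front, [0.9, 0.1])
print(balanced, fpr_first)
```

With equal preferred weights the middle trade-off is picked, while weighting the first objective heavily moves the choice to the solution with the lowest value in that objective.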

In [None]:
import dill
with open('checkpoint.25e21a10-80f0-4c4c-8b26-83243e32bd2a.pkl', 'rb') as f:
    alg = dill.load(f)
alg

In [None]:
import pickle
with open('estimator.pkl','rb') as f:
    est = pickle.load(f)
est.plot().show()

## Save video of optimization

In [None]:
import pickle  # needed to load the estimator if this cell is run on its own
from pyrecorder.recorder import Recorder
from pyrecorder.writers.video import Video
# from pyrecorder.writers.streamer import Streamer
from pymoo.visualization.scatter import Scatter
import matplotlib.pyplot as plt
with open('estimator.pkl','rb') as f:
    est = pickle.load(f)
# use the video writer as a resource
filename = "xgb_nsga3_mlp.mp4"
# from pyrecorder.writers.gif import GIF
# result = alg.result()
result = est.res_

with Recorder(Video(filename, fps=10)) as rec:
# with Recorder(GIF(filename, duration=10)) as rec:
    # for each algorithm object in the history
    for entry in result.history:
        sc = Scatter(title=("Gen %s" % entry.n_gen),
                     labels=['Overall False Positive Rate (FPR)', 'Subgroup False Negative Rate (FNR) Violation']
                    )
        sc.add(entry.pop.get("F"))
#         sc.add(entry.pop.get("F"), plot_type="line", color="black", alpha=0.7)
        sc.do()
        plt.xlim([0.045, 0.085])
        plt.ylim([0.04, 0.41])
        # finally record the current visualization to the video
        rec.record()
    

## Check test set performance

In [None]:
import pickle

import pandas as pd
from utils import make_measure_dataset

# load the fitted estimator saved by the training run
with open('estimator.pkl','rb') as f:
    est = pickle.load(f)
    
df_test = pd.read_csv('data/mimic/development_dataset.test.csv')
X_test = df_test.drop(columns='binary outcome')
y_test = df_test['binary outcome']
make_measure_dataset(est, 'fomo', X_test, y_test)

In [None]:
from measure_disparity import measure_disparity
df_fairness = measure_disparity('fomo_model_mimic4_admission.csv')