# Fairness Simulation Study 2

This example demonstrates the use of PiML for fairness testing. We first simulate a credit-decisioning dataset with hypothesized features `Mortgage`, `Balance`, `AmountPastDue`, and `CreditInquiry`, as well as demographic features `Gender` and `Race`. The response `Approved` is a binary indicator, so this is a classification problem.

**[Optional for Google Colab] Installing PiML**

1. Run `!pip install piml` to install the latest version of PiML
2. In Colab, you'll need to restart the runtime in order to use the newly installed PiML version.

In [None]:
!pip install piml

## Generate Simulation Data

In [1]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def sigmoid(beta, x):
    # Logistic curve 1 / (1 + exp(beta[0] + beta[1]*x)); decreasing in x when beta[1] > 0.
    y = 1/(1 + np.exp(beta[0]+beta[1]*x))
    return y

def gen_data(seed=7):
    # Simulate N credit applications with two binary demographic attributes.
    np.random.seed(seed)
    N = 10000
    A = np.random.binomial(1,0.5, (N,1))   # becomes Gender
    B0 = np.random.binomial(1,0.5, (N,1))  # drives Mortgage; Race is a noisy copy of it

    x1 = (0.3*A) + np.random.normal(0,0.5,(N,1))  # latent Balance, shifted by Gender
    x20 = np.random.normal(0,0.5,(N,1))
    x2 = (x20 >0)*x20                              # latent AmountPastDue, truncated at zero
    x30 = B0
    x3 = (x30 > 0)*1.0                             # Mortgage indicator
    x40 = (np.random.lognormal(1,0.7,(N,1))-1)/2.0
    x4d = np.round(x40).astype(int)                # CreditInquiry as an integer
    x4 = (x40 - np.mean(x40))/(2*np.std(x40))      # centered and scaled copy used in the response
    
    # Latent score plus noise; Approved = 1 when the score exceeds its mean.
    yl = x1 - sigmoid((0,2),x2) + x3/2 - x4 + np.random.normal(0,0.5,(N,1))
    y = np.reshape((yl>np.mean(yl))*1,(N,1))

    # Rescale Balance and AmountPastDue to integers in [0, 10000].
    scaler = MinMaxScaler()
    scaler.fit(x1)
    x1d = np.round(scaler.transform(x1)*10000).astype(int)
    scaler.fit(x2)
    x2d = np.round(scaler.transform(x2)*10000).astype(int)
    x2f = (x2d > x1d)*x1d + (x2d<=x1d)*x2d  # cap AmountPastDue at Balance
    v2 = np.random.binomial(1,0.15, (N,1))
    B = ((B0+v2)>1)*0 + ((B0+v2)<=1)*(B0+v2)  # Race: B0 flipped with probability 0.15

    x = np.hstack((x3,x1d,x2f,x4d,A,B))
    df = pd.DataFrame(data=x, columns = ['Mortgage', 'Balance', 'AmountPastDue', 'CreditInquiry', 'Gender','Race'])
    df = pd.concat([df, pd.DataFrame(data=y, columns = ['Approved'])], axis =1)
    return df

df = gen_data(seed=7)
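
Before any model is trained, it can help to compare raw approval rates across demographic groups in data of this kind. The following is a minimal sketch and not part of PiML: the helper name `approval_rate_by_group` and the toy frame are hypothetical, and only the column names are taken from the simulation above.

```python
import pandas as pd

def approval_rate_by_group(data, group_col, target_col="Approved"):
    """Return the mean of the binary target within each level of group_col."""
    return data.groupby(group_col)[target_col].mean()

# Toy frame reusing the simulated column names; the values are made up.
toy = pd.DataFrame({
    "Gender":   [0, 0, 0, 0, 1, 1, 1, 1],
    "Approved": [0, 0, 0, 1, 1, 1, 0, 1],
})
rates = approval_rate_by_group(toy, "Gender")
print(rates)
```

On the simulated frame, the same helper could be called as `approval_rate_by_group(df, "Gender")` or `approval_rate_by_group(df, "Race")`.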

## Load and Prepare data

In [2]:
from piml import Experiment
exp = Experiment()

In [3]:
# Manually load data to piml
exp.data_loader(data=df)

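|      | Mortgage | Balance | AmountPastDue | CreditInquiry | Gender | Race | Approved |
|------|----------|---------|---------------|---------------|--------|------|----------|
| 0    | 1.0      | 3635.0  | 482.0         | 2.0           | 0.0    | 0.0  | 0.0      |
| 1    | 0.0      | 6265.0  | 0.0           | 1.0           | 1.0    | 0.0  | 0.0      |
| 2    | 1.0      | 4432.0  | 0.0           | 4.0           | 0.0    | 1.0  | 0.0      |
| 3    | 1.0      | 3092.0  | 0.0           | 1.0           | 1.0    | 1.0  | 0.0      |
| 4    | 1.0      | 8025.0  | 135.0         | 1.0           | 1.0    | 0.0  | 1.0      |
| ...  | ...      | ...     | ...           | ...           | ...    | ...  | ...      |
| 9995 | 1.0      | 4503.0  | 687.0         | 3.0           | 0.0    | 1.0  | 0.0      |
| 9996 | 1.0      | 5201.0  | 0.0           | 0.0           | 1.0    | 1.0  | 1.0      |
| 9997 | 0.0      | 5566.0  | 0.0           | 1.0           | 1.0    | 0.0  | 1.0      |
| 9998 | 1.0      | 3755.0  | 0.0           | 1.0           | 1.0    | 1.0  | 0.0      |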


In [4]:
# Exclude features one-by-one: "Mortgage", "Gender", "Race" (demographic variables); 
# Excluded features will show in grey color in the table.
exp.data_summary()

Data Shape: (10000, 7)

In [5]:
# Prepare dataset with Test Ratio = 0.33
exp.data_prepare()

In [6]:
# Exploratory Data Analysis
exp.eda()

## Train ML Model(s)

In [7]:
# Choose GLM default settings, click run; 
# When training is finished, register the model.
exp.model_train()

## Fairness Testing

**Suggested Procedure:**

1. First select a registered model (in this case, GLM)
2. Group Setting:
    - Set Add Category = "Gender", select "1.0" as reference, select "0.0" as protected, then "Add"
    - Set Add Category = "Race",  select "1.0" as reference, select "0.0" as protected, then "Add" 
3. Metrics Tab:
    - Select a metric (AIR, by default) and set the threshold (e.g. 0.8)
    - Set the favorable threshold (0.5, by default) and favorable class (1 or 0).
4. Segmented Metrics:
    - Select the segment feature and the metric, and set the metric threshold
    - If the segment feature is numerical, set the number of bins (5 by default)
5. Debiasing/unfairness mitigation by Threshold Adjustment:
    - Select a fairness metric (AIR by default) and a performance metric (ACC by default)
    - Set the favorable threshold and class
    - The number of threshold values is 20 (the low-code default)
    - Check the fairness and performance metrics for varying thresholds
6. Debiasing/unfairness mitigation by Feature Removal:
    - As with Threshold Adjustment, select the metrics and set the favorable threshold/class
    - Check the fairness and performance metrics upon removal of one feature at a time
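
To make the AIR metric in the procedure concrete, here is a minimal sketch of the adverse impact ratio as it is commonly defined: the favorable-outcome rate of the protected group divided by that of the reference group. This is an illustration under that assumption, not PiML's internal implementation. The function name `adverse_impact_ratio` and the example arrays are hypothetical.

```python
import numpy as np

def adverse_impact_ratio(y_pred, group, protected=0.0, reference=1.0, favorable_class=1):
    """Favorable-outcome rate of the protected group divided by that of the reference group."""
    y_pred = np.asarray(y_pred)
    group = np.asarray(group)
    favorable = (y_pred == favorable_class)
    rate_protected = favorable[group == protected].mean()
    rate_reference = favorable[group == reference].mean()
    return rate_protected / rate_reference

# Made-up predictions: 2 of 5 protected cases and 4 of 5 reference cases are favorable.
group  = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)
y_pred = np.array([1, 1, 0, 0, 0, 1, 1, 1, 1, 0])
air = adverse_impact_ratio(y_pred, group)
print(round(air, 3), "meets 0.8 threshold:", air >= 0.8)
```

The `protected=0.0` and `reference=1.0` defaults mirror the group setting suggested in step 2. A segmented version would apply the same function within each bin of the segment feature.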

In [8]:
exp.model_fairness()
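
The threshold-adjustment step can also be approximated outside the widget. The sketch below sweeps 20 favorable thresholds over predicted scores and records AIR and accuracy at each one. It is an illustration only: the function name `sweep_thresholds`, the 0.05 to 0.95 grid, and the synthetic scores are all assumptions, and PiML may choose its thresholds differently.

```python
import numpy as np

def sweep_thresholds(scores, y_true, group, protected=0.0, reference=1.0, n_thresholds=20):
    """For each threshold, return (threshold, AIR, accuracy) with class 1 as favorable."""
    scores, y_true, group = map(np.asarray, (scores, y_true, group))
    results = []
    for t in np.linspace(0.05, 0.95, n_thresholds):  # assumed grid
        y_pred = (scores >= t).astype(int)
        rate_p = y_pred[group == protected].mean()
        rate_r = y_pred[group == reference].mean()
        air = rate_p / rate_r if rate_r > 0 else np.nan
        acc = (y_pred == y_true).mean()
        results.append((t, air, acc))
    return results

# Synthetic scores (hypothetical): the reference group is shifted upward by 0.1.
rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, size=n).astype(float)
scores = np.clip(rng.normal(0.45 + 0.1 * group, 0.2, size=n), 0, 1)
y_true = (rng.random(n) < scores).astype(int)

results = sweep_thresholds(scores, y_true, group)
for t, air, acc in results[::5]:
    print(f"threshold={t:.2f}  AIR={air:.3f}  ACC={acc:.3f}")
```

Scanning the resulting triples for the most accurate threshold whose AIR is at least the chosen cutoff (e.g. 0.8) is one way to trade fairness against performance.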
