# Machine Learning Model Validation

June 21-23, 2023

This demo (based on the SimuCredit data, a binary classification task) covers:

- Fairness testing of credit scoring models

In [1]:
!pip install piml

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting piml
  Downloading PiML-0.5.0.post1-cp310-none-manylinux_2_17_x86_64.whl (11.2 MB)
Collecting lime>=0.2.0.1 (from piml)
  Downloading lime-0.2.0.1.tar.gz (275 kB)
  Preparing metadata (setup.py) ... done
Collecting shap>=0.39.0 (from piml)
  Downloading shap-0.41.0-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (572 kB)
Collecting pygam==0.8.0 (from piml)
  Downloading pygam-0.8.0-py2.py3-none-any.whl (1.8 MB)

## Load and Prepare data

Initialize a new experiment with `piml.Experiment()`.

In [2]:
from piml import Experiment
exp = Experiment()

Choose "SimuCredit" from the data dropdown.

In [3]:
exp.data_loader()

Exclude the demographic variables "Gender" and "Race" one by one, so that they are not used for modeling.

In [4]:
exp.data_summary()

Data Shape: (20000, 10)

Prepare the dataset with default settings.

In [5]:
exp.data_prepare()

## Train Interpretable Models

- Train the GLM and XGB2 models with default settings

- Register the fitted models

In [6]:
exp.model_train()

Manually train and register an XGBoost model with `max_depth=7`.

In [7]:
from xgboost import XGBClassifier

exp.model_train(XGBClassifier(max_depth=7), name='XGB7')

## Fairness Testing

Choose XGB2.

- Switch to the "Setting" tab, and configure the reference and protected groups.

    - Set **Add Category** = "Gender", select "1.0" as reference, select "0.0" as protected, then click "Add".
    - Set **Add Category** = "Race", select "1.0" as reference, select "0.0" as protected, then click "Add".


In [8]:
exp.model_fairness()

- Repeat the model selection and setting configuration.

- Switch to the "Metrics" tab:

    - Select a metric (AIR by default) and set the metric threshold (e.g., 0.8).
    - Set the favorable threshold (0.5 by default) and the favorable class (1 or 0).
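
As a rough illustration of what the AIR (adverse impact ratio) metric measures, the sketch below computes it with NumPy as the rate of favorable predictions in the protected group divided by the rate in the reference group. The function name `adverse_impact_ratio`, the toy data, and the `>=` comparison against the favorable threshold are assumptions made for illustration; this is not PiML's internal implementation.

```python
import numpy as np

def adverse_impact_ratio(scores, group, protected=0.0, reference=1.0,
                         favorable_threshold=0.5, favorable_class=1):
    """Ratio of favorable-outcome rates: protected group / reference group."""
    scores = np.asarray(scores, dtype=float)
    group = np.asarray(group)
    # Turn predicted probabilities into class labels (assumed: >= threshold means class 1).
    predicted = (scores >= favorable_threshold).astype(int)
    favorable = predicted == favorable_class
    rate_protected = favorable[group == protected].mean()
    rate_reference = favorable[group == reference].mean()
    return rate_protected / rate_reference

# Hypothetical toy data: 4 reference-group rows (1.0) and 4 protected-group rows (0.0).
scores = np.array([0.9, 0.7, 0.4, 0.8, 0.6, 0.3, 0.2, 0.55])
gender = np.array([1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0])

air = adverse_impact_ratio(scores, gender)
passes_threshold = air >= 0.8  # compare against the metric threshold chosen in the tab
print(f"AIR = {air:.3f}, passes 0.8 threshold: {passes_threshold}")
```

With these toy scores the favorable rate is 0.5 for the protected group and 0.75 for the reference group, so the ratio falls below the 0.8 metric threshold.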

In [9]:
exp.model_fairness()

- Repeat the model selection and setting configuration.

- Switch to the "Segmented Metrics" tab:

    - Select Balance as the segment feature and AIR as the metric, and set the metric threshold.
    - If the segment feature is numerical, set the number of bins (5 by default).

The results show that the higher the balance, the lower the AIR for both Gender and Race.
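
The idea behind a segmented metric can be sketched as follows: split the segment feature into bins and compute AIR separately within each bin. The helper `segmented_air`, the equal-width binning, and the toy data are assumptions for illustration only; the demo does not state how PiML forms the segments.

```python
import numpy as np

def segmented_air(favorable, group, feature, n_bins=5,
                  protected=0.0, reference=1.0):
    """AIR computed separately inside each equal-width bin of `feature`."""
    favorable = np.asarray(favorable, dtype=bool)
    group = np.asarray(group)
    feature = np.asarray(feature, dtype=float)
    edges = np.linspace(feature.min(), feature.max(), n_bins + 1)
    # Using only the interior edges gives bin indices 0 .. n_bins - 1.
    bin_index = np.digitize(feature, edges[1:-1])
    air_by_bin = []
    for b in range(n_bins):
        in_bin = bin_index == b
        prot = favorable[in_bin & (group == protected)]
        ref = favorable[in_bin & (group == reference)]
        if prot.size == 0 or ref.size == 0 or ref.mean() == 0:
            air_by_bin.append(float("nan"))  # AIR is undefined for this bin
        else:
            air_by_bin.append(float(prot.mean() / ref.mean()))
    return edges, air_by_bin

# Hypothetical toy data with two balance segments.
balance = np.array([100, 200, 300, 400, 600, 700, 800, 900], dtype=float)
gender = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0])
favorable = np.array([1, 1, 1, 1, 1, 0, 1, 1], dtype=bool)

edges, air_by_bin = segmented_air(favorable, gender, balance, n_bins=2)
print(edges, air_by_bin)
```

In this toy example the low-balance bin has an AIR of 1.0 and the high-balance bin an AIR of 0.5, which mimics the kind of pattern the tab is meant to reveal.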

In [10]:
exp.model_fairness()

- Repeat the model selection and setting configuration.

- Switch to the "Binning" tab:

   - Select a fairness metric (AIR by default) and a performance metric (F1).
   - Select an attribute (Balance), a binning method (Quantile by default), and the number of bins (5 by default).
   - Click the "ADD" button to apply the binning setting to the data.
   - Click the "CLEAR ALL" button to remove all binning records.
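
For readers unfamiliar with quantile binning, here is a minimal NumPy sketch of the general technique: the feature is cut at equally spaced quantiles so that each bin holds roughly the same number of rows. The helper `quantile_bin` and the toy balances are hypothetical, and how PiML uses the binned feature afterwards is not shown in this demo.

```python
import numpy as np

def quantile_bin(values, n_bins=5):
    """Map each value to a bin index 0 .. n_bins - 1 using quantile cut points."""
    values = np.asarray(values, dtype=float)
    # Interior quantile levels, e.g. 0.2, 0.4, 0.6, 0.8 for five bins.
    levels = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    edges = np.quantile(values, levels)
    return np.digitize(values, edges), edges

# Hypothetical balances: 100, 200, ..., 1000.
balance = np.arange(1, 11, dtype=float) * 100
bins, edges = quantile_bin(balance, n_bins=5)
print(bins)
print(np.bincount(bins))  # number of rows per bin
```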

In [11]:
exp.model_fairness()

- Repeat the model selection and setting configuration.

- Switch to the "Thresholding" tab:

   - Select a fairness metric (AIR by default) and a performance metric (ACC by default).
   - Set the favorable threshold and class.
   - The number of threshold values is 20 (the default in low-code mode).
   - Check the fairness and performance metrics for varying thresholds.

For this model, a threshold of 0.37 gives both good fairness and good performance.
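
The thresholding idea can be sketched as a simple sweep: for each candidate threshold, convert scores to predictions and record both AIR and accuracy, then look for a threshold where both are acceptable. The function `threshold_sweep`, the evenly spaced grid, and the toy data are assumptions for illustration; the demo only states that 20 threshold values are used.

```python
import numpy as np

def threshold_sweep(scores, y_true, group, n_thresholds=20,
                    protected=0.0, reference=1.0):
    """Return (threshold, AIR, accuracy) for each candidate threshold."""
    scores = np.asarray(scores, dtype=float)
    y_true = np.asarray(y_true).astype(int)
    group = np.asarray(group)
    # Evenly spaced thresholds strictly between 0 and 1 (assumed grid).
    thresholds = np.linspace(0.0, 1.0, n_thresholds + 2)[1:-1]
    rows = []
    for t in thresholds:
        pred = (scores >= t).astype(int)  # class 1 is treated as favorable
        accuracy = float((pred == y_true).mean())
        rate_ref = pred[group == reference].mean()
        rate_prot = pred[group == protected].mean()
        air = float(rate_prot / rate_ref) if rate_ref > 0 else float("nan")
        rows.append((float(t), air, accuracy))
    return rows

# Hypothetical toy data: ten scores, alternating group membership.
scores = np.linspace(0.05, 0.95, 10)
y_true = (scores > 0.5).astype(int)
gender = np.array([1.0, 0.0] * 5)

rows = threshold_sweep(scores, y_true, gender)
for t, air, acc in rows:
    print(f"threshold={t:.3f}  AIR={air:.3f}  ACC={acc:.3f}")
```

Scanning the printed rows for a threshold whose AIR is above the chosen metric threshold while accuracy stays high mirrors what the "Thresholding" tab plots.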

In [12]:
exp.model_fairness()

## Fairness Testing Comparison

Choose GLM, XGB2, and XGB7.

- Switch to the "Setting" tab, and configure the reference and protected groups.

    - Set **Add Category** = "Gender", select "1.0" as reference, select "0.0" as protected, then click "Add".
    - Set **Add Category** = "Race", select "1.0" as reference, select "0.0" as protected, then click "Add".

- Switch to the "Metrics" tab:

    - Select a metric (AIR by default) and set the metric threshold (e.g., 0.8).
    - Set the favorable threshold (0.5 by default) and the favorable class (1 or 0).

In [13]:
exp.model_fairness_compare()

- Repeat the model selection and setting configuration.

- Switch to the "Segment" tab:
    - Select the segment feature and the metric, and set the metric threshold.
    - If the segment feature is numerical, set the number of bins (5 by default).

In [14]:
exp.model_fairness_compare()
