# Machine Learning Model Validation

June 21-23, 2023

This demo (based on BikeSharing data, a regression task) covers:

- Resilience test

# Install PiML Toolbox

- Run `!pip install piml` to install the latest version of PiML.
- In Google Colab, restart the runtime after installation so that the newly installed version is used.
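After the install cell below has run and the runtime has been restarted, one way to confirm which version is active is to query the package metadata. This is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and not part of PiML.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints None if piml is not installed in the current environment.
print("piml version:", installed_version("piml"))
```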

In [1]:
!pip install piml

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting piml
  Downloading PiML-0.5.0.post1-cp310-none-manylinux_2_17_x86_64.whl (11.2 MB)
Collecting lime>=0.2.0.1 (from piml)
  Downloading lime-0.2.0.1.tar.gz (275 kB)
  Preparing metadata (setup.py) ... done
Collecting shap>=0.39.0 (from piml)
  Downloading shap-0.41.0-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (572 kB)
Collecting pygam==0.8.0 (from piml)
  Downloading pygam-0.8.0-py2.py3-none-any.whl (1.8 MB)

## Load and Prepare Data

Initialize a new experiment with `piml.Experiment()`.

In [2]:
from piml import Experiment
exp = Experiment()

Choose BikeSharing from the data dropdown.

In [3]:
exp.data_loader()

VBox(children=(Dropdown(layout=Layout(width='20%'), options=('Select Data', 'CoCircles', 'Friedman', 'BikeShar…

Exclude these features one by one: "yr", "mnth", and "temp" (each is highly correlated with other features).

In [4]:
exp.data_summary()

VBox(children=(HTML(value='Data Shape:(17379, 13)'), Tab(children=(Output(), Output()), _dom_classes=('data-su…

Prepare the dataset with the default settings.

In [5]:
exp.data_prepare()

VBox(children=(HBox(children=(VBox(children=(HTML(value='<p>Target Variable:</p>'), HTML(value='<p>Split Metho…

## Train Interpretable Models

- Train GLM, GAM, and XGB2 models with default settings

- Register the fitted models

In [6]:
exp.model_train()

VBox(children=(Box(children=(Box(children=(HTML(value="<h4 style='margin: 10px 0px;'>Choose Model</h4>"), Box(…

## Resilience Test

Choose XGB2.

- Switch to the "Resilience" tab; see the [resilience guide](https://selfexplainml.github.io/PiML-Toolbox/_build/html/guides/testing/resilience.html) for details.

- Try the following options:

    - **Method**: choose the method for ranking the worst samples. Available options are "worst-sample", "outer-sample", "worst-cluster", and "hard-sample".

    - **Immutable Feature**: specify an immutable feature; the worst samples are then selected separately within each bin of that feature.

    - **Worst Ratio**: choose the fraction of worst samples, from 0.1, 0.2, ..., 0.9.

    - **Metric**: choose the performance metric: MSE, MAE, or R2 for regression tasks.

    - **Plot Feature**: choose the feature of interest, to display how its distribution differs between the worst samples and the full test sample.

    - **PSI Buckets**: choose the bucketing method used when calculating the "PSI" distance metric. Available options are "uniform" and "quantile".

    - **Distance Metric**: choose the distance metric. Available options are "PSI", "WD1", and "KS".

    - **Shown in original scale**: check this box to display features in their original scale instead of min-max scaled to [0, 1].

- The displayed results include:

    - Marginal density / histogram plot showing the difference between the full sample and the worst samples.

    - Distribution distance per feature between the full sample and the worst samples.

    - Resilience performance against different ratios of worst samples.
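The "worst-sample" idea and the immutable-feature option described above can be sketched in plain NumPy. This is an illustration under stated assumptions, not PiML's implementation: the function names `worst_sample_indices` and `resilience_curve`, the ranking by absolute residual, the equal-width binning of the immutable feature, and the synthetic data are all hypothetical choices made for this example.

```python
import numpy as np


def worst_sample_indices(abs_residuals, ratio, immutable=None, n_bins=5):
    """Indices of the `ratio` fraction of samples with the largest absolute residuals.

    If `immutable` is given, samples are ranked separately within equal-width
    bins of that feature (an assumed binning scheme for this sketch).
    """
    abs_residuals = np.asarray(abs_residuals, dtype=float)
    if immutable is None:
        groups = [np.arange(abs_residuals.size)]
    else:
        immutable = np.asarray(immutable, dtype=float)
        edges = np.linspace(immutable.min(), immutable.max(), n_bins + 1)
        # Digitizing against the interior edges yields bin ids 0..n_bins-1.
        bin_ids = np.digitize(immutable, edges[1:-1])
        groups = [np.flatnonzero(bin_ids == b) for b in range(n_bins)]
    picked = []
    for idx in groups:
        k = int(round(ratio * idx.size))
        # Sort this group by residual, largest first, and keep the top k.
        order = idx[np.argsort(-abs_residuals[idx], kind="stable")]
        picked.append(order[:k])
    return np.sort(np.concatenate(picked))


def resilience_curve(y_true, y_pred, ratios):
    """MSE evaluated on the worst `ratio` fraction of samples, for each ratio."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    abs_res = np.abs(y_true - y_pred)
    curve = {}
    for r in ratios:
        idx = worst_sample_indices(abs_res, r)
        curve[r] = float(np.mean((y_true[idx] - y_pred[idx]) ** 2))
    return curve


# Synthetic stand-in for test-set targets and model predictions.
rng = np.random.default_rng(0)
y_true = rng.normal(size=1000)
y_pred = y_true + rng.normal(scale=0.5, size=1000)
immutable_feature = rng.uniform(0.0, 1.0, size=1000)

curve = resilience_curve(y_true, y_pred, ratios=[0.1, 0.5, 0.9])
full_mse = float(np.mean((y_true - y_pred) ** 2))
binned_idx = worst_sample_indices(
    np.abs(y_true - y_pred), 0.2, immutable=immutable_feature
)

print("MSE on worst samples by ratio:", curve)
print("MSE on full sample:", full_mse)
print("Worst samples picked within bins:", binned_idx.size)
```

Because the samples are ranked by residual, the error on the worst subset can only shrink toward the full-sample error as the ratio grows, which is the shape the resilience performance plot is expected to show.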

In [7]:
exp.model_diagnose()

VBox(children=(Dropdown(layout=Layout(width='20%'), options=('Select Model', 'GLM', 'GAM', 'XGB2'), style=Desc…
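The distance metrics offered in the Resilience tab compare a feature's distribution in the worst samples with its distribution in the full sample. The sketch below assumes the usual readings of the abbreviations (PSI as the population stability index, WD1 as the 1-Wasserstein distance, KS as the Kolmogorov-Smirnov statistic) and implements them in NumPy. The function names, the default of 10 buckets, the `eps` clipping, and the synthetic data are assumptions for illustration; PiML's own computation may differ in these details.

```python
import numpy as np


def psi(full, worst, n_buckets=10, bucket="uniform", eps=1e-6):
    """Population stability index between two samples of one feature."""
    full = np.asarray(full, dtype=float)
    worst = np.asarray(worst, dtype=float)
    if bucket == "uniform":
        # Equal-width buckets over the range of the full sample.
        edges = np.linspace(full.min(), full.max(), n_buckets + 1)
    elif bucket == "quantile":
        # Equal-frequency buckets based on the full sample.
        edges = np.quantile(full, np.linspace(0.0, 1.0, n_buckets + 1))
    else:
        raise ValueError("bucket must be 'uniform' or 'quantile'")
    inner = edges[1:-1]
    p = np.bincount(np.digitize(full, inner), minlength=n_buckets) / full.size
    q = np.bincount(np.digitize(worst, inner), minlength=n_buckets) / worst.size
    # Clip empty buckets so the logarithm stays finite.
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    return float(np.sum((q - p) * np.log(q / p)))


def _ecdf(sample, points):
    """Empirical CDF of the sorted array `sample`, evaluated at `points`."""
    return np.searchsorted(sample, points, side="right") / sample.size


def wd1(a, b):
    """1-Wasserstein distance: area between the two empirical CDFs."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.sort(np.concatenate([a, b]))
    widths = np.diff(grid)
    return float(np.sum(np.abs(_ecdf(a, grid[:-1]) - _ecdf(b, grid[:-1])) * widths))


def ks(a, b):
    """Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])
    return float(np.max(np.abs(_ecdf(a, grid) - _ecdf(b, grid))))


# Synthetic feature scaled to [0, 1]; the "worst" subset is shifted to high values.
rng = np.random.default_rng(0)
full = rng.uniform(0.0, 1.0, size=2000)
worst = full[full > 0.7]

print("PSI (uniform): ", psi(full, worst, bucket="uniform"))
print("PSI (quantile):", psi(full, worst, bucket="quantile"))
print("WD1:", wd1(full, worst))
print("KS: ", ks(full, worst))
```

A feature with a large distance is one whose values in the worst samples look unlike the full sample, which makes it a natural candidate for the **Plot Feature** option.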

## Model Comparison

Choose GLM, GAM, and XGB2 (the models registered above).

- Switch to the "Resilience" tab.

- Customize the settings and get the results.

In [8]:
exp.model_compare()

VBox(children=(HBox(children=(Dropdown(layout=Layout(width='30%'), options=('Select Model', 'GLM', 'GAM', 'XGB…