# Machine Learning Model Validation

June 21-23, 2023

This demo (based on BikeSharing data, a regression task) covers:

- Global explanation: PFI, PDP, ALE

- Local explanation: LIME and SHAP

## Install PiML Toolbox

- Run `!pip install piml` to install the latest version of PiML.
- In Google Colab, restart the runtime after installation so that the newly installed version is used.
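Once the install cell has run and the runtime has been restarted, it can help to confirm which version is actually active. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of PiML.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "piml" is the distribution name used in the pip command of this demo.
print("piml version:", installed_version("piml"))
```

If this prints `None`, the package is not visible to the current runtime and the install step needs to be repeated.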

In [1]:
!pip install piml

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting piml
  Downloading PiML-0.5.0.post1-cp310-none-manylinux_2_17_x86_64.whl (11.2 MB)
Collecting lime>=0.2.0.1 (from piml)
  Downloading lime-0.2.0.1.tar.gz (275 kB)
  Preparing metadata (setup.py) ... done
Collecting shap>=0.39.0 (from piml)
  Downloading shap-0.41.0-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (572 kB)
Collecting pygam==0.8.0 (from piml)
  Downloading pygam-0.8.0-py2.py3-none-any.whl (1.8 MB)

## Load and Prepare Data

Initialize a new experiment with `piml.Experiment()`.

In [2]:
from piml import Experiment
exp = Experiment()

Choose BikeSharing from the data dropdown.

In [3]:
exp.data_loader()


Exclude the following features one at a time, since each is highly correlated with other features: "yr", "mnth", "temp".

In [5]:
exp.data_summary()

Data Shape: (17379, 13)
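The exclusion step above rests on some features being highly correlated with others. As a rough illustration of how such pairs could be flagged outside the widget, the sketch below computes pairwise Pearson correlations with NumPy. The data is synthetic (not BikeSharing), and the function name and the 0.8 threshold are assumptions made for the example.

```python
import numpy as np


def highly_correlated_pairs(X, names, threshold=0.8):
    """Return (name_a, name_b, r) for column pairs with |Pearson r| >= threshold."""
    corr = np.corrcoef(X, rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs


# Synthetic stand-in data: column "b" is a noisy copy of column "a".
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = a + 0.1 * rng.normal(size=500)
c = rng.normal(size=500)
X = np.column_stack([a, b, c])

for name_a, name_b, r in highly_correlated_pairs(X, ["a", "b", "c"]):
    print(f"{name_a} vs {name_b}: r = {r:.2f}")
```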

Prepare the dataset with the default settings.

In [6]:
exp.data_prepare()


## Train Interpretable Models

- Choose XGB2 and click the "RUN" button.
- Once training has finished, choose XGB2 in the second dropdown and click the "Register" button.

In [7]:
exp.model_train()


## Global Explanation

Choose XGB2.

- Switch to the "Global-Explainability" tab.

- Available options for PDP and ALE:

    - **Univariate Feature 1**: choose the feature of interest to display the one-way PDP or ALE.

    - **Bivariate Feature 1/2**: choose two features of interest to display the two-way PDP or ALE.

    - **Shown in original scale**: enable the check box to display the features in their original scale instead of the min-max scaled values between 0 and 1.

- The displayed results include:

    - **Permutation Feature Importance** ([PFI](https://selfexplainml.github.io/PiML-Toolbox/_build/html/guides/explain/pfi.html)): measures each feature's influence on the model as the increase in loss when a feature set, typically a single feature, is randomly permuted.

    - **Partial Dependence Plot** ([PDP](https://selfexplainml.github.io/PiML-Toolbox/_build/html/guides/explain/pdp.html)): visualizes the relationship between a subset of features and the predicted response.

    - **Accumulated Local Effects** ([ALE](https://selfexplainml.github.io/PiML-Toolbox/_build/html/guides/explain/ale.html)): similar to PDP, but offers a faster and unbiased alternative when features are correlated.
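To make the PFI and PDP definitions above concrete, here is a minimal NumPy sketch of both ideas. It is not PiML's implementation: the function names, the mean squared error loss, the number of repeats, and the toy model `toy_predict` are all assumptions chosen for illustration.

```python
import numpy as np


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in MSE when each column is shuffled, one column at a time."""
    rng = np.random.default_rng(seed)
    base_loss = np.mean((y - predict(X)) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(np.mean((y - predict(X_perm)) ** 2) - base_loss)
        importances[j] = np.mean(increases)
    return importances


def partial_dependence(predict, X, feature, grid):
    """Average prediction when column `feature` is forced to each grid value."""
    pd_values = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value
        pd_values.append(predict(X_mod).mean())
    return np.array(pd_values)


# Toy stand-in for a fitted model: depends on x0 and x1, ignores x2.
def toy_predict(X):
    return 3.0 * X[:, 0] + np.sin(2 * np.pi * X[:, 1])


rng = np.random.default_rng(0)
X = rng.uniform(size=(400, 3))
y = toy_predict(X)

pfi = permutation_importance(toy_predict, X, y)
grid = np.linspace(0.0, 1.0, 5)
pdp_x0 = partial_dependence(toy_predict, X, feature=0, grid=grid)
print("PFI:", np.round(pfi, 3))
print("PDP for x0:", np.round(pdp_x0, 3))
```

In this toy setup the ignored feature receives zero importance, and the PDP for `x0` is a straight line whose slope matches the model's coefficient on `x0`.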



In [8]:
exp.model_explain()

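ALE differs from PDP in that it only moves each sample within its own neighborhood of the feature, which is why it behaves better under correlated features. The sketch below shows one common way to compute a first-order ALE curve with quantile bins. It is an illustration under stated assumptions, not PiML's code: the function name `ale_1d`, the binning scheme, and the toy model are hypothetical, and the feature is assumed to take at least two distinct values.

```python
import numpy as np


def ale_1d(predict, X, feature, n_bins=10):
    """First-order ALE of one feature on quantile bins; returns (edges, ALE at edges)."""
    x = X[:, feature]
    edges = np.unique(np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)))
    n_intervals = len(edges) - 1
    # Bin index in 0..n_intervals-1 for every sample.
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_intervals - 1)
    local_effects = np.zeros(n_intervals)
    counts = np.zeros(n_intervals)
    for k in range(n_intervals):
        in_bin = idx == k
        if not in_bin.any():
            continue
        # Move only the samples in this bin to the bin's lower and upper edge.
        X_lo, X_hi = X[in_bin].copy(), X[in_bin].copy()
        X_lo[:, feature] = edges[k]
        X_hi[:, feature] = edges[k + 1]
        local_effects[k] = np.mean(predict(X_hi) - predict(X_lo))
        counts[k] = in_bin.sum()
    ale = np.concatenate([[0.0], np.cumsum(local_effects)])
    # Center so that the count-weighted average effect is zero.
    bin_mid = 0.5 * (ale[:-1] + ale[1:])
    ale -= np.sum(bin_mid * counts) / counts.sum()
    return edges, ale


# Toy model with two correlated inputs (synthetic data, not BikeSharing).
def toy_predict(X):
    return 2.0 * X[:, 0] + X[:, 1] ** 2


rng = np.random.default_rng(0)
x0 = rng.uniform(size=400)
x1 = x0 + 0.05 * rng.normal(size=400)
X = np.column_stack([x0, x1])

edges, ale = ale_1d(toy_predict, X, feature=0, n_bins=8)
slopes = np.diff(ale) / np.diff(edges)
print("ALE slopes for x0:", np.round(slopes, 3))
```

Even though `x1` is strongly correlated with `x0`, the recovered slope for `x0` reflects only its own term in the toy model.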

## Local Explanation

Choose XGB2.

- Switch to the "Local-Explainability" tab.

- Available options:

    - **Sample Index**: choose the sample index (in the training set) to be explained.

    - **Centered**: whether to display the LIME results by subtracting the mean for each feature.

    - **Shown in original scale**: enable the check box to display the features in their original scale instead of the min-max scaled values between 0 and 1.

- The displayed results include:

    - **Local Interpretable Model-Agnostic Explanation** ([LIME](https://selfexplainml.github.io/PiML-Toolbox/_build/html/guides/explain/lime.html)): fits an interpretable surrogate model, such as a Lasso, locally around a given input sample to explain how the original model makes its prediction for that sample.

    - **SHapley Additive exPlanations** ([SHAP](https://selfexplainml.github.io/PiML-Toolbox/_build/html/guides/explain/shap.html)): explains the output of any model by computing the contribution of each feature to the final prediction.
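The local-surrogate idea behind LIME can be sketched in a few lines. The example below perturbs one sample, weights the perturbations by proximity, and fits a weighted linear model. Note the substitution: it uses plain weighted least squares rather than a Lasso, and the Gaussian sampling scheme, kernel, and all parameter names are assumptions for illustration rather than the behavior of the `lime` package or PiML.

```python
import numpy as np


def lime_explain(predict, x, n_samples=2000, scale=0.1, kernel_width=0.75, seed=0):
    """Fit a proximity-weighted linear surrogate around x; returns (intercept, coefs)."""
    rng = np.random.default_rng(seed)
    # Perturb the sample with Gaussian noise (hypothetical sampling scheme).
    Z = x + scale * rng.normal(size=(n_samples, x.size))
    # Exponential kernel on the scaled distance: nearby points get more weight.
    dist = np.linalg.norm((Z - x) / scale, axis=1)
    weights = np.exp(-(dist ** 2) / (kernel_width ** 2 * x.size))
    # Weighted least squares via square-root weighting of the design matrix.
    A = np.column_stack([np.ones(n_samples), Z - x])
    sw = np.sqrt(weights)
    beta, *_ = np.linalg.lstsq(A * sw[:, None], predict(Z) * sw, rcond=None)
    return beta[0], beta[1:]


# Toy stand-in for a fitted model: linear, so the surrogate should recover it.
def toy_predict(X):
    return 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]


x = np.array([0.3, 0.7])
intercept, coefs = lime_explain(toy_predict, x)
print("local intercept:", round(float(intercept), 3))
print("local coefficients:", np.round(coefs, 3))
```

Because the design matrix is centered on `x`, the intercept approximates the model's prediction at `x` and the coefficients approximate the local slope of each feature.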

In [9]:
exp.model_explain()

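The feature contributions reported by SHAP are Shapley values. For a handful of features they can be computed exactly by enumerating every feature coalition, which the sketch below does. This brute-force version is exponential in the number of features and is not how the `shap` package computes its results; the function name, the use of a background sample to stand in for absent features, and the toy model are assumptions for illustration.

```python
import itertools
import math

import numpy as np


def shapley_values(predict, x, background):
    """Exact Shapley values of predict at x; absent features take background values."""
    d = x.size

    def value(subset):
        # Average prediction with the features in `subset` fixed to their values in x.
        X_mix = background.copy()
        cols = list(subset)
        if cols:
            X_mix[:, cols] = x[cols]
        return predict(X_mix).mean()

    phi = np.zeros(d)
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for size in range(d):
            weight = (math.factorial(size) * math.factorial(d - size - 1)
                      / math.factorial(d))
            for subset in itertools.combinations(others, size):
                phi[j] += weight * (value(subset + (j,)) - value(subset))
    return phi


# Toy stand-in for a fitted model: depends on x0 and x1, ignores x2.
def toy_predict(X):
    return 3.0 * X[:, 0] + np.sin(2 * np.pi * X[:, 1])


rng = np.random.default_rng(0)
background = rng.uniform(size=(100, 3))
x = np.array([0.9, 0.25, 0.5])

phi = shapley_values(toy_predict, x, background)
gap = toy_predict(x[None, :])[0] - toy_predict(background).mean()
print("Shapley values:", np.round(phi, 3))
print("sum of values:", round(float(phi.sum()), 3), "prediction gap:", round(float(gap), 3))
```

The contributions sum to the difference between the prediction at `x` and the average prediction over the background sample, which is the additivity property that a SHAP waterfall-style display relies on.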