# Machine Learning Model Validation

June 21-23, 2023

This demo (based on TaiwanCredit data, a classification task) covers:

- ReLU deep feedforward neural networks (ReLU-DNN) and their interpretation by local linear models.

- A case study based on the TaiwanCredit dataset.

## Install PiML Toolbox

- Run `!pip install piml` to install the latest version of PiML.
- In Google Colab, restart the runtime after installation so that the newly installed version is used.

In [1]:
!pip install piml

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


## Load and Prepare Data

Initialize a new experiment with `piml.Experiment()`.

In [2]:
from piml import Experiment
exp = Experiment()

Choose TaiwanCredit

In [3]:
exp.data_loader()

- Use only the payment history attributes: PAY_1 to PAY_6, BILL_AMT1 to BILL_AMT6, and PAY_AMT1 to PAY_AMT6.

- Keep the response `FlagDefault`, while excluding all other variables, i.e., LIMIT_BAL, AGE, SEX, EDUCATION, and MARRIAGE
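The selection above can be written out as plain Python lists, which is handy for checking that nothing was missed. This is only a sketch: the column names (in particular the `PAY_1` to `PAY_6` spelling) are assumed from the description above, and the variable names are illustrative rather than part of PiML.

```python
# Column names are assumed from the TaiwanCredit description above.
payment_history = (
    [f"PAY_{i}" for i in range(1, 7)]         # repayment status
    + [f"BILL_AMT{i}" for i in range(1, 7)]   # bill statement amounts
    + [f"PAY_AMT{i}" for i in range(1, 7)]    # previous payment amounts
)

# Variables to exclude from modeling, as listed above.
excluded = ["LIMIT_BAL", "AGE", "SEX", "EDUCATION", "MARRIAGE"]

# The response variable that is kept as the target.
response = "FlagDefault"

all_columns = excluded + payment_history + [response]
print(f"{len(payment_history)} features kept, {len(excluded)} excluded, "
      f"{len(all_columns)} columns in total")
```

The total of 24 columns is consistent with the data shape reported by the data summary.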

In [4]:
exp.data_summary()

Data Shape: (30000, 24)

Prepare the dataset with the default settings.

In [5]:
exp.data_prepare()

## Train Interpretable Models

- Train a ReLU-DNN with default settings.
- Train another ReLU-DNN and customize it with:

    - **Model name**: Sparse-ReLU-DNN
    - **L1_regularization**: 0.0008

- Register ReLU-DNN and Sparse-ReLU-DNN.
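To see what the L1 setting does conceptually, the sketch below adds an L1 penalty on the weight matrices to a data loss. It illustrates the general technique only: how PiML scales the penalty or which parameters it applies it to is not stated here, so the function names and the toy weights are assumptions. Only the value 0.0008 comes from the settings above.

```python
import numpy as np

def l1_penalty(weights, l1_reg):
    """Sum of absolute weight values, scaled by the regularization strength."""
    return l1_reg * sum(np.abs(W).sum() for W in weights)

def penalized_loss(data_loss, weights, l1_reg=0.0008):
    """Data loss (e.g. cross-entropy) plus the L1 penalty on the weights."""
    return data_loss + l1_penalty(weights, l1_reg)

# Toy weight matrices for a hypothetical two-layer network.
weights = [
    np.array([[1.0, -2.0], [0.0, 0.5]]),
    np.array([[3.0], [-1.0]]),
]

# 0.45 stands in for a data loss value; it is not a result from this demo.
loss = penalized_loss(0.45, weights)
print(loss)
```

Because the penalty grows with every nonzero weight, minimizing the penalized loss pushes small weights toward zero, which is why the regularized model is named Sparse-ReLU-DNN.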

In [6]:
exp.model_train()

## Global Model Interpretation

Choose ReLU-DNN or Sparse-ReLU-DNN; see the details [here](https://selfexplainml.github.io/PiML-Toolbox/_build/html/guides/models/reludnn.html).

- Switch to the "Global-Interpretability" tab.

- Try the following options to view the different aspects of the model:

    - **Feature**: the univariate feature for the local linear profile plot.

    - **Feature 1/2**: the bivariate features for the pairwise local linear model plot.

    - **Shown in original scale**: enable the check box to display the features in their original scale instead of the min-max scaled range from 0 to 1.

- The displayed results include:

    - **Feature Importance**: displays the top-10 features' importance.

    - **Parallel Coordinate Plot**: visualizes coefficients of different local linear models (LLMs), where each line represents a single LLM.

    - **Local Linear Profiles**:

        - 1D: displays the marginal linear functions upon centering.

        - 2D: reveals how the LLM coefficient of one feature would change as another feature value changes.
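The plots above rest on the fact that a ReLU network is piecewise linear: once the on/off pattern of the hidden units is fixed for a given input, the network reduces to a single linear function, which is the LLM for that region. The NumPy sketch below illustrates this with a small, randomly initialized network. It is not PiML's implementation, and the function names and layer sizes are made up for illustration.

```python
import numpy as np

def relu_net_forward(x, weights, biases):
    """Forward pass of a ReLU network; returns the output logit."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ W + b, 0.0)
    return h @ weights[-1] + biases[-1]

def local_linear_model(x, weights, biases):
    """Coefficients and intercept of the linear function the net equals at x."""
    coef = np.eye(x.shape[0])          # invariant: h == x @ coef + intercept
    intercept = np.zeros(x.shape[0])
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        z = h @ W + b
        active = (z > 0).astype(float)   # activation pattern of this layer
        coef = (coef @ W) * active       # inactive units contribute nothing
        intercept = (intercept @ W + b) * active
        h = z * active
    return coef @ weights[-1], intercept @ weights[-1] + biases[-1]

# Hypothetical network: 4 inputs, two hidden layers of 8 units, 1 output.
rng = np.random.default_rng(0)
sizes = [4, 8, 8, 1]
weights = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=n) for n in sizes[1:]]

x = rng.uniform(size=4)                  # one sample, features scaled to [0, 1]
coef, intercept = local_linear_model(x, weights, biases)

# The LLM reproduces the network output exactly at x.
print(np.allclose(x @ coef + intercept, relu_net_forward(x, weights, biases)))
```

For a classifier, the linear function describes the logit; a sigmoid would map it to a probability. Each line in a parallel coordinate plot then corresponds to one such `coef` vector.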

In [7]:
exp.model_interpret()

## Local Model Interpretation

Choose ReLU-DNN or Sparse-ReLU-DNN.

- Switch to the "Local-Interpretability" tab.

- Try the following options to view the different aspects of the model:

    - **Sample Index**: choose the sample index (in the training set) to be interpreted.

    - **Centered**: whether to display the results by subtracting the mean for each feature.

    - **Shown in original scale**: enable the check box to display the features in their original scale instead of the min-max scaled range from 0 to 1.

- The displayed results include:

    - **Local Exact Interpretability**: displays the contributions of the top-10 features for the sample being interpreted; see the details [here](https://selfexplainml.github.io/PiML-Toolbox/_build/html/guides/models/reludnn.html#local-interpretation).
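One way to read the options above: given the LLM that applies to a sample, each feature's contribution is its coefficient times its value, and the features are ranked by absolute contribution. The sketch below assumes that "Centered" means subtracting each feature's mean before multiplying, which is an interpretation of the option rather than a confirmed description of PiML's behavior. The coefficients, means, and sample values are invented for illustration.

```python
import numpy as np

def local_contributions(x, coef, feature_means=None):
    """Per-feature contribution coef_j * x_j, optionally centered on the means."""
    ref = np.zeros_like(x) if feature_means is None else feature_means
    return coef * (x - ref)

def top_k(names, contributions, k=10):
    """Features ranked by absolute contribution, largest first."""
    order = np.argsort(-np.abs(contributions))[:k]
    return [(names[i], float(contributions[i])) for i in order]

# Invented LLM and sample (features assumed to be scaled to [0, 1]).
names = ["PAY_1", "PAY_2", "BILL_AMT1", "PAY_AMT1"]
x = np.array([1.0, 0.5, 0.2, 0.1])
coef = np.array([2.0, -1.0, 0.5, -3.0])
intercept = -0.4
feature_means = np.array([0.5, 0.5, 0.1, 0.2])

raw = local_contributions(x, coef)
centered = local_contributions(x, coef, feature_means)

# Uncentered contributions plus the intercept recover the LLM output (logit).
logit = raw.sum() + intercept

for name, value in top_k(names, raw, k=3):
    print(f"{name}: {value:+.3f}")
```

Centering changes the reference point of the explanation (deviation from an average sample instead of from zero) but not the coefficients themselves.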

In [9]:
exp.model_interpret()

## LLM Summary Table

Choose ReLU-DNN or Sparse-ReLU-DNN.

- Switch to the "LLM-Summary" tab.

- The displayed results include:

    - **LLM Summary Table**: displays the summary statistics for each LLM; see the details [here](https://selfexplainml.github.io/PiML-Toolbox/_build/html/guides/models/reludnn.html#llm-summary-table).

    - **Violin Plot of LLM Coefficients**: shows the LLM coefficient distribution per feature, weighted by the sample size of each LLM; see the details [here](https://selfexplainml.github.io/PiML-Toolbox/_build/html/guides/models/reludnn.html#llm-summary-table).
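The idea of a per-LLM sample size can be sketched by grouping samples by their activation pattern: samples that switch on the same hidden units share the same LLM. The example below uses a small random network and random data, so the counts are illustrative only. The summary table described above reports more statistics than this sketch computes, and the helper name is hypothetical.

```python
from collections import Counter

import numpy as np

def activation_patterns(X, weights, biases):
    """Boolean on/off state of every hidden unit for every row of X."""
    patterns = []
    h = X
    for W, b in zip(weights[:-1], biases[:-1]):
        z = h @ W + b
        patterns.append(z > 0)
        h = np.maximum(z, 0.0)
    return np.hstack(patterns)

# Hypothetical network: 3 inputs, two hidden layers of 4 units, 1 output.
rng = np.random.default_rng(0)
sizes = [3, 4, 4, 1]
weights = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=n) for n in sizes[1:]]
X = rng.uniform(size=(200, 3))           # stand-in for scaled training data

patterns = activation_patterns(X, weights, biases)

# Encode each pattern as a string such as "10110010" and count its samples.
keys = ["".join("1" if on else "0" for on in row) for row in patterns]
counts = Counter(keys)

# One row per LLM, largest region first.
for pattern, n_samples in counts.most_common(5):
    print(pattern, n_samples)
```

These counts are the kind of weights a sample-size-weighted coefficient distribution would use: an LLM that covers many samples influences the picture more than one that covers only a few.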

In [8]:
exp.model_interpret()
