# PiML Toolbox for Model Development and Validation: Low-code Demo

PiML (Python Interpretable Machine Learning) is a new Python toolbox for IML model development and validation. Through low-code automation and high-code programming, PiML supports various machine learning models in the following two categories:

- **Inherently interpretable models**: 
  1. EBM: Explainable Boosting Machine (Nori, et al. 2019; Lou, et al. 2013)
  2. GAMI-Net: Generalized Additive Model with Structured Interactions (Yang, Zhang and Sudjianto, 2021)
  3. ReLU-DNN: Deep ReLU Networks using Aletheia Unwrapper (Sudjianto, et al. 2020)

- **Arbitrary black-box models**, e.g.
  1. LightGBM or XGBoost of varying depth
  2. RandomForest of varying depth
  3. DNNs with softmax/tanh activations

This example notebook demonstrates how to use PiML in its low-code mode to develop, interpret, and test the models listed above. The toolbox includes the following built-in datasets for demo purposes.

- **CoCircles** classification data: simulated by `sklearn.datasets.make_circles(n_samples=2000, noise=0.1)`; see [details](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_circles.html)
- **Friedman** regression data: simulated by `sklearn.datasets.make_friedman1(n_samples=2000, n_features=10, noise=0.1)`; see [details](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_friedman1.html)
- **BikeSharing** regression data from the UCI repository: consisting of 17,389 samples of hourly rental bike counts in the Capital Bikeshare system; see [details](https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset)
- **CaliforniaHousing** regression data: consisting of 20,640 samples and 9 columns (8 features plus the target), fetched by `sklearn.datasets.fetch_california_housing`; see [details](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_california_housing.html)
- **TaiwanCredit** classification data from the UCI repository: consisting of 30,000 credit card clients in Taiwan from April 2005 to September 2005; see [details](https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients)
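To make the Friedman entry concrete, here is a dependency-free sketch of the Friedman #1 benchmark function that `make_friedman1` is documented to generate: features are drawn uniformly from [0, 1], and only the first five enter the target. The function name `friedman1_sketch` and its `seed` parameter are illustrative assumptions, and the values will not match scikit-learn's output because the random streams differ.

```python
import math
import random


def friedman1_sketch(n_samples=2000, n_features=10, noise=0.1, seed=0):
    """Generate Friedman #1 style regression data using only the standard library.

    Illustrative stand-in for sklearn.datasets.make_friedman1, not a replacement.
    """
    if n_features < 5:
        raise ValueError("Friedman #1 needs at least 5 features")
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        # All features are uniform on [0, 1); features beyond the fifth are pure noise inputs.
        row = [rng.random() for _ in range(n_features)]
        target = (
            10 * math.sin(math.pi * row[0] * row[1])
            + 20 * (row[2] - 0.5) ** 2
            + 10 * row[3]
            + 5 * row[4]
            + noise * rng.gauss(0, 1)  # additive Gaussian noise scaled by `noise`
        )
        X.append(row)
        y.append(target)
    return X, y


X, y = friedman1_sketch()
print(len(X), len(X[0]), len(y))
```

Because five of the ten features are irrelevant to the target, this dataset is a convenient check of whether an interpretable model assigns them negligible importance.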


# Stage 0: Installing PiML package on Google Colab

1. Visit [https://github.com/SelfExplainML/PiML-Toolbox/releases/](https://github.com/SelfExplainML/PiML-Toolbox/releases/) and copy the address of the latest PiML wheel file;
2. Run the following script to download and install PiML v0.1.0;
3. In Colab, you may need to restart the runtime in order to use the newly installed PiML version.

In [1]:
!pip install wget
import wget
url = "https://github.com/SelfExplainML/PiML-Toolbox/releases/download/V0.1.0/PiML-0.1.0-cp37-cp37m-linux_x86_64.whl"
wget.download(url, 'PiML-0.1.0-cp37-cp37m-linux_x86_64.whl')
!pip install PiML-0.1.0-cp37-cp37m-linux_x86_64.whl

Collecting wget
  Downloading wget-3.2.zip (10 kB)
Building wheels for collected packages: wget
  Building wheel for wget (setup.py) ... done
  Created wheel for wget: filename=wget-3.2-py3-none-any.whl size=9675 sha256=e4b92d783957725627d952cb7fa25885cf70f33f7eee4a4c5176fc4b93cf5a8d
  Stored in directory: /root/.cache/pip/wheels/a1/b6/7c/0e63e34eb06634181c63adacca38b79ff8f35c37e3c13e3c02
Successfully built wget
Installing collected packages: wget
Successfully installed wget-3.2
Processing ./PiML-0.1.0-cp37-cp37m-linux_x86_64.whl
Collecting ipython==7.12.0
  Downloading ipython-7.12.0-py3-none-any.whl (777 kB)
Collecting lime
  Downloading lime-0.2.0.1.tar.gz (275 kB)
Collecting aletheia-dnn
  Downloading aletheia_dnn-1.3.4-cp37-none-manylinux_2_5_x86_64.whl (791 kB)
Collecting xlrd>=1.2.0
  Dow
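The wheel installed above is named `PiML-0.1.0-cp37-cp37m-linux_x86_64.whl`, which under the standard wheel naming scheme targets CPython 3.7 on 64-bit Linux. The following is a minimal sketch for checking such a filename against a Python version before installing. The helper names `parse_wheel_tags` and `matches_python` are hypothetical, and the check only handles simple `cpXY` tags like the one used here, not compound tags such as `py2.py3`.

```python
import sys


def parse_wheel_tags(filename):
    """Split a wheel filename into name, version, and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[: -len(".whl")].split("-")
    # Layout: name-version[-build]-python_tag-abi_tag-platform_tag
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename layout: " + filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


def matches_python(python_tag, version_info=None):
    """Return True if a simple 'cpXY' tag matches the given (major, minor) version."""
    if version_info is None:
        version_info = sys.version_info
    return python_tag == "cp{}{}".format(version_info[0], version_info[1])


tags = parse_wheel_tags("PiML-0.1.0-cp37-cp37m-linux_x86_64.whl")
print(tags["python_tag"], tags["abi_tag"], tags["platform_tag"])  # cp37 cp37m linux_x86_64
print("compatible with this interpreter:", matches_python(tags["python_tag"]))
```

If the second line prints `False`, pip would likely reject the wheel as unsupported on the current runtime, so a wheel built for the matching Python version would be needed from the releases page.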

# Stage 1: Initialize an experiment, load and process data <a name="expdata"></a>

In [1]:
from piml import Experiment
exp = Experiment(platform="colab")

In [2]:
# Choose CaliforniaHousing from the dropdown that appears
exp.data_loader()

Dropdown(layout=Layout(width='20%'), options=('Select Data', 'CoCircles', 'Friedman', 'BikeSharing', 'TaiwanCr…

In [3]:
exp.data_summary()

VBox(children=(HTML(value='Data Shape:(20640, 9)'), Tab(children=(Output(), Output()), _dom_classes=('data-sum…

In [4]:
exp.eda()

HBox(children=(VBox(children=(HTML(value='<h4>Univariate:</h4>'), HBox(children=(Dropdown(layout=Layout(width=…

In [5]:
exp.data_prepare()

VBox(children=(HBox(children=(Box(children=(HTML(value='<p>Target Variable:</p>'),), layout=Layout(width='100p…

# Stage 2: Train interpretable models <a name="modeltrain"></a>



In [6]:
exp.model_train()

Box(children=(Box(children=(HTML(value='<h4>Choose Model</h4>'), Box(children=(HBox(children=(Checkbox(value=T…

# Stage 3: Interpret and explain <a name="modelinterpret"></a>

In [7]:
exp.model_explain()

VBox(children=(Dropdown(layout=Layout(width='20%'), options=('Select Model', 'EBM', 'GAMI-Net', 'ReLU-DNN'), s…

In [8]:
exp.model_interpret()

VBox(children=(Dropdown(layout=Layout(width='20%'), options=('Select Model', 'EBM', 'GAMI-Net', 'ReLU-DNN'), s…

# Stage 4: Diagnose and compare

In [9]:
exp.model_diagnose()

VBox(children=(Dropdown(layout=Layout(width='20%'), options=('Select Model', 'EBM', 'GAMI-Net', 'ReLU-DNN'), s…

In [10]:
exp.model_compare()

VBox(children=(HBox(children=(Dropdown(layout=Layout(width='30%'), options=('Select Model', 'EBM', 'GAMI-Net',…

# Stage 5: Register an arbitrary model ...

In [11]:
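from lightgbm import LGBMRegressor
# Wrap a LightGBM regressor (tree depth capped at 7) in a PiML pipeline
pipeline = exp.make_pipeline(LGBMRegressor(max_depth=7))
# fit() takes no arguments here; presumably it uses the data prepared in Stage 1
pipeline.fit()
# Register the fitted pipeline under the name 'LGBM' so it can be selected later
exp.register(pipeline=pipeline, name='LGBM')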

HTML(value="<p class='notification info'>Register LGBM Done</p>")
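The three calls in the cell above (`make_pipeline`, `fit`, `register`) can be wrapped in a small loop when several black-box models, for example LightGBM at varying depths, are to be compared. The sketch below assumes only the call pattern shown in that cell. The helper name `register_models` and the model names in the usage comment are hypothetical, and whether PiML accepts any number of registered models or arbitrary names is an assumption, not something this notebook confirms.

```python
def register_models(exp, estimators):
    """Fit and register each estimator, mirroring the single-model cell above.

    `exp` is expected to offer make_pipeline(estimator) and
    register(pipeline=..., name=...), and each pipeline a no-argument fit().
    """
    registered = []
    for name, estimator in estimators.items():
        pipeline = exp.make_pipeline(estimator)  # wrap the raw estimator
        pipeline.fit()                           # train it
        exp.register(pipeline=pipeline, name=name)
        registered.append(name)
    return registered


# Intended use inside the notebook (not executed here; requires piml and lightgbm):
#
#   from lightgbm import LGBMRegressor
#   register_models(exp, {
#       "LGBM_d3": LGBMRegressor(max_depth=3),
#       "LGBM_d7": LGBMRegressor(max_depth=7),
#   })
```

Keeping the helper independent of any specific estimator class means the same loop could cover other black-box models, provided `exp.make_pipeline` accepts them.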

In [12]:
# Choose EBM, GAMI-Net and LGBM
exp.model_compare()

VBox(children=(HBox(children=(Dropdown(layout=Layout(width='30%'), options=('Select Model', 'EBM', 'GAMI-Net',…