# PiML Toolbox: Low-code Demo for CaliforniaHousing Data

This example notebook demonstrates how to use PiML in its low-code mode to develop machine learning models for the CaliforniaHousing data, which consists of 20,640 samples and 9 columns (8 features plus the response), fetched via sklearn.datasets. PiML can load three versions of this data: _raw, _trim1 (trimming only AveOccup) and _trim2 (trimming AveRooms, AveBedrms, Population and AveOccup). The _trim2 version is used in this example.
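
The trimming rule PiML applies is not spelled out here. Since the data keeps all 20,640 rows, one plausible reading is that extreme values are clipped rather than dropped. The sketch below illustrates that idea on synthetic data; the function `trim_upper` and the 0.98 quantile are assumptions for illustration, not PiML's documented behavior.

```python
import numpy as np

def trim_upper(values, upper_quantile=0.98):
    """Clip values above an upper quantile (hypothetical rule, not PiML's)."""
    values = np.asarray(values, dtype=float)
    cap = np.quantile(values, upper_quantile)
    # Rows are kept; only values above the cap are pulled down to it.
    return np.minimum(values, cap)

# Synthetic stand-in for a heavy-tailed column such as AveOccup.
rng = np.random.default_rng(0)
ave_occup = rng.lognormal(mean=1.0, sigma=0.5, size=1000)
ave_occup[:3] = [500.0, 800.0, 1200.0]  # a few extreme outliers

trimmed = trim_upper(ave_occup)
print("max before:", ave_occup.max(), "max after:", trimmed.max())
```

Clipping keeps the sample size unchanged while limiting the influence of a handful of extreme blocks on the fitted models.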

The response MedHouseVal (median house price per block, in log scale) is continuous, so this is a regression problem.
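
Because the response is on a log scale, predictions need to be transformed back before they can be read as prices. A minimal sketch, assuming a natural-log transform (the base is not stated in this notebook) and made-up values:

```python
import numpy as np

# Made-up prices, for illustration only.
prices = np.array([0.5, 1.0, 2.5, 5.0])

log_prices = np.log(prices)      # scale on which the models are assumed to be fitted
recovered = np.exp(log_prices)   # back-transform to the original scale

# An additive error on the log scale is a multiplicative error on the price scale.
log_error = 0.1
print("relative price factor:", np.exp(log_error))
```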

# Stage 0: Install PiML package on Google Colab

1. Run `!pip install piml` to install the latest version of PiML.
2. In Colab, you'll need to restart the runtime in order to use the newly installed PiML version.

In [None]:
!pip install piml

# Stage 1: Initialize an experiment, Load and Prepare data <a name="expdata"></a>

In [None]:
from piml import Experiment
exp = Experiment(platform="colab")

In [None]:
# Choose CaliforniaHousing_trim2
exp.data_loader()

In [None]:
exp.data_summary()

Data Shape: (20640, 9)

In [None]:
exp.data_prepare()

In [None]:
exp.feature_select()

In [None]:
exp.eda()

# Stage 2. Train interpretable models <a name="modeltrain"></a>



In [None]:
# First, choose EBM, GAMI-Net and ReLU-DNN models with default settings, click "RUN" to train;
# Then, choose only ReLU-DNN (unselect other models) and customize it with L1_regularization = `0.0005` and Model name = `ReLU-DNN-L1=0.0005`, click "RUN" to train;
# Finally, register the four trained models one by one.
exp.model_train()
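
The `L1_regularization` setting above adds a sparsity penalty on the network weights. How PiML implements it internally is not shown in this notebook. As a generic sketch, an L1-penalized regression loss adds the sum of absolute weights, scaled by the regularization strength, to the mean squared error. The function name and weight layout below are hypothetical.

```python
import numpy as np

def l1_penalized_mse(y_true, y_pred, weights, l1_strength=0.0005):
    """MSE plus an L1 penalty over all weight matrices (generic sketch)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = np.mean((y_true - y_pred) ** 2)
    # Sum of absolute values of every weight, scaled by the strength.
    penalty = l1_strength * sum(np.abs(w).sum() for w in weights)
    return mse + penalty

# Toy weights for a two-layer network (illustrative values only).
weights = [np.array([[1.0, -2.0], [0.0, 0.5]]), np.array([[3.0], [-1.0]])]
loss = l1_penalized_mse([1.0, 2.0], [1.0, 2.0], weights)
print(loss)
```

A larger strength pushes more weights toward zero, which tends to give a sparser network that is easier to interpret.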

# Stage 3. Explain and Interpret <a name="modelinterpret"></a>

In [None]:
# Model-agnostic post-hoc explanation by Permutation Feature Importance, PDP (1D and 2D) vs. ALE (1D and 2D), LIME vs. SHAP
exp.model_explain()
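
To make the first of these explanation methods concrete, here is a minimal sketch of Permutation Feature Importance written with NumPy only. It is not PiML's implementation: the function `permutation_importance_mse`, the toy model and the synthetic data are assumptions for illustration. The idea is to shuffle one feature at a time and measure how much the prediction error grows.

```python
import numpy as np

def permutation_importance_mse(predict, X, y, n_repeats=5, seed=0):
    """Average increase in MSE when each feature column is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = np.mean((predict(X) - y) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling breaks the link between feature j and the response.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(np.mean((predict(X_perm) - y) ** 2) - baseline)
        importances[j] = np.mean(increases)
    return importances

def toy_model(X):
    """Stand-in for a fitted model; the third feature is ignored."""
    return 3.0 * X[:, 0] + 0.5 * X[:, 1]

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = toy_model(X)

scores = permutation_importance_mse(toy_model, X, y)
print(scores)
```

The feature with the largest coefficient gets the largest score, and the unused feature gets a score of zero because shuffling it cannot change the predictions.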

In [None]:
# Model-specific inherent interpretation including feature importance, main effects and pairwise interactions.
exp.model_interpret()
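
For EBM and GAMI-Net, the main effects and pairwise interactions mentioned above are the additive terms of a generalized additive model with pairwise interactions. In standard notation (not taken from this notebook; the symbols $\mu$, $h_j$ and $f_{jk}$ are illustrative), the regression prediction can be written as:

```latex
\hat{y}(x) = \mu + \sum_{j} h_j(x_j) + \sum_{j<k} f_{jk}(x_j, x_k)
```

Each $h_j$ can be plotted as a 1D curve and each $f_{jk}$ as a 2D heatmap, so the fitted model can be read directly from its components.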

# Stage 4. Diagnose and Compare

In [None]:
exp.model_diagnose()

In [None]:
# Choose EBM, GAMI-Net and ReLU-DNN-L1=0.0005
exp.model_compare()

# Stage 5. Fit and/or register an arbitrary model

In [None]:
# Fit and register an arbitrary model with existing dataset generated from `exp.data_loader()` and `exp.data_prepare()`
from lightgbm import LGBMRegressor
lgbm_1 = LGBMRegressor(max_depth=2)
exp.model_train(lgbm_1, name='LGBM_1')

Register LGBM_1 Done

In [None]:
# Choose EBM, ReLU-DNN-L1=0.0005, LGBM_1
exp.model_compare()

In [None]:
# Register an arbitrary trained model with arbitrary external dataset

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from lightgbm import LGBMRegressor

# Fit a LightGBM regressor outside PiML on a fresh train/test split
lgbm_2 = LGBMRegressor(max_depth=7)
data = fetch_california_housing()
feature_names = data.feature_names
target_name = data.target_names[0]
train_x, test_x, train_y, test_y = train_test_split(data.data, data.target, test_size=0.2)
lgbm_2.fit(train_x, train_y)

# Wrap the fitted model and its data in a PiML pipeline, then register it
pipeline = exp.make_pipeline(model=lgbm_2, train_x=train_x, train_y=train_y.ravel(),
                             normalize_strategy=None, encode_strategy=None,
                             test_x=test_x, test_y=test_y.ravel(),
                             feature_names=feature_names, feature_types=None, target_name=target_name)
exp.register(pipeline=pipeline, name='LGBM_2')

Register LGBM_2 Done

In [None]:
# Choose LGBM_2
exp.model_diagnose()
