# Working with other Models and Data in PiML

This notebook shows how to load data and externally trained models into PiML, and how to analyze them alongside models trained in PiML.

The imported models are two XGB models from our `TreeEnsembleExample.ipynb` example.

It is based on the `Example_TaiwanCredit.ipynb` notebook from the PiML repository, which uses the TaiwanCredit data from the UCI repository (details [here](https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients)).

The response `FlagDefault` is binary, so this is a classification problem.


In [1]:
!pip install piml



## Load and Prepare Data

Rather than loading one of PiML's built-in datasets, in this example we read the data directly with Pandas, manipulate it, and then pass the resulting DataFrame to PiML.

In [2]:
from piml import Experiment
import pandas as pd
import pickle

## Initialize PiML
exp = Experiment()



In [3]:
## Mount Google Drive; the next cell navigates to the folder where the data is located
from google.colab import drive
drive.mount('/content/gdrive')


Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [4]:
%cd 'gdrive/My Drive/'


/content/gdrive/My Drive


In [5]:
# Choose TaiwanCredit
data = pd.read_csv("piml/PiML-Toolbox/datasets/TaiwanCredit.csv")
data.columns


Index(['LIMIT_BAL', 'SEX', 'EDUCATION', 'MARRIAGE', 'AGE', 'PAY_1', 'PAY_2',
       'PAY_3', 'PAY_4', 'PAY_5', 'PAY_6', 'BILL_AMT1', 'BILL_AMT2',
       'BILL_AMT3', 'BILL_AMT4', 'BILL_AMT5', 'BILL_AMT6', 'PAY_AMT1',
       'PAY_AMT2', 'PAY_AMT3', 'PAY_AMT4', 'PAY_AMT5', 'PAY_AMT6',
       'FlagDefault'],
      dtype='object')

In [6]:
## Since we will only use a subset of the columns, we will remove some variables
## before loading in PiML
data.drop(labels=['LIMIT_BAL', 'SEX', 'EDUCATION', 'MARRIAGE', 'AGE',],
          axis=1, inplace=True)
data.columns


Index(['PAY_1', 'PAY_2', 'PAY_3', 'PAY_4', 'PAY_5', 'PAY_6', 'BILL_AMT1',
       'BILL_AMT2', 'BILL_AMT3', 'BILL_AMT4', 'BILL_AMT5', 'BILL_AMT6',
       'PAY_AMT1', 'PAY_AMT2', 'PAY_AMT3', 'PAY_AMT4', 'PAY_AMT5', 'PAY_AMT6',
       'FlagDefault'],
      dtype='object')
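Dropping the unused columns works; an alternative is to list the columns to keep, which also leaves out any extra fields a newer version of the CSV might add. The sketch below is not what this notebook does: it applies Pandas' `DataFrame.filter(regex=...)` to a hypothetical one-row frame `toy` that only mimics the TaiwanCredit column names.

```python
import pandas as pd

# Hypothetical toy frame with the same column names as TaiwanCredit.csv
cols = (["LIMIT_BAL", "SEX", "EDUCATION", "MARRIAGE", "AGE"]
        + [f"PAY_{i}" for i in range(1, 7)]
        + [f"BILL_AMT{i}" for i in range(1, 7)]
        + [f"PAY_AMT{i}" for i in range(1, 7)]
        + ["FlagDefault"])
toy = pd.DataFrame([[0] * len(cols)], columns=cols)

# Keep only the payment-history attributes and the response, by name pattern
keep = toy.filter(regex=r"^(PAY_\d|BILL_AMT\d|PAY_AMT\d|FlagDefault)$")
print(keep.shape)  # (1, 19)
```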

In [7]:
## Load the data in PiML
exp.data_loader(data=data)


| | PAY_1 | PAY_2 | PAY_3 | PAY_4 | PAY_5 | PAY_6 | BILL_AMT1 | BILL_AMT2 | BILL_AMT3 | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | FlagDefault |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2.0 | 2.0 | -1.0 | -1.0 | 0.0 | 0.0 | 3.592621 | 3.491782 | 2.838849 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 2.838849 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.0 |
| 1 | -1.0 | 2.0 | 0.0 | 0.0 | 0.0 | 2.0 | 3.428621 | 3.237041 | 3.428621 | 3.514946 | 3.538574 | 3.513484 | 0.000000 | 3.000434 | 3.000434 | 3.000434 | 0.000000 | 3.301247 | 1.0 |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.465977 | 4.146996 | 4.132260 | 4.156307 | 4.174612 | 4.191731 | 3.181558 | 3.176381 | 3.000434 | 3.000434 | 3.000434 | 3.699057 | 0.0 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.672015 | 4.683353 | 4.692776 | 4.452016 | 4.461799 | 4.470528 | 3.301247 | 3.305351 | 3.079543 | 3.041787 | 3.029384 | 3.000434 | 0.0 |
| 4 | -1.0 | 0.0 | -1.0 | 0.0 | 0.0 | 0.0 | 3.935406 | 3.753660 | 4.554319 | 4.320997 | 4.282101 | 4.281760 | 3.301247 | 4.564453 | 4.000043 | 3.954291 | 2.838849 | 2.832509 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 29995 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.276345 | 5.285143 | 5.318827 | 4.944507 | 4.494683 | 4.203604 | 3.929470 | 4.301052 | 3.699317 | 3.484015 | 3.699057 | 3.000434 | 0.0 |
| 29996 | -1.0 | -1.0 | -1.0 | -1.0 | 0.0 | 0.0 | 3.226342 | 3.262214 | 3.544440 | 3.953276 | 3.715251 | 0.000000 | 3.264345 | 3.547405 | 3.954194 | 2.113943 | 0.000000 | 0.000000 | 0.0 |
| 29997 | 4.0 | 3.0 | 2.0 | -1.0 | 0.0 | 0.0 | 3.552181 | 3.525951 | 3.440752 | 4.319710 | 4.313509 | 4.286861 | 0.000000 | 0.000000 | 4.342442 | 3.623353 | 3.301247 | 3.491502 | 1.0 |
| 29998 | 1.0 | -1.0 | 0.0 | 0.0 | 0.0 | -1.0 | -3.216430 | 4.894205 | 4.882553 | 4.722428 | 4.073938 | 4.689708 | 4.933998 | 3.532754 | 3.071514 | 3.284882 | 4.723989 | 3.256477 | 1.0 |


In [8]:
# Summarize the loaded data. Following the original example, only the payment-history
# attributes (PAY_1~6, BILL_AMT1~6 and PAY_AMT1~6) and the response `FlagDefault` are used;
# all other variables were already dropped with Pandas above
exp.data_summary()

[Interactive data summary widget; Data Shape: (30000, 19)]

In [9]:
# Prepare dataset with default settings
exp.data_prepare()

[Interactive data preparation widget]
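`data_prepare()` with default settings splits the data into training and test sets; the split method actually used is reported in the widget. As a rough, hypothetical analogue outside PiML, the sketch below makes an 80/20 split with scikit-learn on toy data with 18 features, matching the 18 payment-history columns. The 80/20 ratio and the stratification on the response are our assumptions, not PiML's documented defaults.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 rows, 18 features, binary response
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 18))
y = rng.integers(0, 2, size=100)

# Assumed 80/20 split; stratify=y keeps the default rate similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
print(X_train.shape, X_test.shape)  # (80, 18) (20, 18)
```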

In [10]:
exp.eda()

[Interactive exploratory data analysis widget]

## Train Interpretable Models



In [11]:
# Train GLM and GAM models through the PiML interface
exp.model_train()

[Interactive model training widget]

## Load External Models

Here we load two of the models from our `TreeEnsembleExample.ipynb` example and (re)train them on the PiML data sets, so that they can be used alongside the models trained through the interface above.

Retraining is one way to register these models with PiML; other approaches are demonstrated in the [PiML Example](https://github.com/SelfExplainML/PiML-Toolbox/blob/main/examples/Example_ExternalModels.ipynb).
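The two `.pkl` files are assumed to hold fitted models saved with `pickle.dump` in `TreeEnsembleExample.ipynb`, whose training code is not reproduced here. A minimal sketch of that save/load round trip, using scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost and a temporary file in place of `piml/XGB1.pkl`:

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical toy data standing in for the TaiwanCredit features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# Stand-in for an XGBoost model: a scikit-learn gradient-boosted tree ensemble
model = GradientBoostingClassifier(n_estimators=50, max_depth=2, random_state=0)
model.fit(X, y)

# Save, then reload with the same open(..., 'rb') / pickle.load pattern used below
path = os.path.join(tempfile.mkdtemp(), "XGB1.pkl")
with open(path, "wb") as handle:
    pickle.dump(model, handle)
with open(path, "rb") as handle:
    loaded = pickle.load(handle)

# The reloaded model reproduces the original predictions exactly
print(np.array_equal(model.predict_proba(X), loaded.predict_proba(X)))  # True
```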

In [12]:
## Load two of the models
with open('piml/XGB1.pkl', 'rb') as handle:
    xgb1 = pickle.load(handle)

with open('piml/XGB2.pkl', 'rb') as handle:
    xgb2 = pickle.load(handle)



In [13]:
## Retrain on the PiML training data and register as 'ext_XGB_1'
exp.model_train(xgb1, name='ext_XGB_1')


In [14]:
## Retrain on the PiML training data and register as 'ext_XGB_2'
exp.model_train(xgb2, name='ext_XGB_2')

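Because the imported models are (re)trained on the PiML data sets, the trees learned in `TreeEnsembleExample.ipynb` are presumably replaced, and only the hyperparameters carry over. For scikit-learn-style estimators this is easy to check: refitting an already fitted model gives the same result as fitting a fresh, unfitted copy. The sketch uses `GradientBoostingClassifier` as a stand-in for XGBoost and hypothetical toy data; it illustrates the general `fit` semantics, not PiML internals.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(1)
X_old, X_new = rng.normal(size=(200, 4)), rng.normal(size=(200, 4))
y_old = (X_old[:, 0] > 0).astype(int)
y_new = (X_new[:, 1] > 0).astype(int)

# A model that was already fitted elsewhere, like the pickled XGB models
imported = GradientBoostingClassifier(random_state=0).fit(X_old, y_old)
# An unfitted copy with the same hyperparameters
fresh = clone(imported)

# Refit both on the new training data
imported.fit(X_new, y_new)
fresh.fit(X_new, y_new)

# The refit model no longer depends on its earlier training data
print(np.allclose(imported.predict_proba(X_new), fresh.predict_proba(X_new)))  # True
```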

## PiML Analysis

We now analyze both the models trained in PiML and the imported ones with PiML's functions.

## Model Comparison and Benchmarking

Here we compare the loaded XGB models with the models trained in PiML.
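`model_compare()` compares the registered models on the same data, including (among other things) their predictive performance. As a rough offline analogue, and not PiML's implementation, the sketch below fits two scikit-learn classifiers, hypothetical stand-ins for the GLM and the XGB models, on one toy training split and ranks them by test AUC with `roc_auc_score`. The point of the design is that every model is scored on the same held-out data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
# Nonlinear signal (squared term) that a linear GLM cannot capture fully
logit = X[:, 0] + X[:, 1] ** 2 - 1
y = (rng.random(1000) < 1 / (1 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "GLM": LogisticRegression(),
    "GBM": GradientBoostingClassifier(random_state=0),
}
# Fit every model on the same training split and score on the same test split
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

# Print models from best to worst test AUC
for name, auc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: test AUC = {auc:.3f}")
```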

In [None]:
exp.model_compare()

[Interactive model comparison widget; model options: 'GLM', 'GAM', 'ext…]

## Interpretability and Explainability

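Note that not all functionality is available for every model. For example, `model_interpret` applies only to PiML's inherently interpretable models (here GLM and GAM), while the post-hoc methods in `model_explain` also accept the external XGB models.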

In [15]:
# Model-specific inherent interpretation including feature importance, main effects and pairwise interactions.
## (To be discussed in future sessions)
exp.model_interpret()

[Interactive model interpretation widget; model options: 'GAM', 'GLM']
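For a GLM, the inherent interpretation rests largely on the fitted coefficients. Assuming PiML's GLM for this binary response is a logistic regression (its regularization and other details may differ), each coefficient is the change in the log-odds of default per unit increase of a feature, holding the others fixed. A sketch with scikit-learn's `LogisticRegression` on hypothetical toy data whose true log-odds are 1.5*x0 - 1.0*x1, with x2 irrelevant:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3))
# True log-odds: 1.5*x0 - 1.0*x1 + 0*x2
logit = 1.5 * X[:, 0] - 1.0 * X[:, 1]
y = (rng.random(5000) < 1 / (1 + np.exp(-logit))).astype(int)

glm = LogisticRegression().fit(X, y)
for name, coef in zip(["x0", "x1", "x2"], glm.coef_[0]):
    # exp(coef) is the multiplicative change in the odds per unit increase
    print(f"{name}: coef = {coef:+.2f}, odds ratio = {np.exp(coef):.2f}")
```

The recovered coefficients should land near 1.5, -1.0 and 0, slightly shrunk toward zero by scikit-learn's default L2 penalty.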

In [25]:
# Model-agnostic post-hoc explanation by Permutation Feature Importance, PDP (1D and 2D) vs. ALE (1D and 2D), LIME vs. SHAP
## (To be discussed in future sessions)
exp.model_explain()

[Interactive model explanation widget; model options: 'GLM', 'GAM', 'ext_XGB_1', 'ext_X…]
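Permutation feature importance, the first post-hoc method listed above, shuffles one feature at a time and measures how much a performance metric drops. The sketch below uses scikit-learn's `permutation_importance` rather than PiML's implementation, on hypothetical toy data where only feature 0 carries signal; the AUC metric and the 10 repeats are our choices.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
# Only feature 0 drives the response; features 1 and 2 are pure noise
y = (X[:, 0] + 0.3 * rng.normal(size=1000) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each column of the held-out data and record the drop in AUC
result = permutation_importance(model, X_test, y_test, scoring="roc_auc",
                                n_repeats=10, random_state=0)
for i in np.argsort(-result.importances_mean):
    print(f"feature {i}: {result.importances_mean[i]:.3f} +/- {result.importances_std[i]:.3f}")
```

Computing the importances on held-out data, as here, measures what the model relies on when generalizing rather than what it memorized during training.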