This notebook accompanies the blog post "Interactive Explainable Machine Learning with SAS Viya, Streamlit and Docker".

Install SWAT (the SAS Scripting Wrapper for Analytics Transfer) if you haven't already, then import the required modules.

In [1]:
#!pip install swat
from swat import CAS, options
import pandas as pd
import numpy as np

Connect to CAS and load the required action sets. Fill in the connection details for your SAS Viya environment first.

In [2]:
host = ""
port = ""
username = ""
password = ""

In [3]:
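s = CAS(host, port, username, password)
s.loadActionSet('autotune')       # hyperparameter tuning
s.loadActionSet('aStore')         # scoring with analytic stores
s.loadActionSet('decisionTree')   # tree models and model export
s.loadActionSet('explainModel')   # model interpretability (linearExplainer)
s.loadActionSet('table')          # table management (promote)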

NOTE: Added action set 'autotune'.
NOTE: Added action set 'aStore'.
NOTE: Added action set 'decisionTree'.
NOTE: Added action set 'explainModel'.
NOTE: Added action set 'table'.


Load and inspect the dataset

In [4]:
hmeq = pd.read_csv('hmeq.csv')
hmeq

,BAD,LOAN,MORTDUE,VALUE,REASON,JOB,YOJ,DEROG,DELINQ,CLAGE,NINQ,CLNO,DEBTINC
0,1,1100,25860.0,39025.0,HomeImp,Other,10.5,0.0,0.0,94.366667,1.0,9.0,
1,1,1300,70053.0,68400.0,HomeImp,Other,7.0,0.0,2.0,121.833333,0.0,14.0,
2,1,1500,13500.0,16700.0,HomeImp,Other,4.0,0.0,0.0,149.466667,1.0,10.0,
3,1,1500,,,,,,,,,,,
4,0,1700,97800.0,112000.0,HomeImp,Office,3.0,0.0,0.0,93.333333,0.0,14.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
5955,0,88900,57264.0,90185.0,DebtCon,Other,16.0,0.0,0.0,221.808718,0.0,16.0,36.112347
5956,0,89000,54576.0,92937.0,DebtCon,Other,16.0,0.0,0.0,208.692070,0.0,15.0,35.859971
5957,0,89200,54045.0,92924.0,DebtCon,Other,15.0,0.0,0.0,212.279697,0.0,15.0,35.556590
5958,0,89800,50370.0,91861.0,DebtCon,Other,14.0,0.0,0.0,213.892709,0.0,16.0,34.340882
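
The preview already shows missing values: row 3 is empty apart from BAD and LOAN, and DEBTINC is blank in the first rows. A small stdlib sketch that counts empty fields per column in CSV text such as `hmeq.csv` (the helper name `count_missing` is hypothetical; with pandas, `hmeq.isna().sum()` gives the same information):

```python
import csv
import io


def count_missing(csv_text):
    """Return {column: number of empty cells} for CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            # DictReader yields '' for empty fields and None for short rows.
            if row.get(name) in ("", None):
                counts[name] += 1
    return counts
```

For the real file: `with open('hmeq.csv', newline='') as f: print(count_missing(f.read()))`.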


Upload the DataFrame to a CAS table, then train a gradient boosting model with automatic hyperparameter optimization (autotuning).

In [5]:
# Upload the pandas DataFrame to CAS as table HMEQTEST in the public caslib
s.upload(hmeq, casout={'name': 'hmeqTest', 'caslib': 'public', 'replace': True})

# Tune a gradient boosting model; inputs and nominals are passed as lists
result = s.autotune.tuneGradientBoostTree(
    trainOptions={
        "table":   {"name": 'hmeqTest', 'caslib': 'public'},
        "inputs":  ['LOAN', 'MORTDUE', 'VALUE', 'YOJ', 'DEROG', 'DELINQ', 'CLAGE',
                    'NINQ', 'CLNO', 'DEBTINC', 'REASON', 'JOB'],
        "target":  'BAD',
        "nominal": ['BAD', 'REASON', 'JOB'],
        "casout":  {"name": "gradboosthmeqtest", "caslib": "public", 'replace': True},
        "varImp":  True
    },
    # Fixed seed for reproducibility; stop the search after 60 seconds
    tunerOptions={"seed": 12345, "maxTime": 60}
)

NOTE: Cloud Analytic Services made the uploaded file available as table HMEQTEST in caslib public.
NOTE: The table HMEQTEST has been created in caslib public from binary data uploaded to Cloud Analytic Services.
NOTE: Autotune is started for 'Gradient Boosting Tree' model.
NOTE: Autotune option SEARCHMETHOD='GA'.
NOTE: Autotune option MAXTIME=60 (sec.).
NOTE: Autotune option SEED=12345.
NOTE: Autotune objective is 'Misclassification Error Percentage'.
NOTE: Early stopping is activated; 'NTREE' will not be tuned.
NOTE: Autotune number of parallel evaluations is set to 4, each using 0 worker nodes.
NOTE: Automatic early stopping is activated with STAGNATION=4;  set EARLYSTOP=false to deactivate.
         Iteration       Evals     Best Objective  Elapsed Time
                 0           1             19.966          1.08
                 1          25             7.6063         17.50
                 2          47              7.047         39.90
                 3          68           
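
The log states that the autotune objective is the misclassification error percentage, so a best objective of about 7.05 after iteration 2 means roughly 7% of the observations autotune evaluates on are misclassified. A minimal sketch of that metric, assuming the standard definition (the function name is hypothetical):

```python
def misclassification_error_pct(actual, predicted):
    """Percentage of observations whose predicted class differs from the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one observation")
    # Count mismatches pairwise (True counts as 1 in sum()).
    errors = sum(a != p for a, p in zip(actual, predicted))
    return 100.0 * errors / len(actual)
```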

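Promote the training table, export the tuned model as an astore, and promote the astore to global scope as well. This matters for the Streamlit portion: promoted (global-scope) tables are visible to other CAS sessions, such as the one the app opens.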

In [7]:
s.table.promote(name="hmeqTest", caslib='public',target="hmeqTest",targetLib='public')
modelAstore = s.decisionTree.dtreeExportModel(modelTable = {"caslib":"public","name":"gradboosthmeqtest" }, 
                                        casOut = {"caslib":"public","name":'hmeqTestAstore','replace':True})

s.table.promote(name='hmeqTestAstore', caslib='public',target='hmeqTestAstore',targetLib='public')

Let's test the model. Create a sample observation, convert it to a pandas DataFrame and then to a CAS table, and score it against the model.

In [8]:
# A single observation to score, as a dictionary of input values
datadict = {'LOAN': 140, 'MORTDUE': 3000, 'VALUE': 40000, 'REASON': 'HomeImp', 'JOB': 'Other', 'YOJ': 12,
            'DEROG': 0.0, 'DELINQ': 0.0, 'CLAGE': 89, 'NINQ': 1.0, 'CLNO': 10.0, 'DEBTINC': 0.05}

Create a small helper function that converts the Python dictionary to a single-row pandas DataFrame. A one-liner would also work, but it changes the column data types, hence this slightly more verbose function.

In [19]:
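def dicttopd(datadict):
    """Return a one-row DataFrame with one column per key of datadict."""
    # Wrap each scalar in a list so pandas builds a single row. Build a new dict
    # instead of overwriting datadict, which would nest lists on repeated calls.
    return pd.DataFrame.from_dict({key: [value] for key, value in datadict.items()})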

In [20]:
samplepd = dicttopd(datadict)

In [21]:
samplepd

,LOAN,MORTDUE,VALUE,REASON,JOB,YOJ,DEROG,DELINQ,CLAGE,NINQ,CLNO,DEBTINC
0,140,3000,40000,HomeImp,Other,12,0.0,0.0,89,1.0,10.0,0.05


Score this observation against the model.

In [22]:
# Upload the observation as table REALTIME, then score it with the astore
s.upload(samplepd, casout={'name': 'realtime', 'caslib': 'public', 'replace': True})
s.aStore.score(rstore={"caslib": "public", "name": "hmeqTestAstore"},
               table={"caslib": 'public', "name": 'realtime'},
               out={"caslib": 'public', "name": 'realscore', 'replace': True})

NOTE: Cloud Analytic Services made the uploaded file available as table REALTIME in caslib public.
NOTE: The table REALTIME has been created in caslib public from binary data uploaded to Cloud Analytic Services.


,casLib,Name,Rows,Columns,casTable
0,Public,realscore,1,4,"CASTable('realscore', caslib='Public')"

,Task,Seconds,Percent
0,Loading the Store,0.00017,0.002193
1,Creating the State,0.055206,0.712291
2,Scoring,0.021818,0.281504
3,Total,0.077505,1.0


Inspect the scores

In [23]:
# Fetch the scored table from CAS and turn it into a one-row DataFrame
scoredData = s.CASTable(name='realscore', caslib='public')
datasetDict = scoredData.to_dict()
scores = pd.DataFrame(datasetDict, index=[0])
scores

,P_BAD1,P_BAD0,I_BAD,_WARN_
0,0.992159,0.007841,1,
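
P_BAD1 and P_BAD0 are the predicted probabilities of BAD=1 and BAD=0 (they sum to 1), and I_BAD is the predicted class. Assuming I_BAD is simply the label with the larger probability (for a binary target, equivalent to a 0.5 cut-off), the relationship can be sketched as follows; `predicted_class` is a hypothetical helper:

```python
def predicted_class(probabilities):
    """Return the label with the highest predicted probability.

    `probabilities` maps class labels to posterior probabilities,
    for example {"1": P_BAD1, "0": P_BAD0}.
    """
    if not probabilities:
        raise ValueError("no probabilities given")
    return max(probabilities, key=probabilities.get)
```

With the scores above, `predicted_class({"1": 0.992159, "0": 0.007841})` returns `"1"`, matching I_BAD.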


Wrap these steps in a function for later use in the Streamlit app.

In [24]:
def score(samplepd):
    """Upload samplepd to CAS, score it with the astore and return the scores as a DataFrame."""
    s.upload(samplepd, casout={'name': 'realtime', 'caslib': 'public', 'replace': True})
    s.aStore.score(rstore={"caslib": "public", "name": "hmeqTestAstore"},
                   table={"caslib": 'public', "name": 'realtime'},
                   out={"caslib": 'public', "name": 'realscore', 'replace': True})
    scoredData = s.CASTable(name='realscore', caslib='public')
    datasetDict = scoredData.to_dict()
    scores = pd.DataFrame(datasetDict, index=[0])
    return scores

Test the function to make sure it works.

In [25]:
score(samplepd)

NOTE: Cloud Analytic Services made the uploaded file available as table REALTIME in caslib public.
NOTE: The table REALTIME has been created in caslib public from binary data uploaded to Cloud Analytic Services.


,P_BAD1,P_BAD0,I_BAD,_WARN_
0,0.992159,0.007841,1,
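
In the app, `score()` will receive whatever the user enters, so it can help to check that every model input is present before uploading anything to CAS. A minimal sketch; `REQUIRED_INPUTS` copies the input list from the `tuneGradientBoostTree` call, and both it and `missing_inputs` are hypothetical names:

```python
# Inputs the model was trained on (from the tuneGradientBoostTree call).
REQUIRED_INPUTS = ("LOAN", "MORTDUE", "VALUE", "YOJ", "DEROG", "DELINQ",
                   "CLAGE", "NINQ", "CLNO", "DEBTINC", "REASON", "JOB")


def missing_inputs(datadict, required=REQUIRED_INPUTS):
    """Return the required input names absent from datadict, in training order."""
    return [name for name in required if name not in datadict]
```

An empty list means the observation is complete; otherwise the app could show the missing names instead of calling `score()`.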


Add the predicted class (I_BAD) to samplepd as a new BAD column.

In [26]:
samplepd['BAD'] = scores.I_BAD.to_list()
samplepd

,LOAN,MORTDUE,VALUE,REASON,JOB,YOJ,DEROG,DELINQ,CLAGE,NINQ,CLNO,DEBTINC,BAD
0,140,3000,40000,HomeImp,Other,12,0.0,0.0,89,1.0,10.0,0.05,1


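Compute Kernel SHAP values for the sample observation with the linearExplainer action from the explainModel action set. The hmeqTest table supplies the training data, and realtime holds the query observation to explain.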

In [27]:
s.upload(samplepd,casout={'name' : 'realtime', 'caslib' : 'public','replace' : True})

shapvals = s.linearExplainer(
             table           = {"name" : 'hmeqTest','caslib':'public'},
             query           = {"name" : 'realtime','caslib':'public'},
             modelTable      = {"name" :"hmeqTestAstore",'caslib':'public'},
             modelTableType  = "ASTORE",
             predictedTarget = 'P_BAD1',
             seed            = 1234,
             preset          = "KERNELSHAP",
             inputs          = ['LOAN','MORTDUE','VALUE','YOJ','DEROG','DELINQ','CLAGE','NINQ','CLNO','DEBTINC','REASON', 'JOB','BAD'],
             nominals        = ['REASON', 'JOB','BAD']
            )
shap1 = shapvals['ParameterEstimates']
# Keep variable names and estimates from the first 10 rows (intercept plus the first nine inputs)
shap = shap1[['Variable', 'Estimate']][0:10]

NOTE: Cloud Analytic Services made the uploaded file available as table REALTIME in caslib public.
NOTE: The table REALTIME has been created in caslib public from binary data uploaded to Cloud Analytic Services.
NOTE: Starting the Linear Explainer action.
NOTE: The generated number of samples is automatically set to 6500.
NOTE: Generating kernel weights.
NOTE: Kernel weights generated.
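
Kernel SHAP fits a weighted linear model to predictions on coalitions (subsets) of the inputs, and the "kernel weights" in the log are presumably the Shapley kernel: for M features and a coalition of size k, the weight is (M - 1) / (C(M, k) * k * (M - k)). Coalitions that are very small or very large get the most weight. A sketch of the formula itself, not of SAS's internal implementation:

```python
import math


def shapley_kernel_weight(num_features, coalition_size):
    """Shapley kernel weight for a coalition of `coalition_size` out of `num_features` features."""
    m, k = num_features, coalition_size
    if not 0 <= k <= m:
        raise ValueError("coalition_size must be between 0 and num_features")
    if k == 0 or k == m:
        # Empty and full coalitions get infinite weight; Kernel SHAP
        # typically enforces them as hard constraints instead.
        return math.inf
    return (m - 1) / (math.comb(m, k) * k * (m - k))
```

For example, with 4 features a single-feature coalition gets weight 0.25 and a two-feature coalition 0.125. (`math.comb` requires Python 3.8 or later.)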


Inspect the results

In [28]:
shap

,Variable,Estimate
0,Intercept,0.170354
1,LOAN,-0.262079
2,MORTDUE,-0.058375
3,VALUE,0.072899
4,YOJ,-0.016603
5,DEROG,-0.029429
6,DELINQ,-0.051329
7,CLAGE,0.091768
8,NINQ,-0.018553
9,CLNO,0.030581
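
Shapley values are additive: the base value plus the per-feature contributions reconstructs the model's prediction for the explained observation. Assuming the Intercept row plays the role of the base value and the other rows are contributions to P_BAD1, a sanity check would use all of `shap1`, since `shap` keeps only the first 10 rows and leaves out the remaining inputs, starting with DEBTINC. The helpers below are hypothetical, and the test numbers are toy values, not SAS output:

```python
def shap_reconstruction(estimates):
    """Sum (variable, estimate) pairs: the intercept (assumed base value) plus all contributions."""
    return sum(value for _, value in estimates)


def is_additive(estimates, prediction, tol=1e-3):
    """True if intercept + contributions lies within `tol` of the model's prediction."""
    return abs(shap_reconstruction(estimates) - prediction) <= tol
```

In the notebook this might read `is_additive(list(zip(shap1['Variable'], shap1['Estimate'])), scores['P_BAD1'][0])`. If it does not hold, the Intercept row is probably not the base value in SAS's parameterization.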


Install Altair and plot the estimates as a bar chart.

In [29]:
!pip install altair

Defaulting to user installation because normal site-packages is not writeable


In [30]:
import altair as alt
alt.Chart(shap).mark_bar().encode(
    x='Variable',
    y='Estimate'
)
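
A bar chart of SHAP values is usually easier to read when the bars are ordered by magnitude and the intercept, which is not a feature contribution, is left out. A stdlib sketch of that ordering (`rank_contributions` is a hypothetical helper):

```python
def rank_contributions(rows, exclude=("Intercept",)):
    """Sort (variable, estimate) pairs by absolute estimate, largest first,
    dropping rows such as the intercept that are not feature contributions."""
    kept = [(name, value) for name, value in rows if name not in exclude]
    return sorted(kept, key=lambda pair: abs(pair[1]), reverse=True)
```

Applied to the table above, LOAN comes first, followed by CLAGE and VALUE. Altair can also order bars itself; for example, `alt.X('Variable:N', sort='-y')` sorts by the signed estimate rather than its absolute value.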