Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.


# Save and retrieve explanations via Azure Machine Learning Run History

_**This notebook showcases how to use the Azure Machine Learning Interpretability SDK to save and retrieve classification model explanations to/from Azure Machine Learning Run History.**_


## Table of Contents

1. [Introduction](#Introduction)
1. [Setup](#Setup)
1. [Run model explainer locally at training time](#Explain)
    1. Apply feature transformations
    1. Train a binary classification model
    1. Explain the model on raw features
        1. Generate global explanations
        1. Generate local explanations
1. [Upload model explanations to Azure Machine Learning Run History](#Upload)
1. [Download model explanations from Azure Machine Learning Run History](#Download)
1. [Visualize explanations](#Visualize)
1. [Next steps](#Next)

## Introduction

This notebook shows how to explain a classification model's predictions locally at training time, upload the explanations to Azure Machine Learning Run History, and download previously uploaded explanations from Run History.
It demonstrates the API calls needed to upload and download global and local explanations, and a visualization dashboard that provides an interactive way to discover patterns in the data and in the downloaded explanations.

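The interpretability SDK provides several tabular data explainers, including TabularExplainer (SHAP), MimicExplainer (global surrogate), and PFIExplainer. This notebook uses LIMEExplainer (from `azureml.contrib.interpret`), and the code comments point out where PFIExplainer would need different calls.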



Problem: IBM employee attrition classification with scikit-learn (run the model explainer locally and upload the explanation to Azure Machine Learning Run History)

1. Train an SVM classification model using scikit-learn
2. Explain the model and upload the explanation to the 'explain_model' experiment in AML Run History, which stores and manages the explanation data
---

## Setup

If you are using Jupyter Notebook, the widget extensions used by the dashboard should be installed automatically with the package.
If you are using JupyterLab, run the following command:
```
(myenv) $ jupyter labextension install @jupyter-widgets/jupyterlab-manager
```


## Explain

### Run model explainer locally at training time

```python
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.svm import SVC
import pandas as pd
import numpy as np

from azureml.contrib.interpret.lime.lime_explainer import LIMEExplainer
```



### Load the IBM employee attrition data

```python
# Download and extract the IBM employee attrition dataset
outdirname = 'dataset.6.21.19'
try:
    from urllib import urlretrieve
except ImportError:
    from urllib.request import urlretrieve
import zipfile
zipfilename = outdirname + '.zip'
urlretrieve('https://publictestdatasets.blob.core.windows.net/data/' + zipfilename, zipfilename)
with zipfile.ZipFile(zipfilename, 'r') as unzip:
    unzip.extractall('.')
attritionData = pd.read_csv('./WA_Fn-UseC_-HR-Employee-Attrition.csv')

# Drop EmployeeCount: every value is 1, so attrition is independent of this feature
attritionData = attritionData.drop(['EmployeeCount'], axis=1)
# Drop EmployeeNumber: it is merely an identifier
attritionData = attritionData.drop(['EmployeeNumber'], axis=1)
# Drop Over18: every value is 'Y'
attritionData = attritionData.drop(['Over18'], axis=1)
# Drop StandardHours: every value is 80
attritionData = attritionData.drop(['StandardHours'], axis=1)

# Convert the target variable from strings to numerical values
target_map = {'Yes': 1, 'No': 0}
attritionData["Attrition_numerical"] = attritionData["Attrition"].apply(lambda x: target_map[x])
target = attritionData["Attrition_numerical"]

attritionXData = attritionData.drop(['Attrition_numerical', 'Attrition'], axis=1)
```
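The cell above drops `EmployeeCount`, `Over18`, and `StandardHours` because each holds a single value. A minimal sketch of finding such constant columns programmatically and encoding the Yes/No target with `Series.map`; the helper `drop_constant_columns` and the toy frame are hypothetical illustrations, not part of the notebook or the SDK.

```python
from typing import List, Tuple

import pandas as pd


def drop_constant_columns(df: pd.DataFrame) -> Tuple[pd.DataFrame, List[str]]:
    """Drop columns that hold a single distinct value (hypothetical helper)."""
    # nunique(dropna=False) counts NaN as a value, so an all-NaN column is also constant
    constant = [col for col in df.columns if df[col].nunique(dropna=False) <= 1]
    return df.drop(columns=constant), constant


# Toy frame mimicking a few attrition columns (values are illustrative only)
toy = pd.DataFrame({
    "Age": [41, 49, 37],
    "EmployeeCount": [1, 1, 1],
    "Over18": ["Y", "Y", "Y"],
    "StandardHours": [80, 80, 80],
    "Attrition": ["Yes", "No", "Yes"],
})

reduced, dropped = drop_constant_columns(toy)
# Series.map with a dict is the idiomatic form of apply(lambda x: target_map[x]);
# unlike the lambda, it yields NaN instead of raising KeyError for unmapped labels
target = reduced["Attrition"].map({"Yes": 1, "No": 0})
print(dropped)          # ['EmployeeCount', 'Over18', 'StandardHours']
print(target.tolist())  # [1, 0, 1]
```

`EmployeeNumber` is not constant (it is a unique identifier), so it would still need to be dropped explicitly, as the notebook does.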

```python
# Split data into train and test
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(attritionXData,
                                                    target,
                                                    test_size=0.2,
                                                    random_state=0,
                                                    stratify=target)
```
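`stratify=target` makes the 80/20 split keep the same share of each class in both subsets, which matters when one class (here, the leavers) is much rarer than the other. A small check on synthetic labels; the 20%-positive toy data is an assumption for illustration, not the attrition data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic imbalanced labels: 20 positives out of 100 (illustrative only)
y = pd.Series([1] * 20 + [0] * 80)
X = pd.DataFrame({"feature": range(100)})

x_tr, x_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# With stratification, both subsets keep the 20% positive rate
print(y_tr.mean(), y_te.mean())
```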

```python
# Collect the names of the categorical (string-typed) columns
categorical = []
for col, value in attritionXData.items():  # DataFrame.iteritems() was removed in pandas 2.0
    if value.dtype == 'object':
        categorical.append(col)

# Treat every remaining column as numerical
numerical = attritionXData.columns.difference(categorical)
```
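The loop above collects the object-typed (string) columns by hand. A minimal equivalent sketch using pandas' `DataFrame.select_dtypes`, which also avoids `DataFrame.iteritems` (removed in pandas 2.0). The toy frame is hypothetical; its string columns are created with an explicit `dtype="object"` because newer pandas releases can infer a dedicated string dtype instead, which an `'object'` check would not match.

```python
import pandas as pd

toy = pd.DataFrame({
    "Age": [41, 49],
    "BusinessTravel": pd.Series(["Travel_Rarely", "Travel_Frequently"], dtype="object"),
    "MonthlyIncome": [5993, 5130],
    "OverTime": pd.Series(["Yes", "No"], dtype="object"),
})

# Columns holding Python strings have dtype 'object'
categorical = toy.select_dtypes(include="object").columns.tolist()
# Everything else is treated as numerical, as in the notebook
numerical = toy.columns.difference(categorical)

print(categorical)      # ['BusinessTravel', 'OverTime']
print(list(numerical))  # ['Age', 'MonthlyIncome']
```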

### Transform raw features

We can explain raw features by using either a `sklearn.compose.ColumnTransformer` or a list of fitted transformer tuples. The cell below uses `sklearn.compose.ColumnTransformer`.

```python
from sklearn.compose import ColumnTransformer

# Create the preprocessing pipelines for both numeric and categorical data
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])

categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

transformations = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numerical),
        ('cat', categorical_transformer, categorical)])

# Append the classifier to the preprocessing pipeline
# to obtain a full prediction pipeline
clf = Pipeline(steps=[('preprocessor', transformations),
                      ('classifier', SVC(kernel='linear', C=1.0, probability=True))])
```
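To see what the preprocessor hands to the SVM, here is a small sketch that fits the same numeric/categorical pipeline layout on a hypothetical three-column frame. It shows median imputation, the one-hot expansion (one output column per category seen during `fit`, including the `'missing'` fill value), and why `handle_unknown='ignore'` matters: a category unseen at training time encodes as all zeros instead of raising an error. The toy data and the `to_dense` helper are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: two numeric columns and one categorical column
train = pd.DataFrame({
    "Age": [25.0, 40.0, np.nan, 35.0],
    "DistanceFromHome": [1.0, 10.0, 5.0, 2.0],
    "BusinessTravel": pd.Series(["Rarely", "Frequently", "Rarely", np.nan], dtype="object"),
})

numeric_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler())])
categorical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="constant", fill_value="missing")),
    ("onehot", OneHotEncoder(handle_unknown="ignore"))])

preprocessor = ColumnTransformer(transformers=[
    ("num", numeric_transformer, ["Age", "DistanceFromHome"]),
    ("cat", categorical_transformer, ["BusinessTravel"])])


def to_dense(matrix):
    # ColumnTransformer may return a sparse matrix, depending on output density
    return matrix.toarray() if hasattr(matrix, "toarray") else np.asarray(matrix)


encoded = to_dense(preprocessor.fit_transform(train))
# 2 scaled numeric columns + 3 one-hot columns (Frequently, Rarely, missing)
print(encoded.shape)  # (4, 5)

unseen = pd.DataFrame({
    "Age": [30.0],
    "DistanceFromHome": [3.0],
    "BusinessTravel": pd.Series(["Non-Travel"], dtype="object"),
})
# handle_unknown="ignore": the unseen category becomes an all-zero one-hot block
print(to_dense(preprocessor.transform(unseen))[0, 2:])
```

Depending on the scikit-learn version, the last `transform` may emit a warning about the unknown category; it still returns the all-zero encoding.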

### Train the SVM classification model you want to explain

```python
model = clf.fit(x_train, y_train)
```

### Explain predictions on your local machine

```python
# clf.steps[-1][1] returns the trained classification model
explainer = LIMEExplainer(clf.steps[-1][1],
                          initialization_examples=x_train,
                          features=attritionXData.columns,
                          classes=["Not leaving", "leaving"],
                          transformations=transformations)
```

### Generate global explanations
Explain overall model predictions (global explanation)

```python
# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data
# x_train can be passed as well, but with more examples explanations will take longer although they may be more accurate
global_explanation = explainer.explain_global(x_test)

# Note: if you used the PFIExplainer in the previous step, use the next line of code instead
# global_explanation = explainer.explain_global(x_test, true_labels=y_test)
```
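The comments above trade explanation quality against runtime through the number of evaluation examples. One way to cap runtime, sketched here under the assumption that a simple random subsample is representative enough for your data, is pandas' `DataFrame.sample`; `evaluation_sample` and `MAX_EVAL_ROWS` are hypothetical names, not SDK settings.

```python
import pandas as pd

MAX_EVAL_ROWS = 100  # hypothetical runtime budget, not an SDK parameter


def evaluation_sample(frame: pd.DataFrame, max_rows: int = MAX_EVAL_ROWS, seed: int = 0) -> pd.DataFrame:
    """Return at most max_rows rows, drawn reproducibly at random."""
    if len(frame) <= max_rows:
        return frame
    return frame.sample(n=max_rows, random_state=seed)


toy = pd.DataFrame({"Age": range(500)})
subset = evaluation_sample(toy)
print(len(subset))  # 100
```

In the notebook this would be used as `explainer.explain_global(evaluation_sample(x_test))`; for PFIExplainer, the true labels would have to be subset with the same index, e.g. `y_test.loc[subset.index]`.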



```python
# Sorted global feature importance values
print('ranked global importance values: {}'.format(global_explanation.get_ranked_global_values()))
# Corresponding feature names
print('ranked global importance names: {}'.format(global_explanation.get_ranked_global_names()))
# Feature ranks (based on original order of features)
print('global importance rank: {}'.format(global_explanation.global_importance_rank))

# Note: PFIExplainer does not support per class explanations
# Per class feature names
print('ranked per class feature names: {}'.format(global_explanation.get_ranked_per_class_names()))
# Per class feature importance values
print('ranked per class feature values: {}'.format(global_explanation.get_ranked_per_class_values()))
```

ranked global importance values: [0.026288250324504114, 0.018052596847064083, 0.017569285585972982, 0.014402945382440328, 0.012415549561513382, 0.01218572549836201, 0.00890033932568918, 0.005834624362009417, 0.0049394458173303915, 0.0047158640596956635, 0.003872693548007219, 0.0036728451439132295, 0.003350389225780544, 0.0028008221231975335, 0.0025170017563532193, 0.002159784623168122, 0.0019198683936585926, 0.0018438076261085067, 0.0018057333168945808, 0.0017524288479985489, 0.0015482366851013085, 0.0007855161469340192, 0.00044396021209053445, 0.00016584588036985018, 5.885256722718144e-05, 2.1617129731540506e-05, 0.0, 0.0, 0.0, 0.0]
ranked global importance names: ['TotalWorkingYears', 'YearsSinceLastPromotion', 'JobRole', 'EducationField', 'YearsWithCurrManager', 'BusinessTravel', 'NumCompaniesWorked', 'YearsInCurrentRole', 'YearsAtCompany', 'JobSatisfaction', 'Age', 'EnvironmentSatisfaction', 'MaritalStatus', 'DistanceFromHome', 'JobInvolvement', 'RelationshipSatisfaction', 'DailyRa

```python
# Print out a dictionary that holds the sorted feature importance names and values
print('global importance rank: {}'.format(global_explanation.get_feature_importance_dict()))
```

global importance rank: {'TotalWorkingYears': 0.026288250324504114, 'YearsSinceLastPromotion': 0.018052596847064083, 'JobRole': 0.017569285585972982, 'EducationField': 0.014402945382440328, 'YearsWithCurrManager': 0.012415549561513382, 'BusinessTravel': 0.01218572549836201, 'NumCompaniesWorked': 0.00890033932568918, 'YearsInCurrentRole': 0.005834624362009417, 'YearsAtCompany': 0.0049394458173303915, 'JobSatisfaction': 0.0047158640596956635, 'Age': 0.003872693548007219, 'EnvironmentSatisfaction': 0.0036728451439132295, 'MaritalStatus': 0.003350389225780544, 'DistanceFromHome': 0.0028008221231975335, 'JobInvolvement': 0.0025170017563532193, 'RelationshipSatisfaction': 0.002159784623168122, 'DailyRate': 0.0019198683936585926, 'WorkLifeBalance': 0.0018438076261085067, 'HourlyRate': 0.0018057333168945808, 'StockOptionLevel': 0.0017524288479985489, 'TrainingTimesLastYear': 0.0015482366851013085, 'OverTime': 0.0007855161469340192, 'MonthlyRate': 0.00044396021209053445, 'Education': 0.00016584
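The accessors above return the same ranking in several shapes: importance values in descending order, the matching names, the original feature indices in rank order, and a name-to-value dictionary. A plain-Python sketch of deriving all four from instance-level importances, assuming a `[class][example][feature]` layout and assuming global importance is the mean absolute local importance over classes and examples. That is a common aggregation, not confirmed here as LIMEExplainer's exact internal method; `rank_global_importance` is a hypothetical helper.

```python
from typing import Dict, List, Sequence, Tuple


def rank_global_importance(
    local_values: Sequence[Sequence[Sequence[float]]],
    feature_names: Sequence[str],
) -> Tuple[List[str], List[float], List[int], Dict[str, float]]:
    """Aggregate [class][example][feature] local importances into a global ranking."""
    totals = [0.0] * len(feature_names)
    count = 0
    for per_class in local_values:
        for row in per_class:
            for j, value in enumerate(row):
                totals[j] += abs(value)  # sign gives direction, magnitude gives strength
            count += 1
    means = [total / count for total in totals]
    # Original feature indices ordered from most to least important
    rank = sorted(range(len(feature_names)), key=lambda j: means[j], reverse=True)
    names = [feature_names[j] for j in rank]
    values = [means[j] for j in rank]
    return names, values, rank, dict(zip(names, values))


local = [
    [[0.1, -0.3, 0.0], [0.2, 0.1, 0.0]],    # class 0, two examples
    [[-0.1, 0.3, 0.0], [-0.2, -0.1, 0.0]],  # class 1, two examples
]
names, values, rank, importance = rank_global_importance(local, ["Age", "OverTime", "Gender"])
print(names)  # ['OverTime', 'Age', 'Gender']
print(rank)   # [1, 0, 2]
```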

### Explain overall model predictions as a collection of local (instance-level) explanations

```python
# Local feature importance values for all features and all evaluation examples (x_test)
print('local importance values: {}'.format(global_explanation.local_importance_values))
```

local importance values: [[[0.0, 0.0, -0.007809215001117396, 0.0, 0.0, 0.0, -0.03785085637220157, -0.005937088559363696, 0.0, 0.0, 0.0, 0.0, 0.005074402578812942, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.040502273813175685, 0.0, -0.008442183255351657, 0.016567848725368085, -0.0193230941240861, 0.02846758214272112, -0.02053731429943352], [-0.010805830074101273, 0.0, 0.002952792382991854, 0.0, 0.0, 0.003351135250160289, 0.0, 0.0, 0.0, 0.009293002092776994, -0.002938856763814191, 0.0, -0.031654066719520785, 0.0, 0.0, 0.0, 0.0, 0.0, -0.0050856289371381605, 0.0, 0.0, 0.0, 0.0, -0.03586881501233268, 0.0, 0.0, 0.0, 0.0, 0.025548447047071925, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.007891695773672882, 0.0, 0.027172227156562166, -0.010415529587621332, 0.0, 0.0, 0.0, 0.0, -0.004903049470802945, 0.0, 0.0, 0.0, 0.0, -0.04548903747380864, -0.006011738569223218, -0.007371224932725837, 0.0, 0.0, 0.035992322202766394, -0.024299422917924085], [0.0, 0.0, -0.00237749659218




### Generate local explanations
Explain local data points (individual instances)

```python
# Note: PFIExplainer does not support local explanations
# You can pass a specific data point or a group of data points to the explain_local function

# For example, explain the first data point in the test set
instance_num = 1
local_explanation = explainer.explain_local(x_test[:instance_num])
```



```python
# Get the prediction for the explained test point (x_test[:instance_num]) and explain why the model made it
prediction_value = clf.predict(x_test[:instance_num])[0]

# Ranked local values and names are indexed by class, so select the predicted class
sorted_local_importance_values = local_explanation.get_ranked_local_values()[prediction_value]
sorted_local_importance_names = local_explanation.get_ranked_local_names()[prediction_value]

print('local importance values: {}'.format(sorted_local_importance_values))
print('local importance names: {}'.format(sorted_local_importance_names))
```

local importance values: [[0.032122539455673206, 0.011643721240341187, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.0039840281575602805, -0.004189419254535208, -0.007921845178972278, -0.009458187484064781, -0.019469952789731664, -0.021701269591777556, -0.03488871301154546, -0.039734889705300355]]
local importance names: [['YearsSinceLastPromotion', 'DistanceFromHome', 'MaritalStatus', 'MonthlyRate', 'BusinessTravel', 'DailyRate', 'Department', 'Education', 'Gender', 'JobLevel', 'JobRole', 'JobSatisfaction', 'MonthlyIncome', 'Age', 'NumCompaniesWorked', 'OverTime', 'PercentSalaryHike', 'PerformanceRating', 'RelationshipSatisfaction', 'StockOptionLevel', 'TrainingTimesLastYear', 'YearsAtCompany', 'HourlyRate', 'EnvironmentSatisfaction', 'WorkLifeBalance', 'JobInvolvement', 'YearsInCurrentRole', 'YearsWithCurrManager', 'EducationField', 'TotalWorkingYears']]
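The ranked local lists are nested first by class and then by explained example (the output above shows one example inside the selected class), and the ranking appears to be by signed value, with positive contributions first and strong negative ones last. A plain-Python sketch that pairs names with values for one example and re-sorts by magnitude, so the strongest effects in either direction come first; `top_local_contributions` is a hypothetical helper, and the sample data are five of the name/value pairs printed above, placed in a single-class list for illustration.

```python
from typing import List, Sequence, Tuple


def top_local_contributions(
    ranked_names: Sequence[Sequence[Sequence[str]]],
    ranked_values: Sequence[Sequence[Sequence[float]]],
    class_index: int,
    example_index: int = 0,
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k (name, value) pairs with the largest absolute contribution."""
    pairs = list(zip(ranked_names[class_index][example_index],
                     ranked_values[class_index][example_index]))
    # Sort by magnitude so strong negative contributions are not pushed to the end
    pairs.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return pairs[:k]


names = [[["YearsSinceLastPromotion", "DistanceFromHome", "MaritalStatus",
           "EducationField", "TotalWorkingYears"]]]
values = [[[0.032122539455673206, 0.011643721240341187, 0.0,
            -0.03488871301154546, -0.039734889705300355]]]
print(top_local_contributions(names, values, class_index=0, k=2))
```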


## Upload
Upload explanations to Azure Machine Learning Run History

```python
import azureml.core
from azureml.core import Workspace, Experiment, Run
from azureml.contrib.interpret.explanation.explanation_client import ExplanationClient
# Check core SDK version number
print("SDK version:", azureml.core.VERSION)
```

SDK version: 1.2.0


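```python
from dotenv import load_dotenv
from pathlib import Path
import os

# Load environment variables from a .env file (if any), then from auth.env
load_dotenv()
env_path = Path('.') / 'auth.env'
load_dotenv(dotenv_path=env_path)

SUBSCRIPTION_ID = os.getenv('SUBSCRIPTION_ID')
RESOURCE_GROUP = os.getenv('RESOURCE_GROUP')

# List the workspaces that already exist in the resource group
print(Workspace.list(subscription_id=SUBSCRIPTION_ID, resource_group=RESOURCE_GROUP))

# Create a new workspace; this also provisions Application Insights, a Key Vault and a storage account.
# To reuse an existing workspace, call Workspace.get(name=..., subscription_id=..., resource_group=...) instead.
ws = Workspace.create(name='veille_workspace',
                      subscription_id=SUBSCRIPTION_ID,
                      resource_group=RESOURCE_GROUP,
                      create_resource_group=False,
                      location='westeurope')
```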
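`os.getenv` returns `None` for a missing key, so a typo in `auth.env` only surfaces later as an Azure error. A minimal sketch of failing fast instead; `require_env` is a hypothetical helper, and the variable names are the two the cell above reads. The demonstration passes an explicit mapping so the real environment is left untouched.

```python
import os
from typing import Dict, Iterable, Mapping, Optional


def require_env(names: Iterable[str], env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the named settings, raising if any is missing or empty (hypothetical helper)."""
    source = os.environ if env is None else env
    values = {name: source.get(name, "") for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return values


# Placeholder value, not a real subscription ID
try:
    require_env(["SUBSCRIPTION_ID", "RESOURCE_GROUP"],
                env={"SUBSCRIPTION_ID": "00000000-0000-0000-0000-000000000000"})
except RuntimeError as err:
    print(err)  # Missing environment variables: RESOURCE_GROUP
```

In the notebook this would run right after the `load_dotenv(...)` calls, e.g. `settings = require_env(['SUBSCRIPTION_ID', 'RESOURCE_GROUP'])`.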

{'Realestate': [Workspace.create(name='Realestate', subscription_id='68bdd703-8837-469c-80bd-bfb35f3b886f', resource_group='ProjectGroup4')]}
Deploying AppInsights with name veillewoinsightsfbba6bf3.
Deployed AppInsights with name veillewoinsightsfbba6bf3. Took 4.5 seconds.
Deploying KeyVault with name veillewokeyvault383b14f1.
Deploying StorageAccount with name veillewostorage00349e48e.
Deployed KeyVault with name veillewokeyvault383b14f1. Took 17.98 seconds.
Deploying Workspace with name veille_workspace.
Deployed StorageAccount with name veillewostorage00349e48e. Took 21.86 seconds.
Deployed Workspace with name veille_workspace. Took 18.79 seconds.


```python
experiment_name = 'explain_model'
experiment = Experiment(ws, experiment_name)
run = experiment.start_logging()
client = ExplanationClient.from_run(run)
```

```python
# Upload model explanation data for storage or visualization in the web UX
# The explanation can then be downloaded on any compute
# Multiple explanations can be uploaded
client.upload_model_explanation(global_explanation, comment='global explanation: all features')
# Or you can upload only the explanation object with the top k feature info
# client.upload_model_explanation(global_explanation, top_k=2, comment='global explanation: Only top 2 features')
```

```python
# Upload model explanation data for storage or visualization in the web UX
# The explanation can then be downloaded on any compute
# Multiple explanations can be uploaded
client.upload_model_explanation(local_explanation, comment='local explanation for test point 1: all features')

# Alternatively, you can upload only the local explanation object with the top k feature info
# client.upload_model_explanation(local_explanation, top_k=2, comment='local explanation: top 2 features')
```

## Download
Download explanations from Azure Machine Learning Run History

```python
# List uploaded explanations
client.list_model_explanations()
```

```python
for explanation in client.list_model_explanations():
    if explanation['comment'] == 'local explanation for test point 1: all features':
        downloaded_local_explanation = client.download_model_explanation(explanation_id=explanation['id'])
        # You can pass a k value to only download the top k feature importance values
        downloaded_local_explanation_top2 = client.download_model_explanation(top_k=2, explanation_id=explanation['id'])
    elif explanation['comment'] == 'global explanation: all features':
        downloaded_global_explanation = client.download_model_explanation(explanation_id=explanation['id'])
        # You can pass a k value to only download the top k feature importance values
        downloaded_global_explanation_top2 = client.download_model_explanation(top_k=2, explanation_id=explanation['id'])
```
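The loop above only binds `downloaded_global_explanation` when a matching comment exists, so a typo in a comment leads to a `NameError` several cells later. A sketch of a stricter lookup, assuming (as the loop does) that each entry returned by `list_model_explanations()` is a dict with `'id'` and `'comment'` keys; `find_explanation_id` and the sample entries are hypothetical.

```python
from typing import Any, Dict, List


def find_explanation_id(explanations: List[Dict[str, Any]], comment: str) -> str:
    """Return the id of the one explanation with this comment; fail loudly otherwise."""
    matches = [e["id"] for e in explanations if e.get("comment") == comment]
    if not matches:
        raise LookupError("No uploaded explanation with comment " + repr(comment))
    if len(matches) > 1:
        # Re-running the upload cells on the same run adds entries with the same comment
        raise LookupError(f"{len(matches)} explanations share comment {comment!r}")
    return matches[0]


sample = [
    {"id": "expl-a", "comment": "global explanation: all features"},
    {"id": "expl-b", "comment": "local explanation for test point 1: all features"},
]
print(find_explanation_id(sample, "global explanation: all features"))  # expl-a
```

With the real client, this would look like `client.download_model_explanation(explanation_id=find_explanation_id(client.list_model_explanations(), 'global explanation: all features'))`. Raising on duplicates is a design choice for this sketch; picking the most recent upload instead would require knowing how the service orders entries.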
    

## Visualize
Load the visualization dashboard

```python
from interpret_community.widget import ExplanationDashboard
```

```python
ExplanationDashboard(downloaded_global_explanation, model, datasetX=x_test)
```


## Next
Learn about other use cases of the explain package:
1. [Training time: regression problem](../../tabular-data/explain-regression-local.ipynb)
1. [Training time: binary classification problem](../../tabular-data/explain-binary-classification-local.ipynb)
1. [Training time: multiclass classification problem](../../tabular-data/explain-multiclass-classification-local.ipynb)
1. Explain models with engineered features:
    1. [Simple feature transformations](../../tabular-data/simple-feature-transformations-explain-local.ipynb)
    1. [Advanced feature transformations](../../tabular-data/advanced-feature-transformations-explain-local.ipynb)
1. [Run explainers remotely on Azure Machine Learning Compute (AMLCompute)](../remote-explanation/explain-model-on-amlcompute.ipynb)
1. Inferencing time: deploy a classification model and explainer:
    1. [Deploy a locally-trained model and explainer](../scoring-time/train-explain-model-locally-and-deploy.ipynb)
    1. [Deploy a remotely-trained model and explainer](../scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb)