# Programming Languages: Python and R


## dr inż. Patryk Jasik
### Division of Theoretical Physics and Quantum Information
### Institute of Physics and Computer Science
### Faculty of Applied Physics and Mathematics
### Gdansk University of Technology

# DALEX
## https://dalex.drwhy.ai/

# Installation
## !pip install dalex -U

In [None]:
#loading the necessary packages
import dalex as dx

import pandas as pd
import numpy as np

from sklearn.neural_network import MLPClassifier, MLPRegressor
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer

import warnings
warnings.filterwarnings('ignore')

In [None]:
#version of Dalex
dx.__version__

In [None]:
#the dataset - measurements of physical and chemical properties of Portuguese Vinho Verde wines (white and red)
wine = pd.read_csv("data/winequality-all.csv", comment="#")
wine

# Goal: create a regression model and explain its behaviour at the local and global level
### target variable: alcohol
### predictors: fixed.acidity, volatile.acidity, citric.acid, residual.sugar, chlorides, free.sulfur.dioxide, total.sulfur.dioxide, density, pH, sulphates, response

In [None]:
#predictors
#we will leave out three observations (the first, the second-to-last, and the last) to test explanation methods on them

X = wine.iloc[1:-2, 0:12].drop(columns="alcohol")
X

In [None]:
#the target variable (alcohol), restricted to the same rows as X
y = wine["alcohol"].iloc[1:-2]
y

In [None]:
X.columns

## We will create the pipeline in order to perform automatic preprocessing, which will allow us to prepare the dataset for modeling
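
Standardization rescales each feature to zero mean and unit variance, z = (x - mean) / std. The sketch below is a plain-Python illustration on made-up numbers; the `ph_values` list and the `standardize` helper are hypothetical and not part of the notebook's data or of scikit-learn. `StandardScaler` performs the same computation column by column, using the population standard deviation.

```python
import statistics


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


ph_values = [3.0, 3.2, 3.4, 3.6]  # hypothetical measurements, not taken from the dataset
z = standardize(ph_values)
print([round(v, 3) for v in z])
```

After the transformation the values are expressed in units of standard deviations from the mean, which matters for distance-based models such as k-nearest neighbours.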

In [None]:
#Let's create the transformer for numerical features.
#We can use any transformation of predictors here and in this case, it will be standardization.

numerical_features = ['fixed.acidity',
                      'volatile.acidity',
                      'citric.acid',
                      'residual.sugar',
                      'chlorides',
                      'free.sulfur.dioxide',
                      'total.sulfur.dioxide',
                      'density',
                      'pH',
                      'sulphates',
                      'response']

numerical_transformer = Pipeline(
    steps=[
        ('scaler', StandardScaler()) #the name of the step in pipeline 
    ]
)

#numerical_transformer = Pipeline(
#    steps=[
#        ('scaler', MinMaxScaler())
#    ]
#)

In [None]:
#creation of the preprocessor, which applies the numerical transformer to the numerical features

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_transformer, numerical_features)
    ]
)

In [None]:
#Let's create the regression model. The k-nearest neighbours (kNN) regressor is used here;
#the Multi-layer Perceptron (MLP) below is a commented-out alternative.
#regMLP = MLPRegressor(hidden_layer_sizes=(150,100,50), max_iter=500, random_state=0)
regkNN = KNeighborsRegressor()

In [None]:
#The final pipeline consists of two steps: preprocessor and regressor

reg = Pipeline(steps=[('preprocessor', preprocessor),
                      ('regressor', regkNN)])

In [None]:
reg.steps

In [None]:
#Let's train the kNN model using the created pipeline.

reg.fit(X, y)

## Now it is time to create the Dalex explainer

Black-box models may have very different structures. The `Explainer` class creates a unified representation of a model, which can then be processed by various explanation methods. Its methods produce explanation objects, which store the main output in the `result` attribute and can be visualised using the `plot` method.

![title](local_global_explanations.png)


In [None]:
exp = dx.Explainer(reg, X, y)

In [None]:
#We will check the local explanations using previously skipped observations.

first = wine[wine.index == wine.index.min()].iloc[:, 0:12].drop(columns='alcohol')
before_last = wine[wine.index == wine.index.max()-1].iloc[:, 0:12].drop(columns='alcohol')
last = wine[wine.index == wine.index.max()].iloc[:, 0:12].drop(columns='alcohol')

In [None]:
first

In [None]:
before_last

In [None]:
last

In [None]:
#Let's compare the original values of the target variable with the predicted ones.
y[0:15]

In [None]:
exp.predict(X)[0:15]

In [None]:
print("First original", wine.alcohol.iloc[0])
print("Before last original", wine.alcohol.iloc[-2])
print("Last original", wine.alcohol.iloc[-1])

In [None]:
print("First prediction", exp.predict(first).round(1))
print("Before last prediction", exp.predict(before_last).round(1))
print("Last prediction", exp.predict(last).round(1))

# Prediction level - Local explanations

## Break Down method 

The basic idea is to calculate the contribution of a variable to the prediction f(x) as the change in the expected model response when that variable is fixed, given the variables fixed before it. We start with the mean model response and successively add variables to the conditioning. The order in which the variables are added influences the contribution values. If the model is additive, the contributions are the same for every order. If the model is non-additive and has p variables, there are p! possible orders, so examining all of them is computationally expensive.
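
The following minimal sketch illustrates this idea in plain Python. It is not the Dalex implementation: the `model` function, the background `data`, and the helper names are all hypothetical. The expected response is estimated by overwriting the already fixed variables in every background row and averaging the predictions.

```python
def model(row):
    """Hypothetical non-additive model: the interaction term makes the order matter."""
    a, b = row
    return a + 2 * b + a * b


def mean_prediction(data, fixed):
    """Average prediction after overwriting the columns listed in `fixed`."""
    total = 0.0
    for row in data:
        row = list(row)
        for idx, value in fixed.items():
            row[idx] = value
        total += model(row)
    return total / len(data)


def break_down(data, observation, order):
    """Contribution of each variable for one fixed ordering of the variables."""
    fixed = {}
    previous = mean_prediction(data, fixed)  # intercept: mean model response
    contributions = {}
    for idx in order:
        fixed[idx] = observation[idx]
        current = mean_prediction(data, fixed)
        contributions[idx] = current - previous
        previous = current
    return contributions


data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # hypothetical background data
observation = (2.0, 3.0)

print(break_down(data, observation, order=[0, 1]))
print(break_down(data, observation, order=[1, 0]))
```

In both orders the contributions add up to the difference between the prediction for the observation and the mean response, but the split between the two variables changes because of the interaction term.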

In [None]:
#The predict_parts function calculates prediction-level variable attributions as Break Down, Shapley Values, or Shap Values
#Let's look at the Break Down results

bd_first = exp.predict_parts(first, type='break_down', label="first")
bd_interactions_first = exp.predict_parts(first, type='break_down_interactions', label="first")

In [None]:
bd_first.result

In [None]:
bd_before_last = exp.predict_parts(before_last, type='break_down', label="before_last")
bd_interactions_before_last = exp.predict_parts(before_last, type='break_down_interactions', label="before_last")

In [None]:
bd_before_last.result

In [None]:
bd_last = exp.predict_parts(last, type='break_down', label="last")
bd_interactions_last = exp.predict_parts(last, type='break_down_interactions', label="last")

In [None]:
bd_last.result

In [None]:
bd_first.plot(max_vars=11)

In [None]:
bd_before_last.plot(max_vars=11)

In [None]:
bd_last.plot(max_vars=11)

## Shapley values

The Shapley value is a model-agnostic method, so we can use it for any type of model. Its main benefit is the additive feature attribution property. It is a local explanation. Compared with Break Down, the Shapley value is a generalization: Break Down represents only one of all possible variable orders, whereas here we consider all of them, so for p features in the dataset we have p! orders. The output is the average over the possible orders.

Like Break Down, the Shapley value method decomposes a prediction into parts, but the approach is slightly different. It is based on the idea of averaging the contribution of a given variable over all, or a large number of, possible orders.

An important practical limitation of the general model-agnostic method is that, for large models, the calculation of Shapley values is time-consuming. In specific situations they can be calculated very quickly, for example for additive models and for tree-based models.
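
As an illustration of the averaging, the sketch below enumerates every ordering of three variables for a toy model and averages the order-specific contributions. The `model`, the background `data`, and the helper names are hypothetical. Exhaustive enumeration is feasible only for a handful of variables, which is why, in the cells below, a limited number `B` of random orderings is used instead.

```python
from itertools import permutations, product


def model(row):
    """Hypothetical model with one interaction (a * b) and one additive term (c)."""
    a, b, c = row
    return a + 2 * b + a * b - c


def mean_prediction(data, fixed):
    """Average prediction after overwriting the columns listed in `fixed`."""
    total = 0.0
    for row in data:
        row = list(row)
        for idx, value in fixed.items():
            row[idx] = value
        total += model(row)
    return total / len(data)


def shapley_values(data, observation):
    """Average the order-specific contributions over all p! orderings."""
    n_vars = len(observation)
    totals = [0.0] * n_vars
    orders = list(permutations(range(n_vars)))
    for order in orders:
        fixed = {}
        previous = mean_prediction(data, fixed)
        for idx in order:
            fixed[idx] = observation[idx]
            current = mean_prediction(data, fixed)
            totals[idx] += current - previous
            previous = current
    return [total / len(orders) for total in totals]


data = list(product([0.0, 1.0], repeat=3))  # hypothetical background data
observation = (2.0, 3.0, 1.0)

phi = shapley_values(data, observation)
print(phi)
```

The values sum to the difference between the prediction for the observation and the mean prediction, which is the additive feature attribution property mentioned above.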

In [None]:
#With 11 predictors we have a lot of possible orders :)
import math
math.factorial(11)

In [None]:
#Let's look at the Shapley values
#B is the number of random paths used to calculate variable attributions (default is 25)
# B=100 :)
sh_first = exp.predict_parts(first, type='shap', label='first', B=100)

In [None]:
sh_first.result.head(60)

In [None]:
sh_first.result.tail(60)

In [None]:
sh_first.result.loc[sh_first.result.B == 0]

In [None]:
sh_first.plot(bar_width = 16, max_vars=11)

In [None]:
sh_before_last = exp.predict_parts(before_last, type='shap', label='before_last')

In [None]:
sh_before_last.result.head(60)

In [None]:
sh_before_last.result.loc[sh_before_last.result.B == 0]

In [None]:
sh_before_last.plot(bar_width = 16, max_vars=11)

In [None]:
sh_last = exp.predict_parts(last, type='shap', label='last')

In [None]:
sh_last.result.head(60)

In [None]:
sh_last.result.loc[sh_last.result.B == 0]

In [None]:
sh_last.plot(bar_width = 16, max_vars=11)

## Ceteris Paribus profiles
### "all other things being equal" or "other things held constant" or "all else unchanged"
### https://en.wikipedia.org/wiki/Ceteris_paribus

Ceteris Paribus is a Latin phrase meaning “other things held constant” or “all else unchanged”. Ceteris Paribus (CP) profiles are designed to show model response around a single point in the feature space. They show how the model response depends on changes in a single input variable, keeping all other variables unchanged. They work for any Machine Learning model and allow for model comparisons to better understand how a model is working.
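
A CP profile can be sketched in a few lines of plain Python: copy the observation, replace one variable with each value from a grid, and record the prediction. The `model` function, the observation, and the grid below are hypothetical and serve only to illustrate the mechanism.

```python
def model(row):
    """Hypothetical model standing in for any fitted predictor."""
    sugar, ph = row
    return 10.0 - 0.1 * sugar + 0.5 * ph * ph


def ceteris_paribus(observation, variable, grid):
    """Predictions for copies of `observation` in which only `variable` changes."""
    profile = []
    for value in grid:
        modified = list(observation)
        modified[variable] = value  # all other variables stay unchanged
        profile.append((value, model(modified)))
    return profile


observation = (5.0, 3.0)  # hypothetical (residual sugar, pH)
grid = [2.8, 3.0, 3.2, 3.4]

for value, prediction in ceteris_paribus(observation, variable=1, grid=grid):
    print(f"pH={value:.1f} -> prediction={prediction:.2f}")
```

The profile passes through the model's actual prediction at the observed value of the variable, which is the point usually marked on a CP plot.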

In [None]:
#Let's create the Ceteris Paribus profiles
cp_first = exp.predict_profile(first, label="first")
cp_before_last = exp.predict_profile(before_last, label="before_last")
cp_last = exp.predict_profile(last, label="last")

In [None]:
cp_first.result

In [None]:
X.describe()

In [None]:
cp_before_last.result

In [None]:
cp_last.result

In [None]:
cp_first.plot([cp_before_last, cp_last])

# Model level - Global explanations

In [None]:
#Let's calculate the model-level performance measures
mp = exp.model_performance(model_type = 'regression')
mp.result

In [None]:
#the extraction of the R2 metric (first row, by position)
mp.result.r2.iloc[0]

In [None]:
mp.residuals

In [None]:
mp.plot()
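
For reference, two of the regression measures used in this notebook, RMSE and R2, can be written out by hand. The sketch below assumes their usual textbook definitions and uses made-up numbers; the `rmse` and `r2` helpers are hypothetical and are not Dalex functions.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [9.0, 10.0, 11.0, 12.0]  # hypothetical alcohol values
y_pred = [9.5, 10.0, 10.5, 12.0]  # hypothetical predictions

print(round(rmse(y_true, y_pred), 4), round(r2(y_true, y_pred), 4))
```

An R2 close to 1 means that the residual sum of squares is small compared with the total variation of the target.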

## Permutation-based variable importance 

The idea is very simple: to assess how important a variable V is, we compare the initial model with the model from which the effect of V has been removed. How do we remove the effect of V? In the permutation-based variable-importance method, the effect of a variable is removed through a random reshuffling of its values. We take the original data, permute the values of V to get “new” data, and calculate the predictions on it.

**If a variable is important in a model, then after its permutation the model prediction should be less precise.**
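
A minimal plain-Python sketch of this procedure is shown below. The `model`, the generated data, and the helper names are hypothetical. Importance is measured here as the increase in RMSE after permuting one column, which is one common convention and not necessarily the exact quantity reported by Dalex.

```python
import math
import random


def model(row):
    """Hypothetical fitted model: it uses only the first variable."""
    return 3.0 * row[0]


def rmse(data, target):
    """Root mean squared error of the model on the given rows."""
    errors = [(model(row) - t) ** 2 for row, t in zip(data, target)]
    return math.sqrt(sum(errors) / len(errors))


def permutation_importance(data, target, variable, rng):
    """Increase in RMSE after randomly reshuffling one column."""
    column = [row[variable] for row in data]
    rng.shuffle(column)
    permuted = [
        row[:variable] + (value,) + row[variable + 1:]
        for row, value in zip(data, column)
    ]
    return rmse(permuted, target) - rmse(data, target)


rng = random.Random(0)
data = [(rng.random(), rng.random()) for _ in range(200)]  # hypothetical data
target = [3.0 * x0 for x0, _ in data]

for variable in (0, 1):
    print(variable, round(permutation_importance(data, target, variable, rng), 3))
```

Permuting the variable that the model ignores leaves the loss unchanged, while permuting the variable it relies on increases the loss.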

In [None]:
#Let's calculate model-level variable importance.
vi = exp.model_parts(loss_function='rmse', N=None, B=100)
vi.result

In [None]:
vi.plot(max_vars=11)

In [None]:
X.columns

In [None]:
vi_grouped = exp.model_parts(variable_groups={'sulfur': ['free.sulfur.dioxide', 'total.sulfur.dioxide', 'sulphates'],
                                              'acid': ['fixed.acidity', 'volatile.acidity', 'citric.acid'],
                                              'sugar': ['residual.sugar'],
                                              'density': ['density'],
                                              'other': ['chlorides', 'pH', 'response']}, loss_function='rmse',
                             N=None, B=100)
vi_grouped.result

In [None]:
vi_grouped.plot()

## Partial Dependence Profile

The general idea behind the design of PD profiles is to show how the expected model prediction value behaves as a function of the selected explanatory variable. For one model we can construct a general PD profile using all observations from the data set or several profiles for observation subgroups. A comparison of subgroup-specific profiles can provide important insight, for example, into the stability of the model prediction.

1. Profiles can be created for all the observations in the set, as well as for subgroups defined by another variable. For example, we can see how a specific variable behaves when the data are split by energy, intensity, or other factors.
2. We can detect some complicated variable relationships. For example, given PD profiles for two models, we may see that the simple model (linear regression) does not detect any dependence, while the profile for the black-box model (random forest) reveals one.
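
A PD profile can be sketched as the average of Ceteris Paribus profiles over the observations, computed either for the whole data set or separately for each subgroup. The plain-Python example below uses a hypothetical `model` and made-up data in which the second column plays the role of a grouping variable.

```python
def model(row):
    """Hypothetical model whose response to x depends on the group."""
    x, group = row
    return x * x if group == 1 else x


def partial_dependence(data, variable, grid):
    """Average prediction over all rows with `variable` set to each grid value."""
    profile = []
    for value in grid:
        predictions = []
        for row in data:
            modified = list(row)
            modified[variable] = value
            predictions.append(model(modified))
        profile.append(sum(predictions) / len(predictions))
    return profile


data = [(0.5, 0), (1.5, 0), (2.5, 1), (3.5, 1)]  # hypothetical (x, group) rows
grid = [0.0, 1.0, 2.0]

overall = partial_dependence(data, 0, grid)
by_group = {
    g: partial_dependence([row for row in data if row[1] == g], 0, grid)
    for g in (0, 1)
}
print(overall)
print(by_group)
```

The overall profile hides the fact that the two groups respond differently to x, while the group-specific profiles make the difference visible.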

In [None]:
#These functions calculate explanations that explore model response as a function of selected variables.

#partial-dependence profile
pdp_num = exp.model_profile(type = 'partial', label="pdp", N=None)

In [None]:
pdp_num.plot()

In [None]:
# PD profiles for the residual.sugar and pH variables
pdp_num = exp.model_profile(variables = ["residual.sugar", "pH"] , type = "partial", label="pdp", N=None)

# plot PDP
pdp_num.plot()

In [None]:
# PD profiles for the residual.sugar and pH variables, grouped by response
pdp_num_group = exp.model_profile(variables = ["residual.sugar", "pH"], groups = "response", type = "partial", N=None)

# plot PDP
pdp_num_group.plot()

# Yes, You can :)