# Explainable AI










**Ariel Rossanigo**

git clone git@github.com:arielrossanigo/ ??

### Who am I?

* Ariel Rossanigo
* Artificial Intelligence teacher at UCSE-DAR
* Developer, Data Scientist
* Co-Founder of Bloom AI

### Explainable / Interpretable AI

https://christophm.github.io/interpretable-ml-book/



#### *Interpretability is the degree to which a human can understand the cause of a decision.*

#### *Interpretability is the degree to which a human can consistently predict the model's result* 



### Intrinsically interpretable models


| Algorithm | Linear | Monotone | Interaction | Task |
| --- | --- | --- | --- | --- |
| Linear regression | Yes | Yes | No | regression |
| Logistic regression | No | Yes | No | classification |
| Decision trees | No | Some | Yes | classification, regression |

There are more...
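As a minimal sketch of what "interpretable by nature" means for the first row of the table, the snippet below fits a linear regression on synthetic data (the data and the names `X_toy`, `y_toy`, `linear_model` are assumptions made for illustration, not part of the talk) and reads the effect of a feature directly from its coefficient.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noiseless data (assumed for illustration): y = 3*x0 - 2*x1 + 5
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(500, 2))
y_toy = 3.0 * X_toy[:, 0] - 2.0 * X_toy[:, 1] + 5.0

linear_model = LinearRegression().fit(X_toy, y_toy)
print("coefficients:", linear_model.coef_.round(3))
print("intercept:", round(float(linear_model.intercept_), 3))

# Raising feature 0 by one unit (others fixed) shifts the prediction
# by exactly the coefficient of feature 0.
x = np.array([[1.0, 1.0]])
x_shifted = x + np.array([[1.0, 0.0]])
delta = linear_model.predict(x_shifted)[0] - linear_model.predict(x)[0]
print("effect of +1 on feature 0:", round(float(delta), 3))
```

The same reading is not available for a black box model, which is why the model-agnostic methods later in the talk are needed.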


In [None]:
%matplotlib inline
%load_ext autoreload
%autoreload 2
import warnings
warnings.filterwarnings('ignore')

import eli5
import shap
shap.initjs()

import pandas as pd
import numpy as np
from zipfile import ZipFile

from sklearn_pandas import DataFrameMapper

In [None]:
with ZipFile('Bike-Sharing-Dataset.zip') as myzip:
    by_day = pd.read_csv(myzip.open('day.csv'))
    by_hour = pd.read_csv(myzip.open('hour.csv'))
    lines = myzip.open('Readme.txt').readlines()

In [None]:
# Print an excerpt of the dataset Readme (lines 60 to 85)
for line in lines[59:85]:
    print(line.decode('ascii').strip())

In [None]:
season_names = {
    1: 'spring',
    2: 'summer',
    3: 'fall',
    4: 'winter',
}

In [None]:
by_hour.head()

In [None]:
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.model_selection import train_test_split

In [None]:
features = DataFrameMapper([
    ('season', None),
    ('yr', None),
    ('mnth', None),
    ('hr', None),
    ('holiday', None),
    ('weekday', None),
    ('workingday', None),
    ('weathersit', None),
    ('temp', None),
    ('atemp', None),
    ('hum', None),
    ('windspeed', None),
], df_out=False)

# features.fit_transform(by_hour)
# features.transformed_names_

In [None]:
train, test = train_test_split(by_hour, test_size=0.3, random_state=42)
train = train.reset_index(drop=True).copy()
test = test.reset_index(drop=True).copy()


In [None]:
from sklearn.tree import DecisionTreeRegressor
from sklearn import metrics

In [None]:
features.fit(train)
X_train = features.transform(train)
y_train = train.cnt.values

features_names = features.transformed_names_

X_test = features.transform(test)
y_test = test.cnt.values

In [None]:
def show_error(model):
    # RMSE computed as sqrt(MSE); this avoids the `squared=False` argument,
    # which was deprecated in recent scikit-learn releases
    err_train = np.sqrt(metrics.mean_squared_error(y_train, model.predict(X_train)))
    err_test = np.sqrt(metrics.mean_squared_error(y_test, model.predict(X_test)))
    print(f"RMSE train: {err_train:.2f}")
    print(f"RMSE test: {err_test:.2f}")

In [None]:
from dtreeviz.trees import *

In [None]:
the_tree = DecisionTreeRegressor(max_depth=4)
the_tree.fit(X_train, y_train)
print('Depth:', the_tree.get_depth())
show_error(the_tree)

In [None]:
X_sample, _, y_sample, _ = train_test_split(X_test, y_test, train_size=1000)

In [None]:
viz = dtreeviz(the_tree, X_sample, y_sample, target_name='# rental bikes', feature_names=features_names, 
               orientation ='LR')
viz.view()

In [None]:
viz_leaf_target(the_tree, X_sample, y_sample, target_name='# rental bikes', feature_names=features_names, 
                figsize=(15, 8))

In [None]:
test['leaves'] = the_tree.apply(X_test)
test.groupby('leaves').cnt.mean().round(0).sort_values()

In [None]:
leaf = 11
case = test[test.leaves==leaf].head(1).index[0]
print(explain_prediction_path(the_tree, X_test[case], y_test[case], 
                              target_name='# rental bikes', feature_names=features_names))

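#### Advantages

* Capture interactions between features
* Split the data into distinct groups
* Have a natural visualization
* Produce good, human-friendly explanations

#### Disadvantages

* Trees fail to deal with linear relationships (they approximate them with step functions)
* Lack of smoothness: a small change in a feature can cause a jump in the prediction
* Unstable: small changes in the training data can produce a very different tree
* The number of terminal nodes increases quickly with depth (up to 2^depth)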
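Two of these disadvantages can be seen on a toy problem. The sketch below (synthetic data, with hypothetical names such as `x_lin` and `leaves_by_depth`) fits trees of increasing depth to a perfectly linear target: the prediction is always a step function, and the number of leaves is bounded by 2^depth.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Perfectly linear target (assumed for illustration): y = 2 * x
x_lin = np.linspace(0, 10, 200).reshape(-1, 1)
y_lin = 2.0 * x_lin.ravel()

leaves_by_depth = {}
distinct_by_depth = {}
for depth in (1, 2, 3, 4):
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(x_lin, y_lin)
    leaves_by_depth[depth] = tree.get_n_leaves()
    # A tree predicts one constant per leaf, so a straight line becomes a staircase.
    distinct_by_depth[depth] = len(np.unique(tree.predict(x_lin)))
    print(f"depth={depth}: leaves={leaves_by_depth[depth]}, "
          f"distinct predictions={distinct_by_depth[depth]}")
```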

### Model Agnostic Methods

#### Partial dependence plots


In [None]:
from sklearn.inspection import plot_partial_dependence
from lightgbm import LGBMRegressor
from xgboost import XGBRegressor

In [None]:
clf = XGBRegressor()
clf.fit(X_train, y_train)
show_error(clf)

In [None]:
import matplotlib.pyplot as plt

In [None]:
clf.feature_importances_

In [None]:
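# Columns 3 ('hr') and 8 ('temp'), plus their two-way interaction.
# Named pdp_features so it does not overwrite the `features` DataFrameMapper.
pdp_features = [3, 8, (3, 8)]
fig, ax = plt.subplots(1, 1, figsize=(15, 6))
plot_partial_dependence(clf, X_train, pdp_features, ax=ax, feature_names=features_names);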
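What a one-dimensional partial dependence curve computes can be sketched by hand: force one column to each value of a grid, predict for every row, and average. The helper `partial_dependence_1d`, the toy data, and the `GradientBoostingRegressor` stand-in below are assumptions made for illustration; this is not how scikit-learn implements it internally.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def partial_dependence_1d(model, X, feature, grid):
    """Average prediction when column `feature` is forced to each grid value."""
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value  # overwrite the feature for every row
        averages.append(model.predict(X_mod).mean())
    return np.array(averages)

# Toy data (assumed): the target grows linearly with column 0.
rng = np.random.default_rng(0)
X_toy = rng.uniform(0, 1, size=(300, 3))
y_toy = 4.0 * X_toy[:, 0] + np.sin(6 * X_toy[:, 1])
toy_model = GradientBoostingRegressor(random_state=0).fit(X_toy, y_toy)

grid = np.linspace(0.05, 0.95, 10)
pd_curve = partial_dependence_1d(toy_model, X_toy, feature=0, grid=grid)
print(pd_curve.round(2))
```

Because every row is reused with the feature overwritten, the curve can include unrealistic feature combinations when features are correlated.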

In [None]:
sorted_idx = clf.feature_importances_.argsort()
plt.barh(np.array(features_names)[sorted_idx], clf.feature_importances_[sorted_idx])
plt.xlabel("XGBoost feature importance");

In [None]:
booster = clf.get_booster()
booster.get_score(importance_type='gain')

Available values for `importance_type`:

* **weight**: the number of times a feature is used to split the data across all trees.
* **gain**: the average gain across all splits the feature is used in.
* **cover**: the average coverage across all splits the feature is used in.
* **total_gain**: the total gain across all splits the feature is used in.
* **total_cover**: the total coverage across all splits the feature is used in.

### Permutation feature importance

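* Included in sklearn (https://scikit-learn.org/stable/modules/permutation_importance.html)

$$i_j = s - \frac{1}{K} \sum_{k=1}^{K} s_{k,j}$$

where $s$ is the model's reference score on the data, $K$ is the number of times feature $j$ is shuffled, and $s_{k,j}$ is the score after the $k$-th shuffle of feature $j$.

* We will use eli5, which already ships with an implementation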
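The formula can be written out in a few lines. The function `permutation_importance_sketch` and the toy data are hypothetical, intended only to show the computation of $s - \frac{1}{K}\sum_k s_{k,j}$; in practice the sklearn or eli5 implementations would be used.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

def permutation_importance_sketch(model, X, y, score_fn, n_repeats=5, seed=0):
    """Drop in score when each column is shuffled, averaged over n_repeats."""
    rng = np.random.default_rng(seed)
    baseline = score_fn(y, model.predict(X))  # s: score on the intact data
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        scores = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling column j breaks its relationship with the target.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            scores.append(score_fn(y, model.predict(X_perm)))  # s_{k,j}
        importances[j] = baseline - np.mean(scores)
    return importances

# Toy data (assumed): column 0 matters a lot, column 1 a little, column 2 not at all.
rng = np.random.default_rng(1)
X_toy = rng.normal(size=(400, 3))
y_toy = 5.0 * X_toy[:, 0] + 0.5 * X_toy[:, 1]
toy_model = LinearRegression().fit(X_toy, y_toy)

toy_importances = permutation_importance_sketch(toy_model, X_toy, y_toy, r2_score)
print(toy_importances.round(4))
```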

In [None]:
from eli5.sklearn import PermutationImportance

In [None]:
perm = PermutationImportance(clf, scoring='neg_mean_squared_error', random_state=1, cv="prefit")
perm.fit(X_train, y_train)
eli5.show_weights(perm, feature_names=features_names)

In [None]:
# test.groupby('hr').cnt.mean()
# test.iloc[25]

In [None]:
eli5.show_prediction(clf, X_test[25], show_feature_values=True, feature_names=features_names)

### Global surrogate model

* Train an interpretable model to predict the outputs of our black box

#### Advantages

* It is simple and intuitive

#### Disadvantages

* You draw conclusions about the model, not about the data
* There is no clear cut-off for the fidelity score (e.g. R²) at which the surrogate is "close enough"
* The surrogate can be close to the black box for one subset of the dataset but widely divergent for another
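A minimal sketch of the idea, assuming a random forest as the black box and a shallow decision tree as the surrogate (both choices, and the synthetic data, are illustrative assumptions). The surrogate is trained on the black box's predictions, not on the real target, and its fidelity is measured with R² against those predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeRegressor

# Toy data (assumed for illustration)
rng = np.random.default_rng(0)
X_toy = rng.uniform(-1, 1, size=(500, 3))
y_toy = 3.0 * X_toy[:, 0] + X_toy[:, 1] ** 2 + rng.normal(scale=0.1, size=500)

# 1. The black box is trained on the real target.
black_box = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_toy, y_toy)
black_box_preds = black_box.predict(X_toy)

# 2. The surrogate is trained to imitate the black box predictions.
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0)
surrogate.fit(X_toy, black_box_preds)

# 3. Fidelity: how much of the black box's behaviour the surrogate reproduces.
fidelity = r2_score(black_box_preds, surrogate.predict(X_toy))
print(f"Surrogate fidelity (R^2 vs. black box): {fidelity:.2f}")
```

A high fidelity only says the surrogate mimics the black box on this data; it says nothing about how well either model fits reality.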

###  Local Surrogate (LIME)

*Local interpretable model-agnostic explanations*

The recipe for training local surrogate models:

* Select your instance of interest for which you want to have an explanation of its black box prediction.
* Perturb your dataset and get the black box predictions for these new points.
* Weight the new samples according to their proximity to the instance of interest.
* Train a weighted, interpretable model on the dataset with the variations.
* Explain the prediction by interpreting the local model.
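The recipe above can be sketched with numpy and scikit-learn. This is a simplified illustration, not the `lime` library's implementation: the function `lime_sketch`, the Gaussian perturbation around the instance, the exponential kernel, and the default `kernel_width=1.0` are all assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

def lime_sketch(predict_fn, X_train, instance, n_samples=2000, kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear model around `instance`."""
    rng = np.random.default_rng(seed)
    scale = X_train.std(axis=0)
    # Perturb: sample points around the instance, scaled by each feature's std.
    perturbed = instance + rng.normal(size=(n_samples, X_train.shape[1])) * scale
    preds = predict_fn(perturbed)  # black box predictions for the new points
    # Weight: closer samples (in standardized units) count more.
    distances = np.sqrt((((perturbed - instance) / scale) ** 2).sum(axis=1))
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)
    # Train a weighted, interpretable (linear) model on the perturbed data.
    local_model = Ridge(alpha=1.0).fit(perturbed, preds, sample_weight=weights)
    return local_model.coef_

# Toy black box (assumed): non-linear in column 0, linear in column 1, ignores column 2.
def toy_black_box(X):
    return X[:, 0] ** 2 + 3.0 * X[:, 1]

rng = np.random.default_rng(42)
X_toy = rng.normal(size=(500, 3))
instance = np.array([1.0, 0.0, 0.0])

local_coefs = lime_sketch(toy_black_box, X_toy, instance)
# Around x0 = 1 the local slope of x0**2 is about 2; for column 1 it is about 3.
print(local_coefs.round(2))
```

Changing `kernel_width` changes how local the explanation is, which is why that parameter deserves care.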

In [None]:
import lime
import lime.lime_tabular
from lime.lime_tabular import LimeTabularExplainer

In [None]:
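# LimeTabularExplainer expects the indices of the categorical columns, not their names
categorical_names = ['season', 'yr', 'mnth', 'hr', 'holiday', 'weekday', 'workingday', 'weathersit']
categorical_features = [list(features_names).index(name) for name in categorical_names]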

In [None]:
explainer = LimeTabularExplainer(X_train, # does not work with pandas DataFrames...
                                 feature_names=features_names, 
                                 class_names=['# rental bikes'], 
                                 categorical_features=categorical_features, 
                                 verbose=True, 
                                 mode='regression')

i = 25
exp = explainer.explain_instance(X_test[i], clf.predict, num_features=5)

exp.show_in_notebook(show_table=True)

In [None]:
explainer = LimeTabularExplainer(X_train, # does not work with pandas DataFrames...
                                 feature_names=features_names, 
                                 class_names=['# rental bikes'], 
                                 categorical_features=categorical_features, 
                                 verbose=True, 
                                 mode='regression', 
                                 kernel_width=0.5) # <= Be careful with this parameter,
                                                   # especially with high-dimensional data

i = 25
exp = explainer.explain_instance(X_test[i], clf.predict, num_features=5)

exp.show_in_notebook(show_table=True)


**Does anything look odd?**

The **yr** feature

### Shapley values

<div><img src="./imgs/shapley-instance.png" width="40%" style="float: left; margin: 10px;" align="middle">
    
    A prediction can be explained by assuming that each feature value of the instance is a "player" in a game where the prediction is the payout. Shapley values, a method from coalitional game theory, tell us how to fairly distribute the "payout" among the features.



<div><img src="./imgs/shapley-coalitions.png" width="40%" style="float: left; margin: 10px;" align="middle">

    In the book's apartment-price example, for each of these coalitions we compute the predicted apartment price with and without the feature value "cat-banned" and take the difference to get the marginal contribution. The Shapley value is the (weighted) average of these marginal contributions.
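For a handful of features the weighted average over coalitions can be computed exactly. The sketch below uses a hypothetical `exact_shapley` helper and a toy model; as a simplifying assumption, features outside a coalition are replaced by the mean of a background sample, rather than averaging predictions over the whole background distribution.

```python
import itertools
import math
import numpy as np

def exact_shapley(predict_fn, instance, background):
    """Exact Shapley values by enumerating every coalition (feasible only for few features)."""
    n = len(instance)
    baseline = background.mean(axis=0)

    def value(coalition):
        # Features in the coalition take the instance's values;
        # the rest are replaced by the background mean (simplifying assumption).
        x = baseline.copy()
        idx = np.array(coalition, dtype=int)
        x[idx] = instance[idx]
        return predict_fn(x.reshape(1, -1))[0]

    phi = np.zeros(n)
    for j in range(n):
        others = [k for k in range(n) if k != j]
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n - |S| - 1)! / n!
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for coalition in itertools.combinations(others, size):
                # Marginal contribution of feature j to this coalition.
                phi[j] += weight * (value(coalition + (j,)) - value(coalition))
    return phi

# Toy model (assumed): one additive term and one interaction.
def toy_model(X):
    return 2.0 * X[:, 0] + X[:, 1] * X[:, 2]

background = np.array([[0.0, 0.0, -2.0], [1.0, 2.0, 0.0]])
instance = np.array([1.0, 2.0, 3.0])

phi = exact_shapley(toy_model, instance, background)
base_value = toy_model(background.mean(axis=0).reshape(1, -1))[0]
print("Shapley values:", phi)
print("base value + sum:", base_value + phi.sum())
```

The contributions add up to the difference between the prediction for the instance and the baseline prediction. The cost grows exponentially with the number of features, which motivates approximations such as SHAP.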

### SHAP (SHapley Additive exPlanations)

Similar to the previous method, but using a linear model...

There is a tree-specific explainer that reduces the computational cost.
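The "linear model" part can be illustrated with a small sketch of the KernelSHAP idea: fit a weighted linear regression over coalition indicator vectors, using the Shapley kernel as weights. This is not the `shap` library's implementation; the function `kernel_shap_sketch`, the single-vector `baseline`, and the large finite weight used to pin the empty and full coalitions are assumptions made for illustration.

```python
import itertools
import math
import numpy as np

def kernel_shap_sketch(predict_fn, instance, baseline, big_weight=1e6):
    """Weighted linear regression over all coalitions with Shapley kernel weights."""
    n = len(instance)
    rows, targets, weights = [], [], []
    for mask in itertools.product([0, 1], repeat=n):
        z = np.array(mask)
        size = int(z.sum())
        # Present features take the instance value, absent ones the baseline value.
        x = np.where(z == 1, instance, baseline)
        rows.append(np.concatenate(([1.0], z)))  # intercept + coalition indicators
        targets.append(predict_fn(x.reshape(1, -1))[0])
        if size == 0 or size == n:
            # The kernel is infinite here; a large finite weight approximates that.
            weights.append(big_weight)
        else:
            weights.append((n - 1) / (math.comb(n, size) * size * (n - size)))
    A = np.array(rows)
    b = np.array(targets)
    sqrt_w = np.sqrt(np.array(weights))
    # Weighted least squares via rescaling rows by sqrt(weight).
    coef, *_ = np.linalg.lstsq(A * sqrt_w[:, None], b * sqrt_w, rcond=None)
    return coef[0], coef[1:]  # base value, per-feature contributions

# Toy model (assumed): one additive term and one interaction.
def toy_model(X):
    return 2.0 * X[:, 0] + X[:, 1] * X[:, 2]

instance = np.array([1.0, 2.0, 3.0])
baseline = np.array([0.5, 1.0, -1.0])

base_value, shap_like = kernel_shap_sketch(toy_model, instance, baseline)
print("base value:", round(float(base_value), 4))
print("contributions:", shap_like.round(4))
```

Enumerating all coalitions is only feasible for a few features; real implementations sample coalitions, and the tree-specific explainer avoids the enumeration altogether.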

In [None]:
explainer = shap.TreeExplainer(clf)
shap_values = explainer.shap_values(X_test)

In [None]:
# Feature importance
shap.summary_plot(shap_values, feature_names=features_names, plot_type='bar')

In [None]:
shap.summary_plot(shap_values, features=X_test, feature_names=features_names)

In [None]:
shap.force_plot(explainer.expected_value,
                shap_values[i], 
                features=X_test[i], feature_names=features_names)

In [None]:
shap.decision_plot(explainer.expected_value,
                   shap_values[i], 
                   features=X_test[i], feature_names=features_names)

In [None]:
shap.dependence_plot(3, shap_values, X_test, feature_names=features_names)


### Thank you! Questions?


<div style="float: left;"><img src="../common/imgs/man-qmark.jpg" width="300" align="middle"></div> 

<div>
<div>
  <img src="../common/imgs/gmail-1162901_960_720.png" style="width: 30px; float: left; vertical-align:middle; margin: 0px;">
  <span style="line-height:30px; vertical-align:middle; margin-left: 10px;">arielrossanigo@gmail.com</span>
</div>
<div>
  <img src="../common/imgs/twitter-312464_960_720.png" style="width: 30px; float: left; vertical-align:middle; margin: 0px;">
  <span style="line-height:30px; vertical-align:middle; margin-left: 10px;">@arielrossanigo</span>
</div>
<div>
  <img src="../common/imgs/github-154769__340.png" style="width: 30px; float: left; vertical-align:middle; margin: 0px;">
  <span style="line-height:30px; vertical-align:middle; margin-left: 10px;">https://github.com/arielrossanigo</span>
</div>
<div>
  <img src="../common/imgs/Linkedin_icon.svg" style="width: 30px; float: left; vertical-align:middle; margin: 0px;">
  <span style="line-height:30px; vertical-align:middle; margin-left: 10px;">https://www.linkedin.com/in/arielrossanigo/</span>
</div>

</div>



### Some references

* https://christophm.github.io/interpretable-ml-book/
* https://github.com/slundberg/shap
* https://eli5.readthedocs.io/en/latest/
* https://github.com/marcotcr/lime