In [1]:
!pip install dalex --quiet
!pip install shap --quiet
!pip install lime --quiet
!pip install tensorflow --quiet
!pip install scikit-learn --quiet
!pip install matplotlib --quiet
!pip install pandas --quiet
!pip install numpy --quiet
!pip install xgboost --quiet # good boost libs: XGBoost, LightGBM, CatBoost

# Understand the standard tools for model explainability

## Background


> Model explainability is a major issue for certain sectors such as insurance or banking in the validation and production of developed models. In certain situations, you will not only have to perform well but also be able to justify the decisions made by your model and explain the results. 
>
> Let's take a concrete case from everyday life: the purchase of a property. Hélène wants to become the happy owner of an apartment in Paris, so she applies for a loan from her bank. A Data Scientist has recently developed, in collaboration with a Data Engineer, an application available to bank agents that predicts whether or not it is profitable to grant a loan, based on a few characteristics such as the person's gender. Hélène is refused the loan. Surprised by the decision, she asks her bank for an explanation, which the bank is obliged to provide by detailing the strengths and weaknesses of her file. The bank must therefore be able to explain to Hélène which characteristics weighed positively, and especially negatively, in the rejection of her loan. This requires the data scientist to provide at least some figures or a graphical interface explaining Hélène's individual (local) situation. We will show that simple tools can answer this question.
> 
> Behind the question of explaining the model's decision to Hélène are several issues, including:
>* Fairness: Ensuring that predictions are unbiased and do not implicitly or explicitly discriminate against underrepresented groups. An interpretable model can tell you why it has decided that a certain person should not get a loan, and it becomes easier for a human to judge whether the decision is based on a learned demographic (e.g. racial) bias.
>* Privacy: Ensuring that sensitive information in the data is protected.
>* Reliability or Robustness: Ensuring that small changes in the input do not lead to large changes in the prediction.
>* Causality: Ensuring that only causal relationships are picked up.
>* Trust: It is easier for humans to trust a system that explains its decisions compared to a black box.
> 
> Explainability aims to respond to these challenges and to bring confidence to the data scientists who model, to the businesses or to the users who use the model. It is a crucial point so that the people for whom you create the data product accept it and use it on a daily basis. 

## Model-specific or model-agnostic

> Model-specific interpretation tools are limited to specific model classes. The interpretation of regression weights in a linear model is a model-specific interpretation, since – by definition – the interpretation of intrinsically interpretable models is always model-specific. Tools that only work for the interpretation of e.g. neural networks are model-specific. Model-agnostic tools can be used on any machine learning model and are applied after the model has been trained (post hoc). These agnostic methods usually work by analyzing feature input and output pairs. By definition, these methods cannot have access to model internals such as weights or structural information.
>
> Model-specific interpretation tools may contradict each other, depending on the criterion studied.

## Local vs Global

> When we talk about explainability, we can try to explain either the model's decision for a particular individual (local) or the overall behavior of the model (global). The first option can, for example, help us understand the model's errors on a restricted group of observations; the second can help detect biases.
>
> The example below uses the California housing dataset, loaded with `fetch_california_housing`.

In [1]:
import pandas as pd
import dalex as dx

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

X, y = fetch_california_housing(return_X_y=True, as_frame=True)

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2,random_state=123)

# print the first 5 rows of the dataset
print(X.head(5))
print(y.head(5))


clf_gb = GradientBoostingRegressor(n_estimators = 250, max_depth=12, random_state=1234)
clf_rf = RandomForestRegressor(n_estimators = 250, max_depth=12, random_state=1234)

clf_gb.fit(X_train, y_train)
clf_rf.fit(X_train, y_train)

print("GradientBoosting score on train :", clf_gb.score(X_train, y_train))
print("RandomForest score on train :", clf_rf.score(X_train, y_train))

print("GradientBoosting score on test :", clf_gb.score(X_test, y_test))
print("RandomForest score on test :", clf_rf.score(X_test, y_test))

GradientBoosting score on train : 0.9997399292269133
RandomForest score on train : 0.9182912717141076
GradientBoosting score on test : 0.823128630171869
RandomForest score on test : 0.8040242936464083


In [5]:
# print the first 5 rows of the dataset
print("X samples: \n", X.head(5))
print()
print("y labels: \n", y[:5])

X samples: 
    MedInc  HouseAge  AveRooms  AveBedrms  Population  AveOccup  Latitude  \
0  8.3252      41.0  6.984127   1.023810       322.0  2.555556     37.88   
1  8.3014      21.0  6.238137   0.971880      2401.0  2.109842     37.86   
2  7.2574      52.0  8.288136   1.073446       496.0  2.802260     37.85   
3  5.6431      52.0  5.817352   1.073059       558.0  2.547945     37.85   
4  3.8462      52.0  6.281853   1.081081       565.0  2.181467     37.85   

   Longitude  
0    -122.23  
1    -122.22  
2    -122.24  
3    -122.25  
4    -122.25  

y labels: 
 0    4.526
1    3.585
2    3.521
3    3.413
4    3.422
Name: MedHouseVal, dtype: float64


## Partial Dependence Plot (PDP)

> The partial dependence plot is a global method: The method considers all instances and gives a statement about the global relationship of a feature with the predicted outcome.
>
> The partial function $\hat{f}_{s}$ is estimated by calculating averages in the training data, also known as Monte Carlo method:
> <center> $\hat{f}_{s}(x_s)=\frac{1}{n}\sum_{i=1}^{n}\hat{f}(x_s, x_c^{(i)})$ </center>
>
> The partial function tells us for given value(s) of features S what the average marginal effect on the prediction is. In this formula, $x_c^{(i)}$ are actual feature values from the dataset for the features in which we are not interested, and n is the number of instances in the dataset. An assumption of the PDP is that the features in C are not correlated with the features in S. If this assumption is violated, the averages calculated for the partial dependence plot will include data points that are very unlikely or even impossible.
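
To make the averaging in the formula concrete, here is a minimal NumPy sketch of the Monte Carlo estimate. The helper `partial_dependence` and the toy model `toy_predict` are illustrative assumptions, not part of `dalex`; any function mapping a feature matrix to predictions could be passed instead.

```python
import numpy as np

def partial_dependence(predict, X, feature_idx, grid):
    """Monte Carlo estimate of the partial dependence on one feature."""
    X = np.asarray(X, dtype=float)
    pd_values = []
    for value in grid:
        X_mod = X.copy()
        # Force the feature of interest (x_s) to the grid value for every row
        X_mod[:, feature_idx] = value
        # Average the predictions over the observed values of the other features (x_c)
        pd_values.append(predict(X_mod).mean())
    return np.array(pd_values)

# Toy black box (assumption for the demo): f(x) = 2 * x0 + x1
def toy_predict(X):
    return 2.0 * X[:, 0] + X[:, 1]

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 2))
grid = np.array([-1.0, 0.0, 1.0])

pd_curve = partial_dependence(toy_predict, X_toy, feature_idx=0, grid=grid)
print(pd_curve)
```

For this linear toy model the curve is simply `2 * grid` shifted by the mean of the second feature, which is what the formula predicts when the two features are independent.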

In [None]:
housing_gb_exp = dx.Explainer(clf_gb, X_train, y_train, 
                  label = "Housing gb")

pdp_gb = housing_gb_exp.model_profile(variables = ["HouseAge", 'MedInc'])
pdp_gb.plot()

## Permutation Importance

> The permutation feature importance algorithm is based on Fisher, Rudin, and Dominici (2018):
>
> Input: trained model $\hat{f}$, feature matrix $X$, target vector $y$, error measure $L(y,\hat{f})$:
>* Estimate the original model error $L(y,\hat{f}(X))$.
>* For each feature $j \in \{1,...,p\}$ do:
>    * Generate feature matrix $X_{perm}$ by permuting feature $j$ in the data $X$. This breaks the association between feature $j$ and the true outcome $y$.
>    * Estimate error $L(y,\hat{f}(X_{perm}))$ based on the predictions of the permuted data.
>    * Calculate permutation feature importance as the increase in error, $FI_j = L(y,\hat{f}(X_{perm})) - L(y,\hat{f}(X))$, so that a larger value means a more important feature.
>* Sort features by descending FI.
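
The steps above can be sketched in a few lines of NumPy. This is an illustration only: the function name `permutation_importance`, the `n_repeats` averaging and the toy model are assumptions made for the demo, and importance is measured as the increase in error after permutation.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error, used here as the error measure L."""
    return float(np.mean((y_true - y_pred) ** 2))

def permutation_importance(predict, X, y, loss, n_repeats=5, seed=0):
    """Importance of each feature = average increase in loss after shuffling it."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    base_error = loss(y, predict(X))          # original model error
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        errors = []
        for _ in range(n_repeats):            # repeats are an assumption to reduce noise
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the link between x_j and y
            errors.append(loss(y, predict(X_perm)))
        importances[j] = np.mean(errors) - base_error
    return importances

# Toy black box (assumption): the prediction only depends on feature 0
def toy_predict(X):
    return 3.0 * X[:, 0]

rng = np.random.default_rng(1)
X_toy = rng.normal(size=(300, 2))
y_toy = 3.0 * X_toy[:, 0]

fi = permutation_importance(toy_predict, X_toy, y_toy, mse)
ranking = np.argsort(-fi)                     # features sorted by descending importance
print(fi, ranking)
```

Since the toy model ignores feature 1, shuffling it leaves the error unchanged and its importance is zero, while feature 0 comes first in the ranking.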

In [None]:
housing_gb_exp = dx.Explainer(clf_gb, X_test, y_test, 
                  label = "Housing gb")

mp_gb = housing_gb_exp.model_parts(loss_function='r2')
mp_gb.plot()

# LIME ([Ribeiro et al., 2016](https://arxiv.org/abs/1602.04938))

## Intuition
*Intuitively, an explanation is a local linear approximation of the model's behaviour. While the model may be very complex globally, it is easier to approximate it around the vicinity of a particular instance. While treating the model as a black box, we perturb the instance we want to explain and learn a sparse linear model around it, as an explanation*

<p align="center">
<img src=https://raw.githubusercontent.com/marcotcr/lime/master/doc/images/lime.png width=400>
</p>

## Algorithm steps 
The algorithm performs the following steps:

### 1. Creation of a neighbourhood around the instance
- Data samples are generated by applying perturbations around the instance following a normal distribution.
- A weight is allocated to every sample according to its proximity to the instance. This is the crucial step: the explanation of an instance may differ a lot depending on the kernel used to compute the weights. Two quantities are at stake here, the kernel function and the kernel width:
  - the kernel function $k$:
    $$k(d, k_w) = \exp\left(\frac{-d^2}{k_w}\right)$$
    where $d$ is the Euclidean distance between the instance $x$ and a sample $y$:
    $$d = \sqrt{\sum_{i} (y_i - x_i)^2}$$
  - the kernel width $k_w$:
    $$k_w = 0.75 \cdot \sqrt{n_f}$$
    with $n_f$ the number of features in the train set.

$k$ and $k_w$ are two parameters of our LIME function and can be customised.
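
As a quick numerical illustration of these formulas, the sketch below computes the kernel width and the weights of a near and a far sample. It follows the formulas exactly as written in this section; the function names and the sample values are hypothetical, and the `lime` package's own default kernel may differ in its exact form.

```python
import math

def kernel_width(n_features):
    """Kernel width from this section: k_w = 0.75 * sqrt(n_f)."""
    return 0.75 * math.sqrt(n_features)

def euclidean_distance(x, y):
    """Distance d between the instance x and a perturbed sample y."""
    return math.sqrt(sum((yi - xi) ** 2 for xi, yi in zip(x, y)))

def kernel_weight(d, k_w):
    """Kernel from this section: k(d, k_w) = exp(-d^2 / k_w)."""
    return math.exp(-(d ** 2) / k_w)

# Hypothetical instance with 4 features and two perturbed samples
instance = [1.0, 2.0, 3.0, 4.0]
near = [1.1, 2.0, 3.0, 4.0]
far = [3.0, 0.0, 5.0, 2.0]

k_w = kernel_width(len(instance))
w_near = kernel_weight(euclidean_distance(instance, near), k_w)
w_far = kernel_weight(euclidean_distance(instance, far), k_w)
print(k_w, w_near, w_far)
```

The near sample gets a weight close to 1 and the far one a weight close to 0, so a small kernel width makes the local model listen almost exclusively to the closest samples.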

An example of the impact of the kernel width on the instance explanation:

<p align="center">
<img src=https://christophm.github.io/interpretable-ml-book/images/lime-fail-1.png width=500>
</p>

### 2. Generate the sample labels
The black-box model makes predictions on the newly generated neighbourhood dataset; these predictions are the labels of the samples.

### 3. Fit a linear model on the samples
A linear model is then fitted to this labeled, weighted data. This local linear model is the explanation of our instance.
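
The three steps can be put together in a short from-scratch sketch. It is a simplification under stated assumptions: it uses plain weighted least squares rather than the sparse linear model described in the intuition, it perturbs with a unit-variance normal distribution, and `lime_tabular_sketch` and `toy_predict` are hypothetical names, not functions of the `lime` package.

```python
import numpy as np

def lime_tabular_sketch(predict, instance, n_samples=500, scale=1.0, seed=0):
    """Return (intercept, slopes) of a weighted linear model fitted around `instance`."""
    rng = np.random.default_rng(seed)
    instance = np.asarray(instance, dtype=float)
    n_features = instance.shape[0]

    # Step 1: neighbourhood = normal perturbations, weighted by proximity
    samples = instance + rng.normal(scale=scale, size=(n_samples, n_features))
    d = np.sqrt(((samples - instance) ** 2).sum(axis=1))
    k_w = 0.75 * np.sqrt(n_features)
    weights = np.exp(-(d ** 2) / k_w)

    # Step 2: labels come from the black-box model
    labels = predict(samples)

    # Step 3: weighted least squares on features centred on the instance
    design = np.hstack([np.ones((n_samples, 1)), samples - instance])
    sw = np.sqrt(weights)[:, None]
    coef, *_ = np.linalg.lstsq(design * sw, labels * sw[:, 0], rcond=None)
    return coef[0], coef[1:]

# Toy non-linear black box (assumption): f(x) = x0^2 + 3 * x1
def toy_predict(X):
    return X[:, 0] ** 2 + 3.0 * X[:, 1]

intercept, slopes = lime_tabular_sketch(toy_predict, [1.0, 0.0])
print(intercept, slopes)
```

The slopes are the local explanation: each one says how much the prediction moves when the corresponding feature changes slightly around the instance.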

# LIME for text

LIME for text data differs from LIME for tabular data in one major way: how the samples are generated and how their weights are computed. Let's go through the first step of the algorithm again, illustrated with a YouTube comments spam classification model.

|| CONTENT      | CLASS |
|-----------| ----------- | ----------- |
|267| PSY is a good guy      | 0       |
|173| For Christmas Song visit my channel! ;)   | 1        |

### 1. Creation of a neighbourhood around the instance

- Data samples are generated by randomly removing some words from the instance text. The neighbourhood dataset is a dataset of binary features, where the value is 1 if the corresponding word is included and 0 if it has been removed.

| For |	Christmas	| Song |	visit |	my |	channel! |	;) |
| -- | -- | -- | -- | -- | -- | -- |
|1|0|1|1|0|0|1|
|0|1|1|1|1|0|1|
|1|0|0|1|1|1|1|
|1|0|1|1|1|1|1|
|0|1|1|1|0|0|1|

- A weight is allocated to every sample according to its proximity to the instance. With LIME for text, the weight is calculated with the same kernel as for tabular data, with a default kernel width of 25 (the kernel width can be customised).

| For |	Christmas	| Song |	visit |	my |	channel! |	;) | weight |
| -- | -- | -- | -- | -- | -- | -- | -- |
|1|0|1|1|0|0|1|0.89|
|0|1|1|1|1|0|1|0.92|
|1|0|0|1|1|1|1|0.92|
|1|0|1|1|1|1|1|0.96|
|0|1|1|1|0|0|1|0.89|

### 2. Generate the sample labels

- This second step is very close to the one for tabular data. The class 1 probability is calculated for every sample using the black-box model's predictions.

| For |	Christmas	| Song |	visit |	my |	channel! |	;) | weight | prob |
| -- | -- | -- | -- | -- | -- | -- | -- | -- |
|1|0|1|1|0|0|1|0.89|0.17|
|0|1|1|1|1|0|1|0.92|0.17|
|1|0|0|1|1|1|1|0.92|0.99|
|1|0|1|1|1|1|1|0.96|0.99|
|0|1|1|1|0|0|1|0.89|0.17|

### 3. Fit a linear model on the samples
- This third step remains the same: a linear model is fitted to this labeled data, and this local linear model is the explanation of our instance.
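
The perturbation and weighting of a text can be sketched with the standard library alone. The weights in the tables above are consistent with the kernel $\exp(-d^2/k_w)$ applied to the Euclidean distance between binary vectors with $k_w = 25$; the code below relies on that observation and is not a description of the `lime` package internals. The helper names are hypothetical.

```python
import math
import random

def perturb_text(words, n_samples, seed=0):
    """Random binary masks: 1 keeps the word, 0 removes it."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in words] for _ in range(n_samples)]

def mask_to_text(words, mask):
    """Rebuild the perturbed sentence that is sent to the black-box model."""
    return " ".join(w for w, bit in zip(words, mask) if bit)

def mask_weight(mask, k_w=25.0):
    """exp(-d^2 / k_w), where d^2 is the number of removed words
    (squared Euclidean distance to the all-ones vector of the original text)."""
    removed = sum(1 for bit in mask if bit == 0)
    return math.exp(-removed / k_w)

words = "For Christmas Song visit my channel! ;)".split()

# The five masks shown in the tables above
table_masks = [
    [1, 0, 1, 1, 0, 0, 1],
    [0, 1, 1, 1, 1, 0, 1],
    [1, 0, 0, 1, 1, 1, 1],
    [1, 0, 1, 1, 1, 1, 1],
    [0, 1, 1, 1, 0, 0, 1],
]
table_weights = [round(mask_weight(m), 2) for m in table_masks]
print(table_weights)  # → [0.89, 0.92, 0.92, 0.96, 0.89]

# Fresh random neighbours of the same comment
for mask in perturb_text(words, n_samples=3):
    print(mask, "->", mask_to_text(words, mask))
```

The more words are removed, the further the sample is from the original comment and the lower its weight in the local linear model.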

# Now let's practice!

## Packages installation & Imports 

In [None]:
!pip install lime

import pandas as pd
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.pipeline import make_pipeline
from lime.lime_text import LimeTextExplainer

## Mount Drive (ONLY USE IF WORKING ON GOOGLE COLAB)
If you are working on Google Colab, you can change `PATH` to the Drive folder where you uploaded the data.

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')
PATH = "https://raw.githubusercontent.com/mhjabreel/CharCnn_Keras/master/data/ag_news_csv/"

## Data Loading

Download the train.csv and test.csv datasets from GitHub ([link](https://github.com/mhjabreel/CharCnn_Keras/tree/master/data/ag_news_csv))

In [None]:
import os 

df_train = pd.read_csv(os.path.join(PATH,"train.csv"), header = None)
df_test = pd.read_csv(os.path.join(PATH,"test.csv"), header = None)

## Columns name cleaning

In [None]:
# TODO : 
#   - Rename the first column as "target", the 2nd as "title", the third as "description"
#   - Lowercase the text

## TF-IDF Vectorizer & Logistic Regression 

In [None]:
#TF-IDF
tfidf_vc = TfidfVectorizer(
    min_df = 10, 
    max_features = 100000, 
    analyzer = "word", 
    ngram_range = (1, 2), 
    stop_words = 'english', 
    lowercase = True
)

# Logistic Regression
model = LogisticRegression(C = 0.5, solver = "sag")

# Pipeline definition
pipe = make_pipeline(tfidf_vc, model)

# Pipeline training
pipe.fit(df_train["description"], df_train.target)

# Predictions on test_set
test_pred = pipe.predict(df_test["description"])

## Evaluation

In [None]:
print(classification_report(df_test.target, test_pred))
print(confusion_matrix(df_test.target, test_pred))

## Explainability with LIME

In [None]:
# idx = ??? TO FILL

class_names = ["World", "Sports", "Business", "Sci/Tech"]
explainer = LimeTextExplainer(class_names = class_names)
exp = explainer.explain_instance(
    df_test["description"][idx], 
    # TO FILL  
    num_features = 10, 
    top_labels=3
)

exp.show_in_notebook(text=df_test["description"][idx])

# LIME for image

The LIME algorithm for images works a little differently than for tabular data and text. Perturbing individual pixels one by one would barely change the prediction, because many pixels contribute to a given class.



## Algorithm steps 
The algorithm performs the following steps:

### 1. Creation of superpixels
The algorithm first generates "superpixels", which are groups of contiguous pixels that share properties such as texture or color distribution. This step is crucial for the LIME explanation, since perturbing superpixels is how the method identifies which image areas were relevant for a specific class decision.

LIME uses the quickshift algorithm to produce these superpixels (more details here: https://www.robots.ox.ac.uk/~vedaldi/assets/pubs/vedaldi08quick.pdf)

<p align="center">
<img src=https://www.oreilly.com/content/wp-content/uploads/sites/2/2019/06/figure3-2cea505fe733a4713eeff3b90f696507.jpg width=500>
</p>


### 2. Generate perturbed instances
Once the superpixels are defined, we can generate a new dataset of perturbed instances by turning off superpixels on the image. The interpretable representation of the image is a binary vector where 1 indicates the original super-pixel and 0 indicates a grayed out super-pixel.

### 3. Fit a linear model on the samples

We can now fit a linear model that maps the perturbed instances to the model's predicted probability for a specific class, and highlight the superpixels whose weight pushes towards (positive) or against (negative) that class.

<p align="center">
<img src=https://www.oreilly.com/content/wp-content/uploads/sites/2/2019/06/figure4-99d9ea184dd35876e0dbae81f6fce038.jpg width=500>
</p>
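
Step 2 can be illustrated with a small NumPy sketch that turns superpixels off according to a binary vector. The function `perturb_image`, its `fill_value` parameter and the tiny 4x4 "image" are assumptions made for the demo; in practice the segment map would come from a segmentation algorithm such as quickshift.

```python
import numpy as np

def perturb_image(image, segments, z, fill_value=0.5):
    """Gray out every superpixel whose entry in the binary vector z is 0."""
    perturbed = image.copy()
    off_segments = np.flatnonzero(np.asarray(z) == 0)   # ids of superpixels to hide
    hidden = np.isin(segments, off_segments)            # H x W boolean mask
    perturbed[hidden] = fill_value                      # applied to all colour channels
    return perturbed

# Toy 4x4 RGB image split into 4 square superpixels (ids 0..3)
image = np.ones((4, 4, 3))
segments = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 3, 3],
    [2, 2, 3, 3],
])

z = [1, 0, 1, 0]          # interpretable representation: keep 0 and 2, hide 1 and 3
out = perturb_image(image, segments, z)
print(out[:, :, 0])
```

Each binary vector `z` is one row of the neighbourhood dataset; the black-box prediction on the corresponding perturbed image is its label.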


# Now let's practice!

## Packages installation & Imports 






In [None]:
# Use the InceptionV3 helpers (not the ResNet50 ones) so that preprocessing
# and prediction decoding match the model loaded below
from tensorflow.keras.applications import inception_v3 as inc_net
from tensorflow.keras.applications.inception_v3 import InceptionV3, decode_predictions
from tensorflow.keras.preprocessing import image
import numpy as np
import matplotlib.pyplot as plt
from lime import lime_image
from skimage.segmentation import mark_boundaries

## Load pre-trained InceptionV3 model and images

In [None]:
# Load model
inception_model = InceptionV3(weights='imagenet')

### TODO : Here you can download any images and identify the way to access them on your computer / on drive

In [None]:
# Image processing
def transform_img_fn(path_list):
    out = []
    for img_path in path_list:
        img = image.load_img(img_path, target_size=(299, 299))
        x = image.img_to_array(img)
        x = np.expand_dims(x, axis=0)
        x = inc_net.preprocess_input(x)
        out.append(x)
    return np.vstack(out)

## TODO : apply transform_img_fn to 3 images and assign the result to a variable called "images"

In [None]:
# display the image
plt.imshow(images[0] / 2 + 0.5)
plt.axis('off')
plt.show()

## Make some predictions

In [None]:
# TODO: 
# With the decode_predictions function, print the 5 most probable classes as a list of tuples (class, description, probability)
preds = inception_model.predict(images)
for x in decode_predictions(preds)[0]:
    print(x)

## Explainability with LIME image

In [None]:
# TODO: play with the different parameters

# Train lime image explainer
explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(images[0].astype('double'), inception_model.predict, top_labels="?", hide_color="?", num_samples="?")

# Plot boundaries
temp, mask = explanation.get_image_and_mask(explanation.top_labels["?"], positive_only="?", num_features="?", hide_rest="?")
plt.imshow(mark_boundaries(temp / 2 + 0.5, mask))

## Plotting the heatmap

In [None]:
# TODO : Select the same class explained on the figures above and return it as a variable called "ind"

# TODO : What does explanation.local_exp[ind] return ? 

# TODO : Map each explanation weight to the corresponding superpixel and return it as a dict called dict_heatmap



heatmap = np.vectorize(dict_heatmap.get)(explanation.segments) 

#Plot. The visualization makes more sense if a symmetrical colorbar is used.
plt.imshow(heatmap, cmap = 'RdBu', vmin  = -heatmap.max(), vmax = heatmap.max())
plt.colorbar()

# **SH**apley **A**dditive ex**P**lanations ([Lundberg et al. 2017](https://arxiv.org/abs/1905.04610))

## From Game Theory


* In game theory, the [Shapley value](https://en.wikipedia.org/wiki/Shapley_value) (1953) is a solution concept of fairly distributing both gains and costs to several actors working in coalition.
* The Shapley value applies primarily in situations when the contributions of each actor are unequal, but they work in cooperation with each other to obtain the payoff.

You start by identifying the payoff obtained when each of the three players (A, B and C) plays individually, when two of them play together, and when all three play together.
<p align="center">
<img src=https://clearcode.cc/wp-content/uploads/2016/11/ABC-wide.png?ver=1478561348 width=500>
</p>

Then, you need to consider all possible orders and calculate their marginal value – e.g. what value does each player add when player A enters the game first, followed by player B, and then player C.
Below are the 6 possible orders and the marginal value each player adds in the different combinations:
<p align="center">
<img src=https://clearcode.cc/wp-content/uploads/2016/11/ABC-updated.png?ver=1479258642 width=500>
</p>

Now that we have calculated each player's marginal value across all 6 possible orders, we average them to obtain the Shapley value of each player.

<ins>Example for Player A:</ins>
$\text{Shapley value}_A = \frac{7+7+10+3+9+10}{6} = \frac{46}{6} \approx 7.7$

Computing the Shapley value of each player gives the contribution each player made to the game and assigns credit fairly.
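
The enumeration of all orders can be written in a few lines. The coalition payoffs below are hypothetical: they were chosen so that player A's marginal contributions are the six numbers used in the formula above, while the values for B and C are illustrative assumptions rather than figures read from the images.

```python
from itertools import permutations

# Hypothetical payoff of every coalition of players A, B and C
v = {
    frozenset(): 0,
    frozenset("A"): 7, frozenset("B"): 4, frozenset("C"): 6,
    frozenset("AB"): 7, frozenset("AC"): 15, frozenset("BC"): 9,
    frozenset("ABC"): 19,
}

def shapley_values(players, value):
    """Average marginal contribution of each player over all arrival orders."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value[with_p] - value[coalition]   # marginal value of p
            coalition = with_p
    return {p: totals[p] / len(orders) for p in players}

phi = shapley_values("ABC", v)
print({p: round(x, 2) for p, x in phi.items()})
```

The Shapley values always sum to the payoff of the full coalition, which is the sense in which the credit is distributed fairly and completely.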

## To Explainability Method

* Each value of an independent variable or a feature for a given sample is a part of a cooperative game where we assume that prediction is actually the payout.
* Shapley values correspond to the contribution of each feature towards pushing the prediction away from the expected value.

Let's take the example of a local house price prediction and see how the different features impact the prediction.
<p align="center">
<img src=https://raw.githubusercontent.com/slundberg/shap/master/docs/artwork/boston_waterfall.png width=700>
</p>

Example of features definition: 
* LSTAT (% of lower status population)
* RM (average number of rooms per house in an area)
* NOX (nitric oxides concentration)
* RAD (index of accessibility to radial highways)
* For more information, link to [Boston dataset](https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html) 
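
A waterfall plot such as the one above reads as a simple sum: the prediction equals the expected value plus the contributions of all features. The sketch below illustrates this additivity with made-up numbers; the expected value and the contributions are hypothetical and are not taken from the figure.

```python
# Hypothetical average prediction of the model over the dataset
expected_value = 22.5

# Hypothetical SHAP values of one house (positive pushes the price up)
contributions = {"LSTAT": 4.2, "RM": -1.3, "NOX": 0.6, "RAD": -0.4}

# Additivity: prediction = expected value + sum of the feature contributions
prediction = expected_value + sum(contributions.values())

# List the features from the largest to the smallest effect, as a waterfall plot does
ordered = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
for name, value in ordered:
    print(f"{name:>6}: {value:+.1f}")
print(f"prediction = {prediction:.1f}")
```

Reading the list from top to bottom tells which features moved this particular prediction furthest away from the average, and in which direction.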


## Explanation of SHAP through visualization

### Global explainability & local explanation summary
<p align="center">
<img src=https://raw.githubusercontent.com/slundberg/shap/master/docs/artwork/boston_global_bar.png width=470>
<img src=https://raw.githubusercontent.com/slundberg/shap/master/docs/artwork/boston_beeswarm.png width=530>
</p>

### Local explainability and correlation
<p align="center">
<img src=https://raw.githubusercontent.com/slundberg/shap/master/docs/artwork/boston_scatter.png width=500>
</p>

## Advantages
* SHAP has a solid theoretical foundation in game theory. The prediction is fairly distributed among the feature values. We get contrastive explanations that compare the prediction with the average prediction.
* SHAP connects LIME and Shapley values.
* SHAP has a fast implementation for tree-based models.
* When computation of the many Shapley values is possible, global model interpretations can be built. The global interpretation methods include feature importance, feature dependence, interactions, clustering and summary plots.

## Drawbacks
* Slow computation if you want to compute Shapley values for many instances (except for tree-based models).
* The disadvantages of Shapley values also apply to SHAP: Shapley values can be misinterpreted.
* Since every model is trained from observational data, it is not necessarily a causal model.

For more information on SHAP values see: https://github.com/slundberg/shap

## Practical exercise

Download the dataset from Kaggle ([link](https://www.kaggle.com/paololol/league-of-legends-ranked-matches))

The objective in a game of League of Legends is to destroy the enemy base in a 5 vs. 5 match. Using datasets with statistics about the games and the players, the goal is to predict the probability of winning the game.

In [None]:
import pandas as pd
import numpy as np 
import xgboost as xgb
from sklearn.model_selection import train_test_split
import shap
import matplotlib.pyplot as plt

shap.initjs()

In [None]:
# TODO : 
# - read the data from matches.csv, participants.csv, stats1.csv, stats2.csv into matches participants stats1 and stats2 DataFrame. 
# - concat stats1 and stats2 into a stats DataFrame 


In [None]:
# TODO : Explore the DataFrames

In [None]:
# TODO : merge matches participants and stats into a single DataFrame called allstats

In [None]:
# TODO : drop games that lasted less than 10 minutes

In [None]:
# TODO : which columns are string-based categories ? 

In [None]:
# TODO : Except for wardsbought, convert string-based categories to numeric values.
# Hint : use df[col].astype('category') and take the category codes (.cat.codes)

In [None]:
# TODO : Reduce the dataset size to speed up training by keeping only rows with matchid lower than 50000

In [None]:
# TODO : Build an X DataFrame by removing the "win" column, and a y Series corresponding to the "win" column

In [None]:
# convert all following features we want to consider as rates
rate_features = [
    "kills", "deaths", "assists", "killingsprees", "doublekills",
    "triplekills", "quadrakills", "pentakills", "legendarykills",
    "totdmgdealt", "magicdmgdealt", "physicaldmgdealt", "truedmgdealt",
    "totdmgtochamp", "magicdmgtochamp", "physdmgtochamp", "truedmgtochamp",
    "totheal", "totunitshealed", "dmgtoobj", "timecc", "totdmgtaken",
    "magicdmgtaken" , "physdmgtaken", "truedmgtaken", "goldearned", "goldspent",
    "totminionskilled", "neutralminionskilled", "ownjunglekills",
    "enemyjunglekills", "totcctimedealt", "pinksbought", "wardsbought",
    "wardsplaced", "wardskilled"
]
for feature_name in rate_features:
    X[feature_name] /= X["duration"] / 60 # per minute rate

# convert to fraction of game
X["longesttimespentliving"] /= X["duration"]

# define friendly names for the features
full_names = {
    "kills": "Kills per min.",
    "deaths": "Deaths per min.",
    "assists": "Assists per min.",
    "killingsprees": "Killing sprees per min.",
    "longesttimespentliving": "Longest time living as % of game",
    "doublekills": "Double kills per min.",
    "triplekills": "Triple kills per min.",
    "quadrakills": "Quadra kills per min.",
    "pentakills": "Penta kills per min.",
    "legendarykills": "Legendary kills per min.",
    "totdmgdealt": "Total damage dealt per min.",
    "magicdmgdealt": "Magic damage dealt per min.",
    "physicaldmgdealt": "Physical damage dealt per min.",
    "truedmgdealt": "True damage dealt per min.",
    "totdmgtochamp": "Total damage to champions per min.",
    "magicdmgtochamp": "Magic damage to champions per min.",
    "physdmgtochamp": "Physical damage to champions per min.",
    "truedmgtochamp": "True damage to champions per min.",
    "totheal": "Total healing per min.",
    "totunitshealed": "Total units healed per min.",
    "dmgtoobj": "Damage to objects per min.",
    "timecc": "Time spent with crowd control per min.",
    "totdmgtaken": "Total damage taken per min.",
    "magicdmgtaken": "Magic damage taken per min.",
    "physdmgtaken": "Physical damage taken per min.",
    "truedmgtaken": "True damage taken per min.",
    "goldearned": "Gold earned per min.",
    "goldspent": "Gold spent per min.",
    "totminionskilled": "Total minions killed per min.",
    "neutralminionskilled": "Neutral minions killed per min.",
    "ownjunglekills": "Own jungle kills per min.",
    "enemyjunglekills": "Enemy jungle kills per min.",
    "totcctimedealt": "Total crowd control time dealt per min.",
    "pinksbought": "Pink wards bought per min.",
    "wardsbought": "Wards bought per min.",
    "wardsplaced": "Wards placed per min.",
    "turretkills": "# of turret kills",
    "inhibkills": "# of inhibitor kills",
    "dmgtoturrets": "Damage to turrets"
}
feature_names = [full_names.get(n, n) for n in X.columns]
X.columns = feature_names

In [None]:
# create train/validation split
Xt, Xv, yt, yv = train_test_split(X,y, test_size=0.2, random_state=10)
dt = xgb.DMatrix(Xt, label=yt.values)
dv = xgb.DMatrix(Xv, label=yv.values)

In [None]:
# We want to solve a logistic regression with a logloss evaluation
params = {
    "eta": 0.5,
    "max_depth": 4,
    "objective": 'binary:logistic',
    "verbosity": 0,  # replaces the deprecated "silent" parameter
    "base_score": np.mean(yt),
    "eval_metric": 'logloss'
}
# TODO : with xgb.train method, code the training part for 300 iterations with early stopping rounds at 5 and a verbose eval at 25
model = "?"

In [None]:
# compute the SHAP values for every prediction in the validation dataset
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(Xv)

In [None]:
# Force plot example for a record
shap.initjs()
shap.force_plot(explainer.expected_value, shap_values[0,:], Xv.iloc[0,:])

In [None]:
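# The force plot above works in log-odds units; this curve shows how log odds map to a probability of winning
xs = np.linspace(-4,4,100)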
plt.xlabel("Log odds of winning")
plt.ylabel("Probability of winning")
plt.title("How changes in log odds convert to probability of winning")
plt.plot(xs, 1/(1+np.exp(-xs)))
plt.show()

In [None]:
# Global explainability
shap.plots.bar(explainer(Xv))

In [None]:
# Local explanation summary
shap.summary_plot(shap_values, Xv)

In [None]:
# Dependence plot between variables (automatic)
shap.dependence_plot("Gold earned per min.", shap_values, Xv)

In [None]:
# Dependence plot between variables (assigned)
shap.dependence_plot("Gold earned per min.", shap_values, Xv, alpha=0.2, interaction_index="Deaths per min.")

In [None]:
# TODO : sort the features indexes by their importance in the model and return them in top_inds list


# TODO : with shap.dependence_plot(), make SHAP plots of the three most important features


In [None]:
# Play with plot variables
shap.dependence_plot("id", shap_values, Xv, x_jitter="?", alpha="?", dot_size="?")

## Bonus Section: Chest X-Ray Images

> In recent years, convolutional neural networks have been widely used in image recognition tasks and have obtained excellent results. A team of radiologists who want help detecting pneumonia on chest X-ray images has called on a team of data scientists to train a convolutional neural network capable of determining whether an X-ray is that of a patient with pneumonia or not.
>
> However, before relying on this model on a daily basis, they want to ensure its relevance. It is essential for them that the model uses the lung region to determine whether a patient has pneumonia or not.
>
> The trained model has been saved in the h5 format and is named **model.h5**. You have access to the images used to test the model in the directory named **test**.
>
> Your task will involve two aspects:
>* Determine if the model is relevant from a medical point of view.
>* If necessary, propose an approach to make the model more relevant.

In [None]:
# load trained model

from keras.models import load_model

model = load_model("/content/gdrive/MyDrive/.../model.h5")

## Going further with Explainability

### SHAPASH ([Github](https://github.com/MAIF/shapash))

A module developed by MAIF that builds on the SHAP methodology, with nice features such as a web app for exploration and MLOps usage.

[Demo](https://shapash-demo.ossbymaif.fr/) of the dashboarding capabilities.

[Notebook](https://github.com/MAIF/shapash/blob/master/tutorial/tutorial03-Shapash-overview-model-in-production.ipynb) example for MLOps usage

#### Strengths 
* A great tool for data scientists to investigate a model's behaviour faster ! 
* Ongoing development today to add new features 

#### Weaknesses
* It is "just" a plotly layer on top of SHAP, not an end-user-driven framework for model explanations
* Decision making based on the graphs is not immediate; it only provides insights
* Audiences need to be a bit technical to be comfortable with the approach