# Explaining a model

| | | |
|-|-|-|
|[ ![Creative Commons License](images/cc4.png)](http://creativecommons.org/licenses/by-nc/4.0/) |[ ![aGrUM](images/logoAgrum.png)](https://agrum.org) |[ ![interactive online version](images/atbinder.svg)](https://agrum.gitlab.io/extra/agrum_at_binder.html)

In [1]:
import time

import pandas as pd

import pyagrum as gum
import pyagrum.lib.explain as expl

## Building the model

We build a simple Bayesian network structure for the example, then learn its parameters from a CSV file.

In [2]:
template = gum.fastBN("X1->X2->Y;X3->Z->Y;X0->Z;X1->Z;X2->R[5];Z->R;X1->Y")
data_path = "res/shap/Data_6var_direct_indirect.csv"

# gum.generateSample(template,1000,data_path)

learner = gum.BNLearner(data_path, template)
bn = learner.learnParameters(template.dag())
bn

## 1-independence list (w.r.t. the class Y)
Given a model, it can be interesting to investigate the conditional independences involving the class Y that are implied by the model itself.

In [3]:
# this function lists the conditional independences (CI) between pairs of variables and computes their p-values w.r.t. a CSV file.
expl.independenceListForPairs(bn, data_path)

AttributeError: module 'pyagrum.lib.explain' has no attribute 'independenceListForPairs'

... with respect to a specific target.

In [None]:
expl.independenceListForPairs(bn, data_path, target="Y")
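
The independences that the graph itself implies can also be checked by hand. The following is a minimal pure-Python sketch of a d-separation test (ancestral graph, moralization, then reachability), applied to the arcs of the template above. The helper names `parents_of`, `ancestral_set` and `d_separated` are hypothetical and are not part of pyAgrum; the sketch looks only at the structure and computes no p-value.

```python
from itertools import combinations

# Arcs of the template "X1->X2->Y;X3->Z->Y;X0->Z;X1->Z;X2->R[5];Z->R;X1->Y"
ARCS = [("X1", "X2"), ("X2", "Y"), ("X3", "Z"), ("Z", "Y"), ("X0", "Z"),
        ("X1", "Z"), ("X2", "R"), ("Z", "R"), ("X1", "Y")]


def parents_of(arcs):
    """Map every node to the set of its parents."""
    parents = {}
    for tail, head in arcs:
        parents.setdefault(head, set()).add(tail)
        parents.setdefault(tail, set())
    return parents


def ancestral_set(nodes, parents):
    """Return the given nodes together with all their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


def d_separated(x, y, given, arcs=ARCS):
    """True if x and y are d-separated by `given` in the DAG defined by arcs."""
    parents = parents_of(arcs)
    keep = ancestral_set({x, y} | set(given), parents)
    # Moralize the ancestral graph: drop directions and marry co-parents.
    adj = {node: set() for node in keep}
    for child in keep:
        for p in parents[child]:
            adj[p].add(child)
            adj[child].add(p)
        for a, b in combinations(parents[child], 2):
            adj[a].add(b)
            adj[b].add(a)
    # Search from x without walking through the conditioning set.
    blocked, seen, stack = set(given), set(), [x]
    while stack:
        node = stack.pop()
        if node in seen or node in blocked:
            continue
        seen.add(node)
        stack.extend(adj[node])
    return y not in seen


print(d_separated("X0", "Y", ["Z"]))        # conditioning on Z alone
print(d_separated("X0", "Y", ["Z", "X1"]))  # conditioning on Z and X1
```

Conditioning on `Z` alone opens the collider `X0 -> Z <- X1`, so `X1` must also be in the conditioning set to separate `X0` from `Y`.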

## 2-ShapValues : explaining a Bayesian network as a classifier

In [None]:
print(expl.ShapValues.__doc__)

The `ShapValues` class implements the calculation of Shap values in Bayesian networks. It requires a target and a Bayesian network whose parameters are known; both are used later by the different calculation methods.

In [None]:
gumshap = expl.ShapValues(bn, "Y")

### Conditional Shap values in a Bayesian network

A dataset (as a `pandas.DataFrame`) must be provided so that the Bayesian network can learn its parameters and then predict.

The method `conditional` computes the conditional Shap values using the Bayesian network. It produces two plots and returns a dictionary. The first plot shows the distribution of the Shap values for each variable; the second ranks the variables by their importance.
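
To make the notion of a Shap value concrete, here is a minimal sketch of the generic Shapley computation on a toy problem. It does not use pyAgrum, and it is not a description of how `ShapValues` is implemented. The function `shapley_values` and the value function `toy_value` are hypothetical: `toy_value(S)` stands in for the expected model output when only the features in `S` are known.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


def toy_value(coalition):
    """Hypothetical value function with an interaction between A and B."""
    v = 0.0
    if "A" in coalition:
        v += 1.0
    if "B" in coalition:
        v += 2.0
    if "A" in coalition and "B" in coalition:
        v += 1.0  # interaction term, shared equally between A and B
    return v      # C never changes the value


phi = shapley_values(["A", "B", "C"], toy_value)
print(phi)  # {'A': 1.5, 'B': 2.5, 'C': 0.0}
```

The contributions sum to `toy_value({"A", "B", "C"}) - toy_value(set())`, which is the efficiency property of Shapley values. Enumerating all orderings is only tractable for a handful of features.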

In [None]:
train = pd.read_csv(data_path).sample(frac=1.0)

In [None]:
t_start = time.time()
resultat = gumshap.conditional(train, plot=True, plot_importance=True, percentage=False)
print(f"Run Time : {time.time() - t_start} sec")

In [None]:
t_start = time.time()
resultat = gumshap.conditional(train, plot=False, plot_importance=True, percentage=False)
print(f"Run Time : {time.time() - t_start} sec")

In [None]:
result = gumshap.conditional(train, plot=True, plot_importance=False, percentage=False)
# result is a Dict[str,float] of the different Shapley values for all nodes.

The result is returned as a dictionary: the keys are the feature names and each value is the absolute value of the average of the Shap values computed for that feature.

In [None]:
t_start = time.time()
resultat = gumshap.conditional(train, plot=False, plot_importance=False, percentage=False)
print(f"Run Time : {time.time() - t_start} sec")
resultat
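
Since the result is a plain `Dict[str, float]`, it can be post-processed with ordinary Python. The sketch below ranks features and normalises the importances into shares. The numbers in `importances` are hypothetical placeholders, not values computed by pyAgrum, and the normalisation is only one possible convention; it makes no claim about what `percentage=True` does.

```python
# Hypothetical importances, shaped like the dictionary described above.
importances = {"X1": 0.12, "X2": 0.30, "Z": 0.25, "R": 0.0}

# Rank features from most to least important.
ranking = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)

# Express each importance as a share of the total (guarding against a zero total).
total = sum(importances.values())
shares = {name: (val / total if total else 0.0) for name, val in ranking}

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```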

### Causal Shap Values

This method is similar to the previous one but uses a different formula. It computes the causal Shap values described by Heskes et al. in *Causal Shapley Values: Exploiting Causal Knowledge to Explain Individual Predictions of Complex Models*.
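
For reference, in the standard formulation (stated here as background, not as a description of pyAgrum's internals), the Shapley value of feature $i$ among the feature set $N$ for a prediction function $f$ at instance $x$ is

$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N|-|S|-1)!}{|N|!}\,\bigl(v(S \cup \{i\}) - v(S)\bigr)$$

and the variants differ only in the value function $v$:

$$v_{\text{conditional}}(S) = \mathbb{E}\bigl[f(X) \mid X_S = x_S\bigr], \qquad v_{\text{causal}}(S) = \mathbb{E}\bigl[f(X) \mid \mathrm{do}(X_S = x_S)\bigr], \qquad v_{\text{marginal}}(S) = \mathbb{E}\bigl[f(x_S, X_{\bar S})\bigr]$$

where $\bar S = N \setminus S$ and, in the marginal case, $X_{\bar S}$ is drawn from its marginal distribution, independently of $x_S$.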

In [None]:
t_start = time.time()
causal = gumshap.causal(train, plot=True, plot_importance=True, percentage=False)
print(f"Run Time : {time.time() - t_start} sec")

As you can see, since $R$ is not among the 'causes' of $Y$, its causal importance is zero.

### Marginal Shap Values

Similarly, one can also compute marginal Shap values.

In [None]:
t_start = time.time()
marginal = gumshap.marginal(train, sample_size=10, plot=True, plot_importance=True, percentage=False)
print(f"Run Time : {time.time() - t_start} sec")
print(marginal)

As you can see, since $R$, $X0$ and $X3$ are not in the Markov blanket of $Y$, their marginal importances are zero.
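
Both structural observations (which variables are 'causes' of $Y$, and which belong to its Markov blanket) can be checked directly from the arcs of the template. This is a small pure-Python sketch; the helpers `ancestors` and `markov_blanket` are hypothetical names written for this illustration and are not pyAgrum functions.

```python
# Arcs of the template "X1->X2->Y;X3->Z->Y;X0->Z;X1->Z;X2->R[5];Z->R;X1->Y"
ARCS = [("X1", "X2"), ("X2", "Y"), ("X3", "Z"), ("Z", "Y"), ("X0", "Z"),
        ("X1", "Z"), ("X2", "R"), ("Z", "R"), ("X1", "Y")]


def ancestors(node, arcs):
    """All nodes with a directed path to `node` (its direct and indirect causes)."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for tail, head in arcs:
            if head == current and tail not in found:
                found.add(tail)
                stack.append(tail)
    return found


def markov_blanket(node, arcs):
    """Parents, children, and the other parents of the children of `node`."""
    parents = {t for t, h in arcs if h == node}
    children = {h for t, h in arcs if t == node}
    spouses = {t for t, h in arcs if h in children and t != node}
    return parents | children | spouses


print(sorted(ancestors("Y", ARCS)))       # ['X0', 'X1', 'X2', 'X3', 'Z']
print(sorted(markov_blanket("Y", ARCS)))  # ['X1', 'X2', 'Z']
```

$X0$ and $X3$ are ancestors of $Y$ but lie outside its Markov blanket, which matches their showing up in the causal view but not in the marginal one; $R$ is in neither set.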

### Saving the graph

You can specify a `filename` if you prefer to save the figure instead of showing it:

In [None]:
t_start = time.time()
causal2 = gumshap.causal(train, plot=True, plot_importance=True, percentage=False, filename="out/marginal.pdf")
print(f"Run Time : {time.time() - t_start} sec")
print(causal2)

### Visualizing Shap values directly on a BN

This function returns a coloured graph that makes it easier to understand which variable is important and where it is located in the graph.

In [None]:
expl.showShapValues(bn, causal)

## Visualizing information

Finally, another view shows the entropy of each node and the mutual information on each arc.

In [None]:
expl.showInformation(bn)
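
As a reminder of what these two quantities measure, here is a minimal pure-Python sketch that computes them from explicit probability tables. It assumes logarithms in base 2 (results in bits); the base and the exact way pyAgrum derives the displayed values are not specified here, and the functions `entropy` and `mutual_information` are hypothetical helpers, not pyAgrum calls.

```python
from math import log2


def entropy(dist):
    """Shannon entropy (in bits) of a discrete distribution given as a list."""
    return -sum(p * log2(p) for p in dist if p > 0)


def mutual_information(joint):
    """Mutual information (in bits) from a joint table joint[i][j] = P(A=i, B=j)."""
    pa = [sum(row) for row in joint]        # marginal of A
    pb = [sum(col) for col in zip(*joint)]  # marginal of B
    return sum(
        p * log2(p / (pa[i] * pb[j]))
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0
    )


print(entropy([0.5, 0.5]))                               # 1.0
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0 (independent)
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1.0 (fully dependent)
```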