# Explaining the Loss of a Tree Model

Explaining the loss of a model is useful for debugging and model monitoring. This notebook gives a minimal example of how this works. Note that explaining the loss of a model requires passing the labels, and is only supported for the `feature_perturbation="interventional"` option of TreeExplainer (formerly named `feature_dependence="independent"`).

This notebook will be fleshed out once we post a full write-up of this method.

In [1]:
import shap
import sklearn
import xgboost
import numpy as np

### Train an XGBoost Classifier

In [2]:
X,y = shap.datasets.adult()

model = xgboost.XGBClassifier()
model.fit(X,y)

# compute the per-sample logistic log-loss (predict probabilities once and reuse them)
proba = model.predict_proba(X)
model_loss = -np.log(proba[:,1]) * y - np.log(proba[:,0]) * (1-y)

model_loss[:10]

array([2.77169840e-03, 2.43189454e-01, 1.06922761e-02, 9.52967107e-02,
       5.76623142e-01, 1.72828579e+00, 6.12983434e-03, 7.44314849e-01,
       3.45766719e-04, 2.10685795e-03])
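
To make the loss computation above concrete, here is a minimal standalone sketch of the same per-sample binary log-loss. The probabilities and labels are made up for illustration, the helper name `per_sample_log_loss` is hypothetical (it is not part of shap or xgboost), and the `eps` clipping is an added assumption to avoid taking `log(0)`.

```python
import numpy as np

def per_sample_log_loss(p_pos, y, eps=1e-15):
    """Binary log-loss of each sample (hypothetical helper, not part of shap).

    p_pos : predicted probability of the positive class
    y     : true labels (0/1 or booleans)
    eps   : clipping constant, an assumption added here to avoid log(0)
    """
    p = np.clip(np.asarray(p_pos, dtype=float), eps, 1 - eps)
    y = np.asarray(y, dtype=float)
    # -log(p) when the label is 1, -log(1 - p) when the label is 0
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# made-up probabilities and labels, for illustration only
toy_p = np.array([0.9, 0.2, 0.5])
toy_y = np.array([1, 0, 1])
toy_loss = per_sample_log_loss(toy_p, toy_y)
print(toy_loss)
```

A confident correct prediction gives a loss near zero, while a confident wrong prediction gives a large loss, which is the pattern visible in the `model_loss[:10]` output above.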

### Explain the Log-Loss of the Model with TreeExplainer

Note that the `expected_value` of the model's loss depends on the label, so it is now a function of the label instead of a single number.

In [3]:
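explainer = shap.TreeExplainer(model, X, feature_perturbation="interventional", model_output="log_loss")

# the SHAP values summed over features, plus the label-specific expected value, should reproduce each sample's loss
explainer.shap_values(X.iloc[:10,:], y[:10]).sum(1) + np.array([explainer.expected_value(v) for v in y[:10]])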

model_output = "logloss" has been renamed to model_output = "log_loss"
feature_dependence = "independent" has been renamed to feature_perturbation = "interventional"! See GitHub issue #882.


array([2.77167490e-03, 2.43189533e-01, 1.06922868e-02, 9.52966075e-02,
       5.76623150e-01, 1.72828595e+00, 6.12980458e-03, 7.44314826e-01,
       3.45695316e-04, 2.10676165e-03])
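
These sums agree with the first ten entries of `model_loss` computed earlier, up to small numerical differences. To illustrate why the baseline has to depend on the label, here is a minimal sketch. It assumes that the baseline for a given label is the mean loss of the background predictions when scored against that label; this is our reading of the idea, not a statement of shap's exact implementation. The probabilities are made up and `expected_loss` is a hypothetical helper.

```python
import numpy as np

# made-up positive-class probabilities for a small background set
background_p = np.array([0.1, 0.3, 0.8, 0.6])

def expected_loss(label, p=background_p):
    """Mean log-loss of the background predictions scored against one label.

    Hypothetical helper: it assumes the baseline is an average over the
    background data with the label held fixed.
    """
    label = float(label)
    per_sample = -(label * np.log(p) + (1 - label) * np.log(1 - p))
    return float(per_sample.mean())

baseline_pos = expected_loss(1)  # baseline when the true label is 1
baseline_neg = expected_loss(0)  # baseline when the true label is 0
print(baseline_pos, baseline_neg)
```

The two baselines differ because the same predictions are scored against different labels, which is why a single number cannot serve as the expected value when explaining the loss.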