Problem with XGBoost contributions computation #127

guillaume-vignal · 2021-02-17T16:59:21Z

Code

import numpy as np
import pandas as pd

import xgboost

import shap
from shapash.explainer.smart_explainer import SmartExplainer

X,y = shap.datasets.nhanesi()
X_display,y_display = shap.datasets.nhanesi(display=True) # human readable feature values
y = np.array(y)
X = X.drop('Unnamed: 0', axis=1)

xgb_train = xgboost.DMatrix(X, label=y)

params_train = {
    "eta": 0.002,
    "max_depth": 3,
    "objective": "survival:cox",
    "subsample": 0.5,
}

model = xgboost.train(params_train, xgb_train, num_boost_round=5)

# Smart explainer creation
xpl = SmartExplainer()
xpl.compile(
    x=X,
    model=model
)

Error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-23-8abc6cc3f74d> in <module>
     28 # Smart explainer creation
     29 xpl = SmartExplainer()
---> 30 xpl.compile(
     31     x=X,
     32     model=model,

~/.local/lib/python3.8/site-packages/shapash/explainer/smart_explainer.py in compile(self, x, model, explainer, contributions, y_pred, preprocessing, postprocessing)
    192             raise ValueError("You have to specify just one of these arguments: explainer, contributions")
    193         if contributions is None:
--> 194             contributions, explainer = shap_contributions(model, self.x_init, self.check_explainer(explainer))
    195         adapt_contrib = self.adapt_contributions(contributions)
    196         self.state = self.choose_state(adapt_contrib)

~/.local/lib/python3.8/site-packages/shapash/utils/shap_backend.py in shap_contributions(model, x_df, explainer)
     55 
     56     if str(type(model)) not in list(sum((simple_tree_model,catboost_model,linear_model,svm_model),())):
---> 57         raise ValueError(
     58             """
     59             model not supported by shapash, please compute contributions

ValueError: 
            model not supported by shapash, please compute contributions
            by yourself before using shapash

Hint:

str(type(model))
"<class 'xgboost.core.Booster'>"

Python version : 3.8

Shapash version : 1.1.0
XGBoost version : 1.0.0

Operating System : Linux
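
The hint above explains the failure. Judging by the traceback, shap_contributions compares str(type(model)) against a flattened list of supported type strings, and xgboost.core.Booster is not among them. A minimal sketch of that check follows; the tuple contents are illustrative stand-ins rather than Shapash's actual lists, and is_supported is a hypothetical helper name.

```python
# Illustrative stand-ins: the real tuples live in shapash/utils/shap_backend.py
# and their exact contents are not shown in this thread.
simple_tree_model = (
    "<class 'xgboost.sklearn.XGBRegressor'>",
    "<class 'xgboost.sklearn.XGBClassifier'>",
)
catboost_model = ("<class 'catboost.core.CatBoostRegressor'>",)
linear_model = ()
svm_model = ()


def is_supported(type_string):
    """Return True if the type string appears in the flattened whitelist."""
    # sum(..., ()) concatenates the tuples, mirroring the expression in the traceback.
    supported = list(sum((simple_tree_model, catboost_model, linear_model, svm_model), ()))
    return type_string in supported


booster_supported = is_supported("<class 'xgboost.core.Booster'>")
regressor_supported = is_supported("<class 'xgboost.sklearn.XGBRegressor'>")
print(booster_supported, regressor_supported)
```

Because the comparison is on the exact type string, a native Booster returned by xgboost.train is rejected even though the scikit-learn wrappers around the same library are accepted.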

gariciodaro · 2021-04-17T15:31:43Z

I am having trouble making Shapash work with XGBoost. I am running this exact code, but I get the following error:

TypeError: ('Expecting data to be a DMatrix object, got: ', <class 'pandas.core.frame.DataFrame'>)

I created an environment and installed XGBoost 1.0.0 and Python 3.7.

guillaume-vignal · 2021-04-19T10:08:06Z

Hi @gariciodaro,

could you tell us which version of Shapash you are using?
Could you also copy/paste your code?

Your error message suggests that you first need to convert your pandas DataFrame to a DMatrix before passing it to xgboost.train.

To load a pandas DataFrame into a DMatrix:

# Uses the same imports as the example above: numpy as np, pandas as pd, xgboost
data = pd.DataFrame(np.arange(12).reshape((4, 3)), columns=['a', 'b', 'c'])
dtrain = xgboost.DMatrix(data)

Tell me if it works for you.

Have a nice day.

gariciodaro · 2021-04-19T11:28:34Z

Hi @guillaume-vignal,

For the sake of clarity, I created a new Python 3.7 environment (using conda). I need to have at least Xgboost + MLflow and of course Shapash. I installed them via pip. shapash==1.3.2 and xgboost==1.4.0. The complete pip freeze is as follows:

alembic==1.4.1
ansiwrap==0.8.4
appdirs==1.4.4
async-generator==1.10
attrs==20.3.0
black==20.8b1
bleach==3.3.0
Brotli==1.0.9
certifi==2020.12.5
chardet==4.0.0
click==7.1.2
cloudpickle==1.6.0
cycler==0.10.0
dash==1.17.0
dash-bootstrap-components==0.9.1
dash-core-components==1.13.0
dash-daq==0.5.0
dash-html-components==1.1.1
dash-renderer==1.8.3
dash-table==4.11.0
databricks-cli==0.14.3
defusedxml==0.7.1
docker==5.0.0
entrypoints==0.3
Flask==1.1.2
Flask-Compress==1.9.0
future==0.18.2
gitdb==4.0.7
GitPython==3.1.14
greenlet==1.0.0
idna==2.10
importlib-metadata==4.0.0
ipython-genutils==0.2.0
itsdangerous==1.1.0
Jinja2==2.11.3
joblib==1.0.1
jsonschema==3.2.0
jupyter-client==6.1.12
jupyter-core==4.7.1
jupyterlab-pygments==0.1.2
kiwisolver==1.3.1
llvmlite==0.34.0
Mako==1.1.4
MarkupSafe==1.1.1
matplotlib==3.4.1
mistune==0.8.4
mlflow==1.15.0
mypy-extensions==0.4.3
nbclient==0.5.3
nbconvert==6.0.7
nbformat==5.1.3
nest-asyncio==1.5.1
numba==0.51.2
numpy==1.20.2
packaging==20.9
pandas==1.1.5
pandocfilters==1.4.3
papermill==2.3.2
pathspec==0.8.1
Pillow==8.2.0
plotly==4.12.0
prometheus-client==0.10.1
prometheus-flask-exporter==0.18.1
protobuf==3.15.8
Pygments==2.8.1
pyparsing==2.4.7
pyrsistent==0.17.3
python-dateutil==2.8.1
python-editor==1.0.4
pytz==2021.1
pywin32==227
PyYAML==5.4.1
pyzmq==22.0.3
querystring-parser==1.2.4
regex==2021.4.4
requests==2.25.1
retrying==1.3.3
scikit-learn==0.24.1
scipy==1.6.2
seaborn==0.11.1
shap==0.37.0
shapash==1.3.2
six==1.15.0
slicer==0.0.3
smmap==4.0.0
SQLAlchemy==1.4.9
sqlparse==0.4.1
tabulate==0.8.9
tenacity==7.0.0
testpath==0.4.4
textwrap3==0.9.2
threadpoolctl==2.1.0
toml==0.10.2
tornado==6.1
tqdm==4.60.0
traitlets==5.0.5
typed-ast==1.4.3
typing-extensions==3.7.4.3
urllib3==1.26.4
waitress==2.0.0
webencodings==0.5.1
websocket-client==0.58.0
Werkzeug==1.0.1
wincertstore==0.2
xgboost==1.4.0
zipp==3.4.1

I am trying to load a model that was tracked and deployed with MLflow into Shapash. However, I keep getting this DMatrix error. So I went back to basics and ran the example code you have here:

import numpy as np
import pandas as pd

import xgboost

import shap
from shapash.explainer.smart_explainer import SmartExplainer

X,y = shap.datasets.nhanesi()
X_display,y_display = shap.datasets.nhanesi(display=True) # human readable feature values
y = np.array(y)
X = X.drop('Unnamed: 0', axis=1)

xgb_train = xgboost.DMatrix(X, label=y)

params_train = {
    "eta": 0.002,
    "max_depth": 3,
    "objective": "survival:cox",
    "subsample": 0.5,
}

model = xgboost.train(params_train, xgb_train, num_boost_round=5)

# Smart explainer creation
xpl = SmartExplainer()
xpl.compile(
    x=X,
    model=model
)
app = xpl.run_app(title_story='test xgboost')

but I got the same error:

 File ".\test.py", line 33, in <module>
    app = xpl.run_app(title_story='test xgboost')
  File "C:\Users\CiodarG\Anaconda3\envs\Shapash\lib\site-packages\shapash\explainer\smart_explainer.py", line 911, in run_app
    self.predict()
  File "C:\Users\CiodarG\Anaconda3\envs\Shapash\lib\site-packages\shapash\explainer\smart_explainer.py", line 755, in predict
    self.y_pred = predict(self.model, self.x_init)
  File "C:\Users\CiodarG\Anaconda3\envs\Shapash\lib\site-packages\shapash\utils\model.py", line 76, in predict
    y_pred = pd.DataFrame(model.predict(x_init), columns=['pred'], index=x_init.index)
  File "C:\Users\CiodarG\Anaconda3\envs\Shapash\lib\site-packages\xgboost\core.py", line 1703, in predict
    raise TypeError('Expecting data to be a DMatrix object, got: ', type(data))
TypeError: ('Expecting data to be a DMatrix object, got: ', <class 'pandas.core.frame.DataFrame'>)

The problem is that the xgboost Booster object only works with DMatrix input. Inside Shapash, however, I see a call that gets predictions from the model by passing a DataFrame instead of a DMatrix to xgboost. I assume this is needed for the local explanation plot or the contribution plot. While trying to make it work on my real case, I was able to bypass the error by passing the predictions myself, but then the local explanation in the web app keeps failing. Here is that code:

import mlflow
import joblib
import pandas as pd
from shapash.explainer.smart_explainer import SmartExplainer
import xgboost as xgb

def main():
    # Load Data set
    XtestData=joblib.load('../DataSets/X_train.file')

    # Load artifact.
    local_path='C:/Users/CiodarG/Documents/MLflowGari/mlruns/2'
    run_location='a8bab32786794b70bd85596eebc1442a'
    logged_model = 'file:///{}/{}/artifacts/model'.format(
                                                local_path,
                                                run_location)

    loaded_model = mlflow.xgboost.load_model(logged_model)

    # make prediction with booster object.
    y_pred=loaded_model.predict(xgb.DMatrix(data=XtestData))
    y_pred_df = pd.DataFrame(y_pred,
                            columns=['Valuation'],
                            index=XtestData.index)

    #Call shapash
    xpl = SmartExplainer()
    # Compile the object to be explored
    xpl.compile(
        x=XtestData,
        y_pred= y_pred_df,
        model=loaded_model)
    # start the web app.
    app = xpl.run_app(title_story='My case')

if __name__ == '__main__':
    main()

Let me know if you need more details. Have a nice day!

guillaume-vignal · 2021-05-07T12:41:50Z

Hi,
I looked deeper into your example, and you're right: at the moment we do not support DMatrix as a data format.
So the first thing you can do is train the model with the scikit-learn API from xgboost (xgboost.XGBRegressor, xgboost.XGBClassifier, ...) instead of xgboost.train.
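
For illustration, the scikit-learn API route might look like the sketch below. It assumes the hyperparameters from the example in this thread carry over, with eta written as learning_rate and num_boost_round as n_estimators; fit_sklearn_api_model is a hypothetical helper name.

```python
def fit_sklearn_api_model(X, y):
    """Train with xgboost's scikit-learn API instead of xgboost.train."""
    # Imported inside the function so that defining the sketch does not require xgboost.
    import xgboost

    model = xgboost.XGBRegressor(
        learning_rate=0.002,        # "eta" in the native params dict
        max_depth=3,
        objective="survival:cox",
        subsample=0.5,
        n_estimators=5,             # num_boost_round in xgboost.train
    )
    model.fit(X, y)                 # takes the pandas DataFrame directly, no DMatrix needed
    return model


# Usage, with X and y prepared as in the first example of this thread:
# model = fit_sklearn_api_model(X, y)
# xpl = SmartExplainer()
# xpl.compile(x=X, model=model)
```

The returned object is an xgboost.sklearn.XGBRegressor, whose predict method accepts a DataFrame, so the DMatrix type error should not arise.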

If you can't change this and have to use the already trained model, you can bypass the problem by cloning the repository and modifying the code.
In the file shapash/webapp/smart_app.py, change this line as follows:

  self.components['graph']['detail_feature'].figure = self.explainer.plot.local_plot(index=selected,
                                                                                     label=label,
                                                                                     show_masked=True,
                                                                                     yaxis_max_label=0,
                                                                                     show_predict=False)

That is, add the parameter show_predict=False.

Then, when you call compile, pass it both the predictions and the contributions.
Taking my previous example:

import numpy as np
import pandas as pd

import xgboost

import shap
from shapash.explainer.smart_explainer import SmartExplainer

X,y = shap.datasets.nhanesi()
X_display,y_display = shap.datasets.nhanesi(display=True) # human readable feature values
y = np.array(y)
X = X.drop('Unnamed: 0', axis=1)

xgb_train = xgboost.DMatrix(X, label=y)

params_train = {
    "eta": 0.002,
    "max_depth": 3,
    "objective": "survival:cox",
    "subsample": 0.5,
}

model = xgboost.train(params_train, xgb_train, num_boost_round=5)

# Smart explainer creation
xpl = SmartExplainer()
xpl.compile(
    x=X,
    model=model,
    y_pred=pd.Series(model.predict(xgb_train), name='probability'),
    contributions=model.predict(xgb_train, pred_contribs=True)[:, :-1]
)

app = xpl.run_app(title_story='test xgboost', port=8078)
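
The [:, :-1] slice in the example above is needed because, with pred_contribs=True, Booster.predict returns one column per feature plus a final bias column, while compile expects only the per-feature part. The sketch below uses a synthetic array in place of the real output (the values are made up, and split_contributions is a hypothetical helper) and assumes the 2-D shape produced by a single-output model.

```python
import numpy as np


def split_contributions(raw_contribs):
    """Split pred_contribs output into (per-feature contributions, bias)."""
    raw = np.asarray(raw_contribs, dtype=float)
    if raw.ndim != 2 or raw.shape[1] < 2:
        raise ValueError("expected a 2-D array with at least one feature column and a bias column")
    # The last column is the bias term; everything before it is per-feature.
    return raw[:, :-1], raw[:, -1]


# Synthetic stand-in for model.predict(xgb_train, pred_contribs=True)
# with two features: columns are [feature_0, feature_1, bias].
raw = np.array([
    [0.10, -0.30, 0.50],
    [0.20,  0.00, 0.50],
])

contributions, bias = split_contributions(raw)

# Contributions plus bias add up to the raw (margin) prediction of each row.
margin = contributions.sum(axis=1) + bias
print(contributions.shape, margin)
```

With a real model, the same sum should match model.predict(xgb_train, output_margin=True), which is a quick way to check that the contributions passed to Shapash are consistent with the model.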

In the meantime, we'll work with the team on the cleanest way to solve this issue.

Best regards.

guillaume-vignal added the bug and shapash 1.1.0 labels · Feb 17, 2021

ThomasBouche mentioned this issue in pull request #128 "fixed xgboost contribution" (merged, 13 tasks) · Feb 18, 2021

ThomasBouche linked pull request #128, which will close this issue · Feb 18, 2021

ThomasBouche closed this as completed in #128 · Feb 19, 2021

yg79 reopened this · Apr 17, 2021

guillaume-vignal mentioned this issue in pull request #201 "Prevent in the case of a regression to call the predict function if t…" (merged, 8 tasks) · May 10, 2021

ThomasBouche closed this as completed in #201 · May 10, 2021
