Problem with XGBoost contributions computation #127

Closed
guillaume-vignal opened this issue Feb 17, 2021 · 4 comments · Fixed by #128 or #201
Labels
bug Something isn't working shapash 1.1.0

Comments

@guillaume-vignal
Collaborator

guillaume-vignal commented Feb 17, 2021

Code

import numpy as np
import pandas as pd

import xgboost

import shap
from shapash.explainer.smart_explainer import SmartExplainer

X,y = shap.datasets.nhanesi()
X_display,y_display = shap.datasets.nhanesi(display=True) # human readable feature values
y = np.array(y)
X = X.drop('Unnamed: 0', axis=1)

xgb_train = xgboost.DMatrix(X, label=y)

params_train = {
    "eta": 0.002,
    "max_depth": 3,
    "objective": "survival:cox",
    "subsample": 0.5,
}

model = xgboost.train(params_train, xgb_train, num_boost_round=5)

# Smart explainer creation
xpl = SmartExplainer()
xpl.compile(
    x=X,
    model=model
)

Error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-23-8abc6cc3f74d> in <module>
     28 # Smart explainer creation
     29 xpl = SmartExplainer()
---> 30 xpl.compile(
     31     x=X,
     32     model=model,

~/.local/lib/python3.8/site-packages/shapash/explainer/smart_explainer.py in compile(self, x, model, explainer, contributions, y_pred, preprocessing, postprocessing)
    192             raise ValueError("You have to specify just one of these arguments: explainer, contributions")
    193         if contributions is None:
--> 194             contributions, explainer = shap_contributions(model, self.x_init, self.check_explainer(explainer))
    195         adapt_contrib = self.adapt_contributions(contributions)
    196         self.state = self.choose_state(adapt_contrib)

~/.local/lib/python3.8/site-packages/shapash/utils/shap_backend.py in shap_contributions(model, x_df, explainer)
     55 
     56     if str(type(model)) not in list(sum((simple_tree_model,catboost_model,linear_model,svm_model),())):
---> 57         raise ValueError(
     58             """
     59             model not supported by shapash, please compute contributions

ValueError: 
            model not supported by shapash, please compute contributions
            by yourself before using shapash

Hint:

str(type(model))
"<class 'xgboost.core.Booster'>"

Python version : 3.8

Shapash version : 1.1.0
XGBoost version : 1.0.0

Operating System : Linux
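
The hint above suggests the model is rejected because its type string is not in a list of accepted types. A minimal sketch of that kind of check follows, assuming a hypothetical `SUPPORTED_MODEL_TYPES` tuple and a hypothetical `check_model_supported` helper; the real tuples in `shap_backend.py` (`simple_tree_model`, `catboost_model`, and so on) are not reproduced here.

```python
# Hypothetical whitelist of type strings; Shapash's actual tuples may differ.
SUPPORTED_MODEL_TYPES = (
    "<class 'xgboost.sklearn.XGBRegressor'>",
    "<class 'xgboost.sklearn.XGBClassifier'>",
)


def check_model_supported(model, supported=SUPPORTED_MODEL_TYPES):
    """Return the model's type string, or raise if it is not whitelisted."""
    model_type = str(type(model))
    if model_type not in supported:
        raise ValueError(f"model type {model_type} is not supported")
    return model_type


class Booster:
    """Stand-in for xgboost.core.Booster, so the sketch runs without XGBoost."""


try:
    check_model_supported(Booster())
except ValueError as err:
    # The stand-in's type string is not in the whitelist, so it is rejected.
    print(err)
```

Under this reading, any model whose type string is missing from the list is refused, no matter what methods it offers.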

@guillaume-vignal guillaume-vignal added bug Something isn't working shapash 1.1.0 labels Feb 17, 2021
@ThomasBouche ThomasBouche linked a pull request Feb 18, 2021 that will close this issue
13 tasks
@gariciodaro

gariciodaro commented Apr 17, 2021

I am having trouble making Shapash work with XGBoost. I am running this exact code, but I get the following error:

TypeError: ('Expecting data to be a DMatrix object, got: ', <class 'pandas.core.frame.DataFrame'>)

I created an environment and installed XGBoost 1.0.0 and Python 3.7.

@yg79 yg79 reopened this Apr 17, 2021
@guillaume-vignal
Collaborator Author

> I am having trouble making Shapash work with XGBoost. I am running this exact code, but I get the following error:
>
> TypeError: ('Expecting data to be a DMatrix object, got: ', <class 'pandas.core.frame.DataFrame'>)
>
> I created an environment and installed XGBoost 1.0.0 and Python 3.7.

Hi @gariciodaro,

Could you tell us which version of Shapash you are using?
Could you also copy and paste your code?

Your error message seems to say that you must first convert your pandas DataFrame to a DMatrix before passing it to xgboost.train.

To load a Pandas data frame into DMatrix:

data = pandas.DataFrame(np.arange(12).reshape((4,3)), columns=['a', 'b', 'c'])
dtrain = xgb.DMatrix(data)

Tell me if it works for you.

Have a nice day.

@gariciodaro

Hi @guillaume-vignal,

For the sake of clarity, I created a new Python 3.7 environment (using conda). I need at least XGBoost and MLflow, and of course Shapash. I installed them via pip: shapash==1.3.2 and xgboost==1.4.0. The complete pip freeze is as follows:

alembic==1.4.1
ansiwrap==0.8.4
appdirs==1.4.4
async-generator==1.10
attrs==20.3.0
black==20.8b1
bleach==3.3.0
Brotli==1.0.9
certifi==2020.12.5
chardet==4.0.0
click==7.1.2
cloudpickle==1.6.0
cycler==0.10.0
dash==1.17.0
dash-bootstrap-components==0.9.1
dash-core-components==1.13.0
dash-daq==0.5.0
dash-html-components==1.1.1
dash-renderer==1.8.3
dash-table==4.11.0
databricks-cli==0.14.3
defusedxml==0.7.1
docker==5.0.0
entrypoints==0.3
Flask==1.1.2
Flask-Compress==1.9.0
future==0.18.2
gitdb==4.0.7
GitPython==3.1.14
greenlet==1.0.0
idna==2.10
importlib-metadata==4.0.0
ipython-genutils==0.2.0
itsdangerous==1.1.0
Jinja2==2.11.3
joblib==1.0.1
jsonschema==3.2.0
jupyter-client==6.1.12
jupyter-core==4.7.1
jupyterlab-pygments==0.1.2
kiwisolver==1.3.1
llvmlite==0.34.0
Mako==1.1.4
MarkupSafe==1.1.1
matplotlib==3.4.1
mistune==0.8.4
mlflow==1.15.0
mypy-extensions==0.4.3
nbclient==0.5.3
nbconvert==6.0.7
nbformat==5.1.3
nest-asyncio==1.5.1
numba==0.51.2
numpy==1.20.2
packaging==20.9
pandas==1.1.5
pandocfilters==1.4.3
papermill==2.3.2
pathspec==0.8.1
Pillow==8.2.0
plotly==4.12.0
prometheus-client==0.10.1
prometheus-flask-exporter==0.18.1
protobuf==3.15.8
Pygments==2.8.1
pyparsing==2.4.7
pyrsistent==0.17.3
python-dateutil==2.8.1
python-editor==1.0.4
pytz==2021.1
pywin32==227
PyYAML==5.4.1
pyzmq==22.0.3
querystring-parser==1.2.4
regex==2021.4.4
requests==2.25.1
retrying==1.3.3
scikit-learn==0.24.1
scipy==1.6.2
seaborn==0.11.1
shap==0.37.0
shapash==1.3.2
six==1.15.0
slicer==0.0.3
smmap==4.0.0
SQLAlchemy==1.4.9
sqlparse==0.4.1
tabulate==0.8.9
tenacity==7.0.0
testpath==0.4.4
textwrap3==0.9.2
threadpoolctl==2.1.0
toml==0.10.2
tornado==6.1
tqdm==4.60.0
traitlets==5.0.5
typed-ast==1.4.3
typing-extensions==3.7.4.3
urllib3==1.26.4
waitress==2.0.0
webencodings==0.5.1
websocket-client==0.58.0
Werkzeug==1.0.1
wincertstore==0.2
xgboost==1.4.0
zipp==3.4.1

I am trying to load a model that was tracked and deployed with MLflow into Shapash. However, I keep getting this DMatrix error. So I went back to basics and ran the example code you have here:

import numpy as np
import pandas as pd

import xgboost

import shap
from shapash.explainer.smart_explainer import SmartExplainer

X,y = shap.datasets.nhanesi()
X_display,y_display = shap.datasets.nhanesi(display=True) # human readable feature values
y = np.array(y)
X = X.drop('Unnamed: 0', axis=1)

xgb_train = xgboost.DMatrix(X, label=y)

params_train = {
    "eta": 0.002,
    "max_depth": 3,
    "objective": "survival:cox",
    "subsample": 0.5,
}

model = xgboost.train(params_train, xgb_train, num_boost_round=5)

# Smart explainer creation
xpl = SmartExplainer()
xpl.compile(
    x=X,
    model=model
)
app = xpl.run_app(title_story='test xgboost')

but I got the same error:

 File ".\test.py", line 33, in <module>
    app = xpl.run_app(title_story='test xgboost')
  File "C:\Users\CiodarG\Anaconda3\envs\Shapash\lib\site-packages\shapash\explainer\smart_explainer.py", line 911, in run_app
    self.predict()
  File "C:\Users\CiodarG\Anaconda3\envs\Shapash\lib\site-packages\shapash\explainer\smart_explainer.py", line 755, in predict
    self.y_pred = predict(self.model, self.x_init)
  File "C:\Users\CiodarG\Anaconda3\envs\Shapash\lib\site-packages\shapash\utils\model.py", line 76, in predict
    y_pred = pd.DataFrame(model.predict(x_init), columns=['pred'], index=x_init.index)
  File "C:\Users\CiodarG\Anaconda3\envs\Shapash\lib\site-packages\xgboost\core.py", line 1703, in predict
    raise TypeError('Expecting data to be a DMatrix object, got: ', type(data))
TypeError: ('Expecting data to be a DMatrix object, got: ', <class 'pandas.core.frame.DataFrame'>)

The problem is that the XGBoost Booster object only works with DMatrix. Inside Shapash, however, there is a call that gets predictions from the model by passing a DataFrame instead of a DMatrix to XGBoost. I assume this is used for the local explanation plot or the contribution plot. While trying to make it work on my real case, I was able to bypass the error by passing the predictions myself, but then the local explanation in the web app keeps failing. Here is that code:

import mlflow
import joblib
import pandas as pd
from shapash.explainer.smart_explainer import SmartExplainer
import xgboost as xgb

def main():
    # Load Data set
    XtestData=joblib.load('../DataSets/X_train.file')

    # Load artifact.
    local_path='C:/Users/CiodarG/Documents/MLflowGari/mlruns/2'
    run_location='a8bab32786794b70bd85596eebc1442a'
    logged_model = 'file:///{}/{}/artifacts/model'.format(
                                                local_path,
                                                run_location)

    loaded_model = mlflow.xgboost.load_model(logged_model)

    # make prediction with booster object.
    y_pred=loaded_model.predict(xgb.DMatrix(data=XtestData))
    y_pred_df = pd.DataFrame(y_pred,
                            columns=['Valuation'],
                            index=XtestData.index)

    #Call shapash
    xpl = SmartExplainer()
    # Compile the object to be explored
    xpl.compile(
        x=XtestData,
        y_pred= y_pred_df,
        model=loaded_model)
    # start the web app.
    app = xpl.run_app(title_story='My case')

if __name__ == '__main__':
    main()
  

Let me know if you need more details, have a nice day!
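
The mismatch described in this comment (a model whose `predict` accepts only one container type, called with another) can be sketched with stand-in classes. Everything below is hypothetical: `predict_with_conversion`, `needs_conversion`, `convert`, `Wrapped` and `StrictModel` are illustration names, not Shapash or XGBoost API, and this is not how Shapash resolves the issue.

```python
class Wrapped:
    """Stand-in for xgboost.DMatrix: the only input StrictModel accepts."""

    def __init__(self, values):
        self.values = list(values)


class StrictModel:
    """Stand-in for a Booster: predict() rejects anything but Wrapped."""

    def predict(self, data):
        if not isinstance(data, Wrapped):
            raise TypeError("Expecting data to be a Wrapped object", type(data))
        return [2 * v for v in data.values]


def predict_with_conversion(model, x, needs_conversion, convert):
    """Call model.predict, converting x first when the model requires it."""
    if needs_conversion(model):
        x = convert(x)
    return model.predict(x)


preds = predict_with_conversion(
    StrictModel(),
    [1, 2, 3],
    needs_conversion=lambda m: isinstance(m, StrictModel),
    convert=Wrapped,
)
print(preds)
```

With XGBoost installed, the same shape would presumably use `convert=xgboost.DMatrix` and a check against `xgboost.Booster`. Whether Shapash would accept such a wrapper is not established in this thread.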

@guillaume-vignal
Collaborator Author

guillaume-vignal commented May 7, 2021

Hi,
I looked deeper into your example, and you're right: at the moment we do not support DMatrix as a data format.
So the first thing you can do is train the model with the scikit-learn API from xgboost (xgboost.XGBRegressor, xgboost.XGBClassifier, ...) instead of xgboost.train.

If you can't change this and have to use the already-trained model, you can bypass the problem by cloning the repository and using a modified copy of the code.
In the file shapash/webapp/smart_app.py, change this line as follows:

  self.components['graph']['detail_feature'].figure = self.explainer.plot.local_plot(index=selected,
                                                                                     label=label,
                                                                                     show_masked=True,
                                                                                     yaxis_max_label=0,
                                                                                     show_predict=False)

that is, add the parameter show_predict=False.

Then, when you call compile, pass it both the predictions and the contributions.
Taking my previous example:

import numpy as np
import pandas as pd

import xgboost

import shap
from shapash.explainer.smart_explainer import SmartExplainer

X,y = shap.datasets.nhanesi()
X_display,y_display = shap.datasets.nhanesi(display=True) # human readable feature values
y = np.array(y)
X = X.drop('Unnamed: 0', axis=1)

xgb_train = xgboost.DMatrix(X, label=y)

params_train = {
    "eta": 0.002,
    "max_depth": 3,
    "objective": "survival:cox",
    "subsample": 0.5,
}

model = xgboost.train(params_train, xgb_train, num_boost_round=5)

# Smart explainer creation
xpl = SmartExplainer()
xpl.compile(
    x=X,
    model=model,
    y_pred=pd.Series(model.predict(xgb_train), name='probability'),
    contributions=model.predict(xgb_train, pred_contribs=True)[:, :-1]
)

app = xpl.run_app(title_story='test xgboost', port=8078)
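
The `[:, :-1]` slice in the example above drops the last column of the `pred_contribs=True` output, which XGBoost reserves for the bias term. A minimal sketch with plain Python lists and invented numbers shows the split; `split_contributions` is a hypothetical helper name.

```python
def split_contributions(pred_contribs):
    """Separate per-feature contributions from the trailing bias column."""
    features = [row[:-1] for row in pred_contribs]
    bias = [row[-1] for row in pred_contribs]
    return features, bias


# Invented numbers: 2 samples, 3 feature columns plus 1 bias column.
raw = [
    [0.5, -0.25, 0.0, 1.0],
    [-0.5, 0.25, 0.75, 1.0],
]

features, bias = split_contributions(raw)

# Each row of contributions plus its bias adds up to the raw margin.
margins = [sum(f) + b for f, b in zip(features, bias)]
print(features, bias, margins)
```

The row sums reproduce the raw margin, which can differ from the `model.predict` output when the objective transforms the margin.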

In the meantime, we'll work with the team on the best way to solve this issue.

Best regards.

3 participants