Problem with XGBoost contributions computation #127
I am having trouble making Shapash work with XGBoost. I am running this exact code, but I get the following error: `TypeError: ('Expecting data to be a DMatrix object, got: ', <class 'pandas.core.frame.DataFrame'>)`. I created an environment and installed XGBoost 1.0.0 and Python 3.7.
Hi @gariciodaro, could you tell us which version of Shapash you are using? Your error message suggests that you first need to convert your pandas DataFrame into a DMatrix before passing it to `xgboost.train`. To load a pandas DataFrame into a DMatrix:

```python
import numpy as np
import pandas
import xgboost as xgb

data = pandas.DataFrame(np.arange(12).reshape((4, 3)), columns=['a', 'b', 'c'])
dtrain = xgb.DMatrix(data)
```

Tell me if it works for you. Have a nice day.
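Because a raw `Booster.predict` (xgboost 1.x) only accepts a `DMatrix` and returns a bare array, the conversion and the re-attachment of the DataFrame index can be wrapped together. This is a minimal sketch; `predict_booster` and its `dmatrix_factory` parameter are hypothetical names, and the factory defaults to `xgboost.DMatrix`.

```python
import pandas as pd


def predict_booster(booster, X, dmatrix_factory=None):
    """Predict with a raw xgboost Booster starting from a pandas DataFrame.

    Hypothetical helper: the DataFrame is converted to a DMatrix first, and
    the DataFrame index is re-attached to the returned predictions.
    """
    if dmatrix_factory is None:
        import xgboost  # imported lazily; only needed for the default factory
        dmatrix_factory = xgboost.DMatrix
    preds = booster.predict(dmatrix_factory(X))
    # Keep the original row labels so predictions stay aligned with X.
    return pd.Series(preds, index=X.index, name="pred")
```

With a trained Booster `model` and a DataFrame `X`, `predict_booster(model, X)` gives a Series indexed like `X`, which is the shape Shapash expects for `y_pred`.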
For the sake of clarity, I created a new Python 3.7 environment (using conda). I need at least XGBoost, MLflow and, of course, Shapash, all installed via pip: shapash==1.3.2 and xgboost==1.4.0. The complete pip freeze is as follows:
I am trying to load into Shapash a model that was tracked and deployed with MLflow. However, I keep getting this DMatrix error, so I went back to basics and ran the example code you have here:

```python
import numpy as np
import pandas as pd
import xgboost
import shap
from shapash.explainer.smart_explainer import SmartExplainer
X,y = shap.datasets.nhanesi()
X_display,y_display = shap.datasets.nhanesi(display=True) # human readable feature values
y = np.array(y)
X = X.drop('Unnamed: 0', axis=1)
xgb_train = xgboost.DMatrix(X, label=y)
params_train = {
    "eta": 0.002,
    "max_depth": 3,
    "objective": "survival:cox",
    "subsample": 0.5,
}
model = xgboost.train(params_train, xgb_train, num_boost_round=5)
# Smart explainer creation
xpl = SmartExplainer()
xpl.compile(
    x=X,
    model=model
)
app = xpl.run_app(title_story='test xgboost')
```

but I got the same error:

```
  File ".\test.py", line 33, in <module>
    app = xpl.run_app(title_story='test xgboost')
  File "C:\Users\CiodarG\Anaconda3\envs\Shapash\lib\site-packages\shapash\explainer\smart_explainer.py", line 911, in run_app
    self.predict()
  File "C:\Users\CiodarG\Anaconda3\envs\Shapash\lib\site-packages\shapash\explainer\smart_explainer.py", line 755, in predict
    self.y_pred = predict(self.model, self.x_init)
  File "C:\Users\CiodarG\Anaconda3\envs\Shapash\lib\site-packages\shapash\utils\model.py", line 76, in predict
    y_pred = pd.DataFrame(model.predict(x_init), columns=['pred'], index=x_init.index)
  File "C:\Users\CiodarG\Anaconda3\envs\Shapash\lib\site-packages\xgboost\core.py", line 1703, in predict
    raise TypeError('Expecting data to be a DMatrix object, got: ', type(data))
TypeError: ('Expecting data to be a DMatrix object, got: ', <class 'pandas.core.frame.DataFrame'>)
```

The problem is that the

```python
import mlflow
import joblib
import pandas as pd
from shapash.explainer.smart_explainer import SmartExplainer
import xgboost as xgb


def main():
    # Load data set
    XtestData = joblib.load('../DataSets/X_train.file')
    # Load artifact.
    local_path = 'C:/Users/CiodarG/Documents/MLflowGari/mlruns/2'
    run_location = 'a8bab32786794b70bd85596eebc1442a'
    logged_model = 'file:///{}/{}/artifacts/model'.format(
        local_path,
        run_location)
    loaded_model = mlflow.xgboost.load_model(logged_model)
    # Make predictions with the Booster object.
    y_pred = loaded_model.predict(xgb.DMatrix(data=XtestData))
    y_pred_df = pd.DataFrame(y_pred,
                             columns=['Valuation'],
                             index=XtestData.index)
    # Call shapash
    xpl = SmartExplainer()
    # Compile the object to be explored
    xpl.compile(
        x=XtestData,
        y_pred=y_pred_df,
        model=loaded_model)
    # Start the web app.
    app = xpl.run_app(title_story='My case')


if __name__ == '__main__':
    main()
```

Let me know if you need more details. Have a nice day!
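One way to avoid touching the model loaded from MLflow might be to wrap it so that any `model.predict(df)` call, like the one in Shapash's `utils/model.py` frame of the traceback, converts the DataFrame itself. This is a sketch under assumptions: `DataFrameBooster` is a hypothetical adapter, other attributes are simply forwarded to the wrapped Booster, and it has not been verified against Shapash 1.3.2.

```python
import pandas as pd


class DataFrameBooster:
    """Hypothetical adapter giving a raw xgboost Booster a DataFrame-friendly predict().

    Calls such as model.predict(df) receive predictions instead of a TypeError;
    every other public attribute is forwarded to the wrapped Booster.
    """

    def __init__(self, booster, dmatrix_factory=None):
        if dmatrix_factory is None:
            import xgboost  # lazy import: only needed for the default factory
            dmatrix_factory = xgboost.DMatrix
        self._booster = booster
        self._to_dmatrix = dmatrix_factory

    def predict(self, X, **kwargs):
        # Convert DataFrames; pass anything else (e.g. an existing DMatrix) through.
        data = self._to_dmatrix(X) if isinstance(X, pd.DataFrame) else X
        return self._booster.predict(data, **kwargs)

    def __getattr__(self, name):
        # Only called for attributes not found on the adapter itself.
        # Refusing private names avoids infinite recursion before __init__ has run.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._booster, name)
```

In the MLflow script this would mean `model=DataFrameBooster(loaded_model)` in `xpl.compile(...)`. Because Shapash may not recognize the wrapper as an XGBoost model when it computes contributions itself, it is probably safer to also pass `y_pred` and `contributions` explicitly.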
Hi,

If you can't change the model and have to use the one already trained, you can bypass the problem by cloning the repository and using this code:

```python
self.components['graph']['detail_feature'].figure = self.explainer.plot.local_plot(index=selected,
                                                                                  label=label,
                                                                                  show_masked=True,
                                                                                  yaxis_max_label=0,
                                                                                  show_predict=False)
```

adding the parameter. Then, when calling `compile`, pass it the predictions and the contributions:

```python
import numpy as np
import pandas as pd
import xgboost
import shap
from shapash.explainer.smart_explainer import SmartExplainer
X,y = shap.datasets.nhanesi()
X_display,y_display = shap.datasets.nhanesi(display=True) # human readable feature values
y = np.array(y)
X = X.drop('Unnamed: 0', axis=1)
xgb_train = xgboost.DMatrix(X, label=y)
params_train = {
    "eta": 0.002,
    "max_depth": 3,
    "objective": "survival:cox",
    "subsample": 0.5,
}
model = xgboost.train(params_train, xgb_train, num_boost_round=5)
# Smart explainer creation
xpl = SmartExplainer()
xpl.compile(
    x=X,
    model=model,
    y_pred=pd.Series(model.predict(xgb_train), name='probability'),
    contributions=model.predict(xgb_train, pred_contribs=True)[:, :-1]
)
app = xpl.run_app(title_story='test xgboost', port=8078)
```

In the meantime, we'll work with the team on the best way to solve this issue.

Best regards.
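`pred_contribs=True` returns one column per feature plus a final bias column, which is why the workaround slices with `[:, :-1]`. The sketch below (the helper names `split_contributions` and `check_additivity` are hypothetical) splits that array into a DataFrame aligned with `X` and checks additivity: each row of contributions plus the bias should reproduce the raw margin prediction (`output_margin=True`). For `survival:cox`, `predict` returns hazard ratios, i.e. `exp(margin)`, so the contributions explain the log hazard ratio rather than the value passed as `y_pred`.

```python
import numpy as np
import pandas as pd


def split_contributions(raw_contribs, X):
    """Split xgboost pred_contribs output into per-feature contributions and bias.

    raw_contribs has shape (n_rows, n_features + 1); the last column is the bias.
    """
    raw = np.asarray(raw_contribs)
    expected = (len(X), X.shape[1] + 1)
    if raw.shape != expected:
        raise ValueError(f"expected shape {expected}, got {raw.shape}")
    contributions = pd.DataFrame(raw[:, :-1], columns=X.columns, index=X.index)
    bias = pd.Series(raw[:, -1], index=X.index, name="bias")
    return contributions, bias


def check_additivity(contributions, bias, margin, atol=1e-4):
    """Return True if contributions + bias reproduce the margin prediction."""
    reconstructed = contributions.sum(axis=1).to_numpy() + bias.to_numpy()
    return bool(np.allclose(reconstructed, np.asarray(margin), atol=atol))
```

With the model above, this would be `contribs, bias = split_contributions(model.predict(xgb_train, pred_contribs=True), X)` followed by `check_additivity(contribs, bias, model.predict(xgb_train, output_margin=True))`; `contribs` keeps the feature names and index of `X` when passed as `contributions=`.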
Python version : 3.8
Shapash version : 1.1.0
XGBoost version : 1.0.0
Operating System : Linux