Add support for Multiregression tasks #361

Open
ThomasWolf0701 opened this issue Nov 19, 2020 · 3 comments
Labels
feature 💡 New feature or enhancement request long term 📆 TODO long term Python 🐍 Related to Python

Comments

ThomasWolf0701 commented Nov 19, 2020

I tried the Python version of dalex with a multi-output regression (multiregression) model and it raised an error (see below).
Is there any way around it?
If I understand correctly, iBreakDown/pyBreakDown can deal with multiple classes for classification, where the predictions are also probabilities organized in multiple columns/arrays, so this should be quite similar. It would be great if this were enabled.
The SHAP package also supports SHAP values for the multiregression case.

Can I call iBreakDown directly from dalex, without generating an explainer object? iBreakDown for Python has not been updated in a while, but the new Python dalex seems quite active.

# decision tree for multioutput regression

import dalex as dx
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# create datasets

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1, noise=0.5)

# define model

model = DecisionTreeRegressor()

model.fit(X,y)

dx.Explainer(model,X,y)

data is converted to pd.DataFrame, columns are set as string numbers
-> data : 1000 rows 10 cols
Traceback (most recent call last):

  File "", line 11, in
    dx.Explainer(model,X,y)

  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\dalex_explainer\object.py", line 131, in __init__
    y = check_y(y, data, verbose)

  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\dalex_explainer\checks.py", line 52, in check_y
    raise ValueError("y must have only one dimension")

ValueError: y must have only one dimension
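For readers following along: the error message points at the shape of `y`. With `n_targets=2`, the target array is two-dimensional, while each individual column is one-dimensional. The sketch below only illustrates that shape difference with NumPy; the random array is a stand-in for the `make_regression` targets, and nothing here reproduces dalex's internal check.

```python
import numpy as np

# Stand-in for the targets returned by make_regression(..., n_targets=2);
# the values are random and only the shape matters here.
rng = np.random.default_rng(1)
y = rng.normal(size=(1000, 2))

# The full target array has two dimensions (rows x outputs).
y_dim = np.ndim(y)

# Slicing out one output at a time gives one-dimensional targets.
single_targets = [y[:, k] for k in range(y.shape[1])]
single_dims = [np.ndim(col) for col in single_targets]

print(y.shape, y_dim, single_dims)
```

Passing one such column at a time is what the single-output workaround later in this thread relies on.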

hbaniecki (Member) commented

We don't support multi-output models yet. As a workaround, you can adjust the predict_function so that it returns a single output column, create one explainer per output, and produce iBreakDown plots for a given output.

import dalex as dx
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1, noise=0.5)

model = DecisionTreeRegressor()

model.fit(X,y)

exp_0 = dx.Explainer(model, X, y[:, 0], predict_function = lambda m, d: m.predict(d)[:, 0], label="output 0")
exp_1 = dx.Explainer(model, X, y[:, 1], predict_function = lambda m, d: m.predict(d)[:, 1], label="output 1")

exp_0.predict_parts(X[2, :]).plot(exp_1.predict_parts(X[2, :]))
y[2, :]
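With more than two outputs, the explainers above would probably be built in a loop. A minimal sketch of that, assuming the same `predict_function(model, data)` signature used above: the helper name `make_output_predict_function` and the stand-in model class are hypothetical and exist only for illustration. A factory function is used instead of a bare `lambda` inside the loop, because a lambda would capture the loop variable by reference and every explainer would end up reading the last column.

```python
import numpy as np

def make_output_predict_function(k):
    """Return a predict_function(model, data) that selects output column k.

    Hypothetical helper: binding k through a function argument avoids the
    late-binding pitfall of writing `lambda m, d: m.predict(d)[:, k]` in a loop.
    """
    def predict_function(model, data):
        return np.asarray(model.predict(data))[:, k]
    return predict_function

class TwoOutputModel:
    """Tiny stand-in for a fitted multi-output regressor (illustration only)."""
    def predict(self, data):
        data = np.asarray(data, dtype=float)
        # Output 0 is the row sum, output 1 is the row product.
        return np.column_stack([data.sum(axis=1), data.prod(axis=1)])

predict_functions = [make_output_predict_function(k) for k in range(2)]

demo = np.array([[1.0, 2.0], [3.0, 4.0]])
out0 = predict_functions[0](TwoOutputModel(), demo)
out1 = predict_functions[1](TwoOutputModel(), demo)
print(out0, out1)
```

With dalex, each function would then be passed the same way as in the snippet above, for example `dx.Explainer(model, X, y[:, k], predict_function=make_output_predict_function(k), label=f"output {k}")`.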

hbaniecki added the "Python 🐍 Related to Python" and "long term 📆 TODO long term" labels Nov 19, 2020
hbaniecki changed the title from "Python Multiregression Error" to "Add support for Multiregression tasks" Aug 11, 2021
hbaniecki added the "feature 💡 New feature or enhancement request" label Aug 11, 2021

edgBR commented May 2, 2022

Hi @hbaniecki, this would be nice.

But it would be important to consider the MultiOutput wrapper of scikit-learn.

Currently I am creating an explainer for every target and then adding them to the plots:

[image: plot in which every line represents a model]
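As an illustration of the per-target approach with scikit-learn's wrapper, here is a hedged sketch assuming the wrapper meant is `MultiOutputRegressor`. After fitting, it keeps one single-output model per target in `estimators_`, so each of those can be handled as an ordinary one-dimensional regression model. The dataset sizes and `random_state` values are arbitrary choices for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.multioutput import MultiOutputRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=10, n_informative=5,
                       n_targets=2, random_state=1, noise=0.5)

# The wrapper fits one independent regressor per target column.
wrapper = MultiOutputRegressor(DecisionTreeRegressor(random_state=0))
wrapper.fit(X, y)

# Each fitted sub-model predicts a single, one-dimensional output.
per_target_shapes = [est.predict(X).shape for est in wrapper.estimators_]

# Column k of the wrapper's prediction matches sub-model k's prediction.
columns_match = all(
    np.allclose(wrapper.predict(X)[:, k], est.predict(X))
    for k, est in enumerate(wrapper.estimators_)
)
print(per_target_shapes, columns_match)
```

Each sub-model could then get its own explainer, e.g. `dx.Explainer(wrapper.estimators_[k], X, y[:, k], label=f"target {k}")`, without a custom predict_function.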
