How to calculate SRI for nonlinear models? #380

Open
jckkvs opened this issue Jan 10, 2024 · 0 comments
Labels
bug Something isn't working


jckkvs commented Jan 10, 2024

@mtsokol

Related to #374.

Thank you. I modified your code to try non-linear models such as KernelRidge.

KernelRidge is not a tree-based model, so it is not compatible with TreeExplainerFactory; I therefore considered KernelExplainerFactory and ExactExplainerFactory. Because ExactExplainerFactory becomes impractical for larger datasets, I chose KernelExplainerFactory with shap_interaction=True.

With this setup, a RuntimeError is raised:
RuntimeError: SHAP interaction values have not been calculated. Create an inspector with parameter 'shap_interaction=True' to enable calculations involving SHAP interaction values.

Looking at the implementation, it seems that KernelExplainerFactory does not support SHAP interaction values:

def supports_shap_interaction_values(self) -> bool:

I have two questions.
1. For non-linear models, do I need to use ExactExplainerFactory before calling inspector.fit()? What should I do if the dataset is large?
2. It is confusing that KernelExplainerFactory internally converts shap_interaction=True to False. Would it be better to raise an error when shap_interaction=True is specified, or to prevent the shap_interaction argument from being specified at all?
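
To make question 2 concrete, here is a minimal sketch of the fail-fast behaviour I have in mind. The classes below (ExplainerFactorySketch, KernelFactorySketch, TreeFactorySketch, InspectorSketch) are hypothetical stand-ins and not FACET's actual classes. Only the name supports_shap_interaction_values is taken from the FACET source, and it is modelled here as a plain method, which is an assumption.

```python
from abc import ABC, abstractmethod


class ExplainerFactorySketch(ABC):
    """Hypothetical stand-in for an explainer factory (not FACET's class)."""

    @abstractmethod
    def supports_shap_interaction_values(self) -> bool:
        """Return True if explainers from this factory can compute interactions."""


class KernelFactorySketch(ExplainerFactorySketch):
    """Stand-in for a kernel-based factory without interaction support."""

    def supports_shap_interaction_values(self) -> bool:
        return False


class TreeFactorySketch(ExplainerFactorySketch):
    """Stand-in for a tree-based factory with interaction support."""

    def supports_shap_interaction_values(self) -> bool:
        return True


class InspectorSketch:
    """Hypothetical inspector that validates its arguments at construction."""

    def __init__(
        self, explainer_factory: ExplainerFactorySketch, shap_interaction: bool = True
    ) -> None:
        # Fail fast instead of silently downgrading shap_interaction to False.
        if shap_interaction and not explainer_factory.supports_shap_interaction_values():
            raise ValueError(
                "shap_interaction=True is not supported by "
                f"{type(explainer_factory).__name__}; "
                "pass shap_interaction=False or use a different explainer factory"
            )
        self.explainer_factory = explainer_factory
        self.shap_interaction = shap_interaction
```

With a check like this, the unsupported combination would be rejected when the inspector is created, rather than surfacing later as a RuntimeError when feature_synergy_matrix() is called.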

import pandas as pd
from sklearn.model_selection import RepeatedKFold, GridSearchCV

# some helpful imports from sklearndf
from sklearndf.pipeline import RegressorPipelineDF
from sklearndf.regression import RandomForestRegressorDF

# relevant FACET imports
from facet.data import Sample
from facet.selection import LearnerSelector, ParameterSpace

from sklearn.datasets import load_diabetes
X,y = load_diabetes(return_X_y=True)
data = load_diabetes()

X = pd.DataFrame(X)
X.columns  = data["feature_names"]
y = pd.DataFrame(y)
y.columns = ["target"]
diabetes_df = pd.concat([X,y], axis=1)

# create FACET sample object
diabetes_sample = Sample(observations=diabetes_df, target_name="target")

# fit a non-linear model (kernel ridge regression)

from sklearn.kernel_ridge import KernelRidge
model = KernelRidge()
model.fit(X,y)

# fit the model inspector
# NOTE: KernelExplainerFactory was not imported in the original snippet;
# the module path facet.explanation is assumed here (FACET 2.x)
from facet.explanation import KernelExplainerFactory
from facet.inspection import NativeLearnerInspector
inspector = NativeLearnerInspector(
    model=model,
    explainer_factory=KernelExplainerFactory(),
    n_jobs=-3,
    shap_interaction=True,
)
inspector.fit(diabetes_sample)

# visualise synergy as a matrix
from pytools.viz.matrix import MatrixDrawer
synergy_matrix = inspector.feature_synergy_matrix()

# visualise redundancy as a matrix
redundancy_matrix = inspector.feature_redundancy_matrix()
# visualise redundancy using a dendrogram
import matplotlib
from pytools.viz.dendrogram import DendrogramDrawer
redundancy = inspector.feature_redundancy_linkage()

jckkvs added the bug (Something isn't working) label on Jan 10, 2024