How to calculate SRI for nonlinear models? #380

jckkvs · 2024-01-10T06:39:52Z

Related to #374.

Thank you. I modified your code to try it with non-linear models such as KernelRidge.

KernelRidge is not a tree-based model, so it cannot be used with TreeExplainerFactory. I therefore considered KernelExplainerFactory and ExactExplainerFactory. ExactExplainerFactory becomes impractical for larger datasets, so I chose KernelExplainerFactory(shap_interaction=True).

This raises a RuntimeError:
RuntimeError: SHAP interaction values have not been calculated. Create an inspector with parameter 'shap_interaction=True' to enable calculations involving SHAP interaction values.

Looking at your implementation, it seems that KernelExplainerFactory does not support SHAP interaction values:

facet/src/facet/explanation/_explanation.py

Line 377 in 66bea15

def supports_shap_interaction_values(self) -> bool:

facet/src/facet/inspection/_learner_inspector.py

Line 139 in 66bea15

if shap_interaction:
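To make the behaviour I am describing concrete, here is a toy model of it. This is not FACET's code: `ToyKernelFactory` and `ToyInspector` are hypothetical stand-ins, and I am only assuming that the inspector combines the requested flag with `supports_shap_interaction_values` roughly like this.

```python
class ToyKernelFactory:
    """Hypothetical stand-in for a factory without interaction support."""

    supports_shap_interaction_values = False


class ToyInspector:
    """Hypothetical stand-in for an inspector; not FACET's implementation."""

    def __init__(self, explainer_factory, shap_interaction=True):
        # Assumed behaviour: the request is silently dropped
        # when the factory cannot provide interaction values.
        self.shap_interaction = (
            shap_interaction
            and explainer_factory.supports_shap_interaction_values
        )

    def feature_synergy_matrix(self):
        # Synergy needs interaction values, so this fails later,
        # far away from the place where the flag was set.
        if not self.shap_interaction:
            raise RuntimeError(
                "SHAP interaction values have not been calculated."
            )
        return "synergy matrix"


inspector = ToyInspector(ToyKernelFactory(), shap_interaction=True)
print(inspector.shap_interaction)  # the requested True is no longer in effect

try:
    inspector.feature_synergy_matrix()
except RuntimeError as error:
    print(f"RuntimeError: {error}")
```

The point of the sketch is that the error message asks for shap_interaction=True although that is exactly what was passed.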

I have two questions.
1. For non-linear models, is it necessary to use ExactExplainerFactory before calling inspector.fit()? What should I do if the dataset is large?
2. It is confusing that shap_interaction=True is silently converted to False when KernelExplainerFactory is used. Would it be better to raise an error when shap_interaction=True is specified, or to change the API so that the shap_interaction argument cannot be specified at all in this case?
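For the second question, the fail-fast variant I have in mind could look roughly like the sketch below. All names here (`ToyFactory`, `StrictToyInspector`, `try_build`) are hypothetical and only illustrate the idea; the property name `supports_shap_interaction_values` is borrowed from the FACET source, but I am assuming how it would be used.

```python
class ToyFactory:
    """Hypothetical explainer factory with a capability flag."""

    def __init__(self, supports_interaction: bool) -> None:
        self._supports_interaction = supports_interaction

    @property
    def supports_shap_interaction_values(self) -> bool:
        return self._supports_interaction


class StrictToyInspector:
    """Hypothetical inspector that rejects unsupported requests up front."""

    def __init__(self, explainer_factory, shap_interaction: bool = True) -> None:
        if shap_interaction and not explainer_factory.supports_shap_interaction_values:
            # Fail at construction time instead of downgrading silently.
            raise ValueError(
                "shap_interaction=True is not supported by "
                f"{type(explainer_factory).__name__}"
            )
        self.shap_interaction = shap_interaction


def try_build(factory, shap_interaction: bool) -> str:
    """Return 'ok' if the inspector can be built, else the error text."""
    try:
        StrictToyInspector(factory, shap_interaction=shap_interaction)
    except ValueError as error:
        return f"error: {error}"
    return "ok"


print(try_build(ToyFactory(True), True))    # supported request
print(try_build(ToyFactory(False), True))   # unsupported request is rejected
print(try_build(ToyFactory(False), False))  # nothing requested, nothing to reject
```

With a check like this, the problem would be reported where the inspector is created rather than when the synergy matrix is requested.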

import pandas as pd
from sklearn.model_selection import RepeatedKFold, GridSearchCV

# some helpful imports from sklearndf
from sklearndf.pipeline import RegressorPipelineDF
from sklearndf.regression import RandomForestRegressorDF

# relevant FACET imports
from facet.data import Sample
from facet.selection import LearnerSelector, ParameterSpace

from sklearn.datasets import load_diabetes
X,y = load_diabetes(return_X_y=True)
data = load_diabetes()

X = pd.DataFrame(X)
X.columns  = data["feature_names"]
y = pd.DataFrame(y)
y.columns = ["target"]
diabetes_df = pd.concat([X,y], axis=1)

# create FACET sample object
diabetes_sample = Sample(observations=diabetes_df, target_name="target")

# fit a KernelRidge model (not tree-based, so TreeExplainerFactory does not apply)

from sklearn.kernel_ridge import KernelRidge
model = KernelRidge()
model.fit(X,y)

# fit the model inspector
from facet.explanation import KernelExplainerFactory
from facet.inspection import NativeLearnerInspector
inspector = NativeLearnerInspector(
    model=model,
    explainer_factory=KernelExplainerFactory(),
    n_jobs=-3,
    shap_interaction=True,
)
inspector.fit(diabetes_sample)

# compute the synergy matrix
from pytools.viz.matrix import MatrixDrawer
synergy_matrix = inspector.feature_synergy_matrix()

# compute the redundancy matrix
redundancy_matrix = inspector.feature_redundancy_matrix()
# compute the redundancy linkage for a dendrogram
import matplotlib
from pytools.viz.dendrogram import DendrogramDrawer
redundancy = inspector.feature_redundancy_linkage()

jckkvs added the "bug" label on Jan 10, 2024
