Allow shap to take list or dict as input #3572

Open
1 of 2 tasks
JonathanBhimani-Burrows opened this issue Mar 14, 2024 · 3 comments
Labels
awaiting feedback (Indicates that further information is required from the issue creator), enhancement (Indicates new feature requests)

Comments

@JonathanBhimani-Burrows

Problem Description

I'm currently working on a PyTorch model that takes a dict of tensors as input. The thing is, each tensor has a wildly different shape, so using a single NumPy array as the input won't work.
Allowing shap to take a list or a dict as input would be very useful, since inputs of different shapes would then be easy to manage.

Alternative Solutions

Given that this might take some time to implement, is there a usable workaround that doesn't require creating a dataframe?
In theory, I could create a dataframe where feature A from the input dict gets n columns based on its shape. In the wrapper function, I could then take the dataframe, regroup the appropriate columns into a dict, and run inference from there. But this gets very laborious, especially since we need to pass in multiple samples at once.
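The column-regrouping workaround described above can be sketched without pandas: record each key's per-sample shape and column range once, flatten the dict into one 2D matrix, and unflatten it again inside the wrapper. Multiple samples are handled automatically because every array keeps its leading batch dimension. This is a minimal sketch, assuming NumPy arrays with a leading batch dimension and a model callable that accepts a dict (torch tensors would need converting inside the wrapper); `make_layout`, `flatten`, `unflatten`, `wrap_dict_model` and `toy_model` are hypothetical names, not part of shap.

```python
import numpy as np

# Hypothetical helpers (not part of shap) for the dict -> 2D matrix workaround.

def make_layout(example):
    """Map each dict key to (per-sample shape, column slice) in the flat matrix.

    `example` maps key -> array of shape (n_samples, ...)."""
    layout, start = {}, 0
    for key, arr in example.items():
        shape = arr.shape[1:]                  # per-sample shape of this input
        size = int(np.prod(shape, dtype=int))  # number of columns it occupies
        layout[key] = (shape, slice(start, start + size))
        start += size
    return layout

def flatten(batch, layout):
    """Dict of (n, ...) arrays -> one (n, total_columns) matrix."""
    n = len(next(iter(batch.values())))
    return np.concatenate([np.asarray(batch[k]).reshape(n, -1) for k in layout], axis=1)

def unflatten(X, layout):
    """Inverse of flatten: (n, total_columns) matrix -> dict of (n, ...) arrays."""
    n = X.shape[0]
    return {k: X[:, sl].reshape((n,) + shape) for k, (shape, sl) in layout.items()}

def wrap_dict_model(model, layout):
    """Turn a dict-input callable into one taking a 2D matrix, the input type
    a function-based explainer such as shap.KernelExplainer works with."""
    def f(X):
        return np.asarray(model(unflatten(np.asarray(X), layout)))
    return f

# Hypothetical dict-input "model" with one output per sample.
def toy_model(inputs):
    return inputs["image"].sum(axis=(1, 2)) + inputs["tabular"].mean(axis=1)

rng = np.random.default_rng(0)
batch = {"image": rng.normal(size=(4, 3, 5)), "tabular": rng.normal(size=(4, 7))}
layout = make_layout(batch)
X = flatten(batch, layout)                # shape (4, 22): 3*5 + 7 columns
f = wrap_dict_model(toy_model, layout)
assert np.allclose(f(X), toy_model(batch))
# Illustrative only: explainer = shap.KernelExplainer(f, X[:2]); explainer.shap_values(X)
```

The layout is computed once from a single example batch, so the per-batch wrapper only slices and reshapes.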

Additional Context

No response

Feature request checklist

  • I have checked the issue tracker for duplicate issues.
  • I'd be interested in making a PR to implement this feature
@JonathanBhimani-Burrows JonathanBhimani-Burrows added the enhancement label Mar 14, 2024
@CloseChoice
Collaborator

CloseChoice commented Mar 17, 2024

Thanks @JonathanBhimani-Burrows for opening the issue. We already support passing lists when the model takes multiple inputs (but only for the deep, kernel and gradient explainers). See one of our tests, where we explicitly test this feature: https://github.com/shap/shap/blob/master/tests/explainers/test_deep.py#L748-L749.

Edit: There are currently no plans to support dictionaries here, so you have to refer to the different explanations by list index. The reason is that we believe the shap explainer should be callable with exactly the same inputs as the model.
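Referring to explanations by list index mostly comes down to fixing one key order and using it in both directions. A minimal sketch of an adapter that exposes a dict-input model as a multi-input model, assuming the explainer passes the list elements to the model as positional arguments; `DictToPositional` and `toy_model` are hypothetical names, and for PyTorch's DeepExplainer the same logic would presumably have to live in the `forward` method of an `nn.Module` wrapper rather than in a plain callable.

```python
import numpy as np

class DictToPositional:
    """Hypothetical adapter (not part of shap): exposes a dict-input model as a
    model taking one positional input per dict entry, in a fixed key order.
    The key order is what maps list-indexed results back to dict keys."""

    def __init__(self, model, keys):
        self.model = model
        self.keys = list(keys)

    def to_list(self, inputs):
        """Dict -> list in self.keys order (for background and test data)."""
        return [inputs[k] for k in self.keys]

    def __call__(self, *tensors):
        """Rebuild the dict from positional inputs and call the wrapped model."""
        return self.model(dict(zip(self.keys, tensors)))

    def to_dict(self, per_input_values):
        """List of per-input results (e.g. attributions) -> dict keyed by name."""
        return dict(zip(self.keys, per_input_values))

# Hypothetical dict-input model.
def toy_model(inputs):
    return inputs["text"].sum(axis=1) * inputs["scale"][:, 0]

batch = {"text": np.ones((2, 4)), "scale": np.full((2, 1), 3.0)}
adapter = DictToPositional(toy_model, keys=["text", "scale"])
as_list = adapter.to_list(batch)
assert np.allclose(adapter(*as_list), toy_model(batch))  # same predictions
```

Fixing the key order explicitly, rather than relying on whatever order a dict happens to have, keeps index 0, 1, ... stable between the inputs you pass in and the per-input explanations you get back.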

@CloseChoice CloseChoice added the awaiting feedback label Mar 17, 2024
@JonathanBhimani-Burrows
Author

JonathanBhimani-Burrows commented Mar 18, 2024

Thanks for the reply.
The problem is that many models in 2024 are multimodal, and they're getting bigger and more complicated. In those circumstances, most researchers will use dicts or similar structures and pass them directly into the forward method, especially if they're using newer packages like Hugging Face. This is a significant change from how things were done in ML 3-4 years ago.
I admit there are ways to solve this, but they involve quite a bit of fiddling and rewriting code just to make things run, which wouldn't be necessary if the packages accepted dicts as is.
Perhaps you guys could put together a tutorial on what needs to be modified for the shap code to accept different data structures. This was done for simpler situations, where all you need is a wrapper function (see https://shap.readthedocs.io/en/latest/example_notebooks/tabular_examples/neural_networks/Census%20income%20classification%20with%20Keras.html), but if a model explicitly takes dicts in its forward call, quite a bit of modification to the code is necessary to make it run.
Thoughts?
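One piece of the fiddling that comes with the column-flattening workaround described in this issue is mapping per-column attributions back to the original dict keys. A minimal sketch, assuming a single model output so the explainer returns an array of shape `(n_samples, n_columns)`; `group_attributions` is a hypothetical helper, not part of shap. Summing keeps each row's total unchanged, but it is a summary of individual column attributions, not the Shapley value of a key treated as a single feature.

```python
import numpy as np

def group_attributions(values, groups):
    """Hypothetical helper (not part of shap): sum per-column attributions
    into one value per dict key.

    values: array (n_samples, n_columns) of per-column attributions,
            single model output assumed.
    groups: mapping key -> slice or index list of that key's columns.
    Returns: dict key -> array of shape (n_samples,)."""
    values = np.asarray(values)
    return {k: values[:, cols].sum(axis=1) for k, cols in groups.items()}

vals = np.array([[0.25, 0.5, -0.75, 0.5],
                 [0.0,  1.0,  1.0, -2.0]])
groups = {"image": slice(0, 3), "tabular": slice(3, 4)}
per_key = group_attributions(vals, groups)
assert np.allclose(per_key["image"], [0.0, 2.0])
assert np.allclose(per_key["tabular"], [0.5, -2.0])
# Row totals are preserved by the grouping.
assert np.allclose(per_key["image"] + per_key["tabular"], vals.sum(axis=1))
```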

@CloseChoice
Collaborator

CloseChoice commented Mar 25, 2024

Would it be possible for you to give a couple of examples so that we can find common patterns?
It would also be important for us to know which use cases you are interested in. I assume you are talking about DL and large language models and not boosted trees, but a clarification on this would also be helpful.
