Causal and asymmetric Shapley values implementation #273

Draft · wants to merge 17 commits into master

Conversation

@igbucur commented Aug 25, 2021

This branch contains an implementation of causal and asymmetric Shapley values, based on the supplementary code for the paper [1]. The code is adapted from the CauSHAPley package (https://gitlab.science.ru.nl/gbucur/caushapley/).

Asymmetric Shapley values were proposed in [2] as a way to incorporate real-world causal knowledge by restricting the feature permutations considered when computing the Shapley values to those consistent with a (partial) causal ordering.
Causal Shapley values were proposed in [1] as a way to explain the total effect of features on the prediction, taking into account their causal relationships, by adapting the sampling procedure in shapr.
The two ideas can be combined to obtain asymmetric causal Shapley values. For more details, see [1].
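To make the permutation restriction concrete, here is a small language-agnostic sketch (Python for brevity; the package itself is R, and the function name here is purely illustrative). With three features and the partial ordering ({1}, {2, 3}), meaning feature 1 causally precedes features 2 and 3, only the orderings that place feature 1 first survive:

```python
from itertools import permutations

def consistent_with_ordering(perm, ordering):
    """Check that a feature permutation respects a partial causal
    ordering, given as a list of groups of feature indices: every
    feature in an earlier group must appear before every feature
    in a later group."""
    # Map each feature to the index of its causal group.
    group_of = {f: g for g, group in enumerate(ordering) for f in group}
    ranks = [group_of[f] for f in perm]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))

ordering = [[1], [2, 3]]  # feature 1 precedes features 2 and 3
valid = [p for p in permutations([1, 2, 3])
         if consistent_with_ordering(p, ordering)]
print(valid)  # [(1, 2, 3), (1, 3, 2)]
```

Of the six permutations of three features, only two respect this ordering, so features 2 and 3 never receive credit "on behalf of" their causal ancestor.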

The branch adds the following functions for computing causal Shapley values:

  • sample_causal in sampling.R
  • explain.causal in explanation.R
  • prepare_data.causal in observations.R

The branch adds the following functionality for computing asymmetric Shapley values:

  • additional branches for feature_combinations and feature_exact in features.R
  • respects_order in utils.R
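A hypothetical sketch of the respects_order logic (Python for brevity; the real helper lives in utils.R and its exact interface may differ): a coalition is consistent with a (partial) causal ordering if, whenever it contains a feature from some group, it also contains every feature from all strictly earlier groups.

```python
def respects_order(coalition, ordering):
    """Return True if the coalition (a set of feature indices)
    contains all causal ancestors of each of its members, where
    'ordering' is a list of groups of feature indices, causally
    first to last."""
    ancestors = set()
    for group in ordering:
        # A feature from this group may only appear if all features
        # from earlier groups are present as well.
        if any(f in coalition for f in group):
            if not ancestors <= set(coalition):
                return False
        ancestors |= set(group)
    return True

ordering = [[1], [2, 3]]
print(respects_order({1, 2}, ordering))  # True: ancestor 1 is included
print(respects_order({2}, ordering))    # False: 2 appears without 1
```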

Finally, the function shapr gets two new parameters:

  • asymmetric : Logical flag specifying whether we want to compute asymmetric Shapley values.
  • causal_ordering : List of vectors specifying (partial) causal ordering.

These parameters are saved in the explainer object returned by shapr, which is why the known objects in the test suite have been updated. The branch also adds a number of basic tests for the new functionality.

References:
[1] Heskes, T., Sijben, E., Bucur, I. G., & Claassen, T. (2020). Causal Shapley Values: Exploiting Causal Knowledge to Explain Individual Predictions of Complex Models. Advances in Neural Information Processing Systems, 33.
[2] Frye, C., Rowat, C., & Feige, I. (2020). Asymmetric Shapley values: incorporating causal knowledge into model-agnostic explainability. Advances in Neural Information Processing Systems, 33.

@igbucur igbucur marked this pull request as draft August 25, 2021 15:47
@martinju (Member)

Thank you, @igbucur, for taking the time to prepare this PR!

I have looked at the code and iterated through the most important parts of it with some example data. I have some minor comments, but it does indeed seem to work well 👍

Before we start discussing details, I have a broader question/comment:

You have added the causal method as a new approach, which applies the practical implementation of Theorem 1 in your paper while assuming a Gaussian distribution for the data. Please correct me if I am wrong, but I don't see any reason the method should be restricted to the Gaussian distribution. We have implemented a series of other approaches for estimating the conditional distributions, and it would be great if the user could combine the causal method with any of these. Allowing that will require some changes in the main package, but from what I understand, it can be carried out by figuring out which conditional distributions need to be estimated, and in what order, and then simply looping over the different chain components, adding new sampled columns iteratively, similarly to how you did it with the Gaussian method.
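The looping idea might be sketched roughly as follows (an illustrative Python sketch under simplifying assumptions; the names are hypothetical, and sample_conditional stands in for any of the package's approach-specific samplers). It conditions each component on everything already observed or sampled in this and earlier components; handling confounding within a component, as discussed in [1], would instead marginalise out some of those variables.

```python
def sample_out_of_coalition(x_given, coalition, ordering, sample_conditional):
    """Iteratively sample the out-of-coalition features, one causal
    chain component at a time.  'x_given' maps feature index -> value
    for the in-coalition features; 'sample_conditional(targets, given)'
    is a stand-in conditional sampler returning a dict of sampled
    values for 'targets' given the dict 'given'."""
    known = dict(x_given)   # grows as components are sampled
    seen = set()
    for group in ordering:  # causally first component first
        seen |= set(group)
        targets = [f for f in group if f not in coalition]
        if targets:
            # Condition only on features from this and earlier
            # components that are already observed or sampled.
            given = {f: v for f, v in known.items() if f in seen}
            known.update(sample_conditional(targets, given))
    return {f: v for f, v in known.items() if f not in coalition}

# Toy sampler: each missing feature is drawn as the mean of what is
# already known (purely illustrative, not a real conditional sampler).
def toy_sampler(targets, given):
    m = sum(given.values()) / len(given)
    return {t: m for t in targets}

out = sample_out_of_coalition({1: 2.0}, {1}, [[1], [2, 3]], toy_sampler)
print(out)  # {2: 2.0, 3: 2.0}
```

With a single chain component containing all features, the loop runs once over the full conditioning set, reducing to the existing symmetric behaviour.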

What do you think @igbucur ? Did I miss any details regarding this possibility? If you think it is doable, I could assist you in making the appropriate modifications in the core package.

@igbucur (Author) commented Aug 26, 2021

Thank you for the feedback, @martinju. Yes, there should be no reason for the approach to be limited to the Gaussian distribution, and we could in principle use any of the approaches for estimating the conditional distributions when computing the causal Shapley values.

I think it would be doable. Perhaps it would then be better to have a causal flag, used in a similar way to asymmetric, instead of treating it as a separate approach in explain. The other approaches for which to implement causal Shapley values would then be "empirical", "copula", "ctree", and "independence"?

@martinju (Member)

Sounds good! Yes, I am thinking that whenever causal_ordering is not NULL, the causal ordering is respected with the method specified under approach (gaussian, copula, ctree, empirical or independence). We have to think a bit about the best way of implementing this. Ongoing work on implementing a "batch mode", allowing just parts of the subsets to be handled simultaneously (see #244), may also affect this a bit.

In any case, I believe the best starting point would be to create a function which "figures out" which conditional distributions need to be computed, based on the S-matrix (or X-matrix) created in the shapr function plus the definition in causal_ordering, and stores that in some list or data.table which could ultimately be used by any of the approaches in explain/prepare_data. I believe that would essentially consist of the relevant parts of the code in prepare_data.causal. Then I could figure out how to best use that object in a universal way within explain/prepare_data in the next stage. What do you think?

@igbucur (Author) commented Aug 27, 2021

Thanks for the tip. Yeah, I think this makes sense, but I'll have to give some more thought to how to implement it.

@martinju martinju added this to In progress in Towards shapr 1.0.0 Oct 4, 2021
@martinju (Member) commented Oct 5, 2021

@igbucur Are you currently working on this? If so, let me know if you want to chat about how to go about it!

@igbucur (Author) commented Oct 5, 2021

@martinju Yes, I think I'm ready to give it a go. I had a look at how to tackle the proposed extension and here are my thoughts:

  1. For the causal part, the solution is as you hinted before. I think the function you suggested would have to replace the lapply call in each of the prepare_data functions. The function would take the causal ordering as input and sample each causal chain component separately, which means it would call the available sampling functions multiple times (these would not have to be changed). In the default case, there is a single chain component containing all variables, so the new function would call the sampling functions just once for all the variables and all possible conditioning sets, as is done at the moment.

  2. For the asymmetric part, it should be even simpler. The features that do not follow the causal ordering have to be removed if asymmetric == TRUE. We could do that by passing an appropriate index_features argument to the prepare_data functions. This could probably be done in the explain function.
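Point 2 amounts to keeping only the rows of the coalition matrix that are consistent with the ordering. An illustrative Python sketch (the actual index_features handling in shapr is R, and this helper name is hypothetical), using shapr's 0/1 S-matrix representation with one row per coalition and one column per feature:

```python
def respects_order_rows(S, ordering):
    """Return indices of the rows of the 0/1 coalition matrix S that
    are consistent with the partial causal ordering: a row may switch
    on a feature only if all features from earlier groups are on."""
    keep = []
    for i, row in enumerate(S):
        earlier, ok = [], True
        for group in ordering:
            in_group = any(row[f - 1] for f in group)
            if in_group and not all(row[f - 1] for f in earlier):
                ok = False
                break
            earlier += group
        if ok:
            keep.append(i)
    return keep

S = [[0, 0], [1, 0], [0, 1], [1, 1]]        # all coalitions of 2 features
print(respects_order_rows(S, [[1], [2]]))   # [0, 1, 3]: drops {2} alone
```

The resulting index vector could then be passed as index_features so that the invalid conditioning sets are never estimated.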

What do you think? Does this approach seem reasonable?

@martinju (Member) commented Oct 6, 2021

Sounds good!

I think that what we really need are the two functions, say A(S, j) and B(S, j), which give p(X_Sbar | X_S) = \prod_j p(X_{A(S,j)} | X_{B(S,j)}) for the specified causal ordering. I believe it would be best to compute these within the shapr function and store them in some object there, which is then used in explain/prepare_data by iteratively updating the data matrix to perform prediction on.
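One way such a function could look (an illustrative Python sketch with a hypothetical name; this is the no-confounding case, where B(S, j) also includes the in-coalition features of component j, whereas the confounded case in [1] would drop those):

```python
def conditioning_sets(S, ordering):
    """For a coalition S and a causal ordering (list of chain
    components, causally first to last), return for each component j
    with out-of-coalition features the pair (A, B) such that
    p(X_Sbar | X_S) factorises as prod_j p(X_A | X_B).
    A(S, j): out-of-coalition features of component j;
    B(S, j): all features of earlier components plus the in-coalition
    features of component j (no-confounding case)."""
    S = set(S)
    sets, earlier = [], set()
    for group in ordering:
        A = set(group) - S
        B = earlier | (set(group) & S)
        if A:
            sets.append((sorted(A), sorted(B)))
        earlier |= set(group)
    return sets

print(conditioning_sets({1}, [[1], [2, 3]]))
# [([2, 3], [1])]: sample X2, X3 jointly given X1
print(conditioning_sets({2}, [[1], [2, 3]]))
# [([1], []), ([3], [1, 2])]: sample X1 marginally, then X3 given X1, X2
```

Stored per row of the S-matrix, such (A, B) pairs would tell any approach exactly which conditional samplers to call and in what order.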

In prepare_data this could be achieved either by replacing the lapply call, as you write, or by modifying the sampling functions to actually do iterative sampling. I am not sure which approach is preferable at the moment.

Note that the empirical (+ independence) methods are not constructed as an lapply around sampling functions, and ctree also requires an initial model-fitting procedure.

My main point is that I think the construction of the "routine" needed for the specific iterative sampling should be created already in the shapr function :-)

@igbucur (Author) commented Oct 6, 2021

Okay, I will think about how it could be done in the shapr function. The difficulty here, I think, is that instead of calling the sampling functions just once for each feature, you have to call them multiple times per feature, that is, once per chain component.

I was thinking about encapsulating the causal ordering functionality either in a new custom lapply or somewhere upstream (perhaps in shapr, like you suggested), in order to avoid having to reimplement this factorization every time a new sampling function is added. This way the sampling functions can stay the same, while the decision on which conditional probabilities need to be estimated and multiplied is made upstream. It might also be an idea to design a function that takes the features and the causal ordering as input and splits the features according to the different components. Perhaps this is something that could be done in shapr.

@martinju (Member) commented Oct 6, 2021

Maybe I was unclear, but the function you talk about, taking features and causal ordering as inputs, is exactly what I was thinking about putting in the shapr function. :-)

@martinju (Member) commented Oct 6, 2021

And let me know if you want me to put together a function like that in shapr. It should be rather straightforward, I think.

@LHBO mentioned this pull request May 23, 2024