Linear gaussian explainer #372

martinju · 2024-01-29T08:04:46Z

Adds two user-facing functions: explain_lingauss() and explain_lingauss_precomputed().
These allow fast computation of Shapley values for purely linear models (i.e. no interactions, quadratic terms, etc.) under the assumption that the features follow a Gaussian distribution.
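
As a reference point for what is being computed, here is a minimal brute-force sketch in Python (not the package's R code, and not the Tmu/Tx formulation from the paper). It assumes a linear model f(x) = beta0 + beta·x and features X ~ N(mu, Sigma), so that v(S) = E[f(X) | X_S = x_S] follows from the standard Gaussian conditional mean. All names (`solve`, `v`, `shapley_exact`) are made up for this illustration.

```python
from itertools import permutations


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def v(S, x, beta0, beta, mu, Sigma):
    """v(S) = E[f(X) | X_S = x_S] for a linear f and Gaussian X."""
    S = sorted(S)
    Sbar = [j for j in range(len(x)) if j not in S]
    cond = {j: x[j] for j in S}          # observed features keep their values
    if S:
        # w = Sigma_SS^{-1} (x_S - mu_S)
        w = solve([[Sigma[i][j] for j in S] for i in S],
                  [x[j] - mu[j] for j in S])
    for j in Sbar:
        # E[X_j | x_S] = mu_j + Sigma_{j,S} Sigma_SS^{-1} (x_S - mu_S)
        cond[j] = mu[j] + (sum(Sigma[j][s] * w[k] for k, s in enumerate(S)) if S else 0.0)
    return beta0 + sum(beta[j] * cond[j] for j in range(len(x)))


def shapley_exact(x, beta0, beta, mu, Sigma):
    """Exact Shapley values by averaging marginal contributions over all permutations."""
    d = len(x)
    phi = [0.0] * d
    perms = list(permutations(range(d)))
    for perm in perms:
        S = []
        prev = v(S, x, beta0, beta, mu, Sigma)
        for j in perm:
            S.append(j)
            cur = v(S, x, beta0, beta, mu, Sigma)
            phi[j] += cur - prev
            prev = cur
    return [p / len(perms) for p in phi]
```

With independent features (diagonal Sigma) this reduces to phi_j = beta_j (x_j - mu_j). Enumerating all d! permutations is only feasible for a handful of features, which is why a sampling-based estimate is needed in practice.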

The implementation is based on Sec. 2 of https://arxiv.org/pdf/2006.16234.pdf, but with somewhat simplified formulae for Tmu and Tx that avoid computing the Q-matrix, since it always takes the same form.
The permutation-based Shapley estimation approach is used here instead of the kernelSHAP estimation approach used elsewhere in the package. Another PR will make that universally available.
Paired sampling is always applied (there is currently no option to disable it).
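
To illustrate the permutation-based estimator with paired sampling, here is a minimal Python sketch. It assumes that "paired" means each sampled permutation is evaluated together with its reverse, which is one common antithetic scheme; whether this PR pairs permutations in exactly this way is an assumption. The function name `shapley_permutation` and the generic `value` callback are hypothetical and not part of the package.

```python
import random


def shapley_permutation(value, d, n_pairs, seed=0):
    """Estimate Shapley values from sampled permutations.

    value:   function mapping a frozenset of feature indices S to v(S)
    d:       number of features
    n_pairs: number of sampled permutations; each is used with its reverse
    """
    rng = random.Random(seed)
    phi = [0.0] * d
    for _ in range(n_pairs):
        perm = list(range(d))
        rng.shuffle(perm)
        # Paired sampling (assumed form): the permutation and its reverse.
        for order in (perm, perm[::-1]):
            S = frozenset()
            prev = value(S)
            for j in order:
                S = S | {j}
                cur = value(S)
                phi[j] += cur - prev   # marginal contribution of j in this order
                prev = cur
    return [p / (2 * n_pairs) for p in phi]
```

Each permutation distributes v(all) - v(empty) across the features, so the estimates always sum to that difference. Under this pairing, a value function with only main effects and pairwise interactions is recovered exactly from a single pair, because every interaction is credited once to each of its two members.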

TODO

Implement the permutation sampling in Rcpp.
Implement the looping over Tmu/Tmx in Rcpp.
Add MSE computation? We don't have v(S) directly computed, and probably don't want to compute it either, but can we simplify the MSE computation in this case under the assumption that the model is linear, without assuming the features are Gaussian (in practice)? I think that might be possible; look at the formulas to verify this. Then we can decide whether it is worth implementing.
Implement grouping. I guess the best way to do this is to sample group permutations first, and then translate these to the appropriate
Update the vignette with an example of how to use the method.
Add examples
Improve documentation


martinju added 30 commits December 6, 2023 21:02

initial stuff on permutation alternative

7f67a02

no antitetic sampling yet

brute force adding of paried sampling to permutation approach

f43765f

paired sampling also for kernel

2626e72

+ more scripting

starting to set up the linear_gaussian explainer here

3c53992

Merge remote-tracking branch 'origin/master' into permute

b9bc75e

more work on linear_gaussian explainer

126941c

.

37bcb31

doc

f2ccea2

bugfix

abc1147

working permute version

211121c

script for testing pure permuting

e7e8229

force paired sampling and test linear_gaussian_model

b595337

starting to setup the mapping function, just initials

0338ce3

more work on linear explainer

0f39bd7

complete, but not correct results so far

6b05f70

Issue

22445c7

The issue seems to be incorrect weighting of the different S's. I should try to loop through the permutations within the loop instead, extract the relevant S, to then do the computation. Just to see how they are all weighted.

finally it works

1d6bb94

then need to find the weighting per row in S, to then make it more efficient

more output

67144e3

starting to build up separate X_from_perm_dt_linear_gaussian function

4433714

more on separate X mapper for linear gaussian

c2661ed

removing the Q computation to simplify + revert X_for_lin_mod

db68a52

Will simplify it all creating a function which computes Udiffs for a list of perms (perm_dt) instead of pre-computing stuff and extracting them.

starting to write up direct Ucomputation function

ac7c5b0

implemented working direct approach

323fd6e

move to perm_list for lineargaussian

aca5fe2

alter GHA to run only when "ready for review"

293a30b

remove uncessary arguments

cf59d11

Merge remote-tracking branch 'origin/master' into permute

637f4b5

implement faster permutation sampling with the ranking/unraking approach

bcddf24

remove anything not directly needed by linear_gaussian

003489e

man + other fixes

f4147ee

martinju added 11 commits January 25, 2024 14:50

clean up + new function for post-computing explanations

d51df10

clears out NULL parameters from the parameters list

56a9466

revert the clear NULL paramter stuff

a8ec7af

update rds-files

d3a4551

styler

9354a2d

man

ab7a460

lintr, tests and messages

99c5b36

new test objects

0b01ed9

styler

370552b

.

9e7dbc9

cleanup

c9401f0
