
WIP: Fix tf lstm #3419

Open · wants to merge 14 commits into master
Conversation

@CloseChoice (Collaborator) commented Dec 2, 2023

Overview

Closes #3344, #3343

Description of the changes proposed in this pull request:
The problem is that the TensorFlow ops were not captured in eager mode, so self._init_between_tensors was not executed, which resulted in an error because self.in_between_ops was not set. This is fixed by introducing an _init_between_tensors_eager method that executes the graph once for the data input and captures all ops that are called. From there on, the flow continues as before. Since TensorFlow 2 seems to introduce a couple more ops, these also need to be added to the op_handlers.
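
For illustration, here is a minimal sketch of the op-capture idea behind _init_between_tensors_eager (this is not the PR's actual code; collect_op_types is a hypothetical helper): trace the model once on the input data and read the executed ops off the traced graph.

    import tensorflow as tf

    def collect_op_types(model, data):
        """Trace `model` on `data` once and return the op types seen in the traced graph."""
        concrete = tf.function(model).get_concrete_function(tf.convert_to_tensor(data))
        return {op.type for op in concrete.graph.get_operations()}

Any op type that shows up here but has no entry in op_handlers would then need a handler (or an explicit pass-through) registered.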

Checklist

I would suggest the following steps in the given order:

  1. The failing test test_tf_keras_lstm2 might be an indication that one of the recently added ops shouldn't be passed through. Understand what is going on here and how we can do this better.
  2. Check in the paper whether additivity is preserved with this approach (I expect it should be).
  3. Get more test cases to make sure the issue is correctly fixed (at least 5 different tests; search previous issues to find some and include feedback from the community). Inspired by this comment, we should at least test RNN, GRU and LSTM layers (see the test sketch after this list).
  4. Clean up this PR and make it ready for review. Decide on the tests that we really want to include in the test suite on a permanent basis.
  5. Check whether this fixes "The SHAP explanations do not sum up to the model's output!" #2765
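
For item 3, a sketch of what a parametrized additivity test could look like (the shapes, layer sizes and parametrization here are assumptions, not the tests that will land in the suite):

    import numpy as np
    import pytest
    import tensorflow as tf

    import shap


    @pytest.mark.parametrize(
        "layer_cls",
        [tf.keras.layers.SimpleRNN, tf.keras.layers.GRU, tf.keras.layers.LSTM],
    )
    def test_recurrent_layer_additivity(layer_cls):
        # Tiny sequence model: (batch, timesteps, features) -> scalar output.
        rng = np.random.default_rng(0)
        X = rng.normal(size=(20, 5, 3)).astype(np.float32)

        model = tf.keras.Sequential([
            tf.keras.Input(shape=(5, 3)),
            layer_cls(4),
            tf.keras.layers.Dense(1),
        ])

        background, samples = X[:10], X[10:]
        explainer = shap.DeepExplainer(model, background)
        # check_additivity asserts that the shap values sum to
        # model(samples) - E[model(background)].
        explainer.shap_values(samples, check_additivity=True)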

Later:

  • All pre-commit checks pass.
  • Unit tests added (if fixing a bug or adding a new feature)

Note: Feel free to review and also post previously failing examples here.

@CloseChoice (Collaborator, Author) commented Dec 7, 2023

@connortann: I would like to ask for your advice. This is a pretty long-standing issue, and I am currently able to evaluate SHAP values for LSTM, GRU and SimpleRNN layers as long as the neural net is linear (output of layer t is input of layer t+1). But there is a more general case, where e.g. multiple inputs can be concatenated, which I currently struggle with. I am confident I can fix this in the upcoming month. Should I prepare a PR where the linear NNs are working and create a separate one for nonlinear NNs? What are your thoughts on that?
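
To make the distinction concrete, a small illustration of the two cases (these toy models are made up for this comment, not taken from the PR's tests):

    import tensorflow as tf

    # Case 1: "linear" -- the output of layer t is the only input of layer t+1.
    linear_model = tf.keras.Sequential([
        tf.keras.Input(shape=(10, 4)),
        tf.keras.layers.LSTM(8),
        tf.keras.layers.Dense(1),
    ])

    # Case 2: more general topology -- a sequence branch and an auxiliary input
    # are merged with Concatenate, which is the case that still causes trouble.
    seq_in = tf.keras.Input(shape=(10, 4))
    aux_in = tf.keras.Input(shape=(3,))
    h = tf.keras.layers.LSTM(8)(seq_in)
    merged = tf.keras.layers.Concatenate()([h, aux_in])
    out = tf.keras.layers.Dense(1)(merged)
    multi_input_model = tf.keras.Model([seq_in, aux_in], out)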

@CloseChoice marked this pull request as ready for review on December 11, 2023 at 16:28
@connortann (Collaborator) commented:

Thanks for taking on this issue, it would be great to get those issues addressed.

In terms of how to arrange the PRs, I don't have anything specific to say about this PR, only general advice: it's helpful to break PRs down into small incremental changes wherever possible. It's up to your judgement how best to split any large PRs.

Your emphasis on adding test cases looks very prudent. This seems like it will be critical, given that there is a huge space of possible models that DeepExplainer has to be able to explain.

@CloseChoice (Collaborator, Author) commented:

The errors seem to happen mainly in two tests, and whether the SHAP values sum up as expected depends heavily on the input. I am not sure if I should mark these tests as xfail for now and create a separate issue. The thing is that we have a lot of issues where people complain about the DeepExplainer outputs not summing up as expected (not passing assert_additivity), and this PR is still a clear improvement over the status quo even without fixing the underlying assert_additivity issue (which I suspect might be a mixture of rounding issues and some wrong assignments in the op handlers). Any feedback on this is very welcome.

@raghavchalapathy commented:

Hi @CloseChoice @connortann, I am following this issue very closely and am happy to test the releases or fixes. Kindly let me know if you need any support from my side; I am happy to contribute to fixing this issue.

@CloseChoice (Collaborator, Author) commented Dec 12, 2023

Thanks for the reply. Your feedback is most welcome. If you could test this branch on your data and tell us whether the results make sense or whether there are strong inconsistencies, that would be amazing.

@raghavchalapathy commented:

Hi @CloseChoice

Sure, happy to do so.

Just double-checking that I am good to run this setup:

this experiment --> https://www.kaggle.com/code/billlucas/explaining-cnn-lstm-using-shap/notebook
with this branch code --> Fix-tf-lstm (Branch)
with Python version 3.11 and the latest TensorFlow version

Is my understanding correct? If there are any specific versions I need to use, please point me to the URL where I can find these details.
Thank you

@CloseChoice (Collaborator, Author) commented Dec 13, 2023

Thanks a lot for helping with this, I really appreciate it. That looks good. With this fix we should be able to run any TensorFlow version above 2.4 as well, so feel free to use the latest version (our test suite tests against the latest either way). I did not test all the layers I can see in that example, so I am pretty curious about the results.

@ANeeK181 commented:

Hi @CloseChoice

I am using it for some LSTM code where I couldn't compute SHAP values before. The code now runs, but I am getting gradients of zero. Looking into it, I found that in the function phi_symbolic (line 365) and then in the function grad_graph (line 352), x_grad is zero. I see you are still calling self._init_between_tensors(out.op, shap_rAnD) instead of self._init_between_tensors_eager(out.op, shap_rAnD, data..?). Maybe that is making the gradient go to zero?

@ANeeK181 commented:

I also tried

    x_grad = tape.gradient(
        out,
        shap_rAnD,
        unconnected_gradients=tf.UnconnectedGradients.NONE,
    )

to check whether there is an issue in the network, but the gradient is still zero, which means the gradients really are going to zero.
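
As a side note on that check (general TensorFlow behaviour, not specific to this branch): with tf.UnconnectedGradients.NONE a disconnected gradient comes back as None, so getting an actual zero tensor means the path exists but the gradient vanishes. A minimal example:

    import tensorflow as tf

    x = tf.Variable([1.0, 2.0])
    unrelated = tf.constant([3.0])
    with tf.GradientTape() as tape:
        y = tf.reduce_sum(unrelated)  # y does not depend on x
    # NONE (the default) yields None for disconnected paths; ZERO would yield zeros.
    print(tape.gradient(y, x, unconnected_gradients=tf.UnconnectedGradients.NONE))  # None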

@CloseChoice (Collaborator, Author) commented:

Thanks for looking into this. I also found that all SHAP values are zero (due to the gradients being zero, as you found out). The problem here is that TensorFlow hides the exact ops from us and just exposes a PartitionedCall, but we need to get the exact ops that are called. I'll try to find a workaround for this.
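
For anyone following along, a small sketch of how to see this (find_opaque_call_ops is a made-up helper, not part of the PR): tracing the model and listing the graph ops typically shows a single (Stateful)PartitionedCall wrapping the recurrent layer instead of the individual ops inside it.

    import tensorflow as tf

    def find_opaque_call_ops(model, data):
        """Return the (Stateful)PartitionedCall ops that hide a layer's inner ops."""
        concrete = tf.function(model).get_concrete_function(tf.convert_to_tensor(data))
        return [
            op for op in concrete.graph.get_operations()
            if op.type in ("PartitionedCall", "StatefulPartitionedCall")
        ]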

@ANeeK181 commented:

What about the call to self._init_between_tensors(out.op, shap_rAnD) in the grad_graph function? I changed it to self._init_between_tensors_eager(out.op, shap_rAnD, data..?) but then got other errors. Maybe if that were fixed, the gradients would be correct.

@CloseChoice (Collaborator, Author) commented:

No, that won't help, since _init_between_tensors_eager also just catches the PartitionedCall; we need to get at the ops underlying that call.

@zlds123 commented Apr 17, 2024

@CloseChoice Do we have a status update on this? I would really appreciate it, because my model uses some TF2 features and requires the SHAP LSTM explainer.

@CloseChoice (Collaborator, Author) commented:

Hey, thanks for keeping your eyes on this. I have not worked on this for quite a while. The problems lie somewhere deep in TensorFlow, which hides its internal ops from us. I can't give any timeframe on this.

Successfully merging this pull request may close these issues:

  • SHAP not working with LSTM!
  • The SHAP explanations do not sum up to the model's output!