
LRP for semantic segmentation #118

Closed

gchlebus opened this issue Nov 1, 2018 · 20 comments
@gchlebus

gchlebus commented Nov 1, 2018

Hi,

Thank you very much for this nice library. I think it is a great initiative to have a collection of neural network analysis tools!

I tried using iNNvestigate to get some insights into semantic segmentation neural networks. I created a toy task: segmenting MNIST images with a small U-Net architecture, where the target images were created by thresholding the input images at 0.5. I encountered the following problems when running different variants of LRP analysis (a minimal sketch of the analysis call follows below):

  • Running the same analysis for the same output neuron on the same input multiple times yields different relevance maps (e.g., the relevance at the same position can be sometimes negative and sometimes positive).
  • The relevance map does not sum up to the output value of the analysed neuron.

Please find attached a Jupyter notebook (as a .txt file, since GitHub doesn't support the .ipynb extension) and a generated PDF of the code I used for this toy task. I would appreciate your feedback and any hints on how to use LRP to analyse semantic segmentation models.

Best, Grzegorz
mnist_unet.pdf
mnist_unet.txt
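
For reference, the analysis is called roughly like this (a minimal sketch, not the exact notebook code; `unet_model`, `x`, and `idx` are placeholder names, and "lrp.epsilon" is just one of the variants tried):

```python
# Minimal sketch of the analysis described above (placeholder names).
import numpy as np
import innvestigate

# unet_model: small Keras U-Net trained on thresholded MNIST targets.
analyzer = innvestigate.create_analyzer(
    "lrp.epsilon",                  # one of the LRP variants tried
    unet_model,
    neuron_selection_mode="index",  # analyse a single output neuron
)

idx = 14 * 28 + 14                  # flat index of one output pixel
R = analyzer.analyze(x, neuron_selection=idx)

# Problem 1: repeated runs yield different relevance maps.
print(np.allclose(R, analyzer.analyze(x, neuron_selection=idx)))

# Problem 2: the relevance map does not sum to the neuron's output.
out = unet_model.predict(x).reshape(len(x), -1)[:, idx]
print(R.sum(), out)
```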

@albermax
Owner

albermax commented Nov 1, 2018

@sebastian-lapuschkin

@gchlebus
Author

gchlebus commented Nov 5, 2018

Please let me know if I can help you debug this problem somehow.

@sebastian-lapuschkin
Contributor

I will have a look at it ASAP, as soon as there is some time (i.e., next week).

@gchlebus
Author

gchlebus commented Nov 5, 2018

Great!

@gchlebus
Author

Hi Sebastian, did you manage to take a look at the issue? I would greatly appreciate your help.

@sebastian-lapuschkin
Contributor

sebastian-lapuschkin commented Nov 20, 2018

Sorry to disappoint, but we have not had the opportunity to take a detailed look yet.
However, based on the attached notebook (PDF version), I have some questions and hints:

  1. Your BN layer has bias units, which might absorb the missing quantities of relevance.
    By bias, I mean its two additive operations (the subtraction of the moving mean and the addition of the learned shift β).
    Try disabling the current default behaviour for batchnorm by just returning the input relevance for this layer instead, and see if the difference to the expected relevance value decreases.

  2. I am not sure why the local relevance changes while the global relevance remains constant. Try disabling the incorporation of the batchnorm in the LRP backward pass to see if the problem lies there.

  3. We have not yet used LRP to analyze segmentation models; our previous analyses cover classification (as a special case of regression) tasks.
    I see that you use a sigmoid activation function at your output. Try disabling it for the analysis: the sigmoid does not satisfy f(0) = 0, and for x < 0 its output sign is opposite to the input sign (sign(f(x)) = -sign(x)), which might cause problems if the sigmoid is used at the output layer.
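
For point 3, a rough sketch of what disabling the sigmoid for analysis could look like (assuming a Keras functional model named `model` whose final layer is the sigmoid activation; adapt as needed):

```python
# Hypothetical sketch for point 3: analyse the pre-sigmoid logits by
# rebuilding the model without its final sigmoid activation layer.
from keras.models import Model

logits = model.layers[-1].input  # tensor feeding the sigmoid
model_wo_sigmoid = Model(inputs=model.inputs, outputs=logits)
# ...then build the LRP analyzer on model_wo_sigmoid instead of model.
```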

best,

@gchlebus
Author

Hi Sebastian, thank you very much for the hints!

I removed the batchnorm layers completely from the model architecture and retrained the model. For the LRP analysis, I set the last activation function to "linear". Unfortunately, with these modifications the problems I reported still occur (mnist_unet.pdf).

Ad 1. & 2.: I wasn't sure how to disable the current default behavior of the batchnorm or how to disable its incorporation in the LRP backward pass... As I would like to use batchnorm in my further experiments, I would appreciate it if you could tell me how to make the mentioned changes.

Ad 3.: I think that LRP for segmentation models would be of great interest (especially in the medical context). I would be happy to help you extend the project to support such architectures.

Best,
Grzegorz

@gchlebus
Author

Hi Sebastian,

Is there any chance you could find some time to take a look at this issue again? I would appreciate it.

Best, Grzegorz

@albermax
Owner

Hi Grzegorz,

could you create a GitHub gist or some other link where we can access the code directly?
The PDF crops some code lines, which makes it harder for us to reproduce the problem!
That would be great, and then I can have a look at why this seems not to be deterministic.
Sorry that this took so long.

Cheers,
Max

@gchlebus
Author

Hi Max,

I created a GitHub repo (https://github.com/gchlebus/lrp-for-segmentation) where you can find the Jupyter notebook that reproduces the mentioned problems.

Best,
Grzegorz

@albermax
Owner

Hi Grzegorz,

thank you! I will try to look into it soon.
I'll keep you posted!

Cheers,
Max

@albermax
Owner

albermax commented Dec 5, 2018

I won't find time until next week. Sorry.

@gchlebus
Author

gchlebus commented Dec 7, 2018

Ok, thanks for keeping me posted.

albermax added a commit that referenced this issue Dec 11, 2018
@albermax
Owner

Hi Grzegorz,

I'm very sorry that this took so long! The commit above should fix this; it is pushed to both the develop and master branches.

It would be great if you could test it! For me it solved both of the initial problems, and it also worked with batchnorm.

Hope this helps you!

Cheers,
Max

@gchlebus
Author

Hi Max,

Thank you very much. Your fix makes the analysis reproducible: I am getting exactly the same results for the same output neuron when running the analysis multiple times. However, the sum of the relevance map still deviates from the value of the analysed output neuron (I confirmed this behaviour with models both with and without batchnorm). The deviation can sometimes be quite large (e.g., the neuron outputs 1.7, but the relevance map for this neuron sums up to 40).

Maybe there is something wrong with the way I call the analyzer? Which relevance rule would you recommend for semantic segmentation ConvNets (in the notebook I use the EpsilonRule)? And the input_layer_rule parameter passed to the analyzer defines the range of input values the model receives, is that correct?
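
For completeness, this is roughly how I create the analyzer (a sketch; `model`, `x`, and `idx` are as in the notebook, and I pass the bounds as a tuple matching my input range of [0, 1]):

```python
# Sketch of the analyzer setup in question (placeholder names).
import innvestigate

analyzer = innvestigate.create_analyzer(
    "lrp.epsilon",                 # EpsilonRule everywhere
    model,
    neuron_selection_mode="index",
    input_layer_rule=(0.0, 1.0),   # input value range of the model
)
R = analyzer.analyze(x, neuron_selection=idx)
print(R.sum())  # deviates from the analysed neuron's output
```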

Best, Grzegorz

@albermax
Owner

Hi Grzegorz,

no, you call the analyzer the right way, and yes, input_layer_rule defines the input range (which is used by the "Z_B" rule in the first layer).
But I am not sure whether the Z_B rule is conservative. Does this still occur if you don't use it, i.e., don't set that parameter?
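
I.e., something like this (sketch, reusing your placeholder names):

```python
# Sketch of the suggested test: omit input_layer_rule so the first layer
# is handled by the epsilon rule as well, then recheck the relevance sum.
import innvestigate

analyzer_plain = innvestigate.create_analyzer(
    "lrp.epsilon", model, neuron_selection_mode="index")
R = analyzer_plain.analyze(x, neuron_selection=idx)
print(R.sum())  # compare against the analysed neuron's output
```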

@sebastian-lapuschkin should know more about which rule to use.

Cheers,
Max

@albermax albermax reopened this Dec 13, 2018
@sebastian-lapuschkin
Contributor

sebastian-lapuschkin commented Dec 13, 2018

Sorry for my inactivity lately. Busyness levels are expected to decrease drastically after December.
Can you track the progression of the relevance deviation, i.e., find out at which layer(s)/step(s) it most prominently happens?

@albermax: are there some suitable mechanics for doing so yet, e.g., early stopping of the decomposition process?

My assumption would be that the large changes happen in the BatchNorm layers. The current default treatment of the BN layer interprets it as a sequence of addition/multiplication/addition/multiplication, but recent results indicate that this is not the optimal way of decomposing the layer's relevance.
Right now, the μ and β could absorb/inject quantities of relevance, since they act as bias inputs.
Try replacing the content of innvestigate.analyzer.relevance_based.BatchNormalizationReverseLayer with return Rs (which also fits the LRP principle) and see if this helps with your results.
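
An untested sketch of that change as a monkey-patch (the apply signature is assumed from the current code base and may differ between versions):

```python
# Hypothetical monkey-patch: let the LRP backward pass treat batchnorm
# as identity, i.e., pass the incoming relevance through unchanged.
import innvestigate.analyzer.relevance_based as rb

def _bn_pass_through(self, Xs, Ys, Rs, reverse_state):
    return Rs  # relevance flows through the BN layer untouched

rb.BatchNormalizationReverseLayer.apply = _bn_pass_through
```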

@albermax
Owner

Oh, the bias. If you use "LRPEpsilonIgnoreBias" and no batchnorm, the sum should stay the same.

Otherwise:
If you take the last code snippet in the notebook linked below and use np.sum instead of np.min, you should get all the relevance sums along the graph:
https://github.com/albermax/innvestigate/blob/master/examples/notebooks/introduction_development.ipynb
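
The first suggestion as a sketch (assuming the class is exported from innvestigate.analyzer as in the current code base; placeholder names again):

```python
# Sketch: epsilon-LRP ignoring biases, so relevance should be conserved
# from layer to layer (assuming a model without batchnorm).
from innvestigate.analyzer import LRPEpsilonIgnoreBias

analyzer = LRPEpsilonIgnoreBias(model, neuron_selection_mode="index")
R = analyzer.analyze(x, neuron_selection=idx)
print(R.sum())  # should now match the analysed neuron's output
```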

@gchlebus
Author

Hi Max, hi Sebastian,

Thank you very much for your help. I tested your suggestions. Please find my answers below.

> But I am not sure whether the Z_B rule is conservative. Does this still occur if you don't use it, i.e., don't set that parameter?

Not setting the input_layer_rule parameter does not change anything, at least in the MNIST toy example that I provided you with.

> Try replacing the content of innvestigate.analyzer.relevance_based.BatchNormalizationReverseLayer with return Rs (which also fits the LRP principle) and see if this helps with your results.

I modified the BatchNormalizationReverseLayer.apply function to return Rs immediately. This change didn't solve the problem.

> Oh, the bias. If you use "LRPEpsilonIgnoreBias" and no batchnorm, the sum should stay the same.

This is correct. If I use the LRPEpsilonIgnoreBias rule to analyse a model without batchnorm, then the sum of the relevance map is equal to the output of the analysed neuron. However, in my case I would like to analyse a model that uses BatchNorm layers. Is there any way to get correct analysis results for models with BatchNorm?

> Otherwise:
> If you take the last code snippet in the notebook linked below and use np.sum instead of np.min, you should get all the relevance sums along the graph:
> https://github.com/albermax/innvestigate/blob/master/examples/notebooks/introduction_development.ipynb

I am not sure how I can make use of the relevances along the graph to solve the problem of relevance absorption/injection.

@albermax
Owner

Hi Grzegorz,

thank you so much for looking into this. Basically everything works as it should. That the relevance does not sum up for LRPEpsilon but does for LRPEpsilonIgnoreBias is a feature rather than a bug: the idea is that biases are (constant) inputs to the network and "absorb" relevance.

Unfortunately, for the BatchNorm layer there is only an implementation that does not ignore the biases, hence, as you observed, the output sum is not equal to the neuron's activation value.
If you would like to have that, I suggest using LRPEpsilonIgnoreBias together with the BatchNorm fix until Sebastian finds the time to extend the code base.
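
To illustrate the absorption on a single neuron z = w·x + b (a toy example, not iNNvestigate code): under the epsilon rule, input i receives the share R·(w_i·x_i)/(z+ε), while the bias's share R·b/(z+ε) never reaches the inputs:

```python
# Toy single-neuron example of bias "absorption" under epsilon-LRP.
import numpy as np

w = np.array([1.0, 2.0])
x = np.array([0.5, 1.0])
b, eps = 0.5, 1e-9

z = w @ x + b                       # 3.0
R_out = z                           # relevance to redistribute
R_in = R_out * (w * x) / (z + eps)  # shares for the two inputs
print(R_in.sum())                   # 2.5, not 3.0: the bias absorbed 0.5
```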

@sebastian-lapuschkin maybe you would like to add something to this.

Cheers,
Max

PS: Regarding the code reference: it was not meant to "solve" your problem, only to inspect the values along the backward propagation. My bad for not being clear!
