
LRP for semantic segmentation #118

Closed

gchlebus opened this issue Nov 1, 2018 · 20 comments
@gchlebus

gchlebus commented Nov 1, 2018

Hi,

Thank you very much for this nice library. I think it is a great initiative to have a collection of neural network analysis tools!

I tried using iNNvestigate to get some insights into semantic segmentation neural networks. I created a toy task: segmenting MNIST images with a small U-Net architecture, where the target images were created by thresholding the input images at 0.5. I encountered the following problems when running different variants of LRP analysis (a minimal sketch of the analysis call follows below):

  • Running the same analysis for the same output neuron on the same input multiple times yields different relevance maps (e.g., the relevance at the same position can be sometimes negative and sometimes positive).
  • The relevance map does not sum up to the output value of the analysed neuron.

Please find attached a Jupyter notebook (as a .txt file, since GitHub doesn't support the .ipynb extension) and a generated PDF of the code I used for this toy task. I would appreciate your feedback and any hints on how to use LRP to analyse semantic segmentation models.

Best, Grzegorz
mnist_unet.pdf
mnist_unet.txt
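
For reference, the analysis is called roughly like this (a minimal sketch, not the exact notebook code; `unet_model`, `x`, and `idx` are placeholder names, and "lrp.epsilon" is just one of the variants tried):

```python
# Minimal sketch of the analysis described above (placeholder names).
import numpy as np
import innvestigate

# unet_model: small Keras U-Net trained on thresholded MNIST targets.
analyzer = innvestigate.create_analyzer(
    "lrp.epsilon",                  # one of the LRP variants tried
    unet_model,
    neuron_selection_mode="index",  # analyse a single output neuron
)

idx = 14 * 28 + 14                  # flat index of one output pixel
R = analyzer.analyze(x, neuron_selection=idx)

# Problem 1: repeated runs yield different relevance maps.
print(np.allclose(R, analyzer.analyze(x, neuron_selection=idx)))

# Problem 2: the relevance map does not sum to the neuron's output.
out = unet_model.predict(x).reshape(len(x), -1)[:, idx]
print(R.sum(), out)
```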

@albermax
Owner

albermax commented Nov 1, 2018

@sebastian-lapuschkin

@gchlebus
Author

gchlebus commented Nov 5, 2018

Please let me know if I can help you debug this problem somehow.

@sebastian-lapuschkin
Contributor

I will have a look at it ASAP, as soon as there is some time (i.e., next week).

@gchlebus
Author

gchlebus commented Nov 5, 2018

Great!

@gchlebus
Author

Hi Sebastian, did you manage to take a look at the issue? I would greatly appreciate your help.

@sebastian-lapuschkin
Contributor

sebastian-lapuschkin commented Nov 20, 2018

Sorry to disappoint, but we have not had the opportunity to take a detailed look yet.
However, based on the attached notebook (PDF version), I have some questions and hints:

  1. Your BN layer has bias units, which might absorb the missing quantities of relevance.
    By bias, I mean its two additive operations (the subtraction of the moving mean and the addition of the learned shift β).
    Try disabling the current default behaviour for batchnorm by just returning the input relevance for this layer instead, and see if the difference to the expected relevance value decreases.

  2. I am not sure why the local relevance changes while the global relevance remains constant. Try disabling the incorporation of the batchnorm in the LRP backward pass to see if the problem lies there.

  3. We have not yet used LRP to analyze segmentation models; our previous analyses cover classification (as a special case of regression) tasks.
    I see that you use a sigmoid activation function at your output. Try disabling it for the analysis: the sigmoid does not satisfy f(0) = 0, and for x < 0 its output sign is opposite to the input sign (sign(f(x)) = -sign(x)), which might cause problems if the sigmoid is used at the output layer.
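
For point 3, a rough sketch of what disabling the sigmoid for analysis could look like (assuming a Keras functional model named `model` whose final layer is the sigmoid activation; adapt as needed):

```python
# Hypothetical sketch for point 3: analyse the pre-sigmoid logits by
# rebuilding the model without its final sigmoid activation layer.
from keras.models import Model

logits = model.layers[-1].input  # tensor feeding the sigmoid
model_wo_sigmoid = Model(inputs=model.inputs, outputs=logits)
# ...then build the LRP analyzer on model_wo_sigmoid instead of model.
```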

best,

@gchlebus
Author

Hi Sebastian, thank you very much for the hints!

I removed the batchnorm layers completely from the model architecture and retrained the model. For the LRP analysis, I set the last activation function to "linear". Unfortunately, with these modifications the problems I reported still occur (mnist_unet.pdf).

Ad 1. & 2.: I wasn't sure how to disable the current default behavior of the batchnorm or how to disable its incorporation in the LRP backward pass... As I would like to use batchnorm in my further experiments, I would appreciate it if you could tell me how to make the mentioned changes.

Ad 3.: I think that LRP for segmentation models would be of great interest (especially in the medical context). I would be happy to help you extend the project to support such architectures.

Best,
Grzegorz

@gchlebus
Author

Hi Sebastian,

Is there any chance you could find some time to take a look at this issue again? I would appreciate it.

Best, Grzegorz

@albermax
Owner

Hi Grzegorz,

could you create a GitHub gist or some other link where we can access the code directly?
The PDF crops some code lines, which makes it harder for us to reproduce the problem!
That would be great, and then I can have a look at why this seems not to be deterministic.
Sorry that this took so long.

Cheers,
Max

@gchlebus
Author

Hi Max,

I created a GitHub repo (https://github.com/gchlebus/lrp-for-segmentation) where you can find the Jupyter notebook that reproduces the mentioned problems.

Best,
Grzegorz

@albermax
Owner

Hi Grzegorz,

thank you! I will try to look into it soon.
I'll keep you posted!

Cheers,
Max

@albermax
Owner

albermax commented Dec 5, 2018

I won't find time until next week. Sorry.

@gchlebus
Author

gchlebus commented Dec 7, 2018

Ok, thanks for keeping me posted.

albermax added a commit that referenced this issue Dec 11, 2018
@albermax
Owner

Hi Grzegorz,

I'm very sorry that this took so long! The commit above should fix this; it is pushed to both the develop and master branches.

It would be great if you could test it! For me it solved both of the initial problems, and it also worked with batchnorm.

Hope this helps you!

Cheers,
Max

@gchlebus
Author

Hi Max,

Thank you very much. Your fix makes the analysis reproducible: I am getting exactly the same results for the same output neuron when running the analysis multiple times. However, the sum of the relevance map still deviates from the value of the analysed output neuron (I confirmed this behaviour with models both with and without batchnorm). The deviation can sometimes be quite large (e.g., the neuron outputs 1.7, but the relevance map for this neuron sums up to 40).

Maybe there is something wrong with the way I call the analyzer? Which relevance rule would you recommend for semantic segmentation ConvNets (in the notebook I use the EpsilonRule)? And the input_layer_rule parameter passed to the analyzer defines the range of input values the model receives, is that correct?
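
For completeness, this is roughly how I create the analyzer (a sketch; `model`, `x`, and `idx` are as in the notebook, and I pass the bounds as a tuple matching my input range of [0, 1]):

```python
# Sketch of the analyzer setup in question (placeholder names).
import innvestigate

analyzer = innvestigate.create_analyzer(
    "lrp.epsilon",                 # EpsilonRule everywhere
    model,
    neuron_selection_mode="index",
    input_layer_rule=(0.0, 1.0),   # input value range of the model
)
R = analyzer.analyze(x, neuron_selection=idx)
print(R.sum())  # deviates from the analysed neuron's output
```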

Best, Grzegorz

@albermax
Owner

Hi Grzegorz,

no, you call the analyzer the right way, and yes, input_layer_rule defines the input range (which is used by the "Z_B" rule in the first layer).
But I am not sure whether the Z_B rule is conservative. Does this still occur if you don't use it, i.e., don't set that parameter?
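
I.e., something like this (sketch, reusing your placeholder names):

```python
# Sketch of the suggested test: omit input_layer_rule so the first layer
# is handled by the epsilon rule as well, then recheck the relevance sum.
import innvestigate

analyzer_plain = innvestigate.create_analyzer(
    "lrp.epsilon", model, neuron_selection_mode="index")
R = analyzer_plain.analyze(x, neuron_selection=idx)
print(R.sum())  # compare against the analysed neuron's output
```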

@sebastian-lapuschkin should know more about which rule to use.

Cheers,
Max

@albermax albermax reopened this Dec 13, 2018
@sebastian-lapuschkin
Contributor

sebastian-lapuschkin commented Dec 13, 2018

Sorry for my inactivity lately. Busyness levels are expected to decrease drastically after December.
Can you track the progression of the relevance deviation, i.e., find out at which layer(s)/step(s) it most prominently happens?

@albermax: are there some suitable mechanics for doing so yet, e.g., early stopping of the decomposition process?

My assumption would be that the large changes happen in the BatchNorm layers. The current default treatment of the BN layer interprets it as a sequence of addition/multiplication/addition/multiplication, but recent results indicate that this is not the optimal way of decomposing the layer's relevance.
Right now, the μ and β could absorb/inject quantities of relevance, since they act as bias inputs.
Try replacing the content of innvestigate.analyzer.relevance_based.BatchNormalizationReverseLayer with return Rs (which also fits the LRP principle) and see if this helps with your results.
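
An untested sketch of that change as a monkey-patch (the apply signature is assumed from the current code base and may differ between versions):

```python
# Hypothetical monkey-patch: let the LRP backward pass treat batchnorm
# as identity, i.e., pass the incoming relevance through unchanged.
import innvestigate.analyzer.relevance_based as rb

def _bn_pass_through(self, Xs, Ys, Rs, reverse_state):
    return Rs  # relevance flows through the BN layer untouched

rb.BatchNormalizationReverseLayer.apply = _bn_pass_through
```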

@albermax
Owner

Oh, the bias. If you use "LRPEpsilonIgnoreBias" and no batchnorm, the sum should stay the same.

Otherwise:
If you take the last code snippet in the notebook linked below and use np.sum instead of np.min, you should get all the relevance sums along the graph:
https://github.com/albermax/innvestigate/blob/master/examples/notebooks/introduction_development.ipynb
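
The first suggestion as a sketch (assuming the class is exported from innvestigate.analyzer as in the current code base; placeholder names again):

```python
# Sketch: epsilon-LRP ignoring biases, so relevance should be conserved
# from layer to layer (assuming a model without batchnorm).
from innvestigate.analyzer import LRPEpsilonIgnoreBias

analyzer = LRPEpsilonIgnoreBias(model, neuron_selection_mode="index")
R = analyzer.analyze(x, neuron_selection=idx)
print(R.sum())  # should now match the analysed neuron's output
```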

@gchlebus
Author

Hi Max, hi Sebastian,

Thank you very much for your help. I tested your suggestions. Please find my answers below.

> But I am not sure whether the Z_B rule is conservative. Does this still occur if you don't use it, i.e., don't set that parameter?

Not setting the input_layer_rule parameter does not change anything, at least in the MNIST toy example that I provided you with.

> Try replacing the content of innvestigate.analyzer.relevance_based.BatchNormalizationReverseLayer with return Rs (which also fits the LRP principle) and see if this helps with your results.

I modified the BatchNormalizationReverseLayer.apply function to return Rs immediately. This change didn't solve the problem.

> Oh, the bias. If you use "LRPEpsilonIgnoreBias" and no batchnorm, the sum should stay the same.

This is correct. If I use the LRPEpsilonIgnoreBias rule to analyse a model without batchnorm, then the sum of the relevance map is equal to the output of the analysed neuron. However, in my case I would like to analyse a model that uses BatchNorm layers. Is there any way to get correct analysis results for models with BatchNorm?

> Otherwise:
> If you take the last code snippet in the notebook linked below and use np.sum instead of np.min, you should get all the relevance sums along the graph:
> https://github.com/albermax/innvestigate/blob/master/examples/notebooks/introduction_development.ipynb

I am not sure how I can make use of the relevances along the graph to solve the problem of relevance absorption/injection.

@albermax
Owner

Hi Grzegorz,

thank you so much for looking into this. Basically everything works as it should. That the relevance does not sum up for LRPEpsilon but does for LRPEpsilonIgnoreBias is a feature rather than a bug: the idea is that biases are (constant) inputs to the network and "absorb" relevance.

Unfortunately, for the BatchNorm layer there is only an implementation that does not ignore the biases, hence, as you observed, the output sum is not equal to the neuron's activation value.
If you would like to have that, I suggest using LRPEpsilonIgnoreBias together with the BatchNorm fix until Sebastian finds the time to extend the code base.
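
To illustrate the absorption on a single neuron z = w·x + b (a toy example, not iNNvestigate code): under the epsilon rule, input i receives the share R·(w_i·x_i)/(z+ε), while the bias's share R·b/(z+ε) never reaches the inputs:

```python
# Toy single-neuron example of bias "absorption" under epsilon-LRP.
import numpy as np

w = np.array([1.0, 2.0])
x = np.array([0.5, 1.0])
b, eps = 0.5, 1e-9

z = w @ x + b                       # 3.0
R_out = z                           # relevance to redistribute
R_in = R_out * (w * x) / (z + eps)  # shares for the two inputs
print(R_in.sum())                   # 2.5, not 3.0: the bias absorbed 0.5
```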

@sebastian-lapuschkin maybe you would like to add something to this.

Cheers,
Max

PS: Regarding the code reference: it was not meant to "solve" your problem, only to inspect the values along the backward propagation. My bad for not being clear!
