
Some setting when using the code #1

Open
GlowingHorse opened this issue Jun 2, 2018 · 11 comments
Comments

GlowingHorse commented Jun 2, 2018

I just ran the code yesterday and found some things that need attention before running.
I am using MATLAB R2018a, Windows 10 64-bit, VS2015 64-bit.

First,
When installing gsc-1.2, you should read this MathWorks document on upgrading MEX files to use the 64-bit API, and change some code in compile_mex.m (deconvnet_analysis-master\saliency\gsc\packages+bj): https://ww2.mathworks.cn/help/matlab/matlab_external/upgrading-mex-files-to-use-64-bit-api.html

Second,
saliency\ksseg.m needs a change. Replace the code at line 95 with:
if max(max(labels)) > 0
    segH.start(labels);
else
    segH.start(labelsClamp);
end
because start takes only one input argument.

Third,
MatConvNet beta25 is OK. With lower versions I found that some functions' parameters are wrong.

Fourth,
When using run_2.m, it is better to add run ../matconvnet/matlab/vl_setupnn.m to your file.

Last,
When trying to visualize a multi-class model, follow aravindhm's advice: "Please try either network inversion, excitation backprop, layerwise relevance propagation or grad-CAM."

@GlowingHorse GlowingHorse changed the title Some solve methods when using the code Some setting when using the code Jun 2, 2018
@GlowingHorse
Author

It seems that in deep layers, choosing any neuron and giving it a large or small value (i.e. dzdy(opts.neuron_I, opts.neuron_J, opts.neuron_channel) = 1 or dzdy(opts.neuron_I, opts.neuron_J, opts.neuron_channel-150) = 900; in hand_specified_neuron_viz_fn.m) does not hurt the visualization result. Maybe this is because high-level features are not directly visible.
Low-level features may be different: firing different neurons in shallow layers may influence the visualization result.

I just want to figure out the reason for setting dzdy to 1. Is it because the sigmoid's (the last layer in AlexNet) output is 1?
Also, I wonder whether there is any value in trying to visualize Fast R-CNN (a pre-trained DagNN model in MatConvNet). Thank you.

@aravindhm
Owner

Hi @ShiRuiCV

The value of 1 was picked arbitrarily. We observed that using any positive random noise yields similar results. So I think the exact value doesn't matter.

Our observation is that gradient based network reversal strategies such as DeConvNet, Guided-Backprop and Network Saliency are not useful for studying individual neurons. Thus using these methods to visualize fast-rcnn will be ineffective. Please try either network inversion, excitation backprop, layerwise relevance propagation or grad-CAM.
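The "exact value doesn't matter" observation has a simple explanation: once the forward pass (and hence every rectification mask) is fixed, backpropagation is linear in the seed dzdy, so scaling the seed only rescales the whole gradient map without changing its pattern. A toy NumPy sketch (hypothetical two-layer ReLU net, not the repo's code):

```python
import numpy as np

# Hypothetical two-layer ReLU "network": y = relu(x @ W1) @ W2
rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 8))
W2 = rng.standard_normal((8, 3))
x = rng.standard_normal((1, 4))

def input_gradient(seed_value, neuron=0):
    """Backprop a seed placed on one output neuron down to the input."""
    h = x @ W1
    mask = (h > 0).astype(h.dtype)   # rectification mask from the forward pass
    dzdy = np.zeros((1, 3))
    dzdy[0, neuron] = seed_value     # the "dzdy = 1" (or 900) choice
    dzdh = (dzdy @ W2.T) * mask      # ReLU backward reuses the forward mask
    return dzdh @ W1.T

g1 = input_gradient(1.0)
g900 = input_gradient(900.0)
# The saliency *pattern* is identical; only the scale changes.
assert np.allclose(g900, 900.0 * g1)
```

After normalizing for display, both seeds therefore produce the same image, which matches the observation above.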

@GlowingHorse
Author

GlowingHorse commented Jun 7, 2018

@aravindhm Thank you very much. Your advice really helps me a lot.

I find that the weakness of DeConvNet, Guided Backprop, and Network Saliency is that, as this paper says, they are not "class-discriminative" ("Grad-CAM: Why did you say that? Visual Explanations from Deep Networks via Gradient-based Localization", which cites yours as [34]). I think I should start studying this new visualization paper.

But I would also like to try visualizing the last layer of the R-CNN with a method like Network Saliency. Maybe it can find the salient objects in an image, but cannot classify them?

PS: your graph-cut code seems to require some knowledge of energy functionals. I have not studied functionals; maybe I should learn some, because they appear all the time in computer vision. But I think it would not be too hard to study.

@aravindhm
Owner

Hi @ShiRuiCV

The graph cut code is not mine. It is from this paper:

V. Gulshan, C. Rother, A. Criminisi, A. Blake and A. Zisserman. Geodesic star convexity for interactive image segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.

@GlowingHorse
Author

GlowingHorse commented Jul 3, 2018

Hello, @aravindhm

Recently I tried to implement network inversion, excitation backprop, and Grad-CAM. I found that excitation backprop and Grad-CAM are based on the same idea of using dzdx as a weight or probability. These methods are useful for understanding the classification (fc) layers, because the result is class-discriminative and easy to read off from dzdx ('input').
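The "dzdx as weight" idea in Grad-CAM can be sketched in a few lines of NumPy (shapes and names are illustrative, not any framework's API): channel weights are the globally average-pooled gradients, and the map is the ReLU of the weighted sum of activation maps.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM on a single conv layer.
    activations, gradients: (H, W, K) feature maps and dz/dA from backprop.
    """
    # Channel weights: global-average-pool the gradients (Grad-CAM's alpha_k).
    alpha = gradients.mean(axis=(0, 1))                      # (K,)
    # Weighted sum of activation maps over channels.
    cam = np.tensordot(activations, alpha, axes=([2], [0]))  # (H, W)
    return np.maximum(cam, 0.0)                              # keep positive evidence

# Toy example with made-up feature maps
rng = np.random.default_rng(0)
A = rng.random((7, 7, 16))
dA = rng.standard_normal((7, 7, 16))
heatmap = grad_cam(A, dA)
```

In practice the heatmap is then upsampled to the input size and optionally multiplied elementwise with a Guided Backprop map to get Guided Grad-CAM.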

With Guided Backprop, the low-level conv layers' results are also easy to see.

But neither of these methods explains the middle layers' function well (e.g. VGG conv3, conv4, conv5). Commonly, people think the middle layers are responsible for feature combination.

Would you like to share some thoughts on explaining the middle layers?
When I try Grad-CAM with Guided Backprop, the middle layers can be visualized as some parts of the image (which should mean feature clustering), but I find it very non-intuitive.

Personally, I'd like to study clustering methods or minimization of energy functionals and use them to try to explain the network.

@GlowingHorse
Author

Hello, @aravindhm

There is one thing I would like to ask for suggestions on. I found that the transposed convolution for dzdx in MatConvNet does not compute the gradient of x. So it seems that, for example, when setting one of the feature maps of x31 as dzdx31, there is no need to set the values in that feature map to one; we can retain the original values of that feature map of x31. When the original values are retained, the results in lower layers are easier to see.

@aravindhm
Owner

Hi @ShiRuiCV

I am not familiar with the implementations in MatConvNet. I do not follow your question.

Regarding visualizing middle layers: fully connected layers and convolutional layers differ only in the size of their receptive field, so all three methods should work equally well for both layer types. Our paper argues that DeConvNet's and Guided Backprop's results are influenced by auxiliary information such as pooling switches and rectification masks. This effect is stronger for deeper layers, typically the fully connected ones. I think excitation backprop is able to overcome it.
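The role of those rectification masks is easiest to see in the single-layer ReLU backward rules the three approaches use. A NumPy sketch for comparison (standard textbook definitions, not code from this repository):

```python
import numpy as np

def relu_backward(dzdy, forward_input, mode):
    """Compare ReLU backward rules for one layer.
    'backprop':  true gradient, gated by the forward rectification mask.
    'deconvnet': gates by the sign of the backward signal instead,
                 ignoring the forward pass entirely.
    'guided':    applies both gates, so the auxiliary forward
                 information (the mask) still shapes the result.
    """
    mask = (forward_input > 0)
    if mode == "backprop":
        return dzdy * mask
    if mode == "deconvnet":
        return dzdy * (dzdy > 0)
    if mode == "guided":
        return dzdy * mask * (dzdy > 0)
    raise ValueError(mode)

# Toy backward signal and forward pre-activation for one layer
dzdy = np.array([1.0, -1.0, 2.0, -2.0])
x_fwd = np.array([1.0, -1.0, -1.0, 1.0])
```

Stacking many such layers, the forward masks accumulate, which is the "auxiliary information" effect described above.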

@GlowingHorse
Author

Thanks for your reply @aravindhm

I recently used the deconvolution method to visualize ResNet, but found that, because of the influence of BatchNorm, the visualization results do not work as well as on a VGG model. Could you share some advice on how to handle BN when using deconvolution on ResNet? (I noticed you use the identity when handling LRN layers, which seems like a good choice.)

@aravindhm
Owner

Hi @ShiRuiCV ,

Good question. I haven't experimented with batch-norm based networks yet. I suggest that you absorb the batch-norm parameters into the convolution layer preceding each of them. This gives a batch-norm-free network which can be visualized as it is. Setting batch-norm-reversed to the identity will affect the scaling of gradients, which may break things.

Best Wishes,
Aravindh
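A sketch of the folding described above, in NumPy for clarity (the HxWxCinxCout filter layout is an assumption matching MatConvNet's convention): the inference-mode BN scale and shift are merged into the conv filters and biases, leaving an equivalent BN-free layer.

```python
import numpy as np

def fold_bn_into_conv(W, b, gamma, beta, mean, var, eps=1e-5):
    """Absorb inference-mode batch-norm into the preceding conv layer.
    W: (kh, kw, cin, cout) filters, b: (cout,) biases.
    After folding, conv'(x) == BN(conv(x)) for every input x.
    """
    scale = gamma / np.sqrt(var + eps)   # per-output-channel BN scale
    W_folded = W * scale                 # broadcasts over the cout axis
    b_folded = (b - mean) * scale + beta
    return W_folded, b_folded

# Check on a 1x1 conv, where convolution is just a matrix product.
rng = np.random.default_rng(1)
W = rng.standard_normal((1, 1, 3, 5)); b = rng.standard_normal(5)
gamma = rng.random(5) + 0.5; beta = rng.standard_normal(5)
mean = rng.standard_normal(5); var = rng.random(5) + 0.1
x = rng.standard_normal((4, 3))
y = x @ W[0, 0] + b                                    # conv output
bn = gamma * (y - mean) / np.sqrt(var + 1e-5) + beta   # BN(conv(x))
Wf, bf = fold_bn_into_conv(W, b, gamma, beta, mean, var)
assert np.allclose(bn, x @ Wf[0, 0] + bf)
```

The same algebra applies per output channel for any kernel size, since BN acts channelwise on the conv output.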

@GlowingHorse
Author

Thanks for your warm reply.

In fact I didn't fully understand what you meant by

absorb the batch-norm parameters into the convolution layers preceding each of them

Would you mind giving a more specific implementation of it? I tried three ways to compute gradients in that layer, but none of them generated good visualization results.

Let x[l-1], x[l] be one BN layer's input and output, and dzdx[l-1], dzdx[l] the corresponding gradients.

First, I tried the original gradient computation (the built-in function in most frameworks) in the BN layer.

Second, I changed the backward computation to use the identity:
a) set one feature map of x[l] into a new all-zeros matrix dzdx[l];
b) at the BN layer, set dzdx[l-1] equal to dzdx[l];
c) continue the backward propagation.
It also failed, as you said.

Third, BN is x[l] = moments_1 * (x[l-1] - mean) / sqrt(var) + moments_2. Then I used the inverse of it:
dzdx[l-1] = (dzdx[l] - moments_2) / moments_1 * sqrt(var) + mean. Also failed.
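For reference, since inference-mode BN is affine in its input, its true backward pass is neither the identity (method 2) nor the inverse function applied to the gradient (method 3): differentiating x[l] = gamma * (x[l-1] - mean) / sqrt(var + eps) + beta with respect to x[l-1] leaves only a per-channel rescaling, because the shifts (mean, beta) drop out. A NumPy sketch:

```python
import numpy as np

def bn_backward_inference(dzdx_out, gamma, var, eps=1e-5):
    """Backward pass of inference-mode batch-norm.
    BN is affine in its input, so the gradient is only rescaled:
    dz/dx_in = dz/dx_out * gamma / sqrt(var + eps).
    The additive terms (mean, beta) vanish under differentiation.
    """
    return dzdx_out * gamma / np.sqrt(var + eps)

# Toy per-channel parameters
gamma = np.array([2.0, 0.5])
var = np.array([1.0, 4.0])
g = bn_backward_inference(np.ones(2), gamma, var)
```

Applying the inverse *function* to gradients (method 3) wrongly re-adds mean and beta, which would explain the artifacts.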

@GlowingHorse
Author

PS: because of the influence of BatchNorm, the visual results from methods 1 and 2 above have many artifacts that look like regular ripple disturbances.
