Output features (after ReLU) contains negative values #65

Closed
dasabir opened this issue Nov 30, 2016 · 11 comments

@dasabir

dasabir commented Nov 30, 2016

Issue summary

Something seems wrong when I inspect the output activations from the first conv layer: I'm getting negative values as activations after ReLU.

Steps to reproduce

Run "check_data_and_model.py" inside "examples/c3d_ucf101/". Since I did not have the model ("c3d_ucf101_iter_5000.caffemodel") referenced on line 17 of that script, I downloaded the pretrained model provided by Jimmy - Link.
Then add the following lines at the end of the script. They check whether the activations from 'conv1a' contain any negative values. Except for the first 4 feature maps, they do (see the sample output from the script at the bottom). This should not happen, since "net.blobs['conv1a'].data" should hold the output after ReLU.
== Code to add at the bottom ==

for i in range(64):
  print "Conv1a output filter {} has negative values: {}".format(i,np.any(net.blobs['conv1a'].data[0,i]<0))

As a further check, I looked at the output of the pooling layer by adding the following two lines at the end. The result is the same.
== Code to add at the bottom ==

for i in range(64):
  print "Pool1 output filter {} has negative values: {}".format(i,np.any(net.blobs['pool1'].data[0,i]<0))

I did one more check: renaming the top blob of the 'relu1a' layer in the prototxt ("c3d_ucf101_deploy.prototxt") - Line 44 - to top: "conv1a-relu". When I run the same code (with net.blobs['conv1a-relu'].data[0,i]), it shows the intended result, i.e. no negative values.
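For reference, the renamed-top check corresponds to a prototxt fragment like this (layer and blob names taken from the report; surrounding fields abridged):

```
layer {
  name: "relu1a"
  type: "ReLU"
  bottom: "conv1a"
  top: "conv1a-relu"  # renamed from "conv1a", so the ReLU is no longer in-place
}
```

With the renamed top, the next layer's bottom (pool1 here) must also be changed to "conv1a-relu" for the net to wire up.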

Your system configuration

Operating system: centos 7
CUDA version (if applicable): 8.0
CUDNN version (if applicable): 5.1
BLAS: OpenBLAS
Python or MATLAB version (for pycaffe and matcaffe respectively): python 2.7

Sample output (when testing with net.blobs['conv1a']):

Conv1a output filter 0 has negative values: False
Conv1a output filter 1 has negative values: False
Conv1a output filter 2 has negative values: False
Conv1a output filter 3 has negative values: False
Conv1a output filter 4 has negative values: True
... (filters 5 through 62 likewise True) ...
Conv1a output filter 63 has negative values: True

@dasabir
Author

dasabir commented Dec 1, 2016

Update:

I've narrowed it down to the ReLU layer. I wrote two minimal working examples (MWEs), each with just one conv layer and one ReLU layer. The first MWE uses the ReLU layer from this repository. The second uses a simple custom ReLU, which is nothing but applying np.maximum() (I only wrote the forward pass). I'm attaching the files here: the first MWE is saved as 'mwe.py', the second as 'mwe_custom_relu.py'; their prototxts are 'mwe.prototxt' and 'mwe_custom_relu.prototxt' respectively. The custom ReLU code is 'myRelu.py'.
Both MWEs read the conv layer output and the ReLU layer output, then compare element by element whether max(conv1_output, 0.0) equals the ReLU layer output. Running them shows that the caffe ReLU (from this repo) does not match, while the custom ReLU does. Further investigation shows the caffe ReLU is applied only to the first 4 filters (out of 64).
I'm attaching the scripts and prototxts to reproduce. mwe.zip

To reproduce:

  1. Install/make this (video-caffe) repo.
  2. Download the attached files from this post (link above) and copy them into 'examples/c3d_ucf101'. The custom layer myRelu.py should be placed in the same folder.
  3. Download a pretrained model (I tested with the one supplied with this repo - which can be found here - so as not to change the weights between runs).
  4. Change the caffe_root path inside the MWEs and in myRelu.py (Line 2) to point to where video-caffe resides.
  5. Run the two scripts to see the difference.
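The comparison the MWEs perform can be sketched in plain NumPy (names and shapes here are illustrative; the real scripts read blobs from pycaffe):

```python
import numpy as np

def relu_forward(x):
    # Reference ReLU: elementwise max(x, 0), as in myRelu.py's np.maximum() approach
    return np.maximum(x, 0.0)

# Stand-in for net.blobs['conv1a'].data: 64 filters of random activations
conv_out = np.random.randn(1, 64, 8, 8).astype(np.float32)
relu_out = relu_forward(conv_out)

# The element-by-element check the MWEs run against the ReLU layer's output
print(np.array_equal(relu_out, np.maximum(conv_out, 0.0)))  # -> True
print(np.any(relu_out < 0))                                  # -> False
```

With the repo's cuDNN ReLU, the first check failed for all but the first 4 filters; with the custom NumPy ReLU it passed.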

@chuckcho
Owner

chuckcho commented Dec 1, 2016

Thanks a lot for the detailed issue report. Will look into it by end of the week.

@chuckcho
Owner

chuckcho commented Dec 1, 2016

So, to summarize: the in-place ReLU appears to be the problem. If the ReLU output is written to a differently named blob (as in conv1a-relu) and fed to the next layer (pool1 in this case), everything looks fine.

@chuckcho
Owner

chuckcho commented Dec 1, 2016

For now, two quick workarounds:

  1. Don't use in-place ReLU (something like https://gist.github.com/chuckcho/7d78212896951f558407386dbf489089).
  2. The non-cuDNN version of ReLU seems fine, so use relu_param { engine: CAFFE } for every ReLU layer, e.g.
layer {
  name: "relu1a"
  type: "ReLU"
  relu_param { engine: CAFFE }
  bottom: "conv1a"
  top: "conv1a"
}

@chuckcho chuckcho added the bug label Dec 1, 2016
@dasabir
Author

dasabir commented Dec 1, 2016

@chuckcho Thanks for the workarounds. The second one seems to work well. About the first one, I'm not sure: in mwe.prototxt I did precisely that (non-in-place ReLU). However, when I dug deeper I found that for the last 60 filters the ReLU operation does nothing - their output is all zeros. You can reproduce it by adding the following two lines of code at the end of mwe.py

for i in range(64):
  print "Filter {}: {} in relu output".format(i,\
                            "some non-zero values" if np.any(net.blobs['conv1a-relu'].data[0,i]) else "all zero values")

For in-place ReLU (cuDNN version), this means it just passes the conv output through unchanged for the last 60 filters. The CAFFE version seems fine, though.

One relevant question (if you can answer or redirect it to Jimmy) - is the pretrained model supplied by Jimmy buggy then?

@ronghanghu
Contributor

@chuckcho since you added a length dimension in video-Caffe, I think you should add it into CuDNNReLULayer<Dtype>::LayerSetUp in
https://github.com/chuckcho/video-caffe/blob/master/src/caffe/layers/cudnn_relu_layer.cpp#L24-L27
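A plausible illustration of why exactly 4 of 64 filters came out right (assumed shapes; the actual blob layout is 5D, (N, C, L, H, W) with temporal length L=16 for C3D): if the cuDNN tensor descriptor omits the length dimension and describes the blob as 4D (N, C, H, W), it spans only N*C*H*W contiguous elements, i.e. C/L = 64/16 = 4 full filters:

```python
import numpy as np

# Hypothetical 5D blob (N, C, L, H, W); H and W chosen small for the sketch
N, C, L, H, W = 1, 64, 16, 4, 4
blob = np.random.randn(N, C, L, H, W).astype(np.float32)

# Apply ReLU only to the elements a 4D (N, C, H, W) descriptor would cover
flat = blob.reshape(-1).copy()
covered = N * C * H * W
flat[:covered] = np.maximum(flat[:covered], 0.0)
out = flat.reshape(N, C, L, H, W)

# In the 5D memory layout, each filter holds L*H*W elements,
# so the covered region spans covered // (L*H*W) = 4 filters
filters_covered = covered // (L * H * W)
print(filters_covered)                        # -> 4
print(np.any(out[0, :filters_covered] < 0))   # -> False
```

This matches the observed behavior: ReLU applied to the first 4 filters, the remaining 60 passed through untouched.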

chuckcho pushed a commit that referenced this issue Dec 22, 2016
@chuckcho
Owner

Sorry for the late reply. Thanks @ronghanghu for the insight. It did fix the problem, and is reflected in the recent commit: 9b635f2

@chuckcho
Owner

chuckcho commented Dec 22, 2016

@dasabir, thanks again for the detailed bug report. please let me know if you can double-check this issue is gone with the latest change. i'll have this closed unless you see the problem persisting.

@dasabir
Author

dasabir commented Dec 23, 2016

@chuckcho I checked with the latest commit. The issue is gone. Thanks!

@chuckcho
Owner

@dasabir awesome! thanks for your contribution. :)

@dksakkos

dksakkos commented Apr 6, 2018

I had the same issue after using ReLU in-place with a Deconvolution layer. With Convolution layers there doesn't seem to be any problem, but using ReLU in-place with a Deconv layer definitely produces zero output. I had to remove it in order to get meaningful results.
