Error when rerunning script #12

Open
varun-nagaraja opened this issue Sep 22, 2015 · 15 comments

Comments

@varun-nagaraja

When I run script_faster_rcnn_demo for the first time after starting MATLAB, things work fine. But if I re-run the same script after the first run, I get the following error:

fast_rcnn startup done
GPU 1: free memory 11945885696
GPU 2: free memory 813449216
Use GPU 1
[libprotobuf ERROR google/protobuf/descriptor_database.cc:57] File already exists in database: caffe.proto
[libprotobuf FATAL google/protobuf/descriptor.cc:954] CHECK failed: generated_database_->Add(encoded_file_descriptor, size):
Caught "std::exception" Exception message is:
CHECK failed: generated_database_->Add(encoded_file_descriptor, size):
@varun-nagaraja varun-nagaraja changed the title Error when we re-rerunning script Error when re-rerunning script Sep 22, 2015
@KapSteR

KapSteR commented Sep 22, 2015

I get a similar error. Matlab simply shuts down when re-running the matlab demo.
Often a reboot is required to get it to run again.

@varun-nagaraja varun-nagaraja changed the title Error when re-rerunning script Error when rerunning script Sep 22, 2015
@varun-nagaraja
Author

Yup, reboot is the only way for me to get it working again.

@ShaoqingRen
Owner

@varun-nagaraja @KapSteR

I can't reproduce this bug on Windows. Ross also hasn't reported this bug on Ubuntu.

At the top of script_faster_rcnn_demo, we clear the caffe mex (mexLock() is commented out), so caffe should not throw any error on the second call.

I think we should make sure that the mex is cleared on your machine as expected.

@rbgirshick
Collaborator

I can reproduce the error on Linux. It's low priority since it just affects the demo script and not training or testing. To clarify comments in the thread: a "reboot" of the computer is not required, just a restart of MATLAB.

@KapSteR

KapSteR commented Oct 6, 2015

So... it seems to me that there is somehow a GPU memory leak. The GPU memory usage grows linearly with every iteration of the main loop, until MATLAB crashes.

Is it wrong to assume that GPU memory usage is relatively constant with each forward pass, after "warm-up"?

@kukuruza

So is it a problem that mex doesn't clean up after itself correctly after all?

  1. For me on Linux, the free GPU memory before the 2nd run (4205486080) is about 1 MB less than before the first run (4206583808). That looks like a leak indeed.
  2. I also get a protobuf issue on the second run (Linux):
fast_rcnn startup done
GPU 1: free memory 4205486080
Use GPU 1

[libprotobuf ERROR google/protobuf/descriptor_database.cc:57] File already exists in database: caffe.proto
[libprotobuf FATAL google/protobuf/descriptor.cc:954] CHECK failed: generated_database_->Add(encoded_file_descriptor, size): 

------------------------------------------------------------------------
          std::terminate() detected at Mon Oct 12 13:07:26 2015
------------------------------------------------------------------------

Configuration:
  Crash Decoding      : Disabled
  Crash Mode          : continue (default)
  Current Graphics Driver: Unknown software 
  Current Visual      : None
  Default Encoding    : UTF-8
  GNU C Library       : 2.19 stable
  Host Name           : ip-172-31-21-65
  MATLAB Architecture : glnxa64
  MATLAB Root         : /usr/local/MATLAB/R2015a
  MATLAB Version      : 8.5.0.197613 (R2015a)
  OpenGL              : software
  Operating System    : Linux 3.13.0-44-generic #73-Ubuntu SMP Tue Dec 16 00:22:43 UTC 2014 x86_64
  Processor ID        : x86 Family 6 Model 45 Stepping 7, GenuineIntel
  Virtual Machine     : Java 1.7.0_60-b19 with Oracle Corporation Java HotSpot(TM) 64-Bit Server VM mixed mode
  Window System       : No active display

Fault Count: 1

...
Stack Trace (captured):
[  0] 0x00007f53a6b6570e    /usr/local/MATLAB/R2015a/bin/glnxa64/libmwfl.so+00988942 _ZN2fl4diag5linux6x86_6412context_base12capture_dataEv+00000030
...
[ 12] 0x00007f52bd507c12 /usr/local/MATLAB/R2015a/bin/glnxa64/libprotobuf.so.8+00433170 _ZN6google8protobuf14DescriptorPool24InternalAddGeneratedFileEPKvi+00000194
[ 13] 0x00007f52bdc6c37c /home/ubuntu/src/faster_rcnn/external/caffe/matlab/+caffe/private/caffe_.mexa64+00443260
...
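
The size of the drop mentioned in item 1 can be checked directly from the two free-memory readings quoted there. A small Python sketch (the variable names are only illustrative):

```python
# Free GPU memory in bytes, as reported before each run in the comment above.
free_before_first_run = 4206583808
free_before_second_run = 4205486080

# Memory that was not returned between the two runs.
lost_bytes = free_before_first_run - free_before_second_run
lost_mib = lost_bytes / 2**20  # 1 MiB = 1048576 bytes

print(lost_bytes)  # 1097728
print(lost_mib)    # 1.046875
```

So roughly 1 MiB was not released between the two runs, which is consistent with the estimate in item 1.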

@BlueCrow1991

This bug does not just affect the demo script, but also training and testing on Ubuntu.

When I re-run 'script_faster_rcnn_VOC2007_ZF.m', it happens too.

@YingjieYin

When I run script_faster_rcnn_demo, I get these errors in caffe_log:
F1028 15:47:12.852134 2204 syncedmem.cpp:51] Check failed: error == cudaSuccess (4 vs. 0) unspecified launch failure
F1028 15:47:12.852134 2204 syncedmem.cpp:51] Check failed: error == cudaSuccess (4 vs. 0) unspecified launch failure

@fengyuxi55

I can reproduce this problem. When I re-run script_faster_rcnn_demo.m, MATLAB crashes:

[libprotobuf ERROR google/protobuf/descriptor_database.cc:57] File already exists in database: caffe.proto
[libprotobuf FATAL google/protobuf/descriptor.cc:1018] CHECK failed: generated_database_->Add(encoded_file_descriptor, size):
Caught "std::exception" Exception message is:
CHECK failed: generated_database_->Add(encoded_file_descriptor, size):

@corganhejijun

Is this the same problem as BVLC/caffe#1917?

@roytseng-tw

So how can I solve this problem?
I don't really understand. Thanks.

@gjyin

gjyin commented Jul 18, 2016

How can this problem be solved?
I hit the bug on Ubuntu 14.04:
[libprotobuf ERROR google/protobuf/descriptor_database.cc:57] File already exists in database: caffe.proto
[libprotobuf FATAL google/protobuf/descriptor.cc:954] CHECK failed: generated_database_->Add(encoded_file_descriptor, size):

@esason

esason commented Sep 1, 2016

I have solved the last issue ...
THE BUG:
On Ubuntu 14.04:
[libprotobuf ERROR google/protobuf/descriptor_database.cc:57] File already exists in database: caffe.proto
[libprotobuf FATAL google/protobuf/descriptor.cc:954] CHECK failed: generated_database_->Add(encoded_file_descriptor, size):
The first time I run the training or testing phase, everything works fine, but if MATLAB is still open and I try to run it again, the bug occurs.

SOLUTION
It seems to be related to how the mex is cleared.

  • I commented out the clear mex in the .m file
  • In the mex file, I commented out the mexLock() function.

It seems to work OK. I would like to know why clear mex is used at all.
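
A possible explanation for why clearing the mex matters, offered as an assumption rather than something confirmed in this thread: the stack trace posted earlier shows caffe_.mexa64 calling InternalAddGeneratedFile inside MATLAB's own libprotobuf.so.8. If that library stays loaded while the mex is unloaded and loaded again, caffe.proto would be registered a second time. The toy Python model below only mimics that behaviour; `DescriptorDatabase`, `SHARED_DATABASE` and `load_mex` are made-up names, not protobuf or MATLAB APIs.

```python
class DescriptorDatabase:
    """Toy stand-in for a registry of .proto files that rejects duplicates."""

    def __init__(self):
        self._files = set()

    def add(self, filename):
        # Return False instead of overwriting when the file is already known.
        if filename in self._files:
            return False
        self._files.add(filename)
        return True


# Assumption: the registry lives in a shared library that stays loaded
# for the whole session, so it survives unloading the mex.
SHARED_DATABASE = DescriptorDatabase()


def load_mex(database, proto="caffe.proto"):
    """Mimic the registration that happens each time the mex is loaded."""
    if not database.add(proto):
        raise RuntimeError(f"File already exists in database: {proto}")
    return "loaded"


print(load_mex(SHARED_DATABASE))  # first run: registration succeeds
try:
    load_mex(SHARED_DATABASE)     # mex cleared and loaded again
except RuntimeError as err:
    print(err)                    # second registration is rejected
```

Under this reading, keeping the mex loaded (no clear mex) avoids the second registration, which would match the workaround above.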

@ZiangYan

ZiangYan commented Sep 5, 2016

I have encountered the same bug, and solved it by re-compiling opencv without the dnn module. I found that caffe, protobuf, and opencv-dnn couldn't work together. It seems to be a bug in either protobuf or opencv.

There are two solutions:

  1. statically link to protobuf (i.e., link to protobuf.a, NOT protobuf.so)

OR

  2. remove opencv_contrib/modules/cnn, and re-compile opencv
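
Before choosing between the two, it may help to check whether a binary resolves protobuf dynamically at all. A rough Python sketch, assuming a Linux machine with `ldd` available; `run_ldd` and `protobuf_libs` are hypothetical helper names, and the sample text is invented for illustration (it is not output from this thread):

```python
import subprocess


def run_ldd(binary_path):
    """Return the raw text printed by `ldd` for a binary (Linux only)."""
    result = subprocess.run(
        ["ldd", binary_path], capture_output=True, text=True, check=True
    )
    return result.stdout


def protobuf_libs(ldd_output):
    """Pick out the libprotobuf entries from ldd output as (name, path) pairs."""
    found = []
    for line in ldd_output.splitlines():
        parts = line.split()
        # Dynamic dependencies are printed as: name => path (address)
        if len(parts) >= 3 and parts[1] == "=>" and parts[0].startswith("libprotobuf"):
            found.append((parts[0], parts[2]))
    return found


# Invented sample in the usual ldd layout, for illustration only.
sample = (
    "\tlibprotobuf.so.8 => /usr/lib/x86_64-linux-gnu/libprotobuf.so.8 (0x00007f0000000000)\n"
    "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0000400000)\n"
)
print(protobuf_libs(sample))
```

Calling `protobuf_libs(run_ldd(path))` on the caffe mex or on an opencv library would list any shared libprotobuf it depends on; an empty list suggests protobuf is linked statically or not used. Inside MATLAB the library actually loaded can differ from what `ldd` reports in a shell, so treat the result as a hint only.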

@hongkaiyu2012

Problem solved:
#112 (comment)
