Task failed with error code -11 #1878

Open
MonkeyWithAComputer opened this issue Nov 9, 2017 · 11 comments

Comments

@MonkeyWithAComputer

Using the latest versions (as of 11-9-17) of DIGITS, NVCaffe, and OpenCV 3.2.1, I am following the object detection guide here: https://github.com/NVIDIA/DIGITS/tree/master/examples/object-detection
When I create the model, I get error code -11 after 2 seconds of running. I've tried reinstalling different versions of OpenCV, NVCaffe, and DIGITS.

This is the DIGITS GUI output:
Layer's types are Ftype:FLOAT Btype:FLOAT Fmath:FLOAT Bmath:FLOAT
Created Layer bbox_loss (172)
bbox_loss <- bboxes-obj-masked-norm
bbox_loss <- bbox-obj-label-norm
bbox_loss -> loss_bbox
Setting up bbox_loss
TEST Top shape for layer 172 'bbox_loss' (1)
with loss weight 2
Creating layer 'coverage_loss' of type 'EuclideanLoss'
Layer's types are Ftype:FLOAT Btype:FLOAT Fmath:FLOAT Bmath:FLOAT
Created Layer coverage_loss (173)
coverage_loss <- coverage_coverage/sig_0_split_0
coverage_loss <- coverage-label_slice-label_4_split_0
coverage_loss -> loss_coverage
Setting up coverage_loss
TEST Top shape for layer 173 'coverage_loss' (1)
with loss weight 1
Creating layer 'cluster' of type 'Python'
Layer's types are Ftype:FLOAT Btype:FLOAT Fmath:FLOAT Bmath:FLOAT
Importing Python module 'caffe.layers.detectnet.clustering'

The following error is printed in the DIGITS console:
2017-11-09 11:58:06 [20171109-115804-1668] [INFO ] Task subprocess args: "/home/dev/caffe/build/tools/caffe train --solver=/home/dev/DIGITS/digits/jobs/20171109-115804-1668/solver.prototxt --gpu=0,1 --weights=/home/dev/bvlc_googlenet.caffemodel"
2017-11-09 11:58:08 [20171109-115804-1668] [ERROR] Train Caffe Model task failed with error code -11
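A negative code like this usually means the Caffe child process was killed by a signal rather than exiting on its own. Assuming DIGITS reports the child's return code the way Python's `subprocess` module does (on POSIX, `Popen.returncode` is `-N` when the child died from signal `N`), -11 is signal 11, SIGSEGV, which matches the segfault at the end of the Caffe log. A minimal sketch of that translation; the helper name `describe_returncode` is hypothetical:

```python
import signal


def describe_returncode(rc):
    """Turn a POSIX subprocess return code into a readable message.

    Python's subprocess module reports -N when the child was
    terminated by signal N, and the plain exit status otherwise.
    """
    if rc < 0:
        sig = signal.Signals(-rc)  # e.g. signal.Signals(11) is SIGSEGV
        return "killed by signal %d (%s)" % (-rc, sig.name)
    return "exited with status %d" % rc


print(describe_returncode(-11))  # killed by signal 11 (SIGSEGV)
print(describe_returncode(1))    # exited with status 1
```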

The following are the last few lines of the Caffe log:
I1109 11:36:37.703176 30569 net.cpp:182] Created Layer bbox-obj-norm (171)
I1109 11:36:37.703177 30569 net.cpp:559] bbox-obj-norm <- bboxes-masked-norm
I1109 11:36:37.703179 30569 net.cpp:559] bbox-obj-norm <- obj-block_obj-block_0_split_1
I1109 11:36:37.703182 30569 net.cpp:528] bbox-obj-norm -> bboxes-obj-masked-norm
I1109 11:36:37.703200 30569 net.cpp:243] Setting up bbox-obj-norm
I1109 11:36:37.703203 30569 net.cpp:250] TEST Top shape for layer 171 'bbox-obj-norm' 2 4 24 78 (14976)
I1109 11:36:37.703204 30569 layer_factory.hpp:136] Creating layer 'bbox_loss' of type 'L1Loss'
I1109 11:36:37.703213 30569 layer_factory.hpp:148] Layer's types are Ftype:FLOAT Btype:FLOAT Fmath:FLOAT Bmath:FLOAT
I1109 11:36:37.703222 30569 net.cpp:182] Created Layer bbox_loss (172)
I1109 11:36:37.703223 30569 net.cpp:559] bbox_loss <- bboxes-obj-masked-norm
I1109 11:36:37.703225 30569 net.cpp:559] bbox_loss <- bbox-obj-label-norm
I1109 11:36:37.703228 30569 net.cpp:528] bbox_loss -> loss_bbox
I1109 11:36:37.705127 30569 net.cpp:243] Setting up bbox_loss
I1109 11:36:37.705132 30569 net.cpp:250] TEST Top shape for layer 172 'bbox_loss' (1)
I1109 11:36:37.705133 30569 net.cpp:254] with loss weight 2
I1109 11:36:37.705142 30569 layer_factory.hpp:136] Creating layer 'coverage_loss' of type 'EuclideanLoss'
I1109 11:36:37.705145 30569 layer_factory.hpp:148] Layer's types are Ftype:FLOAT Btype:FLOAT Fmath:FLOAT Bmath:FLOAT
I1109 11:36:37.705149 30569 net.cpp:182] Created Layer coverage_loss (173)
I1109 11:36:37.705152 30569 net.cpp:559] coverage_loss <- coverage_coverage/sig_0_split_0
I1109 11:36:37.705154 30569 net.cpp:559] coverage_loss <- coverage-label_slice-label_4_split_0
I1109 11:36:37.705157 30569 net.cpp:528] coverage_loss -> loss_coverage
I1109 11:36:37.707231 30569 net.cpp:243] Setting up coverage_loss
I1109 11:36:37.707234 30569 net.cpp:250] TEST Top shape for layer 173 'coverage_loss' (1)
I1109 11:36:37.707237 30569 net.cpp:254] with loss weight 1
I1109 11:36:37.707240 30569 layer_factory.hpp:136] Creating layer 'cluster' of type 'Python'
I1109 11:36:37.707242 30569 layer_factory.hpp:148] Layer's types are Ftype:FLOAT Btype:FLOAT Fmath:FLOAT Bmath:FLOAT
I1109 11:36:37.707250 30569 layer_factory.cpp:325] Importing Python module 'caffe.layers.detectnet.clustering'
*** Aborted at 1510256198 (unix time) try "date -d @1510256198" if you are using GNU date ***
PC: @ 0x0 (unknown)
*** SIGSEGV (@0x0) received by PID 30569 (TID 0x7fa6a66ce8c0) from PID 0; stack trace: ***
@ 0x7fa6a38254b0 (unknown)
@ 0x0 (unknown)
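The crash comes right after Caffe logs `Importing Python module 'caffe.layers.detectnet.clustering'`, with the program counter at 0x0, so one way to narrow it down is to import that module on its own, outside the Caffe binary. The sketch below runs the import in a separate interpreter so that a segfault only kills the child. It assumes Caffe's `python` directory is on `PYTHONPATH`; the helper `try_import` is hypothetical. If the plain import succeeds but Caffe still crashes, the problem is more likely in how the Caffe binary and the Python packages it loads fit together, which is where later comments in this thread point (the protobuf Python bindings).

```python
import subprocess
import sys


def try_import(module, python=sys.executable):
    """Import `module` in a separate interpreter and return its exit code.

    0 means the import worked, a positive code means it raised an
    exception, and a negative code -N means the child died from signal N.
    """
    proc = subprocess.run(
        [python, "-c", "import %s" % module],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    if proc.returncode != 0:
        # Show the child's traceback (if any) to help diagnose the failure.
        sys.stderr.write(proc.stderr.decode(errors="replace"))
    return proc.returncode


if __name__ == "__main__":
    rc = try_import("caffe.layers.detectnet.clustering")
    print("import exit code:", rc)  # -11 here would reproduce the segfault
```

Passing a different `python` path lets you test the exact interpreter DIGITS and Caffe use.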

[Screenshot from 2017-11-09 12-11-28]

@Extrunder

Extrunder commented Dec 30, 2017

Did you find a solution? I have the same issue.

Ubuntu: 16.04
OpenCV: 2.xxx
protobuf: 3.5.1
DIGITS: 6.1.0
NVCaffe: 0.16.4

The error occurs while I am following the tutorial and training the DetectNet model on the KITTI dataset.

@Extrunder

Fixed by reinstalling the Python bindings for protobuf.

@JadBatmobile

You mean go to /protobuf/python and run "python setup.py install --cpp_implementation", correct?
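The `--cpp_implementation` flag builds the C++-backed version of the protobuf Python package. To confirm which version and backend the interpreter actually loads after reinstalling, the package exposes `google.protobuf.internal.api_implementation.Type()`. It lives in an internal module, so treat it as a diagnostic rather than a stable API. A minimal sketch; the helper name `protobuf_backend` is hypothetical:

```python
def protobuf_backend():
    """Return (version, backend, path) for the protobuf package Python imports.

    backend is 'cpp' for the C++ extension or 'python' for the pure-Python
    implementation (newer releases may also report 'upb').
    Returns None if google.protobuf cannot be imported at all.
    """
    try:
        import google.protobuf as pb
        from google.protobuf.internal import api_implementation
    except ImportError:
        return None
    return pb.__version__, api_implementation.Type(), pb.__file__


info = protobuf_backend()
if info is None:
    print("google.protobuf is not importable")
else:
    print("protobuf %s, backend=%s, loaded from %s" % info)
```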

@mantianwuming

@Extrunder Sorry to trouble you, but can you give some details on how to reinstall the Python bindings for protobuf? I have the same problem and can't solve it. Thank you very much.

@ethantang95
Contributor

pip install protobuf3 or something like that... though I am not sure about the stability of using protobuf 3.5 with DIGITS, as I built it with protobuf 3.2.
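Whether the protobuf Python package matches the protobuf release the C++ side was built with can be checked by comparing the Python package's version with the output of `protoc --version` (which prints something like `libprotoc 3.2.0`). This sketch assumes the `protoc` on `PATH` is the one used to build Caffe. The helper names are hypothetical, and treating a matching major.minor as "compatible" is a simplifying assumption for the 3.x-era releases discussed here, not a rule from the protobuf project.

```python
import re
import shutil
import subprocess


def parse_version(text):
    """Extract (major, minor, patch) from strings like 'libprotoc 3.2.0' or '3.5.1'."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("no version found in %r" % text)
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def same_minor(a, b):
    """True when two version tuples agree on major and minor."""
    return a[:2] == b[:2]


if __name__ == "__main__":
    protoc = shutil.which("protoc")
    try:
        import google.protobuf as pb
    except ImportError:
        pb = None
    if protoc and pb:
        out = subprocess.run([protoc, "--version"], stdout=subprocess.PIPE).stdout.decode()
        c_ver, py_ver = parse_version(out), parse_version(pb.__version__)
        print("protoc", c_ver, "python", py_ver, "match:", same_minor(c_ver, py_ver))
    else:
        print("protoc or google.protobuf not found")
```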

@mantianwuming

@ethantang95 Thank you! I tried as you said, and ... it seems to have some problems, like this:
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-w7QVgY/protobuf3/setup.py", line 22
print(version, file=open('.version', mode='w'))

@galdalali

@mantianwuming, did you find a solution? I have the same issue.

@j0ebl4ck

Can anyone please EXPLAIN the proper solution?

@MonkeyWithAComputer
Author

You can try checking that none of your images have more than about 600 labeled objects (it might be 1000; you'd have to look at the code).
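That check can be scripted. KITTI-format label files hold one object per line, so an image's object count is the number of non-empty lines in its label file. The sketch below flags label files above a threshold. The default of 600 is only the rough figure from this comment (the real limit should be taken from the DetectNet/DIGITS code), and `count_objects` and `find_crowded_labels` are hypothetical names.

```python
import os


def count_objects(label_path):
    """Count labeled objects in one KITTI-format label file (one object per line)."""
    with open(label_path) as f:
        return sum(1 for line in f if line.strip())


def find_crowded_labels(label_dir, limit=600):
    """Return (filename, count) pairs for label files with more than `limit` objects.

    limit=600 is an assumption taken from the rough figure in this thread.
    """
    crowded = []
    for name in sorted(os.listdir(label_dir)):
        if name.endswith(".txt"):
            n = count_objects(os.path.join(label_dir, name))
            if n > limit:
                crowded.append((name, n))
    return crowded
```

Usage would be something like `print(find_crowded_labels("/path/to/train/label_2"))`, where the path is a placeholder for your own label directory.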

@mantianwuming

Actually, I solved it by just using NVCaffe 0.15; NVCaffe 0.17 has some problems. I also tried protobuf3, but although it reported success, the protobuf version was actually still 2.6.
So I think you can try 0.15 first.
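One likely explanation for "pip reported success, but Python still imports protobuf 2.6" is that another copy of the `google.protobuf` package sits earlier on `sys.path` and shadows the newly installed one. A minimal sketch that lists every copy visible to the current interpreter, in search-path order; the helper name `find_protobuf_copies` is hypothetical, and zipped eggs are not inspected.

```python
import os
import sys


def find_protobuf_copies(paths=None):
    """List directories on the search path that contain a google/protobuf package.

    Python searches these entries in order, so an earlier copy shadows later ones.
    Only plain directories are checked; zipped .egg files are skipped.
    """
    copies = []
    for entry in (sys.path if paths is None else paths):
        base = entry or os.getcwd()  # '' in sys.path means the current directory
        candidate = os.path.join(base, "google", "protobuf", "__init__.py")
        if os.path.isfile(candidate):
            copies.append(os.path.dirname(candidate))
    return copies


for i, path in enumerate(find_protobuf_copies()):
    print("[%d] %s" % (i, path))  # [0] is the copy that should win the import
```

If more than one copy shows up, removing or reordering the stale one is usually simpler than reinstalling again.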

@galdalali

Hi @mantianwuming, please see my discussion here.
