Matlab system error when running script_faster_rcnn_VOC2007_ZF.m #129

Open
BUAAkong opened this issue Dec 7, 2016 · 33 comments

BUAAkong commented Dec 7, 2016

I tried to run script_faster_rcnn_VOC2007_ZF.m to train on my own dataset, and MATLAB crashed with the following report:
fast_rcnn startup done
GPU 1: free memory 2066997248
Use GPU 1
imdb (voc_2007_trainval): 9/20
Saving imdb to cache...done
Loading region proposals...done
Warrning: no windows proposal is loaded !
Saving roidb to cache...done
imdb (voc_2007_test): 1/10
Saving imdb to cache...done
Loading region proposals...done
Warrning: no windows proposal is loaded !
Saving roidb to cache...done
Cleared 0 solvers and 1 stand-alone nets


stage one proposal


conf:
batch_size: 256
bg_thresh_hi: 0.3000
bg_thresh_lo: 0
bg_weight: 1
drop_boxes_runoff_image: 1
feat_stride: 16
fg_fraction: 0.5000
fg_thresh: 0.7000
image_means: [224x224x3 single]
ims_per_batch: 1
max_size: 1000
rng_seed: 6
scales: 600
target_only_gt: 1
test_binary: 0
test_drop_boxes_runoff_image: 0
test_max_size: 1000
test_min_box_size: 16
test_nms: 0.3000
test_scales: 600
use_flipped: 1
use_gpu: 1
anchors: [9x4 double]
output_width_map: [901x1 containers.Map]
output_height_map: [901x1 containers.Map]

opts:
cache_name: 'faster_rcnn_VOC2007_ZF_stage1_rpn'
conf: [1x1 struct]
do_val: 1
imdb_train: {[1x1 struct]}
imdb_val: [1x1 struct]
net_file: 'D:\Faster_RCNN\faster_rcnn-master\models\pre_trained_models\ZF\ZF.caffemodel'
roidb_train: {[1x1 struct]}
roidb_val: [1x1 struct]
snapshot_interval: 10000
solver_def_file: 'D:\Faster_RCNN\faster_rcnn-master\models\rpn_prototxts\ZF\solver_60k80k.prototxt'
val_interval: 2000
val_iters: 1

Preparing training data...Starting parallel pool (parpool) using the 'local' profile ... connected to 2 workers.
Done.
Preparing validation data...Done.
Saved as D:\Faster_RCNN\faster_rcnn-master\output\rpn_cachedir\faster_rcnn_VOC2007_ZF_stage1_rpn\voc_2007_trainval\iter_2000
Saved as D:\Faster_RCNN\faster_rcnn-master\output\rpn_cachedir\faster_rcnn_VOC2007_ZF_stage1_rpn\voc_2007_trainval\final
Cleared 1 solvers and 0 stand-alone nets
opts:
cache_name: 'faster_rcnn_VOC2007_ZF_stage1_rpn'
conf: [1x1 struct]
imdb: [1x1 struct]
net_def_file: 'D:\Faster_RCNN\faster_rcnn-master\models\rpn_prototxts\ZF\test.prototxt'
net_file: 'D:\Faster_RCNN\faster_rcnn-master\output\rpn_cachedir\faster_rcnn_VOC2007_ZF_stage1_rpn\voc_20...'
suffix: ''

conf:
batch_size: 256
bg_thresh_hi: 0.3000
bg_thresh_lo: 0
bg_weight: 1
drop_boxes_runoff_image: 1
feat_stride: 16
fg_fraction: 0.5000
fg_thresh: 0.7000
image_means: [224x224x3 single]
ims_per_batch: 1
max_size: 1000
rng_seed: 6
scales: 600
target_only_gt: 1
test_binary: 0
test_drop_boxes_runoff_image: 0
test_max_size: 1000
test_min_box_size: 16
test_nms: 0.3000
test_scales: 600
use_flipped: 1
use_gpu: 1
anchors: [9x4 double]
output_width_map: [901x1 containers.Map]
output_height_map: [901x1 containers.Map]

faster_rcnn-master: test (voc_2007_trainval) 1/20 time: 1.234s
faster_rcnn-master: test (voc_2007_trainval) 2/20 time: 0.830s
faster_rcnn-master: test (voc_2007_trainval) 3/20 time: 0.672s
faster_rcnn-master: test (voc_2007_trainval) 4/20 time: 0.665s
faster_rcnn-master: test (voc_2007_trainval) 5/20 time: 0.739s
faster_rcnn-master: test (voc_2007_trainval) 6/20 time: 0.740s
faster_rcnn-master: test (voc_2007_trainval) 7/20 time: 0.666s
faster_rcnn-master: test (voc_2007_trainval) 8/20 time: 0.666s
faster_rcnn-master: test (voc_2007_trainval) 9/20 time: 0.756s
faster_rcnn-master: test (voc_2007_trainval) 10/20 time: 0.755s
faster_rcnn-master: test (voc_2007_trainval) 11/20 time: 0.727s
faster_rcnn-master: test (voc_2007_trainval) 12/20 time: 0.725s
faster_rcnn-master: test (voc_2007_trainval) 13/20 time: 0.740s
faster_rcnn-master: test (voc_2007_trainval) 14/20 time: 0.739s
faster_rcnn-master: test (voc_2007_trainval) 15/20 time: 0.724s
faster_rcnn-master: test (voc_2007_trainval) 16/20 time: 0.723s
faster_rcnn-master: test (voc_2007_trainval) 17/20 time: 0.767s
faster_rcnn-master: test (voc_2007_trainval) 18/20 time: 0.767s
faster_rcnn-master: test (voc_2007_trainval) 19/20 time: 0.669s
faster_rcnn-master: test (voc_2007_trainval) 20/20 time: 0.669s
Cleared 0 solvers and 1 stand-alone nets
aver_boxes_num = 2731, select top 2000
opts:
cache_name: 'faster_rcnn_VOC2007_ZF_stage1_rpn'
conf: [1x1 struct]
imdb: [1x1 struct]
net_def_file: 'D:\Faster_RCNN\faster_rcnn-master\models\rpn_prototxts\ZF\test.prototxt'
net_file: 'D:\Faster_RCNN\faster_rcnn-master\output\rpn_cachedir\faster_rcnn_VOC2007_ZF_stage1_rpn\voc_20...'
suffix: ''

conf:
batch_size: 256
bg_thresh_hi: 0.3000
bg_thresh_lo: 0
bg_weight: 1
drop_boxes_runoff_image: 1
feat_stride: 16
fg_fraction: 0.5000
fg_thresh: 0.7000
image_means: [224x224x3 single]
ims_per_batch: 1
max_size: 1000
rng_seed: 6
scales: 600
target_only_gt: 1
test_binary: 0
test_drop_boxes_runoff_image: 0
test_max_size: 1000
test_min_box_size: 16
test_nms: 0.3000
test_scales: 600
use_flipped: 1
use_gpu: 1
anchors: [9x4 double]
output_width_map: [901x1 containers.Map]
output_height_map: [901x1 containers.Map]

faster_rcnn-master: test (voc_2007_test) 1/10 time: 0.866s
faster_rcnn-master: test (voc_2007_test) 2/10 time: 0.961s
faster_rcnn-master: test (voc_2007_test) 3/10 time: 0.664s
faster_rcnn-master: test (voc_2007_test) 4/10 time: 0.659s
faster_rcnn-master: test (voc_2007_test) 5/10 time: 0.753s
faster_rcnn-master: test (voc_2007_test) 6/10 time: 0.899s
faster_rcnn-master: test (voc_2007_test) 7/10 time: 0.738s
faster_rcnn-master: test (voc_2007_test) 8/10 time: 0.750s
faster_rcnn-master: test (voc_2007_test) 9/10 time: 0.850s
faster_rcnn-master: test (voc_2007_test) 10/10 time: 0.821s
Cleared 0 solvers and 1 stand-alone nets
aver_boxes_num = 2695, select top 2000


stage one fast rcnn


conf:
batch_size: 128
bbox_thresh: 0.5000
bg_thresh_hi: 0.5000
bg_thresh_lo: 0.1000
fg_fraction: 0.2500
fg_thresh: 0.5000
image_means: [224x224x3 single]
ims_per_batch: 2
max_size: 1000
rng_seed: 6
scales: 600
test_binary: 0
test_max_size: 1000
test_nms: 0.3000
test_scales: 600
use_flipped: 1
use_gpu: 1

opts:
cache_name: 'faster_rcnn_VOC2007_ZF_top-1_nms0_7_top2000_stage1_fast_rcnn'
conf: [1x1 struct]
do_val: 1
imdb_train: {[1x1 struct]}
imdb_val: [1x1 struct]
net_file: 'D:\Faster_RCNN\faster_rcnn-master\models\pre_trained_models\ZF\ZF.caffemodel'
roidb_train: {[1x1 struct]}
roidb_val: [1x1 struct]
snapshot_interval: 10000
solver_def_file: 'D:\Faster_RCNN\faster_rcnn-master\models\fast_rcnn_prototxts\ZF\solver_30k40k.prototxt'
val_interval: 2000
val_iters: 1

Preparing training data...Done.
Preparing validation data...Done.
Error using caffe_
glog check error, please check log and clear mex

Error in caffe.Solver/step (line 56)
caffe_('solver_step', self.hSolver_self, iters);

Error in fast_rcnn_train>check_gpu_memory (line 216)
caffe_solver.step(1);

Error in fast_rcnn_train (line 89)
check_gpu_memory(conf, caffe_solver, num_classes, opts.do_val);

Error in Faster_RCNN_Train.do_fast_rcnn_train (line 7)
model_stage.output_model_file = fast_rcnn_train(conf, dataset.imdb_train, dataset.roidb_train, ...

Error in script_faster_rcnn_VOC2007_ZF (line 64)
model.stage1_fast_rcnn = Faster_RCNN_Train.do_fast_rcnn_train(conf_fast_rcnn, dataset, model.stage1_fast_rcnn, opts.do_val);

IdleTimeout has been reached.
Parallel pool using the 'local' profile is shutting down.

Thanks for your help!

oneQuery commented Dec 8, 2016

@BUAAkong You need to show your log, which is in /output
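
(For reference, a minimal sketch of the two things the "glog check error, please check log and clear mex" message asks for, run from the MATLAB prompt; the search pattern for the log files is an assumption, since the exact folder under output/ depends on the stage that crashed.)

clear mex                                 % unload caffe_.mexw64 so the next run reinitializes it
dir(fullfile('output', '**', '*caffe*'))  % locate the caffe log files for this run (recursive dir needs R2016b+)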

BUAAkong commented Dec 9, 2016

@assess09 I have sent an e-mail to you with an attachment.

@oneQuery

@BUAAkong I didn't receive your email. And I'm not sure I can solve your problem even if I check your log file.

@BUAAkong

@assess09 I made a mistake with the e-mail... Thanks for your attention and help!

xzabg commented Dec 27, 2016

@BUAAkong We are facing the same error as you; have you solved it? Thx!

@BUAAkong

@xzabg Maybe it's because the GPU's computing capability is too weak. Please read here:
https://github.com/ShaoqingRen/faster_rcnn#requirements-software
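
(For anyone checking this on their own machine, a quick, hedged way to print the numbers that requirements page talks about; gpuDevice is the Parallel Computing Toolbox query used here, and GPU index 1 matches the "Use GPU 1" line in the log above.)

g = gpuDevice(1);
fprintf('Compute capability: %s\n', g.ComputeCapability);
fprintf('Free GPU memory:    %.2f GB\n', g.FreeMemory / 2^30);
% The linked README suggests roughly 3 GB free for the ZF net and 8 GB for VGG-16.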

xzabg commented Dec 28, 2016

@BUAAkong So, did you switch to another GPU (or GPUs) with stronger capability? And does the code run normally then?

@BUAAkong

@xzabg No, I am only about to change it. I heard that from a friend, and he ran the code successfully after upgrading his GPU. Have you read the page I shared with you? The code may need at least 3 GB of GPU memory for the ZF net and 8 GB for the VGG-16 net.

xzabg commented Dec 28, 2016

@BUAAkong Yes, I saw it. My current configuration is a GTX 1060 with CUDA 8.0; how about you?

xzabg commented Dec 28, 2016

@BUAAkong After you upgrade your GPU, would you mind telling me the result, if that is convenient?

@BUAAkong

@xzabg OK, but it now looks like the workstation in our laboratory will not be set up for over a month, and with no GPU there is no training. Since I have never trained the net to completion, I am not sure whether the issue really comes from the GPU's weak capability or not. Furthermore, I think a GTX 1060 is capable enough to run Faster R-CNN (enough for ZF, but not for VGG).

xzabg commented Dec 29, 2016

@BUAAkong Yes, I also think a GTX 1060 is enough for training ZF, but from the information in the caffe log it seems something is wrong on the GPU side.
Part of the caffe log:
I1229 10:09:26.347323 6356 net.cpp:746] Copying source layer conv1
I1229 10:09:26.347323 6356 net.cpp:746] Copying source layer relu1
I1229 10:09:26.347323 6356 net.cpp:746] Copying source layer norm1
I1229 10:09:26.347323 6356 net.cpp:746] Copying source layer pool1
I1229 10:09:26.347323 6356 net.cpp:746] Copying source layer conv2
I1229 10:09:26.348325 6356 net.cpp:746] Copying source layer relu2
I1229 10:09:26.348325 6356 net.cpp:746] Copying source layer norm2
I1229 10:09:26.348325 6356 net.cpp:746] Copying source layer pool2
I1229 10:09:26.348325 6356 net.cpp:746] Copying source layer conv3
I1229 10:09:26.349350 6356 net.cpp:746] Copying source layer relu3
I1229 10:09:26.349350 6356 net.cpp:746] Copying source layer conv4
I1229 10:09:26.350352 6356 net.cpp:746] Copying source layer relu4
I1229 10:09:26.350352 6356 net.cpp:746] Copying source layer conv5
I1229 10:09:26.351356 6356 net.cpp:746] Copying source layer relu5
I1229 10:09:26.351356 6356 net.cpp:743] Ignoring source layer pool5_spm6
I1229 10:09:26.352356 6356 net.cpp:743] Ignoring source layer pool5_spm6_flatten
I1229 10:09:26.352356 6356 net.cpp:746] Copying source layer fc6
I1229 10:09:26.388463 6356 net.cpp:746] Copying source layer relu6
I1229 10:09:26.388463 6356 net.cpp:746] Copying source layer drop6
I1229 10:09:26.389463 6356 net.cpp:746] Copying source layer fc7
I1229 10:09:26.405477 6356 net.cpp:746] Copying source layer relu7
I1229 10:09:26.405477 6356 net.cpp:746] Copying source layer drop7
I1229 10:09:26.405477 6356 net.cpp:743] Ignoring source layer fc8
I1229 10:09:26.405477 6356 net.cpp:743] Ignoring source layer prob
F1229 10:09:59.980269 6356 syncedmem.cpp:51] Check failed: error == cudaSuccess (2 vs. 0) out of memory
F1229 10:09:59.980269 6356 syncedmem.cpp:51] Check failed: error == cudaSuccess (2 vs. 0) out of memory

@BUAAkong

@xzabg Sorry, I cannot explain it either. Something else must be wrong.

xzabg commented Dec 29, 2016

@BUAAkong That's fine.
I1229 11:17:24.730571 13420 net.cpp:743] Ignoring source layer fc8
I1229 11:17:24.730571 13420 net.cpp:743] Ignoring source layer prob
I1229 11:17:57.716583 13420 solver.cpp:214] Iteration 0, loss = 3.04357
I1229 11:17:57.716583 13420 solver.cpp:229] Train net output #0: accuarcy = 0
I1229 11:17:57.716583 13420 solver.cpp:229] Train net output #1: loss_bbox = 0 (* 1 = 0 loss)
I1229 11:17:57.716583 13420 solver.cpp:229] Train net output #2: loss_cls = 3.04357 (* 1 = 3.04357 loss)
I1229 11:17:57.716583 13420 solver.cpp:486] Iteration 0, lr = 0.001
F1229 11:17:57.719590 13420 syncedmem.cpp:51] Check failed: error == cudaSuccess (2 vs. 0) out of memory
F1229 11:17:57.719590 13420 syncedmem.cpp:51] Check failed: error == cudaSuccess (2 vs. 0) out of memory

It seems that the training code can run, but the memory is not enough and I'll try to change some parameters. Let's keep in touch and maybe we'll find something else.
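
(In case it helps anyone else, a minimal sketch of the kind of parameter change meant above, assuming the config helpers accept name-value overrides the way script_faster_rcnn_VOC2007_ZF.m already calls them; the smaller 'scales'/'max_size' values are illustrative only and trade detection accuracy for GPU memory.)

% Shrink the training/testing image resolution to reduce GPU memory use.
conf_proposal  = proposal_config('image_means', model.mean_image, ...
                                 'feat_stride', model.feat_stride, ...
                                 'scales', 400, 'max_size', 667);
conf_fast_rcnn = fast_rcnn_config('image_means', model.mean_image, ...
                                  'scales', 400, 'max_size', 667);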

@BUAAkong

@xzabg With pleasure. Email: 18811736851@163.com

@YilunYang

@xzabg @BUAAkong Did you guys solve this problem in the end? I ran into the same one. Did changing the parameters work?

@LEXUSAPI

Using a GTX 1080 I also get this error, with the following output:
Preparing training data...Done.
Preparing validation data...Done.
Error using caffe_
glog check error, please check log and clear mex

Error in caffe.Solver/step (line 56)
caffe_('solver_step', self.hSolver_self, iters);

Error in fast_rcnn_train>check_gpu_memory (line 216)
caffe_solver.step(1);

Error in fast_rcnn_train (line 89)
check_gpu_memory(conf, caffe_solver, num_classes, opts.do_val);

Error in Faster_RCNN_Train.do_fast_rcnn_train (line 7)
model_stage.output_model_file = fast_rcnn_train(conf, dataset.imdb_train, dataset.roidb_train, ...

Error in script_faster_rcnn_VOC2007_ZF (line 53)
model.stage1_fast_rcnn = Faster_RCNN_Train.do_fast_rcnn_train(conf_fast_rcnn, dataset, model.stage1_fast_rcnn, opts.do_val);

@BUAAkong

@LEXUSAPI Are you using CUDA 7.5 or CUDA 8.0?

@LEXUSAPI

@BUAAkong I have solved the problem; it all came down to the Caffe version!

ggghh commented Jul 2, 2018

@LEXUSAPI How did you solve your problem? I have the same one. Can you explain how to change the Caffe version?

@qwertyDvo

Did you solve this problem? I have the same problem; please help.

@BUAAkong

@qwertyDvo What are your GPU and CUDA versions?

@BUAAkong

@qwertyDvo My email: wenshangkf@163.com

@qwertyDvo

My GPU is a GTX 1070 8GB and I use CUDA 6.5 for faster rcnn.

@qwertyDvo

What shall I send you?

@BUAAkong

@qwertyDvo Maybe you can update the CUDA version to 8.0 and try again. The email is because I cannot always see your replies here without delay.
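
(A hedged way to confirm from MATLAB which CUDA toolkit and driver are actually being picked up before and after such an upgrade; the property names below are from the Parallel Computing Toolbox gpuDevice object.)

!nvcc --version                     % CUDA toolkit on the PATH that the mex files are built against
g = gpuDevice;
fprintf('Driver CUDA %g, Toolkit CUDA %g\n', g.DriverVersion, g.ToolkitVersion);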

@qwertyDvo

Ok, thank you. Did you solve this problem by using CUDA 8.0?

@BUAAkong

@qwertyDvo Actually I cannot be sure that it was the fix, but since I moved to the combination of a GTX 1080 GPU and CUDA 8.0, this issue has never appeared again.

@qwertyDvo

Ok, thank you, I will try.

qwertyDvo commented Oct 13, 2018

@BUAAkong When I tried to use CUDA 9.1 I got this error:
Missing dependent shared libraries: 'cudart64_91.dll' required by nms_gpu_mex.mexw64.
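
(That message usually means the nms_gpu_mex.mexw64 on disk was built against a CUDA runtime whose DLL MATLAB cannot find. A hedged sketch of the two usual fixes; the install path below is only the default CUDA 9.1 location on Windows, and faster_rcnn_build.m is the build script shipped with the repo, whose nvcc settings should be checked against your card first.)

% Option 1: make the folder containing cudart64_91.dll visible to MATLAB.
setenv('PATH', ['C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\bin;', getenv('PATH')]);
% Option 2: rebuild the mex files against the toolkit that is actually installed.
faster_rcnn_build
startup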

@BUAAkong

@qwertyDvo How about CUDA 8?

qwertyDvo commented Oct 13, 2018

@BUAAkong I failed to install it.

@BUAAkong

@qwertyDvo I cannot solve the 'cudart64_91.dll' issue; maybe you can google it.
