Matlab system error when running script_faster_rcnn_VOC2007_ZF.m #129

Open
BUAAkong opened this issue Dec 7, 2016 · 33 comments

BUAAkong commented Dec 7, 2016

I tried to run script_faster_rcnn_VOC2007_ZF.m to train on my own dataset, and MATLAB crashed with the following report:
fast_rcnn startup done
GPU 1: free memory 2066997248
Use GPU 1
imdb (voc_2007_trainval): 9/20
Saving imdb to cache...done
Loading region proposals...done
Warrning: no windows proposal is loaded !
Saving roidb to cache...done
imdb (voc_2007_test): 1/10
Saving imdb to cache...done
Loading region proposals...done
Warrning: no windows proposal is loaded !
Saving roidb to cache...done
Cleared 0 solvers and 1 stand-alone nets


stage one proposal


conf:
batch_size: 256
bg_thresh_hi: 0.3000
bg_thresh_lo: 0
bg_weight: 1
drop_boxes_runoff_image: 1
feat_stride: 16
fg_fraction: 0.5000
fg_thresh: 0.7000
image_means: [224x224x3 single]
ims_per_batch: 1
max_size: 1000
rng_seed: 6
scales: 600
target_only_gt: 1
test_binary: 0
test_drop_boxes_runoff_image: 0
test_max_size: 1000
test_min_box_size: 16
test_nms: 0.3000
test_scales: 600
use_flipped: 1
use_gpu: 1
anchors: [9x4 double]
output_width_map: [901x1 containers.Map]
output_height_map: [901x1 containers.Map]

opts:
cache_name: 'faster_rcnn_VOC2007_ZF_stage1_rpn'
conf: [1x1 struct]
do_val: 1
imdb_train: {[1x1 struct]}
imdb_val: [1x1 struct]
net_file: 'D:\Faster_RCNN\faster_rcnn-master\models\pre_trained_models\ZF\ZF.caffemodel'
roidb_train: {[1x1 struct]}
roidb_val: [1x1 struct]
snapshot_interval: 10000
solver_def_file: 'D:\Faster_RCNN\faster_rcnn-master\models\rpn_prototxts\ZF\solver_60k80k.prototxt'
val_interval: 2000
val_iters: 1

Preparing training data...Starting parallel pool (parpool) using the 'local' profile ... connected to 2 workers.
Done.
Preparing validation data...Done.
Saved as D:\Faster_RCNN\faster_rcnn-master\output\rpn_cachedir\faster_rcnn_VOC2007_ZF_stage1_rpn\voc_2007_trainval\iter_2000
Saved as D:\Faster_RCNN\faster_rcnn-master\output\rpn_cachedir\faster_rcnn_VOC2007_ZF_stage1_rpn\voc_2007_trainval\final
Cleared 1 solvers and 0 stand-alone nets
opts:
cache_name: 'faster_rcnn_VOC2007_ZF_stage1_rpn'
conf: [1x1 struct]
imdb: [1x1 struct]
net_def_file: 'D:\Faster_RCNN\faster_rcnn-master\models\rpn_prototxts\ZF\test.prototxt'
net_file: 'D:\Faster_RCNN\faster_rcnn-master\output\rpn_cachedir\faster_rcnn_VOC2007_ZF_stage1_rpn\voc_20...'
suffix: ''

conf:
batch_size: 256
bg_thresh_hi: 0.3000
bg_thresh_lo: 0
bg_weight: 1
drop_boxes_runoff_image: 1
feat_stride: 16
fg_fraction: 0.5000
fg_thresh: 0.7000
image_means: [224x224x3 single]
ims_per_batch: 1
max_size: 1000
rng_seed: 6
scales: 600
target_only_gt: 1
test_binary: 0
test_drop_boxes_runoff_image: 0
test_max_size: 1000
test_min_box_size: 16
test_nms: 0.3000
test_scales: 600
use_flipped: 1
use_gpu: 1
anchors: [9x4 double]
output_width_map: [901x1 containers.Map]
output_height_map: [901x1 containers.Map]

faster_rcnn-master: test (voc_2007_trainval) 1/20 time: 1.234s
faster_rcnn-master: test (voc_2007_trainval) 2/20 time: 0.830s
faster_rcnn-master: test (voc_2007_trainval) 3/20 time: 0.672s
faster_rcnn-master: test (voc_2007_trainval) 4/20 time: 0.665s
faster_rcnn-master: test (voc_2007_trainval) 5/20 time: 0.739s
faster_rcnn-master: test (voc_2007_trainval) 6/20 time: 0.740s
faster_rcnn-master: test (voc_2007_trainval) 7/20 time: 0.666s
faster_rcnn-master: test (voc_2007_trainval) 8/20 time: 0.666s
faster_rcnn-master: test (voc_2007_trainval) 9/20 time: 0.756s
faster_rcnn-master: test (voc_2007_trainval) 10/20 time: 0.755s
faster_rcnn-master: test (voc_2007_trainval) 11/20 time: 0.727s
faster_rcnn-master: test (voc_2007_trainval) 12/20 time: 0.725s
faster_rcnn-master: test (voc_2007_trainval) 13/20 time: 0.740s
faster_rcnn-master: test (voc_2007_trainval) 14/20 time: 0.739s
faster_rcnn-master: test (voc_2007_trainval) 15/20 time: 0.724s
faster_rcnn-master: test (voc_2007_trainval) 16/20 time: 0.723s
faster_rcnn-master: test (voc_2007_trainval) 17/20 time: 0.767s
faster_rcnn-master: test (voc_2007_trainval) 18/20 time: 0.767s
faster_rcnn-master: test (voc_2007_trainval) 19/20 time: 0.669s
faster_rcnn-master: test (voc_2007_trainval) 20/20 time: 0.669s
Cleared 0 solvers and 1 stand-alone nets
aver_boxes_num = 2731, select top 2000
opts:
cache_name: 'faster_rcnn_VOC2007_ZF_stage1_rpn'
conf: [1x1 struct]
imdb: [1x1 struct]
net_def_file: 'D:\Faster_RCNN\faster_rcnn-master\models\rpn_prototxts\ZF\test.prototxt'
net_file: 'D:\Faster_RCNN\faster_rcnn-master\output\rpn_cachedir\faster_rcnn_VOC2007_ZF_stage1_rpn\voc_20...'
suffix: ''

conf:
batch_size: 256
bg_thresh_hi: 0.3000
bg_thresh_lo: 0
bg_weight: 1
drop_boxes_runoff_image: 1
feat_stride: 16
fg_fraction: 0.5000
fg_thresh: 0.7000
image_means: [224x224x3 single]
ims_per_batch: 1
max_size: 1000
rng_seed: 6
scales: 600
target_only_gt: 1
test_binary: 0
test_drop_boxes_runoff_image: 0
test_max_size: 1000
test_min_box_size: 16
test_nms: 0.3000
test_scales: 600
use_flipped: 1
use_gpu: 1
anchors: [9x4 double]
output_width_map: [901x1 containers.Map]
output_height_map: [901x1 containers.Map]

faster_rcnn-master: test (voc_2007_test) 1/10 time: 0.866s
faster_rcnn-master: test (voc_2007_test) 2/10 time: 0.961s
faster_rcnn-master: test (voc_2007_test) 3/10 time: 0.664s
faster_rcnn-master: test (voc_2007_test) 4/10 time: 0.659s
faster_rcnn-master: test (voc_2007_test) 5/10 time: 0.753s
faster_rcnn-master: test (voc_2007_test) 6/10 time: 0.899s
faster_rcnn-master: test (voc_2007_test) 7/10 time: 0.738s
faster_rcnn-master: test (voc_2007_test) 8/10 time: 0.750s
faster_rcnn-master: test (voc_2007_test) 9/10 time: 0.850s
faster_rcnn-master: test (voc_2007_test) 10/10 time: 0.821s
Cleared 0 solvers and 1 stand-alone nets
aver_boxes_num = 2695, select top 2000


stage one fast rcnn


conf:
batch_size: 128
bbox_thresh: 0.5000
bg_thresh_hi: 0.5000
bg_thresh_lo: 0.1000
fg_fraction: 0.2500
fg_thresh: 0.5000
image_means: [224x224x3 single]
ims_per_batch: 2
max_size: 1000
rng_seed: 6
scales: 600
test_binary: 0
test_max_size: 1000
test_nms: 0.3000
test_scales: 600
use_flipped: 1
use_gpu: 1

opts:
cache_name: 'faster_rcnn_VOC2007_ZF_top-1_nms0_7_top2000_stage1_fast_rcnn'
conf: [1x1 struct]
do_val: 1
imdb_train: {[1x1 struct]}
imdb_val: [1x1 struct]
net_file: 'D:\Faster_RCNN\faster_rcnn-master\models\pre_trained_models\ZF\ZF.caffemodel'
roidb_train: {[1x1 struct]}
roidb_val: [1x1 struct]
snapshot_interval: 10000
solver_def_file: 'D:\Faster_RCNN\faster_rcnn-master\models\fast_rcnn_prototxts\ZF\solver_30k40k.prototxt'
val_interval: 2000
val_iters: 1

Preparing training data...Done.
Preparing validation data...Done.
Error using caffe_
glog check error, please check log and clear mex

Error in caffe.Solver/step (line 56)
caffe_('solver_step', self.hSolver_self, iters);

Error in fast_rcnn_train>check_gpu_memory (line 216)
caffe_solver.step(1);

Error in fast_rcnn_train (line 89)
check_gpu_memory(conf, caffe_solver, num_classes, opts.do_val);

Error in Faster_RCNN_Train.do_fast_rcnn_train (line 7)
model_stage.output_model_file = fast_rcnn_train(conf, dataset.imdb_train, dataset.roidb_train, ...

Error in script_faster_rcnn_VOC2007_ZF (line 64)
model.stage1_fast_rcnn = Faster_RCNN_Train.do_fast_rcnn_train(conf_fast_rcnn, dataset, model.stage1_fast_rcnn, opts.do_val);

IdleTimeout has been reached.
Parallel pool using the 'local' profile is shutting down.

Thanks for your help!

oneQuery commented Dec 8, 2016

@BUAAkong You need to show your log, which is in /output
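
(For reference, a minimal sketch of the two things the "glog check error, please check log and clear mex" message asks for, run from the MATLAB prompt; the search pattern for the log files is an assumption, since the exact folder under output/ depends on the stage that crashed.)

clear mex                                 % unload caffe_.mexw64 so the next run reinitializes it
dir(fullfile('output', '**', '*caffe*'))  % locate the caffe log files for this run (recursive dir needs R2016b+)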

BUAAkong commented Dec 9, 2016

@assess09 I have sent an e-mail to you with an attachment.

@oneQuery

@BUAAkong I didn't receive your email. And I'm not sure I can solve your problem even if I check your log file.

@BUAAkong

@assess09 I made a mistake with the e-mail... Thanks for your attention and help!

xzabg commented Dec 27, 2016

@BUAAkong We are facing the same error as you; have you solved it? Thx!

@BUAAkong

@xzabg Maybe it's because the GPU's computing capability is too weak. Please read here:
https://github.com/ShaoqingRen/faster_rcnn#requirements-software
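
(For anyone checking this on their own machine, a quick, hedged way to print the numbers that requirements page talks about; gpuDevice is the Parallel Computing Toolbox query used here, and GPU index 1 matches the "Use GPU 1" line in the log above.)

g = gpuDevice(1);
fprintf('Compute capability: %s\n', g.ComputeCapability);
fprintf('Free GPU memory:    %.2f GB\n', g.FreeMemory / 2^30);
% The linked README suggests roughly 3 GB free for the ZF net and 8 GB for VGG-16.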

xzabg commented Dec 28, 2016

@BUAAkong So, did you switch to another GPU (or GPUs) with stronger capability? And does the code run normally then?

@BUAAkong

@xzabg No, I am only about to change it. I heard that from a friend, and he ran the code successfully after upgrading his GPU. Have you read the page I shared with you? The code may need at least 3 GB of GPU memory for the ZF net and 8 GB for the VGG-16 net.

xzabg commented Dec 28, 2016

@BUAAkong Yes, I saw it. My current configuration is a GTX 1060 with CUDA 8.0; how about you?

xzabg commented Dec 28, 2016

@BUAAkong After you upgrade your GPU, would you mind telling me the result, if that is convenient?

@BUAAkong

@xzabg OK, but it now looks like the workstation in our laboratory will not be set up for over a month, and with no GPU there is no training. Since I have never trained the net to completion, I am not sure whether the issue really comes from the GPU's weak capability or not. Furthermore, I think a GTX 1060 is capable enough to run Faster R-CNN (enough for ZF, but not for VGG).

xzabg commented Dec 29, 2016

@BUAAkong Yes, I also think a GTX 1060 is enough for training ZF, but from the information in the caffe log it seems something is wrong on the GPU side.
Part of the caffe log:
I1229 10:09:26.347323 6356 net.cpp:746] Copying source layer conv1
I1229 10:09:26.347323 6356 net.cpp:746] Copying source layer relu1
I1229 10:09:26.347323 6356 net.cpp:746] Copying source layer norm1
I1229 10:09:26.347323 6356 net.cpp:746] Copying source layer pool1
I1229 10:09:26.347323 6356 net.cpp:746] Copying source layer conv2
I1229 10:09:26.348325 6356 net.cpp:746] Copying source layer relu2
I1229 10:09:26.348325 6356 net.cpp:746] Copying source layer norm2
I1229 10:09:26.348325 6356 net.cpp:746] Copying source layer pool2
I1229 10:09:26.348325 6356 net.cpp:746] Copying source layer conv3
I1229 10:09:26.349350 6356 net.cpp:746] Copying source layer relu3
I1229 10:09:26.349350 6356 net.cpp:746] Copying source layer conv4
I1229 10:09:26.350352 6356 net.cpp:746] Copying source layer relu4
I1229 10:09:26.350352 6356 net.cpp:746] Copying source layer conv5
I1229 10:09:26.351356 6356 net.cpp:746] Copying source layer relu5
I1229 10:09:26.351356 6356 net.cpp:743] Ignoring source layer pool5_spm6
I1229 10:09:26.352356 6356 net.cpp:743] Ignoring source layer pool5_spm6_flatten
I1229 10:09:26.352356 6356 net.cpp:746] Copying source layer fc6
I1229 10:09:26.388463 6356 net.cpp:746] Copying source layer relu6
I1229 10:09:26.388463 6356 net.cpp:746] Copying source layer drop6
I1229 10:09:26.389463 6356 net.cpp:746] Copying source layer fc7
I1229 10:09:26.405477 6356 net.cpp:746] Copying source layer relu7
I1229 10:09:26.405477 6356 net.cpp:746] Copying source layer drop7
I1229 10:09:26.405477 6356 net.cpp:743] Ignoring source layer fc8
I1229 10:09:26.405477 6356 net.cpp:743] Ignoring source layer prob
F1229 10:09:59.980269 6356 syncedmem.cpp:51] Check failed: error == cudaSuccess (2 vs. 0) out of memory
F1229 10:09:59.980269 6356 syncedmem.cpp:51] Check failed: error == cudaSuccess (2 vs. 0) out of memory

@BUAAkong

@xzabg Sorry, I cannot explain it either. Something else must be wrong.

xzabg commented Dec 29, 2016

@BUAAkong That's fine.
I1229 11:17:24.730571 13420 net.cpp:743] Ignoring source layer fc8
I1229 11:17:24.730571 13420 net.cpp:743] Ignoring source layer prob
I1229 11:17:57.716583 13420 solver.cpp:214] Iteration 0, loss = 3.04357
I1229 11:17:57.716583 13420 solver.cpp:229] Train net output #0: accuarcy = 0
I1229 11:17:57.716583 13420 solver.cpp:229] Train net output #1: loss_bbox = 0 (* 1 = 0 loss)
I1229 11:17:57.716583 13420 solver.cpp:229] Train net output #2: loss_cls = 3.04357 (* 1 = 3.04357 loss)
I1229 11:17:57.716583 13420 solver.cpp:486] Iteration 0, lr = 0.001
F1229 11:17:57.719590 13420 syncedmem.cpp:51] Check failed: error == cudaSuccess (2 vs. 0) out of memory
F1229 11:17:57.719590 13420 syncedmem.cpp:51] Check failed: error == cudaSuccess (2 vs. 0) out of memory

It seems that the training code can run, but the memory is not enough and I'll try to change some parameters. Let's keep in touch and maybe we'll find something else.
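
(In case it helps anyone else, a minimal sketch of the kind of parameter change meant above, assuming the config helpers accept name-value overrides the way script_faster_rcnn_VOC2007_ZF.m already calls them; the smaller 'scales'/'max_size' values are illustrative only and trade detection accuracy for GPU memory.)

% Shrink the training/testing image resolution to reduce GPU memory use.
conf_proposal  = proposal_config('image_means', model.mean_image, ...
                                 'feat_stride', model.feat_stride, ...
                                 'scales', 400, 'max_size', 667);
conf_fast_rcnn = fast_rcnn_config('image_means', model.mean_image, ...
                                  'scales', 400, 'max_size', 667);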

@BUAAkong

@xzabg With pleasure. Email: 18811736851@163.com

@YilunYang

@xzabg @BUAAkong Did you guys solve this problem in the end? I ran into the same one. Did changing the parameters work?

@LEXUSAPI

Using a GTX 1080 I also get this error, with the following output:
Preparing training data...Done.
Preparing validation data...Done.
Error using caffe_
glog check error, please check log and clear mex

Error in caffe.Solver/step (line 56)
caffe_('solver_step', self.hSolver_self, iters);

Error in fast_rcnn_train>check_gpu_memory (line 216)
caffe_solver.step(1);

Error in fast_rcnn_train (line 89)
check_gpu_memory(conf, caffe_solver, num_classes, opts.do_val);

Error in Faster_RCNN_Train.do_fast_rcnn_train (line 7)
model_stage.output_model_file = fast_rcnn_train(conf, dataset.imdb_train, dataset.roidb_train, ...

Error in script_faster_rcnn_VOC2007_ZF (line 53)
model.stage1_fast_rcnn = Faster_RCNN_Train.do_fast_rcnn_train(conf_fast_rcnn, dataset, model.stage1_fast_rcnn, opts.do_val);

@BUAAkong

@LEXUSAPI Are you using CUDA 7.5 or CUDA 8.0?

@LEXUSAPI

@BUAAkong I have solved the problem; it all came down to the Caffe version!

ggghh commented Jul 2, 2018

@LEXUSAPI How did you solve your problem? I have the same one. Can you explain how to change the Caffe version?

@qwertyDvo

Did you solve this problem? I have the same problem; please help.

@BUAAkong

@qwertyDvo What are your GPU and CUDA versions?

@BUAAkong

@qwertyDvo My email: wenshangkf@163.com

@qwertyDvo

My GPU is a GTX 1070 8GB and I use CUDA 6.5 for faster rcnn.

@qwertyDvo

What shall I send you?

@BUAAkong

@qwertyDvo Maybe you can update the CUDA version to 8.0 and try again. The email is because I cannot always see your replies here without delay.
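
(A hedged way to confirm from MATLAB which CUDA toolkit and driver are actually being picked up before and after such an upgrade; the property names below are from the Parallel Computing Toolbox gpuDevice object.)

!nvcc --version                     % CUDA toolkit on the PATH that the mex files are built against
g = gpuDevice;
fprintf('Driver CUDA %g, Toolkit CUDA %g\n', g.DriverVersion, g.ToolkitVersion);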

@qwertyDvo

Ok, thank you. Did you solve this problem by using CUDA 8.0?

@BUAAkong

@qwertyDvo Actually I cannot be sure that it was the fix, but since I moved to the combination of a GTX 1080 GPU and CUDA 8.0, this issue has never appeared again.

@qwertyDvo

Ok, thank you, I will try.

qwertyDvo commented Oct 13, 2018

@BUAAkong When I tried to use CUDA 9.1 I got this error:
Missing dependent shared libraries: 'cudart64_91.dll' required by nms_gpu_mex.mexw64.
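
(That message usually means the nms_gpu_mex.mexw64 on disk was built against a CUDA runtime whose DLL MATLAB cannot find. A hedged sketch of the two usual fixes; the install path below is only the default CUDA 9.1 location on Windows, and faster_rcnn_build.m is the build script shipped with the repo, whose nvcc settings should be checked against your card first.)

% Option 1: make the folder containing cudart64_91.dll visible to MATLAB.
setenv('PATH', ['C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\bin;', getenv('PATH')]);
% Option 2: rebuild the mex files against the toolkit that is actually installed.
faster_rcnn_build
startup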

@BUAAkong

@qwertyDvo How about CUDA 8?

qwertyDvo commented Oct 13, 2018

@BUAAkong I failed to install it.

@BUAAkong

@qwertyDvo I cannot solve the 'cudart64_91.dll' issue; maybe you can google it.
