Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CUDA out of memory #26

Open
dingjietao opened this issue Sep 11, 2021 · 0 comments
Open

CUDA out of memory #26

dingjietao opened this issue Sep 11, 2021 · 0 comments

Comments

@dingjietao
Copy link

when i run, i got "RuntimeError: CUDA out of memory."i don't know how to modify.
gpu0: 12G; gpu1: 12G
my command is "bash experiments/scripts/train_faster_rcnn.sh 0 pascal_voc vgg16" or
"bash experiments/scripts/train_faster_rcnn.sh 0,1 pascal_voc vgg16". The two commands both caused "RuntimeError: CUDA out of memory"
And i modified vgg16.yml. TRAIN.BATCH_SIZE : 256 --> 2
Running Logs:

  • set -e
  • export PYTHONUNBUFFERED=True
  • PYTHONUNBUFFERED=True
  • GPU_ID=0
  • DATASET=pascal_voc
  • NET=vgg16
  • array=($@)
  • len=3
  • EXTRA_ARGS=
  • EXTRA_ARGS_SLUG=
  • case ${DATASET} in
  • TRAIN_IMDB=voc_2007_trainval
  • TEST_IMDB=voc_2007_test
  • STEPSIZE='[50000]'
  • ITERS=100000
  • ANCHORS='[8,16,32]'
  • RATIOS='[0.5,1,2]'
    ++ date +%Y-%m-%d_%H-%M-%S
  • LOG=experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2021-09-11_15-11-21
  • exec
    ++ tee -a experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2021-09-11_15-11-21
    tee: experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2021-09-11_15-11-21: No such file or directory
  • echo Logging output to experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2021-09-11_15-11-21
    Logging output to experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2021-09-11_15-11-21
  • set +x
  • '[' '!' -f output/vgg16/voc_2007_trainval/default/vgg16_MELM_iter_100000.pth.index ']'
  • [[ ! -z '' ]]
  • CUDA_VISIBLE_DEVICES=0
  • python ./tools/trainval_net.py --weight data/imagenet_weights/vgg16.pth --imdb voc_2007_trainval --imdbval voc_2007_test --iters 100000 --cfg experiments/cfgs/vgg16.yml --net vgg16 --set ANCHOR_SCALES '[8,16,32]' ANCHOR_RATIOS '[0.5,1,2]' TRAIN.STEPSIZE '[50000]'
    Called with args:
    Namespace(cfg_file='experiments/cfgs/vgg16.yml', imdb_name='voc_2007_trainval', imdbval_name='voc_2007_test', max_iters=100000, net='vgg16', set_cfgs=['ANCHOR_SCALES', '[8,16,32]', 'ANCHOR_RATIOS', '[0.5,1,2]', 'TRAIN.STEPSIZE', '[50000]'], tag=None, weight='data/imagenet_weights/vgg16.pth')
    /media/omnisky/28db8425-dc36-4700-92ef-0dd7e98ccd67/djt/CASD/tools/../lib/model/config.py:369: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
    yaml_cfg = edict(yaml.load(f))
    Loaded dataset voc_2007_trainval for training
    Set proposal method: selective_search
    Appending horizontally-flipped training examples...
    voc_2007_trainval ss roidb loaded from /media/omnisky/28db8425-dc36-4700-92ef-0dd7e98ccd67/djt/CASD/data/cache/voc_2007_trainval_selective_search_roidb.pkl
    done
    Preparing training data...
    done
    10022 roidb entries
    Output will be saved to /media/omnisky/28db8425-dc36-4700-92ef-0dd7e98ccd67/djt/CASD/output/vgg16_MELM/voc_2007_trainval/default
    TensorFlow summaries will be saved to /media/omnisky/28db8425-dc36-4700-92ef-0dd7e98ccd67/djt/CASD/tensorboard/vgg16_MELM/voc_2007_trainval/default
    Loaded dataset voc_2007_test for training
    Set proposal method: selective_search
    Preparing training data...
    voc_2007_test ss roidb loaded from /media/omnisky/28db8425-dc36-4700-92ef-0dd7e98ccd67/djt/CASD/data/cache/voc_2007_test_selective_search_roidb.pkl
    done
    4952 validation roidb entries
    Filtered 0 roidb entries: 10022 -> 10022
    Filtered 0 roidb entries: 4952 -> 4952
    Solving...
    Loading initial model weights from data/imagenet_weights/vgg16.pth
    Loaded.
    Traceback (most recent call last):
    File "./tools/trainval_net.py", line 135, in
    max_iters=args.max_iters)
    File "/media/omnisky/28db8425-dc36-4700-92ef-0dd7e98ccd67/djt/CASD/tools/../lib/model/train_val.py", line 377, in train_net
    sw.train_model(max_iters)
    File "/media/omnisky/28db8425-dc36-4700-92ef-0dd7e98ccd67/djt/CASD/tools/../lib/model/train_val.py", line 291, in train_model
    cls_det_loss, refine_loss_1, refine_loss_2, consistency_loss, total_loss = self.net.train_step(blobs,self.optimizer,iter)
    File "/media/omnisky/28db8425-dc36-4700-92ef-0dd7e98ccd67/djt/CASD/tools/../lib/nets/network.py", line 634, in train_step
    self.forward(blobs['data'], blobs['image_level_labels'], blobs['im_info'], blobs['gt_boxes'], blobs['ss_boxes'], step)
    File "/media/omnisky/28db8425-dc36-4700-92ef-0dd7e98ccd67/djt/CASD/tools/../lib/nets/network.py", line 562, in forward
    roi_labels_1, keep_inds_1, roi_labels_2, keep_inds_2, bbox_pred, rois = self._predict_train(ss_boxes_all, step)
    File "/media/omnisky/28db8425-dc36-4700-92ef-0dd7e98ccd67/djt/CASD/tools/../lib/nets/network.py", line 508, in _predict_train
    roi_labels_2, keep_inds_2, bbox_pred = self._region_classification_train(pool5_roi, fc7_roi,fc7_context, fc7_frame, step)
    File "/media/omnisky/28db8425-dc36-4700-92ef-0dd7e98ccd67/djt/CASD/tools/../lib/nets/network.py", line 398, in _region_classification_train
    mask_1 = self._inverted_attention(bbox_feats_new, gt, keep_inds_1_new, 1, step, fg_num_1_new, bg_num_1_new)
    File "/media/omnisky/28db8425-dc36-4700-92ef-0dd7e98ccd67/djt/CASD/tools/../lib/nets/network.py", line 147, in _inverted_attention
    pooled_feat_before_after = torch.cat((bbox_feats_new, bbox_feats_new * mask_all), dim=0)
    RuntimeError: CUDA out of memory. Tried to allocate 766.00 MiB (GPU 0; 11.91 GiB total capacity; 9.59 GiB already allocated; 99.19 MiB free; 1.49 GiB cached)

I would appreciate it if you could help me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant