GPU Out of Memory Issue #137

dongho-Han · 2024-04-02T12:14:35Z

When I try to evaluate with your code, I run into a GPU out-of-memory error.
Specifically, it happens when running this command:

CUDA_VISIBLE_DEVICES=0,1,2,3 mpirun -n 4 python entry.py evaluate --conf_files configs/seem/focalt_unicl_lang_v1.yaml --overrides COCO.INPUT.IMAGE_SIZE 1024 MODEL.DECODER.HIDDEN_DIM 512 MODEL.ENCODER.CONVS_DIM 512 MODEL.ENCODER.MASK_DIM 512 VOC.TEST.BATCH_SIZE_TOTAL 8 TEST.BATCH_SIZE_TOTAL 8 REF.TEST.BATCH_SIZE_TOTAL 8 FP16 True WEIGHT True RESUME_FROM ./pretrained/seem_focalt_v1.pt

Could you share how much memory is needed for evaluation?

Error log:

  File "/home/Segment-Everything-Everywhere-All-At-Once/entry.py", line 75, in <module>
      main()
    File "/home/Segment-Everything-Everywhere-All-At-Once/entry.py", line 70, in main
      trainer.eval()
    File "/home/Segment-Everything-Everywhere-All-At-Once/trainer/default_trainer.py", line 79, in eval
      results = self._eval_on_set(self.save_folder)
    File "/home/Segment-Everything-Everywhere-All-At-Once/trainer/default_trainer.py", line 87, in _eval_on_set
      results = self.pipeline.evaluate_model(self, save_folder)
    File "/home/Segment-Everything-Everywhere-All-At-Once/./pipeline/XDecoderPipeline.py", line 155, in evaluate_model
      outputs = model(batch, mode=eval_type)
    File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
      return self._call_impl(*args, **kwargs)
    File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
      return forward_call(*args, **kwargs)
    File "/home/Segment-Everything-Everywhere-All-At-Once/modeling/BaseModel.py", line 19, in forward
      outputs = self.model(*inputs, **kwargs)
    File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
      return self._call_impl(*args, **kwargs)
    File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
      return forward_call(*args, **kwargs)
    File "/home/Segment-Everything-Everywhere-All-At-Once/modeling/architectures/seem_model_v1.py", line 318, in forward
      return self.evaluate(batched_inputs)
    File "/home/Segment-Everything-Everywhere-All-At-Once/modeling/architectures/seem_model_v1.py", line 387, in evaluate
      outputs = self.sem_seg_head(features, target_queries=queries_grounding)
    File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
      return self._call_impl(*args, **kwargs)
    File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
      return forward_call(*args, **kwargs)
    File "/home/Segment-Everything-Everywhere-All-At-Once/modeling/body/xdecoder_head.py", line 99, in forward
      return self.layers(features, mask, target_queries, target_vlp, task, extra)
    File "/home/Segment-Everything-Everywhere-All-At-Once/modeling/body/xdecoder_head.py", line 102, in layers
      mask_features, transformer_encoder_features, multi_scale_features = self.pixel_decoder.forward_features(features)
    File "/home/Segment-Everything-Everywhere-All-At-Once/modeling/vision/encoder/transformer_encoder_fpn.py", line 293, in forward_features
      cur_fpn = lateral_conv(x)
    File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
      return self._call_impl(*args, **kwargs)
    File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
      return forward_call(*args, **kwargs)
    File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/detectron2/layers/wrappers.py", line 110, in forward
      x = self.norm(x)
    File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
      return self._call_impl(*args, **kwargs)
    File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
      return forward_call(*args, **kwargs)
    File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/modules/normalization.py", line 279, in forward
      return F.group_norm(
    File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/functional.py", line 2558, in group_norm
      return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
  torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.41 GiB. GPU 0 has a total capacty of 23.64 GiB of which 386.50 MiB is free. Process 2385114 has 4.12 GiB memory in use. Process 2385112 has 17.04 GiB memory in use. Process 2385111 has 1.05 GiB memory in use. Process 2385113 has 1.05 GiB memory in use. Of the allocated memory 3.07 GiB is allocated by PyTorch, and 860.55 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I used 4 Titan RTX GPUs with 24576 MiB each.
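As a rough check on the log above, the per-process figures in the OOM message can be added up. The sketch below only parses the text of the message quoted in this issue. It suggests that four processes (with consecutive PIDs, which would be consistent with the four mpirun ranks, though the log alone does not prove it) are all holding memory on GPU 0.

```python
import re

# Excerpt of the OOM message quoted above (per-process usage on GPU 0).
message = (
    "GPU 0 has a total capacty of 23.64 GiB of which 386.50 MiB is free. "
    "Process 2385114 has 4.12 GiB memory in use. "
    "Process 2385112 has 17.04 GiB memory in use. "
    "Process 2385111 has 1.05 GiB memory in use. "
    "Process 2385113 has 1.05 GiB memory in use."
)

# Collect {pid: GiB in use} for every process reported on this one device.
usage = {
    int(pid): float(gib)
    for pid, gib in re.findall(
        r"Process (\d+) has ([\d.]+) GiB memory in use", message
    )
}

total_used_gib = sum(usage.values())
print(len(usage), "processes on GPU 0")
print(f"{total_used_gib:.2f} GiB in use out of 23.64 GiB")
```

The sum is about 23.26 GiB, which leaves roughly 0.38 GiB and matches the 386.50 MiB reported as free.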

dongho-Han · 2024-04-04T17:20:33Z

@MaureenZOU @jwyang
Could you look into this issue?
I also get the error when using seem_samvitb with the same command as in assets/readmes/EVAL.md.
How can I change the settings to run your code without running out of GPU memory? As a first step, I reduced the batch size to 2, but it still fails.

In INSTALL.md, you mentioned

CUDA enabled GPU with Memory > 8GB (Evaluation)

but I think something is wrong with my setup.
When I check the GPU status, only one GPU is used, even when I change CUDA_VISIBLE_DEVICES and the mpirun process count. The mpirun process count only changes the number of concurrent processes on that one GPU.
This image shows the status when I try to evaluate with 8 GPUs.
Did you use MPI to distribute work across GPUs or across CPUs?
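For reference, one common way to spread mpirun ranks over several GPUs is to let each process pick a device from its node-local rank. The sketch below is not taken from this repository, and whether the repository already does something similar is not confirmed in this thread. `pick_device_index` is a hypothetical helper, and it assumes Open MPI, which exports `OMPI_COMM_WORLD_LOCAL_RANK` to each launched process.

```python
import os


def pick_device_index(env=None, num_visible_gpus=1):
    """Return the GPU index this rank would use (hypothetical helper)."""
    env = os.environ if env is None else env
    # Open MPI exports the rank within the node under this name. Other
    # launchers use different variables, so fall back to 0 if it is absent.
    local_rank = int(env.get("OMPI_COMM_WORLD_LOCAL_RANK", "0"))
    # Wrap around in case there are more ranks than visible GPUs.
    return local_rank % num_visible_gpus


# Simulate four ranks started with `mpirun -n 4` and four visible GPUs.
assignments = [
    pick_device_index({"OMPI_COMM_WORLD_LOCAL_RANK": str(rank)}, num_visible_gpus=4)
    for rank in range(4)
]
print(assignments)  # [0, 1, 2, 3]
```

In a real run the returned index would be passed to `torch.cuda.set_device(...)` before any CUDA tensors are created. If every rank ends up with index 0, all processes land on the same GPU, which is the symptom described above.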

juju0111 · 2024-05-03T11:09:11Z

same problem!!

Beck-127 · 2024-05-09T06:05:24Z

Have you solved this problem?

jwyang · 2024-05-09T06:29:11Z

Hi @dongho-Han, I noticed that your script uses TEST.BATCH_SIZE_TOTAL 8 on 4 GPUs. Can you try changing it to 4?
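The arithmetic behind this suggestion can be sketched as follows, assuming that the total batch size is split evenly across the launched processes (an assumption about how BATCH_SIZE_TOTAL is used, not something confirmed in this thread). `per_gpu_batch_size` is a hypothetical helper.

```python
def per_gpu_batch_size(batch_size_total, world_size):
    """Images per process, assuming an even split across ranks."""
    if batch_size_total % world_size != 0:
        raise ValueError("total batch size must be divisible by world size")
    return batch_size_total // world_size


# Setting from the original command: 8 images over 4 processes.
print(per_gpu_batch_size(8, 4))  # 2 images per GPU

# Suggested setting: 4 images over 4 processes.
print(per_gpu_batch_size(4, 4))  # 1 image per GPU
```

Under that assumption, a total of 4 on 4 GPUs gives each GPU a single image per step.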

MaureenZOU · 2024-05-26T15:56:32Z

Same suggestion. Evaluating multiple images on a single GPU will cause: 1. Inaccurate evaluation (because of padding). 2. GPU OOM. I usually use 1 GPU for evaluation.
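To illustrate the padding point, the sketch below estimates how much of each image's slot becomes padding when images of different sizes share a batch. It assumes the batch is padded to the largest height and width among its images. The image sizes are made up for illustration, and `padded_fraction` is a hypothetical helper.

```python
def padded_fraction(sizes):
    """For each (height, width), the share of its padded slot that is padding."""
    max_h = max(h for h, _ in sizes)
    max_w = max(w for _, w in sizes)
    return [1 - (h * w) / (max_h * max_w) for h, w in sizes]


# Two hypothetical images in one batch: both get padded to 1024 x 1024.
batch_of_two = [(1024, 683), (768, 1024)]
print([round(f, 3) for f in padded_fraction(batch_of_two)])  # [0.333, 0.25]

# A single image per GPU needs no padding to match a neighbour.
print(padded_fraction([(1024, 683)]))  # [0.0]
```

With one image per GPU, no padded pixels enter the forward pass, and the activation tensors are no larger than that image requires.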
