
"CUDA out of memory" on dataset with 300 classes. #22

Open
igorvishnevskiy opened this issue May 25, 2022 · 1 comment
igorvishnevskiy commented May 25, 2022

Let me mention that training on a dataset with 6K inputs and 1 class works great. However, training with 300 classes on a 6000-image dataset causes the following error:

------------CPU Mode for This Batch-------------
2022-05-25 13:52:53 | INFO     | yolox.models.yolo_head:335 - OOM RuntimeError is raised due to the huge memory cost during label assignment. 
CPU mode is applied in this batch. If you want to avoid this issue, try to reduce the batch size or image size.

Training continues for some time, then quits completely with "CUDA out of memory":

RuntimeError: CUDA out of memory. Tried to allocate 22.00 MiB (GPU 0; 7.79 GiB total capacity; 6.20 GiB already allocated; 21.44 MiB free; 6.26 GiB reserved in total by PyTorch)
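
For context, here is my rough guess at where the label-assignment memory goes. As far as I can tell from yolo_head.py (get_assignments), SimOTA builds a few float32 tensors of shape [num_gt, num_candidate_anchors, num_classes] per image (the one-hot targets, the repeated class predictions, and the per-element BCE cost), so this part grows linearly with the number of classes. The ground-truth and anchor counts in this sketch are made-up round numbers, not measurements:

```python
# Back-of-the-envelope estimate (plain Python, no torch needed) of the float32
# tensors SimOTA seems to materialize per image during label assignment, based
# on my reading of yolox/models/yolo_head.py (get_assignments). The ground-truth
# and candidate-anchor counts below are made-up round numbers, not measurements.

def label_assign_bytes(num_gt, num_candidate_anchors, num_classes, n_copies=3):
    # one-hot targets, repeated class predictions and the element-wise BCE cost
    # are each roughly a [num_gt, num_candidate_anchors, num_classes] fp32 tensor
    return n_copies * num_gt * num_candidate_anchors * num_classes * 4

for num_classes in (1, 300):
    mib = label_assign_bytes(num_gt=50, num_candidate_anchors=2000,
                             num_classes=num_classes) / 2**20
    print(f"num_classes={num_classes:>3}: ~{mib:,.1f} MiB per image")
```

With those guesses that's on the order of 1 MiB per image for 1 class but ~340 MiB per image for 300 classes, which would explain why the one-class run is fine while the 300-class run falls back to CPU mode and eventually OOMs.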

I'm trying to fix it. Help from everyone is welcome; please drop your solutions and thoughts here. As soon as I find a solution, I'll share it here too. Thank you.

I'm running on 2 GPUs, a GTX 1070 and an RTX 3070. That should be plenty; the platform needs more optimization.

P.S. Lowering the batch size doesn't help. I set the batch to "-b 2" and devices to "-d 2", i.e. 1 image per GPU. It can't get lower than that.
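
For reference, the launch command is essentially "python -m yolox.tools.train -f <my_exp.py> -d 2 -b 2" (the exp path is just a placeholder here). One thing I haven't tried yet is adding the "--fp16" flag that yolox.tools.train accepts; mixed precision should roughly halve activation memory, though I'm not sure it helps with the label-assignment tensors, since those appear to be cast back to float32 inside get_assignments anyway.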

Image size is set to:
self.input_size = (256, 512)
self.test_size = (256, 512)
That's already a very low resolution.
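
For completeness, the relevant part of my Exp file looks roughly like this (attribute names are the standard ones from yolox.exp.Exp; everything not shown is left at the defaults):

```python
import os

from yolox.exp import Exp as MyExp


class Exp(MyExp):
    def __init__(self):
        super().__init__()
        # values from this issue; everything else stays at YOLOX defaults
        self.num_classes = 300
        self.input_size = (256, 512)   # (height, width) used for training
        self.test_size = (256, 512)    # (height, width) used for evaluation
        self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]
```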

igorvishnevskiy (Author) commented:

I just tried cutting the inputs down to 10 images while keeping all 300 classes. The issue still reproduces: same low resolution, same 1 image per GPU. The issue is definitely caused by the high number of classes.
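
If my shape reading above is right, this makes sense: the label-assignment tensors scale with num_gt × candidate_anchors × num_classes per image, so going from 1 class to 300 multiplies that part of the memory by roughly 300x, while the total number of images in the dataset never enters the calculation. That's why shrinking the dataset to 10 images changes nothing.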
