
RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB (GPU 0; 39.59 GiB total capacity; 33.31 GiB already allocated; 1.06 GiB free; 36.81 GiB reserved in total by PyTorch) #144

Open
asagar60 opened this issue Apr 30, 2022 · 3 comments


@asagar60

asagar60 commented Apr 30, 2022

I'm trying to generate images using the pretrained StyleGAN2-SPD-ADA model, but I get this error. I initially thought it was caused by Colab's 15 GB GPU, but I tried 24 GB and 40 GB GPUs and still get the same error.

I tried reducing the batch size from 64 -> 32 -> 16, but the result is still the same.

Code:
!python PyTorch-StudioGAN/src/main.py -t -v -ckpt StyleGAN2-SPD-ADA-train-2021_10_18_16_01_19 -cfg PyTorch-StudioGAN/src/configs/AFHQ/StyleGAN2-SPD-ADA.yaml -save gen -data afhq -best

Logs:

[INFO] 2022-04-30 07:22:26 > Generator checkpoint is StyleGAN2-SPD-ADA-train-2021_10_18_16_01_19/model=G-best-weights-step=196000.pth
[INFO] 2022-04-30 07:22:26 > EMA_Generator checkpoint is StyleGAN2-SPD-ADA-train-2021_10_18_16_01_19/model=G_ema-best-weights-step=196000.pth
[INFO] 2022-04-30 07:22:26 > Discriminator checkpoint is StyleGAN2-SPD-ADA-train-2021_10_18_16_01_19/model=D-best-weights-step=196000.pth
/opt/conda/lib/python3.8/site-packages/torchvision/models/inception.py:44: FutureWarning: The default weight initialization of inception_v3 will be changed in future releases of torchvision. If you wish to keep the old behavior (which leads to long initialization times due to scipy/scipy#11299), please set init_weights=True.
warnings.warn(
wandb: Currently logged in as: asagar60 (use wandb login --relogin to force relogin)
wandb: Tracking run with wandb version 0.12.15
wandb: Run data is saved locally in gen/wandb/run-20220430_072228-1zr43u7c
wandb: Run wandb offline to turn off syncing.
wandb: Resuming run StyleGAN2-SPD-ADA-train-2021_10_18_16_01_19
wandb: ⭐️ View project at https://wandb.ai/asagar60/uncategorized
wandb: 🚀 View run at https://wandb.ai/asagar60/uncategorized/runs/1zr43u7c
[INFO] 2022-04-30 07:22:29 > Start training!
Setting up PyTorch plugin "bias_act_plugin"... Done.
Setting up PyTorch plugin "upfirdn2d_plugin"... Done.
Traceback (most recent call last):
  File "PyTorch-StudioGAN/src/main.py", line 182, in
    loader.load_worker(local_rank=rank,
  File "/home/PyTorch-StudioGAN/src/loader.py", line 348, in load_worker
    gen_acml_loss = worker.train_generator(current_step=step)
  File "/home/PyTorch-StudioGAN/src/worker.py", line 564, in train_generator
    fake_dict = self.Dis(fake_images_, fake_labels)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/PyTorch-StudioGAN/src/models/stylegan2.py", line 849, in forward
    x, img = block(x, img, **block_kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/PyTorch-StudioGAN/src/models/stylegan2.py", line 648, in forward
    x = self.conv0(x)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/PyTorch-StudioGAN/src/models/stylegan2.py", line 176, in forward
    x = conv2d_resample.conv2d_resample(x=x,
  File "/home/PyTorch-StudioGAN/src/utils/style_ops/conv2d_resample.py", line 133, in conv2d_resample
    return _conv2d_wrapper(x=x, w=w, padding=[py0,px0], groups=groups, flip_weight=flip_weight)
  File "/home/PyTorch-StudioGAN/src/utils/style_ops/conv2d_resample.py", line 41, in _conv2d_wrapper
    return op(x, w, stride=stride, padding=padding, groups=groups)
  File "/home/PyTorch-StudioGAN/src/utils/style_ops/conv2d_gradfix.py", line 37, in conv2d
    return _conv2d_gradfix(transpose=False, weight_shape=weight.shape, stride=stride, padding=padding, output_padding=0, dilation=dilation, groups=groups).apply(input, weight, bias)
  File "/home/PyTorch-StudioGAN/src/utils/style_ops/conv2d_gradfix.py", line 127, in forward
    return torch.nn.functional.conv2d(input=input, weight=weight, bias=bias, **common_kwargs)
RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB (GPU 0; 39.59 GiB total capacity; 33.31 GiB already allocated; 1.06 GiB free; 36.81 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
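To read the numbers in this message, here is a small sketch that pulls the GiB figures out of the error text and computes how much memory PyTorch had reserved but not yet handed out to tensors. The regex assumes the exact wording of the message above (other PyTorch versions may phrase it differently), and the helper name summarize_oom is hypothetical; it is not part of PyTorch or StudioGAN.

```python
import re

# Pattern for the allocator message format shown in the log above.
# Assumption: other PyTorch versions may word this message differently.
_OOM_PATTERN = re.compile(
    r"Tried to allocate (?P<request>[\d.]+) GiB .*?"
    r"(?P<total>[\d.]+) GiB total capacity; "
    r"(?P<allocated>[\d.]+) GiB already allocated; "
    r"(?P<free>[\d.]+) GiB free; "
    r"(?P<reserved>[\d.]+) GiB reserved in total by PyTorch"
)


def summarize_oom(message):
    """Extract the GiB figures from a CUDA OOM message (hypothetical helper)."""
    match = _OOM_PATTERN.search(message)
    if match is None:
        raise ValueError("message does not look like a CUDA OOM report")
    stats = {key: float(value) for key, value in match.groupdict().items()}
    # Memory held by PyTorch's caching allocator but not assigned to live tensors.
    stats["cached_unused"] = round(stats["reserved"] - stats["allocated"], 2)
    # The message's hint applies when reserved >> allocated; a ratio near 1 means it does not.
    stats["reserved_to_allocated"] = round(stats["reserved"] / stats["allocated"], 3)
    return stats


if __name__ == "__main__":
    log_line = (
        "RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB (GPU 0; "
        "39.59 GiB total capacity; 33.31 GiB already allocated; 1.06 GiB free; "
        "36.81 GiB reserved in total by PyTorch)"
    )
    print(summarize_oom(log_line))
```

On this log it reports about 3.5 GiB reserved but unallocated and a reserved/allocated ratio of roughly 1.1, so reserved memory is not much larger than allocated memory here. Setting PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:<N>, as the message suggests, may therefore help only marginally, if at all.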

@alex4727
Collaborator

alex4727 commented Apr 30, 2022

I think there's a bug related to the -v option. For now, instead of saving the images as a canvas (which is what the -v option does), you can try saving them one by one in PNG format. To do so, add the -sf -sf_num NUMBER_OF_IMAGES_TO_GENERATE options. If you only plan to generate images, you can omit the -t option and specify -metrics none to avoid unnecessary training and evaluation steps. We'll try to fix the bug ASAP.
P.S. Since StyleGAN models are trained using mixed precision, I also recommend using -mpc in all cases.
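The suggested invocation can also be assembled programmatically, for example when looping over several checkpoints in a notebook. This is a sketch under assumptions: the flag names are the ones used in this thread, while the helper name studiogan_generate_cmd and the image count are illustrative and not part of StudioGAN.

```python
import shlex


def studiogan_generate_cmd(ckpt_dir, cfg_path, save_dir, data_dir, num_images,
                           main_py="PyTorch-StudioGAN/src/main.py", use_best=True):
    """Build the argv for image generation as suggested above (hypothetical helper).

    Compared with the original command, -t (training) and -v (canvas saving) are
    dropped, individual PNGs are requested with -sf/-sf_num, evaluation is skipped
    with -metrics none, and -mpc is added.
    """
    argv = [
        "python", main_py,
        "-ckpt", ckpt_dir,
        "-cfg", cfg_path,
        "-save", save_dir,
        "-data", data_dir,
        "-sf", "-sf_num", str(num_images),
        "-metrics", "none",
        "-mpc",
    ]
    if use_best:
        # Load the *-best-weights-* checkpoint files, as the original command did.
        argv.append("-best")
    return argv


if __name__ == "__main__":
    cmd = studiogan_generate_cmd(
        ckpt_dir="StyleGAN2-SPD-ADA-train-2021_10_18_16_01_19",
        cfg_path="PyTorch-StudioGAN/src/configs/AFHQ/StyleGAN2-SPD-ADA.yaml",
        save_dir="gen",
        data_dir="afhq",
        num_images=1000,  # illustrative count
    )
    # The printed string can be pasted into a Colab cell after a leading "!".
    print(shlex.join(cmd))
    # Alternatively: subprocess.run(cmd, check=True)
```

Building an argv list rather than a single string avoids shell-quoting problems if a path ever contains spaces.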

@lavish619
Contributor

@alex4727
Hi,
You mentioned in your comment that StyleGAN models are trained using mixed precision, but in the code, wherever mixed precision is used, there is an additional "not is_stylegan" condition. I was trying to figure out why mixed-precision training is disabled for StyleGAN, so your comment that StyleGAN uses -mpc confuses me.

It would be very helpful if you could clarify this. Thanks in advance!

@alex4727
Collaborator

@lavish619
Sorry for the late reply.
You are correct: wherever mixed precision is used, there is an additional "not is_stylegan" condition. That is because StyleGAN incorporates fp16 datatypes in the model file itself, so there's no need to use the torch.cuda.amp.autocast() wrapper in the worker.
Thanks!
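To illustrate what "fp16 in the model file itself" can look like, here is a sketch of the per-resolution rule from NVIDIA's stylegan2-ada-pytorch (networks.py), where only the blocks at the num_fp16_res highest resolutions run in float16, together with the worker-side gating described above. It assumes StudioGAN's stylegan2.py keeps NVIDIA's rule; the helper names fp16_resolutions and amp_context are hypothetical and do not come from StudioGAN.

```python
import contextlib


def fp16_resolutions(img_resolution, num_fp16_res):
    """Synthesis block resolutions that would run in float16 (hypothetical helper).

    Mirrors the rule in NVIDIA's stylegan2-ada-pytorch; assumed, not verified,
    to carry over unchanged to StudioGAN's port.
    """
    if img_resolution < 4 or img_resolution & (img_resolution - 1):
        raise ValueError("img_resolution must be a power of two >= 4")
    log2_res = img_resolution.bit_length() - 1
    # Lowest resolution that uses fp16; never below 8, so the 4x4 block stays fp32.
    fp16_resolution = max(2 ** (log2_res + 1 - num_fp16_res), 8)
    block_resolutions = [2 ** i for i in range(2, log2_res + 1)]
    return [res for res in block_resolutions if res >= fp16_resolution]


def amp_context(mixed_precision, is_stylegan, autocast_factory):
    """Pick the context manager for a forward pass (hypothetical helper).

    Non-StyleGAN models get autocast when mixed precision is enabled; StyleGAN
    models already choose float16/float32 per block, so they get a no-op context.
    """
    if mixed_precision and not is_stylegan:
        return autocast_factory()
    return contextlib.nullcontext()


if __name__ == "__main__":
    # E.g. a 512x512 model with num_fp16_res=4 (illustrative values).
    print(fp16_resolutions(512, 4))
```

In a PyTorch worker the factory would be torch.cuda.amp.autocast, e.g. `with amp_context(mixed_precision, is_stylegan, torch.cuda.amp.autocast): ...`. Passing the factory in keeps the gating logic independent of torch, and for a StyleGAN model the wrapper does nothing because the precision choice already lives inside the model's blocks.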
