
CUDA out of memory. Tried to allocate error #82

Closed
daggs1 opened this issue May 13, 2024 · 2 comments

daggs1 commented May 13, 2024

Greetings,

I'm trying to run the gradio_app demo as described in the readme, and I'm getting this error:

$ python gradio_app.py
/home/worker/zero123plus/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
text_encoder/model.safetensors not found
Loading pipeline components...: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:03<00:00, 2.43it/s]
Traceback (most recent call last):
File "/home/worker/zero123plus/gradio_app.py", line 204, in <module>
fire.Fire(run_demo)
File "/home/worker/zero123plus/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/worker/zero123plus/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/worker/zero123plus/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/worker/zero123plus/gradio_app.py", line 137, in run_demo
pipeline.to(f'cuda:{_GPU_ID}')
File "/home/worker/zero123plus/lib/python3.10/site-packages/diffusers/pipelines/pipeline_utils.py", line 727, in to
module.to(torch_device, torch_dtype)
File "/home/worker/zero123plus/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1878, in to
return super().to(*args, **kwargs)
File "/home/worker/zero123plus/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1173, in to
return self._apply(convert)
File "/home/worker/zero123plus/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
module._apply(fn)
File "/home/worker/zero123plus/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
module._apply(fn)
File "/home/worker/zero123plus/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
module._apply(fn)
[Previous line repeated 3 more times]
File "/home/worker/zero123plus/lib/python3.10/site-packages/torch/nn/modules/module.py", line 804, in _apply
param_applied = fn(param)
File "/home/worker/zero123plus/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1159, in convert
return t.to(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU

Any idea what is wrong? I ran the setup as described in the readme file.


daggs1 commented May 14, 2024

I understand now: your small example requires 5 GB of VRAM and my GPU has only 4 GB. A shame. Is there any way to reduce memory consumption?

eliphatfs (Collaborator) commented

Nowadays we do have more techniques to reduce inference-time memory usage, including model offloading, autotuned compilation, quantization, and possibly others. We do not have the code for these ready yet, but you are welcome to contribute. The first usually incurs no time overhead, while compilation takes some time before generation starts. Quantization would need some extra code and more tuning.
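
For reference, a minimal sketch of the model-offloading route with diffusers (hypothetical, not code from this repo: it assumes a diffusers version recent enough to provide enable_model_cpu_offload on the pipeline, accelerate installed, and the model IDs from the readme, which may differ from what gradio_app.py loads):

```python
# Rough sketch of model offloading, not the project's own code.
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "sudo-ai/zero123plus-v1.2",  # checkpoint per the readme (assumption)
    custom_pipeline="sudo-ai/zero123plus-pipeline",
    torch_dtype=torch.float16,
)

# Instead of pipeline.to(f'cuda:{_GPU_ID}'), keep the weights in CPU RAM and
# move each sub-model (text encoder, UNet, VAE) to the GPU only while it runs.
# Peak VRAM drops to roughly the largest single component, at a small speed cost.
pipeline.enable_model_cpu_offload()

# If that still does not fit in 4 GB, enable_sequential_cpu_offload() offloads
# at the submodule level: much lower VRAM, but noticeably slower.
```

With offloading enabled you would skip the pipeline.to(f'cuda:{_GPU_ID}') call in gradio_app.py, since offloading manages device placement itself.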
