
Tried to allocate 20.00 MiB (GPU 0; 14.76 GiB total capacity; 13.90 GiB already allocated; 14.75 MiB free; 14.14 GiB reserved in total by PyTorch #18

Open
paratechnical opened this issue Jan 5, 2024 · 3 comments


@paratechnical

I keep getting out-of-memory exceptions no matter how I set PYTORCH_CUDA_ALLOC_CONF.
This is the error:
File "/opt/saturncloud/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1143, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 14.76 GiB total capacity; 13.90 GiB already allocated; 14.75 MiB free; 14.14 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
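
For reference, a minimal sketch of one way to set the allocator option the error message mentions; the max_split_size_mb value of 128 is just an assumption to experiment with, not a known fix, and the variable has to be set before torch initializes CUDA:

    import os

    # Must be set before the CUDA caching allocator initializes;
    # 128 MiB is an arbitrary starting value to tune, not a known fix.
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

    import torch  # imported after the variable is set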

@JahnKhan

JahnKhan commented Jan 7, 2024

I'm getting the same message. Is there a fix for this?

@GraemeHarris

@paratechnical @JahnKhan I've had some success with the tips at https://huggingface.co/docs/diffusers/optimization/memory#memoryefficient-attention. The pipeline is instantiated in the load_trained_pipeline function, where you should be able to reduce memory usage as described in the Hugging Face article.

Because I was still low on VRAM, I went with the pipe.enable_sequential_cpu_offload() option, which is much slower but works :). I haven't tried model offloading yet, but that might be worth a try to keep some speed. A minimal sketch of the setup follows below.
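
Roughly what I mean, as a minimal sketch (assuming diffusers with xformers installed; note the diffusers docs advise against combining enable_sequential_cpu_offload() with a prior pipe.to("cuda") call):

    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    )
    pipe.enable_xformers_memory_efficient_attention()  # memory-efficient attention
    pipe.enable_sequential_cpu_offload()  # keeps weights on CPU, moves submodules to GPU as needed

    image = pipe("an astronaut riding a horse").images[0]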

@paratechnical (Author)

@GraemeHarris

# excerpt from load_trained_pipeline; imports added here for context
import torch
from diffusers import DiffusionPipeline

if model_path is not None:
    # TODO: long warning for lora
    pipe = DiffusionPipeline.from_pretrained(model_path, torch_dtype=torch.float16)
    if load_lora:
        pipe.load_lora_weights(lora_path)
else:
    pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0")
pipe.to("cuda")
pipe.enable_xformers_memory_efficient_attention()
pipe.enable_sequential_cpu_offload()

I tried it like this and I have the same problem.

What kind of GPU configuration are you using?
