
Possibly GPU memory leak? #24

Closed · kshieh1 opened this issue Apr 13, 2023 · 11 comments

Comments

@kshieh1

kshieh1 commented Apr 13, 2023

Hi,

I found a GPU out-of-memory (OOM) error when using compel in my project. I made a shorter test program out of your compel-demo.py:

import torch
from compel import Compel
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
from torch import Generator

device = "cuda"
pipeline = StableDiffusionPipeline.from_pretrained("dreamlike-art/dreamlike-photoreal-2.0",
                                                   torch_dtype=torch.float16).to(device)
# dpm++
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config,
                                                             algorithm_type="dpmsolver++")

COMPEL = True
compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)

i = 0
while True:
    prompts = ["a cat playing with a ball++ in the forest", "a cat playing with a ball in the forest"]

    if COMPEL:
        prompt_embeds = torch.cat([compel.build_conditioning_tensor(prompt) for prompt in prompts])
        images = pipeline(prompt_embeds=prompt_embeds, num_inference_steps=10, width=256, height=256).images
        #del prompt_embeds # not helping
    else:
        images = pipeline(prompt=prompts, num_inference_steps=10, width=256, height=256).images
    i += 1
    print(i, images)

    images[0].save('img0.jpg')
    images[1].save('img1.jpg')

Tested on an Nvidia RTX 3050 Ti Mobile GPU with 4 GB VRAM; an OOM exception occurs after 10~20 iterations. No OOM if COMPEL = False.

@damian0815
Owner

hmm, compel is basically stateless, there isn't much that could leak that i have much control over. torch is sometimes poor at cleaning up its caches properly, you might want to try calling torch.cuda.empty_cache() occasionally
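
A minimal sketch of that suggestion, applied to the reproduction loop above (where to put the call is just an assumption, not a verified fix):

import torch

while True:
    prompt_embeds = torch.cat([compel.build_conditioning_tensor(p) for p in prompts])
    images = pipeline(prompt_embeds=prompt_embeds, num_inference_steps=10,
                      width=256, height=256).images
    # ask torch to release cached, currently-unused GPU blocks back to the driver
    torch.cuda.empty_cache()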

@kshieh1
Author

kshieh1 commented Apr 14, 2023

Thanks. I think I have pushed VRAM usage to the edge -- maybe torch needs some extra room to maneuver...

(Updated Apr. 17) OOM occurs even if the prompt embeddings are just built repeatedly without running inference (i.e., with images = pipeline(...) commented out). torch.cuda.empty_cache() does not help.

@damian0815
Owner

urgh. idk. i also don't have a local gpu to readily debug this. have you tried tearing down the compel instance and making a new one for each prompt?

@kshieh1
Author

kshieh1 commented Apr 26, 2023

Interesting. I ran the same test on Google Colab (GPU w/ 12 GB VRAM) and no OOM issue occurred. Then I updated my local environment with the exact same package versions (e.g., torch, diffusers, compel, etc.) as the Colab, but the OOM issue still occurs. Local tests were on Nvidia GPUs with 4 GB and 8 GB, btw.

Initializing & deleting the compel instance inside the loop doesn't help, fyi.

@jbhurruth

@kshieh1 Did you ever figure out a solution to this? I'm also hitting my 6GB limit as soon as I use the compel embeddings

@kshieh1
Author

kshieh1 commented May 25, 2023

@kshieh1 Did you ever figure out a solution to this? I'm also hitting my 6GB limit as soon as I use the compel embeddings

No luck so far

@kshieh1
Author

kshieh1 commented May 26, 2023

I think I have come up with a solution. After image generation, you should explicitly de-reference the tensor object (i.e., prompt_embeds = None) and call gc.collect().
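
For reference, a minimal sketch of this workaround applied to the reproduction loop above:

import gc

while True:
    prompt_embeds = torch.cat([compel.build_conditioning_tensor(p) for p in prompts])
    images = pipeline(prompt_embeds=prompt_embeds, num_inference_steps=10,
                      width=256, height=256).images

    prompt_embeds = None  # drop the last reference to the conditioning tensor
    gc.collect()          # force collection so the underlying GPU memory can actually be freed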

@damian0815
Owner

ahh nice. i'll add a note to the readme for the next version. thanks for sharing your solution!

@damian0815
Owner

The readme has been updated.

@damian0815
Owner

@kshieh1 we encountered a possibly related (possibly the same?) problem in InvokeAI, which was resolved by doing calls to Compel inside a with torch.no_grad(): block. did you try this?
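
For reference, a minimal sketch of what that looks like with the reproduction loop above (only the embedding build needs to sit inside the block):

with torch.no_grad():
    # building the embeddings without autograd avoids keeping graph state alive on the GPU
    prompt_embeds = torch.cat([compel.build_conditioning_tensor(p) for p in prompts])

images = pipeline(prompt_embeds=prompt_embeds, num_inference_steps=10,
                  width=256, height=256).images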

@kshieh1
Author

kshieh1 commented Jul 5, 2023

Yeah, I just did a quick test and found that the amount of CUDA memory allocated stays stable -- I think I can get rid of those costly gc.collect() operations in my code.

Thanks for sharing.
