
Error when using enable_sequential_cpu_offload() #23

Closed
cmdr2 opened this issue Apr 12, 2023 · 6 comments · Fixed by #31


cmdr2 commented Apr 12, 2023

Hi,

I'm getting an error when applying pipe.enable_sequential_cpu_offload() as per: https://huggingface.co/docs/diffusers/optimization/fp16#offloading-to-cpu-with-accelerate-for-memory-savings

compel works fine if we don't apply CPU offloading, but fails with it; diffusers with CPU offloading works fine without compel.

compel: 1.0.5
diffusers: 0.14.0
accelerate: 0.15.0
transformers: 4.26.1

Modified version of the example from the README, using enable_sequential_cpu_offload():

from diffusers import StableDiffusionPipeline
from compel import Compel

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
# pipeline.to("cuda")
pipeline.enable_sequential_cpu_offload()
compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)

# upweight "ball"
prompt = "a cat playing with a ball++ in the forest"
conditioning = compel.build_conditioning_tensor(prompt)
# or: conditioning = compel([prompt])

# generate image
images = pipeline(prompt_embeds=conditioning, num_inference_steps=20).images
images[0].save("image.jpg")

We get: NotImplementedError: Cannot copy out of meta tensor; no data!

Full stacktrace:

Traceback (most recent call last):
  File "src\sdkit\scripts\run_everything.py", line 274, in run_samplers
    images = generate_images(
  File "d:\sd\user-dev\src\sdkit\sdkit\generate\image_generator.py", line 61, in generate_images
    return make_with_diffusers(
  File "d:\sd\user-dev\src\sdkit\sdkit\generate\image_generator.py", line 270, in make_with_diffusers
    cmd["prompt_embeds"] = compel(prompt)
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\compel\compel.py", line 80, in __call__
    cond_tensor.append(self.build_conditioning_tensor(text_input))
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\compel\compel.py", line 70, in build_conditioning_tensor
    conditioning, _ = self.build_conditioning_tensor_for_prompt_object(prompt_object)
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\compel\compel.py", line 117, in build_conditioning_tensor_for_prompt_object
    return self._get_conditioning_for_flattened_prompt(prompt), {}
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\compel\compel.py", line 159, in _get_conditioning_for_flattened_prompt
    conditioning, tokens = self.conditioning_provider.get_embeddings_for_weighted_prompt_fragments(
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\compel\embeddings_provider.py", line 105, in get_embeddings_for_weighted_prompt_fragments
    base_embedding = self.build_weighted_embedding_tensor(tokens, per_token_weights, mask)
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\compel\embeddings_provider.py", line 318, in build_weighted_embedding_tensor
    empty_z = self.text_encoder(empty_token_ids, return_dict=False)[0]
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\accelerate\hooks.py", line 156, in new_forward
    output = old_forward(*args, **kwargs)
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\transformers\models\clip\modeling_clip.py", line 816, in forward
    return self.text_model(
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\transformers\models\clip\modeling_clip.py", line 712, in forward
    hidden_states = self.embeddings(input_ids=input_ids, position_ids=position_ids)
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\accelerate\hooks.py", line 151, in new_forward
    args, kwargs = module._hf_hook.pre_forward(module, *args, **kwargs)
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\accelerate\hooks.py", line 266, in pre_forward
    return send_to_device(args, self.execution_device), send_to_device(kwargs, self.execution_device)
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\accelerate\utils\operations.py", line 131, in send_to_device
    return recursively_apply(_send_to_device, tensor, device, non_blocking, test_type=_has_to_method)
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\accelerate\utils\operations.py", line 91, in recursively_apply
    {
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\accelerate\utils\operations.py", line 92, in <dictcomp>
    k: recursively_apply(
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\accelerate\utils\operations.py", line 99, in recursively_apply
    return func(data, *args, **kwargs)
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\accelerate\utils\operations.py", line 124, in _send_to_device
    return t.to(device, non_blocking=non_blocking)
NotImplementedError: Cannot copy out of meta tensor; no data!

Any ideas for this problem would be really appreciated!

Thanks!

damian0815 commented

No idea. @patrickvonplaten any hints?

patrickvonplaten commented

Ah I see @cmdr2, I think when you use compel you'll have to manage the text encoder offloading manually: build the conditioning tensor first, then enable offloading, like so:

from diffusers import StableDiffusionPipeline
from compel import Compel

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipeline.to("cuda")
compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)

# upweight "ball"
prompt = "a cat playing with a ball++ in the forest"
conditioning = compel.build_conditioning_tensor(prompt)
# or: conditioning = compel([prompt])

# enable offloading only now that the conditioning tensor has been built, then generate
pipeline.enable_sequential_cpu_offload()
images = pipeline(prompt_embeds=conditioning, num_inference_steps=20).images
images[0].save("image.jpg")


cmdr2 commented Apr 13, 2023

Thanks @patrickvonplaten and @damian0815 !

cmdr2 closed this as completed Apr 13, 2023

cmdr2 commented May 1, 2023

Hi @patrickvonplaten and @damian0815 , sorry for reopening this issue, but the suggestion in the previous message only works for the first prompt. The next prompt fails again with NotImplementedError: Cannot copy out of meta tensor; no data!

Here's the modified version of the suggestion (which essentially just calls build_conditioning_tensor() again):

from diffusers import StableDiffusionPipeline
from compel import Compel

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipeline.to("cuda")
compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)

# 1. upweight "ball"
prompt = "a cat playing with a ball++ in the forest"
conditioning = compel.build_conditioning_tensor(prompt)
# or: conditioning = compel([prompt])

# enable offloading, then generate the image
pipeline.enable_sequential_cpu_offload()
images = pipeline(prompt_embeds=conditioning, num_inference_steps=4).images
images[0].save("image.jpg")

# ----- do this again -----
prompt = "a cat playing with a ball++ in the forest"
conditioning = compel.build_conditioning_tensor(prompt)

This fails with the same stack trace as in my first message.

It's very strange: in theory the accelerate hook should move the text_encoder back to the GPU, because compel calls the text_encoder via forward(), just like stable_diffusion_pipeline._encode_prompt() does. Yet the call fails in compel while it works in stable_diffusion_pipeline. Not sure what's going on.
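
For reference, a minimal check (a sketch of mine, assuming accelerate's standard sequential-offload behavior, where offloaded weights live on the "meta" device until a forward call):

from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipeline.enable_sequential_cpu_offload()

# the parameters have no data here; accelerate's pre_forward hook is what
# copies real weights to the execution device for each forward() call
print(next(pipeline.text_encoder.parameters()).device)  # expected: meta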

I even tried monkey-patching the Compel.device property to always return cuda:0 (my GPU), to avoid having it return 'meta' for calls to self.device inside compel, but that didn't make a difference.
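
The monkey-patch looked roughly like this (a sketch; compel's internal attribute layout may differ):

import torch
from compel import Compel

# force compel to report cuda:0 instead of the hooked module's device
# (this did NOT fix the error)
Compel.device = property(lambda self: torch.device("cuda:0"))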

Thanks for any help with this! :)


cmdr2 commented May 2, 2023

Created a PR for this: #31

After this PR, the following code should work:

from diffusers import StableDiffusionPipeline
from compel import Compel

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
# pipeline.to("cuda")
pipeline.enable_sequential_cpu_offload()

compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder, device="cuda:0")

# 1. upweight "ball"
prompt = "a cat playing with a ball++ in the forest"
conditioning = compel.build_conditioning_tensor(prompt)
# or: conditioning = compel([prompt])

# generate image
images = pipeline(prompt_embeds=conditioning, num_inference_steps=4).images
images[0].save("image.jpg")

# ----- do this again -----
prompt = "a cat playing with a ball++ in the forest"
conditioning = compel.build_conditioning_tensor(prompt)


cmdr2 commented May 2, 2023

The hacky temporary workaround is to move the text_encoder modules back to the GPU each time we need to build a tensor. I think this is what @patrickvonplaten meant?

This works for me (but PR #31 is obviously better!):

from diffusers import StableDiffusionPipeline
from compel import Compel

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
# pipeline.to("cuda")
pipeline.enable_sequential_cpu_offload()

compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)

# 1. upweight "ball"
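# manually trigger accelerate's pre_forward hook on each hooked submodule, so
# the offloaded weights are copied back onto the execution device (GPU)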
[m._hf_hook.pre_forward(m) for m in pipeline.text_encoder.modules() if hasattr(m, "_hf_hook")]
print("moved to", pipeline.text_encoder.device)

prompt = "a cat playing with a ball++ in the forest"
conditioning = compel.build_conditioning_tensor(prompt)
# or: conditioning = compel([prompt])

# generate image
images = pipeline(prompt_embeds=conditioning, num_inference_steps=4).images
images[0].save("image.jpg")

# ----- do this again -----
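# the pipeline call offloaded the weights again, so re-trigger the hooks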
[m._hf_hook.pre_forward(m) for m in pipeline.text_encoder.modules() if hasattr(m, "_hf_hook")]
print("moved to", pipeline.text_encoder.device)

prompt = "a cat playing with a ball++ in the forest"
conditioning = compel.build_conditioning_tensor(prompt)

damian0815 added a commit that referenced this issue May 2, 2023
Fix #23 - enable compel to work with enable_sequential_cpu_offload(), by optionally specifying the exact device to create the tensors on