
Error when using enable_sequential_cpu_offload() #23

Closed
cmdr2 opened this issue Apr 12, 2023 · 6 comments · Fixed by #31


cmdr2 commented Apr 12, 2023

Hi,

I'm getting an error when applying pipe.enable_sequential_cpu_offload() as per: https://huggingface.co/docs/diffusers/optimization/fp16#offloading-to-cpu-with-accelerate-for-memory-savings

compel works fine if we don't apply CPU offloading, but fails with it; diffusers with CPU offloading works fine without compel.

compel: 1.0.5
diffusers: 0.14.0
accelerate: 0.15.0
transformers: 4.26.1

Modified version of the example from the README, using enable_sequential_cpu_offload():

from diffusers import StableDiffusionPipeline
from compel import Compel

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
# pipeline.to("cuda")
pipeline.enable_sequential_cpu_offload()
compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)

# upweight "ball"
prompt = "a cat playing with a ball++ in the forest"
conditioning = compel.build_conditioning_tensor(prompt)
# or: conditioning = compel([prompt])

# generate image
images = pipeline(prompt_embeds=conditioning, num_inference_steps=20).images
images[0].save("image.jpg")

We get: NotImplementedError: Cannot copy out of meta tensor; no data!

Full stacktrace:

Traceback (most recent call last):
  File "src\sdkit\scripts\run_everything.py", line 274, in run_samplers
    images = generate_images(
  File "d:\sd\user-dev\src\sdkit\sdkit\generate\image_generator.py", line 61, in generate_images
    return make_with_diffusers(
  File "d:\sd\user-dev\src\sdkit\sdkit\generate\image_generator.py", line 270, in make_with_diffusers
    cmd["prompt_embeds"] = compel(prompt)
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\compel\compel.py", line 80, in __call__
    cond_tensor.append(self.build_conditioning_tensor(text_input))
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\compel\compel.py", line 70, in build_conditioning_tensor
    conditioning, _ = self.build_conditioning_tensor_for_prompt_object(prompt_object)
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\compel\compel.py", line 117, in build_conditioning_tensor_for_prompt_object
    return self._get_conditioning_for_flattened_prompt(prompt), {}
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\compel\compel.py", line 159, in _get_conditioning_for_flattened_prompt
    conditioning, tokens = self.conditioning_provider.get_embeddings_for_weighted_prompt_fragments(
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\compel\embeddings_provider.py", line 105, in get_embeddings_for_weighted_prompt_fragments
    base_embedding = self.build_weighted_embedding_tensor(tokens, per_token_weights, mask)
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\compel\embeddings_provider.py", line 318, in build_weighted_embedding_tensor
    empty_z = self.text_encoder(empty_token_ids, return_dict=False)[0]
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\accelerate\hooks.py", line 156, in new_forward
    output = old_forward(*args, **kwargs)
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\transformers\models\clip\modeling_clip.py", line 816, in forward
    return self.text_model(
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\transformers\models\clip\modeling_clip.py", line 712, in forward
    hidden_states = self.embeddings(input_ids=input_ids, position_ids=position_ids)
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\accelerate\hooks.py", line 151, in new_forward
    args, kwargs = module._hf_hook.pre_forward(module, *args, **kwargs)
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\accelerate\hooks.py", line 266, in pre_forward
    return send_to_device(args, self.execution_device), send_to_device(kwargs, self.execution_device)
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\accelerate\utils\operations.py", line 131, in send_to_device
    return recursively_apply(_send_to_device, tensor, device, non_blocking, test_type=_has_to_method)
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\accelerate\utils\operations.py", line 91, in recursively_apply
    {
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\accelerate\utils\operations.py", line 92, in <dictcomp>
    k: recursively_apply(
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\accelerate\utils\operations.py", line 99, in recursively_apply
    return func(data, *args, **kwargs)
  File "D:\sd\user-dev\stable-diffusion\env\lib\site-packages\accelerate\utils\operations.py", line 124, in _send_to_device
    return t.to(device, non_blocking=non_blocking)
NotImplementedError: Cannot copy out of meta tensor; no data!

Any ideas for this problem would be really appreciated!

Thanks!

damian0815 commented

No idea. @patrickvonplaten any hints?

patrickvonplaten commented

Ah I see @cmdr2, I think when you use compel you'll have to manage the text encoder offloading manually: build the conditioning tensor first, then enable offloading, like so:

from diffusers import StableDiffusionPipeline
from compel import Compel

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipeline.to("cuda")
compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)

# upweight "ball"
prompt = "a cat playing with a ball++ in the forest"
conditioning = compel.build_conditioning_tensor(prompt)
# or: conditioning = compel([prompt])

# enable offloading only now that the conditioning tensor has been built, then generate
pipeline.enable_sequential_cpu_offload()
images = pipeline(prompt_embeds=conditioning, num_inference_steps=20).images
images[0].save("image.jpg")


cmdr2 commented Apr 13, 2023

Thanks @patrickvonplaten and @damian0815 !

cmdr2 closed this as completed Apr 13, 2023

cmdr2 commented May 1, 2023

Hi @patrickvonplaten and @damian0815 , sorry for reopening this issue, but the suggestion in the previous message only works for the first prompt. The next prompt fails again with NotImplementedError: Cannot copy out of meta tensor; no data!

Here's the modified version of the suggestion (which essentially just calls build_conditioning_tensor() again):

from diffusers import StableDiffusionPipeline
from compel import Compel

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipeline.to("cuda")
compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)

# 1. upweight "ball"
prompt = "a cat playing with a ball++ in the forest"
conditioning = compel.build_conditioning_tensor(prompt)
# or: conditioning = compel([prompt])

# enable offloading, then generate the image
pipeline.enable_sequential_cpu_offload()
images = pipeline(prompt_embeds=conditioning, num_inference_steps=4).images
images[0].save("image.jpg")

# ----- do this again -----
prompt = "a cat playing with a ball++ in the forest"
conditioning = compel.build_conditioning_tensor(prompt)

This fails with the same stack trace as in my first message.

It's very strange: in theory the accelerate hook should move the text_encoder back to the GPU, because compel calls the text_encoder via forward(), just like stable_diffusion_pipeline._encode_prompt() does. Yet the call fails in compel while it works in stable_diffusion_pipeline. Not sure what's going on.
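
For reference, a minimal check (a sketch of mine, assuming accelerate's standard sequential-offload behavior, where offloaded weights live on the "meta" device until a forward call):

from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipeline.enable_sequential_cpu_offload()

# the parameters have no data here; accelerate's pre_forward hook is what
# copies real weights to the execution device for each forward() call
print(next(pipeline.text_encoder.parameters()).device)  # expected: meta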

I even tried monkey-patching the Compel.device property to always return cuda:0 (my GPU), to avoid having it return 'meta' for calls to self.device inside compel, but that didn't make a difference.
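
The monkey-patch looked roughly like this (a sketch; compel's internal attribute layout may differ):

import torch
from compel import Compel

# force compel to report cuda:0 instead of the hooked module's device
# (this did NOT fix the error)
Compel.device = property(lambda self: torch.device("cuda:0"))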

Thanks for any help with this! :)


cmdr2 commented May 2, 2023

Created a PR for this: #31

After this PR, the following code should work:

from diffusers import StableDiffusionPipeline
from compel import Compel

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
# pipeline.to("cuda")
pipeline.enable_sequential_cpu_offload()

compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder, device="cuda:0")

# 1. upweight "ball"
prompt = "a cat playing with a ball++ in the forest"
conditioning = compel.build_conditioning_tensor(prompt)
# or: conditioning = compel([prompt])

# generate image
images = pipeline(prompt_embeds=conditioning, num_inference_steps=4).images
images[0].save("image.jpg")

# ----- do this again -----
prompt = "a cat playing with a ball++ in the forest"
conditioning = compel.build_conditioning_tensor(prompt)


cmdr2 commented May 2, 2023

The hacky temporary workaround is to move the text_encoder modules back to the GPU each time we need to build a tensor. I think this is what @patrickvonplaten meant?

This works for me (but PR #31 is obviously better!):

from diffusers import StableDiffusionPipeline
from compel import Compel

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
# pipeline.to("cuda")
pipeline.enable_sequential_cpu_offload()

compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)

# 1. upweight "ball"
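# manually trigger accelerate's pre_forward hook on each hooked submodule, so
# the offloaded weights are copied back onto the execution device (GPU)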
[m._hf_hook.pre_forward(m) for m in pipeline.text_encoder.modules() if hasattr(m, "_hf_hook")]
print("moved to", pipeline.text_encoder.device)

prompt = "a cat playing with a ball++ in the forest"
conditioning = compel.build_conditioning_tensor(prompt)
# or: conditioning = compel([prompt])

# generate image
images = pipeline(prompt_embeds=conditioning, num_inference_steps=4).images
images[0].save("image.jpg")

# ----- do this again -----
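# the pipeline call offloaded the weights again, so re-trigger the hooks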
[m._hf_hook.pre_forward(m) for m in pipeline.text_encoder.modules() if hasattr(m, "_hf_hook")]
print("moved to", pipeline.text_encoder.device)

prompt = "a cat playing with a ball++ in the forest"
conditioning = compel.build_conditioning_tensor(prompt)

damian0815 added a commit that referenced this issue May 2, 2023
Fix #23 - enable compel to work with enable_sequential_cpu_offload(), by optionally specifying the exact device to create the tensors on