
[Bug]: IPAdapter, RuntimeError: Expected query, key, and value to have the same dtype, but got query.dtype: c10::Half key.dtype: float and value.dtype: float instead. #2208

Closed
frankjiang opened this issue Nov 1, 2023 · 16 comments · Fixed by #2348
Labels: MacOS (MacOS related issue)

Comments

@frankjiang

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

What happened?

IPAdapter cannot run correctly.

Steps to reproduce the problem

  1. Img2Img
  2. ControlNet (latest)
  3. Choose IPAdapter
  4. Choose ip-adapter_clip_sd15 (default)
  5. Choose ip-adapter-plus-face_sd15 [71693645] (default)
  6. Add prompts
  7. Generate

What should have happened?

Images should be generated normally; instead, generation fails with a RuntimeError.

Commit where the problem happens

webui: 5ef669de080814067961f28357256e8fe27544f4
controlnet: 3011ff6

What browsers do you use to access the UI ?

No response

Command Line Arguments

None

List of enabled extensions

[screenshot of enabled extensions]

Console logs

*** Error completing request
*** Arguments: ('task(510w65ya0s7jt96)', 0, '', '', ['Asian Boy Portrait'], <PIL.Image.Image image mode=RGBA size=512x512 at 0x2A90DE920>, None, None, None, None, None, None, 20, 'DPM++ 2M Karras', 4, 0, 1, 1, 1, 7, 1.5, 0.75, 0, 512, 512, 1, 0, 0, 32, 0, '', '', '', [], False, [], '', <gradio.routes.Request object at 0x32fc100a0>, 0, False, '', 0.8, -1, False, -1, 0, 0, 0, False, 'MultiDiffusion', False, True, 1024, 1024, 96, 96, 48, 4, 'None', 2, False, 10, 1, 1, 64, False, False, False, False, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 512, 64, True, True, True, False, <scripts.animatediff_ui.AnimateDiffProcess object at 0x36fc1c880>, <scripts.controlnet_ui.controlnet_ui_group.UiControlNetUnit object at 0x32fb90400>, <scripts.controlnet_ui.controlnet_ui_group.UiControlNetUnit object at 0x32fb902b0>, <scripts.controlnet_ui.controlnet_ui_group.UiControlNetUnit object at 0x2afde87f0>, '* `CFG Scale` should be 2 or lower.', True, True, '', '', True, 50, True, 1, 0, False, 4, 0.5, 'Linear', 'None', '<p style="margin-bottom:0.75em">Recommended settings: Sampling Steps: 80-100, Sampler: Euler a, Denoising strength: 0.8</p>', 128, 8, ['left', 'right', 'up', 'down'], 1, 0.05, 128, 4, 0, ['left', 'right', 'up', 'down'], False, False, 'positive', 'comma', 0, False, False, '', '<p style="margin-bottom:0.75em">Will upscale the image by the selected scale factor; use width and height sliders to set tile size</p>', 64, 0, 2, 1, '', [], 0, '', [], 0, '', [], True, False, False, False, 0, False, None, None, False, None, None, False, None, None, False, 50) {}
    Traceback (most recent call last):
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/modules/call_queue.py", line 57, in f
        res = list(func(*args, **kwargs))
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/modules/call_queue.py", line 36, in f
        res = func(*args, **kwargs)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/modules/img2img.py", line 208, in img2img
        processed = process_images(p)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/modules/processing.py", line 732, in process_images
        res = process_images_inner(p)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/extensions/sd-webui-controlnet/scripts/batch_hijack.py", line 42, in processing_process_images_hijack
        return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/modules/processing.py", line 867, in process_images_inner
        samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/extensions/sd-webui-controlnet/scripts/hook.py", line 451, in process_sample
        return process.sample_before_CN_hack(*args, **kwargs)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/modules/processing.py", line 1528, in sample
        samples = self.sampler.sample_img2img(self, self.init_latent, x, conditioning, unconditional_conditioning, image_conditioning=self.image_conditioning)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/modules/sd_samplers_kdiffusion.py", line 188, in sample_img2img
        samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/modules/sd_samplers_common.py", line 261, in launch_sampling
        return func()
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/modules/sd_samplers_kdiffusion.py", line 188, in <lambda>
        samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/repositories/k-diffusion/k_diffusion/sampling.py", line 594, in sample_dpmpp_2m
        denoised = model(x, sigmas[i] * s_in, **extra_args)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/modules/sd_samplers_cfg_denoiser.py", line 169, in forward
        x_out = self.inner_model(x_in, sigma_in, cond=make_condition_dict(cond_in, image_cond_in))
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/repositories/k-diffusion/k_diffusion/external.py", line 112, in forward
        eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/repositories/k-diffusion/k_diffusion/external.py", line 138, in get_eps
        return self.inner_model.apply_model(*args, **kwargs)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/modules/sd_hijack_utils.py", line 17, in <lambda>
        setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/modules/sd_hijack_utils.py", line 26, in __call__
        return self.__sub_func(self.__orig_func, *args, **kwargs)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/modules/sd_hijack_unet.py", line 48, in apply_model
        return orig_func(self, x_noisy.to(devices.dtype_unet), t.to(devices.dtype_unet), cond, **kwargs).float()
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 858, in apply_model
        x_recon = self.model(x_noisy, t, **cond)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 1335, in forward
        out = self.diffusion_model(x, t, context=cc)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/extensions/sd-webui-controlnet/scripts/hook.py", line 858, in forward_webui
        raise e
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/extensions/sd-webui-controlnet/scripts/hook.py", line 855, in forward_webui
        return forward(*args, **kwargs)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/extensions/sd-webui-controlnet/scripts/hook.py", line 762, in forward
        h = module(h, emb, context)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py", line 84, in forward
        x = layer(x, context)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/attention.py", line 334, in forward
        x = block(x, context=context[i])
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/attention.py", line 269, in forward
        return checkpoint(self._forward, (x, context), self.parameters(), self.checkpoint)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/util.py", line 121, in checkpoint
        return CheckpointFunction.apply(func, len(inputs), *args)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
        return super().apply(*args, **kwargs)  # type: ignore[misc]
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/util.py", line 136, in forward
        output_tensors = ctx.run_function(*ctx.input_tensors)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/attention.py", line 273, in _forward
        x = self.attn2(self.norm2(x), context=context) + x
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/extensions/sd-webui-controlnet/scripts/controlmodel_ipadapter.py", line 246, in attn_forward_hacked
        out = out + f(self, x, q)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/extensions/sd-webui-controlnet/scripts/controlmodel_ipadapter.py", line 406, in forward
        ip_out = torch.nn.functional.scaled_dot_product_attention(q, ip_k, ip_v, attn_mask=None, dropout_p=0.0, is_causal=False)
    RuntimeError: Expected query, key, and value to have the same dtype, but got query.dtype: c10::Half key.dtype: float and value.dtype: float instead.

---

Additional information

Also occurs in other ip-adapter models, e.g. ip-adapter-plus_sd15 [c817b455]
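For anyone reproducing this outside the webui: torch.nn.functional.scaled_dot_product_attention requires query, key, and value to share a dtype. A minimal sketch (placeholder shapes, not the webui's actual tensors) triggers the same error:

```python
import torch
import torch.nn.functional as F

# Arbitrary placeholder shapes; only the dtypes matter here.
q = torch.randn(1, 8, 77, 64, dtype=torch.half)   # fp16 query, as with an fp16 UNet on MPS
k = torch.randn(1, 8, 16, 64, dtype=torch.float)  # fp32 key, like the unconverted ip_k
v = torch.randn(1, 8, 16, 64, dtype=torch.float)  # fp32 value, like the unconverted ip_v

# Raises: RuntimeError: Expected query, key, and value to have the same dtype ...
F.scaled_dot_product_attention(q, k, v, attn_mask=None, dropout_p=0.0, is_causal=False)
```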

@Seal-Pavel

same issue

@undeadx1 commented Nov 5, 2023

Same issue too. Env: M1 Mac.

@Idmon commented Nov 6, 2023

Same here. IP-Adapter has been buggy and I can't get it to work.

@Osato28 commented Nov 16, 2023

Same here. M1 Mac 8GB, Sonoma 14.1.1.

Information that might be related: Sonoma has previously caused an fp16-related issue with NeuralNet on PyTorch 2.1.0, but that particular problem was solved by updating to 2.2.0.dev20231012. (Issue AUTOMATIC1111/stable-diffusion-webui#13419)

Attempted solutions:

  • Launching SD with --no-half "fixes" the problem by forcing all fp16 values into fp32, but it also slows down each iteration by 8-12 times (from 2 to 16-20 seconds, in my case); see the sketch after this list.
  • UPD: Tried enabling the "Upcast cross attention layer to float32" option in Settings -> Stable Diffusion. Didn't work.
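In tensor terms, all --no-half changes is that the query now arrives in fp32 like the IP-Adapter tensors, so the dtype check passes. A minimal sketch (not webui code) of the passing case:

```python
import torch
import torch.nn.functional as F

# With --no-half nothing is cast to fp16, so query, key, and value all
# stay torch.float32 and the dtype check passes (at a large speed cost).
q = torch.randn(1, 8, 77, 64)  # fp32 by default
k = torch.randn(1, 8, 16, 64)
v = torch.randn(1, 8, 16, 64)
out = F.scaled_dot_product_attention(q, k, v)
print(out.dtype)  # torch.float32
```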

@beltonk commented Nov 17, 2023

Same here. M1 Max

@beltonk commented Nov 17, 2023

This works for me:

Patching https://github.com/Mikubill/sd-webui-controlnet/blob/main/scripts/controlmodel_ipadapter.py#L430
to
ip_out = torch.nn.functional.scaled_dot_product_attention(q, ip_k.half(), ip_v.half(), attn_mask=None, dropout_p=0.0, is_causal=False)

to convert ip_k & ip_v from float to c10::Half by adding .half() to each.

Although I'm not sure if this is the right thing to do, I'm able to generate images with SD 1.5 and SDXL with style transfer using ControlNet + IP Adapter.
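For clarity, the change amounts to the following; the unpatched line is the one shown in the traceback above, and exact line numbers may drift between extension versions:

```python
# Unpatched: on MPS the query arrives as fp16 while ip_k/ip_v stay fp32, so SDPA raises.
ip_out = torch.nn.functional.scaled_dot_product_attention(
    q, ip_k, ip_v, attn_mask=None, dropout_p=0.0, is_causal=False)

# Patched: cast the IP-Adapter key/value tensors to half so all three dtypes match.
ip_out = torch.nn.functional.scaled_dot_product_attention(
    q, ip_k.half(), ip_v.half(), attn_mask=None, dropout_p=0.0, is_causal=False)
```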

@huchenlei (Collaborator)

> This works for me:
>
> Patching https://github.com/Mikubill/sd-webui-controlnet/blob/main/scripts/controlmodel_ipadapter.py#L430 to ip_out = torch.nn.functional.scaled_dot_product_attention(q, ip_k.half(), ip_v.half(), attn_mask=None, dropout_p=0.0, is_causal=False)
>
> to convert ip_k & ip_v from float to c10::Half by adding .half() to each.
>
> Although I'm not sure if this is the right thing to do, I'm able to generate images with SD 1.5 and SDXL with style transfer using ControlNet + IP Adapter.

Can anyone verify this solution on their Mac? I do not have a macOS machine to verify this patch. I will merge this patch into the main branch once it is verified.

@huchenlei added the MacOS label on Nov 21, 2023
@Osato28 commented Nov 21, 2023

> This works for me:
>
> Patching https://github.com/Mikubill/sd-webui-controlnet/blob/main/scripts/controlmodel_ipadapter.py#L430 to ip_out = torch.nn.functional.scaled_dot_product_attention(q, ip_k.half(), ip_v.half(), attn_mask=None, dropout_p=0.0, is_causal=False) to convert ip_k & ip_v from float to c10::Half by adding .half() to each.
>
> Can anyone verify this solution on their Mac? I do not have a macOS machine to verify this patch. I will merge this patch into the main branch once it is verified.

I can't compare the results to an Nvidia machine, so I'm going to post a detailed report with image samples just in case this fix caused some weirdness that I can't detect.

My apologies if this response is a bit long; I'd rather be thorough than miss something that an Nvidia owner would notice.

TL;DR:

  1. Tested on txt2img and img2img. Didn't find any issues.
  2. Outputs in both modes are highly accurate and reproducible.
  3. The slowdown due to IPAdapter seems to be within 15% of the original s/it value.

Testing parameters:

Processor: M1 8GB.

OS: Sonoma 14.1.1.

PyTorch version: 2.2.0.dev20231012

Webui arguments on launch: --skip-torch-cuda-test --upcast-sampling --opt-sub-quad-attention --use-cpu interrogate.

Resolutions: 512x512 and 512x768.

IPAdapter settings: ip-adapter_clip -> ip-adapter-plus-face_sd15, Low VRAM, Control Weight 0.7, Steps 0.5-1.0.


Attaching XY grids below to display the results.

Model: Deliberate v2.

Sampler: DPM++ 2M Karras, sampling steps: 20.

Prompt: female nurse, black hair.

Negative prompt: nsfw, disfigured, (deformed), ugly, saturated, doll, cgi, calligraphy, mismatched eyes, poorly drawn, b&w, blurry, missing, ((malformed)), ((out of frame)), model, letters, mangled, old, surreal, ((bad anatomy)), ((deformed legs)), ((deformed arms)).

IPAdapter image:

[attached reference image: image (22)]

  1. 512x512. No issues. Average time per iteration: 1.555 s/it without ControlNet, 1.6 s/it with IPAdapter.

[XY grid: xyz_grid-0001-2734938831]

  2. 512x768. No issues. Average time per iteration: 2.75 s/it without ControlNet, 2.965 s/it with IPAdapter.

[XY grid: xyz_grid-0002-2734938831]

  3. Reproducibility test: generating from the same seed three times, IPAdapter turned on, to see if outputs differ from each other. No issues.

[XY grid: xyz_grid-0003-2734938831]

  4. img2img test (using only one seed, testing for accuracy and reproducibility at the same time). No issues.

[XY grid: xyz_grid-0001-2734938831]

@beltonk commented Nov 21, 2023

@Osato28 So the fix works for you too, right? Do you spot anything weird in your generations?

Your generations look pretty cool to me. I'm bad at tuning settings for nice outputs...

If the output does work on Apple Silicon, my only concern is the --upcast-sampling, --no-half settings, etc. I have a feeling they are related to the error, and simply typecasting with .half() might break users not on Apple Silicon. I only have an M1 Max, so I'm unable to test on other PC/GPU/CPU setups...

By the way, my COMMANDLINE_ARGS is:

"--skip-torch-cuda-test --upcast-sampling --opt-sub-quad-attention --medvram --use-cpu Interrogate --no-half-vae --disable-safe-unpickle --autolaunch",

which I thought was optimized for Apple Silicon.
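One way to sidestep that concern (an untested sketch, not the patch that was merged; the ip_attention wrapper name is made up for illustration) would be to follow q's dtype instead of hard-coding .half(), so fp32 (--no-half) and fp16 setups both end up with matching dtypes:

```python
import torch
import torch.nn.functional as F

def ip_attention(q: torch.Tensor, ip_k: torch.Tensor, ip_v: torch.Tensor) -> torch.Tensor:
    """Hypothetical dtype-agnostic variant: cast key/value to whatever
    dtype the query arrives in, rather than hard-coding .half()."""
    return F.scaled_dot_product_attention(
        q, ip_k.to(dtype=q.dtype), ip_v.to(dtype=q.dtype),
        attn_mask=None, dropout_p=0.0, is_causal=False)

# fp16 query with fp32 key/value, the failing case from this issue
# (run on a backend with fp16 support, e.g. MPS or CUDA):
out = ip_attention(torch.randn(1, 8, 77, 64, dtype=torch.half),
                   torch.randn(1, 8, 16, 64),
                   torch.randn(1, 8, 16, 64))
print(out.dtype)  # torch.float16
```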

@Osato28 commented Nov 21, 2023

@beltonk I didn't spot anything weird and I can't test it on non-Apple Silicon.

Hence the overly detailed test results: I'm hoping that if there is anything weird, it will be caught by someone with a more traditional GPU.

Thank you for posting that fix, by the way. I couldn't make heads or tails of how IPAdapter worked, and I didn't have the courage to blindly typecast values until the error message went away.


Off-topic:

  1. Prettiness is not due to prompt engineering but due to the model, Deliberate v2. It's as stable and balanced as models get; it would probably give better results with a shorter negative prompt, I just stopped optimizing that prompt halfway.

  2. As for COMMANDLINE_ARGS, I simply kept the most minimal set that prevented crashes and kept performance reasonably high. I didn't optimize it besides that. --medvram does seem to improve performance with heavier ControlNet models, though; added it to my args, thank you.

But I'm afraid that both of those discussions are outside the scope of this issue.

If you wish to initiate testing on several Apple Silicon machines to find an optimal set of COMMANDLINE_ARGS, I think it would be better to start a separate discussion in the main AUTOMATIC1111 repo.

@axeldelafosse

Thank you @beltonk -- your fix worked for me too!

@Lichtfabrik

Thx @beltonk -- works for me as well!

@Osniackal

@beltonk's fix worked for me on an M2 Mac mini.

@MrSegundus

Worked here! (Mac, M2 / 1111 v 1.7)

@alamyrjunior

> This works for me:
>
> Patching https://github.com/Mikubill/sd-webui-controlnet/blob/main/scripts/controlmodel_ipadapter.py#L430 to ip_out = torch.nn.functional.scaled_dot_product_attention(q, ip_k.half(), ip_v.half(), attn_mask=None, dropout_p=0.0, is_causal=False)
>
> to convert ip_k & ip_v from float to c10::Half by adding .half() to each.
>
> Although I'm not sure if this is the right thing to do, I'm able to generate images with SD 1.5 and SDXL with style transfer using ControlNet + IP Adapter.

Which file should I change? I can't find controlmodel_ipadapter.py.

@xuyang16 commented Aug 5, 2024

Thank you, @huchenlei!
#2348
