
[Bug]: NotImplementedError: No operator found for memory_efficient_attention_forward with inputs: query #10429

Closed
WYS-Stack opened this issue May 16, 2023 · 4 comments
Labels: not-an-issue (This issue is not with the repo itself.)

Comments

@WYS-Stack

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What happened?

When I generate an image, just as the progress bar is about to finish, the web UI page reports the following error:

NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
     query       : shape=(1, 4096, 1, 512) (torch.float32)
     key         : shape=(1, 4096, 1, 512) (torch.float32)
     value       : shape=(1, 4096, 1, 512) (torch.float32)
     attn_bias   : <class 'NoneType'>
     p           : 0.0
`cutlassF` is not supported because:
    device=mps (supported: {'cuda'})
`flshattF` is not supported because:
    device=mps (supported: {'cuda'})
    dtype=torch.float32 (supported: {torch.bfloat16, torch.float16})
    max(query.shape[-1] != value.shape[-1]) > 128
    Operator wasn't built - see `python -m xformers.info` for more info
`tritonflashattF` is not supported because:
    device=mps (supported: {'cuda'})
    dtype=torch.float32 (supported: {torch.bfloat16, torch.float16})
    max(query.shape[-1] != value.shape[-1]) > 128
    Operator wasn't built - see `python -m xformers.info` for more info
    triton is not available
`smallkF` is not supported because:
    device=mps (supported: {'cpu', 'cuda'})
    max(query.shape[-1] != value.shape[-1]) > 32
    unsupported embed per head: 512

Steps to reproduce the problem

  1. Go to ....
  2. Press ....
  3. ...

What should have happened?

Normal operation; the image should finish generating without errors.

Commit where the problem happens

89f9faa

What platforms do you use to access the UI?

macOS

What browsers do you use to access the UI?

Microsoft Edge

Command Line Arguments

./webui.sh --xformers

List of extensions

Console logs

(stable-diffusion-webui) wanghan@bogon stable-diffusion-webui % ./webui.sh --xformers

################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye)
################################################################

################################################################
Running on wanghan user
################################################################

################################################################
Repo already cloned, using it as install directory
################################################################

################################################################
Create and activate python venv
################################################################

################################################################
Launching launch.py...
################################################################
Python 3.10.9 (main, Mar  8 2023, 04:44:30) [Clang 14.0.6 ]
Version: v1.2.1
Commit hash: 89f9faa63388756314e8a1d96cf86bf5e0663045
Fetching updates for K-diffusion...
Checking out commit for K-diffusion with hash: 51c9778f269cedb55a4d88c79c0246d35bdadb71...
Installing requirements

Installing requirements 1 for Infinite-Zoom



Launching Web UI with arguments: --xformers --skip-torch-cuda-test --upcast-sampling --no-half-vae --use-cpu interrogate
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.1.0.dev20230511 with CUDA None (you have 2.0.1)
    Python  3.10.9 (you have 3.10.9)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
Warning: caught exception 'Torch not compiled with CUDA enabled', memory monitor disabled
=================================================================================
You are running xformers 0.0.16.
The program is tested to work with xformers 0.0.17.
To reinstall the desired version, run with commandline flag --reinstall-xformers.

Use --skip-version-check commandline argument to disable this check.
=================================================================================
dirname:  /Users/wanghan/Desktop/code/stable-diffusion-webui/localizations
localizations:  {'zh_CN': '/Users/wanghan/Desktop/code/stable-diffusion-webui/localizations/zh_CN.json'}
ControlNet v1.1.173
ControlNet v1.1.173
Image Browser: ImageReward is not installed, cannot be used.
Loading weights [6ce0161689] from /Users/wanghan/Desktop/code/stable-diffusion-webui/models/Stable-diffusion/官方/v1-5-pruned-emaonly.safetensors
Creating model from config: /Users/wanghan/Desktop/code/stable-diffusion-webui/configs/v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 5.0s (import torch: 0.9s, import gradio: 0.8s, import ldm: 0.3s, other imports: 0.7s, load scripts: 0.8s, create ui: 1.3s, gradio launch: 0.1s).
Applying cross attention optimization (InvokeAI).
Textual inversion embeddings loaded(3): bad-picture-chill-75v, pureerosface_v1, Style-Winter
Model loaded in 9.7s (load weights from disk: 0.3s, create model: 1.0s, apply weights to model: 5.4s, apply half(): 2.2s, move model to device: 0.6s).
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:18<00:00,  1.07it/s]
Error completing request█████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:15<00:00,  1.23it/s]
Arguments: ('task(sdcobvq71tr3pnt)', 'a girl,', '', [], 20, 0, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, [], 0, False, 'MultiDiffusion', False, True, 1024, 1024, 96, 96, 48, 4, 'None', 2, False, 10, 1, 1, 64, False, False, False, False, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 512, 64, True, True, True, False, <controlnet.py.UiControlNetUnit object at 0x28ab2d960>, <controlnet.py.UiControlNetUnit object at 0x28ab2d1e0>, <controlnet.py.UiControlNetUnit object at 0x28ab2d150>, False, 1, 0.15, False, 'OUT', ['OUT'], 5, 0, 'Bilinear', False, 'Bilinear', False, 'Lerp', '', '', False, False, None, True, False, False, 'positive', 'comma', 0, False, False, '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, 0, None, False, None, False, None, False, 50) {}
Traceback (most recent call last):
  File "/Users/wanghan/Desktop/code/stable-diffusion-webui/modules/call_queue.py", line 57, in f
    res = list(func(*args, **kwargs))
  File "/Users/wanghan/Desktop/code/stable-diffusion-webui/modules/call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "/Users/wanghan/Desktop/code/stable-diffusion-webui/modules/txt2img.py", line 56, in txt2img
    processed = process_images(p)
  File "/Users/wanghan/Desktop/code/stable-diffusion-webui/modules/processing.py", line 526, in process_images
    res = process_images_inner(p)
  File "/Users/wanghan/Desktop/code/stable-diffusion-webui/extensions/sd-webui-controlnet/scripts/batch_hijack.py", line 42, in processing_process_images_hijack
    return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
  File "/Users/wanghan/Desktop/code/stable-diffusion-webui/modules/processing.py", line 682, in process_images_inner
    x_samples_ddim = [decode_first_stage(p.sd_model, samples_ddim[i:i+1].to(dtype=devices.dtype_vae))[0].cpu() for i in range(samples_ddim.size(0))]
  File "/Users/wanghan/Desktop/code/stable-diffusion-webui/modules/processing.py", line 682, in <listcomp>
    x_samples_ddim = [decode_first_stage(p.sd_model, samples_ddim[i:i+1].to(dtype=devices.dtype_vae))[0].cpu() for i in range(samples_ddim.size(0))]
  File "/Users/wanghan/Desktop/code/stable-diffusion-webui/modules/processing.py", line 444, in decode_first_stage
    x = model.decode_first_stage(x)
  File "/Users/wanghan/Desktop/code/stable-diffusion-webui/modules/sd_hijack_utils.py", line 17, in <lambda>
    setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
  File "/Users/wanghan/Desktop/code/stable-diffusion-webui/modules/sd_hijack_utils.py", line 26, in __call__
    return self.__sub_func(self.__orig_func, *args, **kwargs)
  File "/Users/wanghan/Desktop/code/stable-diffusion-webui/modules/sd_hijack_unet.py", line 76, in <lambda>
    first_stage_sub = lambda orig_func, self, x, **kwargs: orig_func(self, x.to(devices.dtype_vae), **kwargs)
  File "/Users/wanghan/Desktop/code/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/wanghan/Desktop/code/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 826, in decode_first_stage
    return self.first_stage_model.decode(z)
  File "/Users/wanghan/Desktop/code/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/autoencoder.py", line 90, in decode
    dec = self.decoder(z)
  File "/Users/wanghan/Desktop/code/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/wanghan/Desktop/code/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/model.py", line 631, in forward
    h = self.mid.attn_1(h)
  File "/Users/wanghan/Desktop/code/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/wanghan/Desktop/code/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/model.py", line 258, in forward
    out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=None, op=self.attention_op)
  File "/Users/wanghan/Desktop/code/stable-diffusion-webui/venv/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 197, in memory_efficient_attention
    return _memory_efficient_attention(
  File "/Users/wanghan/Desktop/code/stable-diffusion-webui/venv/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 293, in _memory_efficient_attention
    return _memory_efficient_attention_forward(
  File "/Users/wanghan/Desktop/code/stable-diffusion-webui/venv/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 309, in _memory_efficient_attention_forward
    op = _dispatch_fw(inp)
  File "/Users/wanghan/Desktop/code/stable-diffusion-webui/venv/lib/python3.10/site-packages/xformers/ops/fmha/dispatch.py", line 95, in _dispatch_fw
    return _run_priority_list(
  File "/Users/wanghan/Desktop/code/stable-diffusion-webui/venv/lib/python3.10/site-packages/xformers/ops/fmha/dispatch.py", line 70, in _run_priority_list
    raise NotImplementedError(msg)
NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
     query       : shape=(1, 4096, 1, 512) (torch.float32)
     key         : shape=(1, 4096, 1, 512) (torch.float32)
     value       : shape=(1, 4096, 1, 512) (torch.float32)
     attn_bias   : <class 'NoneType'>
     p           : 0.0
`cutlassF` is not supported because:
    device=mps (supported: {'cuda'})
`flshattF` is not supported because:
    device=mps (supported: {'cuda'})
    dtype=torch.float32 (supported: {torch.bfloat16, torch.float16})
    max(query.shape[-1] != value.shape[-1]) > 128
`tritonflashattF` is not supported because:
    device=mps (supported: {'cuda'})
    dtype=torch.float32 (supported: {torch.bfloat16, torch.float16})
    max(query.shape[-1] != value.shape[-1]) > 128
    triton is not available
`smallkF` is not supported because:
    device=mps (supported: {'cuda', 'cpu'})
    max(query.shape[-1] != value.shape[-1]) > 32
    unsupported embed per head: 512

Additional information

Operating system: macOS 13.3.1
PyTorch version: 2.0.0 / 2.0.1 (I've tried both)
xformers version: 0.0.16
Python version: 3.10.9
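Given the versions above, a quick sanity check (a sketch; the printed values depend on your install) confirms which backends PyTorch actually sees. xformers' attention kernels require CUDA, which is never present on a Mac, while the Apple-GPU backend shows up as `mps`:

```python
# Report the backends PyTorch can use on this machine.
import torch

print("torch version :", torch.__version__)
print("mps available :", torch.backends.mps.is_available())   # Apple-GPU backend
print("cuda available:", torch.cuda.is_available())           # what xformers needs
```

On an Apple Silicon Mac with a working install, this prints `mps available : True` and `cuda available: False`, which is exactly the combination the error message complains about.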

@WYS-Stack added the bug-report label on May 16, 2023
@Sakura-Luna added the duplicate and not-an-issue labels and removed the bug-report label on May 16, 2023
@Sakura-Luna (Collaborator)

xformers only supports Nvidia GPUs.
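To illustrate the alternative on Apple Silicon: PyTorch 2.x ships a native `torch.nn.functional.scaled_dot_product_attention` that runs on CPU and `mps`, and can stand in for the failing `xformers.ops.memory_efficient_attention` call. The shapes below are taken from the error message; the wrapper itself is a sketch under that assumption, not webui code:

```python
# Sketch: replace the xformers call with PyTorch's built-in SDPA,
# which has CPU/mps support that xformers' CUDA kernels lack.
import torch
import torch.nn.functional as F

def attention(q, k, v):
    # xformers uses (batch, seq_len, heads, head_dim); SDPA expects
    # (batch, heads, seq_len, head_dim), so transpose around the call.
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))
    out = F.scaled_dot_product_attention(q, k, v)
    return out.transpose(1, 2)

# The exact shapes from the NotImplementedError above:
q = k = v = torch.randn(1, 4096, 1, 512)
out = attention(q, k, v)
print(out.shape)  # torch.Size([1, 4096, 1, 512])
```

This computes the same softmax(QKᵀ/√d)·V result; which fused backend SDPA picks (flash, memory-efficient, or plain math) depends on device and dtype.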

@Sakura-Luna closed this as not planned on May 16, 2023
@WYS-Stack (Author)

> xformers only supports Nvidia GPUs.

I understand. Are there any other optimization or acceleration options on Mac?

@Sakura-Luna (Collaborator)

Discussion of macOS is centralized in #5461.

@arvin-chou

> xformers only supports Nvidia GPUs.

> I understand. Are there any other optimization or acceleration options on Mac?

A crude hack:

1. Install xformers==0.0.20.dev539.
2. Force `make_attn` to use the CPU (the region marked with a red rectangle in the screenshots) by modifying repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/model.py.

[screenshots of the modified model.py]
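The device-shuttling part of this hack can be sketched as a small helper: move the tensors to CPU, run the unsupported op there, and move the result back. The helper name and the `torch.add` stand-in for the xformers attention call are illustrative, not upstream code:

```python
# Sketch of the "force it onto CPU" workaround: run an op that lacks an
# mps kernel on CPU copies of its inputs, then return to the original device.
import torch

def run_on_cpu(op, *tensors):
    device = tensors[0].device
    out = op(*(t.cpu() for t in tensors))  # compute where the kernel exists
    return out.to(device)                  # hand the result back to mps/cuda

# torch.add stands in here for the unsupported attention call:
a = torch.ones(2, 3)
b = torch.ones(2, 3)
print(run_on_cpu(torch.add, a, b))  # tensor of all 2s
```

The obvious cost is a device-to-host round trip per call, which is why this is a stopgap rather than a fix.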
