error: Input types are not broadcast compatible #1890

Closed
Karsten385 opened this issue Oct 7, 2022 · 32 comments
Labels: bug-report (Report of a bug, yet to be confirmed)

Comments

@Karsten385 commented Oct 7, 2022

Describe the bug
When attempting to generate an image batch, I get this error:

Batch 1 out of 4:   0%|          | 0/30 [00:00<?, ?it/s]loc("mps_add"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/20d6c351-ee94-11ec-bcaf-7247572f23b4/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":219:0)): error: input types 'tensor<2x1280xf32>' and 'tensor<*xf16>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
zsh: abort      python webui.py
(web-ui) karsten@MacBook-Pro-2020 stable-diffusion-webui % /Users/karsten/miniconda3/envs/web-ui/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
This happens whenever I try to generate a batch of images.

Expected behavior
I'd expect it to run smoothly.


Desktop:

  • OS: macOS Monterey, running on a 2020 M1 MacBook Pro
  • Browser: Google Chrome
  • Commit revision: 2995107
@Karsten385 added the bug-report label Oct 7, 2022
@demetrio commented Oct 9, 2022

I'm having a similar problem. I'm using Python 3.10.6 on a Mac M1. This is the error that I'm getting:

loc("mps_add"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/a0876c02-1788-11ed-b9c4-96898e02b808/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":219:0)): error: input types 'tensor<2x1280xf32>' and 'tensor<*xf16>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
./webui.sh: line 141: 27885 Abort trap: 6           "${python_cmd}" "${LAUNCH_SCRIPT}"
/Users/{usename}/.pyenv/versions/3.10.6/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

I modified the following files to get this project (stable-diffusion-webui) running:

# launch.py
torch_command = os.environ.get('TORCH_COMMAND', "pip install torch==1.12.1 torchvision==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu113")
commandline_args = os.environ.get('COMMANDLINE_ARGS', "--skip-torch-cuda-test")

# requirements.txt (replacing functorch==0.2.1)
diffusers==0.4.1
git+https://github.com/pytorch/functorch@release/0.2

# webui-user.sh
# Commandline arguments for webui.py, for example: export COMMANDLINE_ARGS="--medvram --opt-split-attention"
export COMMANDLINE_ARGS="--skip-torch-cuda-test"

# webui.sh
# Do not reinstall existing pip packages on Debian/Ubuntu
export PIP_IGNORE_INSTALLED=0
export PYTORCH_ENABLE_MPS_FALLBACK=1
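
For what it's worth, a quick sanity check before launching the UI to confirm the MPS backend is usable at all (standard PyTorch APIs; nothing webui-specific is assumed here):

import torch

# True if this torch build was compiled with MPS support
print(torch.backends.mps.is_built())
# True if macOS and the Metal device are actually usable at runtime
print(torch.backends.mps.is_available())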

@amitsangani

Were you able to resolve this issue?

@demetrio

@amitsangani Unfortunately, no.
After searching I found other ways to install it, and even tried with newer versions that supposedly solve this problem, but I couldn't get it to work.

@algebris commented Jan 28, 2023

I've got the same issue on an M1 Mac while trying to train a hypernetwork. Run-time error:

loc("mps_add"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/a0876c02-1788-11ed-b9c4-96898e02b808/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":219:0)): error: input types 'tensor<1x4096x640xf16>' and 'tensor<*xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
[1]    8281 abort      ./webui.sh
/opt/homebrew/Cellar/python@3.10/3.10.9/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
{
    "datetime": "2023-01-29 02:31:36",
    "model_name": "v1-5-pruned-emaonly",
    "model_hash": "cc6cb27103",
    "num_of_dataset_images": 21,
    "layer_structure": [
        1.0,
        2.0,
        1.0
    ],
    "activation_func": "linear",
    "weight_init": "Normal",
    "add_layer_norm": false,
    "use_dropout": false,
    "hypernetwork_name": "alg",
    "learn_rate": "0.00005",
    "batch_size": 1,
    "gradient_step": 1,
    "data_root": "/Users/alg/Downloads/sl-model/preprocessed",
    "log_directory": "textual_inversion/2023-01-29/sl",
    "training_width": 512,
    "training_height": 512,
    "steps": 2000,
    "clip_grad_mode": "disabled",
    "clip_grad_value": "0.1",
    "latent_sampling_method": "once",
    "create_image_every": 100,
    "save_hypernetwork_every": 100,
    "template_file": "/Users/alg/Sites/stable-diffusion-webui/textual_inversion_templates/style_filewords.txt",
    "initial_step": 0
}

@duendevl

Same error here

@alexshk commented Jan 31, 2023

Same issue. Mac M1

@jvsteiner

Same here; it happens when trying to use a prepared hypernetwork.

@TaciteOFF

I'm on an M1 Pro and I get the error too when I try to train an embedding:

(mpsFileLoc): /AppleInternal/Library/BuildRoots/9e200cfa-7d96-11ed-886f-a23c4f261b56/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm:228:0: error: 'mps.add' op requires the same element type for all operands and results
(mpsFileLoc): /AppleInternal/Library/BuildRoots/9e200cfa-7d96-11ed-886f-a23c4f261b56/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm:228:0: note: see current operation: %5 = "mps.add"(%4, %arg2) : (tensor<1x4096x320xf32>, tensor<xf16>) -> tensor<xf32>
zsh: segmentation fault  ./webui.sh

@TheBrain0110

Same here, but only when using a hypernetwork. I can run a standard model and generate images just fine, but when I try to add a hypernetwork I get the error.

(mpsFileLoc): /AppleInternal/Library/BuildRoots/9e200cfa-7d96-11ed-886f-a23c4f261b56/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm:228:0: error: 'mps.add' op requires the same element type for all operands and results
(mpsFileLoc): /AppleInternal/Library/BuildRoots/9e200cfa-7d96-11ed-886f-a23c4f261b56/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm:228:0: note: see current operation: %5 = "mps.add"(%4, %arg2) : (tensor<2x4096x640xf16>, tensor<*xf32>) -> tensor<*xf16>
[1]    59615 segmentation fault  ./webui.sh

The above is with the default torch 1.12.1.
Interestingly, I tried upgrading to the latest torch 2.0.0.dev20230203 nightly to see if that would help, and it gives a similar but slightly different error:

loc("mps_add"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/9e200cfa-7d96-11ed-886f-a23c4f261b56/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":228:0)): error: input types 'tensor<2x4096x640xf16>' and 'tensor<640xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
[1]    61695 abort      ./webui.sh

@TheBrain0110

Hey @brkirch, you seem to know a thing or two about MPS; any thoughts on this?

@TheBrain0110

Ok, so adding --no-half to $COMMANDLINE_ARGS does fix hypernetworks for me. I'd guess the type error comes from the half-precision conversions being applied to the main model but not the hypernetwork addon?

Anyway, I suppose I can live with the performance penalty that --no-half adds. It brings memory usage up from ~8GB to ~14GB, and reduces speed from ~3.4it/s to ~2.8it/s. Not great, but not terrible.
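
For illustration, a minimal sketch of the mixed-dtype add that --no-half sidesteps. This is not webui code; the shapes are borrowed from the logs above, and on an affected torch/MPS build the commented-out line is the kind of op that aborts:

import torch

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# Half-precision activations from the main model (shape taken from the error logs)
hidden = torch.randn(2, 4096, 640, dtype=torch.float16, device=device)
# Hypernetwork output left in float32
hyper_out = torch.randn(640, dtype=torch.float32, device=device)

# On affected builds, this mixed-dtype add is what crashes with
# "input types ... are not broadcast compatible":
# result = hidden + hyper_out

# Keeping both operands in one dtype avoids it, which is effectively what
# --no-half does globally by running everything in float32:
result = hidden + hyper_out.to(hidden.dtype)
print(result.dtype, result.shape)  # torch.float16 torch.Size([2, 4096, 640])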

@harry-pham-wise

I can confirm that adding --no-half to $COMMANDLINE_ARGS fixed this issue for me on a Mac M1.

@NathanSiemers

In diffusers' Stable Diffusion, a recent upgrade started giving me this error. I had to remove the torch arguments that requested fp16:
torch_dtype=torch.float16, revision="fp16"
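
A hedged sketch of what that change looks like in diffusers (the model ID here is illustrative, not taken from the comment):

import torch
from diffusers import StableDiffusionPipeline

# This fp16 load is the kind of call that triggers the MPS error:
# pipe = StableDiffusionPipeline.from_pretrained(
#     "runwayml/stable-diffusion-v1-5",
#     torch_dtype=torch.float16,
#     revision="fp16",
# )

# Loading in the default float32 avoids the mixed-dtype add on MPS:
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("mps")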

@JumpIntoCoding

Same error:
To create a public link, set share=True in launch().
Textual inversion embeddings loaded(1): MetaTiger
Preprocessing [Image 11/12]: 100%|██████████████| 12/12 [00:01<00:00, 11.68it/s]
Training at rate of 0.005 until step 100
Preparing dataset...
100%|███████████████████████████████████████████| 10/10 [00:05<00:00, 1.76it/s]
0%| | 0/100000 [00:00<?, ?it/s]loc("mps_add"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/810eba08-405a-11ed-86e9-6af958a02716/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":228:0)): error: input types 'tensor<1x4096x320xf32>' and 'tensor<*xf16>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
zsh: abort /Users/a26911/Desktop/AI/NovelAI/stable-diffusion-webui/webui.sh
/Users/a26911/miniforge3/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '

@pjq commented Feb 20, 2023

@JumpIntoCoding After adding --no-half, it works:

./webui.sh --share --skip-torch-cuda-test --no-half
Launching Web UI with arguments: --share --no-half --skip-version-check

It showed a similar error previously:

Applying cross attention optimization (InvokeAI).
Textual inversion embeddings loaded(0):
Model loaded in 2.3s (create model: 0.6s, apply weights to model: 0.5s, apply half(): 0.5s, move model to device: 0.4s).
Running on local URL:  http://127.0.0.1:7860
  0%|                                                                                        | 0/20 [00:00<?, ?it/s]loc("mps_add"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/0aa643d0-625a-11ed-b319-a23c4f261b56/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":228:0)): error: input types 'tensor<2x1280xf32>' and 'tensor<*xf16>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
[1]    24944 abort      ./webui.sh --share --skip-torch-cuda-test
/opt/homebrew/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

@JumpIntoCoding

@pjq Wow, I guess it's working now after adding --no-half.
The training process has started on my Mac M1 (Apple silicon).
A big thank you ( ̄︶ ̄)

@fork4jm commented May 10, 2023

./webui.sh --no-half, it works

@Moltes74

Thanks a lot!! @fork4jm

@ShujiMishima

./webui.sh --no-half, it works

It works for me too. Thank you!

@XFLOWofficial

./webui.sh --no-half, it works

Thanks a lot. Question about relaunching 1111: do I have to go through all the steps in the command every time I relaunch it?

@ianscrivener

A1111 is again crashing on macOS M2, related to memory/RAM. 😞

@alex-the-programmer

Adding --no-half fixed the issue for me. Side topic: there's now a PyTorch optimized for the Apple Silicon GPU. Is there a way to use that version on a Mac?

@ianscrivener commented Jun 28, 2023

@alex-the-programmer
There IS a PyTorch optimized for Apple Silicon GPU!

See https://github.com/abetlen/llama-cpp-python/blob/main/docs/install/macos.md. That is the setup I use for Python GPU/NPU work: a conda version for macOS that supports the Metal GPU plus the latest nightly PyTorch build.

By default, A1111 manages the Python environment itself using venv... though I have a bash script that I periodically run to get the latest nightly PyTorch. The A1111 venv setup DOES enable macOS GPUs.

Though the A1111 way of updating to the macOS PyTorch nightly would be to edit webui-user.sh with this line:
export TORCH_COMMAND="pip3 install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cpu --upgrade"

You can see the GPU being used when you run A1111; I use https://github.com/tlkh/asitop to monitor it.
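
A small follow-up check (assuming an MPS-enabled nightly build is installed) to confirm the upgrade took and that tensors actually land on the Metal device:

import torch

print(torch.__version__)  # should report a dev/nightly version after the upgrade
if torch.backends.mps.is_available():
    x = torch.ones(3, device="mps")
    print(x.device)  # mps:0 when the GPU is in use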

@ianscrivener commented Jun 28, 2023

That said, the new A1111 version v1.4.0 is completely crashing Python 😞

@taddyb commented Jun 30, 2023

Not sure if this is related, since the IDE is different and I'm not using the same code, but I've been getting the same error on my MacBook Air M1 (8GB RAM) when developing a CPU PyTorch application on my local machine. To be specific, the IDE was PyCharm and the error was:

/Users/taddbindas/miniconda3/envs/lgar/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

To fix this, I went to my debugger settings and made sure "Collect run-time types of information for code insight" was enabled. That resolved the issue.

Here is a screenshot of the panel (image not reproduced here).

@shanginn mentioned this issue Jul 16, 2023
@catboxanon (Collaborator)

Closing as stale.

@nafets33 commented Sep 7, 2023

Sorry for the silly question, but how do I invoke the command ./webui.sh --share --skip-torch-cuda-test --no-half?

@alexshk commented Sep 7, 2023

You can put those in ./webui-macos-env.sh

@ShujiMishima

./webui.sh --no-half, it works

Thanks a lot. Question about relaunching 1111: do I have to go through all the steps in the command every time I relaunch it?

A1111 v1.6 has an automatic Web UI launch function.

nne998 pushed a commit to fjteam/stable-diffusion-webui that referenced this issue Sep 26, 2023
@lemonsz15

./webui-macos-env.sh

There's no such file.

@lemonsz15

Sorry for the silly question, but how do I invoke the command ./webui.sh --share --skip-torch-cuda-test --no-half?

Have you found any solution for this? I'm facing the same issue.

@codefatherdev
(quoting @ianscrivener's nightly-PyTorch instructions above)

Works for me, thanks.
