StableDiffusion3Pipeline RuntimeError: c10::Half != c10::BFloat16 #11117

DataSailor · 2025-03-19T09:58:48Z

Describe the bug

I tried to generate images with Stable Diffusion 3.5 medium using the official example script, copied unchanged.

When I tried to run it, I got an error:
RuntimeError: expected mat1 and mat2 to have the same dtype, but got: c10::Half != c10::BFloat16.

When I changed torch_dtype=torch.bfloat16 in the script to torch_dtype=torch.float16, the script ran correctly.

This is very strange, because there are no torch.Tensor inputs in the script.

Reproduction

import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3.5-medium", torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")

image = pipe(
    "A capybara holding a sign that reads Hello World",
    num_inference_steps=40,
    guidance_scale=4.5,
).images[0]
image.save("capybara.png")

Logs

Traceback (most recent call last):
  File "/path/to/project/try.py", line 7, in <module>
    image = pipe(
            ^^^^^
  File "/path/to/project/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/project/.venv/lib/python3.11/site-packages/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py", line 969, in __call__
    ) = self.encode_prompt(
        ^^^^^^^^^^^^^^^^^^^
  File "/path/to/project/.venv/lib/python3.11/site-packages/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py", line 437, in encode_prompt
    prompt_embed, pooled_prompt_embed = self._get_clip_prompt_embeds(
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/project/.venv/lib/python3.11/site-packages/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py", line 325, in _get_clip_prompt_embeds
    prompt_embeds = text_encoder(text_input_ids.to(device), output_hidden_states=True)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/project/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/project/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/project/.venv/lib/python3.11/site-packages/transformers/models/clip/modeling_clip.py", line 1490, in forward
    text_embeds = self.text_projection(pooled_output)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/project/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/project/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/project/.venv/lib/python3.11/site-packages/torch/nn/modules/linear.py", line 117, in forward
    return F.linear(input, self.weight, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: expected mat1 and mat2 to have the same dtype, but got: c10::Half != c10::BFloat16

System Info

- 🤗 Diffusers version: 0.32.2
- Platform: Linux-6.11.0-19-generic-x86_64-with-glibc2.39
- Running on Google Colab?: No
- Python version: 3.11.11
- PyTorch version (GPU?): 2.4.1+cu121 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.29.3
- Transformers version: 4.49.0
- Accelerate version: 1.5.2
- PEFT version: not installed
- Bitsandbytes version: not installed
- Safetensors version: 0.5.3
- xFormers version: not installed
- Accelerator: NVIDIA GeForce RTX 3090, 24576 MiB
NVIDIA GeForce RTX 3090, 24576 MiB
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>

Who can help?

No response

MrTom34 · 2025-03-19T11:08:18Z

I guess the CLIP text encoder's dtype in StableDiffusion3Pipeline is set to torch.float16 by default. You can try checking the pipeline's components to find out why.
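A minimal sketch of that check, for anyone who wants to see which component disagrees with the requested dtype. The helper name `component_dtypes` is hypothetical (not part of diffusers); it only assumes that model components expose a `parameters()` method, as `torch.nn.Module` does, and that non-model components (tokenizers, scheduler) do not.

```python
def component_dtypes(components):
    """Map each component name to the sorted list of parameter dtypes it holds.

    `components` is any dict of name -> object, for example the dict returned
    by `pipe.components` on a diffusers pipeline. Objects without a callable
    `parameters()` (tokenizers, schedulers, None) are skipped.
    """
    report = {}
    for name, module in components.items():
        params = getattr(module, "parameters", None)
        if not callable(params):
            continue  # not a model: nothing to inspect
        # Collect the distinct dtypes as strings so they are easy to print.
        report[name] = sorted({str(p.dtype) for p in params()})
    return report
```

With the pipeline from the reproduction script loaded, `for name, dtypes in component_dtypes(pipe.components).items(): print(name, dtypes)` prints one line per model component. A component that lists more than one dtype, or a dtype other than the one passed as `torch_dtype`, is a likely source of the mismatch.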

ishan-modi · 2025-03-19T13:25:32Z

Please track this PR; once it is available in the latest release, you should be good to go. If it is urgent, you can get the latest changes now by pip-installing from the main branch.
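Until the fix is released, one thing that may be worth trying is forcing every model component onto a single dtype after loading. This is an assumption, not a confirmed fix for this issue: it only helps if the mismatch lives in the module parameters. The helper name `cast_components` is hypothetical; it relies only on the `to(dtype=...)` method that `torch.nn.Module` provides.

```python
def cast_components(components, dtype):
    """Cast every model-like component to `dtype` in place.

    `components` is a dict of name -> object (for example `pipe.components`).
    Only objects that expose both `parameters()` and `to()` are touched;
    tokenizers, schedulers and missing (None) components are left alone.
    Returns the names of the components that were cast.
    """
    touched = []
    for name, module in components.items():
        has_params = callable(getattr(module, "parameters", None))
        has_to = callable(getattr(module, "to", None))
        if has_params and has_to:
            module.to(dtype=dtype)  # nn.Module.to casts parameters in place
            touched.append(name)
    return touched
```

For the reproduction script this would be called as `cast_components(pipe.components, torch.bfloat16)` right after `from_pretrained` and before `pipe.to("cuda")`. Switching the whole script to `torch_dtype=torch.float16`, as reported above, remains the workaround that is known to run.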

DataSailor added the bug label Mar 19, 2025
