
Output of randn_tensor changes based on dtype, making seeds non-reproducible if the model precision is changed #11056

Open
city96 opened this issue Mar 14, 2025 · 3 comments
Labels
bug Something isn't working

Comments

@city96 commented Mar 14, 2025

Describe the bug

Using diffusers.utils.torch_utils.randn_tensor to create noise produces different random values depending on the tensor dtype, even when a generator with a manual seed is passed, meaning seeds are not reproducible across dtypes.

For example, the prepare_latents call in the AuraFlow pipeline passes the dtype directly to the randn_tensor function. Running this model in BF16, FP16, and FP32 will result in completely different images.

Unsure if this is intended behavior. Generating the noise in FP32 and then converting it to the target dtype would give better portability between environments. Alternatively, letting the user control the behavior by manually passing a dtype themselves would also fix this.
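
A minimal sketch of the first option (an assumption about how it could behave, not what the library currently does):

import torch
from diffusers.utils.torch_utils import randn_tensor

# Sketch: draw the noise in FP32 with a seeded CPU generator, then cast
# to the model dtype. The cast is deterministic, so the same seed yields
# the same latents regardless of the target precision.
generator = torch.Generator(device="cpu").manual_seed(22)
noise = randn_tensor((1, 4, 128, 128), generator=generator, device=torch.device("cpu"), dtype=torch.float32)
noise = noise.to(torch.bfloat16)  # cast after generation instead of generating in bf16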

Reproduction

Minimal example that creates a tensor equivalent to a 1024x1024 image in SDXL latent space. Interestingly, the output of this function does match across dtypes for very small tensors.

import torch
from diffusers.utils.torch_utils import randn_tensor

# Latent shape equivalent to a 1024x1024 image in SDXL latent space.
test_shape = (1, 4, 128, 128)
test_dtypes = [torch.float32, torch.float16, torch.bfloat16]

for dtype in test_dtypes:
    # Re-seed the CPU generator for each dtype so every draw starts from
    # the same RNG state.
    generator = torch.Generator(device="cpu").manual_seed(22)
    noise = randn_tensor(test_shape, generator=generator, device=torch.device("cpu"), dtype=dtype)
    print(f"Random {dtype} tensor: {noise.flatten()[:8]}")  # first 8 elements for comparison

For the image model test, the default AuraFlow pipeline was used with the model loaded from a GGUF file. Changing the dtype in the pipeline argument plus the two arguments for the transformer causes the output to differ. Editing the pipeline so that the noise generation always happens in FP32 makes the outputs match.

Logs

Output of the above example, consistent between multiple systems:


python noise_test.py
Random torch.float32 tensor: tensor([ 0.3920,  0.0734, -0.0045, -0.0535, -0.0589,  0.6002,  2.0421,  1.3273])
Random torch.float16 tensor: tensor([ 0.2698, -0.3406,  0.1014,  0.0960, -0.6147,  0.6489,  0.0311, -0.5171],
       dtype=torch.float16)
Random torch.bfloat16 tensor: tensor([-0.5195, -1.1797,  0.4219,  1.2891,  0.6602, -0.9141,  2.3906,  2.2188],
       dtype=torch.bfloat16)

System Info

Tested on the latest diffusers installed from git on Windows with torch 2.1.1+cu121.
Verified on diffusers 0.32.2 with torch 2.7.0.dev20250302+cu126.

Who can help?

@yiyixuxu @sayakpaul

city96 added the bug (Something isn't working) label Mar 14, 2025
@sayakpaul (Member) commented

Thanks for sharing this detailed observation.

Just to add my 2 cents, I don't think it's an issue, though.

Generating the seed in FP32 then converting to the target dtype would mean better portability between environments.

Perhaps it's better understood in the context in which the prepare_latents() method is used. We directly initialize the tensor on the requested device and dtype when latents aren't supplied through the pipeline's call method.

And when a user wants to specify latents, they have full control over how they are generated.
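
For example (a hypothetical sketch; the model id, prompt, and latent shape are assumptions to illustrate the idea):

import torch
from diffusers import AuraFlowPipeline
from diffusers.utils.torch_utils import randn_tensor

# Hypothetical sketch: pre-generate the latents in FP32 on the CPU and
# pass them through the pipeline call, so the model dtype never affects
# the RNG. The shape assumes a 1024x1024 image with a 4-channel VAE and
# 8x spatial downscaling; adjust for your model and resolution.
pipe = AuraFlowPipeline.from_pretrained("fal/AuraFlow", torch_dtype=torch.bfloat16).to("cuda")
generator = torch.Generator(device="cpu").manual_seed(22)
latents = randn_tensor((1, 4, 128, 128), generator=generator, device=torch.device("cpu"), dtype=torch.float32)
image = pipe("a photo of a cat", latents=latents.to("cuda", torch.bfloat16)).images[0]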

Editing the noise generation to always happen in FP32 might be a breaking change in terms of quality assertions. So, I am not very sure about that.

I will let @yiyixuxu further comment.

@city96 (Author) commented Mar 14, 2025

I do agree that changing the default behavior would be a breaking change, and probably doesn't really make sense unless there are other reasons to do so.

I was originally trying to get matching outputs with ComfyUI (which always uses FP32 CPU noise). I think what was mostly tripping me up was that the function hint for randn_tensor has the line "If CPU generators are passed, the tensor is always created on the CPU." (which is also hinted at on the reproducible pipelines page with a similar line: "To avoid this issue, Diffusers has a randn_tensor() function for creating random noise on the CPU, and then moving the tensor to a GPU if necessary.").

That the process is also affected by the model's dtype when passing a CPU generator isn't immediately obvious.

(Maybe forcing FP32 if the user manually passes a CPU generator would make sense? In any case, thank you for the advice about manually passing the latents in; that should help with testing in the future lol)

@sayakpaul (Member) commented

Thanks for the further comments. I agree there seem to be some inconsistencies in developer expectations. Perhaps clarifying in the docs that the dtype is passed directly to the torch.randn() function would make them match exactly what we're doing?
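
For context, the divergence can be reproduced with torch.randn alone (a minimal sketch, assuming randn_tensor forwards the dtype to torch.randn as described above):

import torch

# The same seeded CPU generator produces different streams for different
# dtypes on large tensors, because the sampling happens directly in the
# requested dtype rather than in FP32 followed by a cast.
for dtype in (torch.float32, torch.float16, torch.bfloat16):
    g = torch.Generator(device="cpu").manual_seed(22)
    print(dtype, torch.randn((1, 4, 128, 128), generator=g, dtype=dtype).flatten()[:4])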
