Describe the bug
Using diffusers.utils.torch_utils.randn_tensor to create noise produces different random values depending on the tensor dtype, even if a generator with a manual seed is passed, meaning seeds are not reproducible across dtypes.
For example, the prepare_latents call in the AuraFlow pipeline passes the dtype directly to the randn_tensor function, so running this model in BF16, FP16, and FP32 results in completely different images from the same seed.
Unsure if this is intended behavior. Generating the noise in FP32 then converting to the target dtype would mean better portability between environments. Alternatively, letting the user specify the behavior by manually passing a dtype themselves would also fix this.
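For what it's worth, a minimal sketch of the first suggestion (randn_tensor_fp32 is a hypothetical name, not part of diffusers): draw the noise in FP32 with the seeded generator, then cast to the target dtype.

import torch
from diffusers.utils.torch_utils import randn_tensor

# Hypothetical wrapper: always sample in FP32 so a given seed produces the
# same values regardless of the target dtype, then cast afterwards.
def randn_tensor_fp32(shape, generator=None, device=None, dtype=None):
    noise = randn_tensor(shape, generator=generator, device=device, dtype=torch.float32)
    return noise.to(dtype) if dtype is not None else noise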
Reproduction
Minimal example that creates a tensor equivalent to a 1024x1024 image in SDXL latent space. Interestingly enough, the output of this function does match across dtypes for very small tensors.
import torch
from diffusers.utils.torch_utils import randn_tensor

test_shape = (1, 4, 128, 128)
test_dtypes = [torch.float32, torch.float16, torch.bfloat16]

for dtype in test_dtypes:
    generator = torch.Generator(device="cpu").manual_seed(22)
    noise = randn_tensor(test_shape, generator=generator, device=torch.device("cpu"), dtype=dtype)
    print(f"Random {dtype} tensor: {noise.flatten()[:8]}")  # print first 8 elements for testing
For the image model test, the default AuraFlow pipeline was used, with the model loaded as a GGUF file. Changing the dtype in the pipeline argument plus the two arguments for the transformer causes the outputs to differ. Editing the pipeline so that noise generation always happens in FP32 makes the outputs match.
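A sketch of that comparison, assuming the public fal/AuraFlow checkpoint rather than the GGUF setup described above; the prompt and seed are illustrative:

import torch
from diffusers import AuraFlowPipeline

# Same prompt and seed, different torch_dtype per run; the resulting images
# differ because the initial latents are sampled in the model dtype.
for dtype in (torch.float32, torch.float16, torch.bfloat16):
    pipe = AuraFlowPipeline.from_pretrained("fal/AuraFlow", torch_dtype=dtype)
    generator = torch.Generator(device="cpu").manual_seed(22)
    image = pipe("a photo of a cat", generator=generator).images[0]
    image.save(f"auraflow_{str(dtype).split('.')[-1]}.png")  # e.g. auraflow_float16.png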
Logs
Output of the minimal example above, consistent across multiple systems:
python noise_test.py
Random torch.float32 tensor: tensor([ 0.3920, 0.0734, -0.0045, -0.0535, -0.0589, 0.6002, 2.0421, 1.3273])
Random torch.float16 tensor: tensor([ 0.2698, -0.3406, 0.1014, 0.0960, -0.6147, 0.6489, 0.0311, -0.5171],
dtype=torch.float16)
Random torch.bfloat16 tensor: tensor([-0.5195, -1.1797, 0.4219, 1.2891, 0.6602, -0.9141, 2.3906, 2.2188],
dtype=torch.bfloat16)
System Info
Tested on the latest diffusers installed from git on Windows with torch 2.1.1+cu121.
Verified on diffusers 0.32.2 with torch 2.7.0.dev20250302+cu126.
Who can help?
@yiyixuxu @sayakpaul
Just to add my two cents: I don't think this is an issue, though.
Generating the noise in FP32 then converting to the target dtype would mean better portability between environments.
Perhaps it's better understood with the context in which the prepare_latents() method is used: we directly initialize the tensor on the requested device and dtype when latents aren't supplied through a pipeline's call method.
And when a user wants to specify latents, they have full control over how they are generated.
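For example, something like the following sketch (the checkpoint, prompt, and latent shape are illustrative) lets the caller control the noise entirely:

import torch
from diffusers import AuraFlowPipeline

# Draw the latents yourself in FP32 on the CPU and hand them to the pipeline,
# which then moves/casts them instead of sampling in the model dtype.
pipe = AuraFlowPipeline.from_pretrained("fal/AuraFlow", torch_dtype=torch.bfloat16)
generator = torch.Generator(device="cpu").manual_seed(22)
latents = torch.randn((1, 4, 128, 128), generator=generator, dtype=torch.float32)
image = pipe("a photo of a cat", latents=latents).images[0]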
Editing the noise generation to always happen in FP32 might be a breaking change in terms of quality assertions, so I am not very sure about that.
I do agree that changing the default behavior would be a breaking change, and probably doesn't really make sense unless there are other reasons to do so.
I was originally trying to get matching outputs with ComfyUI (which always uses FP32 CPU noise). I think what was mostly tripping me up was that the function hint for randn_tensor has the line "If CPU generators are passed, the tensor is always created on the CPU." (which is also hinted at on the reproducible pipelines page with a similar line: "To avoid this issue, Diffusers has a randn_tensor() function for creating random noise on the CPU, and then moving the tensor to a GPU if necessary.").
The process also being affected by the model's dtype when passing a CPU generator doesn't seem immediately obvious.
(Maybe forcing FP32 when the user manually passes a CPU generator would make sense? In any case, thank you for the advice about manually passing the noisy latents in; that should help with testing in the future lol)
Thanks for the further comments. I agree there seem to be some inconsistencies in developer expectations. Perhaps clarifying that the dtype is used directly in the torch.randn() call would make the documentation exactly match what we're doing?
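To spell that out, a small sketch of the underlying behavior; given the logs above, the two prints below should differ, with the first matching the FP16 log and the second matching the FP32 log after casting:

import torch

# randn_tensor() passes dtype straight through to torch.randn(), which samples
# natively in that dtype, so the same seed yields a different stream per dtype
# (for large tensors; as noted above, very small ones can coincide).
shape = (1, 4, 128, 128)

generator = torch.Generator(device="cpu").manual_seed(22)
native_fp16 = torch.randn(shape, generator=generator, dtype=torch.float16)

generator = torch.Generator(device="cpu").manual_seed(22)
cast_fp16 = torch.randn(shape, generator=generator, dtype=torch.float32).to(torch.float16)

print(native_fp16.flatten()[:8])  # should match the FP16 log above
print(cast_fp16.flatten()[:8])    # should match the FP32 log, rounded to FP16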