
Output of randn_tensor changes based on dtype, making seeds non-reproducible if the model precision is changed #11056

Open
city96 opened this issue Mar 14, 2025 · 3 comments
Labels
bug Something isn't working

Comments

@city96 commented Mar 14, 2025

Describe the bug

Using diffusers.utils.torch_utils.randn_tensor to create noise produces different random values depending on the tensor dtype, even when a generator with a manual seed is passed, meaning seeds are not reproducible across dtypes.

For example, the prepare_latents call in the AuraFlow pipeline passes the dtype directly to the randn_tensor function. Running this model in BF16, FP16, and FP32 will result in completely different images.

Unsure if this is intended behavior. Generating the noise in FP32 and then converting it to the target dtype would give better portability between environments. Alternatively, letting the user control the behavior by manually passing a dtype themselves would also fix this.
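
A minimal sketch of the first option (an assumption about how it could behave, not what the library currently does):

import torch
from diffusers.utils.torch_utils import randn_tensor

# Sketch: draw the noise in FP32 with a seeded CPU generator, then cast
# to the model dtype. The cast is deterministic, so the same seed yields
# the same latents regardless of the target precision.
generator = torch.Generator(device="cpu").manual_seed(22)
noise = randn_tensor((1, 4, 128, 128), generator=generator, device=torch.device("cpu"), dtype=torch.float32)
noise = noise.to(torch.bfloat16)  # cast after generation instead of generating in bf16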

Reproduction

Minimal example that creates a tensor equivalent to a 1024x1024 image in SDXL latent space. Interestingly, the output of this function does match across dtypes for very small tensors.

import torch
from diffusers.utils.torch_utils import randn_tensor

# Latent shape equivalent to a 1024x1024 image in SDXL latent space.
test_shape = (1, 4, 128, 128)
test_dtypes = [torch.float32, torch.float16, torch.bfloat16]

for dtype in test_dtypes:
    # Re-seed the CPU generator for each dtype so every draw starts from
    # the same RNG state.
    generator = torch.Generator(device="cpu").manual_seed(22)
    noise = randn_tensor(test_shape, generator=generator, device=torch.device("cpu"), dtype=dtype)
    print(f"Random {dtype} tensor: {noise.flatten()[:8]}")  # first 8 elements for comparison

For the image model test, the default AuraFlow pipeline was used with the model loaded from a GGUF file. Changing the dtype in the pipeline argument plus the two arguments for the transformer causes the output to differ. Editing the pipeline so that the noise generation always happens in FP32 makes the outputs match.

Logs

Output of the above example, consistent between multiple systems:


python noise_test.py
Random torch.float32 tensor: tensor([ 0.3920,  0.0734, -0.0045, -0.0535, -0.0589,  0.6002,  2.0421,  1.3273])
Random torch.float16 tensor: tensor([ 0.2698, -0.3406,  0.1014,  0.0960, -0.6147,  0.6489,  0.0311, -0.5171],
       dtype=torch.float16)
Random torch.bfloat16 tensor: tensor([-0.5195, -1.1797,  0.4219,  1.2891,  0.6602, -0.9141,  2.3906,  2.2188],
       dtype=torch.bfloat16)

System Info

Tested on the latest diffusers installed from git on Windows with torch 2.1.1+cu121.
Verified on diffusers 0.32.2 with torch 2.7.0.dev20250302+cu126.

Who can help?

@yiyixuxu @sayakpaul

city96 added the bug (Something isn't working) label Mar 14, 2025
@sayakpaul (Member) commented

Thanks for sharing this detailed observation.

Just to add my 2 cents, I don't think it's an issue, though.

Generating the seed in FP32 then converting to the target dtype would mean better portability between environments.

Perhaps it's better understood in the context in which the prepare_latents() method is used. We directly initialize the tensor on the requested device and dtype when latents aren't supplied through the pipeline's call method.

And when a user wants to specify latents, they have full control over how they are generated.
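
For example (a hypothetical sketch; the model id, prompt, and latent shape are assumptions to illustrate the idea):

import torch
from diffusers import AuraFlowPipeline
from diffusers.utils.torch_utils import randn_tensor

# Hypothetical sketch: pre-generate the latents in FP32 on the CPU and
# pass them through the pipeline call, so the model dtype never affects
# the RNG. The shape assumes a 1024x1024 image with a 4-channel VAE and
# 8x spatial downscaling; adjust for your model and resolution.
pipe = AuraFlowPipeline.from_pretrained("fal/AuraFlow", torch_dtype=torch.bfloat16).to("cuda")
generator = torch.Generator(device="cpu").manual_seed(22)
latents = randn_tensor((1, 4, 128, 128), generator=generator, device=torch.device("cpu"), dtype=torch.float32)
image = pipe("a photo of a cat", latents=latents.to("cuda", torch.bfloat16)).images[0]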

Editing the noise generation to always happen in FP32 might be a breaking change in terms of quality assertions. So, I am not very sure about that.

I will let @yiyixuxu further comment.

@city96 (Author) commented Mar 14, 2025

I do agree that changing the default behavior would be a breaking change, and probably doesn't really make sense unless there are other reasons to do so.

I was originally trying to get matching outputs with ComfyUI (which always uses FP32 CPU noise). I think what was mostly tripping me up was that the function hint for randn_tensor has the line "If CPU generators are passed, the tensor is always created on the CPU." (which is also hinted at on the reproducible pipelines page with a similar line: "To avoid this issue, Diffusers has a randn_tensor() function for creating random noise on the CPU, and then moving the tensor to a GPU if necessary.").

That the process is also affected by the model's dtype when passing a CPU generator isn't immediately obvious.

(Maybe forcing FP32 if the user manually passes a CPU generator would make sense? In any case, thank you for the advice about manually passing the latents in; that should help with testing in the future lol)

@sayakpaul (Member) commented

Thanks for the further comments. I agree there seem to be some inconsistencies in developer expectations. Perhaps clarifying in the docs that the dtype is passed directly to the torch.randn() function would make them match exactly what we're doing?
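
For context, the divergence can be reproduced with torch.randn alone (a minimal sketch, assuming randn_tensor forwards the dtype to torch.randn as described above):

import torch

# The same seeded CPU generator produces different streams for different
# dtypes on large tensors, because the sampling happens directly in the
# requested dtype rather than in FP32 followed by a cast.
for dtype in (torch.float32, torch.float16, torch.bfloat16):
    g = torch.Generator(device="cpu").manual_seed(22)
    print(dtype, torch.randn((1, 4, 128, 128), generator=g, dtype=dtype).flatten()[:4])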
