Validation images not matching ComfyUI #409

Closed
komninoschatzipapas opened this issue May 17, 2024 · 13 comments

@komninoschatzipapas
Contributor

I'm curious why the validation images being output don't match the ones I'm getting from ComfyUI when using the same parameters.

[Screenshot: CleanShot 2024-05-17 at 19 51 33]

Then, running train_sdxl.py with the following arguments (with the line I mentioned in the other issue added so that validation runs at step 0):

  --validation_seed=42 \
  --validation_guidance=5 \
  --validation_guidance_rescale=0.0 \
  --validation_num_inference_steps=35 \
  --validation_negative_prompt="" \
  --validation_resolution=832x1216 \
  --validation_noise_scheduler="ddim"

Yields:
[Image: step_0_car_(832, 1216)]

We perform inference with ComfyUI on our production service, so ideally the two would be in sync, but I have not found a way to do this yet.
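For reference, the validation flags above map roughly onto a diffusers call like the one below. This is only a sketch, not the trainer's actual validation code; loading the DDIM config from the base model's scheduler subfolder is an assumption here.

# Rough diffusers equivalent of the validation settings above (sketch only).
import torch
from diffusers import DiffusionPipeline, DDIMScheduler

model_id = "stabilityai/stable-diffusion-xl-base-1.0"
pipeline = DiffusionPipeline.from_pretrained(model_id)
# --validation_noise_scheduler="ddim"
pipeline.scheduler = DDIMScheduler.from_pretrained(model_id, subfolder="scheduler")
pipeline.to("cuda")

image = pipeline(
    prompt="car",
    negative_prompt="",         # --validation_negative_prompt
    num_inference_steps=35,     # --validation_num_inference_steps
    guidance_scale=5,           # --validation_guidance
    guidance_rescale=0.0,       # --validation_guidance_rescale
    width=832,                  # --validation_resolution
    height=1216,
    generator=torch.Generator(device="cuda").manual_seed(42),  # --validation_seed
).images[0]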

@bghira
Owner

bghira commented May 17, 2024

i've never used the validation pipeline that way, so i'm not sure that it would work at all. you can run the inference scripts in the toolkit to try it directly via diffusers.

@bghira
Owner

bghira commented May 17, 2024

you could also try setting a negative prompt on both tools.

@komninoschatzipapas
Contributor Author

I used DDPM since there wasn't a DDIM script:

# Use Pytorch 2!
import torch
from diffusers import (
    StableDiffusionPipeline,
    DiffusionPipeline,
    AutoencoderKL,
    UNet2DConditionModel,
    DDPMScheduler,
)
from transformers import CLIPTextModel

# Any model currently on Huggingface Hub.
# model_id = 'junglerally/digital-diffusion'
# model_id = 'ptx0/realism-engine'
# model_id = 'ptx0/artius_v21'
# model_id = 'ptx0/pseudo-journey'
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
pipeline = DiffusionPipeline.from_pretrained(model_id)

# Optimize!
pipeline.unet = torch.compile(pipeline.unet)
scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")
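# NOTE: this scheduler is loaded but never assigned to pipeline.scheduler,
# so the pipeline still samples with its default scheduler.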

# Remove this if you get an error.
torch.set_float32_matmul_precision("high")

pipeline.to("cuda")
prompts = {
    "car": "car",
}
for shortname, prompt in prompts.items():
    # old prompt: ''
    image = pipeline(
        prompt=prompt,
        negative_prompt="",
        num_inference_steps=35,
        generator=torch.Generator(device="cuda").manual_seed(42),
        width=832,
        height=1216,
        guidance_scale=5,
    ).images[0]
    image.save(f"./{shortname}_nobetas.png", format="PNG")

Yields:
[Image: car_nobetas]

ComfyUI:
[Screenshot: CleanShot 2024-05-17 at 21 25 06]

I'm not sure what I'm missing, but it's crazy how different these outputs are with seemingly the same model & parameters. My concern is that we're using ComfyUI for inference but judging our models based on the validation images generated by diffusers, which don't seem to be generated in the same manner.

I'm going to close this issue since it's related to diffusers & ComfyUI, but if you're able to give some input on this situation it would be greatly appreciated!

@bghira
Owner

bghira commented May 17, 2024

i haven't really played around with the base SDXL model in almost a year; it is possibly something in their upstream model config. i had to open a few issue reports so far for some of these problems, and a couple of them just remain perpetually unresolved.

edit: i will try and reproduce this problem locally.

@bghira
Owner

bghira commented May 17, 2024

[Image]

here's my result, though I had to modify the script slightly for MPS: replacing cuda -> mps and removing the torch.compile() call, because MPS does not support it.

@bghira
Owner

bghira commented May 17, 2024

# Use Pytorch 2!
import torch
from diffusers import (
    StableDiffusionPipeline,
    DiffusionPipeline,
    AutoencoderKL,
    UNet2DConditionModel,
    DDPMScheduler,
)
from transformers import CLIPTextModel

# Any model currently on Huggingface Hub.
# model_id = 'junglerally/digital-diffusion'
# model_id = 'ptx0/realism-engine'
# model_id = 'ptx0/artius_v21'
# model_id = 'ptx0/pseudo-journey'
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
pipeline = DiffusionPipeline.from_pretrained(model_id)

# Optimize!
# pipeline.unet = torch.compile(pipeline.unet)
scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")

# Remove this if you get an error.
torch.set_float32_matmul_precision("high")

pipeline.to(dtype=torch.float16, device="mps")
prompts = {
    "car": "car",
}
for shortname, prompt in prompts.items():
    # old prompt: ''
    image = pipeline(
        prompt=prompt,
        negative_prompt="",
        num_inference_steps=35,
        generator=torch.Generator(device="mps").manual_seed(42),
        width=832,
        height=1216,
        guidance_scale=5,
    ).images[0]
    image.save(f"./{shortname}_nobetas.png", format="PNG")

@komninoschatzipapas
Contributor Author

Check comfyanonymous/ComfyUI#3507

If you were to change torch.Generator(device="mps").manual_seed(42) to torch.Generator(device="cpu").manual_seed(42), you should get the image I got from my ComfyUI flow above.
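For illustration, a minimal self-contained sketch of that change against the script above; only the generator device differs, and the output filename is just an illustrative name:

import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0")
pipeline.to("cuda")

# Seeding the generator on the CPU matches ComfyUI's default noise source
# (see comfyanonymous/ComfyUI#3507), independent of the device the model runs on.
image = pipeline(
    prompt="car",
    negative_prompt="",
    num_inference_steps=35,
    generator=torch.Generator(device="cpu").manual_seed(42),
    width=832,
    height=1216,
    guidance_scale=5,
).images[0]
image.save("./car_cpu_seed.png", format="PNG")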

Is there a possibility we could integrate this into the codebase for generating the validation images?

@bghira
Owner

bghira commented May 17, 2024

no, because a generator on the CPU means all tensors have to be copied from the CPU during generation time. it even tells you this when you try it.

@bghira
Owner

bghira commented May 17, 2024

[Image]

using a CPU seed on the mac still doesn't seem to reproduce the original images you gave, for what it's worth.

@komninoschatzipapas
Contributor Author

I actually just tried it on my Studio M1 Ultra and I'm getting the same results as you.

So, in order to be in sync with ComfyUI, you need to generate the noise from a CPU-seeded generator and be on a Linux system.

I'm relieved it's the seed that was different and not some other parameter that would be harder to adjust.

@bghira
Owner

bghira commented May 18, 2024

so basically no reason to switch since it's just a different seed being tested :-) you can probably sample using GPU seeds on ComfyUI as well.

@komninoschatzipapas
Contributor Author

> i haven't really played around with the base SDXL model in almost a year; it is possibly something in their upstream model config. i had to open a few issue reports so far for some of these problems, and a couple of them just remain perpetually unresolved.
>
> edit: i will try and reproduce this problem locally.

I'm curious what you're fine-tuning from. I've found checkpoints like Juggernaut overcook pretty easily, so I generally go off the base model.

@bghira
Owner

bghira commented May 20, 2024

my own model series, trained on a different noise schedule with the same u-net layout as SDXL and the same text encoder, but with madebyollin's fp16 VAE.
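For anyone wanting to swap that VAE in with diffusers, a minimal sketch (assuming the VAE referred to is the madebyollin/sdxl-vae-fp16-fix repo on the Hub):

import torch
from diffusers import AutoencoderKL, DiffusionPipeline

# Assumed Hub id for the fp16-safe SDXL VAE; substitute your own fine-tuned checkpoint as needed.
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    vae=vae,
    torch_dtype=torch.float16,
)
pipeline.to("cuda")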
