
Stabilize the sampling of DPM-Solver++2M by a stabilizing trick #43

LuChengTHU opened this issue Nov 6, 2022 · 22 comments
@LuChengTHU

LuChengTHU commented Nov 6, 2022

Hi Katherine,

Thank you for your great work supporting DPM-Solver++. I've found that it is already used in stable-diffusion-webui and performs very well: AUTOMATIC1111/stable-diffusion-webui#4304. Thank you again for your contribution!

However, sampling with DPM-Solver++2M at steps <= 10 often suffers from instability issues (the image quality is much worse than DDIM). In my recent experience, I found that this is due to the non-Lipschitzness near t=0.
(In fact, the score function has numerical issues for t near 0, as has been noted in several previous papers, such as CLD and Soft Truncation.)

Therefore, in my recent PR to diffusers, I added a new "stabilizing" trick to reduce this instability by using lower-order solvers at the final steps (e.g., for 2nd-order DPM-Solver++, I use DPM-Solver++2M for the first N-1 steps and DDIM for the final step). I find it greatly stabilizes sampling with DPM-Solver++2M. Please check this PR for details:
huggingface/diffusers#1132

Excuse me for my frequent issues, but could you please also support this "stabilizing" trick in k-diffusion, so that other projects, such as stable-diffusion-webui, can adopt it as well? Thank you very much!
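For readers following along, here is a minimal sketch of what the trick amounts to, assuming an x0-predicting model(x, sigma) and a decreasing sigma schedule ending in 0. This is only an illustration of the idea, not the diffusers or k-diffusion implementation (and note that k-diffusion's 2M sampler already falls back to first order on the step to sigma=0; the trick additionally forces first order on the final step):

```python
import torch

def sample_dpmpp_2m_stabilized(model, x, sigmas):
    """DPM-Solver++(2M) with a first-order (DDIM-style) update on the
    first and final steps. Illustrative sketch only; `model(x, sigma)`
    is assumed to return the denoised (x0) prediction."""
    t_fn = lambda sigma: sigma.log().neg()
    sigma_fn = lambda t: t.neg().exp()
    old_denoised = None
    for i in range(len(sigmas) - 1):
        denoised = model(x, sigmas[i])
        t, t_next = t_fn(sigmas[i]), t_fn(sigmas[i + 1])
        h = t_next - t
        last_step = i == len(sigmas) - 2
        if old_denoised is None or last_step or sigmas[i + 1] == 0:
            # First-order update: stabilizes the final step near t=0.
            x = (sigma_fn(t_next) / sigma_fn(t)) * x - (-h).expm1() * denoised
        else:
            # Standard second-order multistep update.
            h_last = t - t_fn(sigmas[i - 1])
            r = h_last / h
            denoised_d = (1 + 1 / (2 * r)) * denoised - (1 / (2 * r)) * old_denoised
            x = (sigma_fn(t_next) / sigma_fn(t)) * x - (-h).expm1() * denoised_d
        old_denoised = denoised
    return x
```

With a schedule ending in 0, the final update collapses to x = denoised, which is exactly the longer DDIM-style denoising step the trick relies on.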

@LuChengTHU
Author

@LuChengTHU LuChengTHU changed the title Stabilize the sampling of DPM-Solver++2M by a denoising trick Stabilize the sampling of DPM-Solver++2M by a stabilizing trick Nov 6, 2022
@crowsonkb
Owner

What do Stable Diffusion samples look like with and without this trick? I tried it, and it seemed to produce really blurry samples instead at, say, NFE=5. Have you considered restricting the range of r instead, so that you aren't combining denoised images with Adams-Bashforth coefficients more extreme than (2, -1)? The problem (if we're talking about the same one, which is why I requested sample images) seems to occur mostly with the default Stable Diffusion schedule, not the Karras schedule, which has less extreme second derivatives in log sigma space.
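For concreteness: in the 2M update, the two denoised predictions are combined with coefficients (1 + 1/(2r), -1/(2r)), where r = h_last / h is the ratio of consecutive step sizes; r = 0.5 gives exactly (2, -1), and smaller r gives more extreme coefficients. A sketch of the clamping idea (the function name and r_min default are my own, not k-diffusion's):

```python
def clamped_extrapolation(denoised, old_denoised, r, r_min=0.5):
    """Combine the current and previous denoised predictions with
    Adams-Bashforth-style coefficients (1 + 1/(2r), -1/(2r)), clamping
    r from below so the coefficients never get more extreme than (2, -1)."""
    r = max(r, r_min)
    c1 = 1 + 1 / (2 * r)
    c0 = -1 / (2 * r)
    return c1 * denoised + c0 * old_denoised
```

Clamping r rather than switching to first order keeps some second-order correction on every step while bounding how far the combination extrapolates.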

@LuChengTHU
Author

I also did not use the default schedule; I just used torch.linspace(0, 999, steps + 1). Here are examples for DPM-Solver++2M:

5 steps without the trick:
[image: 5-step-without_trick]

5 steps with the trick:
[image: 5-trick]

@LuChengTHU
Author

> What do Stable Diffusion samples look like with and without this trick? I tried it and it seemed to produce really blurry samples instead at say NFE=5... have you considered restricting the range of r instead, so that you aren't combining denoised images with Adams-Bashforth coefficients more extreme than 2, -1? The problem (if we're talking about the same one, which is why I requested sample images) seems mostly to occur with the default Stable Diffusion schedule and not the Karras schedule which has less extreme second derivatives in log sigma space.

Interesting suggestion. I will try to tune r further.

@LuChengTHU
Author

After trying, I still don't know how to improve it further... and I still think this may be due to the non-Lipschitzness for t near 0; second-order solvers are more unstable than first-order solvers in such cases.

Do you also observe this instability (i.e., unstable pixels)?

@crowsonkb
Owner

Oh it mostly seems to happen at higher guidance scales than I typically use... that's why I hadn't been seeing it.

@hentailord85ez

hentailord85ez commented Nov 6, 2022

Hi, I wonder if this is related, but using this implementation I've found that with a low number of steps (say < 10), it is better to use the sigmas of a higher step count (1 or 2 more) while still stopping after S steps.

I hacked some code to break out of the sampling loop at a specified index, and tried changing some parameters when using it with Stable Diffusion.

DPM Solver++ (2M) (Karras scheduling):

Woman eating apple

7 guidance scale

S = 6:
[image: 6]

S = 7, killed after 6 steps:
[image: 7-killed-6]

S = 8, killed after 6 steps:
[image: 8-killed-6]

It got worse at S = 9, killed after 6 steps:
[image: 9-killed-6]

Euler also showed some changes, but less drastic.

S = 6:
[image: 6]

S = 7, killed at 6:
[image: 7]

S = 8, killed at 6:
[image: 8]

@Birch-san

Birch-san commented Nov 7, 2022

for what it's worth:
I settled on these params for 5-step DPM-Solver++ sampling via Karras…

rho=9
sigma_max=7.0796
sigma_min=0.1072
sigmas=[7.0796, 2.9408, 1.1084, 0.3709, 0.1072, 0.0000]
cfg_scale=7.5

https://twitter.com/Birchlabs/status/1589038086633455616

this is k-diffusion dpmpp_2m without the new stabilising trick, and without skipping the denoise-to-0.

here's the sigmas SD supports (0.0292 to 14.6146):
https://gist.github.com/Birch-san/6cd1574e51871a5e2b88d59f0f3d4fd3

@LuChengTHU
Author

LuChengTHU commented Nov 7, 2022

> Hi, I wonder if this is related but using this implementation I've found that with a low number of steps (say < 10), it is better to use the sigmas of a higher step count (1 or 2 more), while still stopping after S steps.
>
> I hacked some code to break out of the sampling loop at a specified index, and tried changing some parameters when using with Stable Diffusion.
>
> Euler also had some changes, but less drastic.

Interesting! What does "still stopping after S steps" mean?

@LuChengTHU
Author

> for what it's worth: I settled on these params for 5-step DPM-Solver++ sampling via Karras…
>
> rho=9
> sigma_max=7.0796
> sigma_min=0.1072
> sigmas=[7.0796, 2.9408, 1.1084, 0.3709, 0.1072, 0.0000]
> cfg_scale=7.5
>
> https://twitter.com/Birchlabs/status/1589038086633455616
>
> this is k-diffusion dpmpp_2m without the new stabilising trick, and without skipping the denoise-to-0.
>
> here's the sigmas SD supports (0.0292 to 14.6146): https://gist.github.com/Birch-san/6cd1574e51871a5e2b88d59f0f3d4fd3

So amazing, I will go deeper into it.

@crowsonkb
Owner

> Interesting! What does "still stopping after S steps" mean?

I think it means skipping the last step. You can do it better with sigmas = torch.cat([sigmas[:-2], sigmas[-1:]]), so you step with DPM to the next-to-last sigma, then follow up with a longer denoising step to sigma=0.
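As a concrete example, applying that one-liner to the 5-step Karras schedule Birch-san posted above:

```python
import torch

# A 5-step schedule whose last entry is the denoise-to-zero step:
sigmas = torch.tensor([7.0796, 2.9408, 1.1084, 0.3709, 0.1072, 0.0])

# Drop the second-to-last sigma: step with DPM down to 0.3709, then
# take one longer first-order denoising step straight to sigma=0.
sigmas = torch.cat([sigmas[:-2], sigmas[-1:]])
# now [7.0796, 2.9408, 1.1084, 0.3709, 0.0]
```

This keeps the final sigma=0 entry (so the sampler still denoises fully) while removing the tiny 0.1072 -> 0 step that triggers the instability.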

@crowsonkb
Owner

> for what it's worth: I settled on these params for 5-step DPM-Solver++ sampling via Karras…

If you want to do one more function evaluation, have you tried doing a DPM++ 2S step as the first step to warm up the linear multistep method, instead of an Euler step?

@Birch-san

> If you want to do one more function evaluation have you tried doing a DPM++ 2S step as the first step to warm up the linear multistep method, instead of an Euler step?

I haven't! I've only used sample_dpmpp_2m (DPM++ 2M) so far.
I don't think I understand how to do that. You're talking about mixing samplers?
Do you mean "run the first sigma, 7.0796, through sample_dpmpp_2s_ancestral", then "run the remaining sigmas through sample_dpmpp_2m"?

@crowsonkb
Owner

crowsonkb commented Nov 7, 2022

You have to paste the 2S step code into the 2M sampler because you need to save its denoised to init old_denoised in the 2M sampler... 2M does an Euler step as its first step and saves its denoised to old_denoised so it can begin doing second order steps.

@Birch-san

Birch-san commented Nov 7, 2022

Are these the right way around? I'm trying to relate the suggestion to the code, and the pieces aren't fitting for me.

> 2M does an Euler step as its first step

I see a comment on 2S that it uses Euler in the base case (# Euler method), but none of the code in 2M resembles an Euler step to my eye.

> 2M […] so it can begin doing second order steps.

Again, I see evidence in 2S of second-order stepping (the x_2 variable and the comment describing "Ancestral sampling with DPM-Solver++(2S) second-order steps"), yet I don't see any code resembling that in the 2M sampler.

@crowsonkb
Owner

It's this branch of the conditional:

x = (sigma_fn(t_next) / sigma_fn(t)) * x - (-h).expm1() * denoised

It's a first-order DPM-Solver++ step, which per the DPM-Solver++ paper is equivalent to an Euler (DDIM) step. It's done for the first step and for the step to 0. In 2S, the Euler step is only done for the last step. So you need to do a 2S step for the first step but still do the first-order step for the last step.
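To make the two pieces explicit, here is a sketch (my own function names, not k-diffusion's exact code) of the first-order step quoted above, and of a single eta=0 DPM++ 2S step whose first model evaluation could seed old_denoised in the 2M sampler:

```python
import torch

def dpmpp_first_order_step(x, denoised, sigma, sigma_next):
    """First-order DPM-Solver++ step, equivalent to an Euler (DDIM) step
    in this parameterization; the conditional branch quoted above."""
    t, t_next = -sigma.log(), -sigma_next.log()
    h = t_next - t
    return (sigma_next / sigma) * x - (-h).expm1() * denoised

def dpmpp_2s_step(model, x, sigma, sigma_next, r=0.5):
    """Single second-order DPM-Solver++(2S) update (no ancestral noise),
    sketched as a warm-up for the 2M sampler's first step. Returns the
    new x and the first `denoised`, which can initialize old_denoised.
    Illustrative only, not k-diffusion's exact implementation."""
    t, t_next = -sigma.log(), -sigma_next.log()
    h = t_next - t
    s = t + r * h                       # midpoint in t = -log(sigma) space
    denoised = model(x, sigma)
    x_2 = ((-s).exp() / sigma) * x - (-(s - t)).expm1() * denoised
    denoised_2 = model(x_2, (-s).exp())
    x = (sigma_next / sigma) * x - (-h).expm1() * denoised_2
    return x, denoised
```

The warm-started 2M loop would call dpmpp_2s_step for i = 0, keep its returned denoised as old_denoised, and proceed with the usual 2M updates (including the first-order final step) from there.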

Birch-san added a commit to Birch-san/k-diffusion that referenced this issue Nov 10, 2022
@Birch-san

hmm like this?

06d31fd
[image]

Here's what I get when I decode the latents from each step; the second column is the one impacted by the change.

original sample_dpmpp_2m:
[image]

modified sample_dpmpp_2m (use a 2S step for the first step):
[image]

hm… I guess with this modification the second column has a few more features defined than we had in the original algorithm. Is that an indication of success?

@Birch-san

Birch-san commented Nov 10, 2022

I tried lifting sigma_max from 7.0796 to the full 14.6146, making the schedule:
14.6146, 5.3742, 1.7435, 0.4814, 0.1072, 0.0000
(sigma_min is still 0.1072, rho is still 9)

original sample_dpmpp_2m:
[image]

modified sample_dpmpp_2m (use a 2S step for the first step):
[image]

The modified algorithm seems to give a slightly less derpy result (it has a nose, eyebrows, a stylized mouth, extra chest shading), coping better with the extreme sparsity of sigmas to sample from.

When we're doing low-step sampling, every step counts. So even though it costs another model call, I guess this lets us get a better result out of that first sigma?
Perhaps this provides a path to a good-looking 4-step sample (albeit with 5 model calls).

@crowsonkb
Owner

hmm like this?

That's what I meant to do, yes :)

Birch-san added a commit to Birch-san/k-diffusion that referenced this issue Nov 29, 2022
mcmonkey4eva added a commit to mcmonkey4eva/stable-diffusion-webui that referenced this issue Dec 16, 2022
DPM2 a and DPM2 a Karras samplers are both affected by an issue described by AUTOMATIC1111#3483 and can be resolved by a workaround suggested by the k-diffusion author at crowsonkb/k-diffusion#43 (comment)
@mcmonkey4eva

mcmonkey4eva commented Dec 16, 2022

I can consistently replicate the issue described by @hentailord85ez on sample_dpm_2_ancestral: the second-to-last step has significantly better quality than the last.


As you can see, step 9/10 is clear, and step 10/10 is corrupted.

and the suggested fix:

@crowsonkb:

> I think it means skipping the last step, you can do it better by sigmas = torch.cat([sigmas[:-2], sigmas[-1:]]) so you step with DPM to the next to last step then follow it up with a longer denoising step to sigma=0.

This reliably improves output quality, by which I mean it avoids the corruption step.

I'm not seeing the same issue for sample_dpmpp_2m or other samplers.

EDIT: It affects sample_dpm_2 as well, just much more subtly.

Could this fix (or an equivalent alternative) be implemented directly into k-diffusion, or does this need to be handled by the caller?

@crowsonkb
Owner

@mcmonkey4eva I think I could include a modified get_sigmas_karras() that avoided the second-to-last sigma... maybe generating a ramp that is one step longer, then manually chopping off the second-to-last step.
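A sketch of that modification, using the Karras et al. ramp formula (the function name is mine and this is not part of k-diffusion):

```python
def get_sigmas_karras_chopped(n, sigma_min, sigma_max, rho=7.0):
    """Build the Karras noise ramp one step longer than requested, then
    chop off the second-to-last sigma, so the final denoising step to 0
    is taken from further out. Hypothetical helper, sketching the idea
    described above."""
    max_inv, min_inv = sigma_max ** (1 / rho), sigma_min ** (1 / rho)
    # n + 1 ramp values instead of n (one extra step), sigma_max..sigma_min:
    sigmas = [(max_inv + i / n * (min_inv - max_inv)) ** rho for i in range(n + 1)]
    sigmas.append(0.0)
    del sigmas[-2]  # manually chop off the second-to-last sigma
    return sigmas
```

The caller gets the same number of sampling steps as before, but the smallest nonzero sigma is gone, which is exactly the workaround mcmonkey4eva applied in stable-diffusion-webui.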

Oncorporation pushed a commit to Oncorporation/stable-diffusion-webui that referenced this issue Jan 2, 2023
DPM2 a and DPM2 a Karras samplers are both affected by an issue described by AUTOMATIC1111#3483 and can be resolved by a workaround suggested by the k-diffusion author at crowsonkb/k-diffusion#43 (comment)
Birch-san added a commit to Birch-san/k-diffusion that referenced this issue May 14, 2023
@LuChengTHU
Author

Hi @crowsonkb , I hope you are doing well :)

Recently the community has found that when using DPM++2M or DPM++2M SDE for SDXL with steps < 50 (e.g., steps = 25), the generated samples always have apparent artifacts, especially with SDE solvers such as DPM++2M SDE.

For example, a cat with DPM++2M SDE and steps=25:

[image]

And there are more examples here: huggingface/diffusers#5433 , for both the Diffusers library (my implementation) and stable-diffusion-webui (with your k-diffusion implementation).

After carefully checking the reason, I find that it is also caused by the instability for $t$ near $0$, and we need to change the final step from a 2nd-order solver to a 1st-order solver. By applying this simple trick, the sample quality with DPM++2M SDE becomes much better:

[image]

Because SDXL is a powerful and promising model and the SDE solver is widely used, could you please also support this simple trick in k-diffusion?
