
[Feature Request]: Refiner switchover should be controlled by (fraction of) training timesteps and not by fraction of sampling #14970

Closed

drhead opened this issue Feb 19, 2024 · 0 comments

Labels
enhancement New feature or request

drhead (Contributor) commented Feb 19, 2024

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What would your feature do?

Currently, refiner switchover is controlled by a fraction of the generation process. So, if you generate for 50 steps with refiner switchover at 0.8 (the recommended value for refiners trained like the SDXL refiner), the main model generates for the first 40 steps, and the refiner then loads and completes the last 10. This works fine when you are using txt2img with the default sampling schedule (i.e. not Karras or Exponential).
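
For illustration, here is a minimal sketch (hypothetical names, not webui's actual code) of what step-fraction switchover amounts to: the switch point is a fraction of sampling steps, with no reference to the model's trained timesteps.

```python
# Hypothetical sketch of the current behavior: the switchover point is a
# fraction of *sampling steps*, independent of which trained timesteps
# those steps actually correspond to.
def switch_step(total_steps: int, switch_at: float) -> int:
    # e.g. 50 steps at switch_at=0.8 -> base model runs steps 0..39,
    # refiner runs steps 40..49, whatever timesteps those happen to be
    return int(total_steps * switch_at)

print(switch_step(50, 0.8))  # 40
```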

However, this is not aligned with how the refiner is trained -- the refiner is trained on the last 200 timesteps, which does not always line up with what happens 80% of the way through the sampling process. There are at least two situations where this setup will result in the refiner being called too early or too late:

  1. Using different sampling schedules, especially with Zero Terminal SNR rescaling, will cause the refiner to be called too early. For a 50 step Karras schedule, refiner switchover would need to happen at 0.88 to not call it too early (see the sketch after this list).
  2. Using inpainting/img2img will in almost every case cause the refiner to be called too late. The correct switchover point changes whenever you change the denoising strength, and it is very tedious to manage by hand.
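
To make the first point concrete, here is a self-contained sketch (assuming the standard Stable Diffusion scaled-linear beta schedule and the Karras et al. sigma formula; sigma_to_t below is a simplified stand-in for k-diffusion's interpolated version) that finds the sampling step at which a 50-step Karras schedule first drops below timestep 200. The exact fraction depends on the schedule and on the model's sigma range; Zero Terminal SNR rescaling pushes sigma_max much higher, which moves the crossing point later (toward the 0.88 figure above).

```python
# Sketch: map each sampling step of a Karras schedule to the trained
# timestep it actually denoises, and find where it crosses timestep 200.
import numpy as np

# Standard Stable Diffusion noise schedule (scaled-linear betas, 1000 steps).
betas = np.linspace(0.00085 ** 0.5, 0.012 ** 0.5, 1000) ** 2
alphas_cumprod = np.cumprod(1.0 - betas)
trained_sigmas = ((1 - alphas_cumprod) / alphas_cumprod) ** 0.5  # sigma per trained timestep

def karras_sigmas(n, sigma_min, sigma_max, rho=7.0):
    # Karras et al. (2022) schedule, as used by the Karras samplers.
    ramp = np.linspace(0, 1, n)
    return (sigma_max ** (1 / rho) + ramp * (sigma_min ** (1 / rho) - sigma_max ** (1 / rho))) ** rho

def sigma_to_t(sigma):
    # Nearest trained timestep for a given sigma (simplified).
    return int(np.abs(trained_sigmas - sigma).argmin())

steps = 50
sigmas = karras_sigmas(steps, trained_sigmas[0], trained_sigmas[-1])
first_below = next(i for i, s in enumerate(sigmas) if sigma_to_t(s) < 200)
print(f"timestep drops below 200 at step {first_below}/{steps} (fraction {first_below / steps:.2f})")
```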

The more reliable way to handle this would be to configure each refiner with the highest timestep it was trained on and switch when we are about to process a timestep that falls below that threshold, which usually means the last 200 timesteps. With this, the switchover would never need to be tweaked for any change in configuration, because the refiner would only ever be called on timesteps it was trained on.
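
A minimal sketch of the proposed rule (hypothetical names), inside a generic sampling loop:

```python
# Hypothetical sketch of timestep-based switchover: the refiner takes over
# as soon as the sampler is about to process a timestep below its trained range.
REFINER_MAX_TIMESTEP = 200  # highest timestep the refiner was trained on

def pick_model(base_model, refiner_model, timestep: int):
    # Timesteps count down from 999 to 0 over the course of sampling.
    return refiner_model if timestep < REFINER_MAX_TIMESTEP else base_model

def sample(base_model, refiner_model, x, timesteps):
    for t in timesteps:
        model = pick_model(base_model, refiner_model, t)
        x = model.denoise(x, t)  # hypothetical denoising call
    return x
```

Because the rule is expressed in trained timesteps, it holds regardless of step count, schedule, or img2img denoising strength.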

There is a potential corner case (which I have not tested in webui) with second-order samplers that Diffusers caught and fixed, described here. Either of two fixes should ensure this isn't a problem: a) deciding to switch to the refiner only when both timesteps called during the sampler step are below 200, or b) implementing the refiner as a model wrapper so that it is impossible to call the refiner model on timesteps that are out of range (see the sketch below). The second solution would be a more faithful implementation of how ensemble-of-experts models should work (i.e. it would give correct results if you had a refiner model trained on timesteps below 200 and a main model trained on timesteps at or above 200, where other solutions might not).
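
A sketch of solution b (hypothetical interfaces): because routing happens per denoising call, even a second-order sampler that evaluates two different timesteps in one step cannot call an expert outside its trained range.

```python
# Hypothetical ensemble-of-experts wrapper: routes every denoising call to
# whichever expert's trained timestep range contains the requested timestep.
class ExpertEnsemble:
    def __init__(self, base_model, refiner_model, boundary: int = 200):
        self.base = base_model        # trained on timesteps >= boundary
        self.refiner = refiner_model  # trained on timesteps < boundary
        self.boundary = boundary

    def denoise(self, x, timestep: int):
        expert = self.refiner if timestep < self.boundary else self.base
        return expert.denoise(x, timestep)
```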

Proposed workflow

  1. Click "Refiner" checkbox and expand the box.
  2. "Switch at" slider is either changed purely under the hood or is changed to a scale of timesteps from 1000 to 1. Tooltip changed to something to the effect of: "fraction of model's trained timesteps when the switch to the refiner model should happen; for most dedicated refiner models this should be set to 0.8 and left alone."
  3. When generating, no matter what I do past that point, the refiner should never be called with a timestep greater than or equal to 200.
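
For concreteness, a sketch (assuming 1000 trained timesteps; names hypothetical) of the mapping the slider would perform under the hood:

```python
def switch_threshold(switch_at: float, num_train_timesteps: int = 1000) -> int:
    # "Switch at" 0.8 with 1000 trained timesteps -> switch for timesteps
    # below 200, i.e. exactly the last 200 timesteps the refiner was trained on.
    return round((1.0 - switch_at) * num_train_timesteps)

assert switch_threshold(0.8) == 200
```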

Additional information

No response
