
Using other SDXL turbo models to optimize the generation speed #45

Open
DmitryVN opened this issue Feb 28, 2024 · 13 comments

Comments

@DmitryVN

Is it possible to implement support for Turbo SDXL models or Lightning SDXL or TensorRT?

@gallojorge

If it is possible, it is simple: edit the SUPIR_v0.yaml file in the options folder and change the line at the end to the name of your model, either Turbo or Lightning, then run a test. It worked for me with 24 steps and the juggernautXL_v9Rdphoto2Lightning.safetensors model.
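
For reference, a minimal sketch of the edit being described, assuming the checkpoint entry near the end of options/SUPIR_v0.yaml is named SDXL_CKPT (check the key name in your copy of the file); the path is a placeholder:

```yaml
# Near the end of options/SUPIR_v0.yaml (key name assumed; verify in your copy).
# Point the SDXL checkpoint at the Lightning/Turbo model instead of the base model:
SDXL_CKPT: /path/to/checkpoints/juggernautXL_v9Rdphoto2Lightning.safetensors
```

As noted later in this thread, Lightning checkpoints also expect fewer steps and a low CFG, so those settings need to be lowered alongside the checkpoint swap.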

@gallojorge

> Is it possible to implement support for Turbo SDXL models or Lightning SDXL or TensorRT?

If it is possible, it is simple: edit the SUPIR_v0.yaml file in the options folder and change the line at the end to the name of your model, either Turbo or Lightning, then run a test. It worked for me with 24 steps and the juggernautXL_v9Rdphoto2Lightning.safetensors model.

@Gutianpei

> If it is possible, it is simple: edit the SUPIR_v0.yaml file in the options folder and change the line at the end to the name of your model, either Turbo or Lightning, then run a test. It worked for me with 24 steps and the juggernautXL_v9Rdphoto2Lightning.safetensors model.

Have you tried comparing the generation between SDXL and the SDXL-Lightning model at 24 steps? I can get decent upscaling results with SDXL at 24 steps too, and I think the point of using SDXL-Lightning should be to get 1-5-step results with the same quality as a 30-step SDXL model.

@gallojorge

> If it is possible, it is simple: edit the SUPIR_v0.yaml file in the options folder and change the line at the end to the name of your model, either Turbo or Lightning, then run a test. It worked for me with 24 steps and the juggernautXL_v9Rdphoto2Lightning.safetensors model.

> Have you tried comparing the generation between SDXL and the SDXL-Lightning model at 24 steps? I can get decent upscaling results with SDXL at 24 steps too, and I think the point of using SDXL-Lightning should be to get 1-5-step results with the same quality as a 30-step SDXL model.

https://imgsli.com/MjQzNzI3 New modifications: I changed the code so that it can run in 8 steps with the juggernautXL_v9Rdphoto2Lightning.safetensors model.

@gallojorge

The computation time with acceptable quality is 24 seconds on a 3090 FE.

@gallojorge

https://imgsli.com/MjQzOTM5 This example uses the same model with 8 steps and upscaling at a factor of 3x; time: 60 seconds.

@gallojorge

https://imgsli.com/MjQzOTQy Factor of 4x; time: 150 seconds.

@Fanghua-Yu
Owner

I tried replacing the SDXL UNet with Juggernaut_RunDiffusionPhoto2_Lightning_4Steps.safetensors.
With 8 diffusion steps and CFG in the 1.5-2 range, it shows acceptable quality.
Currently, I am trying to switch the sampler to DPM++ SDE Karras (as suggested for Juggernaut-XL-Lightning).
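
For anyone who wants to preview how those settings behave outside of SUPIR, here is a minimal sketch using the diffusers library (this is not SUPIR's own sampler code, which is a modified EDM sampler): it loads an SDXL pipeline, swaps in a DPM++ SDE scheduler with Karras sigmas, and samples with 8 steps and CFG 1.5. The model ID and prompt are placeholders.

```python
# Minimal diffusers sketch (not SUPIR's EDM-based sampler): SDXL with a
# DPM++ SDE Karras-style scheduler, 8 steps, CFG 1.5.
import torch
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # placeholder: swap in a Lightning-style checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# "DPM++ SDE Karras" in WebUI terminology = SDE-DPM-Solver++ with Karras sigmas
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config,
    algorithm_type="sde-dpmsolver++",
    use_karras_sigmas=True,
)

image = pipe(
    "a detailed photo of a street at dusk",  # placeholder prompt
    num_inference_steps=8,
    guidance_scale=1.5,
).images[0]
image.save("out.png")
```

In SUPIR itself, the sampler lives in the modified EDM code, so any DPM++ SDE Karras integration would have to be done there rather than through diffusers.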

@JarJarBeatyourattitude

Were you able to make any progress with the DPM++ SDE Karras integration?

SDXL lightning seems like a fantastic option for SUPIR.

I noticed in the paper that the sampler was modified from EDM to be better at restoration, so I'm guessing it's not as simple as swapping the existing sampler out for the default DPM++ SDE Karras.

@JarJarBeatyourattitude

I was also thinking that supporting fp8 for SDXL could help with speed by lowering VRAM consumption, since tiling slows down the process a lot.

For example, right now on a 3090 the highest resolution I can go without tiling is around 3 million pixels. Supporting fp8 would allow me to get to 6 million pixels without tiling and 12 million pixels with just two tiles (assuming correct aspect ratio) rather than 4.

I'm not sure how difficult this would be to implement, though.

Here's the pull request for fp8 implementation in Auto1111: AUTOMATIC1111/stable-diffusion-webui#14031
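
For illustration, here is a rough sketch of the general idea (fp8 weight storage with computation still in fp16), written against plain PyTorch rather than the Auto1111 or SUPIR code bases; the helper name and the hook-based upcasting are assumptions of this sketch, not the actual mechanism used in that PR.

```python
# Rough sketch of fp8 *storage* for a UNet-like module: weights live in
# float8_e4m3fn (half the memory of fp16) and are upcast to fp16 just before
# each layer's forward pass. Requires PyTorch >= 2.1 for the float8 dtypes.
import torch
import torch.nn as nn


def enable_fp8_weight_storage(module: nn.Module) -> nn.Module:
    for m in module.modules():
        if isinstance(m, (nn.Linear, nn.Conv2d)):
            # Keep the weight in fp8 while the layer is idle.
            m.weight.data = m.weight.data.to(torch.float8_e4m3fn)

            def upcast(mod, args):
                # Upcast so the matmul/conv itself runs in fp16.
                mod.weight.data = mod.weight.data.to(torch.float16)

            def downcast(mod, args, output):
                # Drop back to fp8 storage once the layer has run.
                mod.weight.data = mod.weight.data.to(torch.float8_e4m3fn)

            m.register_forward_pre_hook(upcast)
            m.register_forward_hook(downcast)
    return module
```

Roughly halving the weight memory at rest is where the 3 MP to ~6 MP untiled estimate above comes from; activation memory still grows with resolution, so the real headroom may be smaller in practice.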

@JarJarBeatyourattitude

Although, I guess to really lower VRAM consumption that much you'd need fp8 support for both SDXL and the SUPIR model itself.

@isidentical

Inference with SUPIR seems super slow. Is there any reason not to use something like diffusers' UNet implementation as the backend, so people can leverage the existing tooling around it to optimize?
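
As a hypothetical illustration of the kind of tooling a diffusers-backed UNet would inherit, this sketch applies a few stock, independent optimizations to an SDXL pipeline; none of this is wired into SUPIR today.

```python
# Hypothetical sketch: stock diffusers optimizations, shown as independent knobs.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()  # trade some speed for lower VRAM
pipe.enable_vae_tiling()         # decode large images without running out of memory
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
```

Whether these compose cleanly with SUPIR's control branches and its modified sampler is an open question in this thread.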

@stalestar

> https://imgsli.com/MjQzOTQy Factor of 4x; time: 150 seconds.

If only 8 steps are run, the picture has some artifacts with mosaic noise and colour noise, as if the forward process is insufficient.

[image attachment]
