
Does it support LCM LoRA? The generated images are very poor #65

Closed
ldtgodlike opened this issue Dec 6, 2023 · 3 comments

ldtgodlike commented Dec 6, 2023

I used stable-fast == 0.0.13.post3 to test an LCM LoRA, and the result looks like this:

(image: output_lcm)

Using the same LCM LoRA in pure diffusers works fine:

(image: output2)

My code is as follows:

import torch
from diffusers import LCMScheduler, DiffusionPipeline
from sfast.compilers.stable_diffusion_pipeline_compiler import (
    compile, CompilationConfig)
import numpy as np
from PIL import Image

base_model_path = "runwayml/stable-diffusion-v1-5"
lcm_path = "latent-consistency/lcm-lora-sdv1-5"


def load_model():
    model = DiffusionPipeline.from_pretrained(base_model_path,
                                              torch_dtype=torch.float16,
                                              safety_checker=None,
                                              use_safetensors=True)

    model.scheduler = LCMScheduler.from_config(model.scheduler.config)
    model.safety_checker = None
    model.to(torch.device('cuda'))
    #model.unet.load_attn_procs(lcm_path)
    model.load_lora_weights(lcm_path)
    model.fuse_lora()
    return model


def compile_model(model):
    config = CompilationConfig.Default()

    # xformers and Triton are suggested for achieving best performance.
    # It might be slow for Triton to generate, compile and fine-tune kernels.
    try:
        import xformers
        config.enable_xformers = True
    except ImportError:
        print('xformers not installed, skip')
    # NOTE:
    # When GPU VRAM is insufficient or the architecture is too old, Triton might be slow.
    # Disable Triton if you encounter this problem.
    try:
        import triton
        config.enable_triton = True
    except ImportError:
        print('Triton not installed, skip')
    # NOTE:
    # CUDA Graph is suggested for small batch sizes and small resolutions to reduce CPU overhead.
    # My implementation can handle dynamic shape with increased need for GPU memory.
    # But when your GPU VRAM is insufficient or the image resolution is high,
    # CUDA Graph could cause less efficient VRAM utilization and slow down the inference,
    # especially when on Windows or WSL which has the "shared VRAM" mechanism.
    # If you meet problems related to it, you should disable it.
    config.enable_cuda_graph = True

    model = compile(model, config)
    return model


def main():
    prompt = "a rendering of a living room with a couch and a tv"
    negative_prompt = "ugly,logo,pixelated,lowres,text,word,cropped,low quality,normal quality,username,watermark,signature,blurry,soft,NSFW,painting,cartoon,hang,occluded objects,Fisheye View"

    model = load_model()
    model = compile_model(model)

    kwarg_inputs = dict(
        prompt=prompt,
        negative_prompt=negative_prompt,
        width=768,
        height=512,
        num_inference_steps=7,
        num_images_per_prompt=1,
        guidance_scale=1.5,
    )

    # NOTE: Warm it up.
    # The initial calls will trigger compilation and might be very slow.
    # After that, it should be very fast.
    for _ in range(3):
        output_image = model(**kwarg_inputs).images[0]

    # Let's see it!
    # Note: Progress bar might work incorrectly due to the async nature of CUDA.

    img_total = []
    for i in range(2):
        output_image = model(
            prompt=prompt,
            negative_prompt=negative_prompt,
            width=768,
            height=512,
            num_inference_steps=7,
            num_images_per_prompt=6,
            # generator=generators
        ).images

        img_row = []
        for img in output_image:
            img_row.append(np.asarray(img))
        img = np.hstack(img_row)
        img_total.append(img)
    image = np.vstack(img_total)
    # cv2.putText(image,prompt,(40,50),cv2.FONT_HERSHEY_SIMPLEX,2,(0,0,255),3)

    image = Image.fromarray(image)
    image.save("./output_lcm.png")


if __name__ == '__main__':
    main()
chengzeyi (Owner) commented:

@ldtgodlike Even without calling compile_model in your script, the output image is still broken. It seems like something else is wrong.

chengzeyi (Owner) commented Dec 6, 2023

@ldtgodlike You need to disable the guidance scale when using the LCM LoRA:

# disable guidance_scale by passing 0
image = pipe(prompt=prompt, num_inference_steps=4, guidance_scale=0).images[0]

https://huggingface.co/latent-consistency/lcm-lora-sdv1-5
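
Applied to the script above, a minimal sketch of the change would be to pass guidance_scale=0 in kwarg_inputs (assuming the rest of the script stays the same). Note that diffusers skips classifier-free guidance when guidance_scale <= 1, so negative_prompt has no effect in that case:

kwarg_inputs = dict(
    prompt=prompt,
    width=768,
    height=512,
    num_inference_steps=7,
    num_images_per_prompt=1,
    # Disable CFG for the LCM LoRA as recommended above;
    # with guidance disabled, negative_prompt is ignored by diffusers.
    guidance_scale=0,
)
output_image = model(**kwarg_inputs).images[0]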

ldtgodlike (Author) commented:

@chengzeyi It works, thanks.
