
Cannot reproduce the results of speedup #57

Open

649459021 opened this issue Mar 13, 2024 · 3 comments

@649459021
I wrote a script to benchmark the speedup myself and ran it on an RTX 3090 GPU, but it does not achieve the speedup reported in Table 4 of the paper.

import os, random, time

import numpy as np
import torch
import tomesd
from diffusers import StableDiffusionPipeline, DDIMScheduler, PNDMScheduler

def seed_everything(seed):
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    random.seed(seed)
    np.random.seed(seed)

seed = 2024
seed_everything(seed)
batch_size = 1
num_inference_steps = 50
os.makedirs("results", exist_ok=True)

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
# pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe.scheduler = PNDMScheduler.from_config(pipe.scheduler.config)

prompt = "a photo of an astronaut riding a horse on mars"
generator = torch.Generator().manual_seed(seed)

# Warm up the GPU so one-time kernel launch/compilation costs are not timed
print("Warming up the GPU")
for i in range(2):
    image = pipe([prompt] * batch_size, num_inference_steps=num_inference_steps)
#-------------------


# Original pipeline
torch.cuda.synchronize()
start_time = time.time()
image = pipe([prompt] * batch_size, num_inference_steps=num_inference_steps, generator=generator).images[0]
torch.cuda.synchronize()
end_time = time.time()
print("Original pipeline: {:.3f} seconds".format(end_time - start_time))
image.save("results/origin.png")

# Apply ToMe with a 50% merging ratio
generator = torch.Generator().manual_seed(seed)
tomesd.apply_patch(pipe, ratio=0.5)  # Can also use pipe.unet in place of pipe here

torch.cuda.synchronize()
start_time = time.time()
image = pipe([prompt] * batch_size, num_inference_steps=num_inference_steps, generator=generator).images[0]
torch.cuda.synchronize()
end_time = time.time()
print("ToMe: {:.3f} seconds".format(end_time - start_time))
image.save("results/origin_ToMe.png")

The output from this script is:

Original pipeline: 2.728 seconds
ToMe: 2.581 seconds

In Table 4, the speedup is nearly 2x when r=50%, but my test shows almost none. Why is this the case?
Could you give me some advice?

@dbolya (Owner) commented Mar 13, 2024

  1. You might not be giving your GPU enough work. Try increasing the batch size and image size and see what happens (see the sketch after this list).
  2. The results in the paper were obtained with the original stable diffusion repo, not diffusers. Diffusers has implemented a number of optimizations of its own that eat into ToMe's speed-up, but you should still see gains at larger image sizes.
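
For reference, a minimal benchmarking sketch along these lines. The batch size and resolution (4 images at 768x768) are arbitrary choices to give the GPU more work, not values from the paper; ToMe's savings grow with the token count, which scales with image area.

import time

import torch
import tomesd
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
prompt = "a photo of an astronaut riding a horse on mars"

def bench(label, batch_size=4, size=768, steps=50):
    # One untimed pass so kernel compilation/caching is excluded from the measurement
    pipe([prompt] * batch_size, height=size, width=size, num_inference_steps=steps)
    torch.cuda.synchronize()
    start = time.time()
    pipe([prompt] * batch_size, height=size, width=size, num_inference_steps=steps)
    torch.cuda.synchronize()
    print("{}: {:.3f} seconds".format(label, time.time() - start))

bench("baseline")
tomesd.apply_patch(pipe, ratio=0.5)  # 50% merging ratio, as in the script above
bench("ToMe r=50%")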

@DiosMuerto

Does this still work in 2024, with the SDXL or Lightning models?

@Ting-Justin-Jiang

Tested it on SDXL, and accidentally found that merging the middle blocks (1024-dimensional) gives much better results (although no big speedups).
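
For anyone who wants to try this, a sketch of what reaching the deeper blocks might look like using tomesd's public options. The max_downsample argument (default 1, i.e. only the highest-resolution blocks) lets ToMe patch layers with more downsampling; whether this exactly reproduces the middle-block experiment described above, and how cleanly the patch applies to the SDXL UNet, is an assumption here, not something verified upstream.

import torch
import tomesd
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# max_downsample=1 (the default) patches only the highest-resolution blocks;
# larger values (2, 4, 8) also reach the lower-resolution, deeper blocks.
tomesd.apply_patch(pipe, ratio=0.5, max_downsample=4)

image = pipe("a photo of an astronaut riding a horse on mars").images[0]
image.save("sdxl_tome.png")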
